Platform for global regulatory information updates

Automated data retrieval and updates by more than 180 web crawlers

Faced with a large and growing volume of regulatory documents from an increasing number of data sources in a variety of languages, Tarius decided to develop a new, flexible online platform to support its business model and meet future requirements for performance and scalability.



Solution


The new platform was built on Sitecore because of its event model and its ability to scale and handle large volumes of data.


Using advanced web crawlers, the extensive regulatory content is retrieved from multiple data sources and stored in Sitecore CMS. During this process, automatic metadata tagging takes place, i.e. the web crawlers generate English metadata such as country of origin, language, subject area, etc. regardless of document language. Using Sitecore’s workflow, documents are sent to Tarius’ editors who verify and refine the metadata.
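
The tagging step can be illustrated with a minimal sketch. It assumes each crawler knows the country and language of its source and uses a small multilingual keyword table to assign English subject areas; the source IDs, keyword table and `tag_document` function are hypothetical and are not taken from the Tarius solution.

```python
# Hypothetical per-source defaults: each crawler knows where it fetches from.
SOURCE_DEFAULTS = {
    "dk-agency": {"country": "Denmark", "language": "Danish"},
    "fr-agency": {"country": "France", "language": "French"},
}

# Hypothetical table mapping native-language terms to English subject areas.
SUBJECT_KEYWORDS = {
    "lægemidler": "Pharmaceuticals",
    "médicaments": "Pharmaceuticals",
    "medicinsk udstyr": "Medical devices",
}


def tag_document(source_id, text):
    """Return English metadata for a document, whatever its language."""
    meta = dict(SOURCE_DEFAULTS[source_id])
    lowered = text.lower()
    # Collect every English subject whose native keyword occurs in the text.
    meta["subjects"] = sorted(
        {subject for keyword, subject in SUBJECT_KEYWORDS.items() if keyword in lowered}
    )
    # Tagged documents wait for an editor to verify and refine the metadata.
    meta["status"] = "awaiting_editor_review"
    return meta
```

In this sketch a Danish document fetched from `dk-agency` ends up with English values for country, language and subject area, and is then queued for editorial review.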


All documents are published from Sitecore CMS using Sitecore Search Solution, which makes them searchable for customers. As part of publication, documents are classified automatically according to taxonomies defined by search terms. Depending on the customer's subscription type, the taxonomies define and limit the searches that customers can make and which documents they can see.
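
As a rough illustration of how a subscription type might limit what a customer sees, the sketch below assumes each document carries a list of taxonomy nodes and each subscription grants access to a set of nodes. The subscription names, node names and `visible_documents` function are hypothetical.

```python
# Hypothetical mapping from subscription type to permitted taxonomy nodes.
SUBSCRIPTIONS = {
    "basic": {"pharma/eu"},
    "premium": {"pharma/eu", "pharma/us", "devices/eu"},
}


def visible_documents(documents, subscription):
    """Keep documents classified under at least one permitted taxonomy node."""
    allowed = SUBSCRIPTIONS[subscription]
    return [doc for doc in documents if allowed & set(doc["taxonomies"])]


documents = [
    {"id": 1, "taxonomies": ["pharma/eu"]},
    {"id": 2, "taxonomies": ["pharma/us"]},
    {"id": 3, "taxonomies": ["devices/eu", "pharma/eu"]},
]
```

With this data, a `basic` subscriber would see documents 1 and 3, while a `premium` subscriber would see all three.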


In individual customer portals, Sitecore Search Solution helps users find the information that is relevant to them quickly and easily, for example through faceted search, which narrows results by selecting relevant keyword tags.
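
Faceted search in general can be sketched in a few lines: count how many results fall under each value of a metadata field, then filter by the values the user selects. This is a generic illustration, not the Sitecore Search Solution API; the function and field names are hypothetical.

```python
from collections import Counter


def facet_counts(documents, facet):
    """Count how many documents carry each value of the given metadata field."""
    return Counter(doc[facet] for doc in documents if facet in doc)


def apply_facets(documents, **selected):
    """Keep only documents that match every selected facet value."""
    return [
        doc
        for doc in documents
        if all(doc.get(field) == value for field, value in selected.items())
    ]


results = [
    {"id": 1, "country": "Denmark", "subject": "Pharmaceuticals"},
    {"id": 2, "country": "France", "subject": "Pharmaceuticals"},
    {"id": 3, "country": "Denmark", "subject": "Medical devices"},
]
```

Calling `facet_counts(results, "country")` gives the counts shown next to each filter option, and `apply_facets(results, country="Denmark", subject="Pharmaceuticals")` narrows the list to the matching documents.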


Due to the English metadata, customers can search in English and find documents in any language.



Result


Tarius’ new content platform was launched in November 2012, providing Tarius’ customers with a powerful tool to keep abreast of rules within a specific subject area, a geographic area, or right down to a single legislative authority.


The 180 web crawlers collect and screen content in different formats from more than 75 countries, regions and international organizations, in almost any language. They carry out scheduled database updates of about 500,000 XML documents ranging from 1 to 20 MB in size. A web crawler typically examines 300-10,000 documents at a time, together with their associated XML files, checking for updates.
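
One common way to check a batch of documents for updates is to compare a hash of the freshly fetched content with the hash stored at the last crawl. The sketch below shows that technique under the assumption that content is available as bytes; it is not a description of how the Tarius crawlers actually detect changes, and the function names are hypothetical.

```python
import hashlib


def fingerprint(content):
    """Return a stable SHA-256 fingerprint of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def find_changed(fetched, known):
    """Return IDs of fetched documents that are new or differ from the stored hash.

    fetched maps document ID to current content (bytes);
    known maps document ID to the fingerprint stored at the previous crawl.
    """
    return sorted(
        doc_id
        for doc_id, content in fetched.items()
        if known.get(doc_id) != fingerprint(content)
    )
```

Only the documents returned by `find_changed` would need to be re-tagged and sent through the editorial workflow; unchanged documents can be skipped.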


This automated extraction of data keeps the individual portals of more than 100 customers up to date. It runs periodically, several times a day, or whenever new regulations or amendments are issued (event-based). Alert messages are transmitted based on defined areas of interest. The newest feature is a Sitecore-based webshop that offers purchase of a newsletter with real-time updates from the 'Scientific Advisory Committee'.


Alpha Solutions continuously develops the solution to make the daily work of the Tarius editors easier. This is done partly by adding more web crawlers, more sophisticated metadata tagging and greater automation through rules and workflows in Sitecore.



Learn more about Sitecore Search Solution as well as our Sitecore capabilities.


Email klp@alpha-solutions.dk or call +45 2269 5960 to find out more.

