Data plays a central role in the life of every person and enterprise. Procurement data is especially important when we want to keep government spending under control. That is why collecting and sharing information on how government institutions spend public money is so important for society.
These datasets are very useful to investigative journalists, whose articles can give the public deeper insight into the financial functioning of state institutions such as hospitals, councils, and other government departments.
Our successful cooperation with a research team at the Central European University (CEU) is a great example of this.
The aim of the project was to make Hungarian public procurement data accessible in a structured form. The collected data has been published in searchable form on the kozbeszerzes.ceu.hu portal.
This large-scale project was commissioned by the Central European University and relied on Precognox's Text Analytics System (TAS). We worked with Ádám Szeidl and Miklós Koren, both professors in the Department of Economics and Business.
The government had released procurement data only in unstructured documents, which made it hard to extract useful information from the texts. The archive of the Közbeszerzési Értesítő (Public Procurement Bulletin), comprising over 140,000 text bulletins published between 1997 and 2013, contained the required information, such as the contracting authority, the winning bidder, and the value of each procurement, but only in semi-structured form.
Precognox developed a dedicated text mining solution, TAS Data Collector, which extracts the relevant information from the text files and stores it in a structured database that researchers can analyze.
The kozbeszerzes.ceu.hu site is simple and functional, and it is also robot-friendly, so procurement data can be harvested from it automatically.
“We were looking for a company that could build a structured database from text files, meeting all of our data-quality criteria within a short time frame. That is why we chose Precognox.
Precognox was pleasant to work with from the very start, while we were specifying the task and signing the contract. Following an in-person needs analysis, we jointly designed the schema of the future database, the target value for data quality, and the method for monitoring it. After the documents had been converted into a uniform schema, we asked them to validate a number of data fields, namely amounts, dates, company names, and addresses.
The product, delivered on schedule, exceeded our expectations. The measured accuracy of each data field was between 89 and 95 per cent; that is, on a random sample of one hundred items, 89 to 95 per cent of the values entered manually by our researchers matched those found and validated by Precognox's algorithm. We had never thought such accuracy was possible with fully automatic processing.
They answered our follow-up questions quickly and flexibly, like a truly agile team. We would be happy to rely on their services again in the future,” said Miklós Koren, professor at CEU.
This cooperation has created the technological foundation for transparency in public spending. Presenting public procurement data in a structured form is not only of great value to data journalists; through the articles they publish, it also gives the public an opportunity to see how public funds are spent. This is why the project matters so much.