Data Collector

The TAS Data Collector makes it possible to collect all unstructured and structured data of a domain available on the Internet. The collected data can be used in raw form or processed further with the other services of the TAS text analytics system.

What is TAS Data Collector?

With TAS Data Collector, users can download unstructured data (textual content) from the Internet and structure it, making it accessible to other information systems and suitable for further processing, analysis or visualization.
The content collected by TAS Data Collector can be used immediately or can serve as a basis for text analysis workflows implemented with other built-in modules of the TAS Platform.

Data collection workflow

  • the service collects the data (textual content) of the webpages, or parts of them, specified by the customer
  • further steps (data cleaning, data enrichment, validation) are carried out under the supervision of our specialists
  • the result is a structured database that can be used for further data processing (analysis, visualization) or serve as a basis for further text analytics solutions
  • the collected, properly formatted content is delivered to the customer (through an authenticated, password-protected channel if required)
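
The collect, clean, validate and structure steps can be illustrated with a minimal Python sketch. It is not the TAS implementation: the names `Record`, `extract_text` and `build_records` are hypothetical, the pages are passed in as already-downloaded HTML strings, and "valid" is assumed to mean simply "non-empty text after cleaning".

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


@dataclass
class Record:
    """One structured row per collected page (hypothetical schema)."""
    url: str
    text: str
    valid: bool


def extract_text(html):
    """Collection + cleaning: pull text out of HTML and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def build_records(pages):
    """Validation + structuring: `pages` maps URL -> raw HTML string."""
    records = []
    for url, html in pages.items():
        text = extract_text(html)
        # Assumed validation rule: a record is valid if any text survived cleaning.
        records.append(Record(url=url, text=text, valid=bool(text)))
    return records
```

A list of `Record` objects like this could then be written to a database or exported in whatever format the downstream analysis needs.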

Features of the TAS Data Collector

  • TAS Data Collector can extract visible data, metadata (tags, picture descriptions) and pagination from a website.
  • TAS Data Collector also handles sites, subpages, login-required pages, hierarchical sites, pages with a slideshow component and pages with multilingual content.
  • When data is recognized as hidden, we offer a screenshot solution that preserves the exact original look of the data.
  • In some cases robots.txt forbids data collection. We respect this, although collecting such data is technically possible.
  • We can extract text from many document and image formats (PDF, spreadsheet, diagram or image files).
  • We can produce and deliver any required output format, even ones that require software development.
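
As an illustration of the robots.txt point above, the following Python sketch shows how a collector might check whether a URL is allowed before fetching it. It uses the standard library's `urllib.robotparser`; the helper name `allowed_by_robots`, the user agent string and the example rules are assumptions made for this sketch, not part of TAS Data Collector.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed for this check.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content used only for demonstration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

With these example rules, a URL under `/private/` is reported as disallowed, while other paths on the same site are reported as allowed.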

Important! Please consider that we are not responsible for the further utilization of the collected data.

What can the collected content be used for?

  • research and development projects
  • new content and publications
  • service, information, thematic sites, blogs, public interest and open data portals
  • analyses, statistics, visualizations
  • enterprise processes / operations, data backup
  • competitor and media monitoring
  • searchable databases
  • artificial intelligence, machine learning processes
  • data change monitoring

Appearance of TAS Data Collector

The TAS Data Collector GUI lets you monitor the download stream. The appearance of the interface matches the corporate identity of the TAS Platform.

The interface provides information about:

  • resource overview: which resources are connected and how many records have been received
  • the number of valid and broken records
  • overview of the total number of records
  • the date of the data collection
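
The figures listed above could be derived from the collected records roughly as follows. This is a sketch under assumptions: the record layout (a dict with `resource` and `valid` keys) and the function name `summarize` are hypothetical and do not describe the actual GUI backend.

```python
from collections import Counter
from datetime import date


def summarize(records, collected_on):
    """Aggregate per-resource counts, valid/broken counts and the total."""
    per_resource = Counter(r["resource"] for r in records)
    valid = sum(1 for r in records if r["valid"])
    return {
        "collected_on": collected_on.isoformat(),
        "records_per_resource": dict(per_resource),
        "valid": valid,
        "broken": len(records) - valid,
        "total": len(records),
    }


# Example input with hypothetical resource names.
sample = [
    {"resource": "news-site", "valid": True},
    {"resource": "news-site", "valid": False},
    {"resource": "blog", "valid": True},
]
summary = summarize(sample, date(2024, 1, 15))
```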

TAS Data Collector technical description (requirements, integration, open source software used)

Initial resource requirements (on-premise, for onsite installation):

  • x86_64 CPU with at least 4 cores
  • at least 16 GB RAM
  • 35 GB hard drive (storage may increase in some cases)
  • 64-bit Linux, Windows, or macOS
  • 64-bit JDK 1.8 or higher
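
A pre-installation check against these minimums might look like the Python sketch below. The function names are hypothetical, the 35 GB figure is assumed to refer to free disk space, and the JDK version is not checked. RAM has to be supplied by the caller because the Python standard library has no portable way to read it.

```python
import os
import platform
import shutil

MIN_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 35  # assumed to mean free space on the installation volume


def check_requirements(cores, ram_gb, free_disk_gb, machine):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # platform.machine() reports "x86_64" on Linux/macOS and "AMD64" on Windows.
    if machine.lower() not in ("x86_64", "amd64"):
        problems.append(f"unsupported architecture: {machine}")
    if cores < MIN_CORES:
        problems.append(f"only {cores} CPU cores, need {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"only {ram_gb} GB RAM, need {MIN_RAM_GB}")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"only {free_disk_gb:.0f} GB free disk, need {MIN_DISK_GB}")
    return problems


def detect_local(path="."):
    """Gather the values the standard library can report for this machine."""
    return {
        "cores": os.cpu_count() or 0,
        "free_disk_gb": shutil.disk_usage(path).free / 1024**3,
        "machine": platform.machine(),
    }
```

For example, `check_requirements(ram_gb=16, **detect_local())` would validate the local machine once the installed RAM has been looked up by other means.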

Accessibility and platform support for developers

Cloud API – On Premise API – Java SDK available

Integration with other services

TAS Platform

Tableau
RapidMiner
Power BI
Google Data Studio
IBM SPSS

Learn more about TAS Data Collector and read the related Use Case on our TAS Product site. If you need a data collector solution, please contact us!
