Entity extraction

Entity extraction automatically recognizes and groups entities in text documents. It also serves as a basis for further procedures such as sentiment analysis, topic extraction, and other natural language processing (NLP) techniques. In the process, entities of different types (personal names, organizations, events, places, dates, and other main types and subtypes) are extracted from the text.

Importance of entity extraction

Extracting and grouping entities in textual content is important in many areas. Entity recognition facilitates PR, HR, and marketing tasks, and plays an important role in due diligence, intelligence, and forecasting processes. It also supports advanced enterprise search.

Entity extraction and enterprise search

Entity extraction supports the effective use of search engines such as TAS Enterprise Search, developed by Precognox.
The extracted entity groups can function as facets (filtering options) that narrow down the list of results. In addition, the entities recognized in the text can be displayed with each result, grouped by entity type.
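
To make the facet idea concrete, the following is a minimal sketch in Python. It is not how TAS Enterprise Search is implemented; the sample results, the `build_facets` and `filter_by_facet` helpers, and the data layout are all assumptions made for illustration. It counts how many results mention each entity per entity type, then narrows the result list by one selected facet value.

```python
from collections import Counter, defaultdict

# Hypothetical search results: each document carries the entities an
# extractor found in it, as (entity_type, entity_text) pairs.
RESULTS = [
    {"title": "Quarterly report",
     "entities": [("ORGANIZATION", "Precognox"), ("LOCATION", "Budapest")]},
    {"title": "Partner news",
     "entities": [("ORGANIZATION", "Precognox"), ("ORGANIZATION", "Basis Technology")]},
    {"title": "Event recap",
     "entities": [("LOCATION", "Budapest"), ("PERSON", "Jane Doe")]},
]


def build_facets(results):
    """Count, per entity type, how many results mention each entity."""
    facets = defaultdict(Counter)
    for doc in results:
        # A set keeps one document from being counted twice for the same entity.
        for entity_type, text in set(doc["entities"]):
            facets[entity_type][text] += 1
    return facets


def filter_by_facet(results, entity_type, text):
    """Keep only the results that contain the selected entity."""
    return [doc for doc in results if (entity_type, text) in doc["entities"]]


facets = build_facets(RESULTS)
narrowed = filter_by_facet(RESULTS, "LOCATION", "Budapest")

for entity_type, counts in sorted(facets.items()):
    print(entity_type, dict(counts))
print([doc["title"] for doc in narrowed])
```

In a real search engine the counts would normally come from the index rather than from a loop over results, but the principle is the same: entity types become facet groups, and entity values become the selectable filters.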

Contact us

Would you like to make your searches more efficient with an integrated entity extraction solution, or to learn more about the text analytics solutions of Basis Technology and Precognox? Write to us or send a message using the contact form at the bottom of this page.

Endre Jofoldi
General Manager
endre.jofoldi@precognox.com

Fine-tuning of entity tags

Automatically extracted entities can function as tags. TAS Tagger, a tagging solution developed by Precognox, enables the fine-tuning of these extracted entity tags in addition to its many other functions.

Advanced entity extraction solution integrated

The collaboration between US-based Basis Technology and Precognox dates back years. As the official Hungarian reseller and product integrator of Basistech’s Rosette text analytics platform, our company also uses these solutions in its own products. Basistech’s entity extraction solution, Rosette Entity Extractor (REX), is integrated into TAS Tagger, developed by Precognox.

In close cooperation

In addition to being the official system integrator and reseller of the Rosette API, Precognox has also participated in the development of Basistech’s text analytics solutions as part of the collaboration. An important milestone was a joint presentation by Basis Technology and Precognox at the Embassy of the United Kingdom in Budapest, Hungary.

Rosette Entity Extractor (REX)

Entities (e.g., organizations, people, places, products, dates) are significant parts of texts. Basistech’s entity extraction solution, Rosette Entity Extractor (REX), is built on a flexible hybrid of processors: statistical or deep neural network models, exact matching, and pattern matching. It combines these techniques to maximize precision and recall for each entity type. As a result, the solution is highly effective and can identify 29 entity types and more than 450 subtypes.
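
The hybrid approach can be illustrated with a small Python sketch. This is not REX’s implementation: the gazetteer contents, the regular expressions, and the overlap rule (prefer the longer span) are assumptions chosen for illustration, and the statistical or neural processor is left out because it would need a trained model. The sketch only shows how candidates from an exact-match processor and a pattern-matching processor might be merged into one result list.

```python
import re

# Exact-match processor: a small illustrative gazetteer (made-up contents).
GAZETTEER = {
    "Basis Technology": "ORGANIZATION",
    "Precognox": "ORGANIZATION",
    "Budapest": "LOCATION",
}

# Pattern-matching processor: illustrative regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def exact_match(text):
    """Yield (start, end, type, mention) for every gazetteer phrase found."""
    for phrase, entity_type in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            yield (m.start(), m.end(), entity_type, m.group())


def pattern_match(text):
    """Yield (start, end, type, mention) for every regular-expression hit."""
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            yield (m.start(), m.end(), entity_type, m.group())


def extract(text):
    """Merge candidates from both processors, preferring longer spans on overlap."""
    candidates = sorted(
        list(exact_match(text)) + list(pattern_match(text)),
        key=lambda c: (-(c[1] - c[0]), c[0]),  # longest first, then leftmost
    )
    chosen = []
    for cand in candidates:
        # Accept a candidate only if it does not overlap an accepted span.
        if all(cand[1] <= kept[0] or cand[0] >= kept[1] for kept in chosen):
            chosen.append(cand)
    return sorted(chosen)  # back to document order


entities = extract(
    "Precognox met Basis Technology in Budapest on 2019-05-14; "
    "write to info@example.com."
)
for start, end, entity_type, mention in entities:
    print(entity_type, mention)
```

A production system would add a learned model as a third candidate source and would usually resolve conflicts with per-type rules or confidence scores rather than span length alone.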

Trained on quality data

Rosette trains its models on a carefully curated corpus based on millions of news articles, social media content, and blog posts. The data is always annotated thoroughly by native speakers and the tags are cross-checked for consistency.

Sample images, source: Rosette product page

Try it out

Would you like to test the capabilities of Rosette Entity Extractor? Try the free demo: just type or paste some text in one of the supported languages.

Product highlights

  • 21 prebuilt language models
  • 29 entity types and 450+ subtypes available out-of-the-box
  • entity linking to knowledge bases
  • coreference resolution
  • hybrid of techniques, including deep learning models
  • confidence scores for each result
  • cloud or enterprise deployments
  • fast and scalable
  • industrial-strength support
  • active development with a minimum of six updates per year
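
As an illustration of how per-result confidence scores might be used downstream, here is a minimal Python sketch. The result layout, the field names (`mention`, `type`, `confidence`), the sample values, and the threshold are assumptions for this example and are not taken from the Rosette documentation. The idea is simply to keep entities whose score reaches a chosen threshold and to route the rest to manual review.

```python
# Hypothetical extractor output; the field names are assumptions for this sketch.
EXTRACTED = [
    {"mention": "Budapest", "type": "LOCATION", "confidence": 0.97},
    {"mention": "Rosette", "type": "PRODUCT", "confidence": 0.58},
    {"mention": "May", "type": "PERSON", "confidence": 0.21},
]


def split_by_confidence(entities, threshold=0.5):
    """Return (accepted, needs_review) according to the confidence threshold."""
    accepted = [e for e in entities if e["confidence"] >= threshold]
    needs_review = [e for e in entities if e["confidence"] < threshold]
    return accepted, needs_review


accepted, needs_review = split_by_confidence(EXTRACTED, threshold=0.5)
print([e["mention"] for e in accepted])
print([e["mention"] for e in needs_review])
```

Raising the threshold trades recall for precision: fewer entities are accepted automatically, but those that are accepted are more likely to be correct.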

Technical specification

Availability and platform support

Deployment availability:

  • Rosette Cloud
  • Rosette Server
  • Java SDK

Plugins and integration:

Bindings:

  • cURL
  • Python
  • PHP
  • Java
  • R
  • Ruby
  • C#
  • Node.js

Entity types

  • person
  • nationality
  • location
  • organization
  • product
  • language
  • title
  • URL
  • number
  • date
  • time
  • religion
  • phone
  • Lat/Long
  • ID Number
  • money
  • E-Mail
  • anatomy
  • Credit Card
  • activity
  • food
  • substance
  • disease
  • event
  • species
  • measure
  • MISC
  • distance
  • transport

Supported languages

  • Arabic
  • English
  • Italian
  • Portuguese
  • French
  • Japanese
  • German
  • Spanish
  • Russian
  • Chinese, Simplified
  • Chinese, Traditional
  • Hungarian
  • Pashto
  • Urdu
  • Dutch
  • Indonesian
  • Persian
  • Vietnamese
  • Korean
  • Hebrew
  • Malay
  • Swedish
