Text analytics

From Wikipedia, the free encyclopedia

The term text analytics describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques. The term also describes processes that apply these techniques, either independently or together with the querying and analysis of fielded, numerical data, to solve business problems. These techniques and processes discover and present knowledge – facts, business rules, and relationships – that would otherwise remain locked in textual form, impenetrable to automated processing.

A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. Current approaches to text analytics use natural language processing techniques that focus on specialized domains.
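As a minimal sketch of the search-index case, the following Python code scans a small document set and builds an in-memory inverted index that answers boolean AND queries. It assumes a simple lowercase regex tokenizer. The function names `tokenize`, `build_inverted_index`, and `search` and the sample documents are hypothetical and for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    "d1": "Acme Corp reported record revenue in March.",
    "d2": "Revenue at Globex fell sharply.",
    "d3": "Acme Corp opened a new office in Berlin.",
}
index = build_inverted_index(docs)
print(sorted(search(index, "acme revenue")))  # ['d1']
```

Practical search indexes typically add stemming, stop-word handling, and relevance ranking on top of this basic structure.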

Typical subtasks are:

Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
Coreference: identification of chains of noun phrases that refer to the same object. Anaphora, for example, is a type of coreference.
Relationship Extraction: extraction of named relationships between entities in text.
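The following Python sketch illustrates two of these subtasks with simple hand-written rules rather than the statistical models many systems use. A hypothetical gazetteer and regular expressions tag person, organization, and place names plus dates, money amounts, and percentages. Hypothetical trigger phrases found between a PERSON and a following ORGANIZATION then yield named relationships. Coreference is not attempted. All names, patterns, relation labels, and the sample sentence are illustrative assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer (dictionary) of known names; real systems use far
# larger lists and/or statistical models.
GAZETTEER = {
    "Alice Smith": "PERSON",
    "Bob Jones": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Berlin": "LOCATION",
}

# Regular expressions for temporal and numerical expressions.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
PATTERNS = [
    ("DATE", re.compile(rf"\b(?:{MONTHS})(?: \d{{1,2}})?,? \d{{4}}\b")),  # "March 2009", "July 4, 1976"
    ("MONEY", re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?")),   # "$5 million"
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?%")),                        # "12%"
]

# Hypothetical trigger phrases: the exact text between a PERSON and the
# following ORGANIZATION decides which named relationship (if any) holds.
RELATION_TRIGGERS = {
    "joined": re.compile(r"\s+joined\s+"),
    "ceo_of": re.compile(r",\s+(?:the\s+)?CEO of\s+"),
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


def recognize_entities(text: str) -> list[Entity]:
    """Tag gazetteer names and regex-matched expressions, ordered by position."""
    found = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


def extract_relations(text: str, entities: list[Entity]) -> list[tuple[str, str, str]]:
    """Emit (person, relation, organization) triples for adjacent entity pairs."""
    relations = []
    for left, right in zip(entities, entities[1:]):
        if left.label != "PERSON" or right.label != "ORGANIZATION":
            continue
        between = text[left.end:right.start]
        for relation, trigger in RELATION_TRIGGERS.items():
            if trigger.fullmatch(between):
                relations.append((left.text, relation, right.text))
    return relations


TEXT = ("Alice Smith joined Acme Corp in March 2009. "
        "Bob Jones, CEO of Acme Corp, said revenue grew 12% to $5 million.")
entities = recognize_entities(TEXT)
print([(e.text, e.label) for e in entities])
print(extract_relations(TEXT, entities))
# [('Alice Smith', 'joined', 'Acme Corp'), ('Bob Jones', 'ceo_of', 'Acme Corp')]
```

Rule-based matching like this is easy to inspect but brittle: it finds only names listed in the gazetteer and relationships whose exact wording appears among the triggers.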

See also

Software and Applications

Commercial Software and Applications

AeroText - provides a suite of text mining applications for content analysis. It supports content in multiple languages.
AlchemyAPI - web-based text analytics API offering document categorization, language identification, term extraction, named entity extraction, and more, with multilingual support.
IBM LanguageWare - IBM's suite of text analytics tools and runtime.
Infonic - provides commercial sentiment analysis of financial news feeds for the Thomson Reuters RMDS trading information system. The "sentiment scores" this software produces are used within algorithmic trading systems by several major trading banks. Infonic also develops document summarization and textual navigation technologies for knowledge management.
SPSS - provider of commercial text analytics software (SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine, and LexiQuest Categorize) that can be used in conjunction with SPSS Predictive Analytics Solutions.
Execware - publisher of Reason, a PC program with patented automated data tables for visually detecting connections in text or numeric data about objects, events, people, places, or anything else.

Open-Source Software and Applications

GATE - General Architecture for Text Engineering, an open-source toolbox for natural language processing
UIMA - Unstructured Information Management Architecture
RapidMiner - open-source software for data and text mining

External links