Text analytics
From Wikipedia, the free encyclopedia
The term text analytics describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques. The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems. These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.
A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. Current approaches to text analytics use natural language processing techniques that focus on specialized domains.
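As a rough illustration of the "populate a search index" case, the sketch below builds a tiny inverted index from a set of documents. The function names (`tokenize`, `build_index`) and the sample documents are hypothetical, and the tokenizer is deliberately naive; a real system would use proper natural language processing rather than a single regular expression.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, invented for illustration only.
documents = {
    "doc1": "Acme Corp acquired Widget Ltd in 2007.",
    "doc2": "Widget Ltd reported record revenue.",
}
index = build_index(documents)
```

Looking up `index["widget"]` then returns the ids of every document that mentions the term, which is the basic operation a search index supports.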
Typical subtasks are:
- Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
- Coreference: identification of chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.
- Relationship Extraction: extraction of named relationships between entities in text.
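Two of the subtasks above can be sketched with simple pattern matching. The example below is a toy, not how production systems work: it recognizes only four-digit years and dollar amounts as entities, and extracts a single hypothetical "acquired" relationship between capitalized names. All patterns, labels, and the sample sentence are assumptions made for illustration. Coreference is not attempted, since it generally needs more than regular expressions.

```python
import re

# Toy patterns (assumptions): four-digit years and simple dollar amounts.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?")

# Hypothetical relation pattern: two capitalized name phrases joined by "acquired".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
ACQUIRED = re.compile(rf"({NAME}) acquired ({NAME})")


def extract_entities(text):
    """Return (label, surface string) pairs for temporal and monetary expressions."""
    entities = [("TIME", m.group()) for m in YEAR.finditer(text)]
    entities += [("MONEY", m.group()) for m in MONEY.finditer(text)]
    return entities


def extract_relations(text):
    """Return (subject, relation, object) triples for the 'acquired' pattern."""
    return [(m.group(1), "acquired", m.group(2)) for m in ACQUIRED.finditer(text)]


text = "Acme Corp acquired Widget Ltd in 2007 for $12 million."
entities = extract_entities(text)
relations = extract_relations(text)
```

The extracted pairs and triples are the kind of structured records that could then be loaded into a database, as described above.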
See also
- Noisy text analytics
- Information extraction
- Computational linguistics
- Natural language processing
- Named entity recognition
- Identity resolution
- Text mining
- News analytics
Software and Applications
Commercial Software and Applications
- AeroText - provides a suite of text mining applications for content analysis. Content used can be in multiple languages.
- AlchemyAPI - web-based text analytics API: document categorization, language identification, term extraction, named entities, etc. Multi-lingual support.
- IBM LanguageWare - the IBM suite for text analytics (tools and runtime).
- Infonic - provides commercial sentiment analysis of financial news feeds for the Thomson Reuters RMDS trading information system. The "sentiment scores" that this software provides are used within algorithmic trading systems by several major trading banks. Infonic also develops document summarization and textual navigation technologies that aid in knowledge management.
- SPSS - provider of SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine and LexiQuest Categorize, commercial text analytics software that can be used in conjunction with SPSS Predictive Analytics Solutions.
- Execware - publisher of Reason, a PC program with patented automated data tables for visually detecting connections in text and numeric data about objects, events, people, places, or anything else.
Open-Source Software and Applications
- GATE - General Architecture for Text Engineering, an open-source toolbox for natural language processing
- UIMA - Unstructured Information Management Architecture
- RapidMiner - open-source software for data and text mining