Text analytics

From Wikipedia, the free encyclopedia

The term text analytics describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques. The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems. These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.
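As a rough illustration of how pattern-based extraction can turn free text into structured facts and relationships, the following Python sketch pulls "X acquired Y in YEAR" statements out of prose with a regular expression. The `Acquisition` record, the pattern, and the sample sentences are hypothetical and invented for illustration. The capitalized-word rule is a crude stand-in for real named-entity tagging, and practical systems rely on much richer linguistic analysis.

```python
import re
from typing import NamedTuple


class Acquisition(NamedTuple):
    """A structured fact extracted from free text (hypothetical schema)."""
    buyer: str
    target: str
    year: int


# Hypothetical pattern: "<Name> acquired <Name> in <4-digit year>".
# A run of capitalized words is a crude stand-in for named-entity recognition.
NAME = r"[A-Z][\w&]*(?:\s+[A-Z][\w&]*)*"
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME})\s+acquired\s+(?P<target>{NAME})\s+in\s+(?P<year>\d{{4}})"
)


def extract_acquisitions(text: str) -> list[Acquisition]:
    """Return every acquisition fact matched by the pattern, in text order."""
    return [
        Acquisition(m["buyer"], m["target"], int(m["year"]))
        for m in ACQUISITION.finditer(text)
    ]


if __name__ == "__main__":
    report = (
        "Acme Corp acquired Widget Labs in 2006. "
        "Analysts expect further deals. Globex acquired Initech in 2007."
    )
    for fact in extract_acquisitions(report):
        print(fact)
```

Running the script prints `Acquisition(buyer='Acme Corp', target='Widget Labs', year=2006)` and `Acquisition(buyer='Globex', target='Initech', year=2007)`. The middle sentence matches no pattern, so the script ignores it. Records in this shape could then be stored next to fielded, numerical data for querying and analysis.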

A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. Current approaches to text analytics use natural language processing techniques that focus on specialized domains.
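The two outcomes described above can be sketched in a few lines of Python: populating a search index, and modelling a document set for predictive classification. This is a minimal sketch under simplifying assumptions. A regular-expression tokenizer stands in for real natural language processing, and the index is a simple in-memory inverted index. The classifier is multinomial naive Bayes with add-one smoothing, chosen here only because it is short. The helper names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a crude stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Search-index path: map each term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


class NaiveBayes:
    """Classification path: multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)  # documents per label, for the priors
        self.term_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(term | label); unseen terms are skipped
            score = math.log(n_docs / total_docs)
            for term in tokenize(text):
                if term in self.vocab:
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    docs = {
        "d1": "Shares rallied after strong quarterly earnings.",
        "d2": "The striker scored twice in the cup final.",
        "d3": "Earnings guidance lifted the shares again.",
    }
    index = build_inverted_index(docs)
    print(sorted(index["shares"]))
    model = NaiveBayes().fit(list(docs.values()), ["finance", "sport", "finance"])
    print(model.predict("quarterly earnings beat forecasts"))
```

With this toy data the script prints `['d1', 'd3']` and then `finance`. Both paths share the same tokenizer, which mirrors how extraction output can feed either a search index or a predictive model.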

Typical subtasks are:

See also

Software and Applications

Commercial Software and Applications

  • AeroText - provides a suite of text mining applications for content analysis, with support for content in multiple languages.
  • AlchemyAPI - a web-based text analytics API offering document categorization, language identification, term extraction, named entity extraction, and related services, with multilingual support.
  • IBM LanguageWare - IBM's suite of text analytics tools and runtime components.
  • Infonic - provides commercial sentiment analysis of financial news feeds for the Thomson Reuters RMDS trading information system. Several major trading banks use the "sentiment scores" this software produces within their algorithmic trading systems. Infonic also develops document summarization and textual navigation technologies that aid knowledge management.
  • SPSS - provider of SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine, and LexiQuest Categorize, commercial text analytics products that can be used together with SPSS Predictive Analytics Solutions.
  • Execware - publisher of Reason, a PC program that uses patented automated data tables to visually detect connections in text and numeric data about objects, events, people, places, or anything else.
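The Infonic entry mentions numeric "sentiment scores" consumed by trading systems. One simple, generic way to produce such a score is lexicon-based counting, sketched below in Python. This is not Infonic's method. The word lists, the single-word negation rule, and the [-1, 1] scale are illustrative assumptions.

```python
import re

# Hypothetical, tiny polarity lexicons; real systems use much larger
# curated or learned lexicons.
POSITIVE = {"beat", "gain", "gains", "growth", "profit", "rally", "strong", "upgrade"}
NEGATIVE = {"loss", "losses", "miss", "weak", "downgrade", "lawsuit", "decline", "fall"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text: str) -> float:
    """Score text in [-1, 1] as (positive hits - negative hits) / total hits.

    A polarity word directly preceded by a negator has its sign flipped.
    Returns 0.0 when no lexicon words are found.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = neg = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue
        # Flip polarity for simple negations such as "no growth".
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        if polarity > 0:
            pos += 1
        else:
            neg += 1
    hits = pos + neg
    return (pos - neg) / hits if hits else 0.0
```

For example, `sentiment_score("Weak sales and no growth.")` returns `-1.0`: "weak" counts as negative, and the preceding "no" flips "growth" to negative. A lexicon approach keeps the scoring logic easy to inspect. The trade-off is that it misses context a model trained on labelled text can capture.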

Open-Source Software and Applications

  • GATE - General Architecture for Text Engineering, an open-source toolbox for natural language processing
  • UIMA - Unstructured Information Management Architecture
  • RapidMiner - open-source software for data and text mining

External links
