Enterprise search
From Wikipedia, the free encyclopedia
Enterprise search is the practice of identifying and enabling specific content across the enterprise to be indexed, searched, and displayed to authorized users.
Enterprise search summary
The term enterprise search is used to describe the application of search technology to information within an organization. This is in contrast to the other two main types of horizontal search environment: web search and desktop search.
The major challenge faced by enterprise search is the need to index documents from a variety of sources, such as file systems, intranets, document management systems, e-mail, and databases, and then present a consolidated list of relevance-ranked documents from these various sources. In addition, many applications require the integration of structured data, both as part of the search criteria and when presenting results back to users. Access controls are also vital if users are to be restricted to the documents to which they have been granted access by the various document repositories within the enterprise. These major challenges are unique to enterprise search.
Differences from web search
Beyond the difference in the kinds of materials being indexed, enterprise search systems also typically include functionality that is not associated with the mainstream web search engines. These include:
- Adapters to index content from a variety of repositories, such as databases and content management systems.
- Federated search, which consists of (1) transforming a query and broadcasting it to a group of disparate databases or external content sources with the appropriate syntax, (2) merging the results collected from the databases, (3) presenting them in a succinct and unified format with minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort the merged result set.
- Entity extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
- Faceted search, a technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information.
- Access control, usually in the form of an access control list (ACL), is often required to restrict access to documents based on individual user identities. Different content sources use many different access control mechanisms, which makes this a complex task to address comprehensively in an enterprise search environment.
- Text clustering, which groups the top several hundred search results into topics that are computed on the fly from the search-result descriptions, typically titles, excerpts (snippets), and metadata. This technique lets users navigate the content by topic rather than by the metadata that is used in faceting. Clustering compensates for the problem of incompatible metadata across multiple enterprise repositories, which hinders the usefulness of faceting.
- User interfaces, which in web search are deliberately kept simple so as not to distract the user from clicking on ads, the source of revenue. Although the business model for enterprise search could include showing ads, in practice this is not done. To enhance end-user productivity, enterprise search vendors continually experiment with rich UI functionality that occupies significant screen space, which would be problematic for web search.
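The federated search steps, access control check, and facet counts listed above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and introduced only for illustration: the Hit record, the two in-memory sources (wiki_search, dms_search), their query syntaxes, and the assumption that scores reported by different sources are directly comparable. Real systems generally need score normalization and repository-specific ACL resolution.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str           # identifier shared across sources, used to deduplicate
    title: str
    source: str           # hypothetical repository name
    score: float          # source-reported relevance, assumed comparable here
    allowed: frozenset    # users or groups on the document's ACL
    facets: tuple = ()    # (facet_name, value) pairs


def federated_search(query, sources, user_principals):
    """Broadcast a query, then merge, deduplicate, ACL-filter and sort the hits."""
    merged = {}
    for translate, search in sources.values():
        # (1) transform the query into the source's own syntax and send it
        for hit in search(translate(query)):
            # Access control: skip documents the user is not entitled to see
            if not (hit.allowed & user_principals):
                continue
            # (2) + (3) merge results, keeping the best-scoring copy of a document
            best = merged.get(hit.doc_id)
            if best is None or hit.score > best.score:
                merged[hit.doc_id] = hit
    # (4) sort the merged result set, highest score first
    return sorted(merged.values(), key=lambda h: h.score, reverse=True)


def facet_counts(hits, facet_name):
    """Count the values of one facet across a result set, for faceted filtering."""
    return Counter(v for h in hits for (n, v) in h.facets if n == facet_name)


# Two stand-in repositories that ignore the query and return fixed hits.
def wiki_search(q):
    return [
        Hit("d1", "Travel policy", "wiki", 0.9, frozenset({"staff"}), (("type", "policy"),)),
        Hit("d2", "Payroll export", "wiki", 0.8, frozenset({"finance"}), (("type", "report"),)),
    ]


def dms_search(q):
    return [
        Hit("d1", "Travel policy", "dms", 0.7, frozenset({"staff"}), (("type", "policy"),)),
        Hit("d3", "Expense form", "dms", 0.6, frozenset({"staff"}), (("type", "form"),)),
    ]


SOURCES = {
    "wiki": (lambda q: q.lower(), wiki_search),      # assumed plain-keyword syntax
    "dms": (lambda q: f'text:"{q}"', dms_search),    # assumed fielded syntax
}

results = federated_search("Travel", SOURCES, frozenset({"alice", "staff"}))
type_facet = facet_counts(results, "type")
```

In this sketch the ACL check runs before merging, so a document the user cannot read never reaches the result list or the facet counts. Filtering at index time or per repository are alternative designs with different trade-offs.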
Relevance factors for enterprise search
The factors that determine the relevance of search results within the context of an enterprise overlap with, but are different from, those that apply to web search. In general, enterprise search engines cannot take advantage of the rich link structure found in the web's hypertext content, because hyperlinking is largely absent within the enterprise. Algorithms like PageRank exploit hyperlink structure to assign authority to documents, and then use that authority as a query-independent relevance factor. In contrast, enterprises typically have to use other query-independent factors, such as a document's recency or popularity, along with query-dependent factors traditionally associated with information retrieval algorithms. Also, the rich functionality of enterprise search UIs, such as clustering and faceting, diminishes reliance on ranking as the means to direct the user's attention.
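As a rough illustration of combining query-dependent and query-independent factors, the following Python sketch blends a simple term-match score with recency and popularity signals. The function names, the 180-day half-life, the log-scaled view count, and the 0.6/0.2/0.2 weights are all assumptions chosen for the example, not values taken from any particular search engine. The term-match function is a deliberately crude stand-in for a real information retrieval model.

```python
import math
from datetime import date


def query_dependent_score(query, text):
    """Fraction of query terms that occur in the text (stand-in for an IR model)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def recency_score(modified, today, half_life_days=180):
    """Exponential decay: 1.0 for a document changed today, 0.5 after one half-life."""
    age_days = max((today - modified).days, 0)
    return 0.5 ** (age_days / half_life_days)


def popularity_score(views, max_views):
    """Log-scaled view count normalized to the range [0, 1]."""
    return math.log1p(views) / math.log1p(max_views) if max_views > 0 else 0.0


def relevance(query, doc, today, max_views, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of one query-dependent and two query-independent factors."""
    w_text, w_recency, w_popularity = weights
    return (
        w_text * query_dependent_score(query, doc["text"])
        + w_recency * recency_score(doc["modified"], today)
        + w_popularity * popularity_score(doc["views"], max_views)
    )


today = date(2024, 1, 1)
docs = [
    {"id": "old", "text": "travel expense policy", "modified": date(2020, 1, 1), "views": 10},
    {"id": "new", "text": "travel expense policy", "modified": date(2023, 12, 1), "views": 500},
]
max_views = max(d["views"] for d in docs)
ranked = sorted(
    docs,
    key=lambda d: relevance("travel policy", d, today, max_views),
    reverse=True,
)
```

Both documents match the query equally well, so the newer and more frequently viewed one ranks first purely on the query-independent factors, which is the role link-based authority plays in web search.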