User agent
From Wikipedia, the free encyclopedia
A user agent is the client application used with a particular network protocol; the phrase is most commonly used in reference to those which access the World Wide Web. Other systems, such as Session Initiation Protocol (SIP), use the term user agent to refer to both end points of a phone call, server and client.[1]
Web user agents range from web browsers and e-mail clients to search engine crawlers ("spiders"), as well as mobile phones, screen readers and braille browsers used by people with disabilities. When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the HTTP request, prefixed with User-Agent: (case does not matter) and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.
The user-agent string is one of the criteria by which web crawlers can be excluded from certain pages or parts of a website using the "Robots Exclusion Standard" (robots.txt). This allows webmasters who feel that certain parts of their website should not be included in the data gathered by a particular crawler, or that a particular crawler is using up too much bandwidth, to request that crawler not to visit those pages.
Contents |
[edit] User agent spoofing
At various points in its history, use of the Web has been dominated by one browser to the extent that many websites are designed to work only with that particular browser, rather than according to standards from bodies such as the W3C and IETF. Such sites often include "browser sniffing" code, which alters the information sent out depending on the User-Agent string received. This can mean that less popular browsers are not sent complex content, even though they might be able to deal with it correctly, or in extreme cases refused all content. Thus various browsers "cloak" or "spoof" this string, in order to identify themselves as something else to such detection code; often, the browser's real identity is then included later in the string.
The earliest example of this is Internet Explorer's use of a User-Agent string beginning "Mozilla/<version> (compatible; MSIE <version>...", in order to receive content intended for Netscape Navigator, its main rival at the time of its development. This was not a reference to the open-source Mozilla browser, which was developed much later, but to the original codename for Navigator, which was also the name of the Netscape company mascot. This format of User-Agent string has since been copied by other user agents, partly because Explorer, in turn, came to dominate.
When Internet Explorer became the dominant web browser, rivals such as Firefox, Safari, and Opera implemented systems whereby the user could select a false User-Agent string to send, such as that of a recent version of Explorer. Some – e.g. Firefox and Safari – duplicate the User-Agent string they are trying to spoof exactly; others – e.g. Opera – duplicate the User-Agent string but add the genuine browser name to the end. This latter approach, of course, leads to a string containing three names and versions: first, the user agent claims to be "Mozilla" (i.e. Netscape Navigator); then, "MSIE" (Internet Explorer); and finally, the actual browser, such as "Opera". Opera also offers a full masking as Internet Explorer or Firefox, which hides "Opera" completely.
Beside browsers, other programs using the HTTP protocol, like most download managers and offline browsers, also had the ability to change the user agent string sent to servers to user's liking. This is presumably done in an effort to maintain compatibility with certain servers (some servers refused to serve those programs right away because they are mostly used carelessly, thus burdening the server).
Some web developers have started a "Viewable With Any Browser" campaign[2] which encourages developers to design webpages that work regardless of the browser used.
One result of user agent spoofing is that the usage share of Internet Explorer, the user agent browsers typically spoof, is probably overestimated, and the usage share of other browsers may be underestimated. User agent spoofing can also provide a security issue by spoofing search engine bots and bypassing key parts in a website.
[edit] User agent sniffing
The term user agent sniffing refers to the practice of websites showing different content when viewed with a certain user agent. On the Internet, this will result in a different site being shown when browsing the page with a specific browser (e.g. Microsoft Internet Explorer). An infamous example of this is Microsoft Exchange Server 2003's Outlook Web Access feature. When viewed with IE, much more functionality is displayed compared to the same page in any other browser. User agent sniffing is mostly considered poor practice for Web 2.0 web sites, since it encourages browser specific design. Many webmasters are recommended to create an HTML markup that is as standardised as possible, to allow correct rendering in as many browsers as possible.
Websites specifically targeted towards mobile phones, like NTT DoCoMo's I-Mode or Vodafone's Vodafone Live! portals, often rely heavily on user agent sniffing, since mobile browsers often differ greatly from each other. Many developments in mobile browsing have been made in the last few years, while many older phones that do not possess these new technologies are still heavily used. Therefore, mobile webportals will often generate completely different markup code depending on the mobile phone used to browse them. These differences can be small (e.g. resizing of certain images to fit smaller screens), or quite extensive (e.g. rendering of the page in WML instead of XHTML).
There are a number of ways to perform user agent sniffing within web applications, including using public domain scripts and commercial products.
[edit] Encryption strength "U" / "I" / "N"
Web browsers created in the United States, such as Netscape Navigator, Internet Explorer, and some others, use one of these three letters to specify the browser's encryption strength. Since the US government formerly would not allow encryption higher than 40-bit to be exported from the country, different versions were released with different encryption strengths. "U" stands for "USA" (for the version with 128-bit encryption), "I" stands for "International" (the browser has 40-bit encryption and can be used anywhere in the world), "N" stands for "None" (no encryption). Originally the "U" version of such browsers was only available for download to those in the USA. The US government has since loosened its policy and exporting high encryption is now permitted to most countries (see Export of cryptography for more information). Now Netscape and Mozilla distribute their browsers only in a "U" version, supporting up to 256-bit encryption, since an international version is no longer required.
This information can be seen in Mozilla Firefox browser by typing about:config and searching for the string general.useragent.security.
[edit] See also
- Robots Exclusion Standard
- Web crawler
- Wireless Universal Resource File (WURFL)
- User Agent Profile (UAProf)
[edit] References
- ^ RFC 3261, SIP: Session Initiation Protocol, IETF, The Internet Society (2002)
- ^ "Viewable with Any Browser" campaign