URL redirection

URL redirection, also called URL forwarding, domain redirection and domain forwarding, is a technique on the World Wide Web for making a web page available under many URLs.

Purposes

There are several reasons to use URL redirection:

Similar domain names

A web browser user might mistype a URL, for example as "exampel.com" or "exmaple.com". Organizations often register these misspelled domains and redirect them to the correct location, example.com. Similarly, the addresses example.com and example.net could both redirect to a single domain or web page, such as example.org. This technique is often used to reserve other top-level domains (TLDs) with the same name, or to let a genuine ".edu" or ".net" site redirect to a more recognizable ".com" domain.

Moving a site to a new domain

A web page might need to be redirected for several reasons:

  • A web site might need to change its domain name.
  • An author might move his or her pages to a new domain.
  • Two web sites might merge.

With URL redirects, incoming links to an outdated URL can be sent to the correct location. These links might be from other sites that have not realized that there is a change or from bookmarks/favorites that users have saved in their browsers.

The same applies to search engines, which often have outdated domain names and links in their databases and send search users to these old URLs. With a "moved permanently" redirect to the new URL, visitors still end up at the correct page, and on its next pass the search engine should detect and use the newer URL.

Logging outgoing links

The access logs of most web servers keep detailed information about where visitors came from and how they browsed the hosted site. They do not, however, log which links visitors left by, because the visitor's browser has no need to communicate with the original server when the visitor clicks on an outgoing link.

This information can be captured in several ways, one of which involves URL redirection. Instead of sending the visitor straight to the other site, links on the site point to a URL on the original website's domain that automatically redirects to the real target. The downside of this technique is the delay caused by the additional request to the original website's server. Because this added request leaves a trace in the server log, revealing exactly which link was followed, it can also be a privacy issue.[1]
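
The mechanism can be sketched in a few lines of Python. This is only an illustration: the /out path, the function names and the in-memory click log are assumptions made for the example, not part of any particular server.

```python
from urllib.parse import parse_qs, quote


def make_tracking_link(target_url):
    """Return a same-site link that records the click and then redirects."""
    # Percent-encode the whole target so that its own "?" and "&"
    # survive as a single query parameter.
    return "/out?url=" + quote(target_url, safe="")


def handle_out(query_string, click_log):
    """Handle a request to the hypothetical /out URL.

    Returns a (status, headers) pair; click_log stands in for the
    server's access log.
    """
    targets = parse_qs(query_string).get("url")
    if not targets or not targets[0].startswith(("http://", "https://")):
        return 400, {}
    target = targets[0]
    click_log.append(target)  # the trace that reveals which link was followed
    return 302, {"Location": target}
```

Each outgoing link on the site would then be written with make_tracking_link, and the extra round trip through handle_out is the delay described above.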

Some corporate websites use the same technique to display a notice that the subsequent content is on another site and is therefore not necessarily affiliated with the corporation. In such scenarios, displaying the notice causes an additional delay.

Short, meaningful, persistent aliases for long or changing URLs

Web engineers often pass descriptive attributes in the URL to represent data hierarchies, command structures, transaction paths and session information. The resulting URL is aesthetically unpleasant and difficult to remember. Sometimes the URL of a page also changes even though the content stays the same. In both cases, a short, stable alias that redirects to the current URL gives users an address that is easier to remember and that keeps working.

Manipulating search engines

In the past, redirect techniques were used to fool search engines. For example, one page could show popular search terms to search engines but redirect visitors to a different target page. There are also cases where redirects have been used to "steal" the page rank of one popular page and use it for a different page, usually involving the 302 HTTP status code of "moved temporarily".[2][3]

Search engine providers noticed the problem and took appropriate actions. Usually, sites that employ such techniques to manipulate search engines are punished automatically by reducing their ranking or by excluding them from the search index.

As a result, today, such manipulations usually result in less rather than more site exposure.

Satire and criticism

In the same way that a Google bomb can be used for satire and political criticism, a domain name that conveys one meaning can be redirected to any other web page, sometimes with malicious intent.

Manipulating visitors

URL redirection is sometimes used as a part of phishing attacks that confuse visitors about which web site they are visiting.

Techniques

There are several techniques to implement a redirect. In many cases, the Refresh meta tag is the simplest one, although there are strong objections to this method.

Manual redirect

The simplest technique is to ask the visitor to follow a link to the new page, usually with an HTML anchor such as:

Please follow <a href="http://www.example.com/">link</a>!

This method is often used as a fall-back for automatic methods: if the visitor's browser does not support the automatic redirect method, the visitor can still reach the target document by following the link.

HTTP status codes 3xx

In the HTTP protocol used by the World Wide Web, a redirect is a response with a status code beginning with 3 that induces a browser to go to another location.

The HTTP standard defines several status codes for redirection:

  • 300 Multiple Choices (e.g. offer different languages)
  • 301 Moved Permanently
  • 302 Found (e.g. temporary redirect)
  • 303 See Other (e.g. for results of CGI scripts)
  • 307 Temporary Redirect

All of these status codes require that the URL of the redirect target be given in the Location: header of the HTTP response. A 300 Multiple Choices response usually lists all choices in the body of the message and gives the default choice in the Location: header.

Within the 3xx range, there are also some status codes that are quite different from the above redirects (they are not discussed here with their details):

  • 304 not modified
  • 305 use proxy

This is a sample of an HTTP response that uses the 301 "moved permanently" redirect:

HTTP/1.1 301 Moved Permanently
Location: http://www.example.org/
Content-Type: text/html
Content-Length: 174
 
<html>
<head>
<title>Moved</title>
</head>
<body>
<h1>Moved</h1>
<p>This page has moved to <a href="http://www.example.org/">http://www.example.org/</a>.</p>
</body>
</html>
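
A response of this shape can be assembled programmatically. The following Python sketch is only an illustration (the function name and its defaults are assumptions); it computes Content-Length from the body and rejects line breaks in the target, which would otherwise let a caller inject extra headers.

```python
from html import escape


def build_redirect_response(location, status=301, reason="Moved Permanently"):
    """Return a complete HTTP/1.1 redirect response as bytes."""
    if "\r" in location or "\n" in location:
        raise ValueError("line breaks are not allowed in a header value")
    shown = escape(location)  # safe to embed in the HTML body
    body = (
        "<html>\n<head>\n<title>Moved</title>\n</head>\n<body>\n"
        "<h1>Moved</h1>\n"
        f'<p>This page has moved to <a href="{shown}">{shown}</a>.</p>\n'
        "</body>\n</html>\n"
    ).encode("ascii")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        f"Location: {location}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

Calling build_redirect_response("http://www.example.org/") yields the same header lines as the sample, with the body lines ending in a single line feed.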

Using server-side scripting for redirection

Often, web authors do not have sufficient permissions to produce these status codes: the HTTP header is generated by the web server program and not read from the file for that URL. Even for CGI scripts, the web server usually generates the status code automatically and only allows the script to add custom headers. To produce HTTP status codes directly from CGI scripts, one needs to enable non-parsed headers.

Sometimes, it is sufficient to print the "Location: 'url'" header line from a normal CGI script. Many web servers choose one of the 3xx status codes for such replies.

Frameworks for server-side content generation typically require that HTTP headers be generated before response data. As a result, a web programmer who uses such a framework to redirect the user's browser to another page must ensure that the redirect is the first or only part of the response. In the ASP scripting language, this can be accomplished by setting Response.Buffer = True and then calling Response.Redirect "http://www.example.com". In PHP, one can use header("Location: http://www.example.com");.
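
The same ordering constraint appears in Python's WSGI interface, which is used here purely as an illustration and is not mentioned in the text above. In this minimal sketch (the application name and the fixed target are assumptions), the status and headers are handed to start_response before any body data is returned.

```python
def redirect_app(environ, start_response):
    """WSGI application that redirects every request to a fixed URL."""
    target = "http://www.example.com/"
    body = f'Please follow <a href="{target}">link</a>!'.encode("ascii")
    # Status line and headers must be supplied before the body is returned.
    start_response("302 Found", [
        ("Location", target),
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local trial, the application could be served with the standard library, for example wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever().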

According to the HTTP/1.1 specification (RFC 2616), the Location header must contain an absolute URI.[4] When redirecting from one page to another within the same site, it is a common mistake to use a relative URI. Because the mistake is so common, most browsers tolerate relative URIs in the Location header, but some display a warning to the end user.
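
One way to avoid the mistake is to resolve the target against the URL of the current request before writing the header. A minimal Python sketch, in which the function name is an assumption and urljoin comes from the standard library:

```python
from urllib.parse import urljoin


def absolute_location(request_url, target):
    """Resolve a possibly relative redirect target against the request URL."""
    return urljoin(request_url, target)
```

For a request to http://www.example.com/docs/old.html, the relative target new.html resolves to http://www.example.com/docs/new.html, while a target that is already absolute is returned unchanged.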

Using .htaccess for redirection

When using the Apache web server, directory-specific .htaccess files (as well as Apache's main configuration files) can be used. For example, to redirect a single page:

Redirect 301 /oldpage.html http://www.example.com/newpage.html

To change domain names:

RewriteEngine On
 
RewriteCond %{HTTP_HOST} ^.*oldwebsite\.com$ [NC]
RewriteRule ^(.*)$ http://www.preferredwebsite.net/$1 [R=301,L]

Use of .htaccess for this purpose usually does not require administrative permissions, though it can be disabled. If you have access to Apache's main configuration file (such as httpd.conf), it is best to avoid .htaccess files.
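
To make the effect of the domain-change rule concrete, the following Python sketch imitates its two patterns. It is a simplified model rather than a description of how Apache is implemented: the function name is an assumption, the path is taken without its leading slash (as a .htaccess file in the document root would see it), and query strings are ignored.

```python
import re

# RewriteCond %{HTTP_HOST} ^.*oldwebsite\.com$ [NC]
HOST_PATTERN = re.compile(r"^.*oldwebsite\.com$", re.IGNORECASE)
# RewriteRule ^(.*)$ http://www.preferredwebsite.net/$1 [R=301,L]
PATH_PATTERN = re.compile(r"^(.*)$")


def rewrite(host, path):
    """Return (status, new_url) if the rule applies, otherwise None."""
    if not HOST_PATTERN.search(host):
        return None
    match = PATH_PATTERN.search(path)
    if match is None:
        return None
    # $1 in the substitution is the text captured by the first group.
    return 301, "http://www.preferredwebsite.net/" + match.group(1)
```

Because the host pattern begins with ^.*, it also matches any other host name that merely ends in oldwebsite.com, which may or may not be intended.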

Refresh meta tag and HTTP refresh header

Netscape introduced a feature to refresh the displayed page after a certain amount of time. This method is often called meta refresh. It is possible to specify the URL of a new page, so that one page is replaced by another after the given time. A timeout of 0 seconds means an immediate redirect.

This is an example of a simple HTML document that uses this technique:

<html><head>
  <meta http-equiv="Refresh" content="0; url=http://www.example.com/">
</head><body>
  <p>Please follow <a href="http://www.example.com/">link</a>!</p>
</body></html>

  • This technique is usable by all web authors because the meta tag is contained inside the document itself.
  • The meta tag must be placed in the "head" section of the HTML file.
  • The number "0" in this example may be replaced by another number to achieve a delay of that many seconds.
  • This is a proprietary extension to HTML, introduced by Netscape but supported by most web browsers. The manual link in the "body" section is for users whose browsers do not support this feature.

This is an example of achieving the same effect by issuing an HTTP refresh header:

HTTP/1.1 200 OK
Refresh: 0; url=http://www.example.com/
Content-type: text/html
Content-length: 57
 
Please follow <a href="http://www.example.com/">link</a>!

This response is easier for CGI programs to generate because the default status code does not need to be changed. Here is a simple CGI program that performs this redirect:

#!/usr/bin/perl
use strict;
use warnings;

# Emit the Refresh header and content type, then the blank line that ends the headers.
print "Refresh: 0; url=http://www.example.com/\r\n";
print "Content-type: text/html\r\n";
print "\r\n";
# Fall-back link for browsers that ignore the Refresh header.
print "Please follow <a href=\"http://www.example.com/\">link</a>!";

Note: Usually, the HTTP server adds the status line and the Content-length header automatically.
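
Both the meta tag and the header carry the same "delay; url=target" value. The Python sketch below splits such a value into its two parts. It is a deliberately simplified parser written for this example (the function name is an assumption), and real browsers accept more variations than it does.

```python
import re

# A number of seconds, optionally followed by ";" or "," and a target URL.
_REFRESH = re.compile(
    r"^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*?))?\s*$",
    re.IGNORECASE,
)


def parse_refresh(value):
    """Split a Refresh value into (delay_in_seconds, url_or_None)."""
    match = _REFRESH.match(value)
    if match is None:
        raise ValueError(f"not a valid Refresh value: {value!r}")
    delay = int(match.group(1))
    url = match.group(2) or None  # no URL means "reload the same page"
    return delay, url
```

A value with a delay but no URL, such as "5", is reported with None as the target, which corresponds to the original use of the feature: reloading the current page.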

The W3C considers the refresh method a poor method of redirection, since it communicates no information about either the original or the new resource to the browser (or search engine). The W3C's Web Content Accessibility Guidelines (checkpoint 7.4) discourage the creation of auto-refreshing pages, since most web browsers do not allow the user to disable or control the refresh rate. W3C articles on the issue include "Web Content Accessibility Guidelines (1.0): Ensure user control of time-sensitive content changes" and "Use standard redirects: don't break the back button!"

JavaScript redirects

JavaScript offers several ways to display a different page in the current browser window, and these are frequently used for redirects. However, there are several reasons to prefer an HTTP header or the refresh meta tag (whenever possible) over JavaScript redirects:

  • Security considerations.
  • Some browsers do not support JavaScript.
  • Many web crawlers do not execute JavaScript.

Frame redirects

A slightly different effect can be achieved by creating a single HTML frame that contains the target page:

<frameset rows="100%">
  <frame src="http://www.example.com/">
</frameset>
<noframes>
  <body>Please follow <a href="http://www.example.com/">link</a>!</body>
</noframes>

The main difference from the redirect methods above is that, for a frame redirect, the browser displays the URL of the frame document and not the URL of the target page in the URL bar.

This technique is commonly called cloaking. This may be used so that the reader sees a more memorable URL or, with fraudulent intentions, to conceal a phishing site as part of website spoofing.[5]

Redirect loops

It is quite possible that one redirect leads to another redirect. For example, the URL http://www.wikipedia.com/wiki/URL_redirection (note the differences in the domain name) is first redirected to http://www.wikipedia.org/wiki/URL_redirection and again redirected to the correct URL: http://en.wikipedia.org/wiki/URL_redirection. This is appropriate: the first redirection corrects the wrong domain name. The second redirection selects the correct language section. Finally, the browser displays the correct page.

Sometimes, however, a mistake can cause the redirection to point back to the first page, leading to an infinite loop of redirects. Browsers usually break that loop after a few steps and display an error message instead.

The HTTP standard states:

A client SHOULD detect infinite redirection loops, since such loops generate network traffic for each redirection.

Previous versions of this specification recommended a maximum of five redirections; some clients may exist that implement such a fixed limitation.
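
A client can implement both safeguards together: remember the URLs already visited and stop after a fixed number of hops. The Python sketch below is an illustration only. The function name and the fetch callback, which returns a (status, location) pair for a URL, are assumptions made so that the logic can be shown without network access.

```python
REDIRECT_STATUSES = {301, 302, 303, 307}


def follow_redirects(start_url, fetch, max_redirects=5):
    """Follow redirects from start_url and return the final URL.

    Raises RuntimeError on a loop or when max_redirects is exceeded.
    """
    seen = {start_url}
    url = start_url
    redirects = 0
    while True:
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or not location:
            return url  # not a redirect: this is the final page
        if location in seen:
            raise RuntimeError("redirect loop detected at " + location)
        if redirects == max_redirects:
            raise RuntimeError("gave up after %d redirects" % max_redirects)
        seen.add(location)
        url = location
        redirects += 1
```

With a table of canned responses, fetch can simply be the table's lookup method, which makes the two failure cases easy to try out.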

Services

Some services perform URL redirection on demand, with no need for technical work or access to the web server on which the site is hosted.

URL redirection services

A redirect service is an information management system that provides an internet link that redirects users to the desired content. The typical benefit to the user is a memorable domain and a shorter URL or web address. A redirecting link can also be used as a permanent address for content that frequently changes hosts, similar to the Domain Name System.
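
At its core, such a service is a lookup table from short aliases to target URLs. The following Python sketch is a toy model under that assumption: the class and method names are invented for the example, and a real service would add persistent storage and abuse checks.

```python
class RedirectService:
    """In-memory table mapping short aliases to their current targets."""

    def __init__(self):
        self._targets = {}

    def register(self, alias, target_url):
        """Create an alias, or point an existing alias at a new host."""
        self._targets[alias] = target_url

    def resolve(self, alias):
        """Return the (status, headers) pair to send for a requested alias."""
        target = self._targets.get(alias)
        if target is None:
            return 404, {}
        # A temporary redirect is used here so that clients ask again
        # if the content later moves.
        return 302, {"Location": target}
```

Registering the same alias again with a new target models content that changes hosts while its public address stays the same.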

Hyperlinks involving URL redirection services are frequently used in spam messages directed at blogs and wikis. Thus, one way to reduce spam is to reject all edits and comments containing hyperlinks to known URL redirection services; however, this will also remove legitimate edits and comments and may not be an effective method to reduce spam.

Recently, URL redirection services have taken to using AJAX as an efficient, user-friendly method for creating shortened URLs.

History

The first redirect services took advantage of top-level domains (TLDs) such as ".to" (Tonga), ".at" (Austria) and ".is" (Iceland). The first mainstream redirect service was V3.com, which boasted 4 million users at its peak in 2000. V3.com's success was attributed to its wide variety of short, memorable domains, including "go.to", "i.am", "come.to" and "start.at". V3.com was acquired by FortuneCity.com, a large free web hosting company, in early 1999. As the sales price of top-level domains fell from $70.00 per year to less than $10.00, the demand for short URLs or web redirection services eroded. However, with the 140-character limit on the extremely popular Twitter.com service, short URL services have seen a resurgence. Today, many short URL services are still in operation, including V3.com (now known as go.to), xr.com, cjb.net, tinyurl.com, its.my and dozens of others. A major drawback of some URL redirection services is the use of delay pages or frame-based advertising to generate revenue.

URL obfuscation services

There exist redirection services for hiding the referrer using META refresh, such as Anonymity.com and Anonym.to.

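This is straightforward to do in PHP, as in this example:

<?php
/* This code is placed into the public domain */
/* Redirects the visitor to the URL given in the "url" query parameter */

$url = isset($_GET['url']) ? $_GET['url'] : '';
/* Accept only http and https targets, rejecting javascript: and similar URLs */
if (!preg_match('#^https?://#i', $url)) {
    header('HTTP/1.1 400 Bad Request');
    exit('Invalid URL');
}
/* Escape the value before embedding it in HTML to prevent script injection */
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
?>
<html>
<head><title>Redirect</title>
<meta http-equiv="refresh" content="0; URL=<?php echo $safe; ?>">
</head>
<body>You should be redirected to <a href="<?php echo $safe; ?>"><?php echo $safe; ?></a>.</body>
</html>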

The script can then be accessed with a URL such as:

http://example.org/redirect.php?url=http://www.google.com

The example code above may not work correctly with target URLs that contain query parameters unless the target is first URL-encoded, or code is added that loops over the $_GET entries and pieces the final URL together.
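
The problem and the encoding fix can be demonstrated with Python's standard URL functions. In this sketch the two helper names are assumptions; received_url models what a server-side script would find in its url parameter.

```python
from urllib.parse import parse_qs, quote, urlsplit


def redirect_link(script_url, target_url):
    """Build a link to the redirect script with the target percent-encoded."""
    return script_url + "?url=" + quote(target_url, safe="")


def received_url(link):
    """Return the url parameter as the redirect script would receive it."""
    return parse_qs(urlsplit(link).query)["url"][0]


target = "http://www.example.com/search?q=redirect&lang=en"

# Without encoding, "&lang=en" is read as a second parameter of the
# redirect script, so the target arrives truncated.
naive = "http://example.org/redirect.php?url=" + target
truncated = received_url(naive)

# With encoding, the target arrives intact.
intact = received_url(redirect_link("http://example.org/redirect.php", target))
```

Here truncated ends at "q=redirect", while intact is equal to the original target.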

References

  1. "Google revives redirect snoopery". blog.anta.net. 2009-01-29. ISSN 1797-1993. http://blog.anta.net/2009/01/29/509/. Retrieved 2009-01-30.
  2. "Google's serious hijack problem".
  3. "Stop 302 Redirects and Scrapers from Hijacking Web Page PR - Page Rank".
  4. R. Fielding, et al. Request for Comments: 2616, "Hypertext Transfer Protocol -- HTTP/1.1". Published 1999-07. §14.30 "Location". Retrieved 2008-10-07.
  5. "Anti-Phishing Technology". Aaron Emigh, Radix Labs. 19 January 2005.
