TCP Offload Engine
From Wikipedia, the free encyclopedia
TCP Offload Engine or TOE is a technology used in network interface cards to offload processing of the entire TCP/IP stack to the network controller. It is primarily used with high-speed network interfaces, such as gigabit Ethernet and 10 gigabit Ethernet, where processing overhead of the network stack becomes significant.
The term TOE is often used to refer to the NIC itself, although it more accurately refers only to the integrated circuit on the card that processes the TCP headers. TOEs are often suggested as a way to reduce the overhead associated with newer protocols such as iSCSI.
Purpose
TCP was originally designed for unreliable, low-speed networks (such as early dial-up modems). The Internet has since grown in backbone transmission speeds (Optical Carrier, gigabit Ethernet and 10 gigabit Ethernet links) and gained faster, more reliable access mechanisms (such as Digital Subscriber Line and cable modems). As a result, TCP is now sometimes used in datacenters and desktop PC environments at speeds above 1 gigabit per second. TCP software implementations on host systems require extensive computing power. Full-duplex gigabit TCP communication using software processing alone can consume more than 80% of a 2.4 GHz Pentium 4 processor (see Freed Up CPU Cycles), which leaves little or no processing capacity for the applications running on the system.
TCP is a connection-oriented protocol, and this adds to its complexity and processing overhead. The relevant aspects include:
- Connection establishment using the three-way handshake. The connection initiator and the connection responder exchange messages (SYN, SYN-ACK and ACK) before any data flows between the two endpoints.
- Acknowledgment of packets as the far end receives them. This adds to the message flow between the endpoints and therefore to the protocol load.
- Checksum and sequence number calculations, which are another burden for a general-purpose CPU.
- Sliding window calculations for packet acknowledgement and congestion control.
- Connection termination.
Moving some or all of these functions to dedicated hardware, a TCP Offload Engine, frees the system's main CPU for other tasks. As of 2008, very few consumer network interface cards support TOE. However, a growing number of servers have either a TOE-enabled network interface card or a TOE-enabled motherboard chip.
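The checksum item in the list above is one example of per-byte work that a software stack performs for every segment. TCP uses the 16-bit ones' complement Internet checksum described in RFC 1071. The following Python sketch shows that computation. It omits the TCP pseudo-header (source and destination IP addresses, protocol and length), which a real TCP checksum also covers. The sample bytes are the numerical example from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071.

    Sketch only: the TCP pseudo-header that a real stack also sums is omitted.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF  # ones' complement of the folded sum


segment = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(segment)
print(hex(csum))  # 0x220d
# The receiver sums the data together with the checksum and gets 0 when nothing is corrupted
print(internet_checksum(segment + csum.to_bytes(2, "big")))  # 0
```

The loop touches every 16-bit word of every segment. At gigabit rates, this is the kind of steady per-byte cost that moving the function onto the network controller removes from the host CPU.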
Freed Up CPU Cycles
A generally accepted rule of thumb is that sending or receiving 1 bit of TCP/IP traffic takes 1 hertz of CPU processing [1]. For example, 5 Gbit/s (625 MB/s) of network traffic requires 5 GHz of CPU processing. Handling the TCP/IP processing for 5 Gbit/s of traffic therefore needs two entire cores of a 2.5 GHz multi-core processor. Ethernet is full duplex, so a 10 Gigabit Ethernet link can send and receive 10 Gbit/s at the same time, for an aggregate throughput of 20 Gbit/s. Under the 1 Hz/bit rule, this equates to 20 GHz, or eight 2.5 GHz cores. (Few, if any, present-day servers need to move 10 Gbit/s in both directions, but not long ago 1 Gbit/s full duplex was thought to be more than enough bandwidth.)
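The arithmetic behind the rule of thumb can be written as a small function. The `hz_per_bit` parameter is an assumption added here to make the rule adjustable. It defaults to the 1 Hz per bit figure from the text, and the function name is illustrative only.

```python
def cores_needed(throughput_gbps: float, core_ghz: float, hz_per_bit: float = 1.0) -> float:
    """CPU cores consumed by TCP/IP processing under the Hz-per-bit rule of thumb.

    hz_per_bit defaults to 1.0, the figure quoted in the text; it is exposed
    as a parameter only for illustration.
    """
    required_ghz = throughput_gbps * hz_per_bit  # Gbit/s x Hz/bit = GHz
    return required_ghz / core_ghz


# 5 Gbit/s in one direction on 2.5 GHz cores
print(cores_needed(5, 2.5))  # 2.0
# 10 Gigabit Ethernet, full duplex: 10 Gbit/s each way, 20 Gbit/s aggregate
print(cores_needed(2 * 10, 2.5))  # 8.0
# Full-duplex gigabit on a single 2.4 GHz Pentium 4, as a share of that CPU
print(f"{cores_needed(2 * 1, 2.4):.0%}")  # 83%
```

The last line agrees with the figure of "more than 80% of a 2.4 GHz Pentium 4" given in the Purpose section.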
TCP/IP offload "frees up" many of the CPU cycles used for TCP/IP processing. The CPU (usually a server CPU) can then spend them on other tasks, such as file system processing (in a file server) or indexing (in a backup media server). In other words, a server with TCP/IP offload can do more server work than a server without TCP/IP offload NICs.
Reduction of PCI traffic
Besides the protocol overhead, TOE can also address some architectural issues that affect a large percentage of host-based (server and PC) endpoints. Most endpoint hosts are currently based on the PCI bus, which provides a standard interface for adding peripherals such as network interfaces to servers and PCs. PCI is inefficient at transferring small bursts of data from host memory across the bus to the network interface ICs, and its efficiency improves as the burst size increases. TCP creates a large number of small packets (e.g. acknowledgements). These are typically generated on the host CPU and sent across the PCI bus and out the physical network interface, so they reduce the host computer's I/O throughput.
A TOE sits on the network interface, on the other side of the PCI bus from the host CPU, and this placement lets it address the I/O efficiency issue. The CPU can pass the data for a TCP connection to the TOE across the PCI bus in large bursts. None of the smaller TCP packets has to cross the bus.
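A toy model illustrates the efficiency argument. In this model, every bus transaction pays a fixed overhead on top of its payload. The 64 byte-time overhead below is a made-up number for illustration only and does not come from the PCI specification. Real PCI efficiency also depends on arbitration, maximum burst length and bus width, none of which the model includes.

```python
def bus_efficiency(burst_bytes: int, overhead_bytes: int) -> float:
    """Fraction of bus time spent moving payload in a simplified model.

    Each transaction is assumed to cost a fixed overhead, expressed in
    byte-times, plus its payload. This is a toy model, not PCI timing.
    """
    return burst_bytes / (burst_bytes + overhead_bytes)


OVERHEAD = 64  # hypothetical per-transaction cost in byte-times
# A small acknowledgement-sized transfer, a full-size Ethernet payload, and a large burst
for size in (64, 1500, 64 * 1024):
    print(size, f"{bus_efficiency(size, OVERHEAD):.1%}")
# 64 50.0%
# 1500 95.9%
# 65536 99.9%
```

When every acknowledgement and segment crosses the bus separately, many transactions fall at the small end of this curve. Handing the TOE large bursts keeps transfers at the efficient end.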
History
One of the first recorded patents (USPTO #5,355,453 [2]) for the concept of network stack offload was issued to Auspex Systems in the early 1990s under the name 'Parallel I/O network file server architecture category'. The approach became known as Functional Multi-Processing (FMP). Under FMP, network processing, file processing and storage processing each run on a separate Functional Processing Card. Symmetric multiprocessing, by contrast, runs all three functions on (increasing numbers of) general-purpose processors, together with user applications and all other processor tasks. The Auspex Network Processor performed full UDP offload. (UDP is much simpler than TCP, but the basic concept of network stack offload still applied.) In 1997 the founder of Auspex Systems, Larry Boucher, and a number of Auspex engineers founded Alacritech to extend network stack offload to TCP and implement it in custom silicon.
Alacritech introduced the first Parallel Stack Full Offload network card in early 1999. The company’s SLIC (Session Layer Interface Card) was the predecessor to its current TOE offerings. Alacritech holds 26 patents in the area of TCP/IP offload. Patent #6,247,060 [3], “Passing a Communication Block from Host to a Local Device such that a message is processed on the Device”, was issued on June 12, 2001. In 2005 Microsoft licensed Alacritech's patent base. Together with Alacritech, it created the partial TCP offload architecture that has become known as TCP Chimney Offload (see Types of TCP/IP Offload). TCP Chimney Offload centers on the Alacritech "Communication Block Passing Patent". At the same time, Broadcom also obtained a license to build TCP Chimney Offload chips.
Valentin Ossman developed an original TOE implementation and applied for a patent on it (USPTO #20040042487 [4]). He later founded Tehuti Networks Ltd. based on this patented technology. A patent titled "System and method for TCP/IP offload independent of bandwidth delay product" was granted on December 25, 2007 ("United States Patent: 6996070". http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=6,996,070.PN.&OS=PN/6,996,070&RS=PN/6,996,070. Retrieved on 2008-02-20.). Another, titled "TCP/IP offload device with reduced sequential processing", was granted on February 7, 2006 ("United States Patent: 7313623". http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,313,623.PN.&OS=PN/7,313,623&RS=PN/7,313,623. Retrieved on 2008-02-20.). Valentin Ossman is also credited with introducing the acronym TOE.
Types of TCP/IP Offload
Parallel Stack Full Offload
Parallel Stack Full Offload gets its name from the concept of two parallel TCP/IP stacks. The first is the main host stack, which is included with the host OS. The second, or "parallel stack", is connected between the Application Layer (using the Internet protocol suite naming conventions) and the Transport Layer (TCP) using a "vampire tap". The vampire tap intercepts TCP connection requests from applications and is responsible for TCP connection management as well as TCP data transfer. Many of the criticisms discussed later in this article relate to this type of TCP offload.
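The ownership model can be sketched as follows. Every class and name here is hypothetical and does not come from any vendor's interface. In this simplified sketch the tap diverts every connection request to the parallel stack, which then owns the connection, so the host stack never records it.

```python
from dataclasses import dataclass, field


@dataclass
class Stack:
    """Hypothetical stand-in for a TCP/IP stack that records the connections it owns."""
    name: str
    connections: set = field(default_factory=set)

    def open(self, peer: str) -> str:
        self.connections.add(peer)  # this stack now manages the connection and its data
        return self.name


class VampireTap:
    """Hypothetical tap between the application layer and TCP.

    In this sketch every application connection request is intercepted and
    handed to the parallel stack; the host stack is kept only to show that
    it is bypassed.
    """

    def __init__(self, host: Stack, parallel: Stack) -> None:
        self.host = host
        self.parallel = parallel

    def connect(self, peer: str) -> str:
        return self.parallel.open(peer)


host = Stack("host")
toe = Stack("toe")
tap = VampireTap(host, toe)
tap.connect("10.0.0.2:80")
print(toe.connections, host.connections)  # {'10.0.0.2:80'} set()
```

Because the host stack never sees these connections, any policy it enforces (for example, packet filtering) does not apply to them. This is the root of the security criticism that TCP Chimney Offload was designed to address.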
HBA Full Offload
HBA Full Offload is found in iSCSI host bus adapters. These present themselves to the host system as disk controllers while connecting (via TCP/IP) to an iSCSI storage device. This type of TCP offload handles TCP/IP processing and also the iSCSI initiator function. Because the HBA appears to the host as a disk controller, it can only be used with iSCSI devices and is not appropriate for general TCP/IP offload.
TCP Chimney Partial Offload
TCP Chimney Offload addresses the major security criticism of Parallel Stack Full Offload. In Partial Offload the main System Stack controls all connections to the host. After a connection has been established between the local host (usually a server) and a foreign host (usually a client) the connection and its state are passed to the TCP offload engine. The heavy lifting of data transmit and receive is handled by the offload device. Almost all TCP offload engines use some type of TCP/IP hardware implementation to perform the data transfer without host CPU intervention. When the connection is closed, the connection state is returned from the offload engine to the main system stack. Maintaining control of TCP connections allows the main system stack to implement and control connection security.
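The partial-offload lifecycle can be sketched as a hand-off of connection state. This is a hypothetical model and not the Windows or NDIS chimney interface. The class names, the reduced state (peer plus next send/receive sequence numbers) and the allow-list policy are all illustrative assumptions. The host stack decides whether a connection exists at all, the engine moves the data, and the state returns to the host on close.

```python
from dataclasses import dataclass


@dataclass
class TcpState:
    """Hypothetical subset of the per-connection state handed between stacks."""
    peer: str
    snd_nxt: int  # next sequence number this host will send
    rcv_nxt: int  # next sequence number expected from the peer


class OffloadEngine:
    """Hypothetical offload device: moves data only for connections it holds."""

    def __init__(self) -> None:
        self.held: dict[str, TcpState] = {}

    def accept_state(self, state: TcpState) -> None:
        self.held[state.peer] = state

    def send(self, peer: str, payload: bytes) -> None:
        state = self.held[peer]  # raises KeyError if the connection was never offloaded
        state.snd_nxt += len(payload)  # transmit work happens here, not on the host CPU

    def return_state(self, peer: str) -> TcpState:
        return self.held.pop(peer)


class HostStack:
    """Main system stack: owns setup and teardown, so it can enforce policy."""

    def __init__(self, engine: OffloadEngine, allowed_peers: set[str]) -> None:
        self.engine = engine
        self.allowed_peers = allowed_peers  # hypothetical security policy

    def establish(self, peer: str, isn: int, peer_isn: int) -> bool:
        if peer not in self.allowed_peers:
            return False  # rejected by the host stack; nothing is offloaded
        # The three-way handshake itself is not modelled. Each SYN consumes one
        # sequence number, so both sides continue from their ISN + 1.
        self.engine.accept_state(TcpState(peer, isn + 1, peer_isn + 1))
        return True

    def close(self, peer: str) -> TcpState:
        # Connection state comes back to the host stack when the connection closes.
        return self.engine.return_state(peer)


engine = OffloadEngine()
stack = HostStack(engine, allowed_peers={"192.0.2.10"})
print(stack.establish("192.0.2.10", isn=1000, peer_isn=5000))  # True
engine.send("192.0.2.10", b"x" * 1460)
print(stack.close("192.0.2.10").snd_nxt)  # 2461
print(stack.establish("198.51.100.7", isn=1, peer_isn=1))  # False
```

The refused connection never reaches the engine. This is the property that lets the main system stack implement and control connection security.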
Lack of support in Linux
Stock Linux does not support TOE. There are third party patches from the hardware manufacturers (such as Chelsio) that add support; however, the kernel developers are opposed to this technology. Some of the cited reasons include[5]:
- Security: because TOE is implemented in hardware, any security vulnerability found in a particular TOE implementation must be fixed by patching the TOE firmware, not just software. The newness and vendor-specificity of this hardware make the problem worse, compared with the well-tested TCP/IP stack found in an operating system that does not use TOE.
- Limitations of hardware: because connections are buffered and processed on the TOE chip, resource starvation can occur more easily than with the generous CPU and memory resources available to the operating system.
- Complexity: TOE breaks the kernel's assumption that it has access to all resources at all times. For example, the memory used by open connections is not visible to the kernel with TOE. Proper TOE support also requires very large changes to a networking stack, and even then, features like QoS and packet filtering typically do not work.
- Proprietary: each hardware vendor implements TOE differently. More code must therefore be written to handle the various TOE implementations, which adds to the complexity described above and possibly to the security risk. Furthermore, TOE firmware is closed-source, so it cannot easily be modified.
Note that these claims are considered rather questionable outside the Linux kernel developer community. They are not backed by research or security track records, while the performance improvements observed in other open-source systems, such as FreeBSD, are easily measurable.
Suppliers
Much of the current work on TOE technology is by manufacturers of 10 Gigabit Ethernet interface cards, such as Alacritech, Broadcom Corporation, Chelsio Communications, LeWiz Communications, Neterion Technologies, NetXen Inc. and Tehuti Networks Ltd.
References
- [1] TCP performance re-visited
- [2] United States Patent 5,355,453: "Parallel I/O network file server architecture category"
- [3] United States Patent 6,247,060: "Passing a Communication Block from Host to a Local Device such that a message is processed on the Device"
- [4] United States Patent Application 20040042487: "Network traffic accelerator system and method"
- [5] Net:TOE: explanation of why Linux doesn't support TOE
External links
- Article: TCP Offload to the Rescue by Andy Currid at ACM Queue
- Patent Application 20040042487
- Mogul, Jeffrey C. (2003). "TCP offload is a dumb idea whose time has come". Proceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, USENIX Association. Retrieved 23 July 2006
- Introduction to the TCP/IP offload Engine