Incast-free TCP for Data Center Networks
Type of Degree: Dissertation
Cloud Computing is a computing paradigm in which hosted services are delivered over the Internet on a `pay-per-use' basis. This new style of computing promises to revolutionize the IT industry by making computing available over the Internet, much like other utilities such as water, electricity, gas and telephony. Growing adoption of Cloud Computing, by both the IT industry and the general public, is driving service providers to build new data centers. Data centers are facilities that typically host tens of thousands of servers, which communicate with each other over high-speed network interconnects. As application deployments grow, data centers adopt a multi-tiered model in which several servers work together to service a single client request. As a result, the overall application performance in a data center depends largely on the efficiency of its underlying communication fabric.

There are essentially two high-level choices for building a data center communication fabric. The first leverages specialized hardware and communication protocols such as InfiniBand, Fibre Channel or Myrinet; the second leverages off-the-shelf commodity products such as Ethernet-based switches and routers. Cost and compatibility considerations persuade many data centers to adopt the second option for their baseline communication fabric.

Until a few years ago, Ethernet speeds inside data centers averaged around 100 Mbps. However, the evolution of the IEEE 802.3 standards led to the development of 1 Gbps and 10 Gbps Ethernet networks. This sudden jump in Ethernet speeds from 100 Mbps to 1 Gbps and 10 Gbps requires proportional scaling of TCP/IP processing, so that network-intensive applications can ultimately benefit from the increased network bandwidth. Although IP is expected to scale well with Ethernet, there are some legitimate questions about TCP. TCP is a mature technology that has survived the test of time.
However, the unique workloads, speed and scale of modern data centers violate some of the basic assumptions on which TCP was originally designed. As a result, when TCP is used in high-bandwidth, low-latency data center environments, new shortcomings in the protocol come to light. One such shortcoming is the `Incast' problem. TCP Incast is a catastrophic collapse in TCP throughput that occurs in high-bandwidth, low-latency network environments when multiple senders communicating with a single receiver collectively send enough data to surpass the buffering capacity of the receiver's Ethernet switch. The problem arises from a subtle interaction between limited Ethernet switch buffer sizes, TCP's loss recovery mechanisms and many-to-one synchronized traffic patterns. Unfortunately, such traffic patterns occur frequently in many data center applications and services, so a practical solution to the Incast problem is urgently needed.

Our objective in this dissertation is to address TCP's Incast problem by providing transport layer solutions that are both practical and backward compatible. We approach this goal in two steps. First, we derive an analytical model of TCP Incast. Such a model is essential for understanding the reasons behind TCP's throughput collapse. The analytical model provides a closed-form equation that can be used to compute throughput at the client for various synchronized workloads. We verify the accuracy of our model against measurements taken from ns-2 simulations. Next, we present solutions designed to address TCP Incast at the transport layer. Specifically, we develop transport layer solutions that improve TCP's performance under Incast traffic either by proactively detecting network congestion through probabilistic retransmission or by dynamically resizing TCP's segments to avoid incurring the timeout penalty. We evaluate the merits of these solutions using ns-2 simulations.
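The Incast condition described above can be illustrated with a back-of-the-envelope calculation: losses (and the timeouts that collapse throughput) begin once the synchronized senders' combined in-flight data exceeds the switch's per-port buffer. The sketch below is purely illustrative; the function name and all buffer and window sizes are hypothetical and are not values taken from the dissertation.

```python
# Illustrative (not from the dissertation): N synchronized senders each
# transmit one window of data toward a single receiver through one switch
# output port whose buffer can absorb only buffer_bytes of the burst.

def incast_overflow(num_senders, window_bytes, buffer_bytes):
    """Return the number of bytes dropped when the synchronized burst
    from all senders exceeds the switch buffer capacity."""
    offered = num_senders * window_bytes
    return max(0, offered - buffer_bytes)

# A hypothetical 32 KB per-port buffer absorbs 4 senders at 8 KB windows...
print(incast_overflow(4, 8 * 1024, 32 * 1024))   # 0 bytes dropped
# ...but drops heavily once more synchronized senders join.
print(incast_overflow(16, 8 * 1024, 32 * 1024))  # 98304 bytes dropped
```

The dropped bytes force TCP loss recovery; when entire windows are lost, senders stall on retransmission timeouts, which is the mechanism behind the throughput collapse.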
Results show that each of our suggested techniques outperforms standard TCP under various experimental conditions.
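One of the mitigation ideas mentioned above, dynamically resizing TCP's segments, amounts to shrinking each sender's segment size so that the combined synchronized burst still fits within the switch buffer, sidestepping the losses that lead to timeout penalties. The following is a simplified sketch of that idea under assumed parameters; `resized_segment`, the 1460-byte MSS default and the 256-byte floor are hypothetical choices, not the dissertation's actual algorithm.

```python
# Illustrative sketch (not the dissertation's algorithm): pick a segment
# size small enough that num_senders simultaneous segments fit in the
# switch buffer, clamped between a minimum segment size and the MSS.

def resized_segment(buffer_bytes, num_senders, mss=1460, min_seg=256):
    """Return a segment size such that one segment per sender
    fits within the shared switch buffer."""
    fair_share = buffer_bytes // num_senders
    return max(min_seg, min(mss, fair_share))

# With a hypothetical 32 KB buffer, 4 senders can use the full MSS...
print(resized_segment(32 * 1024, 4))   # 1460
# ...while 64 synchronized senders are throttled to smaller segments.
print(resized_segment(32 * 1024, 64))  # 512
```

Smaller segments reduce per-sender burst size at the cost of higher header overhead, which is why a floor such as `min_seg` is needed in practice.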