Welcome to the SLAB Project Page!

Cloud datacenters, such as Amazon AWS, Microsoft Azure, and Google Cloud, have become a primary platform for managing users' computation, storage, and communication requirements. According to Gartner, the cloud computing market had a revenue of over $204 billion in 2016.

Today, a modern datacenter hosts tens of thousands of servers that are interconnected by a high-bandwidth network. The performance and reliability of the underlying network play a key role in meeting the desired goals of a cloud datacenter. Central to any datacenter is the ability to utilize and share cloud network resources efficiently. A critical component in any datacenter is therefore a network load balancer, which aims to spread traffic evenly across the entire network so that users see high performance and congestion hotspots are avoided.

Designing a scalable and high-performing network load balancer is challenging for two key reasons: (a) frequent link and/or device failures and (b) the volatility and burstiness of datacenter traffic. Due to the widespread use of commodity off-the-shelf (COTS) equipment, failures are common in datacenters. Studies show that large datacenters can experience up to 40 link failures per day. Thus, an efficient network load balancer must continue to perform well when failures occur. Unfortunately, existing state-of-the-art load balancing schemes perform poorly in such scenarios.

In this project, we propose SLAB (SoftwAre Defined Agile Load Balancing), a network load balancing architecture for datacenters that uses recent advances in Software-Defined Networking (SDN) to achieve an efficient, failure-resilient, agile, and deployable load balancing solution. SLAB has three key components:

  • A fine-grained per-packet load balancing mechanism at the switches to evenly spread packets across multiple paths
  • A mechanism for detecting different types of failures (e.g., partial link failures and full link failures)
  • An SDN-based, logically centralized control plane for dynamically adapting load balancing in case of failures and topological asymmetries
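
To make the interplay of these three components concrete, here is a minimal Python sketch. It is not SLAB's implementation: the names `Path`, `SprayingSwitch`, `update_health`, and `spray` are hypothetical, the smooth weighted round-robin scheduler is a stand-in for whatever per-packet mechanism the switches actually use, and the assumption that a partial failure reduces a path's weight in proportion to its lost capacity is ours.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Path:
    """One of several equal-cost paths out of a switch (hypothetical model)."""
    name: str
    capacity: float        # nominal capacity, arbitrary units
    health: float = 1.0    # 1.0 = up, 0.0 = full failure, in between = partial failure
    credit: float = field(default=0.0, repr=False)  # scheduler state

    @property
    def weight(self) -> float:
        # Assumption: usable capacity shrinks in proportion to the failure.
        return self.capacity * self.health


class SprayingSwitch:
    """Per-packet spraying with weights that a controller can update."""

    def __init__(self, paths):
        self.paths = {p.name: p for p in paths}

    def update_health(self, name: str, health: float) -> None:
        """Stand-in for the control plane reacting to a detected failure."""
        self.paths[name].health = health

    def next_path(self) -> str:
        """Choose the path for one packet (smooth weighted round-robin)."""
        live = [p for p in self.paths.values() if p.weight > 0]
        if not live:
            raise RuntimeError("no usable path")
        total = sum(p.weight for p in live)
        for p in live:
            p.credit += p.weight
        chosen = max(live, key=lambda p: p.credit)
        chosen.credit -= total
        return chosen.name


def spray(switch: SprayingSwitch, packets: int) -> Counter:
    """Send `packets` packets and count how many each path carried."""
    return Counter(switch.next_path() for _ in range(packets))


if __name__ == "__main__":
    switch = SprayingSwitch([Path(f"p{i}", 10.0) for i in range(4)])
    print(spray(switch, 400))          # all paths healthy
    switch.update_health("p3", 0.5)    # partial failure reported for p3
    print(spray(switch, 700))
    switch.update_health("p3", 0.0)    # full failure reported for p3
    print(spray(switch, 300))
```

In this sketch a partially failed path keeps carrying a reduced share of packets rather than being removed outright, while a fully failed path is skipped entirely.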

These components are driven by three key design goals:

  • High Throughput: Given enough traffic demand, SLAB must allow flows to fully utilize the available bisection bandwidth.
  • Robust to Asymmetry: SLAB should be able to achieve high performance in the face of both complete and partial failures.
  • Compatible with OpenFlow Switches: Given the popularity of OpenFlow, we aim for a design that is compatible with existing OpenFlow switches and does not require any additional changes to switches.

SLAB has been evaluated on (a) a small-scale custom-built datacenter and (b) large-scale packet-level simulations. The datacenter has been built using SDN/OpenFlow switches. We have evaluated SLAB over widely used datacenter topologies (e.g., 2-stage Clos) as well as over real application workloads such as web search and data mining.
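
For readers unfamiliar with the 2-stage Clos (leaf-spine) topology mentioned above, the following Python sketch shows how a single link failure makes such a topology asymmetric. It is an illustration only, not part of the SLAB evaluation setup: the function names `build_leaf_spine` and `usable_spines`, the 4x4 size, and the failed link are all assumptions.

```python
from itertools import product


def build_leaf_spine(num_leaves: int, num_spines: int) -> set:
    """Links of a 2-stage Clos: every leaf connects to every spine."""
    return set(product(range(num_leaves), range(num_spines)))


def usable_spines(links: set, src_leaf: int, dst_leaf: int, num_spines: int) -> list:
    """Spines that still offer a src -> spine -> dst path (both hops must be up)."""
    return [s for s in range(num_spines)
            if (src_leaf, s) in links and (dst_leaf, s) in links]


if __name__ == "__main__":
    links = build_leaf_spine(num_leaves=4, num_spines=4)
    links.discard((0, 1))  # hypothetical failure of the leaf 0 <-> spine 1 link
    print(usable_spines(links, 0, 1, 4))  # [0, 2, 3]: leaf 0 lost one path
    print(usable_spines(links, 2, 3, 4))  # [0, 1, 2, 3]: other leaf pairs are unaffected
```

After the failure, traffic from leaf 0 has three paths while traffic between other leaves still has four, so spreading packets equally over all spines no longer matches the available capacity.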

More information can be found here.