Communication is a key element in data center performance and cost, both capital and operating. Presently, electronic packet-based Top-of-Rack (ToR) switches are interconnected by layers of similar switches; nearly all links are fiber-optic for reasons of distance and data rate, requiring expensive optical transceivers and an electrical-optical-electrical (EOE) conversion at every packet hop.
Often, most of the data bits belong to bulk transfers that are insensitive to packet-level latency. Based on this observation, hybrid networks have been proposed: a conventional packet-switched network carries the latency-sensitive packets, while the bulk traffic is carried by a circuit-switched network employing dumb physical-layer optical switches that directly interconnect the ToR switches on demand; every ToR switch is connected to both networks via a fixed subset of its ports. Key shortcomings of this approach are: 1) possible overload of the central controller; 2) if paths traverse multiple optical switches, the need to tear down and reroute existing connections in order to accommodate a new one; and 3) a fixed partitioning of ToR switch ports between the two networks.
RotorNet (UCSD) was proposed as a scalable, low-complexity, non-adaptive optical circuit-switched datacenter network for bulk transfers. It comprises a set of Rotor switches, each capable of providing only cyclic-shift permutations among the ToR switches. The Rotors are switched cyclically in a fixed, known round-robin schedule. RotorNet thus provides a direct connection between every ToR switch pair, but only during a small fraction of the time. To enable full-bandwidth connectivity between ToR switch pairs, RotorNet permits two hops, with ToR switches also serving as intermediate nodes at which packets await the availability of a direct connection to their destination. RotorNet's extra-hop penalty is offset by automatic load balancing and control simplicity. However, the fixed partitioning of ToR switch ports remains.
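The rotation idea can be illustrated with a minimal sketch (this is not the RotorNet implementation; the ToR count, rotor count, and shift-to-rotor assignment below are purely illustrative): each Rotor cycles, one per time slot, through its assigned cyclic-shift permutations, so over a full rotation cycle every ordered ToR pair receives a direct slot.

```python
# Illustrative sketch of cyclic-shift rotation (hypothetical parameters,
# not the actual RotorNet schedule).

def rotor_matching(n_tors, shift):
    """One cyclic-shift permutation: ToR i sends to ToR (i + shift) mod n."""
    return {i: (i + shift) % n_tors for i in range(n_tors)}

def schedule(n_tors, n_rotors):
    """Assign the n_tors - 1 nonzero shifts round-robin to the rotors;
    rotor r then cycles through its shifts, one per time slot."""
    shifts = list(range(1, n_tors))
    return [shifts[r::n_rotors] for r in range(n_rotors)]

# Over a full rotation cycle, every ordered ToR pair gets a direct slot:
n, r = 8, 2
covered = set()
for rotor_shifts in schedule(n, r):
    for s in rotor_shifts:
        covered.update(rotor_matching(n, s).items())
assert covered == {(i, j) for i in range(n) for j in range(n) if i != j}
```

Each pair is directly connected only in the slots whose shift equals their index difference, which is why bulk traffic may wait at an intermediate ToR for its second hop.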
Our goal in this work is to offer a single unified network for all traffic, while retaining the benefits of separate networks. Specifically, we adopt the RotorNet idea, and adapt it to handle both bulk and latency-sensitive traffic on the same fabric. Moreover, we introduce a memory usage consideration for bulk traffic and address it in our solution.
Our key observation is that, at any given time, the Rotor switches can jointly provide a connected static topology among the ToR switches, with each Rotor contributing a set of links. Bulk traffic is still limited to two hops, so it effectively sees a RotorNet and conserves link capacity; latency-sensitive packets, in contrast, are routed over multi-hop paths through the currently active "static" graph, and thus see a small-diameter network with minimum latency.
Having numerous unrelated routing tables, one per time slot in the rotation schedule, is problematic. Instead, based on finite-field isomorphism, we designed a parametric sequence of isomorphic graphs, each comprising a set of cyclic-shift permutations among the ToR switches. Each optical switch implements such a permutation in each time slot, and we moreover stagger the switching times in order to provide smooth, incremental topology transitions. The parameter value is chosen at design time so as to jointly obtain small-diameter graphs for the latency-sensitive packets and a short waiting time (and resulting buffer memory requirement) for bulk traffic at the intermediate ToR switches. Additional salient features include bidirectional links for protocol support, minimum-path-routing flexibility, a degree of fault tolerance, and simple, use- and update-friendly routing tables. We compare our work to Opera, a similar approach carried out in parallel to our work.
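The finite-field construction itself is beyond the scope of this abstract; as a hedged illustration of the diameter consideration only, the sketch below treats one time slot's set of cyclic shifts as an undirected graph on the ToR switches (links are bidirectional) and computes its diameter by BFS. The ToR count and shift set are hypothetical, chosen merely to show that a few shifts already yield a small diameter for multi-hop latency-sensitive routing.

```python
from collections import deque

def union_graph(n, shifts):
    """Undirected graph on n ToRs formed by the union of cyclic shifts."""
    adj = {i: set() for i in range(n)}
    for s in shifts:
        for i in range(n):
            adj[i].add((i + s) % n)
            adj[(i + s) % n].add(i)  # bidirectional links
    return adj

def diameter(adj):
    """Largest BFS eccentricity over all vertices."""
    def ecc(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(v) for v in adj)

# Hypothetical example: 64 ToRs, shifts {1, 8} already keep the diameter small.
assert diameter(union_graph(64, [1, 8])) <= 8
```

In the actual design, the shift sets change from slot to slot while remaining isomorphic, so the same small diameter (and the same routing-table structure) is preserved across the rotation.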
In summary, we believe that our research paves the way to a practical, efficient, unified data center network featuring flexible resource allocation and efficiently supporting all types of traffic.
* M.Sc. student under the supervision of Prof. Yitzhak (Tsahi) Birk
Zoom invitation link: https://technion.zoom.us/j/92257320373