The Pluris Massively Parallel Router (MPR)

Vadim Antonov, David Bernstein
Pluris Inc.

	The Pluris Massively Parallel Router (MPR) is a collection of
single-board computers (processing nodes) and a proprietary data
interconnect. Each processing node has 16 or more megabytes of DRAM
and a 100+ MHz general-purpose microprocessor, sufficient to route IP
packets at OC-3c speed (155 Mbps). The processing nodes, each with its
own copy of the forwarding table, are connected via low-speed lines to
a number of synchronous multiplexers that combine low-speed data
streams into high-speed streams on backbone circuits. The first, and
most significant, advantage is that Pluris MPR technology is the only
technology known today that makes building a global terabit-per-second
network possible. Other technologies do not achieve high speeds
(conventional IP routing), or do not build truly global networks (ATM).
	The Pluris MPR is a very high-performance machine, yet it is
built entirely from off-the-shelf integrated circuits, which keeps it
low-cost and very reliable. The maximal capacity of the MPR is
limited solely by the maximal length of coaxial cables interconnecting
parts of the machine. The present design is capable of housing 16K
processing nodes in 64 open racks arranged in 4 rows, to achieve the
aggregate routing capacity of 2.4 Tbps (or 7 billion packets per
second). One or more dedicated processing nodes equipped with
64-256 MB of DRAM run the routing protocols. When several such nodes
are used, the output of every protocol engine is broadcast to all
forwarding nodes, so operation continues if a protocol engine node
fails or is removed. The failure of a forwarding node only reduces
throughput; it does not interrupt service.
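	The capacity figures quoted above can be sanity-checked with a
few lines of arithmetic. The sketch below is illustrative only: it
assumes that "16K" means 16 x 1024 nodes and uses the nominal 155 Mbps
OC-3c rate per node; the function names are hypothetical. Under these
assumptions the product comes to about 2.5 Tbps, the same order as the
quoted 2.4 Tbps, and the quoted packet rate implies an average packet
of roughly 43 bytes.

```python
def aggregate_capacity_bps(nodes: int, per_node_bps: int) -> int:
    """Aggregate routing capacity if every node forwards at line rate."""
    return nodes * per_node_bps


def implied_packet_bytes(capacity_bps: float, packets_per_second: float) -> float:
    """Average packet size (bytes) implied by a capacity and a packet rate."""
    return capacity_bps / packets_per_second / 8


# Assumptions (not stated this precisely in the text):
NODES = 16 * 1024          # "16K" processing nodes read as 16 x 1024
OC3C_BPS = 155_000_000     # nominal OC-3c rate handled by one node

total_bps = aggregate_capacity_bps(NODES, OC3C_BPS)
print(f"aggregate capacity: {total_bps / 1e12:.2f} Tbps")

# Figures quoted in the text: 2.4 Tbps and 7 billion packets per second.
avg_bytes = implied_packet_bytes(2.4e12, 7e9)
print(f"implied average packet size: {avg_bytes:.0f} bytes")
```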
	The Pluris approach is based on the observation that although
aggregate data rates of Internet traffic are skyrocketing, the
bandwidth of individual communication sessions remains relatively
small (in fact, it cannot grow faster than the performance of host
computers). This means that a high aggregate routing capacity can be
achieved by distributing the paths of packets in those connections
between a large number of medium-performance routing engines. The
Pluris process performs two steps for each packet: firstly, the exit
high-speed communication line is determined, and secondly, one of the
low-speed lines corresponding to the exit high-speed line is
selected. The Pluris data interconnect is a patent-pending
Self-Healing Butterfly Switch based on 1.2 Gbps serial communication
lines. Unlike the well-known butterfly and Benes switches, the Pluris
switch is fault-tolerant, so the packets are automatically rerouted in
case of failures in links or routing elements.
	"Naive" second-step selection techniques such as random
selection and round-robin cause unacceptable reordering of
packets. To avoid this problem, the selection is made by computing a
hash function of the packet's source and destination addresses and,
optionally, port numbers. Because these fields are invariant for all
packets within a single TCP (or any other transport protocol) session,
hashing them guarantees that all those packets follow the same path
and therefore are not reordered. Hashing effectively randomizes packet
routes across sessions, so the load is distributed close to uniformly
among all participating processing nodes and low-speed lines.
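	The two-step selection described above can be sketched as
follows. This is a minimal illustration, not the Pluris
implementation: the text does not name the hash function, so SHA-256
stands in for it, and the table layout, line names, and function names
(lookup_exit_line, flow_hash, select_low_speed_line) are all
hypothetical.

```python
import hashlib
import ipaddress
import struct


def lookup_exit_line(dst_ip, forwarding_table):
    """Step 1: pick the exit high-speed line by longest-prefix match.

    forwarding_table maps ipaddress network objects to line names
    (a hypothetical layout chosen for this sketch).
    """
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in forwarding_table if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    best = max(matches, key=lambda net: net.prefixlen)
    return forwarding_table[best]


def flow_hash(src_ip, dst_ip, src_port=0, dst_port=0):
    """Hash the fields that stay constant within one transport session.

    SHA-256 is an assumption; any well-mixing hash would serve.
    """
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def select_low_speed_line(exit_line, lines_by_exit, src_ip, dst_ip,
                          src_port=0, dst_port=0):
    """Step 2: pick one low-speed line belonging to the exit line."""
    lines = lines_by_exit[exit_line]
    index = flow_hash(src_ip, dst_ip, src_port, dst_port) % len(lines)
    return lines[index]


def forward(src_ip, dst_ip, src_port, dst_port, forwarding_table, lines_by_exit):
    """Run both steps for one packet and return the chosen low-speed line."""
    exit_line = lookup_exit_line(dst_ip, forwarding_table)
    return select_low_speed_line(exit_line, lines_by_exit,
                                 src_ip, dst_ip, src_port, dst_port)
```

Because the hash depends only on session-invariant fields, repeated
calls to forward() with the same addresses and ports always return the
same low-speed line, while different sessions are spread across the
available lines.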
	Together with linear scalability of the data interconnect,
hashing means that the aggregate capacity of the massively parallel
packet router can be increased nearly indefinitely by the simple
addition of processing nodes. The circuitry involved is much simpler
and cheaper than hardware implementations of IP routing or ATM
switching. Because Pluris routers treat high-speed backbone links as
groups of parallel low-speed circuits, a number of parallel
multiplexed high-speed lines can likewise be combined into a single
very high-speed communication line. In other words, the capacity of a
network built using massively parallel routers is not limited by the
capacity of any single physical component.
	An interesting property of a massively parallel packet router
is that it can be configured to form a number of independent routers
interconnected with a very fast "LAN", and thus can be used as a
scalable platform for Internet Exchange Points (IXPs) or Network
Access Points (NAPs). After the MPRs are deployed, future customer
access connections may be fanned out from OC-3c trunks with cheap
low-end ATM switches, routers, and xDSL access racks. This allows ISPs
to select the cheapest customer access technology. Unlike dedicated
backbone routers, Pluris MPR can be configured to service thousands of
such fan-out devices, making it ideal not only for backbone switch
sites, but also for central office installations.
	A backbone site built from conventional backbone routers must
have at least two of them to achieve redundant operation. As a result,
the number of hops in the network is increased by the intra-POP hops
over the cluster LAN, which increases the variance of network latency.
The highly redundant design of the MPR eliminates any need to have
more than one router per POP, making convergence times smaller than
0.1 second realistic and thus eliminating any need for link-layer
redundancy. This will allow ISPs to load existing hot-spare fibers
with user traffic, effectively reducing the cost of transmission by
nearly 50%.
	Unlike hardware-assisted IP routers, MPR routing engines are
completely programmable, so the routers will not need any hardware
upgrades to support new protocols. Programmability also means that an
MPR machine can eventually be equipped with additional processing
nodes that interface with mass-storage devices and are programmed to
provide services such as Web hosting and video-on-demand, or to act as
large-scale caching proxy servers, effectively eliminating the
communication bottleneck between servers and the backbone.
	Essentially, Pluris MPR technology is future-proof: it allows
ISP operators to start building infrastructure capable of growing and
adapting to new requirements far better than any other known
networking technology.