Host-based Routing Using Peer DMA
	       Joe Touch, Anne Hutton, and Simon Walton
			       USC/ISI

	Host-based routing is used in the research community both to
provide heterogeneous connectivity where no router interfaces exist,
and to support more flexible control of routing functions. Host
interfaces additionally track technology advances more closely than
router interfaces, because they dominate overall interface
sales. Hosts are also preferred platforms for Active Networks
research, in which packet-based programs modify router functions.
The development of efficient, high-performance host-based routing
requires modification of the packet processing routines. Here we
describe our efforts to provide fast host routing using peer DMA, to
minimize shared backplane bandwidth use, to reduce backplane access
contention, and to reduce CPU load, while providing increased routing
throughput. We increased forwarding bandwidth by over 40%, while
reducing backplane bus utilization by 50%, supporting host forwarding
bandwidth at rates near the loopback bandwidth of the interfaces.
	USC/ISI maintains a production 1.2 Gbps LAN, based on Myricom
components, as the primary network connection for approximately 50 Sun
SPARC hosts in its Computer Networks Division. This technology lacks a
router-based network interface, and so relies on host-based routing
for its interconnection to other LANs and to the wide-area network,
currently supporting both ATM and fast Ethernet interconnection.
	We performed experiments using three sources and three sinks,
through a single host-based router built on a 200 MHz Pentium Pro with
PCI-based network interface cards. Our initial experiments tested
routing between address-distinct Myrinet LANs, in which each host can
source and sink 300 Mbps of un-checksummed 8 Kbyte UDP packets
(checksums drop this by 30 Mbps, to 270 Mbps). FreeBSD kernel-based IP
packet forwarding provides 320-335 Mbps, which was increased to
441-482 Mbps using peer DMA forwarding. This is near the speed of
loopback transmission for Myrinet host interfaces, which achieve 500
Mbps when packets do not leave the interface at all.
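
The reported rates can be checked with a little arithmetic. The sketch below uses only the figures quoted above. Comparing the midpoints of the two ranges is our assumption, since the text does not say how the overall gain was computed.

```python
# Forwarding rates reported in the text, in Mbps (low, high).
kernel = (320, 335)      # FreeBSD kernel-based IP forwarding
peer_dma = (441, 482)    # peer DMA forwarding
loopback = 500           # Myrinet interface loopback rate

def pct_gain(old, new):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old

# ASSUMPTION: compare the midpoints of the two reported ranges.
kernel_mid = sum(kernel) / 2
peer_mid = sum(peer_dma) / 2
midpoint_gain = pct_gain(kernel_mid, peer_mid)

# Fraction of the loopback rate reached by peer DMA forwarding.
loopback_fraction = tuple(rate / loopback for rate in peer_dma)

print(f"midpoint gain: {midpoint_gain:.1f}%")
print(f"fraction of loopback: {loopback_fraction[0]:.3f} to {loopback_fraction[1]:.3f}")
```

Under this reading, the midpoint gain is slightly above 40%, and peer DMA forwarding reaches roughly nine-tenths of the loopback rate.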
	The increase was achieved by modifying the kernel packet
forwarding algorithm. In conventional kernel-based forwarding, packets
are DMA'd upon arrival into the host (kernel) memory. Processing of
the header is performed on this kernel copy, and the entire packet is
recopied back out to the desired output interface. The packet
traverses the shared PCI backplane twice, wasting backplane
bandwidth. Packet processing is delayed until the entire packet is
copied in, and packet emission is delayed until the entire packet is
copied out, both times incurring a 'packet-copy-time' of latency.
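
A rough cost model for the conventional path follows. It is a sketch only: the bus rate is an assumption (the peak rate of a 32-bit, 33 MHz PCI bus), since the text does not state the bus configuration of the router host, and the function names are illustrative.

```python
PACKET_BYTES = 8 * 1024          # 8 Kbyte packets, as used in the experiments
# ASSUMPTION: peak rate of a 32-bit, 33 MHz PCI bus (about 133 Mbytes/s).
# The text does not state the bus configuration of the router host.
PCI_BYTES_PER_SEC = 133e6

def copy_time_us(nbytes, bus_bytes_per_sec=PCI_BYTES_PER_SEC):
    """Time to move nbytes across the bus once, in microseconds."""
    return 1e6 * nbytes / bus_bytes_per_sec

def conventional_cost(packet_bytes):
    """Bus bytes and copy latency for kernel-based forwarding of one packet."""
    bus_bytes = 2 * packet_bytes                  # interface -> kernel, kernel -> interface
    latency_us = 2 * copy_time_us(packet_bytes)   # two full packet-copy-times
    return bus_bytes, latency_us

bus_bytes, latency_us = conventional_cost(PACKET_BYTES)
print(f"bus bytes per packet: {bus_bytes}")
print(f"copy latency per packet: {latency_us:.1f} us")
```

The model ignores bus arbitration, interrupt handling, and header processing time, so it gives a lower bound on the delay rather than a prediction.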
	In peer DMA, only the packet header is copied into the kernel
upon arrival; the data remains in a buffer on the input interface. The
kernel processes the header as before, and copies it back into the
input interface, overwriting the packet's original header. The kernel
then configures a DMA directly from the input interface to the desired
output interface. The packet data traverses the backplane exactly
once, and processing waits only for the header to arrive. This can
substantially reduce backplane bandwidth use and packet processing
latency.
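
The two forwarding paths can be compared with a toy model that counts the bytes crossing the backplane. This is a simulation sketch, not the kernel code: the Bus class, the function names, the 20-byte header length, and the TTL-only header rewrite are all assumptions made for illustration.

```python
class Bus:
    """Counts bytes that cross the shared backplane."""
    def __init__(self):
        self.bytes_moved = 0

    def transfer(self, data):
        self.bytes_moved += len(data)
        return bytes(data)

HEADER_LEN = 20  # ASSUMPTION: a 20-byte IPv4 header without options

def rewrite_header(header):
    """Stand-in for kernel header processing: decrement the TTL byte.
    Real forwarding would also update the header checksum."""
    h = bytearray(header)
    h[8] -= 1                      # TTL is at byte offset 8 in an IPv4 header
    return bytes(h)

def forward_conventional(bus, in_buf):
    """Whole packet goes to kernel memory, then out to the output interface."""
    kernel_copy = bus.transfer(in_buf)                           # interface -> kernel
    new_packet = rewrite_header(kernel_copy[:HEADER_LEN]) + kernel_copy[HEADER_LEN:]
    return bus.transfer(new_packet)                              # kernel -> output interface

def forward_peer_dma(bus, in_buf):
    """Only the header visits the kernel; the packet moves interface to interface."""
    header = bus.transfer(in_buf[:HEADER_LEN])                   # header -> kernel
    in_buf[:HEADER_LEN] = bus.transfer(rewrite_header(header))   # header -> input interface
    return bus.transfer(in_buf)                                  # input -> output interface

packet = bytearray(8 * 1024)       # 8 Kbyte packet, as in the experiments
packet[8] = 64                     # initial TTL

conv_bus, peer_bus = Bus(), Bus()
out_conv = forward_conventional(conv_bus, bytearray(packet))
out_peer = forward_peer_dma(peer_bus, bytearray(packet))

print(f"conventional bus bytes: {conv_bus.bytes_moved}")
print(f"peer DMA bus bytes:     {peer_bus.bytes_moved}")
print(f"ratio: {peer_bus.bytes_moved / conv_bus.bytes_moved:.3f}")
```

Both paths emit the same packet. In this model the peer DMA path moves the packet once plus two header copies, so its bus traffic is a little over half that of the conventional path.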
	In our measurements, the CPU was saturated when using peer
DMA. Avoiding the extra backplane copy frees bandwidth, which lets
more packets through and so raises the packet processing load. At this
time, interrupt processing appears to be the cause of this load
(additional results are expected by the time of the workshop). These
results indicate that peer DMA routing substantially
increases the effective throughput of host-based routers. Ongoing work
is examining the impact of these results on host, backplane, and host
interface architecture, and on the organization of kernel-based
routing functions.

Ref: "High-performance IP Forwarding Using Host Interface Peering",
A. Hutton, et al., in preparation for the 1998 LAN-MAN Workshop.