Network Processor Design, Benchmarking and Evaluation.

Research is being pursued in the area of active networks and, in particular, in the design of network processors. Initial work has produced CommBench, a benchmark suite for testing and evaluating such processors. CommBench contains eight programs divided evenly into two sets, one oriented towards header processing and the other towards payload processing. Extensive statistics (e.g., miss rates) have been collected on the benchmark programs. Using this data, research on network processor design is currently focusing on determining the optimal number of processors and associated cache sizes for a given chip size and technology, and on scheduling packets to avoid high cold-cache miss rates. Optimal designs can be obtained using a variety of performance metrics, including chip area, chip power and packet processing rate. See references for more details.

Workshop on Network Processors and Applications

The Workshop on Network Processors was first held in February 2002. The goal was to bring together researchers and practitioners from academia and industry to present their research on network processors and to exchange information and ideas related to this developing area. The workshop was a great success, with about eighty people attending. Subsequent workshops were held in 2003 and 2004. The workshops are held in cooperation with the HPCA (High Performance Computer Architecture) symposium. They have resulted in a series of books, "Network Processor Design" Volumes 1, 2 and 3, published by Morgan Kaufmann Publishers. In addition to selected workshop papers, the books contain industry-derived articles on current network processor products and usage techniques. Volume 3 also contains several tutorial articles in the area. The books have been edited by Mark Franklin, Patrick Crowley, Haldun Hadimioglu and Peter Onufryk.

Parallel & Distributed Computer Architectures.

Research is being pursued in the areas of computer architecture, parallel processing, performance evaluation, and VLSI design. Current research can be divided into the following topical areas. Architecture issues: interconnection network design, design of synchronization networks, design of networked computers, parallel computer design, and checkpointing and recovery in networked computers. Basic hardware design questions: asynchronous versus clocked pipeline design, clock distribution, and pin limitations. Applications issues: using parallel machines on computationally demanding problems such as medical image reconstruction, statistical optimization, simulation (discrete and continuous), simulated annealing, and N-body problems, together with investigations into load balancing, task allocation and latency masking.

Interconnection Networks for Parallel Processors.

Optical technology has long been considered appealing for constructing high-speed interconnects in digital systems. However, although optical interconnects offer significantly increased bandwidths, the complexity and cost of such systems, coupled with the inability of processor interfaces to cope with high optical data rates, usually negate any expected bandwidth advantage. Optical component designs have been driven largely by the needs of the long-distance telecommunication industry. They generally do not have high levels of integration and are ill suited to the board-level distances, electronic VLSI design techniques, and input-output considerations associated with microprocessors. Our research program aims at taking advantage of new developments in optical technology and exploiting these developments in the design of a multiprocessor system. We are currently designing and implementing a message-passing, optically interconnected multiprocessor. Our focus is on bandwidth-limited applications which can effectively use our GEMINI interconnection network. This dual optical and electrical network provides high-bandwidth optical paths for long data messages and lower-congestion, low-latency electrical paths for short data and synchronization messages. The design goal is to exploit the high-integration, low-cost options becoming available through the use of integrated optical switches and polymer waveguides. Finally, a significant aspect of the research concerns matching memory and interconnect bandwidths through use of a "scatter-gather" engine connected to an interleaved memory system.
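
The interaction between a gather operation and an interleaved memory can be illustrated with a small sketch. It is not the GEMINI design, whose details are not given here. It assumes simple low-order interleaving (word address modulo the number of banks), and the names bank_of, gather and banks_touched are hypothetical.

```python
def bank_of(address, num_banks):
    """Assumed low-order interleaving: consecutive words fall in consecutive banks."""
    return address % num_banks


def gather(memory, base, stride, count):
    """Collect `count` words, `stride` words apart, into one contiguous message."""
    return [memory[base + i * stride] for i in range(count)]


def banks_touched(base, stride, count, num_banks):
    """Return the set of banks that a strided gather would access."""
    return {bank_of(base + i * stride, num_banks) for i in range(count)}


memory = list(range(64))                      # toy memory: word i holds value i
message = gather(memory, base=0, stride=8, count=4)

# With 8 banks, a stride of 8 hits a single bank (accesses serialize),
# while a stride of 1 or 3 spreads the accesses over all 8 banks.
single_bank = banks_touched(0, 8, 4, 8)
all_banks = banks_touched(0, 3, 8, 8)
```

In this toy model the number of distinct banks touched bounds how many words can be fetched in parallel, which is one way to see why the access pattern matters when memory bandwidth has to keep up with a fast interconnect.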

Timing Issues in Digital Design.

Issues associated with timing synchronization have been attacked from two directions. First, the question of how to optimally distribute global clock signals has been studied. Second, we have examined how to maintain timing synchronization by employing asynchronous, delay-insensitive design techniques. Current research focuses on comparing the asynchronous and clocked design methodologies when used in the control of interconnection networks and RISC instruction pipelines.

Applications Issues.

The following basic applications-oriented parallel processing issues are being investigated: load balancing and task allocation, process-level time synchronization, latency hiding, and reliability and checkpointing. The load balancing problem has been examined in three contexts: solving large, continuous systems; parallel logic and discrete-event simulation; and N-body simulation.
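
As a point of reference for the task allocation problem, here is a minimal sketch of a standard greedy heuristic (longest processing time first). It is not the method developed in this research. The function name allocate and the task costs are illustrative assumptions, and real workloads such as discrete-event or N-body simulation generally have costs that change at run time.

```python
import heapq


def allocate(task_costs, num_processors):
    """Greedy LPT: give each task, largest first, to the least-loaded processor."""
    heap = [(0, p) for p in range(num_processors)]      # (current load, processor id)
    assignment = {p: [] for p in range(num_processors)}
    # Visit tasks in order of decreasing cost; ties keep their original order.
    for task, cost in sorted(enumerate(task_costs), key=lambda tc: -tc[1]):
        load, p = heapq.heappop(heap)                   # least-loaded processor
        assignment[p].append(task)
        heapq.heappush(heap, (load + cost, p))
    loads = {p: sum(task_costs[t] for t in tasks) for p, tasks in assignment.items()}
    return assignment, loads


assignment, loads = allocate([5, 3, 3, 2, 2, 1], num_processors=2)
```

The heuristic is static: it assumes all costs are known before execution starts, which is the simplest case of the problem.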

Synchronization times have been experimentally measured for a number of applications implemented on parallel processors and networked computers. Results show that these times can dominate overall execution times. Research is being pursued on how to reduce these delays by developing special purpose synchronization hardware.

Latency is a major performance problem in distributed computing. For one class of applications (synchronous iterative algorithms), we have examined approaches to hiding latency through the use of speculative computation techniques.
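
The general idea of speculative computation can be sketched as follows. This is a sequential toy model, not the technique as implemented in this research. The update rule step, the prediction rule (reuse the last value received) and the tolerance parameter are all assumptions made for illustration.

```python
def step(local, remote):
    """Toy update rule: move the local value halfway towards the remote value."""
    return 0.5 * (local + remote)


def speculative_iterate(local, remote_values, tolerance=1e-3):
    """Run one iteration per entry of `remote_values`.

    `remote_values` stands in for the values that would arrive late over the
    network. Each iteration first computes with a guess, then checks the guess
    against the value that actually arrived.
    """
    guess = remote_values[0]        # assumption: the first value is known up front
    recomputed = 0
    for actual in remote_values:
        speculative = step(local, guess)    # done while the message is in flight
        if abs(actual - guess) <= tolerance:
            local = speculative             # speculation accepted, latency hidden
        else:
            local = step(local, actual)     # mis-speculation: redo with real value
            recomputed += 1
        guess = actual                      # predict the next value as the last one
    return local, recomputed


steady = speculative_iterate(0.0, [1.0, 1.0, 1.0])
changing = speculative_iterate(0.0, [1.0, 2.0])
```

In this model speculation pays off when remote values change slowly between iterations. Each mis-speculation costs one extra evaluation of step, and an accepted guess that lies within the tolerance introduces a small approximation error.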