Developing a PC Cluster using Commoditized Network Technology Oriented towards Database Applications *

Masato Oguchi, Takahiko Shintani, Takayuki Tamura, and Masaru Kitsuregawa
Institute of Industrial Science, The University of Tokyo
oguchi@tkl.iis.u-tokyo.ac.jp

Massively parallel computer systems are moving away from proprietary components such as microprocessors, memory, and disks toward commodity parts. To reduce cost, computer manufacturers need to use off-the-shelf commodity parts as much as possible. Beyond such components, the PC (Personal Computer) itself is becoming a commodity, and PC clusters have consequently been developed. Today, only the interconnection network has yet to be commoditized: commercial massively parallel computers have so far employed dedicated proprietary networks, which offer much higher bandwidth and shorter latency than ordinary LAN technologies.

However, as networks such as ATM become standards for high-speed communication, future parallel systems will undoubtedly move toward commodity networks as well. ATM switches and NICs (Network Interface Cards) are already becoming steadily cheaper, which improves their price/performance ratio. Although other high-performance network standards are also accepted, ATM networks are widely used in settings ranging from local area networks to widely distributed environments. Compared with other high-speed networks, ATM's merits include its seamless structure, its scalability, and its quality-of-service (QoS) control mechanisms. It has sometimes been said that ATM may not be suitable for pure data transmission, or that it does not fit well with traditional computer communication protocols such as TCP/IP, but dramatic recent improvements in computer and NIC technology are expected to solve these problems.

From the viewpoint of applications, we believe that data-intensive applications such as ad-hoc query processing and data mining are very important for massively parallel processors, in addition to conventional scientific applications. Measured by the number of installed sites, massively parallel processors are now becoming more popular in business applications than in scientific research. Investigating the feasibility of implementing database applications over an ATM-connected PC cluster is therefore highly worthwhile.

We have constructed a very large-scale PC cluster consisting of 100 PCs connected through an ATM network. Commercial 200MHz Pentium Pro PCs are used as the nodes of the cluster, and HITACHI's AN1000 is used as the ATM switch. Since this switch has more than 100 ports of 155Mbps UTP-5, every node can be connected directly to the single switch, so no cascaded configuration is needed. On this PC cluster, we have found that a throughput of over 110Mbps is achievable for point-to-point communication, even with the so-called "heavy" TCP/IP protocol.
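To put the 110Mbps figure in context, the following back-of-the-envelope sketch estimates the ceiling that ATM framing leaves for TCP payload on a 155Mbps link. None of its parameters come from this abstract; they are assumptions: SONET STS-3c framing on the UTP-5 ports (149.76 Mbps payload capacity), classical IP over ATM with 8-byte LLC/SNAP encapsulation, the RFC 1626 default MTU of 9180 bytes, and 40 bytes of IPv4/TCP headers without options.

```python
import math

# Assumed link parameters (not stated in the abstract): the 155 Mbps UTP-5
# ports are taken to use SONET STS-3c framing.
SONET_PAYLOAD_MBPS = 149.76   # STS-3c payload capacity
CELL_BYTES, CELL_PAYLOAD = 53, 48

def cell_ceiling_mbps():
    """Throughput left after the 5-byte header of every ATM cell."""
    return SONET_PAYLOAD_MBPS * CELL_PAYLOAD / CELL_BYTES

def tcp_ceiling_mbps(mtu=9180, llc_snap=8, aal5_trailer=8, tcpip_headers=40):
    """Ceiling for TCP payload with one IP datagram per AAL5 frame.

    Assumes the RFC 1626 default MTU, LLC/SNAP encapsulation and
    IPv4 + TCP headers without options.
    """
    frame = mtu + llc_snap + aal5_trailer
    cells = math.ceil(frame / CELL_PAYLOAD)   # AAL5 pads the frame to whole cells
    useful = mtu - tcpip_headers              # bytes of application data per frame
    return SONET_PAYLOAD_MBPS * useful / (cells * CELL_BYTES)

print(f"cell ceiling: {cell_ceiling_mbps():.1f} Mbps")
print(f"TCP ceiling:  {tcp_ceiling_mbps():.1f} Mbps")
print(f"110 Mbps is {110 / tcp_ceiling_mbps():.0%} of the TCP ceiling")
```

Under these assumptions the TCP ceiling is roughly 134.5Mbps, so the measured throughput would correspond to about 82% of it; a different MTU or encapsulation would shift this estimate.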

We have also measured the latency of point-to-point communication on this PC cluster. The average round-trip latency was a little less than 500 µsec, which is still longer than that of the proprietary networks used in massively parallel processors. Although latency-sensitive applications such as scientific calculations may suffer from the higher communication latency among processors, we expect that a wider range of applications, including database applications, would benefit from the cluster's higher throughput and improved price/performance ratio.
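The trade-off between latency and throughput can be illustrated with a simple linear cost model, time(n) = t0 + n/r, in which the latency term dominates small messages and the bandwidth term dominates large ones. This is a sketch, not a measured result: it assumes that the one-way latency t0 is half of the roughly 500 µsec round trip and that r equals the 110Mbps point-to-point throughput.

```python
# Simple linear (Hockney-style) cost model: time(n) = t0 + n / r_inf.
# Assumptions, not measurements: one-way latency t0 is taken as half of the
# ~500 usec round trip, and r_inf as the 110 Mbps peak TCP throughput.
T0_S = 250e-6
R_INF_BPS = 110e6

def transfer_time_s(nbytes):
    """Estimated one-way time to send an n-byte message."""
    return T0_S + nbytes * 8 / R_INF_BPS

def effective_mbps(nbytes):
    """Throughput actually seen by a sender of n-byte messages."""
    return nbytes * 8 / transfer_time_s(nbytes) / 1e6

def half_performance_bytes():
    # n_1/2: the message size at which half of r_inf is reached (= r_inf * t0).
    return R_INF_BPS * T0_S / 8

for size in (64, 1024, 16 * 1024, 64 * 1024):
    print(f"{size:>6} B: {effective_mbps(size):6.1f} Mbps")
print(f"n_1/2 = {half_performance_bytes():.0f} bytes")
```

Under these assumptions, half of the peak throughput is reached at about 3.4 Kbytes, and a 16-Kbyte message already achieves roughly 91Mbps. This suggests why bulk-transfer workloads such as database scans tolerate the latency better than fine-grained scientific codes.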

Some parallel database applications have been implemented on the cluster. For example, a data mining program was parallelized using the HPA (Hash Partitioned Apriori) algorithm and implemented in preliminary form on the 100-node PC cluster. Each node of the cluster stores a transaction data file on its own hard disk; the transaction data total about 80Mbytes. In our experiments, the message block size is 16Kbytes and the disk I/O block size is 64Kbytes, which seem to be the most suitable values for the system. Although only a preliminary evaluation has been carried out so far, the parallel data mining program achieved reasonably good performance improvement using up to 100 PCs.
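The hash-partitioning idea behind HPA can be sketched as a single-process simulation of one support-counting pass. This is a simplified illustration under assumed details, not the authors' implementation: candidate k-itemsets are divided among nodes by a hash function, each node scans only its own transaction data, and every k-itemset found in a transaction is routed to the node that owns it, which increments the count. The function names, the hash function `owner`, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def owner(itemset, num_nodes):
    # Hypothetical hash function; any deterministic function of the
    # itemset works as long as every node uses the same one.
    h = 0
    for item in itemset:
        h = h * 31 + item
    return h % num_nodes

def hpa_count_pass(local_dbs, candidates, k):
    """Count support of candidate k-itemsets, one Counter per node."""
    num_nodes = len(local_dbs)
    # Candidates are partitioned: each one lives on exactly one node.
    owned = [set() for _ in range(num_nodes)]
    for c in candidates:
        owned[owner(c, num_nodes)].add(c)
    counts = [Counter() for _ in range(num_nodes)]
    for db in local_dbs:                      # each node scans only its local data
        for txn in db:
            for sub in combinations(sorted(txn), k):
                dest = owner(sub, num_nodes)
                # On a real cluster this would be a message to node `dest`
                # (batched into message blocks); here it is a direct update.
                if sub in owned[dest]:
                    counts[dest][sub] += 1
    return counts

def frequent_itemsets(counts, min_support):
    # Each node filters its own candidates; the per-node results are disjoint.
    return {s: n for node in counts for s, n in node.items() if n >= min_support}

dbs = [[{1, 2, 3}, {1, 2}], [{2, 3}, {1, 2}], [{1, 3}]]   # toy data for 3 nodes
cands = [(1, 2), (1, 3), (2, 3)]
print(frequent_itemsets(hpa_count_pass(dbs, cands, 2), min_support=3))  # {(1, 2): 3}
```

Because each candidate is counted on exactly one node, the final result is simply the union of the per-node results; the price is that itemsets must travel over the network, which is where the message block size becomes relevant.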

* This project is supported by NEDO (New Energy and Industrial Technology Development Organization in Japan).


Last updated 6 March 1997
James P.G. Sterbenz <jpgs@ieee.org>