HPC FARM | USER GUIDE

What is Farm

First, a short historical excursus: supercomputers, computers with far higher computational capacity than a general-purpose machine, emerged in the 1960s, and by the end of the 20th century massively parallel supercomputers had become the norm. However, twenty years ago true supercomputers were extremely expensive. Moreover, each vendor offered its own architecture, so by choosing a vendor the user was locked into a specific solution. A computer cluster, or farm, is a set of identical, commodity-grade computers networked into a small local area network, with installed libraries and programs that allow the processing to be shared among them. The result is a high-performance parallel computing cluster built from inexpensive personal computer hardware. Over the years this technology has matured, and cluster networks are now so fast that clusters successfully compete with true supercomputers such as Cray or IBM Blue Gene.

Because the cluster is a collection of individual computers (compute nodes), it supports different types of workload. The first type is true High-Performance Computing (HPC), in which the code uses Message Passing Interface (MPI) libraries and runs across many compute nodes. MPI allows the user to employ more CPUs and memory than would otherwise be available. The second type is High-Throughput Computing (HTC). The HTC code runs many times, and the results are statistically analyzed. The large number of CPUs allows the user to solve Monte Carlo integration and similar problems faster. The third type is multithreading, in which the code can fully utilize one compute node. The advantage for the user is that present-day compute nodes usually have more CPUs than a typical PC.

Two components turn a collection of compute nodes into a cluster: shared storage and a workload manager. Shared storage allows any compute node to access the same files and the same programs; as a result, the user sees an identical environment on every compute node in the cluster. The workload manager allocates specific hardware or other resources to run a program. It ensures that every user is allocated the requested resources without disturbing other users. A user's request for resources, together with the program to run, is known to the workload manager as a "job". Since the cluster is a shared resource with usage policies, all jobs are organized into queues with different priorities and different resource limits. The workload manager sorts and runs jobs from the different queues.
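The guide does not name the workload manager at this point, so as an illustration only, assuming a Slurm-based manager (a common choice on such clusters), a job could be described by a batch script like the following sketch. The partition name `short`, the resource values, and the program `./my_program` are all hypothetical:

```shell
#!/bin/bash
#SBATCH --job-name=example        # name shown in the queue listing
#SBATCH --partition=short         # hypothetical queue/partition name
#SBATCH --ntasks=1                # number of CPU cores requested
#SBATCH --mem=1G                  # memory requested
#SBATCH --time=00:10:00           # wall-clock time limit

# The workload manager runs this script on an allocated compute node;
# thanks to shared storage, the same paths work on every node.
./my_program                      # hypothetical user program
```

Such a script would typically be submitted with `sbatch`, after which the workload manager places the job in the requested queue and starts it once the resources are free.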