Now that we understand what an HPC cluster is, let us go into some details of its architecture. Most of the compute nodes in our cluster are dual-socket servers. That means that each server has two multi-core CPU packages, as shown below:
Each such package contains several "cores", which are also referred to as CPUs. In our cluster there are 6 to 12 cores per package, so up to 24 per node. Note that the word CPU is frequently used for both the packages and the cores. In this document, CPU means a CPU core, i.e. one CPU and one core are the same. Here is a real image of an 18-core CPU package:
A typical program runs on a CPU and processes some data. The data is held in Random Access Memory (RAM), or simply "memory".
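The per-node core counts given above follow from simple multiplication: sockets per node times cores per package. The sketch below assumes every node is dual-socket (the text says "most" are), and the function name `cores_per_node` is purely illustrative. It also prints what Python's `os.cpu_count()` reports on the machine it runs on. That value counts *logical* CPUs, which can be twice the number of physical cores if hyper-threading is enabled.

```python
import os

# Figures from the text: dual-socket nodes with 6 to 12 cores per package.
# Assumption: every node is dual-socket (the text only says "most").
SOCKETS_PER_NODE = 2
MIN_CORES_PER_PACKAGE = 6
MAX_CORES_PER_PACKAGE = 12


def cores_per_node(cores_per_package: int, sockets: int = SOCKETS_PER_NODE) -> int:
    """Total cores ("CPUs" in this document) on one node (hypothetical helper)."""
    return sockets * cores_per_package


for c in (MIN_CORES_PER_PACKAGE, MAX_CORES_PER_PACKAGE):
    print(f"{c} cores/package -> {cores_per_node(c)} CPUs per node")

# Logical CPUs visible to the OS on the current machine; may include
# hyper-threads, so it can exceed the physical core count. May be None.
print("os.cpu_count() on this machine:", os.cpu_count())
```

With the figures above this prints 12 and 24 CPUs per node for the smallest and largest packages.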
When a user submits a job, it is interpreted as a request for computational resources. The minimal "chunk" of resources must contain at least one CPU (core) and several megabytes of memory. You may ask for several identical chunks. However, if you request resources that are physically impossible to provide, for example a single chunk with more CPUs or more memory than any one node has, your job will never start. Resource details other than CPUs and memory are described in the following section.
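To illustrate why some requests can never start, here is a minimal sketch of a feasibility check for a request of several identical chunks. It is not how the cluster's scheduler actually works. The `Node`/`Chunk` classes, the function names and the node memory sizes are all hypothetical. It also assumes that one chunk must fit on a single node, that several chunks may share a node, and it ignores other running jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cpus: int    # cores on the node
    mem_mb: int  # usable memory in megabytes (hypothetical values below)


@dataclass(frozen=True)
class Chunk:
    cpus: int    # cores requested per chunk
    mem_mb: int  # memory requested per chunk, in megabytes


def chunk_fits(chunk: Chunk, node: Node) -> bool:
    """Assumption: a single chunk must fit entirely on one node."""
    return chunk.cpus <= node.cpus and chunk.mem_mb <= node.mem_mb


def request_is_feasible(chunk: Chunk, count: int, nodes: list[Node]) -> bool:
    """Could `count` identical chunks ever be placed on an otherwise empty cluster?"""
    if chunk.cpus < 1:
        return False  # every chunk needs at least one CPU core
    capacity = 0
    for node in nodes:
        if not chunk_fits(chunk, node):
            continue
        # Assumption: chunks may share a node; count how many copies fit at once.
        by_cpu = node.cpus // chunk.cpus
        by_mem = node.mem_mb // chunk.mem_mb if chunk.mem_mb > 0 else by_cpu
        capacity += min(by_cpu, by_mem)
    return capacity >= count


# Hypothetical cluster: ten 24-core nodes and ten 12-core nodes.
cluster = [Node(cpus=24, mem_mb=64_000)] * 10 + [Node(cpus=12, mem_mb=32_000)] * 10

print(request_is_feasible(Chunk(cpus=4, mem_mb=8_000), count=4, nodes=cluster))
print(request_is_feasible(Chunk(cpus=32, mem_mb=8_000), count=1, nodes=cluster))
```

The first request can be satisfied (`True`). The second asks for 32 cores in one chunk, which no node in this hypothetical cluster has, so it prints `False`: such a job would wait forever.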