1. CPU Node
A CPU Node is used for general-purpose computing. It suits workloads that do not need a GPU, such as business applications, database management, and web hosting. CPU nodes perform well when tasks are largely sequential and do not require the massive parallelism of a GPU.
CPU: Intel 8358, 32 Cores, 2.6 GHz — A powerful multi-core processor designed for high-performance computing tasks.
GPU: NA — No accelerator is fitted, since this node targets CPU-bound workloads.
RAM: 256 GB — Sufficient memory for general computing workloads.
Ethernet: 4 x 1G — Basic network connectivity for general tasks.
InfiniBand: 1 x PX100G — Provides high-speed, low-latency network communication between servers in a data center.
HDD/SSD: 1 x 960 GB — Used for storage of system files, applications, and data.
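The spec lines above map naturally onto a small data model; a minimal sketch (the class and field names are illustrative, not part of any real inventory tool):

```python
from dataclasses import dataclass

@dataclass
class NodeSpec:
    """Illustrative model of one cluster node's hardware."""
    name: str
    cpu_cores: int
    cpu_ghz: float
    gpus: int
    ram_gb: int
    storage_gb: int

# The CPU Node described above
cpu_node = NodeSpec("CPU Node", cpu_cores=32, cpu_ghz=2.6,
                    gpus=0, ram_gb=256, storage_gb=960)
print(cpu_node)
```

The same structure covers every node type in this document, which makes it easy to compare or validate configurations programmatically.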
2. GPU Node
A GPU Node is designed for GPU-accelerated tasks that require heavy parallel processing, such as AI/ML model training, data science, and deep learning. These nodes use powerful GPUs to accelerate the computation of complex tasks, significantly reducing processing time.
CPU: Intel 6336Y, 24 Cores, 2.4 GHz — Provides multi-core performance to support GPU-intensive workloads.
GPU: 2 x NVIDIA A100 80GB — Optimized for deep learning, machine learning, and HPC workloads.
RAM: 256 GB — Allows the system to efficiently handle large datasets and complex models.
Ethernet: 4 x 1G — Standard Ethernet connectivity for general network access.
InfiniBand: 1 x PX100G — High-speed network interface for fast data transfer across systems.
HDD/SSD: 1 x 960 GB — Storage for OS, applications, and general usage.
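A quick way to reason about this node's 2 x 80 GB of GPU memory is a back-of-the-envelope fit check. The 16 bytes-per-parameter figure below is a common rule of thumb for fp16 training with Adam (weights, gradients, optimizer state), and the sketch deliberately ignores activation memory, so treat the result as indicative only:

```python
def fits_in_gpu_memory(params_billion, gpus=2, gb_per_gpu=80, bytes_per_param=16):
    """Rough estimate: weights + grads + optimizer state only.
    Activation memory is ignored, so real headroom is smaller."""
    needed_gb = params_billion * 1e9 * bytes_per_param / 1e9
    return needed_gb <= gpus * gb_per_gpu

print(fits_in_gpu_memory(7))   # 7B params -> 112 GB needed vs 160 GB available: True
print(fits_in_gpu_memory(13))  # 13B params -> 208 GB needed vs 160 GB available: False
```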
3. Master Node
The Master Node is the central management server in a distributed computing environment. It schedules and manages tasks across the cluster and ensures smooth system operation.
CPU: Intel 6336Y, 24 Cores, 2.4 GHz — Adequate processing power for coordination tasks.
GPU: NA — Not required for management operations.
RAM: 256 GB — Sufficient memory for job scheduling and resource management.
Ethernet: 4 x 1G — Network connectivity for coordination and job distribution.
InfiniBand: 1 x PX100G — Ensures high-speed communication with compute nodes.
HDD/SSD: 4 x 960 GB — Storage for configuration, logs, and cluster data.
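The master node's core job, matching queued work to free resources, can be sketched with a toy FIFO scheduler. This is a simplified illustration; production schedulers such as Slurm also weigh priority, memory, GPUs, and fairness:

```python
from collections import deque

def schedule(jobs, free_cores):
    """Toy FIFO scheduler: place each queued job on the first node
    with enough free cores; stop when the head job cannot run."""
    queue = deque(jobs)            # each job is (name, cores_needed)
    placements = {}
    while queue:
        name, need = queue.popleft()
        for node, free in free_cores.items():
            if free >= need:
                free_cores[node] -= need
                placements[name] = node
                break
        else:
            break                  # strict FIFO: do not skip ahead
    return placements

# Hostnames are illustrative
print(schedule([("a", 16), ("b", 24)], {"cpu01": 32, "gpu01": 24}))
# {'a': 'cpu01', 'b': 'gpu01'}
```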
4. Login Node
A Login Node provides users with access to the cluster. It handles job submission, authentication, and user interaction without affecting compute performance.
5. HGX GPU Node
An HGX GPU Node is built for large-scale, multi-GPU workloads, combining eight NVLink-connected A100 GPUs in a single server.
GPU: 8 x NVIDIA HGX A100 80GB — Ideal for large-scale deep learning and distributed training.
RAM: 1024 GB — Supports extremely large datasets and memory-heavy operations.
Ethernet: 4 x 1G — General connectivity.
InfiniBand: 2 x PX100G — Enables ultra-fast multi-GPU and multi-node communication.
HDD/SSD: 4 x 3.84 TB — High-capacity SSD storage for datasets and models.
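The node's aggregate capacity can be tallied directly from the spec lines above:

```python
gpus, hbm_per_gpu_gb = 8, 80
ssds, ssd_tb = 4, 3.84

total_gpu_mem_gb = gpus * hbm_per_gpu_gb   # 640 GB of HBM2e across the node
total_storage_tb = ssds * ssd_tb           # 15.36 TB of local SSD
print(total_gpu_mem_gb, total_storage_tb)  # 640 15.36
```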
6. NVIDIA A100 80GB GPU
The NVIDIA A100 80GB GPU is a high-performance accelerator designed for AI, deep learning, and HPC workloads. It includes Tensor Cores and 80 GB of HBM2e memory for training very large models.
Special Features: Tensor Cores, MIG (Multi-Instance GPU), 80GB HBM2e memory.
Use Case: Deep learning, HPC workloads, AI model training, scientific computing.
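MIG lets one A100 be partitioned into up to seven isolated GPU instances, each with its own memory and compute slice. A small sketch of the published A100-80GB profiles (counts are indicative; consult NVIDIA's MIG documentation for the authoritative list):

```python
# MIG profiles for the A100 80GB: instance name -> max instances per GPU
MIG_PROFILES = {
    "1g.10gb": 7,   # seven small instances, 10 GB each
    "2g.20gb": 3,
    "3g.40gb": 2,
    "7g.80gb": 1,   # the whole GPU as a single instance
}

def max_instances(profile):
    return MIG_PROFILES[profile]

print(max_instances("1g.10gb"))  # 7
```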
7. NVIDIA HGX A100 80GB GPU
The NVIDIA HGX A100 80GB is a multi-GPU platform designed for extreme-scale AI model training and HPC computation. It uses NVLink for ultra-fast inter-GPU communication.
Special Features: Multi-GPU integration, NVLink, optimized for distributed parallel computing.
Use Case: Large AI model training, deep learning clusters, scientific high-performance computing.
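During distributed training on an HGX platform, the GPUs typically synchronize gradients with an all-reduce over NVLink. A pure-Python sketch of the operation's result (real implementations such as NCCL perform this bandwidth-optimally with ring or tree communication patterns):

```python
def all_reduce(gradients):
    """Sum the per-GPU gradient lists element-wise so every 'GPU'
    ends up holding the same reduced values."""
    total = [sum(vals) for vals in zip(*gradients)]
    return [list(total) for _ in gradients]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three simulated GPUs
print(all_reduce(grads))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```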