1. CPU Node
A CPU Node is used for general-purpose computing. It suits workloads that do not need a GPU, such as business applications, database management, and web hosting. CPU nodes perform well when tasks are largely sequential and do not require the massive parallelism of a GPU.
CPU: Intel 8358, 32 Cores, 2.6 GHz — A powerful multi-core processor designed for high-performance computing tasks.
GPU: NA — No accelerator is fitted, since this node targets CPU-bound workloads.
RAM: 256 GB — Sufficient memory for general computing workloads.
Ethernet: 4 x 1G — Basic network connectivity for general tasks.
InfiniBand: 1 x PX100G — Provides high-speed, low-latency network communication between servers in a data center.
HDD/SSD: 1 x 960 GB — Used for storage of system files, applications, and data.
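The spec lines above map naturally onto a small data model; a minimal sketch (the class and field names are illustrative, not part of any real inventory tool):

```python
from dataclasses import dataclass

@dataclass
class NodeSpec:
    """Illustrative model of one cluster node's hardware."""
    name: str
    cpu_cores: int
    cpu_ghz: float
    gpus: int
    ram_gb: int
    storage_gb: int

# The CPU Node described above
cpu_node = NodeSpec("CPU Node", cpu_cores=32, cpu_ghz=2.6,
                    gpus=0, ram_gb=256, storage_gb=960)
print(cpu_node)
```

The same structure covers every node type in this document, which makes it easy to compare or validate configurations programmatically.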
2. GPU Node
A GPU Node is designed for GPU-accelerated tasks that require heavy parallel processing, such as AI/ML model training, data science, and deep learning. These nodes use powerful GPUs to accelerate the computation of complex tasks, significantly reducing processing time.
CPU: Intel 6336Y, 24 Cores, 2.4 GHz — Provides multi-core performance to support GPU-intensive workloads.
GPU: 2 x NVIDIA A100 80GB — Optimized for deep learning, machine learning, and HPC workloads.
RAM: 256 GB — Allows the system to efficiently handle large datasets and complex models.
Ethernet: 4 x 1G — Standard Ethernet connectivity for general network access.
InfiniBand: 1 x PX100G — High-speed network interface for fast data transfer across systems.
HDD/SSD: 1 x 960 GB — Storage for OS, applications, and general usage.
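A quick way to reason about this node's 2 x 80 GB of GPU memory is a back-of-the-envelope fit check. The 16 bytes-per-parameter figure below is a common rule of thumb for fp16 training with Adam (weights, gradients, optimizer state), and the sketch deliberately ignores activation memory, so treat the result as indicative only:

```python
def fits_in_gpu_memory(params_billion, gpus=2, gb_per_gpu=80, bytes_per_param=16):
    """Rough estimate: weights + grads + optimizer state only.
    Activation memory is ignored, so real headroom is smaller."""
    needed_gb = params_billion * 1e9 * bytes_per_param / 1e9
    return needed_gb <= gpus * gb_per_gpu

print(fits_in_gpu_memory(7))   # 7B params -> 112 GB needed vs 160 GB available: True
print(fits_in_gpu_memory(13))  # 13B params -> 208 GB needed vs 160 GB available: False
```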
3. Master Node
The Master Node is the central management server in a distributed computing environment. It schedules and manages tasks across the cluster and ensures smooth system operation.
CPU: Intel 6336Y, 24 Cores, 2.4 GHz — Adequate processing power for coordination tasks.
GPU: NA — Not required for management operations.
RAM: 256 GB — Sufficient memory for job scheduling and resource management.
Ethernet: 4 x 1G — Network connectivity for coordination and job distribution.
InfiniBand: 1 x PX100G — Ensures high-speed communication with compute nodes.
HDD/SSD: 4 x 960 GB — Storage for configuration, logs, and cluster data.
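The master node's core job, matching queued work to free resources, can be sketched with a toy FIFO scheduler. This is a simplified illustration; production schedulers such as Slurm also weigh priority, memory, GPUs, and fairness:

```python
from collections import deque

def schedule(jobs, free_cores):
    """Toy FIFO scheduler: place each queued job on the first node
    with enough free cores; stop when the head job cannot run."""
    queue = deque(jobs)            # each job is (name, cores_needed)
    placements = {}
    while queue:
        name, need = queue.popleft()
        for node, free in free_cores.items():
            if free >= need:
                free_cores[node] -= need
                placements[name] = node
                break
        else:
            break                  # strict FIFO: do not skip ahead
    return placements

# Hostnames are illustrative
print(schedule([("a", 16), ("b", 24)], {"cpu01": 32, "gpu01": 24}))
# {'a': 'cpu01', 'b': 'gpu01'}
```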
4. Login Node
A Login Node provides users with access to the cluster. It handles job submission, authentication, and user interaction without affecting compute performance.
5. HGX GPU Node
An HGX GPU Node is built for large-scale, multi-GPU workloads, combining eight NVLink-connected A100 GPUs in a single server.
GPU: 8 x NVIDIA HGX A100 80GB — Ideal for large-scale deep learning and distributed training.
RAM: 1024 GB — Supports extremely large datasets and memory-heavy operations.
Ethernet: 4 x 1G — General connectivity.
InfiniBand: 2 x PX100G — Enables ultra-fast multi-GPU and multi-node communication.
HDD/SSD: 4 x 3.84 TB — High-capacity SSD storage for datasets and models.
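The node's aggregate capacity can be tallied directly from the spec lines above:

```python
gpus, hbm_per_gpu_gb = 8, 80
ssds, ssd_tb = 4, 3.84

total_gpu_mem_gb = gpus * hbm_per_gpu_gb   # 640 GB of HBM2e across the node
total_storage_tb = ssds * ssd_tb           # 15.36 TB of local SSD
print(total_gpu_mem_gb, total_storage_tb)  # 640 15.36
```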
6. NVIDIA A100 80GB GPU
The NVIDIA A100 80GB GPU is a high-performance accelerator designed for AI, deep learning, and HPC workloads. It includes Tensor Cores and 80 GB of HBM2e memory for training very large models.
Special Features: Tensor Cores, MIG (Multi-Instance GPU), 80GB HBM2e memory.
Use Case: Deep learning, HPC workloads, AI model training, scientific computing.
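MIG lets one A100 be partitioned into up to seven isolated GPU instances, each with its own memory and compute slice. A small sketch of the published A100-80GB profiles (counts are indicative; consult NVIDIA's MIG documentation for the authoritative list):

```python
# MIG profiles for the A100 80GB: instance name -> max instances per GPU
MIG_PROFILES = {
    "1g.10gb": 7,   # seven small instances, 10 GB each
    "2g.20gb": 3,
    "3g.40gb": 2,
    "7g.80gb": 1,   # the whole GPU as a single instance
}

def max_instances(profile):
    return MIG_PROFILES[profile]

print(max_instances("1g.10gb"))  # 7
```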
7. NVIDIA HGX A100 80GB GPU
The NVIDIA HGX A100 80GB is a multi-GPU platform designed for extreme-scale AI model training and HPC computation. It uses NVLink for ultra-fast inter-GPU communication.
Special Features: Multi-GPU integration, NVLink, optimized for distributed parallel computing.
Use Case: Large AI model training, deep learning clusters, scientific high-performance computing.
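During distributed training on an HGX platform, the GPUs typically synchronize gradients with an all-reduce over NVLink. A pure-Python sketch of the operation's result (real implementations such as NCCL perform this bandwidth-optimally with ring or tree communication patterns):

```python
def all_reduce(gradients):
    """Sum the per-GPU gradient lists element-wise so every 'GPU'
    ends up holding the same reduced values."""
    total = [sum(vals) for vals in zip(*gradients)]
    return [list(total) for _ in gradients]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three simulated GPUs
print(all_reduce(grads))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```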