Triton and CUDA Compatibility

Triton

Triton is an open-source programming language and compiler developed by OpenAI. Its primary goal is to enable the development of high-performance machine learning code, particularly for GPU (Graphics Processing Unit) acceleration, without having to write complex CUDA code.

Triton allows developers to write custom kernels (small, specialized programs that run on the GPU) in a high-level, Python-like syntax, making it easier to work with GPU hardware without needing deep knowledge of low-level CUDA programming. Some key features of Triton include:

  • Optimized Kernels: It can automatically optimize custom kernels to efficiently use GPU resources, offering significant speedups.

  • Custom Operators: Triton enables you to define your own machine learning operators (functions or layers) that can be more efficient than standard ones.

  • Memory and Thread Management: It simplifies the complex process of memory management and thread synchronization that is typically required for efficient GPU computation.

Essentially, Triton focuses on custom high-performance GPU programming for machine learning tasks, where optimizing specific operations can result in substantial performance improvements.
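
For illustration, below is a minimal Triton kernel in the spirit of the vector-addition example from the official Triton tutorials; the tensor sizes and the block size are arbitrary choices for this sketch.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program instance per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)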

Triton requires hardware features found only in newer GPUs, specifically architectures released after Pascal (post-2016).

Several nodes in our cluster currently have older GPUs with a Compute Capability of 6.1 or lower, and users have reported that the Triton compiler fails on those systems.
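
If you are unsure what GPU a node has, you can check its Compute Capability directly, for example with PyTorch (assuming a CUDA-enabled PyTorch install; torch.cuda.get_device_capability returns the capability as a (major, minor) tuple):

import torch

# Query the Compute Capability of GPU 0 on the current node.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute Capability: {major}.{minor}")
if (major, minor) < (7, 5):
    print("This GPU is too old to run Triton jobs on this cluster.")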

CUDA Compute Capability of the GPUs on the cluster:

NODE(s)                                    | GPU Card                   | Compute Capability
mind-1-1, mind-1-3, mind-1-5               | NVIDIA GeForce GTX TITAN X | 5.2
mind-1-7, mind-1-9, mind-1-11, mind-1-19   | NVIDIA TITAN X (Pascal)    | 6.1
mind-1-13                                  | NVIDIA GeForce GTX 1080 Ti | 6.1
mind-0-18, mind-0-20, mind-0-22, mind-0-24 | NVIDIA GeForce RTX 2080 Ti | 7.5
mind-1-24                                  | NVIDIA TITAN RTX           | 7.5
mind-0-26                                  | NVIDIA GeForce RTX 3090    | 8.6
mind-0-28                                  | NVIDIA RTX A5000           | 8.6
mind-1-15                                  | NVIDIA L40S                | 8.9

Jobs that require Triton should target nodes equipped with GPUs that have a Compute Capability of 7.5 or higher; Triton runs without issues on those nodes.

Compatible Nodes with Compute Capability 7.5 or newer:
mind-0-18 (RTX 2080 Ti, Compute Capability 7.5)
mind-0-20 (RTX 2080 Ti, Compute Capability 7.5)
mind-0-22 (RTX 2080 Ti, Compute Capability 7.5)
mind-0-24 (RTX 2080 Ti, Compute Capability 7.5)
mind-1-24 (TITAN RTX, Compute Capability 7.5)
mind-0-26 (RTX 3090, Compute Capability 8.6)
mind-0-28 (RTX A5000, Compute Capability 8.6)
mind-1-15 (L40S, Compute Capability 8.9)

In your SLURM job request, use the --nodelist option to target the nodes that support Triton:

--nodelist=mind-0-18,mind-0-20,mind-0-22,mind-0-24,mind-0-26,mind-0-28,mind-1-15,mind-1-24
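
Note that --nodelist asks SLURM to allocate every node in the list, so for a typical single-node job you would either name one compatible node or, more flexibly, exclude the incompatible nodes and let the scheduler choose among the rest. A minimal batch-script sketch (the GPU count and script name are placeholders; the --exclude list is the set of nodes from the table with Compute Capability 6.1 or lower):

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --exclude=mind-1-1,mind-1-3,mind-1-5,mind-1-7,mind-1-9,mind-1-11,mind-1-13,mind-1-19

python my_triton_job.py
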
Updated on July 9, 2025