S. Roy

Blog Post·2024-06-20·9 min read

CUDA Fundamentals for ML Engineers

CUDA exposes GPU parallelism through a three-level thread hierarchy: grid, block, and thread. The hardware then schedules the threads of each block in warps of 32. Understanding how these map to hardware (SMs, register files, shared memory) is the prerequisite for writing fast kernels.
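As a rough illustration of that hierarchy, here is a minimal sketch in which each thread works out its own position in the grid and the warp it falls into within its block. The kernel name `whereAmI`, the buffer names, and the launch shape (2 blocks of 128 threads) are assumptions chosen for the example, not anything prescribed by CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread records its global index and the
// index of its warp within the block.
__global__ void whereAmI(int *globalIdx, int *warpInBlock, int n) {
    // Grid level (blockIdx) and block level (threadIdx) combine into
    // one global position.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard in case the grid is larger than n
        globalIdx[i] = i;
        // warpSize is a built-in device variable (32 on current NVIDIA GPUs).
        warpInBlock[i] = threadIdx.x / warpSize;
    }
}

int main() {
    const int n = 256;
    const int threadsPerBlock = 128;  // assumed block size for illustration
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *dGlobal = nullptr;
    int *dWarp = nullptr;
    cudaMalloc(&dGlobal, n * sizeof(int));
    cudaMalloc(&dWarp, n * sizeof(int));

    whereAmI<<<blocks, threadsPerBlock>>>(dGlobal, dWarp, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int hGlobal[n];
    int hWarp[n];
    cudaMemcpy(hGlobal, dGlobal, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(hWarp, dWarp, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Print a few samples: global index, owning block, warp within that block.
    const int samples[] = {0, 31, 32, 127, 128, 255};
    for (int s : samples) {
        printf("thread %3d: block %d, warp-in-block %d\n",
               hGlobal[s], s / threadsPerBlock, hWarp[s]);
    }

    cudaFree(dGlobal);
    cudaFree(dWarp);
    return 0;
}
```

Compiled with `nvcc`, this runs on any CUDA-capable GPU. The block size is the main knob here: it determines how many warps share one SM's registers and shared memory, which is why the hardware mapping matters for performance.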

gpu cuda kernels inference optimization

