Articles tagged ‘optimization’
-
Increasing Gromacs Throughput with CUDA MPS
2022-05-26
Worthwhile gains from running multiple instances per GPU
-
Performance of CUDA-based Substitute BLAS
2022-01-11
Various measurements of NVBLAS and ESSL on V100s
-
High Performance GEMM in Plain C
2022-01-03
Assembler or intrinsics may not be essential [updated]
-
GCC Code Generation v. Manual Vectorization and Proprietary Compilers
2021-11-05
Doing better than manual vectorization and competing with proprietary compilers [updated]