Talk

Code Reordering for Compute-bound Tasks Using CUDA as an Example: Limitations and Workarounds

  • In Russian

Optimizing code performance is the main task of a GPU programmer. In this talk we will look at how instruction reordering can speed up compute-bound tasks on CUDA.

We will cover the methods and algorithms compilers use to reorder instructions, try reordering by hand in assembly, and then try to get the compiler to produce a better ordering without inline assembly.

Experiments like these need a reliable benchmark, so we will also touch on how to make benchmark measurements more accurate.

The goal of the talk is to discuss how to squeeze a little more performance out of a device once all algorithmic and “standard” optimizations have been applied and it is time to turn to various tricks.
