Abhilash Majumder
Company: Intel
LLMs and generative models have become the mainstream deep learning architectures across industries, and customized optimizations have driven a wave of development in deep learning compilers. However, most frameworks that support exascale model training/finetuning, such as PyTorch or JAX, carry extensive device-specific compiler runtime code that is performant only on one specific hardware type. To democratize deep learning models and benchmark them across different runtime devices, there is a need for a device-agnostic compiler backend that can run on NVIDIA, AMD, or Intel hardware (as well as other x86 CPU ISAs and LLVM/Clang-supported GPUs). This talk focuses on how to create such backends using SYCL (originally from Khronos) and introduce platform-specific optimizations.
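To give a flavor of the device-agnostic model the talk builds on, the sketch below is a minimal SYCL 2020 vector addition: the same kernel source runs on an NVIDIA, AMD, or Intel GPU, or falls back to an x86 CPU, with the runtime's default selector choosing the device. This is an illustrative sketch, not code from the talk itself; it assumes a SYCL 2020 toolchain such as Intel's DPC++ (oneAPI) compiler.

#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  // default_selector_v lets the SYCL runtime pick whatever device is
  // available (NVIDIA/AMD/Intel GPU, or an x86 CPU fallback); the kernel
  // body below is identical on every backend.
  sycl::queue q{sycl::default_selector_v};
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";

  constexpr size_t n = 1024;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
  {
    // Buffers hand ownership of the host data to the SYCL runtime,
    // which manages transfers to whichever device was selected.
    sycl::buffer<float> ba{a}, bb{b}, bc{c};
    q.submit([&](sycl::handler& h) {
      sycl::accessor xa{ba, h, sycl::read_only};
      sycl::accessor xb{bb, h, sycl::read_only};
      sycl::accessor xc{bc, h, sycl::write_only, sycl::no_init};
      h.parallel_for(sycl::range<1>{n},
                     [=](sycl::id<1> i) { xc[i] = xa[i] + xb[i]; });
    });
  } // buffer destructors synchronize and copy results back to the vectors
  std::cout << "c[0] = " << c[0] << "\n";  // expected: 3
  return 0;
}

With DPC++ this compiles as, e.g., icpx -fsycl vecadd.cpp; targeting NVIDIA or AMD GPUs additionally requires the corresponding oneAPI plugin and an -fsycl-targets flag, while the source stays unchanged.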
The talk will focus on three primary agendas:
Since adoption of a unified compiler runtime will grow as standardization is taken up by several communities (such as the Triton backend), engineers will be able to write custom code without worrying about IR translations across devices. The talk will also include a brush-up on standard C++ practices, as the entire framework is built on top of C++.