Flash Attention - I

This chapter assumes you are familiar with the attention mechanism. If not, please watch Andrej Karpathy's video on building and training GPT-2 from the ground up. This chapter comprises:

  1. cuBLAS.
  2. Some Functions And Classes To Know.
  3. Finding Maximum And Sum Of An Array.
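
As a preview of item 3, finding the maximum and sum of an array on the GPU is typically done with a parallel reduction. The kernel below is a minimal sketch of that pattern (my own illustration, not code from the linked chapter), assuming a single-block launch so the block's result is the final answer:

```cuda
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Sketch: one block computes the max and sum of in[0..n).
__global__ void maxSumKernel(const float* in, int n, float* outMax, float* outSum) {
    __shared__ float sMax[BLOCK_SIZE];
    __shared__ float sSum[BLOCK_SIZE];
    int tid = threadIdx.x;

    // Each thread accumulates a partial max and partial sum over a strided range.
    float m = -FLT_MAX;
    float s = 0.0f;
    for (int i = tid; i < n; i += blockDim.x) {
        m = fmaxf(m, in[i]);
        s += in[i];
    }
    sMax[tid] = m;
    sSum[tid] = s;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            sMax[tid] = fmaxf(sMax[tid], sMax[tid + stride]);
            sSum[tid] += sSum[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        *outMax = sMax[0];
        *outSum = sSum[0];
    }
}
```

A multi-block version would combine per-block results with a second reduction pass (or atomics for the sum); the single-block form above keeps the core shared-memory pattern easy to see.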

cuBLAS

The official CUDA documentation is very detailed and well explained. I would encourage everyone to go through it; I believe it is self-sufficient.

To read more, click the link below. Since writing math in plain HTML is time-consuming, I use Jupyter Book for writing my blogs. (To read comfortably, please toggle the sidebar.)

https://yogheswaran-a.github.io/cuda-notes/03-flash-attention-I.html

More CUDA blogs:
https://yogheswaran-a.github.io/cuda-notes/00-landing-page.html