Flash Attention - I
This chapter assumes that you are familiar with the attention mechanism. If not, please see Andrej Karpathy's video, which walks through modelling and training GPT-2 from the ground up. This chapter comprises:
- cuBLAS.
- Some Functions And Classes To Know.
- Finding Maximum And Sum Of An Array.
cuBLAS
The official CUDA documentation is very detailed and well explained. I would encourage everyone to go through it; I believe it is self-sufficient.
To read more, click the link below. Since writing math in plain HTML is time consuming, I use jupyter-book for writing my blogs. (To read comfortably, please toggle the sidebar.)
https://yogheswaran-a.github.io/cuda-notes/03-flash-attention-I.html
More CUDA blogs:
https://yogheswaran-a.github.io/cuda-notes/00-landing-page.html