🏗🚢🚀 FA2: Flash Attention

👋 Hey, AIM community!

Next Wednesday, Dr. Greg & The Wiz 🪄 will explore the concepts and code behind On-Prem Agentic RAG!

Last Wednesday, they explored FA2: Next-Level Attention. They dug all the way down into the "shadow of the warp groups" on GPU hardware. It was epic. Shout-out to @Allan Tan for the awesome community recap.

🧰 Resources

🧑‍🏫 Concepts: Slides
🧑‍💻 Code: Flash Attention - AIM Event
📜 Papers: FA, FA2, FA3

🔭 Coming Up!

AG2: AutoGen, Evolved

December 4, 2024

The co-creators of AutoGen have officially launched AG2. New features dropped this week, including SwarmAgent and CaptainAgent, and we'll explore both live!

RSVP

vLLM: Virtual LLM

December 11, 2024

vLLM is for efficient inference AND serving. A great way to think about it: vLLM builds the racecar, FlashAttention enhances the engine, and quantization provides the lightweight, high-performance tires.

RSVP

🌐 Around the Community!

💡 Transformation Spotlight: Hear how Tshwanelo, after learning AI Engineering, took the opportunity to serve as a thought leader for a corporate investment bank and "grabbed it with both hands."