🏗️ 🚢 🚀 The DeepSeek Inference System

Hey, AIM community!

On Wednesday, we'll cover the infra stack that we recommend for RAG in 2025. Then, we'll build, ship, and share a best-practice RAG app.

We'll also discuss the important production tradeoffs and implications to consider before and after deployment when going from zero to production RAG!

Last week, we discussed the latest open-source repo drops from DeepSeek Week and how they're being used as a new best-practice way to run inference on MoE models on Hopper GPUs (e.g., H100, H200)! vLLM has already implemented much of the inference system, as we saw during the event! Learn how cross-node Expert Parallelism drove the DeepSeek team's solution to the problem they set out to solve: higher throughput and lower latency, resulting in lower cost and high theoretical income.

🧰 Resources


🔭 Coming Up!

Enterprise Agents with OpenAI

What does the agents SDK look like from OpenAI? How does it build on previous work they've done? Are they officially in the end-to-end platform game competing with orchestration frameworks like LangChain, LlamaIndex, CrewAI, and others? Join us live to find out!

Model Context Protocol

What is MCP, exactly? What can we learn from looking at the protocol as defined? What can we learn from tracing its impact since its release over the past 5 months? Released by Anthropic in November 2024, MCP has caught on as a new gold standard for the way that context is shared between models. Let's learn it, then build, ship, and share with it!


🌐 Around the Community!

💡 Transformation Spotlight: Tyler Laughlin, an Information Systems Engineer at Adobe. He has some great advice for enterprise-level engineers on how they should be using Gen AI. Read more about his story!

🤓 See what the community is 📹 building, shipping, and sharing this week. Join us in the Lounge every Monday at 9 AM PT for some accountability!

Want to join the AIM community? Hop into Discord and share your intro!

🖼️ Meme of the Week


🌟 Want to start building, shipping, and sharing? Check out LLM Foundations - an email-based course or our newly open-sourced full LLM Engineering, Cohort 3 course!

​

Keep building 🏗️, shipping 🚢, and sharing 🚀,

Dr. Greg, The Wiz, Seraacha, and Lusk
AI Makerspace
