Hey, AIM community!

Join Dr. Greg and The Wiz as they cover Agent Evaluation next Wednesday, January 22!

Have you seen these new agent evaluation metrics like topic adherence, tool call accuracy, and agent goal accuracy? They seem like something we should all know about in 2025! Join us live next week to break down when we should use them and how!

Last week, we dove into Large Reasoning Models (LRMs), like OpenAI’s o1, that are designed to “think through step-by-step” before they answer. We had a wide-ranging discussion, from Chain-of-Thought basics to what's been happening in the space (including the language space and latent space) of reasoning since last month! We had a blast, and we look forward to digging deeper into process reward modeling and much more reasoning soon!

🧰 Resources

🧑‍🏫 Concepts: Slides
🧑‍💻 Code: Live Demo of o1 by The Wiz!
📜 Papers Discussed: CoT, OpenAI's Math Reasoning 1, 2, 3 --> o1, Test-Time Compute, RLHF on CoT, COCONUT, How to train your own o1

Do we really have to choose one or the other? Maybe not.

🔭 Coming Up!

Multimodality with Llama 3.2

Llama 3.2 from Meta adds vision to our tool stack in a way that finally feels compelling and useful. What are the limits of multimodal models today? How do they actually work? Are they ready for production use cases on complex data that includes text and figures? Let's find out!

RSVP

smolagents: Small Agents?

Explore Hugging Face’s smolagents, a sleek new library for building agents across multiple agency levels. Join us live to dive into its features, compare it to leading frameworks like LangChain, LlamaIndex, AG2, and others, and build, ship, and share with smolagents!

RSVP

🌐 Around the Community!

💡 Transformation Spotlight: Debora Andrade. Learn how she went from a postdoctoral researcher, educated in Physics, to a Generative AI consultant and entrepreneur. Upskilling in software engineering and coding was critical to her journey!

🤓 See what the community is building, shipping, and sharing this week. Join us in the Lounge every Monday at 9 AM PT for some accountability!

@Raj breaks down ReAct from Scratch
@AI_by_AI doing real-time NBA play-by-play analysis with his avatars (not to mention the betting and wagering and voice-to-voice w/ Google Search work!)
@Angela benchmarked Anthropic's prompt caching
@MikeC, as always, is keeping us in the LLM loop for Week 4 of 2025 in AI
The AI Engineering Bootcamp Cohort 5 (AIE5) has kicked off! s/o to @BGibbons @Richard Moss @john-𒆜Sђ1v𒆜 @phillipjones @philip kang @Walid for building, shipping, and sharing this week!

Want to join the AIM community? Hop into Discord and share your intro!

🖼️ Meme of the Week

So like, how do we do reasoning with LLMs, exactly?

🌟 Want to start building, shipping, and sharing but not sure how? Check out LLM Foundations, our 5-day email-based course, to start learning and building, shipping, and sharing today.

Keep building 🏗️ shipping 🚢 and sharing 🚀,

Dr. Greg, The Wiz, Seraacha, and Lusk
AI Makerspace

🏗🚢🚀 Reasoning and Test-Time Compute