🏗️🚢🚀 Agent Evaluation


​

Hey, AIM community!

​

Next Wednesday, join us to learn about multimodality with Llama 3.2. Llama 3.2 from Meta adds vision to our LLM application stack. What does this mean for AI Engineers and leaders?

We have questions:

  • How does multimodality actually work?
  • What are its limits today and what do we expect in the coming year?
  • When should we leverage multimodal models when building, shipping, and sharing?
  • Is Llama 3.2 ready for production? If so, for which use cases?

​

​Join us live to find out!

​


Last week, we dove into Agent Evaluation, uncovering best practices for assessing agent workflows with metrics like Topic Adherence, Tool Call Accuracy, and Agent Goal Accuracy. 📊

⚠️ Spoiler alert! It's not ready for prime time yet, and RAGAS is still developing synthetic test set generation tools. However, understanding how you'll likely combine agent-specific (e.g., tool-calling) evaluation tools based on LLM tracing with standard LLM and RAG application evals.

That said, very simple agents can be evaluated. Check out what we know!

🧰 Resources


🔭 Coming Up!

smolagents: Small Agents?

Join us to build, ship, and share an agentic application or two that can make a big impact with a small number of lines of code! We'll talk about agency levels, code agents, and framework comparisons. See you there!

COCONUT: Chain of Continuous Thought

We continue our discussion of Large Reasoning models with a deep dive into continuous chains of thought! The official repo was just released, so join us to learn about the tech and give it a test drive!


🌐 Around the Community!

💡 Transformation Spotlight: Xico Casillas! Follow his journey from conversational interface designer to leading his team's LLM and RAG app development. Read more!


🤓 See what the community is building, shipping, and sharing this week. Join us in the Lounge every Monday at 9 AM PT for some accountability!

​

Want to join the AIM community? Hop into Discord and share your intro!


​

🖼️ Meme of the Week


🌟 Want to start building, shipping, and sharing but not sure how? Check out our LLM Foundations - a 5-day email-based course to start learning and :bss:'ing today.

​

Keep building 🏗️ shipping 🚢 and sharing 🚀,

​

​Dr. Greg, The Wiz, Seraacha, and Lusk​
​AI Makerspace​

​