πŸ—πŸš’πŸš€ Next-Level RLHF



Hey, AIM community! Here's a quick recap of week 44 at AI Makerspace.

TL;DR



🏫 Weekly Concepts and Code!

πŸ‘©β€βš–οΈπŸ‘¨πŸ»β€βš–οΈ Mixture of Judges: Next-Level RLHF​

We explored Meta's "Perfect Blend" paper, which "redefines RLHF" through the innovative Mixture-of-Judges (MoJ) approach. Participants learned how the new Constrained Generative Policy Optimization (CGPO) technique attempts to improve on classic Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), helping to tackle reward hacking and to optimize multi-task objectives. We broke down key concepts, from OG RLHF to RLAIF and DPO, and considered other "Mixture of" approaches, like Mixture of Experts (MoE) and Mixture of Agents (MoA), to understand what's new and novel here. The code is still in the works, so we couldn't fully demo it yet, but we showed its current status via PRs and discussed the host of brand-new concepts and acronyms!
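To ground the acronyms: DPO is the piece that's easiest to sketch in a few lines. It skips the reward model and PPO rollout entirely, directly pushing the policy to prefer the chosen response over the rejected one relative to a frozen reference model. Here's a minimal, illustrative NumPy sketch of the DPO loss on sequence log-probabilities (our own toy example, not Meta's CGPO code; the inputs are made-up numbers):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Minimal Direct Preference Optimization (DPO) loss on log-probs.

    The policy is rewarded for preferring the chosen response over the
    rejected one *more than the frozen reference model does* -- no reward
    model or PPO rollout needed. beta scales how hard we push.
    """
    # Implicit reward margin: how much more the policy prefers chosen vs.
    # rejected, relative to the reference model.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)): small when the margin is large.
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# Toy numbers: the policy prefers the chosen response more than the
# reference does, so the margin is positive and the loss is small.
print(dpo_loss(-5.0, -9.0, -6.0, -8.0))
```

CGPO layers constraints and multiple "judges" on top of objectives like this to keep multi-task optimization from being gamed, which is exactly the reward-hacking problem discussed on the stream.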


🧰 Resources


🌐 Around the Community!


πŸ’‘ Transformation Spotlight

  • Pano Evangeliou, a Sr. AI Engineer and AI Tech Lead, went from working in computational mechanics to AI Engineering. Learn how he built the portfolio and skills he needed, from The Netherlands, without a software engineering background! He has some advice if you're looking to do the same!
video preview

🀩 The AI Engineering Bootcamp, Cohort 4 Demo Day!

video preview

🌍 Check out what the AIM community is building, shipping, and sharing!


πŸ₯³ Upcoming Events!

We'll be live in Austin, TX, next week! Join the AIM team in person at MLOps World 2024!

Go zero to agentic hero and start building production-ready multi-agent systems in just 3 hours. πŸ”§ πŸ’» Workshop deets here. Ticket discounts for AIM community members here.

πŸ§‘β€πŸ’» Join us live on YouTube every Wednesday at 10 AM PT for more concepts and code!

Inference & GPU Optimization: VPTQ

In part 3 of our series on Inference & GPU Optimization, we cover even lower-bit quantization than the previous two methods, GPTQ and AWQ. Let's quantize with all the methods and see what happens.
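If you want the core idea before the stream: VPTQ-style approaches quantize *vectors* of weights against a shared codebook, so you store a few index bits per vector instead of a scalar per weight. Here's a toy NumPy sketch of that idea (our own illustration with a made-up 4-entry codebook, not the actual VPTQ algorithm, which learns its codebooks and corrects error much more carefully):

```python
import numpy as np

# Toy vector quantization: group weights into length-2 vectors and snap
# each vector to its nearest codebook entry, so only codebook indices
# (2 bits per vector here -> 1 bit per weight) need to be stored.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)   # toy weight matrix
v = 2                                            # vector length
vectors = W.reshape(-1, v)                       # (32, 2) weight vectors

# Hypothetical shared codebook of 4 entries (real methods learn this).
codebook = np.array([[-1.0, -1.0], [-1.0, 1.0],
                     [ 1.0, -1.0], [ 1.0,  1.0]], dtype=np.float32)

# Assign each vector to its nearest codebook entry (squared distance).
dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
idx = dists.argmin(axis=1)                       # what actually gets stored
W_q = codebook[idx].reshape(W.shape)             # dequantized weights

print("mean abs error:", np.abs(W - W_q).mean())
```

Scalar methods like GPTQ and AWQ hit a wall below ~2 bits per weight; sharing structure across a whole vector is what lets codebook methods push lower, which is the comparison we'll run live.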

Teaching LLMs to Use Computers

Have you seen that LLMs like Claude can now use computers? We want to check out this capability for ourselves. We'll cover how it works under the hood and how, today, we're doing our best to evaluate these AI systems.


πŸ–ΌοΈ Meme of the Week


Applications are open for LLM Engineering: The Foundations Cohort 3 and The AI Engineering Bootcamp, Cohort 5 (read Richard's Cohort 4 review)!
Learn concepts and code free on YouTube and the Awesome AIM Index.


Keep building πŸ—οΈ, shipping 🚒, and sharing πŸš€, with a community!


Dr. Greg, The Wiz, Seraacha, and Lusk
AI Makerspace

