Inside: ARC-AGI-3, gold-medal reasoning, RAG pipeline walkthrough, and the future of AI hardware.

AI_Distilled #104: What's New in AI This Week

Become an AI Generalist that makes $100K (in 16 hours)

Join the World's First 16-Hour LIVE AI Upskilling Sprint for professionals, founders, consultants & business owners like you.

REGISTER NOW

Date: Saturday and Sunday, 10 AM - 7 PM.

All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it's ALL. FOR. FREE. 🤯 🚀

$5,100+ worth of AI tools across 2 days. Day 1: 3,000+ Prompt Bible. Day 2: Roadmap to make $10K/month with AI. Additional bonus: Your Personal AI Toolkit Builder.

Hi there,

Welcome to the last issue of July 2025. From AI models clinching gold at the International Math Olympiad to the launch of ARC-AGI-3, a bold new benchmark for interactive reasoning, we're witnessing a shift in how machines think. This issue also dives into how to build a RAG pipeline.

Excited? Let's get started!

LLM Expert Insights,
Packt

In today's issue:

🧠 Expert Deep Dive: A hands-on guide to building reliable RAG pipelines and reducing LLM hallucinations with retrieval-augmented techniques.
📅 AI Hardware Spotlight: Upcoming must-attend events (Ai4 2025, AI Infra Summit, The AI Conference 2025, and PyTorch Conference) focusing on GPUs, TPUs, and advanced AI infrastructure.
🚀 DeepSeek in Production: Learn how experts are fine-tuning DeepSeek for real-world agentic workflows. Summit seats are now 50% off.
🏆 ARC Prize Challenge: ARC-AGI-3 launches as a new interactive benchmark where frontier AI scores 0% versus humans' 100%.
🇺🇸 America's AI Action Plan: The Trump administration releases a deregulatory AI roadmap to accelerate innovation and global competitiveness.
🥇 Reasoning Models at IMO: Google's Gemini Deep Think and OpenAI's new LLM reach gold-medal performance on the International Math Olympiad benchmark.
⚠️ Altman's AI Fraud Warning: The OpenAI CEO cautions against voice-cloning scams and job losses, and calls for digital proof of
personhood.
🧑‍💻 Moonshot AI's Kimi K2: A 1T-parameter open-source agentic model sets new benchmarks for coding and reasoning tasks.
🖐 Meta's AR Innovation: Reality Labs unveils a wrist-based EMG interface for its next-gen AR glasses "Orion," enabling intuitive gesture controls.
⚡ OpenAI & Oracle Stargate: A 4.5 GW data center expansion pushes OpenAI's infrastructure beyond 5 GW, aiming to power 2+ million AI chips.

📈 UPCOMING EVENTS

When we talk about LLMs, a discussion of AI hardware naturally follows. To begin, here is a curated list of upcoming conferences, meetups, and summits from August to October 2025 that focus on AI hardware, including GPUs, TPUs, and related infrastructure.

Ai4 2025
Date: August 11–13, 2025
Location: Las Vegas, NV, USA
Cost: TBA
Focus: AI deployment, hardware acceleration

AI Infra Summit
Date: September 9–11, 2025
Location: Santa Clara, CA, USA
Cost: TBA
Focus: AI systems, ML frameworks, infra tools

The AI Conference 2025
Date: September 18–19, 2025
Location: San Francisco, CA, USA
Cost: TBA
Focus: AI at the Edge, AI in Retail, AI in Healthcare and BioTech

PyTorch Conference
Date: October 22–23, 2025
Location: San Francisco, CA, USA
Cost: TBA
Focus: AI Infrastructure, AI Agents

Upskilling with MCP and A2A protocols is your gateway to building AI agents. Don't miss the chance to explore these events and get ahead.

DeepSeek is fast becoming the open-source LLM of choice for developers and engineers focused on speed, efficiency, and control.

Join the "DeepSeek in Production" summit to see how experts are fine-tuning DeepSeek for real-world use cases, building agentic workflows, and deploying at scale.

Seats are filling fast. Limited slots left. Book now at 50% off.

SECURE YOUR SPOT NOW!

Apply code DEEPSEEK50 at checkout to claim your 50% off.

EXPERT INSIGHTS

Getting Started with a RAG Pipeline

LLMs have revolutionized how machines understand and generate text. However, these models are prone to hallucinations.
Hallucinations, as you may already know, are instances where outputs, while grammatically correct, lack factual accuracy. This is where Retrieval-Augmented Generation (RAG) becomes essential. RAG grounds LLM responses in facts by incorporating real-time external data, rather than relying solely on patterns learned during training.

Deconstructing the RAG Pipeline

The RAG architecture consists of two core components: a retriever and a generator. This dual-stage approach separates the task of fetching relevant information from the task of generating text, enabling more accurate and reliable responses.

[Figure: The RAG pipeline for a user-LLM chat interaction]

The complete RAG pipeline can be understood through the following process:

1. User prompt: The pipeline begins when a user submits a query.
2. Retriever module: This component searches a knowledge base to locate relevant documents. These could be structured (like tables or graphs) or unstructured (like articles or text passages).
3. Relevant document retrieval: Using methods such as Dense Passage Retrieval (DPR) or BM25, the retriever finds the top-matching documents based on vector similarity or keyword frequency.
4. Fusion of context and query: Retrieved documents are combined with the original query to form a unified context.
5. Encoder-decoder generation: The generator model (e.g., T5) processes this context to produce a well-informed response.
6. Output delivery: The final, context-enriched response is sent back to the user.

Unlike traditional LLM workflows, RAG integrates a database search step before invoking the model, thereby enriching the prompt with relevant results or documents retrieved externally.

Building the Pipeline: Fundamental Components

Here are the building blocks of a RAG pipeline.

1. The RAG pipeline implementation starts with document encoding. Documents are embedded using a context encoder (such as a transformer-based model).
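As a rough, self-contained sketch of this encode-and-compare idea, the toy example below uses a bag-of-words vector as a stand-in for a transformer encoder. All names here (`embed`, `documents`, `vocab`) are illustrative, not from a particular library:

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy bag-of-words embedding; a transformer-based context
    # encoder would produce dense vectors instead.
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

documents = [
    "RAG grounds LLM answers in retrieved documents",
    "Transformers learn patterns from training data",
    "Cosine similarity compares embedding vectors",
]
# Vocabulary built from the corpus; query words outside it are ignored.
vocab = sorted({word for doc in documents for word in doc.lower().split()})

document_embeddings = [embed(doc, vocab) for doc in documents]
query = "how does RAG ground answers in documents"
query_embedding = embed(query, vocab)

scores = [cosine_similarity(query_embedding, emb) for emb in document_embeddings]
top_doc = documents[max(range(len(scores)), key=scores.__getitem__)]
print(top_doc)  # the RAG document scores highest
```

In a real pipeline you would replace `embed` with a trained context encoder and keep the document embeddings in a vector store, but the ranking logic stays the same.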
These embeddings are stored and later compared against a query embedding using cosine similarity.

```python
# Retrieve documents: rank every document embedding against the query embedding.
# Assumes scikit-learn's cosine_similarity; query_embedding, document_embeddings,
# documents, and num_results are defined earlier in the pipeline.
from sklearn.metrics.pairwise import cosine_similarity

similarity_scores = cosine_similarity(query_embedding, document_embeddings).flatten()
top_indices = similarity_scores.argsort()[-num_results:][::-1]
top_docs = [(documents[i], similarity_scores[i]) for i in top_indices]
```

2. The generator then synthesizes a response based on these top documents:

```python
# Generate a response conditioned on the retrieved passages
input_text = (
    f"Answer this question based on the provided context: {query} "
    f"Context: {retrieved_passages}"
)
inputs = tokenizer(input_text, return_tensors='pt', padding=True, truncation=True)
outputs = model.generate(**inputs, max_length=300, num_beams=3, early_stopping=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

3. Finally, all components are integrated into a unified pipeline:

```python
def rag_pipeline(query):
    retrieved_docs = retrieve_documents(query)
    response = generate_response(query, retrieved_docs)
    return response
```

This modular structure not only improves response accuracy but also allows flexible scaling: new data sources or retrieval techniques can be plugged in without altering the core logic of the generator.

This tutorial gave you an overview of the core architecture and flow of a RAG system.

Liked the insights? Want to dig in deeper?

Building Neo4j-Powered Applications with LLMs

To explore RAG integration with LLMs in greater depth, along with related techniques such as passage-level retrieval, semantic search, and integration with Neo4j knowledge graphs, we encourage you to delve into the Packt book Building Neo4j-Powered Applications with LLMs by Ravindranatha Anthapu and Siddhant Agarwal.

BUY NOW

📈 LATEST DEVELOPMENTS

Here is the news of the week:

ARC Prize launches ARC-AGI-3 contest

ARC Prize has introduced ARC-AGI-3, its first interactive reasoning benchmark, where AI agents must learn entirely through trial and error in game-like environments.
Well, for a start, the scores are 100% (humans) and 0% (frontier AI).

America's AI Action Plan

The AI Action Plan, released by the Trump administration, outlines a deregulatory approach to artificial intelligence. It promotes rapid AI innovation by cutting Biden-era restrictions, encouraging infrastructure growth (data centers, chips), and boosting international AI trade. The plan enforces ideological neutrality in federal AI use, links funding to state AI policies, and positions the U.S. to compete assertively against China for global AI leadership. Read the action plan here.

Reasoning models achieve gold-medal-level performance at the International Math Olympiad (IMO)

Google DeepMind confirmed that its Gemini Deep Think model officially earned gold by scoring 35/42 on the 4.5-hour IMO set, graded and certified by the IMO organizers. Learn more about this model here.

Meanwhile, Alexander Wei, an OpenAI researcher, announced that OpenAI's latest experimental reasoning LLM solved five of six IMO problems, achieving gold-medal-level performance with multi-agent, parallel natural-language proofs. Find more details in Wei's posts.

Sam Altman Warns of AI Fraud, Job Losses at Federal Reserve Talk

At a Federal Reserve event, OpenAI CEO Sam Altman warned that AI-driven voice cloning could trigger a fraud crisis, urging banks to abandon voice-based authentication. He predicted some jobs, like customer support, may vanish entirely, stressing the need for retraining and better policy. Altman criticized young people's overreliance on ChatGPT for life decisions and emphasized the urgency of "proof of personhood" to verify identity online. Despite AI's growing power, he said he wouldn't trust it fully for medical or critical decisions. Watch the interview here.

Moonshot AI Launches Kimi K2, a Powerful Open-Source Agentic Model

Moonshot AI has released Kimi K2, a 1T-parameter (32B active) Mixture-of-Experts model excelling in coding and agentic tasks.
Among open models, it leads on SWE-bench, Tau2, and AceBench. While multimodal input and a dedicated thinking mode aren't yet supported, Kimi K2 is now accessible via API and Hugging Face for broader developer use. Check out the GitHub repo.

Meta's Reality Labs Presents Wrist-Based EMG Interface for Next-Gen AR Glasses "Orion"

Meta's Reality Labs has published research in Nature demonstrating a wrist-worn surface electromyography (sEMG) interface aimed at controlling its prototype AR glasses, Orion. The system translates subtle hand-muscle activity into digital gestures, enabling intuitive, controller-free interaction. It operates non-invasively, generalizes across users, and supports high-bandwidth control for virtual and augmented reality environments. Learn more here.

OpenAI and Oracle Expand Stargate Data Center Capacity by 4.5 GW, Surpassing 5 GW Milestone

OpenAI has announced a strategic expansion of its Stargate AI infrastructure, partnering with Oracle to develop an additional 4.5 gigawatts of data center capacity in the United States. Combined with the existing Stargate I site in Abilene, Texas, this brings the total to over 5 GW, capable of supporting more than 2 million AI chips. The initiative advances OpenAI's earlier $500 billion, four-year plan to deploy 10 GW of AI infrastructure nationwide, supporting innovation, boosting U.S. AI leadership, and creating an expected 100,000+ jobs in construction and operations. The Abilene facility is partially operational, already running early workloads on Nvidia GB200 racks, and Oracle has begun delivering hardware. Find out more here.

Built something cool? Tell us.

Whether it's a scrappy prototype or a production-grade agent, we want to hear how you're putting generative AI to work.
Drop us your story at nimishad@packtpub.com or reply to this email, and you could get featured in an upcoming issue of AI_Distilled.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email.

Thanks for reading and have a great day!

That's a wrap for this week's edition of AI_Distilled 🧠⚙️

We would love to know what you thought. Your feedback helps us keep leveling up.

👉 Drop your rating here

Thanks for reading,
The AI_Distilled Team
(Curated by humans. Powered by curiosity.)