THIS WEEK IN AI — Week of 23rd Feb 25
This Week in AI — Week of 23rd Feb 25 from Mastering LLM (Large Language Model) , you will learn:
- GPT‑4.5: OpenAI’s enhanced model with deeper world knowledge and emotional intelligence for smoother, more natural conversations.
- Mercury Diffusion Model: Inception Labs’ breakthrough that generates text 10x faster using parallel processing.
- SmolVLM2: Hugging Face’s compact video model enabling local video analysis on everyday devices.
- Claude 3.7 Sonnet & Claude Code: Anthropic’s hybrid reasoning AI offering instant responses and extended thinking for complex tasks.
- QwQ‑Max Preview: Alibaba’s reasoning-focused AI that reveals its thought process for transparency.
- Gemini Code Assist: Google’s free, high-performance coding assistant for individual developers.
Additionally, get insights on trending AI tools, open source projects, and must-read research papers.
Latest AI development
GPT-4.5 with emotional intelligence
- OpenAI has released GPT‑4.5 (Orion), its largest model to date, which leverages unsupervised learning to achieve deeper world knowledge and enhanced emotional intelligence.
- The new model offers a more natural conversational experience with a sharper understanding of human intent and reduced hallucinations, making it well-suited for professional tasks, creative work, and everyday queries.
- While not a major upgrade in math or science, GPT‑4.5 outperforms o3-mini and o1 on SWE‑Lancer, OpenAI’s new benchmark for freelance coding tasks.
- Currently available to Pro users and paid developers — with Plus and Team users gaining access next week — its API comes with a notably high pricing of $75/$150 per million tokens compared to previous models.
- Source: https://openai.com/index/introducing-gpt-4-5
Inception Labs’ ultra-fast diffusion model
- Inception Labs launched Mercury, a groundbreaking diffusion-based LLM that matches traditional model quality while generating text 10x faster (over 1,000 tokens/sec on H100 chips), revolutionizing speed and efficiency in AI.
- Mercury’s diffusion architecture generates text in parallel blocks instead of token-by-token, achieving 5–20x faster speeds than autoregressive models like GPT-4o Mini, with coding performance that rivals or outperforms top models.
- Founded by Stanford professor Stefano Ermon, the model adapts diffusion techniques (traditionally used for images/video) to text, challenging conventional AI language generation and enabling instant, cost-effective deployments.
- Early adopters in coding, customer support, and enterprise automation report seamless integration as drop-in replacements, cutting latency costs while maintaining quality — unlocking use cases like advanced agents and edge computing.
- Source: https://www.inceptionlabs.ai/news
World’s smallest video model to bring Video Understanding to Every Device
Hugging Face researchers just released SmolVLM2, the world’s smallest AI model family to understand and analyze videos on everyday devices like phones and laptops, without requiring powerful servers or cloud connections.
- The SmolVLM2 family includes versions as small as 256M parameters while still matching the capabilities of much larger systems.
- The team has built practical applications, including an iPhone app for local video analysis and an integration for natural language video navigation.
- The 2.2B parameter flagship model outperforms other similarly-sized models on key benchmarks while running on basic hardware.
- The models are available in multiple formats (including MLX for Apple devices) with both Python and Swift APIs, enabling immediate deployment; this innovation could drive a new wave of privacy-preserving video applications by running locally on everyday devices.
- Source: https://huggingface.co/blog/smolvlm2
Claude 3.7 Sonnet with “hybrid reasoning” and Claude Code
- Anthropic has unveiled Claude 3.7 Sonnet, the world’s first hybrid reasoning AI that toggles between instant responses and an extended thinking mode — displaying its reasoning process via a scratchpad.
- In extended mode, API users can precisely control the thinking duration — up to 128K tokens — balancing speed, cost, and answer quality for complex tasks.
- The model achieves state-of-the-art performance on real-world coding benchmarks, outclassing competitors, and is now paired with Claude Code, a command-line tool that edits files, reads code, and runs tests.
- This breakthrough marks a significant leap into the reasoning era for AI, promising enhanced developer workflows and paving the way for a new generation of intelligent, adaptable models.
- Video demo for Claude code — https://youtu.be/AJpK3YTTKZ4
- Source: https://www.anthropic.com/news/claude-3-7-sonnet
<think>…</think> QwQ-Max-Preview
- Alibaba’s Qwen team has unveiled QwQ-Max-Preview, a reasoning-focused AI that enhances Qwen Chat with visible thinking capabilities.
- Built on Qwen2.5-Max, the model is optimized for deep reasoning, excelling in mathematics, coding, and agentic tasks.
- The new “Thinking (QwQ)” feature allows users to see the AI’s internal reasoning process as it tackles complex problems.
- With plans to open-source both QwQ-Max and Qwen2.5-Max under the Apache 2.0 license — and to release smaller variants like QwQ-32B for local deployment — Qwen is setting a new standard for accessible AI reasoning.
- Source: https://qwenlm.github.io/blog/qwq-max-preview
Gemini Code Assist — now for free
- Google has launched a free version of Gemini Code Assist for individual developers, powered by a fine-tuned Gemini 2.0 model optimized for coding tasks.
- The tool offers up to 180,000 monthly code completions — vastly exceeding GitHub Copilot’s free tier limit of 2,000 completions.
- With a 128,000 token context window, Gemini Code Assist can process and understand significantly larger codebases than competing tools.
- Seamless integration with Visual Studio Code, GitHub, and JetBrains, accessible with a personal Google account, makes advanced AI-powered coding help more accessible to students, freelancers, and startups.
- Source: https://blog.google/technology/developers/gemini-code-assist-free
Prepare for your next AI role
Prepare for Large Language Model & GenAI interviews by learning real interview questions from FAANG and Fortune 500 companies.
Learn all the answers in a structured framework specifically designed & tested in companies like Google, Microsoft, Nvidia, Apple etc.
LLM Interview Prep Course (LLM50 to get 50% off): https://www.masteringllm.com/course/llm-interview-questions-and-answers
Trending AI Tools
- Perplexity deep research: Save you hours of time by conducting in-depth research and analysis on your behalf.
- Qwen 2.5 Max: A reasoning-focused AI that enhances Qwen Chat with visible thinking capabilities.
- Gemini Code Assist: Gemini Code Assist brings the power of Gemini 2.0 to your IDE at no cost.
Open Source AI Projects
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters.
- GenAI Agents: Comprehensive Repository for Development and Implementation.
- Tencent’s new ‘fast-thinking’ model — Tencent just released Hunyuan Turbo S, a new ‘fast-thinking’ AI designed for instant responses rather than deep reasoning.
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
AI Must Read Papers
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution.
- DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks.
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
- S ∗ : Test Time Scaling for Code Generation
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
We truly value your input. Please share your thoughts in the comments to help us improve.