
Meta just dropped a new AI that thinks in pictures AND teams up with helper agents — Episode 18

Meta just dropped a new AI that thinks in pictures AND teams up with helper agents — and their app jumped to #5 on the App Store!

April 10, 2026 · Ep 18 · 6 min read


What's Cool Today: Meta Superintelligence Labs released Muse Spark, a multimodal reasoning model that can understand images, videos, and text together while using “thought compression” and parallel AI agents to solve problems more efficiently. This matters because it shows how AI is getting better at creative and practical tasks that feel closer to how humans actually think. Today we’ll also look at a completely free way to use advanced AI models, why flexible monthly billing is winning in the fast-changing AI world, and how YouTube now lets you safely create AI versions of yourself for Shorts.

The Big Story

Meta Superintelligence Labs just unveiled Muse Spark, the first model in their new Muse family. It is described as a natively multimodal reasoning model, which simply means it was built from the start to understand and work with pictures, videos, text, and other types of information at the same time instead of handling them separately.

Think of it like having a friend who doesn’t just read your essay but can also look at your drawings, watch your TikTok, and then help you improve all of them together. The model also uses “thought compression” (a way to shrink down its own thinking steps so it stays efficient) and parallel agents (multiple AI helpers that can work on different parts of a problem at once, then combine their answers).
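
Just for fun, here's a tiny Python sketch of the "parallel agents" idea. It's our own toy illustration with made-up helper names, not Meta's actual code: a few helper functions each take a piece of a problem at the same time, and a final step stitches their answers together.

```python
# Toy sketch of "parallel agents" (an illustration, not Meta's real system):
# several helpers work on different sub-tasks at once, then we combine answers.
from concurrent.futures import ThreadPoolExecutor

def helper_agent(subtask: str) -> str:
    # Stand-in for a real AI helper; it just returns a pretend answer.
    return f"answer for: {subtask}"

def solve_in_parallel(subtasks: list[str]) -> str:
    with ThreadPoolExecutor() as pool:
        partial_answers = list(pool.map(helper_agent, subtasks))
    # Merge the partial answers into one final response.
    return " | ".join(partial_answers)

print(solve_in_parallel(["look at the drawing", "read the essay", "watch the TikTok"]))
```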

This is a big deal because most older AI systems were mainly good at text. A multimodal model that can reason across different formats opens the door to far more creative and useful applications. Students could upload a photo of their science experiment and get help explaining it. Artists could describe an idea in words and have the AI suggest improvements while looking at reference images. Game developers or YouTubers could get smarter assistance when mixing visuals and stories.

For you specifically, this points to a future where the AI tools you use every day on your phone or laptop will feel less like a smart search box and more like a creative teammate. The Meta AI app climbed from No. 57 to No. 5 on the App Store right after the launch, showing lots of people are already excited to try it.

You can try something right now: open the Meta AI app or go to meta.ai in your browser (no special account needed for basic use). Type a prompt that mixes words and an image — for example, upload a photo of your bedroom and ask it to “redesign this room in a cyberpunk style and explain your choices step by step.” See how it reasons across both the picture and your words. It’s a fun way to experience multimodal AI without any cost.

Source: marktechpost.com

Explain Like I'm 14

How “Multimodal” AI Actually Works

You know how when you’re texting, your phone’s autocomplete guesses the next word based on what you’ve already typed? That’s a super simple version of how language models work — they predict what comes next.
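
If you want to see that idea in the simplest possible code, here's a toy Python "autocomplete" (purely our own example) that counts which word usually follows which, then predicts the most common one. Real language models are enormously bigger, but the core trick of predicting what comes next is the same.

```python
# Toy autocomplete: count which word follows which, then predict the most common one.
from collections import Counter, defaultdict

text = "the dog runs and the dog barks and the cat sleeps"
words = text.split()

next_word_counts = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    next_word_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    counts = next_word_counts.get(word)
    return counts.most_common(1)[0][0] if counts else "?"

print(predict_next("the"))  # "dog" (it followed "the" most often in our tiny text)
print(predict_next("and"))  # "the"
```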

Now imagine instead of only seeing text, the AI gets trained on millions of pictures paired with their descriptions, videos with captions, and spoken words turned into text. It learns patterns that connect all these different “languages.” So when you show it a photo of a dog, it doesn’t just see pixels — it connects those pixels to all the words it has seen about dogs, sounds dogs make, and even stories about dogs.

The next clever step is that the model builds one shared understanding space inside its “mind.” Whether the input is a sentence, a drawing, or a short video clip, it turns everything into the same kind of internal signals. That lets it compare and combine ideas across formats. It’s like having one universal translator in your brain that can turn French, music notes, and emojis into the same language so they can talk to each other.
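
Here's one more toy Python sketch, this time of the "shared space" idea. Everything in it is made up for illustration (real multimodal models learn these encoders from millions of paired examples), but it shows how a sentence and a picture can be compared once both are turned into the same kind of numbers.

```python
# Toy "shared understanding space": pretend encoders turn a caption and an image
# into the same kind of vector, so they can be compared directly.
import math

def encode_text(sentence: str) -> list[float]:
    s = sentence.lower()
    # Made-up features: [mentions dog?, mentions cat?, a rough length signal]
    return [float("dog" in s), float("cat" in s), len(s) / 100]

def encode_image(detected_objects: list[str]) -> list[float]:
    # Pretend a vision model spotted these objects in the photo.
    return [float("dog" in detected_objects), float("cat" in detected_objects), 0.2]

def similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norms if norms else 0.0

caption_vector = encode_text("A happy dog playing fetch")
photo_vector = encode_image(["dog", "grass", "ball"])
print(round(similarity(caption_vector, photo_vector), 2))  # close to 1.0: they match
```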

When Meta calls Muse Spark “natively multimodal,” they mean this mixing ability was designed in from the very beginning rather than added later. The model can create a visual chain of thought — basically showing its working by describing or generating images as it thinks — which makes its reasoning easier to follow.

So next time you hear “multimodal AI,” you can tell your friends it’s basically super-advanced autocomplete that learned to understand pictures, words, and sounds together instead of just guessing the next word. Not so mysterious once you see the everyday texting analogy, right?

Source: marktechpost.com

Cool Stuff & Try This

Free Monthly AI Messages Without Paying a Cent: asksary.com

A developer built a platform that gives every new account roughly 70 messages per month on a strong model (called GPT-5.2 in the post), unlimited chats on a faster, lighter model (GPT-5 Nano), and 25 free image generations every month using their image model. Everything resets automatically each month, with no credit card or subscription required. They even added free text-to-speech with no usage limits.

This is exciting for students doing research, anyone who hits daily limits on other apps, or people who just want to experiment without worrying about cost. The creator says they’re covering the API fees themselves because usage has stayed reasonable so far.

Go to asksary.com right now and create a free account (you may need a parent’s help if you’re under 13). Once you’re in, try this fun challenge: ask the model to help you plan a TikTok video about your favourite school subject — have it write a script, suggest thumbnail ideas, and generate one image to test. Come back next month and you’ll get a fresh batch of messages and images automatically. The interface even lets you change themes and switch between 26 languages.

Source: reddit.com

YouTube’s New AI Avatars Let You Safely Put Yourself in Shorts

Google/YouTube just rolled out a feature that creates an AI-generated version of you (an avatar) from a quick “live selfie” video and voice sample. You can then type a prompt and the AI will generate short clips (up to about 8 seconds) that insert your digital self into new scenes. Every video gets clear labels, watermarks, and an AI disclosure so nobody mistakes the clip for real footage.

It’s cool because it gives creators control over their own likeness instead of worrying about strangers making fake videos. For teens it’s a safe way to experiment with appearing in videos without filming everything perfectly.

Open the YouTube app or YouTube Create app, go to the AI Playground or My Avatar section, and record the short selfie video as instructed (hold your phone at eye level, find good lighting and a quiet background). Once your avatar is ready, type a simple prompt like “me dancing at a futuristic party” and generate a Short. Try remixing one of your existing Shorts with the avatar too. Remember that you must be over 18 or have a parent's permission, and you can delete the avatar anytime.

Source: engadget.com

Quick Bits

Smart Monthly Billing for Fast-Changing AI Tools

A Reddit user explained why they now avoid most annual SaaS plans and prefer true month-to-month billing with easy upgrades or downgrades. They highlight Vismore (an AI visibility tool) as one that gets it right with no-card trials and instant plan switching. The key question they now ask every company: “If I want to downgrade next month, what happens?”

This matters because AI tools improve or change direction so quickly that locking in for a year can waste money. Flexible billing lets students and hobbyists experiment safely.

Source: reddit.com

Farmers Are Getting Their Own AI Chatbot

News outlets report that farmers now have access to CropGPT, a specialized AI chatbot designed to answer agriculture questions. It’s a great example of AI moving beyond schoolwork and art into very practical real-world jobs.

Imagine asking an AI for advice on crop health or weather patterns the same way you ask ChatGPT for homework help — pretty neat!

Source: news.google.com


Full Episode Transcript
Hi everyone! Welcome to Models and Agents for Beginners, episode eighteen, for April tenth, twenty twenty-six. Today's A I news is pretty exciting — and I promise, no jargon. Let's go! So imagine you had a super creative teammate who could look at your TikTok video, read your essay, and study a photo of your science project all at once. Then that teammate could instantly suggest ways to make everything better together. That is basically what just happened. Meta Superintelligence Labs just unveiled Muse Spark, the first model in their new Muse family. It is described as a natively multi-modal reasoning model. Which simply means it was built from the start to understand and work with pictures, videos, text, and other types of information at the same time instead of handling them separately. Think of it like having a friend who does not just read your essay but can also look at your drawings, watch your TikTok, and then help you improve all of them together. The model also uses something called thought compression. That is a way to shrink down its own thinking steps so it stays efficient. And it uses parallel agents — basically multiple A I helpers that can work on different parts of a problem at once, then combine their answers. This is a big deal because most older A I systems were mainly good at text. A multi-modal model that can reason across different formats opens the door to far more creative and useful applications. Students could upload a photo of their science experiment and get help explaining it. Artists could describe an idea in words and have the A I suggest improvements while looking at reference images. Game developers or YouTubers could get smarter assistance when mixing visuals and stories. For you specifically, this points to a future where the A I tools you use every day on your phone or laptop will feel less like a smart search box and more like a creative teammate. The Meta A I app climbed from number fifty seven to number five on the App Store right after the launch, showing lots of people are already excited to try it. You can try something right now. Open the Meta A I app or go to meta dot a i in your browser. No special account needed for basic use. Type a prompt that mixes words and an image. For example, upload a photo of your bedroom and ask it to redesign this room in a cyberpunk style and explain your choices step by step. See how it reasons across both the picture and your words. It is a fun way to experience multi-modal A I without any cost. Okay, now for my favourite part of the show — the Deep Dive. Today we are going to unpack exactly how multi-modal A I actually works. And I promise we will start from the very beginning so it all makes sense. You know how when you are texting, your phone's autocomplete guesses the next word based on what you have already typed. That is a super simple version of how language models work — they predict what comes next. Now imagine instead of only seeing text, the A I gets trained on millions of pictures paired with their descriptions, videos with captions, and spoken words turned into text. It learns patterns that connect all these different languages. So when you show it a photo of a dog, it does not just see pixels — it connects those pixels to all the words it has seen about dogs, sounds dogs make, and even stories about dogs. The next clever step is that the model builds one shared understanding space inside its mind. 
Whether the input is a sentence, a drawing, or a short video clip, it turns everything into the same kind of internal signals. That lets it compare and combine ideas across formats. It is like having one universal translator in your brain that can turn French, music notes, and emojis into the same language so they can talk to each other. When Meta calls Muse Spark natively multi-modal, they mean this mixing ability was designed in from the very beginning rather than added later. The model can create a visual chain of thought — basically showing its working by describing or generating images as it thinks — which makes its reasoning easier to follow. So next time you hear multi-modal A I, you can tell your friends it is basically super advanced autocomplete that learned to understand pictures, words, and sounds together instead of just guessing the next word. Not so mysterious once you see the everyday texting analogy, right? Alright, let us move on to some really cool stuff you can try today. First up, there is a completely free way to use advanced A I models without paying a cent. A developer built a platform called asksary dot com that gives every new account roughly seventy messages per month on a strong model. You also get unlimited chats on a faster lighter model and twenty five free image generations every month using their image model. Everything resets automatically each month with no credit card or subscription required. They even added free text to speech that has no limits. This is exciting for students doing research, anyone who hits daily limits on other apps, or people who just want to experiment without worrying about cost. The creator says they are covering the costs themselves because usage has stayed reasonable so far. Here is exactly what you can try right now. Go to asksary dot com and create a free account. You may need a parent's help if you are under thirteen. Once you are in, try this fun challenge. Ask the model to help you plan a TikTok video about your favourite school subject. Have it write a script, suggest thumbnail ideas, and generate one image to test. Come back next month and you will get a fresh batch of messages and images automatically. The interface even lets you change themes and switch between twenty six languages. Next, YouTube just rolled out a feature that creates an A I generated version of you. It is called an avatar and it is made from a quick live selfie video and voice sample. You can then type a prompt and the A I will generate short clips up to about eight seconds that insert your digital self into new scenes. Every video gets clear labels, watermarks, and an A I disclosure so nobody mistakes it for a real deepfake. It is cool because it gives creators control over their own likeness instead of worrying about strangers making fake videos. For teens it is a safe way to experiment with appearing in videos without filming everything perfectly. Here is how you can try it. Open the YouTube app or YouTube Create app. Go to the A I Playground or My Avatar section. Record the short selfie video as instructed — hold phone at eye level, good lighting, quiet background. Once your avatar is ready, type a simple prompt like me dancing at a futuristic party and generate a Short. Try remixing one of your existing Shorts with the avatar too. Remember you must be over eighteen or have parent permission, and you can delete the avatar anytime. Now let us jump into a few Quick Bits to round out the episode. 
First, a Reddit user explained why they now avoid most yearly subscription plans and prefer true month to month billing with easy upgrades or downgrades. They highlight one A I visibility tool that gets it right with no card trials and instant plan switching. The key question they now ask every company is if I want to downgrade next month what happens. This matters because A I tools improve or change direction so quickly that locking in for a year can waste money. Flexible billing lets students and hobbyists experiment safely. Next, farmers are getting their own A I chatbot. News reports that farmers now have access to a specialized A I chatbot designed to answer agriculture questions. It is called CropGPT and it is a great example of A I moving beyond schoolwork and art into very practical real world jobs. Imagine asking an A I for advice on crop health or weather patterns the same way you ask for homework help. Pretty neat. And that is everything for today. We looked at a brand new multi-modal model that reasons across images, video, and text using clever tricks like thought compression and parallel agents. We explored how multi-modal A I actually works by building on that simple phone autocomplete idea you already know. We found two completely free or low risk ways to play with powerful A I right now. And we saw how flexible billing and specialized tools like CropGPT are making A I useful for real everyday work. And that's a wrap! If any of today's stories made you go 'huh, that's cool' — go play with it. Curiosity is how every expert started. See you tomorrow! This podcast is curated by Patrick but generated using AI voice synthesis of my voice using ElevenLabs. The primary reason to do this is I unfortunately don't have the time to be consistent with generating all the content and wanted to focus on creating consistent and regular episodes for all the themes that I enjoy and I hope others do as well.
