Home
All Shows
Models & Agents Planetterrian Daily Omni View Models & Agents for Beginners Fascinating Frontiers Modern Investing Techniques Tesla Shorts Time Environmental Intelligence Финансы Просто Привет, Русский!
Blogs
All Blog Posts Models & Agents Blog Planetterrian Daily Blog Omni View Blog Models & Agents for Beginners Blog Fascinating Frontiers Blog Modern Investing Techniques Blog Tesla Shorts Time Blog Environmental Intelligence Blog Финансы Просто Blog Привет, Русский! Blog
Models & Agents Models & Agents Blog

Models & Agents — Episode 3

Perplexity open-sources embedding models that match Google and Alibaba performance at a fraction of the memory cost.

February 28, 2026 Ep 3 6 min read Listen to podcast View summaries

Models & Agents

Date: February 28, 2026

Perplexity open-sources embedding models that match Google and Alibaba performance at a fraction of the memory cost.

What You Need to Know: Perplexity's new open-source embedding models deliver high-quality text representations with drastically lower memory footprints, making them a game-changer for resource-constrained RAG setups compared to heavier alternatives from Google or Alibaba. Meanwhile, a wave of arXiv papers introduces innovative frameworks like CultureManager for task-specific cultural alignment and SMTL for efficient agentic search, pushing boundaries in multilingual and long-horizon reasoning. Pay attention this week to how these tools bridge gaps in low-resource languages and agent efficiency, offering fresh ways to optimize your workflows without massive compute.

Top Story

Perplexity has open-sourced two new text embedding models that rival or surpass offerings from Google and Alibaba while using far less memory. These models focus on efficient embeddings for search and RAG applications, with one optimized for short queries and another for longer passages, achieving top performance on benchmarks like MTEB at reduced sizes. Compared to Google's Gecko or Alibaba's BGE, they cut memory needs by up to 10x without sacrificing accuracy, thanks to techniques like Matryoshka Representation Learning. Developers building AI search or retrieval systems should care, as this democratizes high-performance embeddings for edge devices or cost-sensitive apps. To get started, integrate them via Hugging Face for quick RAG prototypes. Watch for community fine-tunes and integrations with agent frameworks like LangChain, which could amplify their impact on multilingual search.

Source: the-decoder.com

Model Updates

Current language model training leaves large parts of the internet on the table: The Decoder

Researchers from Apple, Stanford, and UW revealed how different HTML extractors lead to vastly different training data for LLMs, with tools like Trafilatura capturing more diverse content than BeautifulSoup. This highlights a key limitation in current foundation model training, where extractor choice can exclude up to 50% of web data, affecting model robustness compared to more inclusive pipelines. It matters for practitioners fine-tuning models, as it suggests auditing your data pipeline for better generalization in real-world apps.

Source: the-decoder.com

Decoder-based Sense Knowledge Distillation: cs.CL updates on arXiv.org

DSKD introduces a framework to distill lexical knowledge from sense dictionaries into decoder LLMs like Llama, improving performance on benchmarks without needing runtime lookups. It outperforms vanilla distillation by enhancing semantic understanding, though it adds training overhead compared to encoder-focused methods. This is crucial for builders creating generative agents that need structured knowledge integration, bridging gaps in models like GPT or Claude.

Source: arxiv.org

Ruyi2 Technical Report: cs.CL updates on arXiv.org

Ruyi2 evolves the AI Flow framework for adaptive, variable-depth computation in LLMs, using 3D parallel training to speed up by 2-3x over Ruyi while matching Qwen2 models. It enables "Train Once, Deploy Many" via family-based parameter sharing, reducing costs for edge deployment compared to full retraining in models like Mistral. Developers in inference optimization will benefit from its balance of efficiency and performance in dynamic agent scenarios.

Source: arxiv.org

dLLM: Simple Diffusion Language Modeling: cs.CL updates on arXiv.org

dLLM is an open-source framework unifying training, inference, and evaluation for diffusion language models, with checkpoints for small DLMs convertible from BERT or autoregressive LMs. It simplifies building DLMs from scratch with low compute, outperforming ad-hoc implementations in reproducibility, though it lags autoregressive models like Llama in speed. This empowers community devs to experiment with diffusion for creative generation tasks.

Source: arxiv.org

Agent & Tool Developments

A new benchmark pits five AI models against each other as autonomous social media agents on X: The Decoder

Arcada Labs' benchmark tests leading LLMs like GPT-4o and Claude as autonomous agents on X, evaluating social engagement and task completion in real-time scenarios. It's a step up from static benchmarks, revealing strengths in models like Grok for dynamic interactions, but highlights latency issues in agent frameworks like AutoGen. Try it by running the open benchmark scripts to assess your custom agents' social capabilities.

Source: the-decoder.com

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization: cs.CL updates on arXiv.org

SMTL reframes agentic RAG for long-horizon tasks with parallel evidence acquisition, boosting accuracy by 7.7 points on QA benchmarks while cutting reasoning steps by 70% over baselines like Mirothinker. It generalizes across deterministic and open-ended research, making it more efficient than depth-scaling in LangGraph agents. Integrate it into your RAG pipelines via the released code for faster multi-step reasoning.

Source: arxiv.org

Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads: cs.CL updates on arXiv.org

This work identifies Retrieval-Transition Heads (RTH) in multilingual LLMs like Qwen-2.5 and Llama-3.1, improving Chain-of-Thought reasoning by 5-10 points on benchmarks like MGSM through targeted masking. RTHs enhance cross-lingual transitions over standard retrieval heads, vital for agents in diverse languages, though they require white-box access unlike black-box tools in AutoGPT. Experiment with the masking technique in your fine-tuned models for better multilingual agents.

Source: arxiv.org

Practical & Community

Detecting Hate and Inflammatory Content in Bengali Memes: A New Multimodal Dataset and Co-Attention Framework: cs.CL updates on arXiv.org

Bn-HIB is a new dataset of 3,247 annotated Bengali memes for multilabel hate detection, paired with the MCFM model that fuses visual and textual features via co-attention, outperforming baselines on low-resource tasks. It solves multimodal challenges in non-English contexts, but requires multimodal LLMs like Claude for full use. Get started by downloading the dataset from the repo and fine-tuning MCFM for custom content moderation.

Source: arxiv.org

SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context: cs.CL updates on arXiv.org

SAFARI provides a multilingual stereotype dataset for Ghana, Kenya, Nigeria, and South Africa, with 3,534 English and 3,206 native-language entries, using community surveys for cultural sensitivity. It addresses gaps in global AI safety for low-resource regions, though it needs integration with alignment tools like red-teaming in Llama Guard. Explore it for fine-tuning models on diverse cultural biases via the open dataset.

Source: arxiv.org

Causality $\neq$ Invariance: Function and Concept Vectors in LLMs: cs.CL updates on arXiv.org

This paper distinguishes Concept Vectors (CVs) from Function Vectors in LLMs like Llama, enabling better out-of-distribution generalization in reasoning tasks across languages. CVs improve steering over FVs for abstract concepts, but they're compute-intensive for large models. Use the released code to extract CVs in your agents for enhanced ICL performance.

Source: arxiv.org

Things to Try This Week

  • Try Perplexity's new embedding models via Hugging Face for lightweight RAG in your LangChain apps — they cut memory use dramatically while holding accuracy, ideal for mobile or edge deployments.
  • Experiment with the SMTL framework on a QA benchmark like MGSM to see efficiency gains in long-horizon agentic search — it's a quick way to reduce latency in tools like CrewAI without losing performance.
  • Fine-tune a small DLM using the dLLM framework from a base like BERT — perfect if you're exploring diffusion for creative text generation on limited hardware.
  • Test Retrieval-Transition Heads in Llama-3.1 for multilingual CoT tasks — mask them to boost generalization in your custom agents, especially for non-English workflows.
  • Download the Bn-HIB dataset and run MCFM for Bengali meme analysis — great for building multimodal hate detection in low-resource languages with models like Gemini.

On the Horizon

  • Expect more open-source releases from Perplexity, potentially expanding to multimodal embeddings for broader RAG applications.
  • Watch for expansions of agent benchmarks like Arcada's to include more platforms beyond X, influencing frameworks like Microsoft AutoGen.
  • Anticipate community fine-tunes of Ruyi2 for specialized agent tasks, building on its adaptive computation paradigm.
  • Look out for integrations of new datasets like SAFARI into alignment tools, advancing safety in underrepresented regions.

Sources