Meta has released Muse Spark, the first model produced by Meta Superintelligence Labs, the team assembled roughly nine months ago under Chief AI Officer Alexandr Wang following Meta’s $14 billion acquisition of Scale AI. The model is available immediately on meta.ai and the Meta AI app, with a broader rollout to Facebook, Instagram, and WhatsApp expected within the next few weeks. A private API preview is also opening to select developers.
Unlike previous iterations of Meta's AI offerings, Muse Spark is natively multimodal, meaning it handles images, text, and voice from the ground up rather than adding visual capabilities onto an existing text-based system. The model includes visual chain-of-thought reasoning, tool-use support, and a feature Meta calls Contemplating mode, which runs multiple AI agents in parallel to address more complex tasks. The company describes this as its answer to extended thinking features found in Google's Gemini Deep Think and OpenAI's GPT Pro.
Meta worked with more than 1,000 physicians to curate training data focused on medical reasoning. On the HealthBench Hard benchmark, which tests open-ended health queries, Muse Spark scored 42.8, compared to 40.1 for GPT 5.4 and 20.6 for Gemini 3.1 Pro. The model also leads on agentic search via DeepSearchQA with a score of 74.8, ahead of GPT 5.4 at 73.6 and Gemini at 69.7. On CharXiv Reasoning, which measures figure comprehension from scientific papers, Muse Spark posted the highest score among compared models at 86.4.
Despite those results, the broader benchmark picture shows Gemini 3.1 Pro ahead in most categories. The gap is most pronounced on ARC AGI 2, an abstract reasoning benchmark where Gemini scored 76.5 against Muse Spark’s 42.5. On coding via LiveCodeBench Pro, Gemini’s 82.9 outpaces Meta’s 80.0, and on MMMU Pro multimodal understanding, Gemini scored 83.9 versus 80.4. Meta’s own announcement acknowledges current shortcomings in long-horizon agentic systems and coding workflows.
When Contemplating mode is activated, performance improves considerably. In that configuration, Muse Spark reached 58% on Humanity's Last Exam and 38% on FrontierScience Research, placing it in competitive range with the most capable versions of Gemini and GPT rather than their standard releases. Meta states that its new pretraining stack can reach capability levels comparable to Llama 4 Maverick using less than a tenth of the compute. The model was developed under the internal codename Avocado.
One notable aspect of the launch is a strategic departure from Meta's established open-source approach. Muse Spark is a closed model, meaning its architecture and weights will not be made public, a clear break from the Llama series that built Meta's standing in open AI communities. Following a lukewarm reception to Llama 4 earlier this year, the company appears to be charting a different course. Meta has said it hopes to open-source future versions of the Muse family, but the current release remains proprietary.
Meta’s stock rose nearly 9% on Wednesday following the announcement before closing the trading day up 6.5% at $612.42. The company is also introducing a shopping assistant capable of comparing products and linking directly to purchases. Meta plans to deploy Muse Spark across its platforms, potentially reaching more than 3.5 billion users, following the distribution strategy it has applied since Llama 3. A more capable successor model is already reported to be in development, with Muse Spark described internally as a small and fast first step in the broader Muse family.
Originally reported by Decrypt.
