Meta hasn’t fully caught up, but they came close, and I think they can solidly claim to be a frontier lab again. I’d call it a 3.5-horse race right now, and hopefully their next model improves. More model competition is good!
Poor Grok 4.2 should probably be dropped from the table.
So many different companies are going to have similarly powerful AI that there will be no moat around it, and it will be cheap. They will never earn their investment back.
Major analytical errors in its responses to several of my technical questions.
Not sure what this is now.
How does one get their hands on these models? They are not open-source, right? I go to meta.ai, but it's just a chat interface. Is there no equivalent to Codex or Claude Code? Can you use this through OpenCode? Is Meta charging for model access, or is the gathering of chat data a sufficiently large tithe?
Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce or everyone else is simply dumber than the talent Anthropic has.
I don't like that I need to log in to my FB/Instagram account to access this.
This article is about Meta, not about the user. Who signs off on these? Is the intended audience other people at Meta, not the user?
I Googled it and found absolutely nothing.
Well, to be honest, 100% of the results were websites containing the French word "boîtier" (box) with a typo.
Even on Google Scholar, the closest match is "BioTiER (Biological Training in Education and Research) Scholars Program", which is at least 10 years old and has nothing to do with that.
Is that an AI-generated image with an AI-generated name that has no physical existence?
Finding it a little tricky to evaluate because the harness is unfortunately very, very bad (e.g., search is awful). Can't wait to try this in some real external services where we can see how it performs for real.
Definitely getting ordinarily high-quality results overall. But it's hard to test agentic behavior, and even prose quality, when working only from the default chat interface.
One thing that stands out is that _for_ the quality, it feels very, very fast. Perhaps it's just very lightly loaded right now, but regardless, it's lovely to feel.
I'm quite impressed with the tone overall. It definitely feels much more like Opus than, like, GPT or Grok, in the sense that the style is conversational, natural, and enjoyable.
It nailed all the ChatGPT meme gotchas (walking to the car wash, Alice's 50 brothers, the upside-down cup, the R's in "strawberry", and which number is bigger, 9.11 or 9.9).
I guess all that money poaching OpenAI / Anthropic talent went somewhere...
Now, would I use "Meta Muse Code" or "Muse CoWork" if all of my developers had to have a Facebook account? Maybe not.
Would I use it via an API key? I might, depends on the pricing!
What could have been interesting has been reduced to simply another subpar LLM release.
Love to see it. Cheers!
If Meta wants to be seen as a cutting-edge, massive lab, they need to come across as one instead of looking like a school-project version of a frontier model.
Also, I think people aren't used to the idea that using such models requires meta.ai or the Meta AI app.
Not my loss; I'll keep using DeepSeek then. Wake me up when my country is no longer on the wrong/right side of history.
The same is true with any other model, unless otherwise stated.
In the next few days, we'll see who Meta has paid to promote this model on social media.
Edit: nvm, I can't read. Regular benchmarks against SOTA are there.
Maybe they need to mine more Libra coin first? Or is it Diem now? Is that even still part of Meta?
I'm sure this new AI is super intelligent and super awesome and will be writing all the code, making all the blog posts, and generating all our YouTube Shorts in 6 months.