Solid bird, not a great bicycle frame.
> GLM-5 can turn text or source materials directly into .docx, .pdf, and .xlsx files—PRDs, lesson plans, exams, spreadsheets, financial reports, run sheets, menus, and more.
A new type of model has joined the series: GLM-5-Coder.
GLM-5 was trained on Huawei Ascend. Last time DeepSeek tried to use this chip it flopped and they resorted to Nvidia again [2]; this time it seems like a success.
Looks like they also released their own agentic IDE, https://zcode.z.ai
I don’t know if anyone else knows this, but Z.ai also released new tools besides the chat! There’s Zread (https://zread.ai), OCR (seems new? https://ocr.z.ai), GLM-Image generation (https://image.z.ai), and voice cloning (https://audio.z.ai).
If you go to chat.z.ai, there is a new toggle in the prompt field: you can now switch between chat and agentic mode. It is only visible when you select GLM-5.
Very fascinating stuff!
Although it doesn't really matter much: all of the open-weights models lately come with impressive benchmarks but then don't perform as well as expected in actual use. There's clearly some benchmaxxing going on.
In my personal benchmark it's bad, and so far that benchmark has been a really good indicator of instruction following and agentic behaviour in general.
For those who are curious, the benchmark is just the model's ability to follow a custom tool-calling format: I give it coding tasks via chat.md [1] plus MCPs. So far it's simply not able to follow the format at all.
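For a rough idea of what such a check looks like, here is a minimal sketch; the `<tool_call>` syntax and the `read_file` tool below are invented for illustration, not the actual chat.md format:

```python
import json
import re

# Invented custom tool-call syntax the model is asked to emit verbatim:
#   <tool_call name="read_file">{"path": "src/main.py"}</tool_call>
TOOL_CALL_RE = re.compile(
    r'<tool_call name="(?P<name>\w+)">(?P<args>\{.*?\})</tool_call>',
    re.DOTALL,
)

def follows_format(model_output: str) -> bool:
    """Pass only if the output contains at least one well-formed call
    and every call's argument payload parses as JSON."""
    calls = TOOL_CALL_RE.findall(model_output)
    if not calls:
        return False
    for _name, args in calls:
        try:
            json.loads(args)
        except json.JSONDecodeError:
            return False
    return True

# A model that silently falls back to its native tool-calling syntax fails:
assert follows_format('<tool_call name="read_file">{"path": "a.py"}</tool_call>')
assert not follows_format('read_file({"path": "a.py"})')
```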
Certainly seems to remember things better and is more stable on long-running tasks.
US attempts to contain Chinese AI tech have totally failed. Not only that, they may have cost Nvidia trillions of dollars of exports over the next decade: the Chinese government called the American bluff and now actively disallows imports of Nvidia chips as a direct result of past sanctions [3]. And this at a time when the Trump admin is trying to do whatever it can to reduce the US trade imbalance with China.
[1] https://tech.yahoo.com/ai/articles/chinas-ai-startup-zhipu-r...
[2] https://www.techradar.com/pro/chaos-at-deepseek-as-r2-launch...
[3] https://www.reuters.com/world/china/chinas-customs-agents-to...
Claude Opus 4.6: 65.5%
GLM-5: 62.6%
GPT-5.2: 60.3%
Gemini 3 Pro: 59.1%
Why is GLM-5 more expensive than GLM-4.7 even when using sparse attention?
There is also a GLM-5-Coder model.
So far I haven't managed to get comparably good results out of any other local model, including Devstral 2 Small and the more recent Qwen-Coder-Next.
I'm still waiting to see if they'll launch a GLM-5 Air series, which would run on consumer hardware.
We already know that intelligence scales with the log of tokens used for reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors.
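To make the log-scaling claim concrete, here's a toy version; the coefficients are made up, only the shape of the curve matters:

```python
import math

# Toy model of the claim: score grows with the log of reasoning tokens.
# a = base (non-reasoning) capability, b = gain per e-fold of tokens.
# Both numbers are invented for illustration.
a, b = 40.0, 5.0

def score(reasoning_tokens: int) -> float:
    return a + b * math.log(reasoning_tokens)

# Each doubling buys the same fixed bump (b * ln 2, about 3.5 points here),
# so a higher base `a` is worth many doublings of reasoning budget:
for t in (1_000, 2_000, 4_000, 8_000):
    print(f"{t:>5} tokens -> {score(t):.1f}")
```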
I read somewhere that they have a policy of not advancing capabilities too much, so could it be that they are sandbagging and releasing models with artificially capped reasoning to be at a similar level to their competitors?
How do you read this?
Edit: Input tokens are twice as expensive. That might be a deal breaker.
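Rough math on why that stings for agentic use; the prices and token counts below are placeholders, not the real rate card:

```python
# Placeholder per-million-token prices; substitute the real rate card.
P_IN_OLD, P_IN_NEW = 0.60, 1.20   # input price doubled
P_OUT = 2.20                       # output price assumed unchanged

# Agentic loops re-send the growing context every turn, so input dominates:
in_tok, out_tok = 900_000, 50_000  # per-task totals, also made up

old = in_tok / 1e6 * P_IN_OLD + out_tok / 1e6 * P_OUT
new = in_tok / 1e6 * P_IN_NEW + out_tok / 1e6 * P_OUT
print(f"${old:.2f} -> ${new:.2f} per task ({new / old - 1:+.0%})")
```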
Honestly, these companies are so hard to take seriously with these release details. If it's an open-source model and you're only comparing against other open-source models - cool.
If you're not top in your segment, maybe show how your token cost and output speed more than make up for that.
Purposely showing prior-gen models in your release comparison immediately discredits you in my eyes.
Betting on whether they can actually perform the behaviors they're being sold on.
Passing around code repositories for years without ever trying to run them, factory sealed.