However I am still mystified by the safety aspect. They say the model has greatly improved resistance. But their own safety evaluation says 8% of the time their automated adversarial system was able to one-shot a successful injection takeover even with safeguards in place and extended thinking, and 50% (!!) of the time if given unbounded attempts. That seems wildly unacceptable - this tech is just a non-starter unless I'm misunderstanding this.
[1] https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7...
The answer was "Walk! It would be a bit counterproductive to drive a dirty car 50 meters just to get it washed — you'd barely move before arriving. Walking takes less than a minute, and you can simply drive it through the wash and walk back home afterward."
I've tried several other variants of this question and I got similar failures.
I haven't seen a response from the Anthropic team about it.
I can't help but look at Sonnet 4.6 in the same light, and want to stick with 4.5 across the board until this issue is acknowledged and resolved.
I have this in my personal preferences and now was adhering really well to them:
- prioritize objective facts and critical analysis over validation or encouragement
- you are not a friend, but a neutral information-processing machine
You can paste them into a chat and see how it changes the conversation, ChatGPT also respects it well.
For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.
The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.
Google needs stiff competition and OpenAI isn’t the camp I’m willing to trust. Neither is Grok.
I’m glad Anthropic’s work is at the forefront and they appear, at least in my estimation, to have the strongest ethics.
Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.
Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.
"ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"
has enabled the 1M context window.Fixed a UI issue I had yesterday in a web app very effectively using claude in chrome. Definitely not the fastest model - but the breathing space of 1M context is great for browser use.
[0] Anthropic have given away a bunch of API credits to cc subscribers - you can claim them in your settings dashboard to use for this.
https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af
This was sonnet 4.6 with extended thinking.
Opus 4.6 in Claude Code has been absolutely lousy with solving problems within its current context limit so if Sonnet 4.6 is able to do long-context problems (which would be roughly the same price of base Opus 4.6), then that may actually be a game changer.
A year ago today, Sonnet 3.5 (new), was the newest model. A week later, Sonnet 3.7 would be released.
Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.
Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?
```
/model claude-sonnet-4-6[1m]
⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}
```
Interesting. I wonder what the exact question was, and I wonder how Grok would respond to it.
https://web.archive.org/web/20260217180019/https://www-cdn.a...
i.e given an actual document, 1M tokens long. Can you ask it some question that relies on attending to 2 different parts of the context, and getting a good repsonse?
I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
1. Default (recommended) Opus 4.6 · Most capable for complex work
2. Opus (1M context) Opus 4.6 with 1M context · Billed as extra usage · $10/$37.50 per Mtok
3. Sonnet Sonnet 4.6 · Best for everyday tasks
4. Sonnet (1M context) Sonnet 4.6 with 1M context · Billed as extra usage · $6/$22.50 per MtokI subscribed to Claude because of that. I hope 4.6 is even better.
Now the question is: how much faster or cheaper is it?
Was sonnet 4.5 much worse than opus?
This doesnt work: `/model claude-sonnet-4-6-20260217`
edit: "/model claude-sonnet-4-6" works with Claude Code v2.1.44
The much more palatable blog post.
It feels like we're hitting a point where alignment becomes adversarial against intelligence itself. The smarter the model gets, the better it becomes at Goodharting the loss function. We aren't teaching these models morality we're just teaching them how to pass a polygraph.