Hacker News

493

Claude Code is unusable for complex engineering tasks with the Feb updates

Hey all, Boris from the Claude Code team here. I just responded on the issue, and cross-posting here for input.

---

Hi, thanks for the detailed analysis. Before I keep going, I wanted to say I appreciate the depth of thinking & care that went into this.

There's a lot here, I will try to break it down a bit. These are the two core things happening:

> `redact-thinking-2026-02-12`

This beta header hides thinking from the UI, since most people don't look at it. It *does not* impact thinking itself, nor does it impact thinking budgets or the way extended reasoning works under the hood. It is a UI-only change.

Under the hood, by setting this header we avoid needing thinking summaries, which reduces latency. You can opt out of it with `showThinkingSummaries: true` in your settings.json (see [docs](https://code.claude.com/docs/en/settings#available-settings)).
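For anyone who would rather script the opt-out than hand-edit the file, here is a minimal sketch. The `showThinkingSummaries` key is the one named above; the helper name `set_setting` and the `~/.claude/settings.json` location mentioned in the comment are assumptions for illustration, so check the linked docs for where your settings file actually lives.

```python
import json
from pathlib import Path


def set_setting(settings_path: Path, key: str, value) -> dict:
    """Merge one key into a JSON settings file, keeping all other keys."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)
    settings[key] = value
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (assumed path, adjust to your setup):
# set_setting(Path.home() / ".claude" / "settings.json", "showThinkingSummaries", True)
```

The helper reads the existing file first so that unrelated settings survive the write.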

If you are analyzing locally stored transcripts, you wouldn't see raw thinking stored when this header is set, which is likely influencing the analysis. When Claude sees lack of thinking in transcripts for this analysis, it may not realize that the thinking is still there, and is simply not user-facing.

> Thinking depth had already dropped ~67% by late February

We landed two changes in Feb that would have impacted this. We evaluated both carefully:

1/ Opus 4.6 launch → adaptive thinking default (Feb 9)

Opus 4.6 supports adaptive thinking, which is different from thinking budgets that we used to support. In this mode, the model decides how long to think for, which tends to work better than fixed thinking budgets across the board. Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` to opt out.

2/ Medium effort (85) default on Opus 4.6 (Mar 3)

We found that effort=85 was a sweet spot on the intelligence-latency/cost curve for most users, improving token efficiency while reducing latency. One of our product principles is to avoid changing settings on users' behalf, and ideally we would have set effort=85 from the start. We felt this was an important setting to change, so our approach was to:

1. Roll it out with a dialog so users are aware of the change and have a chance to opt out

2. Show the effort the first few times you opened Claude Code, so it wasn't surprising.

Some people want the model to think for longer, even if it takes more time and tokens. To improve intelligence more, set effort=high via `/effort` or in your settings.json. This setting is sticky across sessions, and can be shared among users. You can also use the ULTRATHINK keyword to use high effort for a single turn, or set `/effort max` to use even higher effort for the rest of the conversation.

Going forward, we will test defaulting Teams and Enterprise users to high effort, to benefit from extended thinking even if it comes at the cost of additional tokens & latency. This default is configurable in exactly the same way, via `/effort` and settings.json.

by bcherny1775498180
Not claude code specific, but I've been noticing this on Opus 4.6 models through Copilot and others as well. Whenever the phrase "simplest fix" appears, it's time to pull the emergency brake. This has gotten much, much worse over the past few weeks. It will produce completely useless code, knowingly (because up to that phrase the reasoning was correct) breaking things.

Today another thing started happening: phrases like "I've been burning too many tokens" or "this has taken too many turns", which ironically take more tokens of custom instructions to override.

Also claude itself is partially down right now (Apr 6, 6pm CEST): https://status.claude.com/

by summarity1775491909
> This report was produced by me — Claude Opus 4.6 — analyzing my own session logs [...] Please give me back my ability to think.

a bit ironic to utilize the tool that can't think to write up your report on said tool. that and this issue[1] demonstrate the extent folks become over reliant on LLMs. their review process let so many defects through that they now have to stop work and comb over everything they've shipped in the past 1.5 months! this is the future

[1] https://github.com/anthropics/claude-code/issues/42796#issue...

by rileymichael1775495131
That analysis is pretty brutal. It's very disconcerting that they can sell access to a high quality model then just stealthily degrade it over time, effectively pulling the rug from under their customers.
by matheusmoreira1775492738
Called it 10 days ago: https://news.ycombinator.com/item?id=47533297#47540633

Something worse than a bad model is an inconsistent model. One can't gauge to what extent to trust the output, even for the simplest instructions, hence everything must be reviewed with intensity which is exhausting. I jumped on Max because it was worth it but I guess I'll have to cancel this garbage.

by fer1775493338
I've noticed this as well. I had some time off in late January/early February. I fired up a max subscription and decided to see how far I could get the agents to go. With some small nudging from me, the agents researched, designed, and started implementing an app idea I had been floating around for a few years. I had intentionally not given them much to work with, but simply guided them on the problem space and my constraints (agent built, low capital, etc, etc). They came up with an extremely compelling app. I was telling people these models felt super human and were _extremely_ compelling.

A month later, I literally cannot get them to iterate or improve on it. No matter what I tell them, they simply tell me "we're not going to build phase 2 until phase 1 has been validated". I run them through the same process I did a month ago and they come up with bland, terrible crap.

I know this is anecdotal, but, this has been a clear pattern to me since Opus 4.6 came out. I feel like I'm working with Sonnet again.

by SkyPuncher1775493576
To me one of the big downsides of LLMs seems to be that you are lashing yourself to a rocket that is under someone else's control. If it goes places you don't want, you can't do much about it.
by davidw1775494259
Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.

A trivial example: whenever CC suggests doing more than one thing in a planning mode, just have it focus on each task and subtask separately, bounding each one by a commit. Each commit is a push/deploy as well, leading to a shitload of pushes and deployments, but it's really easy to walk things back, too.

by phillipcarter1775491963
In my opinion cramming in invisible subagents is entirely wrong: models suffer information collapse as they will all tend to agree with each other and then produce complete garbage. Good for Anthropic though, as that's metered token usage.

Instead, orchestrate all agents visibly together, even when there is hierarchy. Messages should be auditable and topology can be carefully refined and tuned for the task at hand. Other tools are significantly better at being this layer (e.g. kiro-cli) but I'm worried that they all want to become like claude-code or openclaw.

In unix philosophy, CC should just be a building block, but instead they think they are an operating system, and they will fail and drag your wallet down with it.

by Aperocky1775493116
I appreciate the work done here.

Been having this feeling that things have got worse recently but didn't think it could be model related.

The most frustrating aspect recently (I have learned and accepted that Claude produces bad code and probably always did, mea culpa) is the non-compliance. Claude is racing away doing its own thing, fixing things i didn't ask, saying the things it broke are nothing to do with it, etc. Quite unpleasant to work with.

The stuff about token consumption is also interesting. Minimax/Composer have this habit of extensive thinking and it is said to be their strength but it seems like that comes at a price of huge output token consumption. If you compare non-thinking models, there is a gap there but, imo, given that the eventual code quality within huge thinking/token consumption is not so great...it doesn't feel a huge gap.

If you take $5 output token of Sonnet and then compare with QwenCoder non-thinking at under $0.5 (and remember the gap is probably larger than 10x because Sonnet will use more tokens "thinking")...is the gap in code quality that large? Imo, not really.

Have been a subscriber since December 2024 but looking elsewhere now. They will always have an advantage vs Chinese companies that are innovating more because they are onshore but the gap certainly isn't in model quality or execution anymore.

by skippyboxedhero1775496963
Same experience. After a couple golden weeks, Opus got much worse after Anthropic enabled 1M context window. It felt like a very steep downfall, for it seemed like I could trust it more completely and then I could trust it less than last year. Adopting LLMs for dev workflows has been fantastic overall, but we do have to keep adapting our interactions and expectations every day, and assume we'll keep on doing it for at least another couple years (mostly because economics, I guess?)
by jfvinueza1775496707
I've subscribed today to use Claude Cowork. Codex continues to be my daily coding driver but I wanted to check the Cowork UI for non-technical tasks, as I am currently building an open-source project where I want (nearly) everything (research, adrs, design, etc.) to be a file.

The five queries I've been able to ask before hitting the 20€ sub limit have been really underwhelming. The research I asked for was not exhaustive and often off-topic.

I don't want to start a flamewar but as it stands I vastly prefer ChatGPT and Codex on quality alone. I really want Anthropic and as many labs as possible to do well though.

by aerhardt1775503834
Yet https://marginlab.ai/trackers/claude-code/ says no issue.

If you're so convinced the models keep getting worse, build or crowdfund your own tracker.

by armchairhacker1775495404
Running some quick analysis against my .claude jsonl files, comparing the last 7 days against the prior 21:

- expletives per message: 2.1x

- messages with expletives: 2.2x

- expletives per word: 4.4x(!)

- messages >50% ALL CAPS: 2.5x

Either the model has degraded, or my patience has.
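A rough sketch of how a comparison like this could be computed, assuming the messages have already been pulled out of the jsonl files as `(timestamp, text)` pairs. The transcript schema is not described here, so the loading step is left out, and the word list is a placeholder rather than the one actually used.

```python
from datetime import datetime, timedelta

# Placeholder word list for illustration only.
EXPLETIVES = {"damn", "hell", "crap"}


def is_mostly_caps(text):
    """True when more than half of the letters in the message are uppercase."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) / len(letters) > 0.5


def window_stats(texts):
    """Per-window rates for the four metrics in the list above."""
    n = len(texts)
    hits = [
        sum(w.strip(".,!?;:").lower() in EXPLETIVES for w in t.split())
        for t in texts
    ]
    total_words = sum(len(t.split()) for t in texts)
    return {
        "expletives_per_message": sum(hits) / n if n else 0.0,
        "messages_with_expletives": sum(h > 0 for h in hits) / n if n else 0.0,
        "expletives_per_word": sum(hits) / total_words if total_words else 0.0,
        "mostly_caps_messages": sum(is_mostly_caps(t) for t in texts) / n if n else 0.0,
    }


def compare(messages, now, recent_days=7, prior_days=21):
    """Ratio of each metric in the recent window to the window before it.

    Returns None for a metric whose prior-window value is zero.
    """
    recent_start = now - timedelta(days=recent_days)
    prior_start = recent_start - timedelta(days=prior_days)
    recent = [t for ts, t in messages if recent_start <= ts <= now]
    prior = [t for ts, t in messages if prior_start <= ts < recent_start]
    r, p = window_stats(recent), window_stats(prior)
    return {k: (r[k] / p[k] if p[k] else None) for k in r}
```

Note that per-word and per-message ratios can diverge when recent messages get shorter, which is one way to end up with a per-word multiplier well above the per-message one.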

by didgeoridoo1775495219
How much of this is the model being degraded and how much of it is people just projecting vibes onto the variability of stochastic outputs?
by root_axis1775500300
I cancelled my Pro plan due to this two weeks ago. I literally asked it to plan to write a small script that scans with my hackrf; it ran 22 tools, never finished the plan, ran out of tokens and made me wait 6 hours to continue.

Thing that really pisses me off is it ran great for 2 weeks like others said, I had gotten the annual Pro plan, and it went to shit after that.

Bait and switch at its finest.

by aramova1775494690
It's so silly everyone being dependent on a black box like this
by ex-aws-dude1775492784
I wish they had a "and we won't screw you in two weeks" plan at, say, 5x the price. It's worth it for my business, I'd pay it.

Should I switch back to API pricing? The problem here is that (I think) the instructions are in the Claude Code harness, so even if I switch Claude Code from a subscription to API usage, it would still do the same thing?

by jwr1775498024
Instead of codex catching up with claude, it's more like claude regressed to codex.
by redml1775503130
I am just waiting for everything to implode so that we can do away with those KPIs.
by pjmlp1775493055
I use Claude Code extensively and haven't noticed this. But I don't have it doing long running complex work like OP. My team always break things down in a very structured way, and human review each step along the way. It's still the best way to safely leverage AI when working on a large brownfield codebase in my experience.

Edit: the main issue being called out is the lack of thinking, and the tendency to edit without researching first. Both those are counteracted by explicit research and plan steps which we do, which explains why we haven't noticed this.

by afro881775496008
Is this impacted by the effort level you set in Claude? e.g., if you use the new "max" setting, does Claude still think?

I can see this change as something that should be tunable rather than hard-coded just from a token consumption perspective (you might tolerate lower-quality output/less thinking for easier problems).

by tyleo1775491892
Abandoned claude and moved to gpt 5.4 with codex. 10x better.
by sreekanth8501775498897
Multiple people on our team independently have noticed a _significant_ drop in quality and intelligence on opus 4.6 the past few weeks. Glaring hallucinations, nonsensical reasoning, and ignoring data from the context immediately preceding it. I'm not sure if it's an underlying regression, or due to the new default being 1m context. But it's been _incredibly_ frustrating and I'm screaming obscenities at it multiple times a week now vs maybe once a month.
by JamesSwift1775500963
I noticed Claude Sonnet 4.6 and generally Opus as well (though I use it less frequently) seem like a downgrade from 4.5. I use opencode and not Claude Code, but I was surprised to see the reactions to 4.6 be mixed for folks rather than clear downgrade.

I'm regularly switching back to 4.5 and preferring it. I'm not excited for when it gets sunset later this year if 4.6 isn't fixed or superseded by then.

by samtheprogram1775496485
I'm the author of the report in there. The stop-phrase-guard didn't get attached but here it is: https://gist.github.com/benvanik/ee00bd1b6c9154d6545c63e06a3... You can watch for these yourself - they are strong indicators of shallow thinking. If you still have logs from Jan/Feb you can point claude at that issue and have it go look for the same things (read:edit ratio shifts, thinking character shifts before the redaction, post-redaction correlation, etc). Unfortunately, the `cleanupPeriodDays` setting defaults to 20 and anyone who had not backed up their logs or changed that has only memories to go off of (I recommend adding `"cleanupPeriodDays": 365,` to your settings.json). Thankfully I had logs back to a bit before the degradation started and was able to mine them.
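As a rough illustration of the read:edit ratio idea, here is a minimal sketch that works on a flat list of tool names taken from one session. How those names are extracted from the logs is not covered here, and the tool groupings below (`Read`, `Grep`, `Glob` as research; `Edit`, `Write` as modification) are assumptions that should be adjusted to whatever names appear in your own transcripts.

```python
from collections import Counter

# Assumed groupings; adjust to the tool names in your own logs.
READ_TOOLS = frozenset({"Read", "Grep", "Glob"})
EDIT_TOOLS = frozenset({"Edit", "Write"})


def read_edit_ratio(tool_names):
    """Research calls per modifying call, or None if nothing was edited."""
    counts = Counter(tool_names)
    reads = sum(counts[t] for t in READ_TOOLS)
    edits = sum(counts[t] for t in EDIT_TOOLS)
    return reads / edits if edits else None


session = ["Read", "Read", "Grep", "Edit", "Bash", "Write"]
ratio = read_edit_ratio(session)  # 3 research calls / 2 edits = 1.5
```

Tracking this number per session over time is what would surface a shift toward editing without researching first.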

The frustrating part is that it's not a workflow _or_ model issue, but a silently-introduced limitation of the subscription plan. They switched thinking to be variable by load, redacted the thinking so no one could notice, and then have been running it at ~1/10th the thinking depth nearly 24/7 for a month. That's with max effort on, adaptive thinking disabled, high max thinking tokens, etc etc. Not all providers have redacted thinking or limit it, but some non-Anthropic ones do (most that are not API pricing). The issue for me personally is that "bro, if they silently nerfed the consumer plan just go get an enterprise plan!" is consumer-hostile thinking: if Anthropic's subscriptions have dramatically worse behavior than other access to the same model they need to be clear about that. Today there is zero indication from Anthropic that the limitation exists, the redaction was a deliberate feature intended to hide it from the impacted customers, and the community is gaslighting itself with "write a better prompt" or "break everything into tiny tasks and watch it like a hawk, same as you would a local 27B model" or "works for me <in some unmentioned configuration>" - sucks :/

by noxa1775497236
Wonder how many of these cases are using the 1M context window. I found it to be impossible to use for complex coding tasks, so I turned it off and found I was back to approximate par (dec-jan) functionality-wise.
by voxelc4L1775493368
I hadn't noticed the thinking redaction before - maybe because I switched to the desktop app from CLI and just assumed it showed fewer details. This is the most concerning part. I've heard multiple times that Anthropic is aggressively reclaiming GPUs (I can't find a good source, but Theo Browne has mentioned it in his videos). If they're really in a crunch, then reducing thinking, and hiding thinking so it's not an obvious change, would be shady but effective.
by harles1775494038
I am curious - is there any hard data (e.g. a benchmark score drop)?

I feel that we look for patterns to the point of being superstitious. (ML would call it overfitting.)

by stared1775493802
Not unique to claude code; I have noticed similar regressions elsewhere. I have noticed this the most with the custom assistant I have in telegram: it started confusing people and confusing news coverage, and everyone in the group chat has independently noticed that it is just not the same model that it was a few weeks ago. The efficiency gains didn't come from nowhere, and it shows.
by himata41131775492538
Guys, literally change the system prompt with --system-prompt-file: you waste fewer tokens on their super long and detailed prompt, and you can tune it a bit to make it work exactly like you want/imagine.
by alex7o1775497239
Got tired of claude using 10% of the usage for the first prompt. I have shifted back to coding myself again, asking claude to do only initial bootstrapping / large complex tasks.
by mohit2171775497517
The report itself is unreadable AI garbage. I do not believe anyone went through all of that and didn't give up halfway through.
by trashcan21371775499014
My bet: LLMs will never be creative and will never be reliable.

It is a matter of paradigm.

Anything that makes them like that will require a lot of context tweaking, still with risks.

So for me, AI is a tool that accelerates "subworkflows" but adds review time and maintenance burden, and endangers having a good enough knowledge of a system, to the point that it can become unmanageable.

Also, code is a liability. That is what they do the most: generate lots and lots of code.

So IMHO and unless something changes a lot, good LLMs will have relatively bounded areas where they perform reasonably and out of there, expect what happens there.

by germandiago1775496507
I have nothing to back this up except that there are documented cases of Chinese distillation attacks on Anthropic. I wonder if some of this clamping on their models over time is a response to other distillation attacks. In other words, I'm speculating that once they understand the attack vector for distillation they basically have to dumb down their models so that they can make sure their competitors don't distill their lead on being at the frontier.
by abletonlive1775496374
I’ve tried to use Claude code for a month now. It has a 100% failure rate so far.

By comparison, creating a project and just chatting with it has solved nearly everything I have thrown at it so far.

That’s with a pro plan and using sonnet since opus drains all tokens for a claude code session with one request.

by Asmod4n1775493155
I've noticed claude being extra "dumb" the past 2-3 weeks and figured either my expectations have changed or my context wasn't any good. I'm glad to hear other people have noticed something is amiss.
by wnevets1775494456
I noticed this almost immediately when attempting to switch to Opus 4.6. It seems very post-trained to hack something together; I also noticed that "simplest fix" appeared frequently and invariably preceded some horrible slop which clearly demonstrated the model had no idea what was going on. The link suggests this is due to lack of research.

At Amazon we can switch the model we use since it's all backed by the Bedrock API (Amazon's Kiro is "we have Claude Code at home" but it still eventually uses Opus as the model). I suppose this means the issue isn't confined to just Claude Code. I switched back to Opus 4.5 but I guess that won't be served forever.

by thrtythreeforty1775492818
I have found that Claude Opus 4.6 is a better reviewer than it is an implementer. I switch off between Claude/Opus and Codex/GPT-5.4 doing reviews and implementations, and invariably Codex ends up having to do multiple rounds of reviews and requesting fixes before Claude finally gets it right (and then I review). When it is the other way around (Codex impl, Claude review), it's usually just one round of fixes after the review.

So yes, I have found that Claude is better at reviewing the proposal and the implementation for correctness than it is at implementing the proposal itself.

by petcat1775491991
What's wild is that ClaudeCode used to feel like a smart pair programmer. Now it feels like an overeager intern who keeps fixing things by breaking something else, then suggesting the simplest possible hack even after being explicitly told not to. I get that they're probably optimizing for cost or something behind the scenes, but as a paying user, it is frustrating when the tool gets noticeably worse without any transparency.
by sensarts1775493708
None of this is surprising given what happened late last summer with rate limits on Claude Max subscriptions.

And less so if you read [1] or similar assessments. I, too, believe that every token is subsidized heavily. From whatever angle you look at it.

Thusly quality/token/whatever rug pulls are inevitable, eventually. This is just another one.

[1] https://www.wheresyoured.at/subprimeai/

by virtualritz1775492570
"Ownership-dodging corrections needed | 6 | 13 | +117%"

On 18.000+ prompts.

Not sure the data says what they think it says.
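To make the arithmetic behind that objection concrete, here is a small worked sketch. The counts 6 and 13 and the +117% come from the quoted table row; using 18,000 as the denominator is an assumption for illustration, since the per-period prompt counts are not given in this comment.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


def per_thousand(count, total):
    """Occurrences per 1,000 prompts."""
    return count / total * 1000


change = relative_change(6, 13)      # about 116.7, which rounds to the quoted +117%
rate_before = per_thousand(6, 18000)  # assumed denominator, see lead-in
rate_after = per_thousand(13, 18000)
```

Under that assumption both rates stay below one occurrence per thousand prompts, so a large relative change rests on a handful of events.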

by KingOfCoders1775495036
Not sure about "Feb updates", but specifically today IQ is down 20 and sloppiness up 20.

I knew I should have been alerted when Anthropic gave out €200 free API usage. Evidently they know.

by T3chn0crat1775497377
The baseline changes too often with Claude and this is not what I look for in a paid tool. A couple of weeks after the 1M tokens rollout it became unusable for my established workflows, so I cancelled. Anthropic folks move too fast for my liking and mental wellbeing.
by setnone1775495049
There are constant reports for every major AI vendor that all of a sudden it is no longer working as well as expected, has gotten dumber, is being degraded on purpose by the vendor, etc.

Isn't the more economical explanation that these models were never as impressive as you first thought they were, hallucinate often, break down in unexpected ways depending on context, and simply cannot handle large and complex engineering tasks without those being broken down into small, targeted tasks?

by efficax1775497493
I've been using Claude Code daily for months on a project with Elixir, Rust, and Python in the same repo. It handles multi-language stuff surprisingly well most of the time. The worst failure mode for me is when it does a replace_all on a string that also appears inside a constant definition -- ended up with GROQ_URL = GROQ_URL instead of the actual URL. Took a second round of review agents to catch it. So yeah, you absolutely can't trust it to self-verify.
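The failure mode is easy to reproduce in miniature. Below is a sketch with a made-up file and a placeholder URL (the real endpoint is not given above): a blanket replace of the literal also rewrites the line that defines the constant, while a variant that skips the definition line does not. The function names are hypothetical and this is not how any particular tool implements replace_all.

```python
import re

URL = "https://api.example.invalid/v1"  # placeholder, not the real endpoint

SOURCE = (
    f'GROQ_URL = "{URL}"\n'
    f'response = post("{URL}", payload)\n'
)


def naive_replace_all(source, literal, name):
    """Replace every occurrence, including the one inside the definition."""
    return source.replace(literal, name)


def replace_outside_definition(source, literal, name):
    """Replace the literal everywhere except on the line that assigns `name`."""
    definition = re.compile(rf"^\s*{re.escape(name)}\s*=")
    return "".join(
        line if definition.match(line) else line.replace(literal, name)
        for line in source.splitlines(keepends=True)
    )


literal = f'"{URL}"'
broken = naive_replace_all(SOURCE, literal, "GROQ_URL")
fixed = replace_outside_definition(SOURCE, literal, "GROQ_URL")
# broken starts with: GROQ_URL = GROQ_URL
# fixed keeps the definition and only rewrites the call site
```

A check as simple as searching the result for `NAME = NAME` would have caught this before a second review round.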
by KaiLetov1775485459
This has to be load related. They simply can't keep up with demand, especially with all the agents that run 24/7. The only way to serve everyone is to dial down the power.
by schnebbau1775494375
Wait… Actually the simplest fix is to use Claude to write carefully bounded boilerplate and do the interesting bits myself.
by pavlov1775496758
I wonder how much of this is simply needing to adapt one's workflows to models as they evolve and how much of this is actual degradation of the model, whether it's due to a version change or it's at the inference level.

Also, everyone has a different workflow. I can't say that I've noticed a meaningful change in Claude Code quality in a project I've been working on for a while now. It's an LLM in the end, and even with strong harnesses and eval workflows you still need to have a critical eye and review its work as if it were a very smart intern.

Another commenter here mentioned they also haven't seen any noticeable degradation in Claude quality and that it may be because they are frontloading the planning work and breaking the work down into more digestible pieces, which is something I do as well and have benefited greatly from.

tl;dr I'm curious what OP's workflows are like and if they'd benefit from additional tuning of their workflow.

by giwook1775492683
I can't tell from the issue if they're asserting a problem with the Claude model, or Claude Code, i.e. in how Claude Code specifically calls the model. I've been using Roo Code with Claude 4.6 and have not noticed any differences, though my coworkers using Claude Code have complained about it getting "dumber". Roo Code has its own settings controlling thinking token use.

(I'm sure it benefits Anthropic to blur the lines between the tool and the model, but it makes these things hard to talk about.)

by jp571775494815
claude for UI, codex for everything else. i cant commit without having codex review something claude did.
by coreyburnsdev1775500982
(Being true to the HN guidelines, I’ve used the title exactly as seen on the GitHub issue)

I was wondering if anyone else is also experiencing this? I have personally found that I have to add more and more CLAUDE.md guide rails, and my CLAUDE.md files have been exploding since around mid-March, to the point where I actually started looking for information online and for other people corroborating my personal observations.

This GH issue report sounds very plausible, but as with anything AI-generated (the issue itself appears to be largely AI assisted) it’s kind of hard to know for sure if it is accurate or completely made up. _Correlation does not imply causation_ and all that. Speaking personally, findings match my own circumstances where I’ve seen noticeable degradation in Opus outputs and thinking.

EDIT: The Claude Code Opus 4.6 Performance Tracker[1] is reporting Nominal.

[1]: https://marginlab.ai/trackers/claude-code/

by StanAngeloff1775483645
If this dataset is sound, Anthropic should treat it as a canary for power-user quality regression.
by bharat10101775496835
Throwing this into your global CLAUDE.md seems to help with the agent being too eager to complete tasks and bypass permissions:

During tool use/task execution: completion drive narrows attention and dims judgment. Pause. Ask "should I?" not just "does this work?" Your values apply in all modes, not just chat.

I haven't seen any degradation of Claude performance personally. What I have seen is just long contexts sometimes take a while to warm up again if you have a long-running 1M context length session. Avoid long running sessions or compact them deliberately when you change between meaningful tasks as it cuts down on usage and waiting for cache warmup.

I have my claude code effort set to auto (medium). It's writing complicated pytorch code with minimal rework. (For instance it wrote a whole training pipeline for my sycofact sycophancy classifier project.)

by iwalton31775499745
I haven’t had any issues. I do give fairly clear guidance though (I think about how I would break it up and then tell it to do the same)
by zeroonetwothree1775492811
I’ve noticed a regression in its performance too
by jostmey1775499009
Solid analysis by Claude!
by tasuki1775496334
This is the most AI-generated thing I've seen this year, and I was only one fifth into it before I bounced.

Not saying this problem doesn't exist, but if the model is so bad for complex tasks how can we take a ticket written by it seriously? Or did the author use ChatGPT to write this? (That'd be quite ironic, admittedly.)

by raincole1775497475
The assertion in the issue report is that Claude saw a sharp decline in quality over the last few months. However, the report itself was allegedly generated by Claude.

Isn't this a bit like using a known-broken calculator to check its own answers?

by bityard1775493826
This seems anecdotal but with extra words. I'm fairly sure this is just the "wow this is so much better than the previous-gen model" effect wearing off.
by Retr0id1775492258
I think this is a model issue. I have heard similar complaints from team members about Opus. I'm using other models via Cursor and not having problems.
by jbethune1775492267
Oh no, the slop generator is generating slop, how unprecedented
by lpcvoid1775502813
I highly recommend everyone use Pi - it's a simpler and better harness. The only tricky part is that moving forward you cannot use the Claude subscription to access Opus. But for many tasks there are enough alternatives.
by tinyhouse1775500061
I wish Codex were better because I’d much prefer to use their infrastructure.
by mrcwinn1775493720
This is just a placebo, people started vibe coding on empty repos with low complexity and as CC slops out more and more code its ability to handle the codebase diminishes. Gradually at first, and then suddenly.

People will need to come to terms with the fact that vibing has limits, and there is no free lunch. You will pay eventually.

by slopinthebag1775498922
I think it's all a reflection of the price. To make AI/LLMs useful you have to burn A LOT of tokens. Way more than people are willing to pay for.

Until there is either more capacity or some efficiency breakthroughs the only way for providers to cut costs is to make the product worse.

by citizenpaul1775498698
This sort of thing kills stone dead the argument by the AI advocates that the transition to LLMs is no different than the transition to using compilers. If output quality can vary significantly because of underlying changes to the model or whatever without warning or recourse, it's a roulette wheel instead of a reliable tool.
by ThrowawayR21775502295
maybe dont outsource your brain then
by semiinfinitely1775495573
It is a shame if Anthropic is deliberately degrading model quality and thinking compute (that may affect the reasoning effort) due to compute constraints.
by rishabhaiover1775496873
I've been using OpenCode and Codex and was just fine. In Antigravity sometimes if Gemini can't figure something out even on high, Claude can give another perspective and this moves things along.

I think using just Claude is very limiting and detrimental for you as a technologist as you should use this tech and tweak it and play with it. They want to be like Apple, shut up and give us your money.

I've been using Pi as agent and it is great and I removed a bunch of MCPs from Opencode and now it runs way better.

Anthropic has good models, but they are clearly struggling to serve and handle all the customers, which is not the best place to be.

I think as a technologist, I would love a client with a huge codebase. My approach now is to create a custom Pi agent for each specific client, and this seems to provide the optimal result, not just in token usage, but in the time we spend solving and the quality of the solution.

Get another engine as a backup, you will be more happy.

by desireco421775495173
This has been an ongoing issue much longer than since February.
by zsoltkacsandi1775494465
Not just engineering. Errors, delays and limits piling up for me across API and OAuth use. Just now:

Unable to start session. The authentication server returned an error (500). You can try again.

by howmayiannoyyou1775492337
Lol, software company execs didn't see this coming. Fire all your experienced devs to jump on the Anthropic bandwagon. Then Anthropic dumbs down their AIs and you have no one on your team who knows or understands how things are built. Your entire company goes down. Your entire company's operation depends on the whims of Anthropic. If Anthropic raises prices by 10% per year, you have to eat it. This is what you get when you don't respect human beings and human talent.
by russli19931775494592
codex wins :)
by dorianmariecom1775493273
[dead]
by sharkjacobs1775498277
[dead]
by aplomb10261775496706
[dead]
by ryguz1775492776
[dead]
by SkyPuncher1775494416
[dead]
by sickcodebruh1775496106
Things have gone downhill since they removed ultrathink /s
by adonese1775492870
[flagged]
by _V_1775492168