One can find opinions that Claude Code Opus 4 is worth the monthly $200 I pay for Anthropic's Max plan. Opus 4 is smarter; one either can't afford to use it, or can't afford not to use it. I'm in the latter group.
One feature others have noted is that the Opus 4 context buffer rarely "wears out" in a work session. It can, and one needs to recognize this and start over. With other agents, it was my routine experience that I'd be lucky to get an hour before having to restart my agent. A reliable way to induce this "cliff" is to let AI take on a much too hard problem in one step, then flail helplessly trying to fix their mess. Vibe-coding an unsuitable problem. One can even kill Opus 4 this way, but that's no way to run a race horse.
Some "persistence of memory" harness is as important as one's testing harness, for effective AI coding. With the right care having AI edit its own context prompts for orienting new sessions, this all matters less. AI is spectacularly bad at breaking problems into small steps without our guidance, and small steps done right can be different sessions. I'll regularly start new sessions when I have a hunch that this will get me better focus for the next step. So the cliff isn't so important. But Opus 4 is smarter in other ways.
Sometimes after it flails for a while, but I think it's on the right path, I'll rewind the context to just before it started trying to solve the problem (but keep the code changes). And I'll tell it "I got this other guy to attempt what we just talked about, but it still has some problems."
Snipping out the flailing in this way seems to help.
One feature others have noted is that the Opus 4 context buffer rarely "wears out" in a work session. It can, and one needs to recognize this and start over. With other agents, it was my routine experience that I'd be lucky to get an hour before having to restart my agent. A reliable way to induce this "cliff" is to let AI take on a much too hard problem in one step, then flail helplessly trying to fix their mess. Vibe-coding an unsuitable problem. One can even kill Opus 4 this way, but that's no way to run a race horse.
Some "persistence of memory" harness is as important as one's testing harness, for effective AI coding. With the right care having AI edit its own context prompts for orienting new sessions, this all matters less. AI is spectacularly bad at breaking problems into small steps without our guidance, and small steps done right can be different sessions. I'll regularly start new sessions when I have a hunch that this will get me better focus for the next step. So the cliff isn't so important. But Opus 4 is smarter in other ways.