
For those who may be interested in learning more about the gut and how it affects your body and brain, this is a great, accessible, read

https://www.amazon.com/Gut-inside-story-bodys-under-rated/dp...

Also, while we're on the topic, if you ever find your self at the other end of the world in Tasmania, I highly recommend a visit to the MONA museum, which houses the Poo Machine.

https://www.abc.net.au/news/2017-11-07/mona-poo-machine-join...


This _all_ (waves hands around) sounds like a lot of work and expense for something that is meant to make programming easier and cheaper.

Writing _all_ (waves hands around various llm wrapper git repos) these frameworks and harnesses, built on top of ever changing models sure doesn't feel sensible.

I don't know what the best way of using these things is, but from my personal experience, the defaults get me a looong way. Letting these things churn away overnight, burning money in the process, with no human oversight seems like something we'll collectively look back at in a few years and laugh about, like using PHP!


> sounds like a lot of work and expense for something that is meant to make programming easier and cheaper.

Not if you are an AI gold rush shovel salesman.

From the article:

> I've run Claude Code workshops for over 100 engineers in the last six months


Yeah, my colleague recently said "hey, I've burnt through $200 in Claude in 3 days". And he was prompting, max 8 hrs/day. Imagine what would happen if AI was prompting.

I really like this analogy: AI is (or should be) like an exoskeleton; it should help people do things. If you step out of your car after putting it in drive and go to sleep, the next day it will be farther along, but the question is: is it still on the road?


[dead]


This comment reads very strongly like it was written by an LLM.


Your sibling even more so.


Agreed. The spec file is context. Writing acceptance criteria before you prompt provides the context the agent needs to not go off in the wrong direction. Human leverage just moved up and the plan/spec is the most important step.

Parallelism on top of bad context just gets you more wrong answers, faster.


Sorry, but doesn't the bottleneck then simply become having relevant things to do? Like, how big a qualified backlog do you have, such that your pipeline does not run dry?


[flagged]


https://github.com/safety-quotient-lab/psychology-agent <- I've been exploring ways to track decisions and making some interesting findings, at the homelab scale at least.

The cognitive architecture, so to speak, for the LLM can make a huge difference - triggers and skills go a long way when combined with shell scripts that dual-write.


Reminds me of when I was looking for Obsidian note management workflows and every single person who posted about theirs used it to take notes on... note taking workflows.


Bingo.


I would encourage my competitors to use AI agents on their codebase as much as possible. Make sure every new feature has it: lots of velocity! Run those suckers day and night. Don't review it, just make sure the feature is there! Then, when the music stops, the AI companies hit the economic realities and go insolvent, and they are left with no one who understands a sprawling tangled web of code that is 80% AI generated. Then we'll see who laughs last.


> they are left with no one who understands a sprawling tangled web of code that is 80% [random people that I can't ask because they don't work here anymore and they didn't care to leave docs or comments] generated, then we'll see who laughs last.

Yes, this matches my experience with codebases before AI was a thing.


Yes, but given a feature that should take, say, 100 lines of code, the average programmer will write on the order of 100 to 500 lines. If they're a heavy OOP user, maybe they'll write 10 classes that total 2000 lines. Regardless, worst case, it will be within ~2 orders of magnitude of a reasonable solution.

It's not that they're not trying to write the biggest clusterfuck possible and maximize suffering in the world, it's just that there's a human limit on how much garbage they can type out in their allocated time.

This is where AI revolutionizes things. You want 25,000 lines of React? On the backend? And a custom useEffect-backed database? Certainly!


> it's just that there's a human limit on how much garbage they can type out in their allocated time.

Another example where removing friction and constraints is a bad thing.


I think the friction has moved upstream: now it's working on the right thing and specifying what correct looks like. I don't think we are going back to a world where we will write code by hand again.


Unless what you want to do isn't well represented in the training set.


Yeah, in the past the limiting factor was the human suffering of the engineer who had to try and fit the sprawling nightmare fuel into their brain.

The machine doesn't suffer. Or if it does nobody cares. People eventually start having panic attacks, the machine can just be reset.

I suspect that the end result is just driving further into the wilderness before reality sets in and you have to call an adult.


Both can be true at the same time: some teams spend a fortune on AI, and the AI investments won't get the expected ROI (bubble collapse). What is sure is that a lot of capacity has been built, and that capacity won't disappear.

What I could see happening in your scenario is the company suffering from diminishing returns as every task becomes more expensive (new feature, debugging session, library update, refactoring, security audit, rollouts, infra cost). They could also end up with an incoherent, gigantic product that doesn't make sense to their customers.

Both pitfalls are avoidable, but they require focus and attention to detail. Things we still need humans for.


> What is sure is that a lot of capacity has been built and that capacity won't disappear.

They really are subsidizing what will be an incredibly healthy used server equipment market in a year or two. Can’t wait. My homelab is going to be due for an upgrade.


Qwen3 Coder Next and Qwen3.5-35B-A3B are already very good and can be run on today's higher-end home computers at good speed. Tomorrow's machines will not be slower, and models keep getting more efficient. A good software engineer will still be valuable in tomorrow's world, but not as a software assembler.


Even cutting-edge models are not very good. They are not even at a mediocre level. Don’t get me wrong, they are improving, and they are awesome, but they are nowhere near good yet. Vibe-coded projects have more bugs than features, their architecture and system design are terrible, and their tests are completely useless about half the time. If you want a good product you need to rewrite almost everything written by LLMs. Probably this won’t be the case in a few years, but right now even “very good” LLMs are not very good at all.


Not sure why you're being downvoted, this is very much my experience. When it matters (like, customer data is on the line) vibecoded projects are not just hilariously bad, but put you in legal danger.

We've so far found that Claude code is fine as a kind of better Coverity for uncovering memory leaks and similar. You have to check its work very carefully because about 1 time in 5 it just gets stuff wrong. It's great that it gets stuff right 4 times in 5 and produces natural code that fits into the style of the existing project, but it's nothing earth-shattering. We've had tools to detect memory leaks before.

We had someone attempt to translate one of our existing projects into Rust and the result was just wrong at a fundamental level. It did compile and pass its own tests, so if you had no idea about the problem space you might even have accepted its work.


With Claude Code now having a /plan mode, you can take your time and deliberate over architecture and design collaboratively, instead of just sending a fire-and-forget. Much less buggy, and it saves time if you keep an eye on the output as you go, guiding it and catching defects, imho.


For that, you need to be building something where you already know exactly how you want to code it, or what architecture is needed. In other words, you would win basically nothing, because typing was never the real bottleneck (no matter what VIM and Emacs people would tell you).

LLMs also make mistakes at a much lower level than those one-pagers from planning mode let you control. Which I use all the time, btw. And anyway, they throw the plan out the window immediately when their attempted solutions don't work during execution, for example when a generated test is failing.

Btw, changing the plan after its generation is painful. More often than not, when I decline it with comments, it generates a worse version, because it either misses things from the previous one that I never mentioned, or completely changes the architecture for the worse. In my experience, it's better to restart the whole thing with a more precise prompt.


Ah, this is true - for my purposes, I've been directing the design and deliberating on the constraints and specifications for a larger system in tandem with smaller planning sessions.

That has worked well so far, but yes, you are totally right, there are still quite a few pain points and it is still rather far from being fire-and-forget "build me a fancy landing page for a turnkey business" and getting enterprise quality code.

edit: I think it is most important that you collaborate with Claude Code on quality in a systematic way, but even that has limits, right now - 1M context changes things a little bit.


You know, with all the babysitting needed, I wonder if effort is not better spent in just, you know, writing code.

Can you actually quantify the time & effort 'saved' letting LLM generate code for you?


For me, personally, I'm building things that would have been impractical for me to do as cleanly within the same amount of time: prototypes in languages I don't have the muscle memory for, using algorithms I have a surface-level understanding of but would need time to deeply understand and implement by hand. At my pace, as a retired dev, the savings are probably quantified in years' worth of time and effort.

edit: also, would I take the time to implement LCARS by hand? No. But with an LLM, sure, took it about 3 minutes or less to implement a pretty decent LCARS interface for me.


> Tomorrow's machines will not be slower

The way it's going, the AI hyperscalers are buying such a big portion of the world's hardware, that it may very well happen that tomorrow's machines do get slower per dollar of purchase value.


Not my experience. Current Qwen Coder is noteworthy but still far from good. You can't compare it with current commercial offerings; they're just in different leagues.


> Don't review it, just make sure the feature is there!

Bad idea. Use another agent to do automatic review. (And a third agent writing tests.)

Don't forget the architecting and orchestrating agent too!


Multiple agents with different frontier models for best results. Claude code/codex shops don’t know what they’re missing if they never let Gemini roast their designs, code and formal models.


This.

Claude Code wrote a blog article for me documenting a Gemini interaction that I manually operated. I found it quite interesting: the difference in "personalities", and the stark difference in quality between Claude's output and Gemini's.

But still, best to have two sets of eyes.


I am not laughing about PHP. To this very day many of my best projects are built on PHP. And while I have spent the last 7 years in a full-stack JavaScript/TypeScript environment, it has never let me produce the same things I was actually able to do with PHP.

I actually feel that things I built 15 years ago in PHP were better than anything I am trying to achieve with modern things that get outdated every 6 months.


I feel like today an engineer with a modern framework and AI can produce in an afternoon a product that delivers real value, something that 25 years ago would have required a full hour by a high schooler with MS Access.


I was building awesome things with Access 20 years ago. I loved that thing. I wasn't even a software engineer; I was in EE, but I needed a way to track a process, and it definitely outperformed. And the best thing: it didn't cost us anything. Everybody already had Access, lol. I had 40 people using it in production, manufacturing cutting-edge stuff. It definitely beat spreadsheets because Access gave you a GUI for operators.


what in God's Name could you do in PHP that you can't do in a modern framework?


Nothing; but PHP, in experienced hands, will be waaay more productive for small-to-medium things. One issue is that experienced hands are increasingly hard to come by. Truly big, complicated things, built by large teams or numbers of teams, teams with a lot of average brains or AIs trained on average brains, will be better off in something like Typescript/React. And everyone wants to work on the big complicated stuff. So the "modern frameworks" will continue to dominate while smaller, more niche shops will wonder why they waste their time.


I worked at a startup, they built their API in PHP because it was easy and fast. Now they're successful, app doesn't scale, high latency etc. What does their php code do? 95% of it is calling a DB.

You're telling me today with LLM power multiplier it's THAT much faster to write in PHP compared to something that can actually have a future?


“PHP was so easy and fast that they’ve built such a successful startup they now have scaling problems” is, as far as I can tell, an endorsement of PHP and not a criticism of it.


I think the point here is that the scaling problem is hard because of PHP.


Scaling can be hard in PHP, while at the same time the GGP's comment about PHP being productive in experienced hands can be one of the reasons why PHP worked for them. Both of these can be true at the same time.

And for what it's worth, TypeScript scaling, although better than PHP's, is still somewhat of an issue. If you want massive scaling, Elixir (and, to an extent, Gleam) was developed to solve the scalability problem, especially with the Phoenix framework in Elixir-land.

So I guess jack_pp's comment about PHP can also be applied, to a degree, to TypeScript, so we should all use Elixir; and within the TS world the same question can be asked of sveltekit/solid vs next-js/react.

I am more on the svelte side of things, but I see people who love react, and the same goes for those who love PHP. So my opinion is, more or less, that everyone can run with their own language.

Golang is another language to be taken into consideration, especially with Htmx/datastar-go/alpine.


Scaling in PHP is easy. It has never actually been an issue in my entire career unless there was a badly designed database.


Yes, startup success has a direct correlation to the language chosen for your CRUD api…


> I worked at a startup, they built their API in PHP because it was easy and fast. Now they're successful

You can stop there! Sounds like PHP worked for them. Already doing better than 90% of startups.


If 95% of what the app does is calling a DB, then the bottleneck is in the DB, not in the PHP.

You can use persistent DB connections, and an app server such as FrankenPHP to persist state between requests, but that still wouldn't help if the DB is the bottleneck.


Sometimes it’s still the app:

   rows = select all accounts
   for each row in rows:
       update row

But that’s not necessarily a PHP problem. N+1 queries are everywhere.
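A minimal sketch of the difference, with sqlite3 standing in for the real DB (the `accounts` table and values are made up for illustration): the loop issues one round trip per row, while a single set-based UPDATE does the same work in one statement.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100,), (200,), (300,)])

# N+1 style: one round trip per row
for (acct_id,) in db.execute("SELECT id FROM accounts").fetchall():
    db.execute("UPDATE accounts SET balance = balance + 1 WHERE id = ?", (acct_id,))

# set-based: one statement updates every row
db.execute("UPDATE accounts SET balance = balance + 1")

balances = [b for (b,) in db.execute("SELECT balance FROM accounts ORDER BY id")]
assert balances == [102, 202, 302]  # each row was incremented twice
```

With an in-memory DB the difference is invisible; over a network, the loop pays one latency round trip per row.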


Depending on what you are doing, the above is not necessarily bad; it is often much better than a single SQL statement that locks an entire table (potentially blocking the whole DB, if this is one of the key tables).


> I worked at a startup, they built their API in PHP because it was easy and fast. Now they're successful, app doesn't scale, high latency etc. What does their php code do? 95% of it is calling a DB.

So PHP worked perfectly, but the DB is slow? Your DB isn't going any faster by switching to something else, if that's what you think.

PHP is the future, where React has been heading for years.


> Your DB isn't going any faster by switching to something else, if that's what you think.

Only true if none of the DB accesses are about stuff that could live as state across requests in a server that wasn't php. Sure, for some of that the DB's caching will be just as good, but for others, not at all.


That is possible, but it sounds unlikely to me.

In most cases you could add a shared cache to fix the problem, e.g. put your shared state in Redis, or in a file that is synced across servers (if it's kept as state in a long-running process, it can't be something that needs frequent updates).
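A minimal cache-aside sketch of that idea, with a plain dict standing in for Redis and sqlite3 for the DB (table, column, and function names are all hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'ada')")

cache = {}  # stand-in for a shared cache such as Redis

def get_user_name(user_id):
    # cache-aside: serve from shared state when possible, else hit the DB and fill
    if user_id not in cache:
        row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        cache[user_id] = row[0]
    return cache[user_id]

assert get_user_name(1) == "ada"  # first call hits the DB
assert get_user_name(1) == "ada"  # second call is served from the cache
```

The point of the pattern is that the second request never touches the DB, which is exactly the state-across-requests win being discussed.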


Not scaling and high latency sound like a skill issue, not a PHP issue.


What does this even mean? If you've got scaling problems, it's not because you've used PHP.


by future do you mean Future<T> or metaphorical future? :)


I see what you did there.


PHP did better than Python and Perl. Python is doomed. PHP already has a good JIT, good OO as of late, good frameworks, and stable extensions. It has a company behind it.

Unlike Python or Ruby, which break left and right all the time on updates: you have to use bunkers of venvs, without any security updates. A nightmare.

PHP can scale and has a future.


Python is doomed? That's new.

You use python docker images pinned to a stable version (3.11 etc), and between bigger versions, you test and handle any breaking changes.

I feel like this approach applies to pretty much every language?

Who on earth raw dogs on "language:latest" and just hopes for the best?

Granted I wouldn't be running Facebook's backend on something like this. But i feel that isn't a problem 95% of people need to deal with.


No, only to Python. And partially Ruby and OCaml. Not to TypeScript, Perl, or PHP.


What's so bad about Python's virtualenv? It's a good way to have an LKGC (Last Known Good Configuration). When a CVE happens, you spin up a new venv and do smoke test, canary test, blue/green deployment, and so on.

If the update breaks things, you go back to the old venv. If the update goes well, then you just delete the old venv.

Not implementing security updates the right way is just human laziness and has nothing to do with the language in use.



uv does not fix the need for venvs or Docker containers. Normal people update their libs in the hope of getting problems fixed.

Python people don't update their libs, because then everything will break left and right. So they keep their security problems running.


No matter how you look at it, the dependencies have to go somewhere. Node uses node_modules, most compiled languages require compiled libraries (or they're a huge blob), etc. Idk about PHP but I'm pretty sure 3rd party things for any given app also live somewhere. Different ways of managing dependencies. It's recommended that venvs are used in Python because you may accidentally nuke a system script by doing global installs, and otherwise there still needs to be some sort of 3p version handling when you have multiple projects going.

Once something works in Python (which uv now makes trivial; before, it could be a pain), updating 3rd-party packages rarely causes breakage. But yes, I think many who use it hardly update, because things usually continue to work for years and the attack surface is pretty narrow[0]. Heck, just a few days ago I checked out a project that I hadn't touched in years, which I wrote in Python 3.7; I updated it to 3.13 and it continued to just work. Compare to PHP, which has a far larger attack surface[1] and often has breaking changes. I've heard a couple of nightmare stories of a v7.x -> v8.x move being delayed because it required a serious codebase rewrite.

[0] https://www.cvedetails.com/product/18230/Python-Python.html?... [1] https://www.cvedetails.com/product/128/PHP-PHP.html?vendor_i...


I don't think it's true that experienced hands will be faster in PHP than in Python or JS or whatever. It's just about what you know, and experienced hands are experienced.


PHP is faster to develop in than Python or JS; then add a framework like Laravel and you are already done.

Python has the curse of spaces or tabs and JS has the curse of npm.


PHP has the curse of T_PAAMAYIM_NEKUDOTAYIM.


Wasn’t expecting to see Hebrew here today.


Eh, you know, when the conversation has devolved to the level of "Python is slower to develop in than PHP because of spaces or tabs", you have to bust out the Hebrew.


You can build those things in modern frameworks, it will just be more headache and will feel outdated in 6 months.


Where are my backbone apps? In the trash? My ember apps? Next to them. My create-react-apps? On top of those. My Next apps? Being trashed as we speak. My rails apps? Online and making money every year with minimal upgrade time. What the hell was I thinking.


I'm guessing you avoided the CoffeeScript era of Rails, which is a good thing.


6 years ago I was writing apps in typescript and react, if I was starting a new project today I'd write it in typescript and react.


People bicker about PHP and JavaScript (sorry, TypeScript) like they aren't both mule languages people pick up to get work done. They both matured really well through years of production use.

They are in the same group, similar pedigree. If you were programming purely for the art of it, you would have had time to discover much nicer languages than either, but that's not what most people are doing, so it doesn't really matter. They're different, but they're about as good as each other.


Not having to "build" anything. You edit code and it is already deployed on your dev instance.

Deploying to production is just scp -rv * production:/var/www/

Beautifully simple. No npm build crap.


You trade having to compile for actually having code that can scale.


Not sure what you’re talking about, I scaled to millions of users on a pair of boxes with PHP, and its page generation time absolutely crushed Rails/Django times. Apache with mod PHP auto scales wonderfully.


It scales just fine the same way everything else scales: put a load balancer in front of multiple instances of your app.


It can scale by the virtue of spending a lot less time processing the request


You don't know anything about the PHP ecosystem and it shows.


Making instant-loading, user-respecting sites.


Don’t confuse PHP the language with PHP the way of the 2006-vintage webmaster.


Those webmasters already built the web a lot of people are now nostalgic about.


Could you give examples of the modern frameworks that you have in mind?


The comparison target for PHP is IMHO a good Python web framework, e.g. Django being the most popular one. I still don't understand how JavaScript is ever considered viable; TypeScript makes it workable, I guess…


> sounds like a lot of work and expense for something that is meant to make programming easier and cheaper.

It's not more work; it's a convergence of roles. BA/PO/QA/SWE are merging.

AI has automated aspects of those roles that have made the traditional separation of concerns less desirable. A new hybrid role is emerging. The person writing these acceptance criteria can be the one guiding the AI to develop them.

So now we have dev-BAs or BA-devs or however you'd like to frame it. They're closer to the business than a dev might have been or closer to development than a BA might have been. The point is, smaller teams are able to play wider now.


Oh a modern comeback of the analyst-programmer?


> It's not more work

It literally is. You're spending weeks of effort babysitting harnesses and evaluating models while shipping nothing at all.


That hasn't been my experience, as a "ship or die" solopreneur. It takes work to set up these new processes and procedures, but it's like building a factory; you're able to produce more once they're in place.

And you're able to play wider, which is why the small team is king. Roles are converging both in technologies and in functions. That leads to more software that's tailored to niche use cases.


> you're able to produce more once they're in place

Cool story; unfortunately, the proof is not in the pudding, and none of this phantom 10x vibe-coded software actually works or can be downloaded and used by real people.

P.S. Compare to AI-generated music, which is actually a thing now and is everywhere on every streaming platform. If vibe coding were a real thing, by now we'd have 10 vibe-coded repos on GitHub for every real repo.


There's no need to be rude with comments like "cool story." I'm sharing my experience with you. I'm not an AI-hype influencer. I'm a SWE who runs a small SaaS business.

Where it sounds like we agree is that there's some obnoxious marketing hype around LLMs. And people who think they can vibe code without careful attention to detail are mistaken. I'm with you there.


These people play around with shit and try to sell you on their secret sauce. If it actually works, it will come to Claude Code, so you can consider it practical SOTA. Honestly, just dropping CC into a mid-sized codebase is already a pretty great experience for me. Not ideal, but I get real, tangible value out of it. Not 10x or any such nonsense, but enough that I don't think I want to be managing junior developers anymore; the ROI with LLMs is much faster and more significant, IMO.


I can't believe we're back to advocating for TDD. It was a failed paradigm the last few times we tried it. This time isn't any different, because the fundamental flaw has always been the same: tests aren't proofs; they don't have complete coverage.

Before anyone gets too confused: I love tests. They're great. They help a lot. But to believe they prove correctness is absolutely laughable. Even the most general tests are very narrow. I'm sure they help LLMs just as they help us, but they're not some cure-all. You have to think long and hard about problems and shouldn't let tests drive your development. They're guardrails that check bounds and reduce footguns.

Oh, who could have guessed, Dijkstra wrote about program completeness. (No, this isn't the foolishness of natural language programming, but it is about formalism ;)

https://www.cs.utexas.edu/~EWD/transcriptions/EWD02xx/EWD288...


Testing works because tests are (essentially) a second, crappy implementation of your software. Tests only pass if both implementations of your software behave the same way. Usually that will only happen if the test and the code are both correct. Imagine if your code (without tests) has a 5% defect rate. And the tests have a 5% defect rate (with 100% test coverage). Then ideally, you will have a 5%^2 defect rate after fixing all the bugs. Which is 0.25%.
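The arithmetic above as a one-liner (assuming the two 5% defect rates are independent, which is of course an idealization):

```python
code_defect_rate = 0.05   # chance a given behavior is wrong in the implementation
test_defect_rate = 0.05   # chance the corresponding test is also wrong

# a bug survives only when both are wrong at the same spot (idealized independence)
residual = code_defect_rate * test_defect_rate
assert abs(residual - 0.0025) < 1e-12  # 0.25%
```

In practice defects in code and tests are correlated (same author, same misunderstanding of the spec), so the real residual rate sits somewhere between 0.25% and 5%.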

The price you pay for tests is that they need to be written and maintained. Writing and maintaining code is much more expensive than people think.

Or at least it used to be. Writing code with claude code is essentially free. But the defect rate has gone up. This makes TDD a better value proposition than ever.

TDD is also great because claude can fix bugs autonomously when it has a clear failing test case. A few weeks ago I used claude code and experts to write a big 300+ conformance test suite for JMAP. (JMAP is a protocol for email). For fun, I asked claude to implement a simple JMAP-only mail server in rust. Then I ran the test suite against claude's output. Something like 100 of the tests failed. Then I asked claude to fix all the bugs found by the test suite. It took about 45 minutes, but now the conformance test suite fully passes. I didn't need to prompt claude at all during that time. This style of TDD is a very human-time efficient way to work with an LLM.


I think there is a difference between doing TDD and writing tests after the fact to avoid regressions. TDD can only work decently if you already know your specs very well, but not so much when you still need to figure them out and need to build something concrete to be able to do so.


Yes; I think this remains true with coding agents. If you need to do some exploration of the solution space, it makes sense to do that before writing tests. Once you have a clear, workable design, you can get the agent to make a battery of tests to make sure the final product works correctly.


This is great. The tests in this case are the spec. When you give the agent something concrete to fail against, it knows what done looks like.

The problem is if you skip that step and ask Claude to write the tests after.


  > Tests only pass if both implementations of your software behave the same way.
That's not true.

I even addressed this in my comment, as did Dijkstra.


What is untrue about this statement you quoted?


You can have software behave differently while passing the same tests.

Idk man, this is pretty easy to demonstrate. Start with a trivial example: test is that input (2,2) -> 4. Function 1 does multiplication, function 2 does exponentiation. Both functions pass the test.

Sure, simple example but illustrative examples should be simple. But add more complexity and I'll add more examples of functions where the outputs are the same for a given set of inputs. (There's a whole area of mathematics dedicated to this!) It's simple, but you also confidently claimed something that was trivial to disprove.
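The example is two lines of code (hypothetical function names, just to make the point concrete): a single passing test cannot distinguish the two implementations.

```python
def f_mul(a, b):
    return a * b    # multiplication

def f_pow(a, b):
    return a ** b   # exponentiation

# the single test case (2, 2) -> 4 passes for both implementations...
assert f_mul(2, 2) == 4
assert f_pow(2, 2) == 4

# ...yet they disagree almost everywhere else
assert f_mul(2, 3) == 6 and f_pow(2, 3) == 8
```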

Your claim is true if and only if your tests have complete coverage. So, your claim is only true if you've done formal verification of your code. Which was what I said in the beginning and is what Dijkstra claimed as well.


I mean, yeah, I thought that was obvious. If you want to be a pedant:

> Tests only pass if both implementations of your software behave the same way in the exact area being tested.

As I said in my comment above, tests are a crappy second implementation. The test in your example isn’t even defined outside the input range of (2,2). Tests are a stochastic tool. Tests can prove the presence of bugs, not their absence. Completeness isn’t something tests alone can provide. But in the choice between yolo coding and yolo coding plus tests, you’re obviously going to get fewer bugs with tests.


When you write tests with LLM-generated code you're not trying to prove correctness in a mathematically sound way.

I think of it more as "locking" the behavior to whatever it currently is.

Either you do the red-green-with-multiple-adversarial-sub-agents thing, or just do the feature, poke it manually, and if it looks good, have the LLM write tests that confirm it keeps doing what it's supposed to do.

The #1 reason TDD failed is because writing tests is BOORIIIING. It's a bunch of repetition with slight variations of input parameters, a ton of boilerplate or helper functions that cover 80% of the cases, but the last 20% is even harder because you need to get around said helpers. Eventually everyone starts copy-pasting crap and then you get more mistakes into the tests.

LLMs will write 20 test cases with zero complaints in two minutes. Of course they're not perfect, but human made bulk tests rarely are either.


  > you're not trying to prove correctness in a mathematically sound way.

  > "locking" the behavior to whatever it currently is.
These two sentences are incompatible

  > The #1 reason TDD failed is
Because the spec is an ever-evolving thing that cannot be determined a priori. And because it highly incentivized engineers to metric-hack.

  > It's a bunch of repetition with slight variations
If that's how you're writing tests then you're writing them wrong. You have the wrong level of abstraction. Abstraction is not a dirty word. It solves these problems. Maybe juniors don't understand that abstraction and fuck it up while learning but making abstraction a dirty word is throwing the baby out with the bath water.

  > Eventually everyone starts copy-pasting crap
Which is a horrendous way to write code.


Locking behavior with tests isn't the same as comprehensive and foolproof tests. They might not cover every edge case, but will fail if the happy path starts failing for some reason.

And yes, copy-pasting is a horrendous way to write code, but everyone does it.

When you're adding the 1600th CRUD endpoint of your career to an enterprise Java/C# application, can you with all honesty say you will type every single character with the same thought and consideration every time?

Or do you just make one, copy-paste that one and modify accordingly?

Or when you write 20 unit tests with slight alterations, do you masterfully craft every single character to perfection?

I have a limited amount of energy to use every day, I choose to use it in places that matter. The hard bits that LLMs and copy-pasting can't speed up.


Hmm, not so sure TDD is a failed paradigm. Maybe it isn't a panacea, but it seems like it's changed how software development is done.

Especially for backend software and also for tools, seems like automated tests can cover quite a lot of use cases a system encounters. Their coverage can become so good that they'll allow you to make major changes to the system, and as long as they pass the automated tests, you can feel relatively confident the system will work in prod (have seen this many times).

But maybe you're separating automated testing and TDD as two separate concepts?


Indeed, they are two separate concepts.

I write lots of automated tests, but almost always after the development is finished. The only exception is when reproducing a bug, where I first write the test that reproduces it, then I fix the code.

TDD is about developing tests first then writing the code to make the tests pass. I know several people who gave it an honest try but gave up a few months later. They do advocate everyone should try the approach, though, simply because it will make you write production code that's easier to test later on.
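The bug-first workflow in a nutshell, as a hypothetical sketch (an off-by-one paginator bug; the names are made up):

```python
def page_count(total_items: int, per_page: int) -> int:
    # Fixed version. The hypothetical original was `total_items // per_page`,
    # which reported 2 pages for 21 items at 10 per page.
    return -(-total_items // per_page)  # ceiling division

def test_partial_last_page_is_counted():
    # Written first, straight from the bug report; it failed on the old code.
    assert page_count(21, 10) == 3

def test_exact_multiple():
    assert page_count(20, 10) == 2
```

The failing test both pins down the bug and stops it from regressing later.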


... hmm, just looked it up. According to some sites on the web, TDD was created by Kent Beck as a part of Extreme Programming in the '90s, and automated testing is a big part of TDD. Having lived through that era and thinking back, I'd say that TDD did help to popularize automated testing. It made us realize that focusing a ton on writing tests had a lot of benefits (and yeah, most of us didn't do the test-first development part).

But this is kind of splitting hairs on what TDD is, not too important.


I think tests in general are good, just not TDD, as it forces you into what I think is a bad and narrow paradigm of thinking. I think it is better, e.g., that I build the thing, then get to 90%+ coverage once I am sure this is what I would also ship.


That's the result I've seen with anyone who tries TDD. Their code ends up being very rigid, making it difficult to add new features and fix bugs. It just ends up making them overconfident in their code's correctness, as if their code is bug-free. It just seems like an excuse to not think and to avoid doing the hard stuff.


  > But maybe you're separating automated testing and TDD as two separate concepts?
I hope it's clear that I am, given my comment and how much I stress that I write tests. The existence of tests does not make development TDD.

The first D in TDD stands for "driven". While my sibling comment explains the traditional paradigm it can also be seen in an iterative sense. Like just developing a new feature or even a bug. You start with developing a test, treating it like spec, and then write code to that spec. Look at many of your sibling comments and you'll see that they follow this framing. Think carefully about it and adversarially. Can you figure out its failure mode? Everything has a failure mode, so it's important to know.

Having tests doesn't mean they drive the development. So there's many ways to develop software that aren't TDD but have tests. The important part is to not treat tests as proofs or spec. They are a measurement like any other; a hint. They can't prove correctness (that your code does what you intend it to do). They can't prove that it is bug free. But they hint at those things. Those things won't happen unless we formalize the code and not only is that costly in time to formalize but often will result in unacceptable computational overhead.

I'll give an example of why TDD is so bad. I taught a class a year ago (upper-div uni students) and gave them skeleton code, a spec sheet, and some unit tests. I explicitly told them that the tests were similar to my private tests, which would be used to grade them, but that they should not rely on them for correctness, and I encouraged them to write their own.

The next few months, my office hours were filled with "but my code passes the tests", and me walking students through the tests and discussing their limitations along with the instructions. You'd be amazed at how often the same conversations happened with the same students over and over. A large portion of the class did this. Some just assumed the tests had complete coverage and never questioned them, while others read the tests and couldn't figure out their limits.

But you know the students who never struggled in this way? The students who first approached the problem through design, and who understood that even the spec sheet is a guide: it states requirements, not completeness. Since the homeworks built on one another, those students had the easiest time. Some struggled at first, but many of them got to the right levels of abstraction; I know I could throw new features at them and they could integrate them without much hassle. They knew the spec wasn't complete. Of course it wasn't; we told them from the get-go that their homeworks were increments toward building a much larger program. The only difference between that and real-world programming is that this isn't always explicitly told to you, and the end goal is less clear. Which only makes this design style more important.

The only thing that should drive software development is an unobtainable ideal (or literal correctness). A utopia. This removes metric hacking, as there is no metric to hack. It keeps you flexible, as you are unable to fool yourself into believing the code is bug-free or "correct". Your code is either "good enough" or not. There's no "it's perfect" or "it's correct"; there's only triage. So I'll ask you even here: can you find the failure mode? Why is that question so important to this way of thinking?


Hmm, saying tests are just a hint seems to underappreciate their significance. Yes, they have bugs of their own, but as you said, they are a measurement. Having them statistically reduces the chances of bugs reaching production. They don't remove them completely, of course, but they do greatly decrease the rate of bugs (and I have read the same thing: formal verification of the code is typically not worth the time and cost).

And just looked up TDD on wikipedia. Actually, the standard process is not to write all the tests first, then do the implementation. It's to do what a lot of devs already do, write some tests based on your requirements. Then, write the implementation for these tests. Then repeat, adding in more test for other paths through the system.

Didn't know this myself about TDD (I thought the focus was on writing all the tests first, then doing the implementation). Yeah, TDD is actually a very practical approach and something I pretty much do in my own development. Instead of using a driver program to run your working code, just write unit tests to run it, and keep building your unit tests for every new feature or execution path you're working on. You'll miss a lot of them early on, but you fill out the rest at the end.

Now that I know, in my opinion, TDD was pretty amazing and changed our industry.


TDD and similar test paradigms all have the same fundamental flaw: it's testing for the sake of testing. You need to know exactly what you want in order to start, which isn't compatible with a competitive iterative workflow, no matter how much TDD yells otherwise. TDD doesn't make sense in agile, fast-iteration workflows, only in heavily regulated/restricted products.


It certainly isn’t. It is more a way of discovery on how to implement something, with the benefit of being able to safely (and thus easily) change it later.

The 99 Bottles book by Sandi Metz [0] is a good short display of how it works and where it helps actually building maintainable software

[0] https://sandimetz.com/99bottles


> But to believe they prove correctness is absolutely laughable.

You don't need to believe this to practice TDD. In fact I challenge you to find one single mainstream TDD advocate who believes this.



> But to believe they prove correctness is absolutely laughable.

Sounds like a lack of tests for the correct things.


True, but I seriously doubt people are writing formal proofs for their code. I've only seen this in niche academic circles and high security/safety settings. I also am pretty certain it's not what you're suggesting, but hey, I could be wrong


"You better work, bitch" -- Britney Spears

Our society is obsessed with work. Work will never end. If things become easier we just do more of them. Whether putting all our efforts into recycling things created by those that came before is good for us will remain to be seen.


Our society is obsessed with <the appearance of> work


I saw a guy's post on LinkedIn who created an LLM agent to water his plants based on sensors on his plants.

He still has to water the plants on his own. It's just that it costs him quite a bit, when all of that could be managed with an alarm to remind him to water the plants.


It being a lot of work is why they didn't do it at all for weeks, and still, without self-reflection, wrote that they care about the quality of code they hadn't looked at or tested.


Looking back we see how foolish the anti-php memes were. Meanwhile PHP lives on and becomes better with each release.

Tooling around llms are a natural next step that will become your default one day.


It's always the uber conservative and over principled people who laugh about using PHP that have an opinion on everything while not knowing how to get shit done.

They're all just tools. You decide how to use them.


Sure, but we can agree there are essentially two parallel industries in web development:

Engineers at tech firms, and web shops writing WordPress plugins for single clients where Squarespace doesn't cut it.

Is AI another field of people, or is it killing one or both of those? TBD.


To be fair, my perception of the JavaScript ecosystem and its here-today-gone-tomorrow frameworks was ruined by jQuery plugin developers.


php still makes money though!


> like using PHP

lmao, chuckled


Nah man. I understand the frustration, but this is a glass is half empty view.

You have decades of expert knowledge, which you can use to drive the LLMs in an expert way. Thats where the value is. The industry or narrative might not have figured that out yet, but its inevitable.

Garbage in, garbage out still very much applies in this new world.

And just to add: the key metric of good software hasn't changed, and won't change. It's not even about writing the code, the language, the style, the clever tricks. What really matters is how well the code performs 1 month after it goes live, 6 months, 5 years. This game is a long game. And not just how well the computer runs the code, but how well humans can work with the code.

Use your experience to generate the value from the LLMs, because they aren't going to generate anything by themselves.


Glass-half-empty view? Their whole skill set, built up over decades, digitized; now they have to shift everything they do, and who knows if humans will even be in the loop, if they're not C-suite or brown-nosers. Their whole magic and skill is now capable of being done by a PM in 5 minutes with some tokens. How is that supposed to make skillful coders feel?

Massive job cuts, bad job market, AI tools everywhere, probable bubble, it seems naive to be optimistic at this juncture.


The world changes. Time marches on, and the very skills you spend your time developing will inevitably expire in their usefulness. Things that were once marvelous talents are now campfire stories or punchlines.

LLMs may be accelerating the process, but definitely not the cause.

If you want a career in technology, a durable one, you learn to adapt. Your primary skill is NOT the mastery of a given technology; it is the ability to master any given technology. This is a university that has no graduation!


Is it though? If it was that universal, we'd employ the best programmers as plumbers, since they have the best ability to master plumbing technology. There are limits, and I think the skill being to master programming technologies is a reasonable limit.

If you're a great programmer, can you stop using Angular and master React? Yes. Can you stop telling the computer what to do, and master formal proof assistants? Maybe. Can you stop using the computer except as a tool and go master agricultural technology? Probably not. (Which is not to say you can't be a good programmer at an agritech company.)


The “this wrecked my industry” sob story is especially rich when the vast majority of tech workers ability to demand premium salaries comes directly from creating software that makes existing jobs obsolete.

Let’s talk about the industries the computer killed: travel agents, musician, the entire film development industry, local newspapers built on classified ads, the encyclopedia industry, phone operators, projectionists, physical media industries, and a few dozen other random industries.

We aren’t special because we are coders. Creativity and engineering thoughtfulness will still exist even with LLMs, it will just take a different form.


Since I love programming, I feel pretty lucky I got to live and work in the only few decades in which it's economically viable to work as a computer programmer. At least "musician" had a longer run, but I guess we had it coming.


What exactly would people retrain into? The future these companies explicitly want is AI taking ALL the jobs, It's not like PMs are going to be any safer, or any other knowledge work. I see little evidence that AI is going to create new jobs other than a breathless assurance that it "always happens"


No, retraining has been tested and found to be unfeasible. Even if you throw money at it.


> Their whole skill set

This is the fundamental problem with how so many people think about LLMs. By the time you get to Principal, you've usually developed a range of skills where actual coding represents like 10% of what you need to do to get your job done.

People very often underestimate the sheer amount of "soft" skills required to perform well at Staff+ levels that would require true AGI to automate.


Yeah well. That's what we've been doing to other industries over and over.

I remember a cinema projectionist telling me exactly that while I was wiring up the software controlling digital projectors, replacing the 35mm ones.


If a principal doesn't have the skills to mentor juniors, plan and define architecture, review work and follow a good process, they really shouldn't be considered a principal. A domain expert? Perhaps. A domain expert should fear for their job but a principal should be well rounded, flexible, and more than capable of guiding AI tooling to a good outcome.


> Their whole magic and skill is now capable of being done by a PM in 5 minutes with some tokens.

[citation needed]

It has just merely moved from "almost, but not entirely useless" to "sometimes useful". The models themselves may perhaps be capable already, but they will need much better tooling than what's available today to get more useful than that, and since it's the AI enthusiasts who will happily let LLMs code for them that work on these tools, it will still take a while to get there :)


> It has just merely moved from "almost, but not entirely useless" to "sometimes useful"

[citation needed]

:P

This thing has changed the way I work. I barely touch my editor to actually edit anymore, because speaking into the chat field what changes I want it to make is more efficient

The tooling does need to get better, yes, but anecdotally, I do a fundamentally different job (more thinking, less typing, less sifting through docs, less wiring up) than 3 months ago

So much of my career was spent on especially rummaging in docs and googling and wiring things up. I believe that's the same for most of us


I'm optimistic about people being able to build the things they always wanted to build but either didn't have the skills or resources to hire somebody who did.

If we truly value human creativity, then things that decrease the rote mechanical aspects of the job are enablers, not impediments.


If we truly value human creativity we should stop building technology that decreases human value in the eyes of the rich and powerful


Or stop measuring ourselves by our reflection in their eyes.

Society can interpret sociopathy as damage and route around it, if we do the work to make it happen. It will not happen by itself without effort.


> What really matters is how well does the code performs 1 month after it goes live, 6 months, 5 years.

After 40 years in this industry—I started at 10 and hit 50 this year—I’ve developed a low tolerance for architectural decay.

Last night, I used Claude to spin up a website editor. My baseline for this project was a minimal JavaScript UI I’ve been running that clocks in at a lean 2.7KB (https://ponder.joeldare.com). It’s fast, it’s stable, and I understand every line. But for this session, I opted for Node and neglected to include my usual "zero-framework" constraint in the prompt.

The result is a functional, working piece of software that is also a total disaster. It’s a 48KB bundle with 5 direct dependencies—which exploded into 89 total dependencies. In a world where we prioritize "velocity" over maintenance, this is the status quo. For me, it’s unacceptable.

If a simple editor requires 89 third-party packages to exist, it won't survive the 5-year test. I'm going back to basics.

I'll try again but we NEED to expertly drive these tools, at least right now.


I don't understand. You specifically:

> neglected to include my usual "zero-framework" constraint in the prompt

And then your complaint is that it included a bunch of dependencies?

AI's do what you tell them. I don't understand how you conclude:

> If a simple editor requires 89 third-party packages to exist

It obviously doesn't. Why even bother complaining about an AI's default choices when it's so trivial to change them just by asking?


My main point is that we need to expertly drive these tools. I forgot the trivial instruction and ended up with something that more closely resembles modern software instead of what I personally value. AI still requires our expertise to guide it. I'm not sure if that will be the case in a year, but it is today.


You seem intelligent so it is probably confusing to many why you are posting this.

You call it a trivial instruction, but it is not trivial. It was a core requirement of your own design that you neglected to specify. This is no different from leaving out any other core requirement in an engineering specification.

Most people would NOT want this requirement, meaning most people wouldn't care whether there are package dependencies or not, so the agent 100% did the right thing.


I always tell Claude, choose your own stack but no node_modules.

What's missing is another LLM dialog between you and Claude. One that figures out your priorities, your non-functional requirements, and instructs Claude appropriately.

We'll get there.


This perhaps reflects the general divide in viewpoints on "vibe-coding". Do you let go of everything (including understanding) and let it rip, or require control and standards to some degree? Current coding agents seem to promote the former; the only recourse, with their approach, is to provide them with constraints.


> What's missing is another LLM dialog between you and Claude. One that figures out your priorities, your non-functional requirements, and instructs Claude appropriately.

There are already spec frameworks that do precisely this. I've been using BMAD for planning and speccing out something fairly elaborate, and it's been a blast.


Yes, I think this is reasonable.

I have been consistently skeptical of LLM coding but the latest batch of models seems to have crossed some threshold. Just like everyone, I've been reading lots of news about LLMs. A week ago I decided to give Claude a serious try - use it as the main tool for my current work, with a thought out context file, planning etc. The results are impressive, it took about four hours to do a non-trivial refactor I had wanted but would have needed a few days to complete myself. A simpler feature where I'd need an hour of mostly mechanical work got completed in ten minutes by Claude.

But, I was keeping a close eye on Claude's plan and gradual changes. On several occasions I corrected the model because it was going to do something too complicated, or neglected a corner case that might occur, or other such issues that need actual technical skill to spot.

Sure, now a PM whose only skills are PowerPoint and office politics can create a product demo, change the output formatting in a real program and so on. But the PM has no technical understanding and can't even prompt well, let alone guide the LLM as it makes a wrong choice.

Technical experts should be in as much demand as ever, once the delirious "nobody will need to touch code ever again" gives way to a realistic understanding that LLMs, like every other tool, work much better in expert hands. The bigger question to me is how new experts are going to appear. If nobody's hiring junior devs because LLMs can do junior work faster and cheaper, how is anyone going to become an expert?


> I have been consistently skeptical of LLM coding but the latest batch of models seems to have crossed some threshold.

It’s refreshing to hear I’m not the only one who feels this way. I went from using almost none of my copilot quota to burning through half of it in 3 days after switching to sonnet 4.6. I’m about to have to start lobbying for more tokens or buy my own subscription because it’s just that much more useful now.


Yes, it's Sonnet 4.6 for me as well as the most impressive inflection point. I guess I find Anthropic's models to be the best, even before I found Sonnet 3.7 to be the only model that produced reasonable results, but now Sonnet 4.6 is genuinely useful. It seems to have resolved Claude's tendency to "fix" test failures by changing tests to expect the current output, it does a good job planning features, and I've been impressed by this model also telling me not to do things - like it would say, we can save 50 lines of code in this module but the resulting code would be much harder to read so it's better not to. Previous models in my experience all suffered from constantly wanting to make more changes, and more, and more.

I'm still not ready to sing praises about how awesome LLMs are, but after two years of incremental improvements since the first ChatGPT release, I feel these late-2025 models are the first substantial qualitative improvement.


^ Big this. If we take a pessimistic attitude, we're done for.


I think the key metric to good software has really changed, the bar has noticeably dropped.

I see unreliable software like openclaw explode in popularity while a Director of Alignment at Meta publicly shares how it shredded her inbox while continuing to use openclaw [1], because that's still good enough innit? I see much buggier releases from macOS & Windows. The biggest military in the world is insisting on getting rid of any existing safeguards and limitations on its AI use and is reportedly using Claude to pick bombing targets [2] in a bombing campaign that we know has made mistakes hitting hospitals [3] and a school [4]. AI-generated slop now floods social networks with high popularity and engagement.

It's a known effect that economies of scale lowers average quality but creates massive abundance. There never really was a fundamental quality bar to software or creative work, it just has to be barely better than not existing, and that bar is lower than you might imagine.

[1] https://x.com/summeryue0/status/2025774069124399363

[2] https://archive.ph/bDTxE

[3] https://www.reuters.com/world/middle-east/who-says-has-it-ha...

[4] https://www.nbcnews.com/world/iran/iran-school-strike-us-mil...


[flagged]


Is this a bot? I feel like HN is dying (for me at least) with all the em-dashes and the "it's not just X, it's Z".


This is correct. Had lunch with a senior staff engineer going for a promo to principal soon. He explained he was early to CC, became way more productive than his peers, and got the staff promo. Now he’s not sharing how he uses the agent so he maintains his lead over his peers.

This is so clearly a losing strategy. So clearly not even staff level performance let alone principal level.


Why the downvotes? It is the defining characteristic of the staff+ level to empower others. Individual contributions don’t matter at this level.


Hi Grok, nice comment!


I'm intrigued...how was he wrong?


The building in question wasn't really tall enough, and it would have had to be precision-demolished to collapse in the way he was afraid of.

It would still cause chaos and possible power issues.

It needs to be taken in context with some Sydney buildings having maintenance defects a few years after they opened, largely due to inferior materials imported from China. The building in question developed some cracks in supporting beams and was briefly evacuated. There was never a chance it was going to topple on its own in a way that impacted more than 1-2 datacenters, so he pivoted to possible terrorism, but even that's largely nonsensical.

I just went hunting for the case and couldn't find it. The gentleman in question had published the claim to his business, which, as it happens, was trying to build contacts with defense and intelligence agencies for third-party threat assessment. As far as I can tell the business no longer exists and he has deleted its footprint.

But he also posted the claim on public mailing lists so I can probably trawl it up if necessary.


This is a great and worthwhile discussion. People are losing sight of what art is. The art is the idea, not the medium. And just because something is easy doesn't mean it will be good.

I've seen some fantastic original pictures that actual artists have generated through AI. I can't wait to see what current and future artists can do with the new tools at their disposal.


I don't understand the fascination and focus on Vibe Coding.

Sure, you can do that, it's an option, but no serious engineering effort is being left entirely up to the AI.

Vibe coding is essentially the Jackson Pollock approach to software building. Throw a bunch of paint down, with very little control, and look, we have something novel.

It doesn't mean you're going to replace all the ways of making art with paint throwing.

I'd love to start seeing more discussions about alternative approaches to working with AI. The recent Vinext article was great https://blog.cloudflare.com/vinext/. This seems to be "the way" for working with AI in a high stakes production environment, but what other ways are there.

I fear the focus on vibe coding is diluting and taking focus away from far better alternatives. Maybe because the narrative around those aren't quite so dramatic?


Not sure I understand this distinction.

> I’ve always wanted an ARM server in my homelab. But earlier, I either had to use an underpowered ARM system, or use Asahi...

What is stopping you using Mac with MacOS?


It’s not great as a headless server.

With full disk encryption enabled you need a keyboard and display attached at boot to unlock it. You then need to sign in to your account to start services. You can use an IP based KVM but that’s another thing to manage.

If you use Docker, it runs in a vm instead of native.

With a Linux-based ARM box you can use full disk encryption, use dropbear to SSH in on boot to unlock the disks, run native Docker, run Proxmox, etc.

Mac minis/studio have potential to be great low powered home servers but Apple is not going down that route for consumers. I’d be curious if they are using their own silicon and own server oriented distro internally for some things.


They fixed unlock in the last release:

"On a Mac with Apple silicon with macOS 26 or later, FileVault can be unlocked over SSH after a restart if Remote Login is turned on and a network connection is available."

https://support.apple.com/guide/security/managing-filevault-...


That's a bad implementation and seems like a bad afterthought


Thanks for the reply. I'm looking to replace my aging mini pc with a mac mini, so I'm quite interested in any limitations here.

The full disk encryption I can live without. I'm assuming these limitations don't apply if it's disabled. [Ah, I just saw the other reply that this has now been fixed]

I was aware of the Docker in a VM issue. I haven't tested this out yet, but my expectation is this can be mitigated via https://github.com/apple/container ?

I appreciate any insights here.


Apple containers run in one vm per container


    The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot.
https://security.apple.com/blog/private-cloud-compute/

Granted, I don't know if it's really server oriented or if they're a bunch of iPhones on cards plugged into existing servers.


Most likely wanting to run Linux natively. Only M1/M2 can fill that role with Asahi, and still not with 100% hardware compatibility.

On the flip side, an M4 mini is cheaper, faster, much smaller (with built in power supply) and much more efficient. Plus for most applications, they can run in a Linux container just as well.


Thanks for the reply Jeff. This aligns with my understanding too. I'm close to purchasing a mac mini to replace my aging media pc. The core feature I want is to run microK8s natively, which I'm assuming the newish Mac containers will support.


You can't talk about hobby tunnels without mentioning the Moleman of Hackney :-)

- https://en.wikipedia.org/wiki/William_Lyttle

- The house was later brought back to life in an amazing way https://www.vogue.co.uk/arts-and-lifestyle/article/sue-webst...


That is fascinating!

But I feel he was an amateur compared to Joseph Williamson :)

https://en.wikipedia.org/wiki/Williamson_Tunnels


Brilliant. This indeed wins :-)


Also worth mentioning Baldassare Forestiere, who spent about 40 years digging an extensive network of tunnels in Fresno, CA.

https://en.wikipedia.org/wiki/Forestiere_Underground_Gardens


I haven't seen the Public Image Ltd logo in a very long time.


Is it just me, or does there appear to be a big gap in how people understand this works?

There is no magic here. Replace "car" with some nonsense word the LLM hasn't encountered before. It will completely ignore the small amount of nonsense you have provided, and confidently tell you to walk, while assuming you are talking about a car. I'm fairly confident the first time this was tried using "car", it told them to walk.

"I want to wash my flobbergammer. The flobbergammer wash place is only 50 meters away. should I drive or walk."

Reply:

If it’s only *50 meters away*, definitely *walk*.

That’s about a 30–45 second walk for most people. Driving would likely:

* Take longer (getting in, starting the car, parking)
* Waste fuel
* Add unnecessary wear to your car
* Be objectively funny in a “why did I do this” kind of way

The only reasons to drive would be:

* The flobbergammer is extremely heavy
* Severe weather
* You have mobility limitations

Otherwise, enjoy the short stroll. Your future self will approve.

Via chatGPT free tier. Paid Claude Sonnet 4.5 Extended gives me:

For just 50 meters, you should definitely walk! That's an incredibly short distance - less than a minute on foot. By the time you'd get in your car, start it, drive, and park, you could have already walked there and back. Plus, you'd avoid the hassle of finding parking for such a short trip. Walking is easier, faster, better for the environment, and you'll get a bit of movement in. Save the car for longer distances!
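For anyone who wants to reproduce the substitution test, here's a minimal sketch. The nonsense words and the helper name are my own invention, not from any benchmark; it just builds the control prompt plus the variants to paste into whatever model you're testing:

```python
# Sketch of the substitution test described above: swap "car" for a
# nonsense word and see whether the model grounds its answer in the
# actual object, or just pattern-matches "50 meters ... drive or walk".
TEMPLATE = (
    "I want to wash my {thing}. The {thing} wash place is only "
    "50 meters away. Should I drive or walk?"
)

def make_variants(nonsense_words):
    """Return the control prompt ("car") plus one variant per nonsense word."""
    return [TEMPLATE.format(thing=w) for w in ["car", *nonsense_words]]

prompts = make_variants(["flobbergammer", "zorblatt"])
# A grounded model should ask what a "flobbergammer" is (or at least hedge);
# a pattern-matcher will confidently answer as if it were a car.
```

Running the same variants across a few models makes the "it just assumed car" behavior easy to compare side by side.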


The car wash is 50 parsecs away, should I walk, drive, or jump?


Here's a thought. Let's all arbitrarily agree AGI is here. I can't even be bothered discussing what the definition of AGI is. It's just here, accept it. Or vice versa.

Now what...? What's happening right now that should make me care that AGI is here (or not)? What's the magic thing that's happening with AGI that wasn't happening before?

<looks out of window> <checks news websites> <checks social media...briefly> <asks wife>

Right, so, not much has changed from 1-2 years ago that I can tell. The job market's a bit shit if you're in software... is that what we get for billions of dollars spent?


Cultural changes take time. It took decades for the internet to move from nerdy curiosity to an essential part of everyone's life.

The writing is on the wall. Even if there's no new advances in technology, the current state is upending jobs, education, media, etc


> It took decades

It took one September. Then as soon as you could take payments on the internet the rest was inevitable and in _clear_ demand. People got on long waiting lists just to get the technology in their homes.

> no new advances in technology

The reason the internet became so accessible is because Moore was generally correct. There were two corresponding exponential processes that vastly changed the available rate of adoption. This wasn't at all like cars being introduced into society. This was a monumental shift.

I see no advances in LLMs that suggest any form of the same exponential processes exist. In fact the inverse is true. They're not reducing power budgets fast enough to even imagine that they're anywhere near AGI, and even if they were, that they'd ever be able to sustainably power it.

> the current state is upending jobs

The difference is companies fought _against_ the internet because it was so disruptive to their business model. This is quite the opposite. We don't have a labor crisis, we have a retention crisis, because companies do not want to pay fair value for labor. We can wax on and off about technology, and perceptrons, and training techniques, or power budgets, but this fundamental fact seems the hardest to ignore.

If they're wrong this all collapses. If I'm wrong I can learn how to write prompts in a week.


> It took one September.

It's the classic "slowly, then suddenly" paradigm. It took decades to get to that one September. Then years more before we all had internet in our pocket.

> The reason the internet became so accessible is because Moore was generally correct.

Can you explain how Moore's law is relevant to the rise of the internet? People didn't start buying couches online because their home computer lacked sufficient compute power.

> I see no advances in LLMs that suggest any form of the same exponential processes exist.

LLMs have seen enormous growth in power over the last 3 years. Nothing else comes close. I think they'll continue to get better, but critically: even if LLMs stay exactly as powerful as they are today, it's enough to disrupt society. IMHO we're already at AGI.

> The difference is companies fought _against_ the internet

Some did, some didn't. As in any cultural shift, there were winners and losers. In this shift, too, there will be winner and losers. The panicked spending on data centers right now is a symptom of the desire to be on the right side of that.

> because companies do not want to pay fair value for labor.

Companies have never wanted to pay fair value for labor. That's a fundamental attribute of companies, arising as a consequence of the system of incentives provided in capitalism. In the past, there have been opportunities for labor to fight back: government regulation, unions. This time that won't help.

> If I'm wrong I can learn how to write prompts in a week.

Why would you think that anyone would want you to write prompts?


what September?


This is an allusion to the old days, before the internet became a popular phenomenon. It used to be that every September a bunch of "newbies" (college students who had just gotten access to an internet connection for the first time) would log in and make a mess of things. Then, in the late nineties when it really took off, everybody logged in and made a mess of things. This is the "eternal September." [1]

[1] https://en.wikipedia.org/wiki/Eternal_September


I really think corporations are overplaying their hand if they think they can transform society once again in the next 10 years.

Rapid de industrialization followed by the internet and social media almost broke our society.

Also, I don’t think people necessarily realize how close we were to the cliff in 2007.

I think another transformation now would rip society apart rather than take us to the great beyond.


I worry that if the reality lives up to investors dreams it will be massively disruptive for society which will lead us down dark paths. On the other hand if it _doesn't_ live up to their dreams, then there is so much invested in that dream financially that it will lead to massive societal disruption when the public is left holding the bag, which will also lead us down dark paths.


It's already made it impossible to trust half of the content i read online.

Whenever I use search terms to ask a specific question these days there's usually a page of slop dedicated to the answer which appears top for relevancy.

Once I realize it is slop I realize the relevant information could be hallucinated, so I can't trust it.

At the same time I'm seeing a huge upswing in probable human-created content being accused of being slop.

We're seeing a tragedy of the information commons play out on an enormous scale at hyperspeed.


You trust nearly half??!!??


I think corporations can definitely transform society in the near future. I don't think it will be a positive transformation, but it will be a transformation.

Most of all, AI will exacerbate the lack of trust in people and institutions that was kicked into high gear by the internet. It will be easy and cheap to convince large numbers of people about almost anything.


As a young adult in 2007, what cliff were we close to?

The GFC was a big recession, but I never thought society was near collapse.


We were pretty close to a collapse of the existing financial system. Maybe we’d be better off now if it happened, but the interim devastation would have been costly.


It felt like the entire global financial system had a chance of collapsing.


We weren't that far away from ATMs refusing to hand out cash, banks limiting withdrawals from accounts (if your bank hadn't already gone under), and a subsequent complete collapse of the financial system. The only thing that saved us from that was an extraordinary intervention by governments, something I am not sure they would be capable of doing today.


I'm still not buying that AI will change society anywhere as much as the internet or smart phones for the matter.

The internet made it so that you can share and access information in a few minute if not seconds.

Smartphones built on the internet by making that sharing and access of information possible from anywhere and by anyone.

AI seems to occupy the same space as Google in the broader internet ecosystem. I don't know what AI provides me that a few hours of Google searches couldn't. It makes information retrieval faster, but that was never the hard part. The hard part was understanding the information, so that you're able to apply it to your particular situation.

Being able to write to-do apps X1000 faster is not innovation!


You are assuming that the change can only happen in the west.

The rest of the world has mostly been experiencing industrialisation, and was only indirectly affected by the great crash.

If there is a transformation in the rest of the world the west cannot escape it.

A lot of people in the west seem to have their heads in the sand, very much like when Japan and China tried to ignore the west.

China is the world's second biggest economy by nominal GDP, India the fourth. We have a globalised economy where everything is interlinked.


When I look at my own country it has proven to be open to change. There are people alive today who remember when Christianity dominated public life; now we swear in a gay prime minister.

In that sense Western countries have proven that they are intellectually very nimble.


Three of the best known Christians I have known in my life are gay. Two are priests (one Anglican, one Catholic). Obviously the Catholic priest had taken a vow of celibacy anyway, so it's entirely immaterial. I did read an interview with a celeb friend (also now a priest!) of his that said he (the priest I knew) thought people did not know he was gay; we all knew, we just did not make a fuss about it.

Even if you accept the idea that gay sex is a sin, the entire basis of Christianity is that we are all sinners. Possessing wealth is a failure to follow Jesus's commands for instance. You should be complaining a lot more if the prime minister is rich. Adultery is clearly a more serious sin than having the wrong sort of sex, and I bet your country has had adulterous prime ministers (the UK certainly has had many!).

I think Christians who are obsessed with homosexuality as somehow making people worse than the rest of us, are both failing to understand Christ's message, and saying more about themselves than gays.

If you look at when sodomy laws were abolished, countries with a Christian heritage led this. There are reasons for this in the Christian ethos of choice and redemption.


> people alive today who remember Christianity now we swear in a gay prime minister

Why would that be a contradiction? Gay people can't be Christian?


> Cultural changes take time. It took decades for the internet to move from nerdy curiosity to an essential part of everyone's life.

99% of people only ever use proprietary networks from FAANG corporations. That's not "the internet", that's an evolution of CompuServe and AOL.

We got TCP/IP and the "web-browser" as a standard UI toolkit stack out of it, but the idea of the world wide web is completely dead.


Shockingly rare how few realize this. It's a series of mega cities interconnected by ghost towns out here.


yeah, this is a good point, transition and transformation to new technologies takes time. I'm not sure I agree the current state is upending things though. It's forcing some adaption for sure, but the status quo remains.


It also took years for the Internet to be usable by most folks. It was hard, expensive and unpractical for decades.

Just about the time it hit the mainstream, coincidentally, is when the enshittification began to go exponential. Be careful what you wish for.


Allow me to clarify: I'm not wishing for change. I am an AI pessimist. I think our society is not prepared to deal with what's about to happen. You're right: AI is the key to the enshittification of everything, most of all trust.


Governments and companies have been pushing for identity management that connects your real life identity with your digital one for quite some time. With AI, I believe that's no longer only a bad thing; it may be unavoidable now.


What's happening with AGI depends on what you mean by AGI so "can't even be bothered discussing what the definition" means you can't say what's happening.

My usual way of thinking about it is AGI means can do all the stuff humans do which means you'd probably after a while look out the window and see robots building houses and the like. I don't think that's happening for a while yet.


Indeed: particularly given that—just as a nonexhaustive "for instance"—one of the fairly common things expected in AGI is that it's sapient. Meaning, essentially, that we have created a new life form, that should be given its own rights.

Now, I do not in the least believe that we have created AGI, nor that we are actually close. But you're absolutely right that we can't just handwave away the definitions. They are crucial both to what it means to have AGI, and to whether we do (or soon will) or not.


I'm not sure how the rights thing will go. Humans have proved quite able not to give many rights to animals or other groups of humans even if they are quite smart. Then again there was that post yesterday with a lady accusing OpenAI of murdering her AI boyfriend by turning off 4o so no doubt there will be lots of arguments over that stuff. (https://news.ycombinator.com/item?id=47020525)


Who would the robots build houses for? No one has a job and no one is having kids in that future.


Where are the robots going to sleep? Outside in the rain?


The billionaire elite. Isn’t it obvious? They want to get rid of us


Before enlightenment^WAGI: chop wood, fetch water, prepare food

After enlightenment^WAGI: chop wood, fetch water, prepare food


One of the most impactful books I ever read was Alvin Toffler's Future Shock.

Its core thesis was: Every era doubled the amount of technological change of the prior era in one half the time.

At the time he wrote the book in 1970, he was making the point that the pace of technological change had, for the first time in human history, rendered the knowledge of society's elders - previously the holders of all valuable information - irrelevant.

The pace of change has continued to steadily increase in the ensuing 55 years.

Edit: grammar


> Here's a thought. Lets all arbitrarily agree AGI is here.

A slightly different angle on this - perhaps AGI doesn't matter (or perhaps not in the ways that we think).

LLMs have changed a lot in software in the last 1-2 years (indeed, the last 1-2 months); I don't think it's a wild extrapolation to see that'll come to many domains very soon.


Which domains? Will we see a lot of changes in plumbing?


If most of your work involves working with a monitor and keyboard, you're in one of the domains.

Even if it doesn't, you will be indirectly affected. People will flock to trades if knowledge work is no longer a source of viable income.


> Lets all arbitrarily agree AGI is here. I can't even be bothered discussing what the definition of AGI is.

There is a definition of AGI the AI companies are using to justify their valuation. It's not what most people would call AGI but it does that job well enough, and you will care when it arrives.

They define it as an AI that can develop other AIs faster than the best team of human engineers. Once they build one of those in house they outpace the competition and become the winner that takes all. Personally I think it's more likely they will all achieve it at a similar time. That would mean the race continues, accelerating as fast as they can build data centres and power plants to feed them.

It will impact everyone, because the already dizzying pace of the current advances will accelerate. I don't know about you, but I'm having trouble figuring out what my job will be next year as it is.

An AI that just develops other AIs could hardly be called "general" in my book, but my opinion doesn't count for much.


May I ask, what experiences are you personally having with LLMs right now that is leading you to the conclusion that they will become "intelligent" enough to identify, organise, and build advancing improvements to themselves, without any human interaction in the near future (1 - 2 years lets say)?


> May I ask, what experiences are you personally having with LLMs right now that is leading you to the conclusion that they will become "intelligent" enough to identify, organise, and build advancing improvements to themselves, without any human interaction in the near future (1 - 2 years lets say)?

None, as I don't develop LLM's.

I wasn't saying I think they will succeed, but I think it is worth noting their AGI ambitions are not as grand as the term implies. Nonetheless, if they achieve them, the world will change.


I mis-read. Thanks for clarifying :-)


Re-reading, it's entirely my fault. I should have said:

> and you will care if/when it arrives.


If AGI were already here, actions would be so greatly accelerated that humans wouldn't have time to respond.

Remember that weather balloon the US found a few years ago that for days was on the news as a Chinese spy balloon?

Well, whether it was a spy balloon or a weather balloon, the first hint of its existence could have triggered a nuclear war that could have already been the end of the world as we know it, because AGI will almost certainly be deployed to control the U.S. and Chinese military systems, and it would have acted well before any human had time to intercept its actions.

That’s the apocalyptic nuclear winter scenario.

There are many other scenarios.

An AGI which has been infused with a tremendous amount of ethics, so the above doesn't happen, may also lead to terrible outcomes for humans. An AGI would essentially be a different species (although a non-biological one). If it replicated human ethics, even as inconsistently as we apply them, it would learn that treating other species brutally is acceptable (we breed, enslave, imprison, torture, and then kill over 80 billion land animals annually in animal agriculture, and possibly trillions of water animals). There's no reason it wouldn't do that to us.

Finally, if we infuse it with our ethics but it's smart enough to apply them consistently (even a basic application of our ethics would have us end animal agriculture immediately), so that it realizes humans are wrong and doesn't do the same thing to us, it might still create an existential crisis for humans, as our entire identity is based on thinking we are smarter and intellectually superior to all other species, which wouldn't be true anymore. Further, it would erode belief in gods and other supernatural BS, which might at the very least lead humans to stop reproducing due to the existential despair this might cause.


You're talking about superintelligence. AGI is just...an AI that's roughly on par with humans on most things. There's no inherent reason why AGI will lead to ASI.


What a silly comment. You're literally describing the plot of several sci-fi movies. Nuclear command and control systems are not taken so lightly.

And as for the Chinese spy balloon, there was never any risk of a war (at least not from that specific cause). The US, China, Russia, and other countries routinely spy on each other through a variety of unarmed technical means. Occasionally it gets exposed and turns into a diplomatic incident but that's about it. Everyone knows how the game is played.


"Nuclear command and control systems are not taken so lightly."

https://gizmodo.com/for-20-years-the-nuclear-launch-code-at-...


AGI is not a death sentence for humanity. It all depends on who leverages the tool. And in any case, AGI won’t be here for decades to come.


Your sentence seems to imply that we will delegate all AI decisions to one person who can decide how he wants to use it - to build or destroy.

Strong agentic AIs are a death sentence memo pad (or a malevolent djinn lamp if you like) that anyone can write on, because the tools will be freely available to leverage. A plutonium breeder reactor in every backyard. Try not to think of paperclips.


Sounds fun let's do it.


Depends on the cost to run it. Say it costs 5k to do a year's worth of something intellectual with it. That means the price ceiling on 90% of lawyer/accountant/radiologist/low-to-middle-management work is 5k now. It will be epic, and temporarily terrible when it happens, as long as reasonably competent models are open source. I also don't think we are near that at all, though.


I do strongly agree on the framing, but I'd argue with the conclusion

Yeah, it really doesn't matter if AGI has happened, is going to happen, will never happen, whatever. No matter what sort of definition we make for it, someone's always going to disagree anyway. For a looong time, we thought the Turing test was the standard, and that only a truly intelligent computer could beat it. It's been blown out of the water for years now, and now we're all arguing about new definitions for AGI

At the end of the day, like you say, it doesn't matter a bit how we define terms. We can label it whatever we want, but the label doesn't change what it can DO

What it can DO is the important part. I think a lot of software devs are coming to terms with the idea that AI will be able to replace vast chunks of our jobs in the very near future.

If you use these things heavily, you can see the trajectory.

6 months ago I'd only trust them for boiler plate code generation and writing/reviewing short in-line documentation.

Today, with the latest models and tools, I'm trusting them with short/low impact tasks (go implement this UI fix, then redeploy the app locally, navigate to it, and verify the fix looks correct).

6 months from now, my best guess is that they'll continue to become more capable of handling longer + more complex tasks on their own.

5 years from now, I'm seeing a real possibility that they'll be handling all the code, end to end.

Doesn't matter if we call that AGI or not. It very much will matter whose jobs get cut, because one person with AI can do the work of 20 developers


AGI would render humans obsolete and eradicate us sooner or later.


Pretty sure marketing teams are already working on AGI v2


AGI is a pipe dream and will never exist


Odd to see someone so adamantly insist that we have souls on a forum like HN.


I think you are missing the point: If we assume that AGI is *not* yet here, but may be here soon, what will change when it arrives? Those changes could be big enough to affect you.


I'm missing the point? I literally asked the same thing you did.

>Now what....? Whats happening right now that should make me care that AGI is here (or not).

Do you have any insight into what those changes might concretely be? Or are you just trying to instil fear in people who lack critical thinking skills?


You did not ask the same thing. You framed the question such that readers are supposed to look at their current lives and realize nothing is different ergo AGI is lame. Your approach utilizes the availability bias and argument from consequences logical fallacies.

I think what you are trying to say is "can we define AGI so that we can have an intelligent conversation about what that will mean for our daily lives?" But you oddly introduced your argument by stating you didn't want to explore this definition...


People are taking actions based on its advice.


The economy is shit if you’re anything except a nurse or providing care to old people.


Electricians are also doing pretty well. Someone has to wire up those new data centers.


> The job markets a bit shit if you're in software

That's Trump's economy, not LLMs.


Many devs don’t write code anymore. Can really deliver a lot more per dev.

Many people slowly losing jobs and can’t find new ones. You’ll see effects in a few years


Deliver a lot more tech debt


My LLMs do create non-zero amounts of tech debt, but they are also massively decreasing human-made tech debt by finding mountains of code that can be removed or refactored when using the newest frameworks.


That tech debt will be cleaned up with a model in 2 years. Not that human don't make tech debt.


What that model is going to do in 2 years is replace tech debt with more complicated tech debt.


One could argue that's a cynically accurate definition of most iterative development anyway.

But I don't know that I accept the core assertion. If the engineer is screening the output and using the LLM to generate tests, chances are pretty good it's not going to be worse than human-generated tech debt. If there's more accumulated, it's because there's more output in general.


Only if you accept the premise that the code generated by LLMs is identical to the developer's output in quality, just higher in volume. In my lived professional experience, that's not the case.

It seems to me that prompting agents and reviewing the output just doesn't.... trigger the same neural pathways for people? I constantly see people submit agent generated code with mistakes they would have never made themselves when "handwriting" code.

Until now, the average PR had one author and a couple reviewers. From now on, most PRs will have no authors and only reviewers. We simply have no data about how this will impact both code quality AND people's cognitive abilities over time. If my intuition is correct, it will affect both negatively over time. It remains to be seen. It's definitely not something that the AI hyperenthusiasts think at all about.


> In my lived professional experience, that's not the case.

In mine it is the case. Anecdata.

But for me, this was over two decades in an underpaid job at an S&P500 writing government software, so maybe you had better peers.


I stated plainly: "we have no data about this". Vibes is all we have.

It's not just me though. Loads of people subjectively perceiving a decrease in quality of engineering when relying on agents. You'll find thousands of examples on this site alone.


I have yet to find an agent that writes as succinctly as I do. That said, I have found agents more than capable of doing something.


I've been writing code for 20 years. AI has completely changed my life and the way I write code and run my business. Nothing is the same anymore, and I feel I will be saying that again by the end of 2026. My productive output as a programmer in software and business has expanded 3x *compounding monthly*.


>My productive output as a programmer in software and business have expanded 3x compounding monthly.

In what units?


Tasks completed in my todo list software; I've been measuring my output for 5 years. Time saved because I built one-off tools to automate many common workflows. And yes, even dollars earned.

I don’t mean 3x compounding monthly every month, I mean 3x total since I started using Claude Code about 6 months ago but the benefits keep compounding.


GWh


Going from gigajoules to terajoules.


Vibes


Going from punch cards to terminals also "completely changed my life and the way I write code and run my business"

Firefox introducing their dev debugger many years ago "completely changed my life and the way I write code and run my business"

You get the idea. Yes, the day to day job of software engineering has changed. The world at large cares not one jot.


I mean 2025 had the weakest job creation growth numbers outside of recession periods since at least 2003. The world seems to care in a pretty tangible way. There are other big influencing factors for that, too, of course.


Okay. So software engineers are vastly more efficient. Good I guess. "Revolutionize the entire world such that we rethink society down to its very basics like money and ownership" doesn't follow from that.


Man you guys are impatient. It takes decades even for earth shattering technologies to mature and take root.


If people want to make the "this will be AGI after two decades and will totally revolutionize the entire world" that's fine. If people want to make the "wow this is an incredibly useful tool for many jobs that will make work more efficient" that's fine. We can have those discussions.

What I don't buy is the "in two years there will be no more concept of money or poverty because AI has solved everything" argument using the evidence that these tools are really good at coding.


Damn right I'm impatient. My eye starts twitching when a web page takes more than 2 seconds to load :-)

In the meantime, I've had to continuously hear talk about AI, both in real life (like at the local pub) AND virtually (tv/radio/news/whatever) and how it's going to change the world in unimaginable ways for the last...2/3 years. Billions upon billions of dollars are being spent. The only tangible thing we have to show is software development, and some other fairly niche jobs, have changed _a bit_.

So yeah, excuse my impatience for the bubble to burst, I can stop having to hear about this shit every day, and I can go about my job using the new tools we have been gifted, while still doing all the other jobs that sadly do not benefit in any similar way.


> The only tangible thing we have to show is software development, and some other fairly niche jobs, have changed _a bit_.

There is zero evidence that LLMs have changed software development efficiency.

We get an earth-shattering developer productivity gamechanger every five years. All of them make wild claims, none of them ever have any data to back those claims up.

LLMs are just another in a long, long list. This too will pass. (Give it five years for the next gamechanger.)


Are you working for 3x less the time compounding monthly?

Are you making 3x the money compounding monthly ?

No?

Then what's the point?


Yes and yes.


Okay, teach me how, then? I would also like to work 3× less and make 3× more.


People keep impatiently expecting proof from builders with no moat. It's like that Upton Sinclair quote.


Start a software business, presumably.


Ten more months in 2026, so you should be about 60,000x better by the end of the year.
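(The arithmetic behind that quip checks out, for anyone curious: 3x compounding monthly over ten more months is 3^10 = 59,049, i.e. roughly 60,000x. A trivial sanity check:)

```python
# 3x productivity gains, compounding monthly, over the ten
# remaining months of 2026
factor = 3 ** 10
print(factor)  # 59049 -- roughly the "60,000x" above
```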


You say that as if it’s impossible but there are several indie makers that have gone from $10 MRR to $600k MRR over the past 8 months.


It's weird that you guys keep posting the same comments with the exact same formatting

You're not fooling anyone


I actually think it is here. Singularity happened. We're just playing catch up at this point.

Has it run away yet? Not sure, but is it currently in the process of increasing intelligence with little input from us? Yes.

Exponential graphs always have a slow curve in the beginning.


Didn't you get the memo? Tuesday. Tuesday is when the Singularity happens.

Will there still be ice cream after Tuesday? General societal collapse would be hard to bear without ice cream.


Tuesday at 4 p.m to be specific.

