Hacker News | simonw's comments

Confusing headline. This is "Tech CEOs are breaking Tesler’s Law", which states that you can't eliminate irreducible complexity from your product. The argument is that Tech CEOs think they can replace their workforce with generative AI, which violates that law because it takes humans to design for human problems.

You could ask the same question about any tool ever created. Users who figure out ways to use their agents that are profitable to them make money. Everyone else spends money.

> Users who figure out ways to use their agents that are profitable to them make money

Does anyone know any of these users? Most of the agentic-coding boosters seem to be pretty much exclusively building personal knowledge bases and more agentic coding tools.



Comments moved thither. Thanks!

> Pew writes that 44 percent of U.S. adults now say they use OpenAI’s chatbot, a figure that’s more than doubled since 2023.

> The next most popular chatbot is Gemini (24 percent), followed by Copilot (17 percent) and MetaAI (14 percent), with Grok (8 percent), Claude (6 percent) and Character.ai (3 percent) lagging behind.

Claude in 6th place, behind Gemini and Copilot and MetaAI and Grok?

No wonder the general public still think AI is junk.

Update: here's the underlying report: https://www.pewresearch.org/internet/2026/06/17/americans-an...

The question there was "% of U.S. adults who say they ever use the following AI chatbots", so it's not a measure of overall usage, just exposure. It's not surprising, then, that Gemini, Grok and MetaAI rank higher.


I think there is a valid point here that Anthropic has found a great product-market fit among programmers.

By comparison, all the rest of the tools non-programmers get exposure to are floundering around trying to be everything to everyone. It's a push not a pull.

When given everyday real-world computing tasks by people who don't know what a terminal is, the rest of the pack just suck (e.g. "Copilot, fix the spacing issue in this Word document", or literally any Apple Genmoji attempt with more than two basic English words).


I had a big culture shock moment when I had to prep some slides a few weeks back. I'd assumed it would be a breeze now: I've always been good at making slide decks, I had a clear classification-friendly idea of exactly what I wanted them to look like, and there's even a native AI integration! Nope, didn't work, I just had to shuffle components around like I always have.

Ah, they're using the wrong model, of course. "AI" hasn't failed, it's the users who are wrong.

Is Claude really that much better than all the others for normal use?

It's not that Claude is better, it's that Gemini and Copilot are so overwhelmingly bad.

Especially the free tiers. Meta AI, too.

Weaker models and less powerful harnesses give people a very sub-par experience compared to what you get if you pay for access to the better tools and models.


Normal users aren't using harnesses in the sense developers think of them. They're interacting with models where they've been shoehorned in for no good reason, or they're using them nearly entirely through chat interfaces.

Yes. There’s not really any doubt about it.

Yeah, I'm sure the public will totally 180 on AI as soon as the newest model release rolls out lmfao

People here genuinely think public perception on AI is a model issue, that's how you know it's become an echo chamber.

> What happened in 2025 was this: the economics of code production were turned upside down. Instead of being very hard, time-consuming, and expensive to generate code, it became effectively free and instant. Lines of code went from being treasured, reused, cared for and carefully curated, to being disposable and regenerable, practically overnight.

I've been thinking about this a whole lot recently. So much of my intuition about software development is based on 25 years of accumulated experience on how long it will take to write different bits of code.

Should I add validation for this one edge-case which won't break everything but will make a little bit of a mess if someone hits it? If that's an extra couple of hours of code I might skip it. If it's one more prompt, why wouldn't I?

This new feature would be a lot easier to understand if there was a custom API explorer for it. There's no way I could justify investing in that... unless it's just 10 minutes with Codex, and it was: https://tools.simonwillison.net/datasette-extras-explorer#ur... (linked from the release notes https://docs.datasette.io/en/latest/changelog.html#extra-sup...)

That's just on the small scale. There are entire projects that I'd never previously have considered, because I don't need a custom SQLite SELECT query parsing library enough to justify spending a week or more building one. But now... https://github.com/simonw/sqlite-ast

People get VERY upset (and condescending) any time you suggest that being able to produce lines of code faster is a valuable thing. And sure, measuring output through "lines of code" is stupid.

But measuring "lines of verified code that deliver value" isn't stupid at all. That's the thing we can do faster now.


I’m gonna say this in the most polite way that I can, but who cares?

Look around you - Google is valuable because it hoovers up data to generate revenue from advertising and has minimal expenditures compared with its revenues. All those bets? Lol yeah what about them?

Engineering for the sake of engineering has no value to the economy - aka it’s irrelevant. It’s the hard truth nobody wants to hear. There’s a limited set of things that can exist in the economy at any given moment in time - only those that provide value and can be sustained w.r.t. economics stay the course.


> Engineering for the sake of engineering has no value to the economy

I think that's the adventure we're on now. If recreating something is low cost, what is the value in investing in designing it well in the first place? We can empirically discover issues and tell the AI to address them.

When supervising what the LLM is writing, I routinely find it making terrible internal design choices, and I correct them. Usually things one level up from code. "This will cache every image on the client and cause a huge amount of bloat. Change it to pull the image in real time from the server" kind of stuff. You do slowly build that up in the project documentation - "Never store unnecessary data on the client: we assume they are using low-powered devices without substantial storage". But it takes time, and the road to discovering that empirically is through a lot of unhappy users.

So I think there is still a lot of room for genuine engineering - that is, at the technical design level. Levels up from that - code structure etc - are much less clear. I am guessing that over time we will heavily optimise code written by AI for maintenance by AI. That may be mostly about matching the context window to the code module size. Factoring something into 5 modules may be less of a good idea if it means the context window has to hold all of them for the LLM to work. But that is the path of discovery we are on, which history tells us is a 20-year journey.


> Engineering for the sake of engineering has no value to the economy - aka it’s irrelevant.

I would put that as my signature if HN supported that. I see a lot of systems being built where the whole point seems to be about the ritual, not anything valuable for the user.


>People get VERY upset (and condescending) any time you suggest that being able to produce lines of code faster is a valuable thing.

I think some people care about understanding things they have to attach their names to. Many obviously don't care, but others do.


I was surprised that GLM 5.1/5.2 are not vision models - they are text input only.

That's actually pretty uncommon these days. All of the OpenAI/Anthropic/Gemini models accept images, and so do the other leading open weight families - Gemma 4, Qwen 3.6, Kimi 2.x.

In GLM's case image input would be useful because it's a model that scores very highly for tasks like web design, but without image input it can't take a screenshot and output HTML+CSS.

Don't get me wrong, GLM is a phenomenal model, but the image thing is a bit of a gap.


Configure a subagent in your coding harness to spin up a new sub-session with any vision model for those tasks and feed the result back to the main model. No need for "one model that does everything".

That doesn’t work well in a lot of scenarios. The text LLM doesn’t know what to look for in an image before it sees a description, you might need multiple rounds of back and forth.

Vision decoding outside of the latent space of the model is lossy, but Claude Opus's vision isn't that great outside of UI screenshots. I mean, it works in a pinch. At least in my testing, if you're looking at non-UI images, there are better image-to-text models that can turn them into very precise documents that any LLM can easily parse.

Are you suggesting it should summarize the image in text or generate it in HTML or something else?

Agreed, that's actually one step that will make people adopt it widely for customer-facing AI agents!

I don't see this being such a big gap. There are some use-cases for sure, but apart from UX/UI work it is not really needed. Besides, none of the frontier models can replicate actual images - they can only approximate them, at least in my own experience.

One of my tests for a new model is dumping in a screenshot of a web page and seeing if it can recreate it from scratch in HTML and CSS.

Even the local models I run on my Mac are getting surprisingly good at that now.


A pretty fun and quick test I do with vision models is to screenshot the Hacker News homepage and ask the model to return a JSON representation of the screenshot. Qwen 3.5 0.8B did surprisingly well at this.

When using LLMs to generate .docx files, being able to rasterize and review the output is an important part of the process.

I've been using Google AI Studio as a free vision bridge. Gemma 31B is dummy capable at vision, and at 1,500 requests per day it's basically unlimited.

I had the same reaction with DeepSeek V4! It would be more useful as a vision model.

Greg Brockman spent $25m on that out of his own money, it won't show up in the OpenAI figures. https://www.theverge.com/ai-artificial-intelligence/867947/o...

Anthropic are apparently making more revenue than OpenAI, and their government contracts have famously been curtailed.

They are different things. Government money is very predictable and consistent, and based on different calculations than typical consumer-oriented sales. Profits are usually easier.

"OpenAI generated $13.07 billion in revenue in 2025"

Considering just four years ago they were a research lab with hardly any revenue at all, and no corporate muscles for earning revenue, I think that is a very impressive number.

(Sure, they're losing a whole lot of money too. Same goes for almost every other hyper-growth company in the history of tech.)


> Same goes for almost every other hyper-growth company in the history of tech

Except it's not true. No one lost $38.5B in a year just to 'hyper-grow' or whatever it means. Uber accumulated ~$30B loss over a decade.

Edit: I read it wrong. The loss was mostly caused by a one-time event[0]:

> Before OpenAI’s switch late last year to become a public benefit corporation, investors in the company received convertible interest rights rather than conventional equity. Under US accounting rules, those interests were treated as liabilities and periodically revalued as the company’s valuation increased.

It looks like OpenAI is actually quite in line with other companies that lost money to grow.

[0]: https://www.ft.com/content/e15b0d7e-ff6b-4f16-ba7a-4068feddb...


Depreciating assets is one of several possibilities.

That argument supports any level of losses; however, I also think it’s rather misleading.

Growth means some inefficiencies, but their expenses are largely commodities like electricity and data centers, not a sudden army of salespeople. They also got 150M 11 years ago and 1 billion 7 years ago; they were quite large in 2022.

Basically you don’t get better at writing checks to your local utility, which limits how much they can control costs.


> They also got 150M 11 years ago and 1 billion 7 years ago; they were quite large in 2022.

I had to look that up, you're talking about investment there, not earned revenue.

The 150M was their initial funding (actually 130M I think https://www.clay.com/dossier/openai-funding)

The 1B was from Microsoft in 2019: https://openai.com/index/microsoft-invests-in-and-partners-w...

In 2022 they only had 335 employees (according to various internet searches but I can't find an original source for that number.) I can't find credible numbers for revenue from the GPT-3 API, which did have some usage - GitHub Copilot started charging a subscription fee on June 21, 2022 - https://github.blog/changelog/2022-06-21-github-copilot-is-n... - and that was running on the OpenAI Codex model so presumably OpenAI had some revenue from that.


We don’t disagree with the underlying facts.

That said, in many ways 335 employees is the midpoint between 3 employees and 30,000 employees. The CEO can’t keep track of everyone’s names and what they’re doing, you need layers of management, HR, etc. It’s not really a simple exponential function but 335 to 336 is way more automated than going from 3 to 4.


> Sure, they're losing a whole lot of money too. Same goes for almost every other hyper-growth company in the history of tech

That doesn't mean anything. There are examples either way, e.g. WeWork.


And WeWork is an awesome example because it fell apart before IPO. It didn't even make it that far. On the other hand, for all of the shit talking that goes on online, SpaceX is up 49% from its IPO price.

All of the shit that people said about SpaceX is still true. It's still up 49%. I'm sure it'll take a dump the next time anything bad happens, like a rocket explodes, but now that it's public, I'ma be watching all their rocket launches so I can buy if that happens and sell right after. I'm also going to be watching because going to space is fucking awesome, but I can't buy a trip on that rocket yet and no one's gonna pay me to watch it.


And a JPEG was just a JPEG in 2020, but in 2021 they became a 17 billion dollar speculative market.

How "very impressive" is your NFT collection these days?


NFTs were very obviously stupid. LLMs aren't.

Your argument went from "big number good" to redefining "stupid", and you think that somehow supports your original statement?

What word would you use to describe someone that:

    - told you to put glue on pizza?
    - thinks there's 1 'r' in strawberry?
    - is incapable of stopping terminal flickering? 
    - deletes your production database?
    - bankrupts you trying to scan the entire IPv6 address space of a play network interface?
    - can only attempt to draw a bird on a bike in the most bland and unimaginative style possible yet still completely failing?
All while being given the entire US economy and polluting the only planet we have to do so?

If you still think models can't count the Rs in strawberry you're about a year out of date.

Nice.

I mean, sure you said "LLMs", rather than "LLMs in the last 12 months", and sure, you completely abandoned your original argument, and sure, you ignored the other things listed, and of course everyone knows that list is a comprehensive list of the only failings of genai rather than a honeypot to positively identify you as a shameless shill, but ultimately, the fact that HN chose someone this terrible at making a defensible logical argument to be their favorite genai financial interest mouthpiece is a strong indicator defending the criticisms in the original submission.


If your argument is that LLMs are stupid in the same way that NFTs were stupid I don't think it's worth spending any more time discussing this with you.

If your losses scale with your growth, while at the same time your competitors are eating into your future user base, how are you ever gonna become profitable? Only two ways come to mind: regulatory capture, and moving upwards into being a full software-development house.

Or a utility :)

Look at how a utility works, specifically in setting prices, for things that are considered a public good. The story is not about how much profit or revenue they make. It's about how you keep it afloat and expanding in the coming year. That's it.


Compute is both rivalrous and excludable, so it can't be a public good in an economic sense.

$867 million of that was Softbank. News the market has not taken well.

Annual revenue of $13 billion per year puts them on par with Apple's AirPods revenue, which places them in U.S. Fortune 500.

lol, so weird to put that into perspective: MacBook ~8b, AirPods ~18b, iPhones ~60b

I know they are massive, but AI seems something much more important than airpods


AI doesn't work like the rest of the tech industry. The cost of selling another license for a software program is approximately zero.

In the case of AI the marginal cost of the next token is not zero, and is in fact probably not going down much with volume, if at all.

So I'm not sure one can argue that scale will solve everything. It's very much like the old adage "we lose money on every sale, but make it up in volume".
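To make the adage concrete, here's a minimal bash sketch of profit = volume * (price - marginal cost) - fixed costs. Every number is a hypothetical placeholder in integer cents per million tokens, not a real figure for OpenAI or anyone else; the only point is that when marginal cost exceeds price, more volume means a bigger loss.

```shell
# profit VOLUME PRICE COST FIXED
# Prints total profit = VOLUME * (PRICE - COST) - FIXED.
# All inputs are hypothetical integers (e.g. cents per million tokens).
profit() {
  local volume=$1 price=$2 cost=$3 fixed=$4
  echo $(( volume * (price - cost) - fixed ))
}

# Negative unit margin: hypothetical price 1000, marginal cost 1200.
for volume in 1000 10000 100000; do
  echo "volume=$volume profit=$(profit "$volume" 1000 1200 500000)"
done
```

With these placeholder inputs the loop prints profits of -700000, -2500000 and -20500000: ten times the volume, roughly ten times the loss.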

No you don't.


It's wild to think how efficient Internet services were prior to AI. The most expensive thing would probably have been something like encoding video. Now you've got a substantial portion of a rack dedicated to a user in the case of something like Fable.

The best analogue we have is probably video streaming, or maybe more so live streaming. Unless they're subscription-based or limited-time events, those don't seem to do well. Twitch has lost money for how long? And most smaller players seem propped up in other ways.

So if there is real cost involved, things start to look a lot worse, and that might not be overcome. I don't expect OpenAI to be an exception.


But video streaming can be very profitable! Youtube and Netflix are great examples.

But there is no indication they are losing money on tokens when R&D and other expenses are factored out? The margins on API are likely very high so the higher the volume the more likely they will be able to cover the other mostly fixed costs.
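The "margins cover the fixed costs" argument reduces to one division: break-even volume = fixed costs / (price - marginal cost), which only exists when the unit margin is positive. A minimal bash sketch with hypothetical integer inputs (not real provider numbers):

```shell
# break_even PRICE COST FIXED
# Prints the smallest whole volume at which VOLUME * (PRICE - COST) >= FIXED,
# or "never" when the unit margin is zero or negative.
# All inputs are hypothetical integers (e.g. cents per million tokens).
break_even() {
  local price=$1 cost=$2 fixed=$3
  local margin=$(( price - cost ))
  if (( margin <= 0 )); then
    echo "never"
    return
  fi
  # Integer ceiling division: (fixed + margin - 1) / margin
  echo $(( (fixed + margin - 1) / margin ))
}

break_even 1000 400 3000000    # positive margin of 600 per unit
break_even 1000 1200 3000000   # negative margin: no volume breaks even
```

Once the margin is positive, every extra unit shrinks the fixed-cost hole; with a negative margin, no amount of volume helps.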

Maybe. We'll see.

Also, what are they calling "R&D" exactly? If it is training new models, which needs to be done almost constantly and means spending billions on energy and newer GPUs, then it's not really R&D, but rather operating costs.


Yeah, but they have no moat.

They gave up on video because three separate Chinese companies were kicking their ass (and for cheaper).

Google has a better image model in the majority of cases. Much faster, too.

Claude Opus and Fable are like a billion times better. It's not even funny. Codex can't do Rust at all.

What does that leave them? Ads in ChatGPT? I've started to just rely on Google search blended with Gemini answers now because it's faster and doesn't spit out a 20-page essay of useless effusive prose.

Open source models will eat them from the bottom.

Will those enterprise contracts be renewed in a market full of alternatives?

There's nothing sticky about this company.

They're making a necklace with Jony Ive though, I guess?


They still have the most recognized AI brand name and they are still the most popular LLM. For most users, a 10% difference between Claude and GPT isn't going to move the needle, plus it seems to be a horse race anyway. I think their user base is stickier than you would think. Still, it isn't as sticky as social media, and it's cheaper to switch AIs than email accounts.

Look at the ChatGPT usage share. It's dropped dramatically in the last year.

https://techcrunch.com/2026/06/16/chatgpts-market-share-slip...

This is not the winner take all market that OpenAI needs it to be.


It's still dominant, and a lot higher percentage-wise if you count paying users. Gemini was integrated into Google Search, so it's not necessarily people using Gemini as their daily assistant.

Opus 4.8 is quite weak. And GPT-Pro is very much available unlike Fable, it's just not hooked up to the Codex harness yet.

Will it be? It's so obscenely slow and expensive, and it's not obvious it could provide a lot of value outside highly specialized tasks.

So, just like Fable? You can shorten the thinking effort to tweak the "slow and expensive" part a little bit, but at the higher end being more meticulous than even Fable is actually a benefit.

> Google has a better image model in the majority of cases.

Not always. A couple months ago (before ChatGPT Images 2) I tried various prompts on both Google's Nano Banana or whatever and ChatGPT.

"Capybara riding a tricycle. It has 7 tentacles instead of legs"

Google got the number of tentacles completely wrong: https://i.postimg.cc/nzY30y7X/Capybara-Gemini-Nano.png

and after some additions like spotted fur and multicolored tentacles, it was no contest:

ChatGPT: https://i.postimg.cc/02c2LrxV/Capybara-Chat-GPT-before-Image... (although there's still kinda 1 extra tentacle)

And Google still seems to have that odd choice of a European plaza/square/cobblestone street background for everything.

> Claude Opus and Fable are like a billion times better.

NOT at ALL: https://i.imgur.com/jYawPDY.png

Subscribed to Claude Opus for 2 months, with a few months gap between subscriptions to try different versions.

The UX/UI around Anthropic's products was excruciatingly annoying, right from the payment process, and Claude's AI was often hilariously dumb and "trying too hard", constantly full of "oops, you're right" backtracking and often borderline dangerous.

I tried Claude and ChatGPT Codex side by side on some tasks, with the same prompts. Each time, my confidence in Claude fell.

I've been subscribed to the $20 ChatGPT plan for more than 1 year, and this month, I am trying the $100 plan for 1 month.

ChatGPT Codex has been actually helpful and made me productive enough that I can't imagine going back to coding without it.


I use LLMs more in the context of peer-reviewing and also came to a similar conclusion, gpt-5.5 codex xhigh reasoning seemed to catch more edge cases and went "deeper" into analysis than Opus 4.7/4.8.

My preliminary tests of Fable were pretty promising but that's DOA for everyone for now.


Claude often spent most of its output listing all the things that were already correct and working! "This is good"

And most of its findings were false positives or outright wrong, as in the screenshot I posted above.


Neat, works against example.com

  exec 3<>/dev/tcp/example.com/80
  printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' >&3
  cat <&3
Outputs:

  HTTP/1.1 200 OK
  Date: Tue, 16 Jun 2026 17:37:45 GMT
  Content-Type: text/html
  ...
I always end up on example.com for this kind of thing because there are so few domains these days that don't enforce https!
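To check whether a host still answers plain HTTP rather than bouncing you to HTTPS, you can read the status line and Location header of the raw response. A minimal bash sketch, assuming the response is read from stdin (for example straight from the /dev/tcp file descriptor above); redirects_to_https is a hypothetical helper name, not an existing tool:

```shell
# redirects_to_https: reads a raw HTTP response on stdin.
# Prints "https-redirect" if the status is 3xx and the Location header
# points at an https:// URL; otherwise prints the numeric status code.
redirects_to_https() {
  local status_line status line lower location=""
  IFS= read -r status_line
  status_line=${status_line%$'\r'}          # HTTP lines end in CRLF
  read -r _ status _ <<<"$status_line"      # "HTTP/1.1 301 Moved" -> 301
  while IFS= read -r line; do
    line=${line%$'\r'}
    [[ -z $line ]] && break                 # blank line ends the headers
    lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
    if [[ $lower == location:* ]]; then     # header names are case-insensitive
      location=${line#*:}
      location=${location# }
    fi
  done
  lower=$(printf '%s' "$location" | tr '[:upper:]' '[:lower:]')
  if [[ $status == 3* && $lower == https://* ]]; then
    echo "https-redirect"
  else
    echo "$status"
  fi
}

# Example (needs network):
#   exec 3<>/dev/tcp/example.com/80
#   printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' >&3
#   redirects_to_https <&3
```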

example.com is also great for that reason when something fails about a captive portal on a public WiFi.

I open my web browser and go to http://example.com and get redirected to the captive portal page again and retry completing what they need from me to get internet access.


Fun fact: this is almost exactly how captive portal detection is done in the OS/browser!

https://gist.github.com/skull-squadron/edb8c0122f902013304c0...
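The idea is to request a known plain-HTTP URL and look at the status code. Google's http://connectivitycheck.gstatic.com/generate_204 is expected to answer 204 with an empty body when you're online, so a redirect or an unexpected page suggests a captive portal is intercepting. A minimal bash sketch of that classification; the helper name and the exact mapping are assumptions, not the actual logic of any OS or browser:

```shell
# classify_connectivity STATUS
# Maps the HTTP status code from a probe of a "generate_204"-style URL
# to a rough verdict. The mapping is a simplified assumption.
classify_connectivity() {
  case $1 in
    204)       echo "online" ;;           # probe answered as expected
    30[12378]) echo "captive-portal" ;;   # redirected, likely to a login page
    000)       echo "offline" ;;          # curl reports 000 when there is no response
    *)         echo "captive-portal?" ;;  # unexpected page, probably intercepted
  esac
}

# Example (needs network and curl; curl does not follow redirects without -L):
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     http://connectivitycheck.gstatic.com/generate_204)
#   classify_connectivity "$code"
```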


Yep :) I just find example.com easier to remember and quicker to type than any of the OS or browser makers' own URLs, like

- http://captive.apple.com/

- http://connectivitycheck.gstatic.com/generate_204

- http://detectportal.brave-http-only.com/

Plus, it feels nice to depend on the reserved domain name example.com instead of relying on a domain that any one specific corporation has to maintain :D


What gives you confidence example.com won't start serving the HTTPS redirect though? There isn't any reason they wouldn't, and given that browsers are clearly tending towards showing big scary warnings to even accessing something over cleartext, I wouldn't be surprised if they flipped that switch just to avoid confusing noobs.

True, that could happen. If it does do that then I will have to switch over to remembering a different URL instead. But as long as it hasn’t I will keep using http://example.com :)

Also http://detectportal.firefox.com. And http://neverssl.com was set up for this purpose while being a bit easier to remember :)

I remember a while back neverssl.com would happily serve HTTPS requests! Another good alternative is http://httpforever.com/

I have been using neverssl.com for this same purpose :)

My only concern would be that example.com doesn't promise to never do the 'required SSL' thing.


I use neverssl.com for this purpose because it is designed to resist caching.

This works too

  exec 3<>/dev/tcp/example.com/80
  printf 'GET / HTTP/1.1\r
  Host: example.com\r
  Connection: close\r
  \r
  ' >&3
  cat <&3
You can even take out the \r, though they should be there.
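HTTP/1.1 specifies CRLF line endings, and while many servers tolerate a bare \n, a strict one may not. A small bash helper that always emits CRLF keeps that from being easy to get wrong; build_request is a hypothetical name, and the headers are the same three used above:

```shell
# build_request HOST [PATH]
# Prints a minimal HTTP/1.1 GET request with CRLF line endings,
# followed by the empty line that terminates the header section.
build_request() {
  local host=$1 path=${2:-/}
  printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Example (needs network):
#   exec 3<>/dev/tcp/example.com/80
#   build_request example.com >&3
#   cat <&3
```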
