wpietri's comments | Hacker News


As somebody who has been reading them since the first year, I think you have it wrong. That self-driving prediction was always about Level 5 autonomy. What's changed between now and then is that we've basically stopped talking about that, instead accepting intervention-as-a-service companies as self driving.

Well, and we're only talking about operation within very specific locales.

Honestly, Brooks--who has been presented and self-presented as something of a skeptic with respect to autonomous self-driving--looks like something of an optimist at this point. (In the sense that your kid won't need to learn to drive.)


I think this is true for essential complexity. And indeed it's one of the best reasons to release early and often, because usage helps clarify which parts of the requirements are truly required.

But plenty of projects add quite a lot of incidental complexity, especially with technology choices. E.g., Resume Driven Development encourages picking impressive or novel tools, when something much simpler would do.

Another big source of unneeded complexity is code for possibilities that never come to fruition, or that are essentially historical. Sometimes that's about requirements, but often it's about addressing engineer anxiety.


Have you considered that the problem here is not insufficient explanation of policy?

There's this thing that some programmers do a lot, where it's the users who are wrong. Using it wrong, approaching it wrong, thinking about it wrong, wanting the wrong thing. Just not sufficiently understanding the masterwork that the programmers created.

What this view misses is that the users are the point. If one user gets it wrong, sure, maybe it's the user. But broadly the point of software is to serve and adapt to users, and developers who forget that are starting an argument that they cannot win in the long term.

It's especially wild to see you talking like this on an article about how Stack Overflow is just about dead. It needed changes a decade ago, but everyone just hunkered down and defended the existing approach. The policies you are somehow still defending are a big part of what doomed the site.


The site was a consensus of what Jeff and Joel wanted, along with their associated blogging communities who started posting on Stack Overflow. There was some tension between those two communities about what should be there, but that's where it started.

In the early days, onboarding was done fairly actively with a reasonable amount of the community participating in answering and community moderation - shaping it.

That portion of the community - both answering and moderating - was key for onboarding.

However, as Stack Overflow got popular, a smaller and smaller percent of the community was actively answering and participating in community moderation - and onboarding of new people became more and more difficult.

Here I lay the responsibility nearly completely at the feet of corporate. The friction of moderation was increased at the same time that the site became popular, which made it harder for the community to moderate.

Making it easy to moderate and to help people understand the site meant that either a larger share of the now very large number of people participating on the site needed to pitch in, or the friction of community moderation needed to be dialed back.

This is also where rudeness became more and more common. There are two parts to this. First, rudeness costs nothing: it takes no reputation points to reach that level of moderation, and there is no limited pool of votes to deplete. Second, not everything was actually rude. With a smaller and smaller pool of people doing community moderation, attempts to onboard a person got shorter. You couldn't write a paragraph in a comment and spend 10 minutes on one person when spending 1 minute on 10 different people was more likely to help someone. That shortness was often perceived by the person asking as rudeness.

Lastly, Stack Overflow was designed as a Q&A site that attempted to minimize the failings described in A Group Is Its Own Worst Enemy ( https://news.ycombinator.com/item?id=23723205 ) - Clay Shirky was a mentor of Jeff's and was on the original Stack Overflow board. It tried (and for a long time succeeded at) handling scale. When Stack Overflow's ability to handle scale failed, what suffered were the moderation tools: the ability of the people participating in community moderation to surface the good questions to be answered, and to get the questions that needed work into a shape answerable in the Q&A format (not a forum format) that Stack Overflow was designed around.


What you're missing is that random people who come to Stack Overflow to ask a question (of a sort that doesn't meet the site's standards) are not my "users". I don't care in the slightest about these metrics of "dead-ness", and showing them to me another hundred times will not change my mind about that.

Because from my perspective, it has never been about how many questions are asked per day, or how many ad impressions the site owners get. (I don't see a dime from it, after all.) From my perspective, way too many questions got asked. There are more than three times as many publicly visible, still-open questions as there are articles on Wikipedia - for a scope of "practical matters about writing code", as compared to "any real-world phenomenon important enough for reliable sources to have written about it".

I am not trying to win the argument about what people want. I am only establishing that the goal is legitimate, and that people who share that goal should be permitted to congregate in public and try to accomplish something. I do not share your goals. The community is not like software, and "serving and adapting to users" does not benefit the people doing the work. We never arranged to have the kind of "users" you describe.


Deadness is the symptom, not the cause. Users don't avoid SO because it's dead, but rather, SO is dead because users avoid it. It's up to you to figure out why users are avoiding it. Hint: They've been telling you quite loudly.

There's another thread on the front page about IPv6 where someone had a good analogy: IPv4 vs IPv6 is like Python 2 vs 3. The Python 2 diehards continued arguing furiously to an emptier and emptier room. They never felt they were proven wrong, and the intensity of the argument never diminished but the argument was with fewer and fewer people until they were just arguing with themselves as the world moved on without them.

And that's exactly what happened to Stack Overflow, and you're one of those guys still trying to promote the use of Python 2.7 in 2026, after the horse is long gone. Everyone has left, the lights are off in the empty debate hall and you're standing there at the podium telling a bunch of chairs and desks why everyone actually agrees with you. You might want to reflect on why you hold such fervent beliefs that are in direct contradiction with observable reality. Can I guess you had a lot of reputation points and you desperately don't want to believe they're worthless now?

The referenced comment: https://news.ycombinator.com/item?id=46477920


> It's up to you to figure out why users are avoiding it. Hint: They've been telling you quite loudly.

No, it is not up to me to figure that out. I have heard it said quite loudly many times, over a period of many years.

What you are missing is: I. Do. Not. Care.

The goal was never for the site to be "not dead". The goal was for the site to host useful information that is readily found.

The site already has tons of useful information. But it's drowning in... much less useful information, and Google has become much worse (to some extent intentionally) at surfacing the good parts.

> And that's exactly what happened to Stack Overflow, and you're one of those guys still trying to promote the use of Python 2.7 in 2026

This is a bizarre thing to say to me, of all people. I am always the one catching flak for telling people that 2.7 had to go, that the backwards-incompatible changes were vital, that the break wasn't radical enough, and that people were given way more time to switch over than they should have needed.

But really, the feedback for Stack Overflow is trying to take it in the direction of places that existed long beforehand. If you want forums, you know where to find them. And now you can also find LLMs. Which, as commonly used by people seeking programming help, are basically a grizzled forum guy in a can.

>Everyone has left, the lights are off in the empty debate hall and you're standing there at the podium telling a bunch of chairs and desks why everyone actually agrees with you.

"Everyone actually agrees with [me]" is the polar opposite of what I actually believe and am actually saying. I am well aware that the model is unpopular. My point is that the popularity of the model is irrelevant to me.

> Can I guess you had a lot of reputation points and you desperately don't want to believe they're worthless now?

I have a lot of reputation points (the site still exists), far more than I ever felt I deserved, and I never really felt like they were worth anything. A huge percentage of them come from an answer to a terrible question (that was still terrible after heroic attempts at editing; this all happened long before there was a common understanding of the purpose of question closure or what would make good standards for questions) that, once I understood things properly, I closed and tried to get deleted. Over the last few years, with that new understanding, I have been trying to give away my superfluous reputation points in bounties, trying to get missing answers written for the few really good questions lacking good answers that I identify, always to no avail (the bounty system promptly became a honeypot for ChatGPT hallucinations as soon as ChatGPT became available).

You do not know me or my motivations in the slightest.


> The goal was never for the site to be "not dead"

ok? fine then. If you think it's fine for the site to be dead then please stop spamming comments defending it. It doesn't need any defence to stay dead and such defence is not useful.

Response to child comment: no, you are not replying to people telling you why you need to care about a thing. You are mostly replying randomly throughout the thread and telling people why they are wrong.


I am only responding to many people trying to explain why I should care about the thing I don't care about. The defense is useful because a) it being "dead" by these metrics is unimportant; b) people are blaming a community for mistreating them, when they came in without any intent of understanding or adapting to that community; c) other sites in this mold exist, and are trying to establish themselves.

As a former Wikipedia administrator, I think one of the things that Wikipedia has done exactly right is to strongly prioritize readers first, editors second, and administrators third. The unofficial Wikipedia administrator symbol is a mop, because it's much more a position of responsibility than it is a position of power.

I obviously think you and other user-hostile people should be permitted to congregate and accomplish something. What I object to in Stack Overflow's case is the site being taken over by people like that, serving themselves and their own preferences with such vigor that they alienated vast numbers of potential contributors, putting the site on a path of decline from which it is unlikely to recover.

Even by your own terms, having a place for some (conveniently unspecified) group to "congregate in public and try to accomplish something" looks certain to be a failure. However much you don't care about deadness or declining revenue, the people paying the bills surely do. Stack Overflow was only a success because it served and adapted to users.

But I give you points for being honest about your hostility to the entire point of the site. It not only makes it clear why it's failing, but it'll keep people from being sorry when it gets closed down.


Right? It's a perfect example of the problem.

In college, I worked tech support. My approach was to treat users as people. To see all questions as legitimate, and any knowledge differential on my part as a) the whole point of tech support, and b) an opportunity to help.

But there were some people who used any differential in knowledge or power as an opportunity to feel superior. And often, to act that way. To think of users as a problem and an interruption, even though they were the only reason we were getting paid.

I've been refusing to contribute to SO for so long that I can't even remember the details. But I still recall the feeling I got from their dismissive jackassery. Having their content ripped off by LLMs is the final blow, but they have richly earned their fate.


The point here is that you worked tech support, so you were paid to answer user questions.

However, the answerers on SO are not paid. Why should they waste their time on a user who has shown no effort and asks a question that they have already answered several times before?


Nobody, least of all me, is saying people should work for free. But not being paid to do something you don't want to do is a reason to go do something else, not hang around and be a hostile, superior dick about it, alienating the users.

The answerers are just as much users as the questioners - possibly more so, in fact, as they are the ones spending time, whilst the askers (especially the poor ones) often just ask a question and then go away.

Unfortunately the SO management want money and so want the fly away askers more than the answerers who provide the benefit of the site.


> However the answerers on So are not paid. Why should tyhy waste their time on a user who has not shown they have put any effort in and asks a question that they have already answered several times before?

This is kind of a weird sentiment to put forth, because other sites, namely Quora, actually do pay their answerers. An acquaintance of mine was at one time a top "Question Answerer" on Quora and got some kind of compensation for their work.

So this is not the Question-Asker's problem. This is the problem of Stack Overflow and the people answering the questions.


When I worked technical support in college I often worked nights and weekends (long uninterrupted times to work on homework or play games) ... there was a person who would call and ask non-computer questions. They were potentially legitimate questions - "what cheese should I use for macaroni and cheese?" Sometimes they just wanted to talk.

Not every text area that you can type a question in is appropriate for asking questions. Not every phone number you can call is the right one for asking random questions. Not every site is set up for being able to cater to particular problems or even particular formats for problems that are otherwise appropriate and legitimate.

... I mean... we don't see coding questions here on HN because this site is not one that is designed for it despite many of the people reading and commenting here being quite capable of answering such questions.

Stack Overflow was set up with a philosophy of website design that attempted to avoid the pitfalls described in A Group Is Its Own Worst Enemy. https://news.ycombinator.com/item?id=23723205

Arguably, it succeeded at not having those same problems. It had different ones. It was remarkably successful while the tooling that it had was able to scale for its user base. When that tooling was unable to scale, the alternative methods of moderation (e.g. rudeness) became the way to not have to answer the 25th question of "how do I make a pyramid with asterisks?" in September and to try to keep the questions that were good and interesting and fit the format for the site visible for others to answer.

It wasn't good that rudeness became the moderation tool of last resort; it represents a failure of the application and of the company's ability to scale its tools to handle the increased number of people asking questions - to onboard them, and to help the people trying to answer questions find the ones they want to answer.

The company's failure to do this shrank both the number of people willing to answer and the number of people willing to try to keep the questions that were a good fit for the site visible.

Yes, it is important for the person answering a question to treat the person asking the question with respect. It is also critical for the newcomer to the site to treat the existing community there with respect. That respect broke down on both sides.

I would also stress that treating Stack Overflow as a help desk that can answer any question someone has... that's not what it was designed for. It acts as a help desk really poorly. It was designed to be a searchable library of questions and answers. The questions were the seeds of content, and it was the answers - the good answers - that were meant to stay and be curated. That was one of the ideals described in https://blog.codinghorror.com/introducing-stackoverflow-com/


> It wasn't good that rudeness was the moderation tool of last resort and represents a failing of the application and the company's ability to scale those tools to help handle the increased number of people asking questions - help onboard them and help the people who are trying to answer the questions that they want to answer to be able to find them.

This is a very charitable read of the situation. Much more likely is, as another commenter posted, a set of people experiencing a small amount of power for the first time immediately used it for status and took their "first opportunity to be the bully".

> It was designed to be a library of questions and answers that was searchable.

It obviously was only tolerated because of that, as evidenced by the exodus the moment a viable alternative became available.


I blame the Internet culture of the late 90s early 2000s. Referring to your customers as Lusers and dismissing their "dumb" questions was all the rage amongst a group of nerds who had their first opportunity to be the bully.

I think this "first opportunity to be the bully" thing is spot on. Everybody learns from being bullied. Some of us learn not to do it when we have power; others just learn how.

I think you have some good points, but you take it too far. The UN charter is the way it is not because it's the optimal approach, but because non-democratic countries had too much power for it to be otherwise.

As an example, the American Revolution had support from France, the Netherlands, and Spain. Britain saw this as shocking interference in an internal matter, as did loyalists in America.

Personally, I think it was a good thing, helping a people determine their own fate. Applying the same measure here, I simultaneously think it's great Maduro is out, but that the manner of it is terrible. As well as being foolishly shortsighted, both for the US and the world more broadly.


The charter doesn't prohibit aiding people.

The charter limits the powerful nations. Rule #1 is nations cannot start wars. Starting a war is a crime.

The charter requires some consensus by the international community to authorize use of force against another country.

Article 51 acknowledges the right to self-defence. The only parties with a right to violence are the defending nation and those who aid it against aggression.

And this is, once again, American aggression. We aren't doing it because it's right. We're doing it because we can. In violation of international law.


I doubt there is any other "optimal" approach, but do say what you would propose.

There will always be indirect interference anyhow (think social networks, books, press, people talking, tariffs, visas, etc.), so there is some possibility for states to push things in their direction.

I think imagining there can be some "authority" that could decide when "direct interference" is allowed or not will be a disaster at some point, because even if it is OK at first, as a society we don't seem to be at a point where we can build organizations that work well for hundreds of years.


I'm not proposing anything. I'm pointing out that in a complex world simple rules, however appealing they are cognitively, aren't sufficient.

> but you take it too far.

you do know who the president of the United States currently is RIGHT ?


I think the last part of my post makes it clear that I do. But if not, let me just make clear that we have to struggle through the Harding administration as best we can, but better days are ahead.

Countries are not IID random variables, which is the assumption at the base of this sort of center-liberal argument.

> As an example, the American Revolution had support from France, the Netherlands, and Spain

But to what extent did they do it to "free" America vs. to take Britain down a peg because they worried Britain was getting too powerful?

I think most people here are doubtful of Trump's motives, or that this coup will actually lead to a free Venezuela.

America worked out really well. There are many many examples in history where imperial powers interfering in a local power struggle worked out very poorly for the average person of the country.


> America worked out really well.

That really depends on who you are asking.


Are these really separable? Even as an individual I generally have multiple motivations for an action. That has to be even more true for whole nations.

And I don't think there's any reason to be doubtful of Trump's motivations. He's a would-be tyrant and has made it clear that this is about world dominance, Venezuela's oil, and enriching American businessmen. He has no interest in a free, democratic Venezuela. If this does work out well for Venezuelans, it'll be more due to Trump's flaws (arrogance, laziness, increasing dementia, and the TACO phenomenon) than any intent on his part.


Is the distinction arbitrary? It sounded like issues are used for clear, completable jobs for the maintainers. A mysterious bug is not that. The other work you describe is clearly happening, so I'm not seeing a problem with this approach other than its novelty for users. But to me it looks both clearer than the usual "issue soup" on a popular open source project and more effective at using maintainer time, so next time I open-source something I'd be inclined to try it.

Some people see "bug tracker" and think "a vetted report of a problem that needs fixing", others see "bug tracker" and think "a task/todo list of stuff ready for an engineer to work on"

Both are valid, and it makes sense to be clear about what the team's view is.


Agreed. Honestly, I think of those as two very different needs that should have very different systems. To me a bug tracker is about collecting user reports of problems and finding commonalities. But most work should be driven by other information.

I think the confusion of bug tracking with work tracking comes out of the bad old days where we didn't write tests and we shipped large globs of changes all at once. In that world, people spent months putting bugs in, so it makes sense they'd need a database to track them all after the release. Bugs were the majority of the work.

But I think a team with good practices that ships early and often can spend a lot more time on adding value. In which case, jamming everything into a jumped-up bug tracker is the wrong approach.


If there's somebody out there advocating for "unprotected sex with large numbers of people", you should go post at them, because I don't see that here.

The biggest barrier to disease transmission reduction, at least here in the US, is uncritical abstinence promoters like yourself. It works, at best, for a small fraction of the population, and leaves the rest woefully unprepared for the biological realities. The best solution to STDs is education. Which, yes, should emphasize that not having sex is an option, but cannot stop there.


I sometimes vouch for incorrectly flagged posts. You got me curious, so I took a look. What I found was a blog from an anonymous conspiracist vaccine opponent claiming to be a doctor. He's a decent writer but in my estimation a loon.

So I'm fine with it being flagged and decline to vouch for it.


This is not a great argument:

> But it is hard to argue against the value of current AI [...] it is getting 1ドルB dollar runway already.

The psychic services industry makes over 2ドル billion a year in the US [1], with about a quarter of the population being actual believers [2].

[1] https://www.ibisworld.com/united-states/industry/psychic-ser...

[2] https://news.gallup.com/poll/692738/paranormal-phenomena-met...


What if these provide actual value through the placebo effect?

I think we have different definitions of "actual value". But even if I pick the flaccid definition, that isn't proof of value of the thing itself, but of any placebo. In which case we can focus on the cheapest/least harmful placebo. Or, better, solving the underlying problem that the placebo "helps".

I'll preface by saying I fully agree that psychics aren't providing any non-placebo value to believers, although I think it's fine to provide entertainment for non-believers.

> Or, better, solving the underlying problem that the placebo "helps".

The underlying problems are often a lack of a decent education and a generally difficult/unsatisfying life. Systemic issues which can't be meaningfully "solved" without massive resources and political will.


If we look back over the last century or so, I think we've made excellent progress on that. The main current barrier is that we've lately let people with various pathologies run wild, but historically that creates enough problems that the political will emerges. See, e.g., the American and French revolutions, or India's independence, or the US civil war and Reconstruction.

Actually, I'd go one step further and say they are harmful to everybody else.

It might just be my circles, but I've seen Carl Sagan's quote everywhere in the last couple of months.

"Science is more than a body of knowledge; it is a way of thinking. I have a foreboding of an America in my children’s or grandchildren’s time—when the United States is a service and information economy; when nearly all the key manufacturing industries have slipped away to other countries; when awesome technological powers are in the hands of a very few, and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what’s true, we slide, almost without noticing, back into superstition and darkness."


You talking about psychics or LLMs?

2022/2023: "It hallucinates, it's a toy, it's useless."

2024/2025: "Okay, it works, but it produces security vulnerabilities and makes junior devs lazy."

2026 (Current): "It is literally the same thing as a psychic scam."

Can we at least make predictions for 2027? What shall the cope be then! Lemme go ask my psychic.


I suppose it's appropriate that you hallucinated an argument I did not make, attacked the straw man, and declared victory.

Ironically, the human tendency to read far too much into things for which we have far too little data does seem to still be one of the ways we (and all biological neural nets) are more sample-efficient than any machine learning.

I have no idea if those two points, ML and brains, are just different points on the same Pareto frontier of some useful metrics, but I am increasingly suspecting they might be.


2022/2023: "Next year software engineering is dead"

2024: "Now this time for real, software engineering is dead in 6 months, AI CEO said so"

2025: "I know a guy who knows a guy who built a startup with an LLM in 3 hours, software engineering is dead next year!"

What will be the cope for you this year?


I went from using ChatGPT 3.5 for functions and occasional scripts...

... to one of the models in Jan 2024 being able to repeatedly add features to the same single-page web app without corrupting its own work or hallucinating the APIs it had itself previously generated...

... to last month using a gifted free week of Claude Code to finish one project and then also have enough tokens left over to start another fresh project which, on that free left-over credit, reached a state that, while definitely not well engineered, was still better than some of the human-made pre-GenAI nonsense I've had to work with.

Wasn't 3 hours, and I won't be working on that thing more this month either because I am going to be doing intensive German language study with the goal of getting the language certificate I need for dual citizenship, but from the speed of work? 3 weeks to make a startup is already plausible.

I won't say that "software engineering" is dead. In a lot of cases however "writing code" is dead, and the job of the engineer should now be to do code review and to know what refactors to ask for.


So you did some basic web development and built a "not well engineered" greenfield app that you didn't ship, and from that your conclusion is that "writing code is dead"?

In half a week with left-over credit.

What do you think the first half of the credit was spent on?

In addition to the other projects it finished off for me, the reason I say "coding is dead" is that even this mediocre-quality code is already shippable. Customers do not give a toss whether it has clean code or a nicely refactored Python backend; that kind of thing is a pain point purely for developers, and when the LLM is the developer, then the LLM is the one who gets ordered to pay down the technical debt.

The other project (and a third one I might have done on a previous free trial) are as complete as I care to make them. They're "done" in a way I'm not used to being possible with manual coding, because LLMs can finish features faster than I can think of new useful features to add. The limiting factor is my ability to do code review, or would be if I got the more expensive option, as I was on a free trial I could do code review about twice as fast as I burned through tokens (given what others say about the more expensive option that either means I need to learn to code review faster, or my risk tolerance is lower than theirs).

Now, is my new 3-day web app a viable business idea? It would've been shippable as-is 5-6 years ago; I saw worse live around then. Today? Hard to say: if markets were efficient, then everyone would know LLMs can create this kind of thing so easily that nobody could charge for them. But people like yourself who disbelieve are an example of markets not being efficient; people like you can have apps like these sold to them.

That said, I try not to look at where the ball is but where it is going. For business ideas, I have to figure out what *doesn't* scale, and do that. Coding *does* scale now, that's why coding is dead.

I expect to return to this project in a month. Have one of the LLMs expand it and develop it for more than the 3 days spent so far, and turn it into something I'd actually be happy to sell. Like I said, it seems like we're at "3 weeks", not "3 hours", for a decent MVP by current standards, but the floor is rising fast.


The cope + disappointment will be knowing that a large population of HN users will paint a weird alternative reality. There are a multitude of messages about AI out there, some highly detached from reality (on both the optimistic and the pessimistic side). And then there is the rational middle: professionals who see the obvious value of coding agents in their workflow and use them extensively (or figure out how to best leverage them to get the most mileage).

I don't see software engineering being "dead" ever, but the nature of the job _has already changed_ and will continue to change. Look at Sonnet 3.5 -> 3.7 -> 4.5 -> Opus 4.5; that was 17 months of development and the leaps in performance are quite impressive. You then have massive hardware buildouts and improvements to the stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks) and also to push towards the next paradigm to solve things like continual learning.

Some folks have opted not to use coding agents (and some folks like yourself seem to revel in strawmanning people who point out their demonstrable usefulness). Not using coding agents in Jan 2026 is defensible. It won't be defensible for long.

Please do provide some data for this "obvious value of coding agents". Because right now the only thing obvious is the increase in vulnerabilities, people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

Sure: at my MAANG company, where I watch the data closely on adoption of CC and other internal coding agent tools, most (significant) LOC are written by agents, and most employees have adopted coding agents as WAU, and the adoption rate is positively correlated with seniority.

Like a lot of things LLM related (Simon Willison's pelican test, researchers + product leaders implementing AI features) I also heavily "vibe" check the capabilities myself on real work tasks. The fact of the matter is I am able to dramatically speed up my work. It may be actually writing production code + helping me review it, or it may be tasks like: write me a script to diagnose this bug I have, or build me a streamlit dashboard to analyze + visualize this ad hoc data instead of me taking 1 hour to make visualizations + munge data in a notebook.

> people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

what would satisfy you here? I feel you are strawmanning a bit by picking the most hyperbolic statements and then blanketing that on everyone else.

My workflow is now:

- Write code exclusively with Claude

- Review the code myself + use Claude as a sort of review assistant to help me understand decisions about parts of the code I'm confused about

- Provide feedback to Claude to change / steer it away or towards approaches

- Give up when Claude is hopelessly lost

It takes a bit to get the hang of the right balance but in my personal experience (which I doubt you will take seriously but nevertheless): it is quite the game changer and that's coming from someone who would have laughed at the idea of a 200ドル coding agent subscription 1 year ago


We probably work at the same company, given you used MAANG instead of FAANG.

As one of the WAU (really DAU) you’re talking about, I want to call out a couple things: 1) the LOC metrics are flawed, and anyone using the agents knows this - eg, ask CC to rewrite the 1 commit you wrote into 5 different commits, now you have 5 100% AI-written commits; 2) total speed up across the entire dev lifecycle is far below 10x, most likely below 2x, but I don’t see any evidence of anyone measuring the counterfactuals to prove speed up anyways, so there’s no clear data; 3) look at token spend for power users, you might be surprised by how many SWE-years they’re spending.

Overall it’s unclear whether LLM-assisted coding is ROI-positive.


To add to your point:

If the M stands for Meta, I would also like to note that as a user, I have been seeing increasingly poor UI, of the sort I'd expect from people committing code that wasn't properly checked before going live, as I would expect from vibe coding in the original sense of "blindly accept without review". Like, some posts have two copies of the sender's name in the same location on screen with slightly different fonts going out of sync with each other.

I can easily believe the metrics that all [MF]AANG bonuses are denominated in are going up, our profession has had jokes about engineers gaming those metrics even back when our comics were still printed in books: https://imgur.com/bug-free-programs-dilbert-classic-tyXXh1d


Oh yes, all of this I agree with. I had tried to clarify this above, but your examples are clearer. My point is that all measures and studies I have personally seen of AI impact on productivity have been deeply flawed for one reason or another.

Total speed up is WAY less than 10x by any measure. 2x seems too high too.

By the data alone, the impact is a bit unclear, I agree. But I will say there seems to be a clear picture that, to me, starting from a prior formed from personal experience, indicates some real productivity impact today, with a trajectory suggesting that claims of a lot of SWE work being offloaded to agents over the next few years are not that far-fetched.

- adoption and retention numbers internally and externally. You can argue this is driven by perverse incentives and/or the perception-performance mismatch, but I'm highly skeptical of this; even though the effects of both are probably real, it would be truly extraordinary to me if there weren't at least a ~10-20% bump in productivity today, with a lot of headroom to go as integration gets better, user skill gets better, and model capabilities grow

- benchmark performance, again benchmarks are really problematic but there are a lot of them and all of them together paint a pretty clear picture of capabilities truly growing and growing quickly

- there are clearly biases we can think of that would cause us to overestimate AI impact, but there are also biases that may cause us to underestimate impact: e.g. I’m now able to do work that I would have never attempted before. Multitasking is easier. Experiments are quicker and easier. That may not be captured well by e.g. task completion time or other metrics.

I even agree: quality of agentic code can be a real risk, but:

- I think this ignores the fact that humans have also always written shitty code and always will; there is lots of garbage in production believe me, and that predates agentic code

- as models improve, they can correct earlier mistakes

- it’s also a muscle to grow: how to review and use humans in the loop to improve quality and set a high bar


Anecdotes don't prove anything, especially ones without any metrics, and especially at a MAANG where AI use is strongly incentivized.

Evidence is peer reviewed research, or at least something with metrics. Like the METR study that shows that experienced engineers often got slower on real tasks with AI tools, even though they thought they were faster.


That's why I gave you data! The METR study was 16 people using Sonnet 3.5/3.7. The data I'm talking about covers tens of thousands of people and is much more up to date.

There are some counterexamples to METR in the literature, but I'll just say: "rigor" here is very difficult (including for METR), because outcomes are high-dimensional and nuanced, or ecological validity is an issue. It's hard to have any approach that someone couldn't dismiss over some issue with the methodology. The sources below also have methodological problems, just like METR.

https://arxiv.org/pdf/2302.06590 -- 55% faster implementing an HTTP server in JavaScript with Copilot (in 2023!), but this is a single task and not really representative.

https://demirermert.github.io/Papers/Demirer_AI_productivity... -- "Though each experiment is noisy, when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool. Notably, less experienced developers had higher adoption rates and greater productivity gains." (but e.g. "completed tasks" as the outcome measure is of course problematic)
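
To put the quoted noise in perspective, here is a quick sketch of the implied 95% confidence interval, assuming the usual normal approximation on the quoted point estimate and standard error:

    # 26.08% increase with SE 10.3%, per the quoted paper
    estimate, se = 26.08, 10.3
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"95% CI: [{low:.1f}%, {high:.1f}%]")  # -> [5.9%, 46.3%]

So the combined experiments are consistent with anything from a small bump to a very large one.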

To me, internal company measures for large tech companies will be most reliable -- they are easiest to track and measure, the scale is large enough, and the talent + task pool is diverse (junior -> senior, different product areas, different types of tasks). But then outcome measures are always a problem... commits per developer per month? LOC? Task completion time? All of them are highly problematic, especially because it's reasonable to expect AI tools to change the bias and variance of the proxy, so it's never clear whether you're measuring a change in "style" or a change in the underlying latent measure of productivity you care about.


To be fair, I'll take an unbiased 16-person study over "internal measures" from a MAANG company that burned 100s of billions on AI with no ROI and is now forcing its employees to use AI.

What do you think about the METR 50% task length results? About benchmark progress generally?

I don't speak for bopbopbop7, but I will say this: my experience of using Claude Code has been that it can do much longer tasks than the METR benchmark implies are possible.

The converse of this is that if those tasks are representative of software engineering as a whole, I would expect a lot of other tasks where it absolutely sucks.

This expectation is further supported by the number of times people pop up in conversations like this to say, of any given LLM, that it falls flat on its face even on something the poster thinks is simple, and that it cost more time than it saved.

As with supposedly "full" self driving on Teslas, the anecdotes about the failure modes are much more interesting than the success: one person whose commute/coding problem happens to be easy, may mistake their own circumstances for normal. Until it does work everywhere, it doesn't work everywhere.

When I experiment with vibe coding (as in, properly unsupervised), it can break down large tasks into small ones and churn through each sub-task well enough that it can do a task I'd expect to take most of a sprint by itself. Now, that said, I will also say it seems to do these things at a level of "that'll do", not "amazing!", but it does do them.

But I am very much aware this is like all the people posting "well my Tesla commute doesn't need any interventions!" in response to all the people pointing out how it's been a decade since Musk said "I think that within two years, you'll be able to summon your car from across the country. It will meet you wherever your phone is ... and it will just automatically charge itself along the entire journey."

It works on my [use case], but we can't always ship my [use case].


I could have guessed you would say that :) but METR is not an unbiased study either. Maybe you mean that METR is less likely to intentionally inflate their numbers?

If you insist on or believe in a conspiracy, I don't think there's really anything I or others will be able to say or show you that would assuage you; all I can say is I've seen the raw data. It's a mess, and again we're stuck with proxies (which are bad, since you start conflating the change in the proxy-latent relationship with the treatment effect). And it's also hard, and arguably irresponsible, to run RCTs.

All I will say is: there are flaws everywhere. METR's results are far from conclusive. It's totally understandable if there is a mismatch between perception and performance. But also consider: even if a task takes the same or even slightly more time, one big advantage for me is that it substantially reduces cognitive load, so I can work in parallel sessions on two completely different issues.


I bet it does reduce your cognitive load, considering you, in your own words "Give up when Claude is hopelessly lost". No better way to reduce cognitive load.

I give up using Claude when it gets hopelessly lost, and then my cognitive load increases.

Meta internal study showed a 6-12% productivity uplift.

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0


> - Give up when Claude is hopelessly lost

You love to see "Maybe completely waste my time" as part of the normal flow for a productivity tool


That negates everything else? If you have a tool that can boost you for 80% of your work and for the other 20% you just have to do what you’re already doing, is that bad?

There's a reason why sunk cost IS a fallacy and not a sound strategy.

The productivity uplift is massive, Meta got a 6-12% productivity uplift from AI coding!

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0


> You then have massive hardware buildouts and improvements to stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks)

This is a surprising claim. There's only 3 orders of magnitude between US data centre electricity consumption and worldwide primary energy (as in, not just electricity) production. Worldwide electricity supply is about 3/20ths of world primary energy, so without very rapid increases in electricity supply there's really only a little more than 2 orders of magnitude growth possible in compute.
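
To make that arithmetic explicit, here is a minimal back-of-envelope sketch; the 1/1000 ratio for US data centre consumption and the 3/20 electricity share are the rough figures assumed above, not precise measurements:

    import math

    # Rough inputs from the estimates above (assumptions, not measurements):
    datacenter_share = 1e-3      # US data centre electricity / world primary energy
    electricity_share = 3 / 20   # world electricity / world primary energy

    # Headroom if compute could somehow consume ALL current world electricity:
    headroom = electricity_share / datacenter_share
    print(f"{headroom:.0f}x, ~{math.log10(headroom):.1f} orders of magnitude")
    # -> 150x, ~2.2 orders of magnitude: "a little more than 2", not 4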

Renewables are growing fast, but "fast" means "will approach 100% of current electricity demand by about 2032". Which trend is faster, growth of renewable electricity or growth of compute? Trick question, compute is always constrained by electricity supply, and renewable electricity is growing faster than anything else can right now.


This is not my own claim, it’s based on the following analysis from Epoch: https://epoch.ai/blog/can-ai-scaling-continue-through-2030

But I forgot how old that article is: it's 4 orders of magnitude past GPT-4 in terms of total compute, which is, I think, only 3.5 orders of magnitude from where we are today (based on 4.4x scaling/yr).
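
For reference, a minimal sketch of the unit conversion, assuming only the 4.4x/yr scaling figure quoted above:

    import math

    growth_per_year = 4.4  # assumed annual compute scaling factor, per the Epoch analysis
    years_per_oom = 1 / math.log10(growth_per_year)
    print(f"{years_per_oom:.2f} years per order of magnitude")  # ~1.55
    print(f"3.5 OOM ~= {3.5 * years_per_oom:.1f} years")        # ~5.4 more years of scaling

At that rate the remaining 3.5 orders of magnitude take roughly five and a half years, the same late-decade ballpark as the linked analysis.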


The nature of my job has always been fighting red tape, process, and stakeholders to deploy very small units of code to production. AI really did not help with much of that for me in 2025.

I'd imagine I'm not the only one in a similar situation. Until all those people and processes can be swept away in favor of letting LLMs YOLO everything into production, I don't see how that changes.


No I think that's extremely correct. I work at a MAANG where we have the resources to hook up custom internal LLMs and agents to actually deal with that but that is unique to an org of our scale.

A clinical diagnosis isn't the only way to look at what's going on here. We can have differences that aren't medical problems. Differences that are measurable and nameable, even. Those categories can overlap with or be congruent to medical terms while still being valid and useful.
