
The "upside" description:

  On the other you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.

  I helped one recently almost one-shot converting a 30-sheet, mind-numbingly complicated Excel financial model to Python with Claude Code.

  Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards and have Claude Code work with you to really interrogate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
almost makes me physically sick.

I've a reasonably intense math background, corrupted by applying it to geophysics and to real-world numerical applications.

To be fair, this statement alone:

* 30-sheet, mind-numbingly complicated Excel financial model

makes my skin crawl and invokes a flight reflex.

Still, I'll concede that a Claude Code conversion to Python of a 30-sheet Excel financial model is unlikely to be significantly worse than the original.
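
To be fair to the quoted pitch, the "Monte Carlo simulations" bit really is only a few lines once the model is a plain Python function. A minimal sketch, where the net_margin function and the input distributions are invented purely for illustration:

  import numpy as np

  rng = np.random.default_rng(42)

  def net_margin(revenue_growth, cost_inflation):
      # toy stand-in for the ported 30-sheet model
      return 0.02 + revenue_growth - cost_inflation

  n = 100_000
  growth = rng.normal(0.05, 0.02, n)      # assumed input distributions
  inflation = rng.normal(0.03, 0.01, n)

  margins = net_margin(growth, inflation)
  print(f"P(margin < 0) = {(margins < 0).mean():.1%}")
  print(f"5th/95th percentile: {np.percentile(margins, [5, 95])}")

The hard part, of course, is not this loop; it's whether net_margin faithfully reproduces the 30 sheets.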





One of the dirty secrets of a lot of these "code adjacent" areas is that they have very little testing.

If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.

If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing the process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant Excel doc? I'm assuming no; maybe I'm wrong.

It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness, like that sort of financial modeling.


> If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output?

I recently watched a demo from a data science guy about the impending proliferation of AI in just about all related fields; his position was highly sceptical, but with a "let's make the most of it while we can" attitude.

The part that stood out to me, which I have repeated to colleagues since, was a demo where the guy fed his tame robot a .csv of price trends for apples and bananas, and asked it to visualise this. Sure enough, out comes a nice-looking graph with two jagged lines. Pack it, ship it, move on...

But then he reveals that, as he wrote the data himself, he knows that both lines should just be an upward trend. Expands the axis labels - the LLM has alphabetized the months but said nothing of it in any of the outputs.
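
That failure mode is easy to reproduce by hand, which is what makes it a good demo. Pandas, for one, will happily sort string month labels alphabetically unless you impose a calendar ordering. A sketch with invented data:

  import pandas as pd

  months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
  df = pd.DataFrame({"month": months, "price": range(12)})  # strictly rising

  wrong = df.sort_values("month")   # Apr, Aug, Dec, Feb, ... a jagged line
  df["month"] = pd.Categorical(df["month"], categories=months, ordered=True)
  right = df.sort_values("month")   # calendar order, a clean upward trend

  print(wrong["month"].tolist())
  print(right["month"].tolist())

Any string-sorted axis will do this silently, which is exactly the trap.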


Like every anecdote out there where an LLM makes a basic mistake, this one is worthless without knowing the model and prompt.

If choosing the "wrong" model, or not wording your prompt in just the right way, is sufficient to not just degrade your output but make it actively misleading and worse than useless, then what does that say about the narrative that all this sort of work is about to be replaced?

I don't recall the bot he was using; it was a rushed portion of the presentation, there to make the point that "yes, these tools exist, but be mindful of the output - they're not a magic wand".

Always a good idea to spot check the labels and make sure you've got JFMAMJ..JASON Derulo

This has had tremendous real-world consequences. The European austerity wave of the early 2010s was largely downstream of an Excel spreadsheet error that changed the result of a major study (Reinhart and Rogoff's) on the impact of debt/GDP ratios on growth.

https://www.newscientist.com/article/dn23448-how-to-stop-exc...


This is a pet peeve of mine at work.

Any and I mean any statistic someone throws at me I will try and dig in. And if I'm able to, I will usually find that something is very wrong somewhere. As in: either the underlying data is just wrong, invalidating the whole thing, or the data is reasonably sound but the person doing the analysis is making incorrect assumptions about parts of it and drawing incorrect conclusions.


It seems to be an ever-present trait of modern business. There is no rigor, probably partly because most business professionals have never learned how to properly approach and analyze data.

Can't tell you how many times I've seen product managers making decisions based on a few hundred analytics events, trying to glean insight where there is none.


Also rigor is slow. Looks like a waste of time.

What are you optimizing all that code for, it works doesn't it? Don't let perfect be the enemy of good. If it works 80% of the time, that's enough, just push it. What is technical debt?


If what you're saying 1) is true and 2) does matter in the success of a business, then wouldn't anyone be able to displace an incumbent trivially by applying a bit of rigor?

I think 1) holds (as my experience matches your cynicism :), but I have a feeling that data-minded people tend to overestimate the importance of 2)...


> does matter in the success of a business

In my experience, many of the statistics these people use don't matter to the success of a business --- they are vanity metrics. But people use statistics, and especially the wrong statistics, to push their agenda. Regardless, it's important to fix the statistics.


Rigor helps for better insights about data. That can help for entrepreneurship.

What also can help for entrepreneurship is having a bias for action. So even if your insights are wrong, if you act and keep acting, you will partially shape reality to your will, and partially bend to its will.

So there are certain forces with which you can compensate for a lack of rigor.

The best companies have both of those things on their side.


I've frequently found, over a few decades, that numerical systems are cyclically 'corrected' until results and performance match prior expectations.

There are often more errors. Sometimes the actual results are wildly different in reality from what a model expects .. but the data treatment has been bug-hunted until it does what was expected .. and then attention fades away.


Or the company just changes the definition of success, so that the metrics (that used to be bad last quarter) are suddenly good

This is, unfortunately, a feature of a lot of these systems. The sponsors don’t want truth, they want validation. Generative AI means there don’t even have to be data engineers in the mix to create fake numbers.

> Any and I mean any statistic someone throws at me I will try and dig in.

I bet you do this only 75% of the time.


> If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output?

The local statistics office here recently presented salary statistics claiming that teachers' salaries had unexpectedly increased by 50%. All the press releases went out, and it was only questions raised by the public that forced the statistics office to review and correct the data.


Lies, damn lies, and (unsourced) statistics.

> If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late.

Back in my data scientist days I used to push for testing and verification of models. Got told off for reducing the team's speed. If the model works well enough to get money in, and the managers who make the final calls don't understand the implications of being wrong, testing gets skipped; and that would be the majority of cases.


I would say that although Claude may hallucinate, at least it can be told to test the scripts. Many data scientists will just balk at the idea of testing a crazy Excel workbook, full of formulas, that they themselves inherited.

Excel doesn't have any standard tooling for testing or verification; workbooks are all just "trust me bro".
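
One saving grace of a conversion specifically: if the workbook was last saved by Excel, the file carries the cached result of every formula, so the original sheet can serve as its own regression oracle. A rough sketch with openpyxl, where the file name, sheet name, cell addresses and the stand-in port function are all hypothetical:

  import openpyxl

  def project_revenue(year):                # trivial stand-in for the real port
      return 100.0 * 1.1 ** year

  # data_only=True yields the values Excel cached at last save, not formulas
  wb = openpyxl.load_workbook("model.xlsx", data_only=True)
  sheet = wb["Projections"]

  for cell, year in [("B12", 1), ("C12", 2)]:
      expected = sheet[cell].value
      got = project_revenue(year)
      assert abs(expected - got) < 1e-6, f"{cell}: Excel {expected} != port {got}"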

I did a fair amount of data analysis, and deciding when or if my report was correct was a huge adrenaline rush.

A huge test for me was to have people review my analyses and poke holes. You feel good when your last 50 reports didn’t have a single thing anyone could point out.

I’ve been seeing a lot of people try to build analyses with AI who haven’t been burned by the “just because it sounds correct doesn’t mean it’s right” dilemma, and who haven’t realized what it takes before you can stamp your name on an analysis.


I'm almost certain it will be significantly worse.

The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.

The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of the edge cases wrong, and, when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.


I won't disagree - I suffered from insufficiently damning praise in my last sentence above.

IMHO, earned through years of bleeding eyeballs, the first will be riddled with subtle edge cases curiously patched and fettled such that it'll limp through to the desired goal .. mostly.

The automated AI assisted transcoding will be ... interesting.


My assumption is that with the right approach you can create a much, much better and more reliable program using only Claude Code. You are referring to YOLO-coding results.

The thing is, when you use AI, you're not really doing things, you're having things done. AI isn't a tool, it's a service.

Now, back in the day, IBM designed and built an "executive data terminal". It wasn't really a computer terminal in the sense that you and I understand it. Rather, it was a video and two-way-audio feed to a room with a team of underlings, whom an executive could ask for business data and analyses, which could be called up on a computer display (also routed to the executive's office). This allowed the executive to ask questions so he (it was the 1960s, it was almost invariably a he) could make informed decisions, and the team of underlings to call up data or crunch numbers on the computer and show the results on the display.

So because executives are used to having things done for them, I can totally see AI being used by executives to replace the "team of underlings" in this setup—in principle. The fact is that were I in that CEO's chair, I'd be thinking twice before trusting anything an LLM tells me, and double-checking those results—perhaps with my team of underlings.

Discussed on Hacker News: https://news.ycombinator.com/item?id=42405462 IEEE article: https://spectrum.ieee.org/ibm-demo


> be thinking twice before trusting anything an LLM tells me

You're too modest. You'd be thinking once.

However, when the parrot is hidden in a shiny box made up to look like a regular, relatively trustworthy program...


I work in finance and we have prod Excel spreadsheets. Those spreadsheets are versioned like code artifacts, with automated testing and everything. Converting them to real applications is a major part of the work the technology division does.

They usually happen because some new and exciting line of business is started by a small team as a POC. Those teams don't get full technology backing; it would slow down the early iteration and cost a lot of money for an idea that may not be lucrative. Eventually they make a lot of money, and by then risk controls are basically requiring them to document every single change they make in Excel. This eventually sucks enough that they complain and get a tech team to convert the spreadsheet.


I too have seen such things.

My experience is that they are the exception rather than the rule, and that many more businesses have sheets that tend further toward Heath Robinson than would be admitted in public.

* https://en.wikipedia.org/wiki/W._Heath_Robinson


We're going from "bad Excel sheet caused recession" to "bad vibe-coded financial thing caused recession".

I've already heard a story like that. My home country's government has already made some decisions based on vibe code that nobody verified. It was written by one of my friends. Nobody cares.

At least the Python version can be verified with existing static analysis tooling and can have comprehensive unit tests.

The Excel sheet never had any tests, and people just trusted it.
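
And the bar for "comprehensive" can start very low. Even a couple of sanity tests per extracted function would beat what the workbook had. A minimal sketch; the discount_factor below is a stand-in, not from any real model:

  # run with: pytest test_model.py  (hypothetical file name)
  def discount_factor(rate: float, year: int) -> float:
      # stand-in for a function extracted from the ported model
      return 1.0 / (1.0 + rate) ** year

  def test_year_zero_is_par():
      assert discount_factor(0.05, 0) == 1.0

  def test_decreases_with_time():
      assert discount_factor(0.05, 10) < discount_factor(0.05, 1)

  def test_zero_rate_is_flat():
      assert discount_factor(0.0, 7) == 1.0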


> Still, I'll concede that a Claude Code conversion to Python of a 30-sheet Excel financial model is unlikely to be significantly worse than the original.

You made me laugh hard. :)


Ha! I also have a physics background and had the same gag reflex.



