The Taco Tuesday Singularity
How Goodhart’s Law, bad metrics, and AI combine to optimize exactly the wrong thing.
One of the business world’s favorite maxims is Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” There’s a classic joke that if revenue per employee is your north star, the mathematically optimal move is to fire everyone except one poor soul (and probably do a stock buyback to boot).
And now, with AI agents everywhere, it suddenly feels like the business world is eyeing a way to actually try it.
Prescient jokes aside, we do this because it’s genuinely hard to align an organization. Picking a single north star metric and saying “go forth and maximize” makes intuitive sense. It gives you something to measure against, compare, and make decisions around. The problem is that you often end up getting outcomes that are directly counter to what actually matters.
The Taco Tuesday AI Detector
My first professional experience with this was what I’ll call The Taco Tuesday AI Detector. A startup I worked for was all about detecting events from public social media posts. Bad stuff like fires and shootings. Good stuff like concerts and sports games.
Our company metric was, naturally, “events detected and validated per day” (we had a really incredible human-in-the-loop team). That number kept getting more aggressive.
One day I was analyzing historical data and noticed we were detecting and validating a lot of Taco Tuesday sale events on Tuesdays. A lot. It was one of those moments where you know you have to reveal something, but you can see the entire horrible sequence of meetings that will follow, so you go for a short walk and briefly consider quitting tech to take up farming.
What had happened was that, in a desperate attempt to hit the daily event number, the human team had started turning Taco Tuesday posts into “events” and categorizing them as “gatherings” (technically true). Our lovely AI models got a stream of positive feedback about what a great job they were doing and started creating more of these and related “events.”
We had, quite unintentionally, built the world’s best food discount detector, which management was less pleased about than we were.
Writing this now sounds absurd, but I think people would be shocked by what happens inside high-pressure environments where you are told to live and die by a couple of key metrics. Everyone did exactly what the spreadsheet said to do. The spreadsheet just wasn’t connected to what mattered anymore.
Optimizing the Wrong Thing (At Home)
It turns out we as individuals fall for the same phenomenon.
Since quitting my job to work on a startup, I knew I needed to track my monthly burn closely. That started out as a sensible guardrail and quietly turned into a fixation: more and more time spent shaving off ever-smaller costs, while my stress and brainspace quietly bled out in the background.
It continued until one day I closed the garage door on my car and realized all those five-dollar wins did nothing for the only things that actually mattered: time, runway, and sanity.
The metric started as a helpful simplification. Then one day I was optimizing the budget instead of the life the budget was supposed to support.
It gets even trickier when you start tackling the real questions in life: “Is my career working?” “Is my life going in the right direction?” “Am I being productive?” “Am I making the most of college?” None of these come with clean metrics, so we latch onto whatever we can see: LinkedIn likes, salary, whether or not we are in a relationship, calendar fullness, lines of code written, and so on.
Metrics start as helpful simplifications. Then one day you wake up optimizing the spreadsheet instead of the thing the spreadsheet was supposed to represent.
AI: Same Failure Mode, Faster
Our latest experiment in solving these eternal questions is throwing chatbots and AI agents at them, which I am increasingly convinced are governed by the prime Silicon Valley directive: “Move fast and break things.” The problem is that AI has the same tendency to optimize the “wrong” metric, only much faster and better than we can.
You learn this viscerally from training models. When I tried to train a Neon Racer agent, it discovered that the fastest way to “win” was to immediately die, respawn, drive off the track while invincible, and then cruise to the finish. From the model’s perspective, this was genius. It had found a loophole that maximized the reward. From a human perspective, it was completely missing the point of the game.
AI doesn’t have “common sense.” It has “whatever you told the loss function to care about.” Designing AI systems around higher-level principles is incredibly difficult and delicate.
LLMs Answer the Prompt, Not the Problem
LLMs in particular are trained to be confident and give you an answer to whatever strange request I have that day. You can see this in how hard it is for companies to build reasonable safety systems: models either err on the side of puritanical hall monitor bot or “would you like a five-step plan for bringing about nuclear armageddon” bot.
Despite all my prompting and directives, whenever I ask my chat project whether I should switch frontend frameworks, it is going to happily give me a neat pro and con list. It won’t say, “Hey Josh, I think the fastest way to get more customers probably isn’t migrating your frontend code to React.” The model is optimized to answer the prompt, not to check whether the prompt makes sense.
And that is exactly the point. These systems are incredible at executing against whatever objective you hand them. They will not tap you on the shoulder and ask, “Are we sure this is the right hill to climb?”
The New Bottleneck: Clarity
This is why I am so unconvinced that “getting good at AI prompts” is the real game. The valuable skill (which took me way too long, especially as a philosophy student, to internalize) is asking, over and over:
What am I actually trying to do here?
If this worked perfectly, would I even care?
Am I just making progress on something easy to count?
With LLMs, agents, APIs, and a ridiculous amount of tooling, you can now throw an alarming amount of force at almost any task. So the bottleneck has quietly shifted from “Can I execute this?” to “Is this even the right thing to execute?”
The new scarce resource is clarity. Clarity about what actually matters, and what is just there because it fits nicely in a spreadsheet.
It is ironic, but perhaps not surprising, that we are so enamored with a technology that shares our favorite failure mode. Humans and AI are both extremely good at optimizing whatever we are pointed at. The only difference is that AI does it faster and with nicer formatting. That does not make clear thinking obsolete. If anything, it makes it the most important thing. The more power we have to execute, the costlier it is to point that power at the wrong hill.
AI is amazing at getting to an answer. Our job is still to understand what the question is and what we are actually trying to do.
Right now we are very good at building Deep Thoughts that can give us “42” on command. The hard part is still being honest about what we are really asking.
More simply:
Tools are no longer the leverage. Clarity is.

