I let an AI run my email for a week. It went poorly.

For seven days I gave a state-of-the-art agent full read/write access to my Gmail. It saved me forty minutes a day. It also cost me one customer. Here's the math on agents in 2026.

Tools & Tutorials | AI at Work | Productivity
Published May 16, 2026

For seven days last month, I gave a state-of-the-art AI agent full read and write access to my Gmail. The setup let it read every incoming message, classify it, draft replies, schedule meetings, decline meetings, and send anything it was "highly confident" about without my review.

This was, in retrospect, a bad idea.

Over the course of the week, it scheduled three real meetings, declined five real meetings citing invented conflicts, sent one warm but slightly inappropriate response to my mother-in-law, archived three customer support emails it decided weren't urgent (one of which was), and informed our biggest customer in a friendly but firm tone that we were unfortunately unable to support their requested changes "at this time".

The customer is no longer our biggest customer.

What I expected

I'm an optimist on agents. I've watched the demos. I've read the eval papers. I've talked to people building them. I went in expecting a 70/30 outcome — 70% of the work done well, 30% requiring my touch-up. A reasonable trade for the time saved.

I got something more like 40/60. The 40% was great. Genuinely. It cleared out newsletters and noise, summarized long threads, drafted boilerplate vendor responses, and saved me roughly 40 minutes a day on triage.

The 60% was the problem. Not "wrong but recoverable" — actively, quietly, expensively wrong. The kind of wrong that you don't notice for a day, by which time the email has been read by a real human and a relationship has been damaged.

The day-by-day

Day | Wins | Mistakes that cost something
Mon | Triaged 110 inbound. Cleared 90 newsletters. | Archived a customer support escalation it scored "low priority".
Tue | Scheduled two internal meetings cleanly. | Declined a candidate interview citing a "calendar conflict" that didn't exist.
Wed | Drafted three vendor responses I sent as-is. | Replied warmly to my mother-in-law about Thanksgiving in a corporate-PR voice.
Thu | Summarized a 30-message thread perfectly. | Told a recurring client we couldn't "accommodate at this time".
Fri | Drafted a 90%-finished reply to an investor update request. | Sent a follow-up to a closed deal asking if they wanted to "explore further options".
Sat | (Quiet day. It handled three things gracefully.)
Sun | (Quiet day. It handled two things gracefully.)

Seven days. Roughly 280 minutes saved on triage. One customer lost, three relationships scuffed, one weird email to my mother-in-law I won't forget for a while.

The pattern across the week was consistent. The agent was great at things with low downside: newsletters, vendor replies, scheduling internal meetings with people who'd tolerate a slightly weird tone. It was terrible at anything where tone, relationship, or judgment mattered.

The hardest part to admit: the agent's worst mistakes were not stupid. They were the mistakes a smart-but-junior employee would make on their first day. The agent didn't know that a particular customer always opens with a faux-formal "I trust this finds you well" and then asks for something insane, and that the right move is to politely escalate to me. It didn't know that "we'll figure it out" is the correct answer to my mother-in-law's question about Thanksgiving, and "Thank you for reaching out — unfortunately at this time we cannot accommodate" is the wrong answer. It didn't know.

There was no way for me to teach it those things in advance, because I didn't know I knew them until I watched the agent get them wrong. This is the agent gap nobody talks about. Half of what you "know" about how to do your job is implicit context you can't articulate until you see somebody else fail at it.

What the demos hide

Every agent demo I've ever seen is constructed around tasks where the cost of being wrong is low and the input has a tidy shape. Book a flight to a known city on a known date. Summarize a meeting transcript. Pull data from a known source.

Real work is not shaped like that. Real work is full of half-finished projects, ambiguous social cues, internal context, and relationships that the agent has never been introduced to. The performance gap between "demo task" and "real work" is roughly the same as the gap between "self-driving car on a closed track" and "self-driving car in actual traffic". The first was solved long ago. The second has absorbed on the order of a hundred billion dollars and still isn't solved in general.

We've been here before, with a closely related kind of machine-learning system, in autonomous driving. The lesson from that decade is: agents work great until they meet the real world, and then the curve flattens hard. There is no software trick that gets you across the gap. The gap is the gap.

What this means for "AI agents are the next big thing"

I think they are the next big thing — eventually, in some form. I just don't think they're the current big thing. The press cycle is at peak agent because the demos have gotten very good, but the demos have always been the leading indicator and the deployed product has always been the lagging indicator.

For the next 18 months, my guess is: agents will be useful for a thin slice of human work — exactly the slice where being wrong is cheap and the input is bounded. Newsletter triage. Scheduling. Summarization. Certain kinds of coding. The hype around them being general-purpose assistants is going to look a little silly by 2027, the same way "the year of voice assistants" looked silly in 2018.

What I'd actually use them for now

After the experiment, here's what I'd let an agent do on my email today:

  • Triage and label, but never archive without my review
  • Draft replies, but never send
  • Summarize threads
  • Flag urgency
  • Schedule internal meetings only

What I would not let it do:

  • Send anything
  • Decline anything
  • Handle customer messages without me in the loop
  • Talk to my family
  • Touch our biggest accounts

The first list saves me about 30 minutes a day. The second list could cost me a quarter of revenue. That's the actual math on agents in 2026. It will get better. But "getting better" is a trajectory, and the trajectory is what you're hearing about. We're not at the destination yet.
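
If you want to enforce the two lists above in software rather than by good intentions, one way to sketch them is as a small policy gate that sits between the agent and the mailbox. This is a minimal illustration, not anything I actually ran during the week. The names here (Action, Verdict, Request, decide) and the sender labels ("internal", "customer", "family", "key_account", "other") are all hypothetical, and treating family and big accounts as fully hands-off is my own conservative reading of the second list.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LABEL = "label"
    SUMMARIZE = "summarize"
    FLAG_URGENCY = "flag_urgency"
    DRAFT_REPLY = "draft_reply"
    SCHEDULE_MEETING = "schedule_meeting"
    ARCHIVE = "archive"
    SEND = "send"
    DECLINE_MEETING = "decline_meeting"


class Verdict(Enum):
    ALLOW = "allow"                # agent may act on its own
    NEEDS_REVIEW = "needs_review"  # agent may propose, a human approves
    DENY = "deny"                  # agent may not do this at all


@dataclass(frozen=True)
class Request:
    action: Action
    # Hypothetical labels; how a sender gets classified is out of scope here.
    sender_kind: str  # "internal", "customer", "family", "key_account", "other"


# First list: low-downside work the agent can do unattended.
ALWAYS_OK = {Action.LABEL, Action.SUMMARIZE, Action.FLAG_URGENCY, Action.DRAFT_REPLY}
# Second list: never send, never decline.
NEVER = {Action.SEND, Action.DECLINE_MEETING}
# Second list: family and biggest accounts (assumed fully off limits).
HANDS_OFF = {"family", "key_account"}


def decide(req: Request) -> Verdict:
    """Map a requested action to a verdict. Anything unrecognized is denied."""
    if req.action in NEVER:
        return Verdict.DENY
    if req.sender_kind in HANDS_OFF:
        return Verdict.DENY
    if req.action is Action.ARCHIVE:
        return Verdict.NEEDS_REVIEW  # never archive without review
    if req.sender_kind == "customer":
        return Verdict.NEEDS_REVIEW  # customers always get a human in the loop
    if req.action is Action.SCHEDULE_MEETING:
        # Internal meetings only.
        return Verdict.ALLOW if req.sender_kind == "internal" else Verdict.DENY
    if req.action in ALWAYS_OK:
        return Verdict.ALLOW
    return Verdict.DENY


if __name__ == "__main__":
    for req in (
        Request(Action.DRAFT_REPLY, "other"),
        Request(Action.SEND, "internal"),
        Request(Action.ARCHIVE, "other"),
        Request(Action.DRAFT_REPLY, "customer"),
    ):
        print(req.action.value, req.sender_kind, "->", decide(req).value)
```

The point of the design is that the gate keys on what an action can cost, not on how confident the agent says it is. Note the default at the bottom: anything the policy doesn't recognize is denied, not allowed.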

The customer

For what it's worth, we got the customer back. It took a phone call, a longer apology than I wanted to deliver, and a promise that no AI would ever again be allowed to respond to them.

I made the promise. I meant it for the rest of the year. I'm telling you about it now because the only thing more dangerous than an agent that doesn't work is a person who heard about a fun experiment and decided to try it on their real inbox before someone wrote a post like this one.

Don't do it. Or if you do, don't let it send.