
I tried Claude Code for a week. Here's what actually changed.

Not the 10x velocity story you've heard. A week-long, honest log of the wins, the dead ends, and the one moment where the tool genuinely thought more carefully than I would have.


Published May 17, 2026


I work on a small dev team at a Series A startup. Three engineers, one designer, a backlog longer than any of us want to talk about. I'd been seeing the Claude Code demos for months and rolling my eyes a little. Every six weeks there's a new AI coding tool that's going to change everything, and most of them end up in the same drawer as my ergonomic mouse.

A friend told me to just try it for a week before forming an opinion, so I did.

Monday

I installed it before standup. Standup ran long. I forgot it was installed until the afternoon, when I sat down to fix a test that had been failing intermittently in CI for two weeks.

Day	Task	What actually happened
1	Setup + first PR	Slow to ramp on our codebase conventions. Helpful once it had.
2	Bug triage	Surfaced 2 real bugs in test files I'd been ignoring for weeks.
3	Race-condition fix	Suggested a defensive lock I hadn't considered. Shipped it. Held up.
4	Refactor (one)	I overrode its recommendation. Was wrong. Hit prod 8 hours later.
5	Docs	Wrote a fine first draft. Tone needed a heavy hand.
6	Friday cleanup	Caught 3 dead imports, 1 stale TODO, 1 typo in a config key.
7	Postmortem	The race-condition save was worth the week. Net positive.

Day-by-day log of the experiment. Not a velocity story, but a careful-thinking story.

The test was for an email-rendering helper that occasionally returned the wrong template. I'd looked at it twice and given up because the bug only happened in CI, never on my machine. I typed: "this test fails about 10% of the time in CI, can you find the race condition?"

It found it in about forty seconds. The helper read the locale from process.env, and an earlier test in the same file mutated that env. I'd genuinely lost two hours to this last month.
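Here's a minimal sketch of that failure mode. It is not our actual helper: `pickTemplateFromEnv`, `pickTemplate`, `withEnv`, and the `APP_LOCALE` variable are all hypothetical names. The fragile version reads global state at call time, so its answer depends on whatever an earlier test left in `process.env`. Passing the locale in explicitly, or restoring the environment after a test touches it, removes that coupling.

```typescript
// Hypothetical template table; the names are illustrative only.
const TEMPLATES: Record<string, string> = {
  en: "Welcome!",
  fr: "Bienvenue !",
};

// Fragile version: reads process.env at call time, so the result depends
// on whatever an earlier test in the same process left behind.
function pickTemplateFromEnv(): string {
  const locale = process.env.APP_LOCALE ?? "en";
  return TEMPLATES[locale] ?? TEMPLATES.en;
}

// Safer version: the caller passes the locale, so there is no shared state.
function pickTemplate(locale: string = "en"): string {
  return TEMPLATES[locale] ?? TEMPLATES.en;
}

// If a test really must touch process.env, set the keys it needs and
// restore each one (or delete it) afterwards, even if the test throws.
function withEnv<T>(vars: Record<string, string>, fn: () => T): T {
  const previous = new Map<string, string | undefined>();
  for (const key of Object.keys(vars)) {
    previous.set(key, process.env[key]);
    process.env[key] = vars[key];
  }
  try {
    return fn();
  } finally {
    previous.forEach((value, key) => {
      if (value === undefined) delete process.env[key];
      else process.env[key] = value;
    });
  }
}
```

In a Jest or Vitest suite, the same save-and-restore idea usually lives in `beforeEach`/`afterEach` hooks rather than a wrapper function.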

That's the first time I felt the tool was real.

Tuesday

I tried it on a feature I understood: a CSV export endpoint for our admin panel. I gave it the route layout and the auth middleware and asked it to write the handler.

It wrote a working handler. It also added pagination I didn't ask for, a streaming response so we wouldn't OOM on 100k-row tables, and a unit test that actually exercised the streaming behaviour.

I read it carefully. It was better than what I would have written. The streaming choice in particular was something I'd have skipped on the first pass and rewritten a week later when an account manager complained about a download timing out.
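For anyone who hasn't built one, this is roughly what the streaming idea looks like. It's a sketch, not the code it wrote: the `Row` shape, the `FetchPage` paginator, and the `csvChunks`/`csvStream` names are hypothetical, and the page fetcher is synchronous to keep the example short (a real database call would be async, and an `async function*` plugs into `Readable.from` the same way). The point is that only one page of rows sits in memory at a time.

```typescript
import { Readable } from "node:stream";

// Hypothetical row shape for the admin export.
type Row = { id: number; email: string; note: string };

// Hypothetical paginator: returns up to `limit` rows starting at `offset`.
type FetchPage = (offset: number, limit: number) => Row[];

// Quote a field (RFC 4180 style) if it contains a comma, quote, or newline.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Yield the CSV one page at a time, so memory use is bounded by
// `pageSize` rows rather than the size of the whole table.
function* csvChunks(fetchPage: FetchPage, pageSize = 1000): Generator<string> {
  yield "id,email,note\r\n";
  for (let offset = 0; ; offset += pageSize) {
    const rows = fetchPage(offset, pageSize);
    if (rows.length === 0) return;
    yield rows
      .map((r) => [r.id, r.email, r.note].map(csvField).join(",") + "\r\n")
      .join("");
    if (rows.length < pageSize) return; // short page means we hit the end
  }
}

// Wrap the generator in a Node Readable that an HTTP response can consume.
function csvStream(fetchPage: FetchPage, pageSize?: number): Readable {
  return Readable.from(csvChunks(fetchPage, pageSize));
}
```

In a Node HTTP or Express handler, something like `csvStream(fetchPage).pipe(res)`, after setting a `text/csv` content type, sends rows as they're produced instead of buffering 100k of them first.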

This was the second moment where it surprised me. It didn't just type faster. It thought more carefully than I would have, on a task I knew well.

Wednesday

Burned three hours on a Stripe webhook handler. I gave it incomplete context about our existing webhook router and it confidently built around the wrong abstraction.

This is the failure mode nobody talks about enough. When you don't deeply understand the surface you're touching, you don't know what context to give the agent, and the agent will confidently build the wrong thing. The lesson isn't "AI is bad at Stripe". The lesson is the same one we've known for fifteen years: shortcut-driven development falls apart the moment the shortcut leaves you somewhere you can't navigate alone.

Thursday

A senior dev on another team pinged me about a query taking 12 seconds. I pasted it in and asked for index suggestions.

It suggested two. One was wrong (the path was already covered by an existing index). One dropped the query to 30ms.
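One common way a suggested index turns out to be redundant, assuming B-tree indexes (the default in Postgres and MySQL): a composite index on `(a, b)` already serves lookups on `a` alone, because the planner can use its leftmost columns as a prefix. Whether that was the exact overlap here is an assumption; the checker below, with hypothetical names `IndexDef`, `isCoveredBy`, and `redundantSuggestions`, just illustrates the rule.

```typescript
// Hypothetical index description: table plus ordered column list.
type IndexDef = { table: string; columns: string[] };

// True if `existing` begins with every column of `proposed`, in order.
// Under the leftmost-prefix rule for B-tree indexes, such a proposal adds
// no new lookup path. (This ignores uniqueness, partial indexes, and
// index-only scans, which can still justify a separate index.)
function isCoveredBy(proposed: IndexDef, existing: IndexDef): boolean {
  if (proposed.table !== existing.table) return false;
  if (proposed.columns.length > existing.columns.length) return false;
  return proposed.columns.every((col, i) => existing.columns[i] === col);
}

// Return the suggestions that some existing index already covers.
function redundantSuggestions(suggested: IndexDef[], existing: IndexDef[]): IndexDef[] {
  return suggested.filter((p) => existing.some((e) => isCoveredBy(p, e)));
}
```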

It felt the way pair-programming with a really experienced senior used to feel. Half of what they suggest, you already know. The other half makes you better. The trick is knowing the difference.

The win wasn't speed; it was noticing. The AI flagged things I'd stopped seeing in my own code. I kept the tool. I didn't keep all of its suggestions.

Friday

I asked it to review a utility for normalising user-uploaded URLs that I'd refined over six months and was quietly proud of.

It suggested seven improvements. Five were genuinely good. Two were wrong in subtle ways that would have introduced real bugs. I accepted three immediately, sat with two more, and rejected the rest.
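For context, a URL normaliser of this kind usually looks something like the sketch below. This isn't the utility from the review: `normaliseUrl`, `tryParseUrl`, and every policy in it (requiring a scheme, rejecting non-web schemes, dropping fragments, sorting query parameters, trimming trailing slashes) are assumptions for illustration. It leans on the WHATWG `URL` class built into Node, which already lowercases the scheme and host, drops default ports, and resolves `.` and `..` path segments.

```typescript
// Hypothetical helper: parse, or return null instead of throwing.
function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Hypothetical normaliser for user-submitted links. Assumes input includes
// a scheme; "example.com" without one fails to parse and is rejected.
function normaliseUrl(input: string): string | null {
  const url = tryParseUrl(input.trim());
  if (url === null) return null;

  // Only allow web URLs; rejects javascript:, data:, file:, etc.
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;

  // Fragments never reach the server, so drop them.
  url.hash = "";

  // Sort query parameters so equivalent links compare equal.
  if (url.search !== "") url.searchParams.sort();

  // Policy choice: treat "/docs/" and "/docs" as the same page.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.replace(/\/+$/, "");
  }
  return url.toString();
}
```

The trailing-slash rule is exactly the kind of change that can be subtly wrong: some servers treat `/docs` and `/docs/` as different resources.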

I went home thinking about whether I'd still be the kind of engineer who can tell the difference in five years. I think I am today. The gap between "engineer who knows when AI is wrong" and "engineer who has been replaced by AI" is mostly about who keeps writing enough code by hand to maintain that judgement.

What actually changed

Three things, honestly.

My personal velocity on understood problems went up maybe 30 to 40 percent. Not 10x. Real engineering work has too many decisions in it for that.

My patience for boilerplate dropped to zero. I now feel mildly insulted when I have to write a CRUD endpoint by hand. It's like being asked to compile your own code.

I started reviewing my own work more carefully, because I had to review the agent's work carefully. This is the real surprise. The tool made me a more disciplined reader of code, which I think makes me a better writer of it.

What didn't change

I still spend most of my day in meetings, debugging, and arguing about what we're actually building. AI doesn't help with any of that yet. The hard parts of software are still mostly human-shaped.

Next week I'll keep using it. I'll also keep writing some code by hand, because the engineers worth hiring in two years are the ones who can switch between modes.