My week with Claude Code – hard work and gotchas
I’m a firm believer in the idea that you have to use a tool to fully critique it. With that in mind I committed to using Claude Code for an entire week, putting it through its paces to see what it could – and couldn’t – do.
First, the numbers: I used Claude Code for 62 hours. By Claude's own estimate I completed 344 human-equivalent hours of work, a full four and a half months' worth. This wasn't asking Claude for a brownie recipe; it was straight DevOps work managing WPConcierge, much of which had been deferred for a long time.
This is my story of what Claude did, what Claude got wrong, and how I see AI working going forward.
The Setup
I’m a solo founder working on a WordPress hosting service called WPConcierge. We host dozens of clients on infrastructure that grew organically – no Terraform, no scripting, just point-and-click inside the AWS user interface. I’m looking to grow the business substantially, but I know to do that I need tooling.
Ultimately my code spans twelve different repositories: everything from Terraform for the new setup to must-use plugins that I install on every client site. This wasn't greenfield work either: I gave Claude legitimate half-finished projects, some of them already in production. My goal was to use Claude aggressively and see how much work I could complete, and how good the outcome was.
What Went Right
Claude was very effective at turning API endpoints into actionable items. On February 26th I went through all my domain names in DNSimple and turned off auto-renewal for the ones I no longer want, saving more than $2,000 per year. That alone would have made Claude worth its salt, but I didn't stop there. We pushed on, analyzing Elastic File System bursting credits and adjusting throughput to match cluster needs.
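The auto-renewal cleanup is a good example of how little code this kind of work actually takes. Here is a minimal sketch of the DNSimple v2 call involved; the account ID, domain names, and token are placeholders, and in practice Claude generated and ran the equivalent for me:

```python
import urllib.request

API_BASE = "https://api.dnsimple.com/v2"

def disable_auto_renewal_request(account_id: str, domain: str, token: str) -> urllib.request.Request:
    """Build the DNSimple v2 request that turns off auto-renewal for one domain.

    DNSimple models auto-renewal as a sub-resource of the registered domain;
    sending DELETE to it disables renewal without touching the registration.
    """
    url = f"{API_BASE}/{account_id}/registrar/domains/{domain}/auto_renewal"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )

# To apply it, loop over the domains you want to let lapse (names are examples):
# for domain in ["old-client-site.com", "abandoned-idea.net"]:
#     urllib.request.urlopen(disable_auto_renewal_request("1234", domain, TOKEN))
```

One request per unwanted domain, and the renewals stop; the savings come from simply never paying for those registrations again.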
Many days looked similar: pop open Claude Code and give it instructions, architecting and deciding as I went along. I wasn't a passive observer in any task: I was actively writing instructions, guiding decisions, and asking Claude to defend its positions. The code may be Claude's, but the architecture and design decisions are fully mine. I think this is the best way to work with AI, because it leaves a human in the driver's seat.
When things went wrong, they went very wrong.
Unfortunately, Claude Code’s outputs are only as good as its inputs. And that became very apparent when a change I made caused production to go down.
Remember that I asked Claude for a throughput measure? It gave me one, for steady state. My architecture is predicated on the notion that EFS is available to an unknown number of pods running sites at any given time. It also runs on spot infrastructure to save costs: the autoscaling group tries to provision spot capacity and falls back to on-demand when it can't.
Meanwhile, AWS started hitting the spot market hard. My workloads were consistently evicted, which meant spot restarts. Those restarts are cold: each pod pulls the entire plugin architecture on first load, and that overwhelmed my steady-state throughput and even the bursting throughput AWS assigned. The result was failed health checks, which triggered yet another restart while the unhealthy pod was cleaned up. And then the process repeated.
I had considered only the steady-state throughput. I had not considered a mass eviction that restarted ten services at the same time. And it was causing a cascade failure.
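The back-of-the-envelope check I should have run before trusting a steady-state number is tiny. The figures below are illustrative, not my real measurements, but they show the shape of the failure:

```python
# Hypothetical numbers to illustrate the cascade, not real measurements.
STEADY_STATE_MIBPS = 5        # what the warm sites need from EFS
PLUGIN_PAYLOAD_MIB = 600      # data a pod pulls from EFS on a cold start
ACCEPTABLE_START_SECS = 120   # time a pod has before health checks fail

def cold_start_demand(pods_restarting: int) -> float:
    """Aggregate MiB/s the file system must serve when this many pods
    cold-start at the same time."""
    return pods_restarting * PLUGIN_PAYLOAD_MIB / ACCEPTABLE_START_SECS

# One pod restarting adds 5 MiB/s on top of steady state: easily absorbed.
# Ten pods evicted together demand 50 MiB/s, ten times the baseline.
for pods in (1, 10):
    print(pods, cold_start_demand(pods))
```

A single eviction looks like noise next to steady state; a mass eviction multiplies the cold-start cost by the pod count, which is exactly the scenario a steady-state recommendation never sees.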
AI was also of limited help in diagnosing the problem. It correctly detected that database connections exceeded the available pool and recommended a bigger database box (which I provisioned). That didn't fix the problem. It wasn't until I decided to check whether we were burning through bursting credits on EFS that the cause became apparent. And when I asked AI whether we were evicting spot instances, the problem came into clear focus for both of us.
The fix was simple: we bumped provisioned throughput to 50 MiB/s, leaving headroom to burst above even that, and restarted the workloads in batches. But the consequences lingered: I had to tell customers that we'd experienced our first outage ever, and that it had lasted almost two hours.
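For the curious, the throughput bump itself is a one-liner against the AWS CLI. This sketch builds the command rather than executing it (the file system ID is a placeholder), which is how I prefer to review anything Claude proposes before it runs:

```python
def bump_efs_throughput_cmd(fs_id: str, mibps: int) -> list[str]:
    """Build the AWS CLI invocation that switches an EFS file system to
    provisioned throughput mode at the given rate."""
    return [
        "aws", "efs", "update-file-system",
        "--file-system-id", fs_id,
        "--throughput-mode", "provisioned",
        "--provisioned-throughput-in-mibps", str(mibps),
    ]

# Review the command, then run it, e.g.:
# subprocess.run(bump_efs_throughput_cmd("fs-0123456789abcdef0", 50), check=True)
```

Batching the restarts was the other half of the fix: roll a few workloads, wait for them to go healthy, then roll the next group, so cold starts never stack up against the file system all at once.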
In defense of Claude’s recommendation
The truth is that Claude didn’t fail in making its recommendation. In fact it was right: the steady-state called for what it prescribed. The problem was that it didn’t have the context or the ability to fully consider the consequences of that decision in the scope of the full picture. It answered the question I asked it – correctly – but it didn’t answer the broader question that needed asking.
This is a limit of the tool, not a flaw. The flaw, if there was one, was in my inability to consider all the potential side effects and ask AI to evaluate them. Had I done that it surely would have provided a different recommendation. And that’s the point: I am the person in charge of the tool, not the other way around. I have to be responsible for what it does, no matter how well it persuades me that my action is correct.
The limits and the benefits
As already discussed, AI isn't perfect, omniscient, or clairvoyant. It can't know what you don't tell it, and it can't make decisions without the data and context it needs.
Every session with Claude starts from zero. This means you have to teach it what it needs to know. I did this by adding CLAUDE.md files and writing docs; I told it where to find the information. Over the course of the week I wrote twelve skills that it used to solve problems on my behalf, with my direct and constant guidance and supervision. I didn't spend sixty-plus hours watching AI churn. I spent them designing and letting something else implement.
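The project memory I mean is nothing exotic. A CLAUDE.md at the repo root is just a short orientation file; this is an illustrative sketch with made-up paths, not my actual file:

```markdown
# WPConcierge infra

- Terraform lives in `terraform/`; never apply without showing me the plan.
- Client must-use plugins live in `mu-plugins/`, one directory per plugin.
- AWS access goes through the `wpconcierge` CLI profile.
- Runbooks for recurring tasks are in `docs/runbooks/`; check there first.
```

A few lines like these, plus real docs behind them, are the difference between re-explaining the setup every session and getting straight to work.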
Claude is genuinely good at grunt work. I tagged 171 resources in AWS in a matter of minutes using the API, where writing the script myself would have taken hours of trial and error. I created first drafts of commands, configs, and more, then reviewed them and demanded changes. It's good at following architectural patterns that I define, which requires some skill on my part to articulate. And AI didn't get tired and make focus-based mistakes; the limit was my own comprehension and focus.
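The tagging run is representative of that grunt work. The AWS Resource Groups Tagging API accepts at most 20 ARNs per TagResources call, so the only real logic is batching; the tag key and value below are examples, not my actual scheme:

```python
def chunk_arns(arns: list[str], size: int = 20) -> list[list[str]]:
    """Split a list of ARNs into batches TagResources will accept
    (the API caps each call at 20 ARNs)."""
    return [arns[i:i + size] for i in range(0, len(arns), size)]

# With boto3 (not imported here), each batch becomes one API call:
# client = boto3.client("resourcegroupstaggingapi")
# for batch in chunk_arns(all_arns):
#     client.tag_resources(ResourceARNList=batch, Tags={"client": "wpconcierge"})
```

For my 171 resources that's nine calls, done in minutes, which is exactly the kind of tedious-but-trivial scripting I'm happy to delegate.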
The bottom line
With 62 hours I was able to do four and a half months of work. That's an incredible output, and it reflects the power of the tool. Thorough testing was required to ensure the tool did the right things, but it was good at them. It also suggests that development can shift away from code-as-value and toward design-as-value. Code talks to computers. People talk to people.
I'm a solo founder, which means I have to streamline my work and punch above my weight. By building my skills in Claude, I ensured that the tasks I repeat over and over are streamlined and have established procedures, so we're not reinventing the wheel at every session start.
But at the end of the day, AI is not a substitute for human judgement. Humans need to run the tool, guide the process, and stand strong against persuasive-sounding rhetoric that is illogical even if well-presented. This is hard – and we have to become better at it.
Of course, I still have ethical concerns about AI. I see its impacts on the environment and the theft of copyrighted material. I worry about developers who are being displaced by its use in corporations. And I, like everyone else, see the consequences on the job market. AI isn’t perfect. No tool is. Will it get better, more efficient, less disruptive and power-hungry? Sure. That day is not today, and we need to acknowledge it.
Still, Claude Code generally pulled code from other places and docs that I would have ordinarily read and then implemented. It did the same process I would do, just faster. It made mistakes, self-corrected, and eventually got to the right answer. And with a decent amount of the work done locally with Bash commands, the power consumption was lower than asking it to develop a recipe for avocado toast.
What’s next
Will I incorporate AI into my workflow? Undoubtedly yes. It makes me faster, more agile, and able to command a vast amount of resources. While writing this I had Claude work in the background on nine well-scoped issues, all of which completed in the time it took me to write this blog post. But I still wrote the blog post myself. I refuse to relinquish creative pursuit to a chatbot that has an attitude.
I look forward to AI engineers solving the environmental and social concerns with the tool. But waiting for them to do it before learning how to use the tool handcuffs our ability to be on the forefront of technical advancement. AI isn’t going away. It won’t go back in the box because we don’t like it. We have to adapt, or be left behind.
AI will continue to evolve. It will make mistakes, and we must correct them. But as a tool, it’s come a long way. And that is worth exploring.
