My experience coding with AI tooling
I’m a big believer in the notion that you can’t really critique something until you’ve tried it. With that in mind, I spent some time over the holiday break working with AI on a couple of projects. Here are the results of my experiment and my thoughts on AI as a tool for developers.
The Setup
I had two tasks in mind. First, I wanted AI to help me write some scripts to build Docker images for my business, WPConcierge. Second, I wanted help with Kubernetes configuration on AWS for the same business. I understood that these tasks might be specific to my setup, but I was prepared to draft a proper prompt and instruct the AI in exactly what I wanted.
I used both Claude Code’s Opus 4.5 and ChatGPT’s GPT-5.1. Both are “optimized for coding,” according to their makers. I wanted to test the latest models and evaluate whether they were truly suitable for coding.
I was careful to ask questions that, given enough time and patience, I could have answered on my own by reading documentation and/or through trial and error. This way, I could properly evaluate the responses and determine whether they were accurate.
ChatGPT and Docker Builds
The Prompts
I asked ChatGPT to work on the Docker build script: create a script that takes a comma-separated list of PHP versions and a comma-separated list of WordPress versions, and builds a matrix of Docker images from those combinations. If no flags were passed, it would build only the latest versions of PHP and WordPress (as defined in the script’s configuration). A --debug flag would build a debugging version of each image.
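As a sketch of what I was asking for, a minimal version of that interface might look like the following Bash. The `--php` and `--wp` flag names and the default version numbers are my own illustrative assumptions, not ChatGPT’s output, and this version echoes the build commands rather than invoking Docker:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Defaults when no flags are passed: build only the latest versions.
# (These version numbers are placeholders standing in for the script's config.)
PHP_VERSIONS="8.4"
WP_VERSIONS="6.8"
DEBUG=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --php)   PHP_VERSIONS="$2"; shift 2 ;;
    --wp)    WP_VERSIONS="$2";  shift 2 ;;
    --debug) DEBUG=1;           shift   ;;
    *) echo "Unknown flag: $1" >&2; exit 1 ;;
  esac
done

# Explode the comma-separated lists into arrays.
IFS=',' read -ra PHP_LIST <<< "$PHP_VERSIONS"
IFS=',' read -ra WP_LIST  <<< "$WP_VERSIONS"

# Build the full matrix: every PHP version against every WordPress version.
for php in "${PHP_LIST[@]}"; do
  for wp in "${WP_LIST[@]}"; do
    tag="wordpress:${wp}-php${php}"
    if [[ "$DEBUG" -eq 1 ]]; then
      tag="${tag}-debug"
    fi
    # Dry run: echo the command instead of calling docker directly.
    echo "docker build --build-arg PHP_VERSION=${php} --build-arg WP_VERSION=${wp} -t ${tag} ."
  done
done
```

Run as `./build.sh --php 8.2,8.3 --wp 6.7,6.8 --debug`, it would print four build commands, one per combination, each tagged with a `-debug` suffix.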
The Result
ChatGPT was able to produce a script that would build the default version, but it wasn’t able to produce a working script that handled the --debug flag: passing the flag had no effect. It also had trouble splitting the lists of PHP and WordPress versions, often defaulting to the first listed version and ignoring the rest.
ChatGPT chose to write a Bash script, which was fine. It didn’t ask me what language I wanted, but it correctly deduced that I wanted the script to be portable across operating systems (I use a Mac, but it would work just fine on Linux). It took me about three hours of editing to get the script running correctly, and I estimate that starting from ChatGPT’s version saved me about an hour of coding. Roughly 90 minutes of that time went to fixing bugs and edge cases ChatGPT hadn’t handled; I might have spent that time anyway had I coded from scratch, though some of the bugs were silly mistakes I likely wouldn’t have made.
The Drawbacks
ChatGPT followed my directions exactly, but it didn’t do anything a senior engineer would do: it didn’t propose alternatives like Docker Bake, it didn’t account for edge cases and corner cases, and it most certainly didn’t produce working code for the happy path on the first pass.
Ultimately, I feel that using ChatGPT led me to spend more time on the project, not less. After some research I discovered that Docker Bake did exactly what I wanted, and I was able to write a Bake configuration in about 90 minutes. That is half the time I spent customizing the script ChatGPT produced, and as much as two and a half hours less than writing the script from scratch.
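As a sketch of why Bake fit so well: a matrix build like the one I wanted is nearly declarative in a Bake file. The image names, versions, and build args below are placeholders for illustration, not my actual WPConcierge configuration:

```hcl
# docker-bake.hcl: build every PHP x WordPress combination as one target group.
target "wordpress" {
  # Bake expands this matrix into one build per combination.
  matrix = {
    php = ["8.2", "8.3"]
    wp  = ["6.7", "6.8"]
  }
  # Target names may not contain dots, so rewrite them.
  name       = "wordpress-php${replace(php, ".", "-")}-wp${replace(wp, ".", "-")}"
  dockerfile = "Dockerfile"
  args = {
    PHP_VERSION = php
    WP_VERSION  = wp
  }
  tags = ["wpconcierge/wordpress:${wp}-php${php}"]
}
```

A single `docker buildx bake wordpress` then builds all four images in one invocation.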
The Grade
Ultimately I have to give ChatGPT credit: it did produce a working prototype that I could edit and modify. Where it fell flat was in evaluating the entire problem, making alternative recommendations, and developing production-ready code. If I had shipped what I was given, I would have been sorely disappointed.
I give ChatGPT a C- for its efforts to produce a working script.
Claude and Kubernetes
The Background
I have a fairly complicated Kubernetes setup involving Amazon Web Services, Karpenter, Flux, and DNS/load balancer management. This creates an environment better suited to a human mind that can see the big picture, but I wanted to see what Claude could do with an advanced setup. I made sure to explain the setup to Claude before we started, and I let it explore the code locally through its code feature, which allowed it to internalize and reason about what I had already built.
The Prompts
The initial set of prompts was to help me get the cluster running on AWS. There’s a bootstrapping problem on AWS: you can’t use IAM roles with service accounts until you install the supporting driver, but that driver runs on the very nodes that can’t yet communicate with AWS. I shared this problem with Claude and asked it to help solve it.
Once the cluster was running, I was having trouble with Karpenter not automatically deploying resources, and I was also having trouble with understanding why it couldn’t. I provided Claude with the output of the logs and the various commands that I was familiar with. I asked it to provide troubleshooting steps for Karpenter and Kubernetes on AWS.
The Results
Claude wasn’t very effective at solving either problem on the first attempt, but it was effective at remembering the state of the conversation across my prompts and feedback. Specifically, it failed to identify the setting needed to get the cluster running until I found it and shared it myself. Likewise, Claude was unable to deduce that the reason Karpenter wasn’t working was that its new nodes were being launched with an encrypted disk that Kubernetes couldn’t read, causing them to shut down. Since Karpenter couldn’t spin up additional nodes, it couldn’t manage the load.
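For the record, the knob in question lives in Karpenter’s EC2NodeClass block device mappings. Here’s a sketch of the relevant stanza, with placeholder names, sizes, and KMS key ARN, and assuming (as I believe was true in my case) that access to the encryption key was the missing piece:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        encrypted: true
        # With encryption on, the node's instance role must be granted
        # use of this key in KMS; otherwise nodes launch with a volume
        # they can never read and shut themselves down.
        kmsKeyID: arn:aws:kms:us-east-1:111122223333:key/placeholder
```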
Ultimately Claude was able to reason about the problems after I discovered their root causes. Once I gave it enough context it could tell me what the next steps were, but by that point those steps were obvious to me. Claude did provide some helpful commands for sussing out the problem (pulling logs, inspecting resources, and so on). Unfortunately, it wasn’t very effective at reasoning about the problem and seeing the whole-systems picture, even with the entire code base in its context.
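For concreteness, the helpful commands were along these lines (the `karpenter` namespace and deployment name are assumptions about a typical installation, not necessarily my exact setup):

```shell
# Karpenter controller logs: why provisioning decisions were (or weren't) made.
kubectl logs -n karpenter deployment/karpenter

# NodeClaims show the nodes Karpenter has launched and their lifecycle state.
kubectl get nodeclaims -o wide

# Pending pods, plus recent events explaining why they can't schedule.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl get events --sort-by=.metadata.creationTimestamp
```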
The Grade
I enjoyed working with Claude more than I enjoyed working with ChatGPT, and that made a difference to me. I also took into account that Claude got a harder problem, with less clear-cut edges. And it was able to reason about the problem once I had an inkling of the issue, something ChatGPT couldn’t do even when prompted about the mistakes it had made.
Ultimately I award Claude a C+ for its efforts to resolve the Kubernetes bugs and help me stand up a cluster. Again, this is based on the fact that Claude got a harder problem and was able to reason about it when given the right clues.
My Thoughts
The world seems to think that AI is replacing software developers because you can prompt an AI to produce the code those developers previously wrote. I don’t think this is true. Companies that claim most of their code is written by AI must either have a next-generation model customized and trained on their coding standards, or they’re lying.
As an aside, the shedding of developer jobs is tied more to interest rates than to AI: higher interest rates make investing more expensive for wealthy people, which means less investment and less “free money” to drive new, unprofitable companies. Those companies in turn have to balance their budgets. Even companies making record profits face higher borrowing costs (most use short-term lending to cover expenses until revenues are realized) that make layoffs a key element of their strategy. Corporate greed plays a role too: shareholder value over human value.
I see the value in using AI for research and as a search-engine replacement, but it’s not ready for prime-time reasoning about hard problems. The human mind is still the best computer we have, and a developer’s experience will always outweigh the AI’s ability to think. AI isn’t experienced and doesn’t have the same creativity as a human. It can’t cobble together a few different pieces of data and reason about the whole picture the way an experienced person can.
Ultimately, I don’t think AI is ready to replace software developers or to be used as a primary tool for development of software. Yes, AI can generally get the “happy path” right. But real developers with reasoning skills need to be able to evaluate the corner and edge cases. AI isn’t a senior developer, or even a junior developer – it’s a script kiddie.
Ethical Considerations
I struggled with ethical concerns about testing AI at all, even though I wanted to evaluate its performance; specifically, AI is killing open source and abusing the planet. That said, testing the technology felt worthwhile, if only to determine its utility and current state. I wasn’t sure what to expect. Ultimately I won’t use it routinely: the answers it provides aren’t acceptable, and the environmental and social costs are too high to justify such poor responses.
