jmathai 2 days ago [-]
I think Claude Code can write very good end to end tests given the right constructs.
I have been building a desktop app (electron-based) which interacts with Anthropic’s AgentSDK and the local file system.
It’s 100% spec-driven and Claude Code has written every line. I do large features instead of small ones (the spec lives in an issue, around 300 lines of markdown).
I have had it generate Playwright tests from the start. It was doing okay, but one change made it amazing: I created a spec-driven pull request to use data-testid attributes for selectors.
Every new feature adds tests, and verifies it hasn’t broken existing features.
I don’t even bother with unit tests. It’s working amazingly well.
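The data-testid convention above can be sketched as follows. Playwright's `page.getByTestId(id)` resolves to the same attribute selector this helper builds (with the default `testIdAttribute` configuration); the test ids shown are hypothetical, not from the original post.

```typescript
// Build a CSS selector targeting a stable data-testid attribute.
// Unlike text or CSS-class selectors, this survives copy changes
// and styling refactors.
function byTestId(id: string): string {
  return `[data-testid="${id}"]`;
}

// Hypothetical usage in a Playwright spec (ids are illustrative):
//
//   await page.getByTestId('new-note-button').click();
//   await page.getByTestId('note-title-input').fill('Meeting notes');
//   await expect(page.getByTestId('note-list')).toContainText('Meeting notes');

console.log(byTestId('new-note-button')); // [data-testid="new-note-button"]
```

The point of the convention is that the spec can name the test ids up front, so generated tests and generated UI code agree on selectors by construction.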
Aamir21 5 hours ago [-]
I tried Claude Code, and it did write some good-quality e2e tests, but my biggest worry was full coverage. It's really difficult to quantify e2e test coverage the way developers quantify unit test coverage; arguably it's impossible. Specs are just one artifact, just like code is one of many artifacts that full system-wide e2e coverage needs. Adding production logs and production incidents, which I also tried, gave me some sense of full e2e coverage.
If you use Claude Code for both dev and testing, it's like having your cake and eating it too: if Claude misrepresents or misinterprets a requirement for whatever reason, that error percolates into the code and the tests alike. A third-party testing tool is more appropriate, with all the data flowing into it (specs, legacy tests, prod incidents, code), and then perhaps we can expect full, unbiased test coverage.
I'm not talking about wannabe enterprise apps or hobby apps. I'm talking about >v0 enterprise apps with real customers and real downside if they go down, with a rich data set of past incidents and not-so-perfect code, and now increasingly using agentic AI to produce more non-human code. They need a third-party tool that ingests their data, builds a KG-level understanding of it, and prevents critical bugs from leaking into production by generating a small number of high-quality, high-coverage tests.
allinonetools_ 1 day ago [-]
Interesting approach. I have noticed the same issue — AI tools generate a lot of code and unit tests, but real user-flow or edge-case testing often gets skipped. Having something that reads the PR context and suggests missing scenarios could actually catch problems earlier.
Aamir21 5 hours ago [-]
I agree, but I want to add that specs alone probably won't give you full testing coverage; you have to add other artifacts too, like prod logs and incidents, and use some layer of ontology + KG to produce meaningful data connections and understanding. A vector DB alone only gives you semantic search and is grossly inadequate for connecting data artifacts: for a vector DB, the word apple and the company Apple might look the same without the surrounding context.
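The apple/Apple ambiguity can be illustrated with a minimal sketch: a toy knowledge graph whose nodes carry an entity type, so a typed lookup disambiguates what a bare string match cannot. All entities, types, and relations here are illustrative assumptions, not anyone's real schema.

```typescript
// Toy knowledge graph: nodes are typed entities with named relations.
type EntityType = 'Fruit' | 'Company';

interface Entity {
  name: string;
  type: EntityType;
  relations: Record<string, string>; // predicate -> object
}

const graph: Entity[] = [
  { name: 'apple', type: 'Fruit', relations: { growsOn: 'tree' } },
  { name: 'Apple', type: 'Company', relations: { makes: 'iPhone' } },
];

// A plain text match cannot tell the two senses apart.
const textMatches = graph.filter(e => e.name.toLowerCase() === 'apple');
console.log(textMatches.length); // 2, ambiguous

// A typed lookup resolves the intended sense.
function resolveEntity(name: string, type: EntityType): Entity | undefined {
  return graph.find(
    e => e.name.toLowerCase() === name.toLowerCase() && e.type === type,
  );
}

console.log(resolveEntity('apple', 'Company')?.relations.makes); // iPhone
```

A real ontology layer adds far more (class hierarchies, relation constraints, provenance), but the core difference from a vector index is the same: identity is a typed node with edges, not a point in embedding space.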
aialok 23 hours ago [-]
Interesting Man!
Aamir21 5 hours ago [-]
Let's connect if you'd like to see some lessons learned.