bushido 3 hours ago [-]
I think the main thing a lot of these articles miss is that it's not just your AGENTS.md that can give you a model upgrade (or the inverse).
But everything your harness looks at can do this. The skills in your code base, the commands you've added, the memories that were auto-created: they all work toward improving, or completely destroying, your productivity.
And most of it is hidden. You hear people talk about this all the time: they'll say, "Oh, I use GSD or I use Superpowers and my results have gotten worse."
Your results might have gotten worse precisely because you use them (along with your memories and other skills).
rgbrgb 4 hours ago [-]
I'd guess the same has always been true for READMEs / human dev docs. Of course it doesn't transfer directly but still feels incredible to be in an age where we can measure such (previously) theoretical things with synthetic programmers.
Neywiny 2 hours ago [-]
Yeah, isn't this obvious? Bad docs create triple work: (1) you do it wrong, (2) you figure out it's not working because the doc is wrong, (3) you do it the right way. Between 2 and 3 is figuring out what the right way actually is, which a good doc ideally shortcuts.
But obviously if you tell somebody "make a boiled egg; to boil an egg you have to crack it into the pan first," that's a lot worse than just "make a boiled egg." Especially with an infinitely trusting, zero-common-sense executor like an agentic model.
simonw 3 hours ago [-]
Most of my projects are without an AGENTS.md/CLAUDE.md at the moment. I've found that if the project itself is in good shape - clear docs, comprehensive tests - you don't need to tell the coding agent much in order for it to be productive.
I start a whole lot of my sessions with "Run tests with 'uv run pytest'" and once they've done that they get the idea that they should write tests in a style that fits the existing ones.
qingcharles 2 hours ago [-]
That's wild. I couldn't live without my AGENTS to make sure it keeps to the coding styles I prefer. Especially needed on greenfield projects.
A lot of my projects are built with platform versions from the last 12 months, which had zero or very little representation in the LLM's core training data, so the models will tend to avoid using the latest language features unless you prescribe them in AGENTS.
simonw 57 minutes ago [-]
Most of my projects start from a template that has just enough details (like a tests/ folder and a pyproject.toml adding pytest as a dev dependency) for my preferences to start being picked up.
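For reference, a minimal sketch of what such a template might contain; the project name is illustrative, and the dev-dependency group follows uv's PEP 735 layout:

```toml
# pyproject.toml in the template, alongside an (initially empty) tests/ folder
[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.12"

[dependency-groups]
dev = ["pytest"]
```

With just this in place, `uv run pytest` works out of the box, which is the convention the agent is meant to pick up on.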
asdfasgasdgasdg 2 hours ago [-]
Wouldn't an AGENTS.md containing the line "When you make changes, they should be tested. Run tests with `uv run pytest`" have basically the same effect and save you some typing? I've never used AGENTS.md myself, but I'd like to look into it because I find my agent rediscovering things via a bunch of file reads very frequently in my current project.
simonw 58 minutes ago [-]
It would, but then I'd have to copy that file into 100+ repos.
I don't want it in a single global config because I like to stay with the defaults to avoid confusing myself, especially when I'm writing about how coding agents work for other people.
Basically a structured context file that can be used to generate AGENTS.md, and that can also be validated and scored.
I think it could help with this problem.
weiliddat 3 hours ago [-]
I suspect the harness (of which AGENTS.md, skills, and similar things are part) should be abstracted away for better overall performance. This article doesn't really go into detail about model preferences, but some other benchmarks show that different models have different preferences for how to use certain tools (probably related to their post-training material), and that should really be managed invisibly to me as the end user.
Also curious how well LLMs can self-reflect in a loop, in the sense of: here's how the previous iteration went, here's what didn't go well, here's feedback from the human; now how do I modify the docs I use so that I'll do better next time?
I know you can somewhat hillclimb via DSPy but that's hard to generalize.
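The loop described above can be sketched roughly like this. Everything here is hypothetical: `run_iteration` and `get_feedback` are stand-ins for an agent run and human (or model) review, not any real API, and a real loop would have an LLM rewrite the doc rather than append notes:

```python
def improve_docs(doc: str, run_iteration, get_feedback, rounds: int = 3) -> str:
    """Naive hillclimb: run the agent, collect feedback, fold it into the doc."""
    for _ in range(rounds):
        result = run_iteration(doc)      # how did this iteration go?
        feedback = get_feedback(result)  # what didn't go well?
        if not feedback:
            break                        # nothing left to fix; stop early
        # A real loop would ask an LLM to rewrite the doc in light of the
        # feedback; here we just append the lesson so later runs can see it.
        doc += f"\n<!-- lesson learned: {feedback} -->"
    return doc
```

The hard part, as the DSPy comparison suggests, is making `get_feedback` reliable enough that the doc converges instead of accumulating noise.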
chickensong 3 hours ago [-]
Claude self-reflects and updates based on feedback pretty well these days, but seems to lean on memory more than updating CLAUDE.md. I don't know how well it adheres to memory, but it seems to work sometimes. I don't like how the memory is stored outside of the project directory though.
weiliddat 3 hours ago [-]
Hmm I would hope that's for better quality (if there's somehow model-specific optimizations) or search/retrieval methods down the line. But can't help but feel like the labs/providers might try to lock-in customers by making things non-portable/opaque.
chickensong 3 hours ago [-]
Oh yeah, it definitely feels like a scramble to add lock-in features.
chickensong 3 hours ago [-]
It's cool that they did some measurements, but unfortunately there's not much to learn from the article unless you're using really outdated files that you wrote by hand. The agent should know how to write a good file.
For existing files, the agent will carry on a bad structure unless you specifically ask it to refactor and think about what's actually helpful.
In general, it should be a lean file that tells the agent how to work with the project (short description, table of commands, index of key docs, supporting infra, handful of high-level rules and conventions that apply to everything). Occasionally ask the agent to review and optimize the file, particularly after model upgrades.
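One possible skeleton following that shape; all names, commands, and paths are illustrative:

```markdown
# my-project

One-line description of what this service does.

## Commands
| Task | Command             |
|------|---------------------|
| Test | `uv run pytest`     |
| Lint | `uv run ruff check` |

## Key docs
- docs/architecture.md
- docs/deploy.md

## Conventions
- Write tests for every change; match the style of existing tests.
- Never edit generated files under gen/.
```

The point is that everything here is either a pointer or an invariant; anything the agent can discover on its own is left out.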
acgourley 3 hours ago [-]
Every time I've asked a model to write its Agents/Claude file it's been pretty bad, actually. Are you sure writing these files is actually in distribution right now?
kajman 2 hours ago [-]
I don't have a ton of experience with this, but every attempt I've made to quickly get an LLM to one-shot an AGENTS file has been too verbose in all the wrong areas. I'm not convinced LLMs are actually good at summarizing anything complex. Maybe some "blessed" prompts will bubble up in time that change my mind.
forgotusername6 4 hours ago [-]
Interesting that they had a 100% read rate of agents.md. In my test repo, agents.md files lower down the tree were occasionally missed by VS Code Copilot. That put me off putting too much effort into nesting agents.md files within the repo, and I've been focusing on agent skills instead.
weiliddat 4 hours ago [-]
This is more a harness thing, signaling the presence of (or forcing a read of) AGENTS/CLAUDE.md, right?
stavros 3 hours ago [-]
Yes it is. The main feature that differentiates AGENTS.md from other files is that it's usually loaded into the context automatically.
qingcharles 1 hour ago [-]
I often start a new session by tagging AGENTS.md in the prompt just to make sure, because I've had the same issue happen a couple of times.
lelandbatey 3 hours ago [-]
The 100% read rate is very harness/CLI dependent. The "original" idea for AGENTS.md was that the file would be included as-is in the system prompt by the harness, so the agent doesn't have any choice in whether it gets read. For example, this is a shortened form of what opencode sends as the system prompt for a new session when interacting with a provider (displayed as YAML and edited for brevity):
model: foo-model
max_tokens: 32000
top_p: 1
messages:
  - role: system
    content: |
      You are opencode, an interactive CLI tool that helps users with software engineering tasks.
      Use the instructions below and the tools available to you
      # ... snip ...
      Here is some useful information about the environment you are running in:
      <env>
      Working directory: /home/user/dir
      Workspace root folder: /
      Is directory a git repo: no
      Platform: linux
      Today's date: Tue Apr 28 2026
      </env>
      Skills provide specialized instructions and workflows for specific tasks.
      Use the skill tool to load a skill when a task matches its description.
      No skills are currently available.
      Instructions from: /home/user/dir/AGENTS.md
      # Overview
      This directory holds the entirety of the code for the <dayjob> company. All code lives in Github
      under the `<dayjob>` organization, and beneath that Organization is a wide-and-flat set of all
      the Git repositories of all source code at <dayjob>. That Github repo structure is replicated in
      this directory via `ghorg`.
My AGENTS.md file contents start at the "# Overview" line.
Notice that the harness just unceremoniously dumps the AGENTS.md file into the exact same text stream as the system prompt, barely signposting that, from this point on, the text comes from AGENTS.md and not from the harness.
If you want AGENTS.md to work (likewise skills or anything else), you have to know how the harness feeds them to the LLM, because no LLM will reliably go looking on its own.
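At its core, what a harness does here is plain string concatenation. A minimal sketch of the idea, assuming a single root-level file (this is not opencode's actual code, and the header wording is a guess):

```python
from pathlib import Path

def build_system_prompt(workdir: str, base_prompt: str) -> str:
    """Append any AGENTS.md in the working directory to the system prompt,
    in the same text stream, with only a one-line header marking the seam."""
    parts = [base_prompt]
    agents = Path(workdir) / "AGENTS.md"
    if agents.is_file():
        parts.append(f"Instructions from: {agents}")
        parts.append(agents.read_text())
    return "\n".join(parts)
```

Because the file is baked into the prompt before the first turn, the model never makes a "decision" to read it, which is why harnesses built this way report 100% read rates.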
verdverm 4 hours ago [-]
IME, multiple (good) AGENTS.md files are even better. I mostly see them only at the root of a repository, but I spread more out into important subdirectories, where they act as a table of contents and SparkNotes. Putting more focused AGENTS.md files in important places has been even more helpful.
Bonus points if you can force them into context without needing the agent to make a tool call, based on the agent touching the files or systems near them. (My homegrown agent has this feature.)
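One way to sketch that feature, as a guess at the behavior described rather than the actual implementation: when the agent touches a file, walk from the repo root down to that file's directory and inject every AGENTS.md found along the way, most general first.

```python
from pathlib import Path

def agents_files_for(touched: str, repo_root: str) -> list[Path]:
    """Collect AGENTS.md files from the repo root down to the touched file's
    directory, so more specific instructions can refine more general ones."""
    root = Path(repo_root).resolve()
    target = Path(touched).resolve().parent
    chain = [target, *target.parents]                     # leaf dir up to filesystem root
    dirs = [d for d in chain if d == root or root in d.parents]
    found = []
    for d in reversed(dirs):                              # root first, most specific last
        candidate = d / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

The harness would prepend the contents of each returned file to the context whenever a tool call touches something beneath it, so the agent never has to decide to read them.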
therobots927 2 hours ago [-]
Will people ever get tired of writing AI how-to slop?
themafia 4 hours ago [-]
The models are so terrible that you have to think ahead of them so they don't make mistakes. This is not an upgrade; this is coping behavior.
readitalready 3 hours ago [-]
That's like saying "the programmers are so terrible you have to think ahead of them so they don't make mistakes".
avereveard 3 hours ago [-]
eh, good programmers are goal-oriented; today's SOTA models still mostly need step-by-step guidance, so there's a gap still.
the AGENTS.md pieces that pin specific tool-call shapes or force chain-of-thought before action are coping that ages out, with the same lifecycle as the retry-with-different-prompt loops and chain-of-thought prompts most stacks shipped in 2024 to compensate for brittle instruction-following.
not quite there yet, but it's nice to see these files get shorter and shorter with each model release, until all the basics are peeled away by the march of progress and one day only the invariants are left.
Rekindle8090 3 hours ago [-]
No, it's not actually anything like that whatsoever. Programmers are objectively, infinitely more capable than LLMs. Stop anthropomorphizing algorithms.
weiliddat 3 hours ago [-]
I would be very curious which programmers you have in mind when comparing to LLMs: the median programmer, or the top 10%?
I feel like we've passed the point where an average-effort project initialized with Claude Code / Cursor / Codex (with basic docs and skills) would produce a better product (not just better code) than if you hired a median programmer to work on it.
readitalready 3 hours ago [-]
lol no. LLMs are infinitely more capable than programmers.
People really do think too highly of themselves.
httpdemon 3 hours ago [-]
This is like saying programmers are so terrible that you have to think ahead of them and document your code/project so devs don't make mistakes, and that anyone who thinks README files are a good thing is coping.