You might as well just write instructions in English in any old format, as long as it's comprehensible. Exactly as you'd do for human readers! Nothing has really changed about what constitutes good documentation. (Edit to add: my parochialism is showing there; it doesn't have to be English)
Is any of this standardization really needed? Who does it benefit, except the people who enjoy writing specs and establishing standards like this? If it really is a productivity win, it ought to be possible to run a comparison study and prove it. Even then, it might not be worthwhile in the longer run.
For example, we have a skill to /create-new-endpoint. The skill contains a detailed checklist of all the boilerplate tasks an engineer needs to do in addition to implementing the logic (e.g. update the OpenAPI spec, add integration tests, write the endpoint boilerplate). The engineer manually invokes the skill from the CLI via slash commands, provides a JIRA ticket number, and engages in some brief design discussion. The LLM is consistently able to one-shot these tickets in a way that matches our existing application architecture.
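Roughly, such a SKILL.md could look like the sketch below; it uses the common name/description frontmatter convention, and the steps are paraphrased from this description rather than taken from a real file:

```markdown
---
name: create-new-endpoint
description: Checklist for adding a new API endpoint. Use when a ticket asks for a new route.
---

1. Read the JIRA ticket and confirm the acceptance criteria with the engineer.
2. Add the route and handler boilerplate, matching the existing module layout.
3. Implement the endpoint logic.
4. Update the OpenAPI spec.
5. Add integration tests covering success and error paths.
6. Verify: run the full test suite before handing back.
```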
.claude/skills
.codex/skills
.opencode/skills
.github/skills

I wrote a rant about skills a while ago that's still relevant in some ways: https://sibylline.dev/articles/2025-10-20-claude-skills-cons...
The pattern that works: skills that represent complete, self-contained sequences - "do X, then Y, then Z, then verify" - with clear trigger conditions. The agent recognizes these as distinct modes of operation rather than optional reference material.
What doesn't work: skills as general guidelines or "best practices" documents. These get lost in context or ignored entirely because the agent has no clear signal for when to apply them.
The mental model shift: think of skills less like documentation and more like subroutines you'd explicitly invoke. If you wouldn't write a function for it, it probably shouldn't be a skill.
All of these SKILLS.md/AGENTS.md/COMMANDS.md files are just simple prompts, some with links to further context.
And quite dangerous.
My tooling previously added AI hints via CLAUDE.md, Cursor Rules, Windsurf Rules, AGENTS.md, etc., but I recently switched to using only AGENTS.md and skills. I appreciate the standardization from this perspective.
Skills as a pattern let the agent scan a lightweight index of descriptions, then pull in the full instructions only when relevant. Whether that's a .skills/ folder or a README index pointing to separate docs doesn't matter much. What matters is the separation between "what capabilities exist" and "how to execute this specific one."
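As a sketch of that separation (the file names are illustrative):

```
skills/
  INDEX.md                       # lightweight index: one description line per skill
  create-new-endpoint/SKILL.md   # full instructions, pulled in only when relevant
  web-design-guidelines/SKILL.md
```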
The standardization part is mostly useful for distribution — being able to install and share skills across projects without manually wiring them up. Same reason we standardize package formats even though you could just copy-paste code.
My model for skills is similar to this, but I extended it to have explicit "use when" and "don't use when" examples and counterexamples. This helped the small model, which tended to miss the nuances of a free-form text description.
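Concretely, that could be extra frontmatter on each skill; the field names here are made up, since the comment doesn't say which convention was used:

```markdown
---
name: create-new-endpoint
description: Scaffold a new API endpoint end-to-end.
use_when:
  - "The ticket asks for a brand-new route or resource"
dont_use_when:
  - "Changing the behavior of an existing endpoint"
  - "Refactors with no API surface change"
---
```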
I have a feeling that otherwise it becomes too messy for agents to reliably handle a lot of complex stuff.
For example, I have OpenClaw automatically looking for trending papers, turning them into fun stories, and then sending me the text via Telegram so I can listen to it in the ElevenLabs app.
I'm not sure whether it's better to have the story-generating system behind an API or to code it as a skill — especially since OpenClaw already does a lot of other stuff for me.
I prefer to completely invert this problem and provoke the model into surfacing the desired behavior and capabilities by having the environment push back on it over time.
You get way more interesting behavior from agents when you allow them to probe their environment for a few turns and feed them errors about how their actions are inappropriate. It doesn't take very long for the model to "lock on" to the expected behavior if you are detailed in your tool feedback. I can get high quality outcomes using blank system prompts with good tool feedback.
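A minimal sketch of that kind of tool feedback, in Python with made-up tool and table names; the point is that every failure names the violated constraint and the corrective action:

```python
# Sketch: tool results that teach the agent the expected behavior.
ALLOWED_TABLES = {"orders", "customers"}

def run_query(table: str, limit: int) -> str:
    if table not in ALLOWED_TABLES:
        return (f"error: table '{table}' is not available here. "
                f"Available tables: {sorted(ALLOWED_TABLES)}. "
                "Retry run_query with one of those.")
    if limit > 100:
        return f"error: limit {limit} exceeds the maximum of 100; retry with a smaller limit."
    return f"ok: returning the first {limit} rows of '{table}'"
```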
- There is no reason you have to expose skills through the file system. It's just as easy to add a tool call to load a skill: put a skill ID in the instruction metadata, or have a `discover_skills` tool if you want to keep skills out of the instructions altogether.
- Another variation is to put a "skills selector" inference in front of your agent invocation. This inference would receive the current inquiry/transcript plus the skills metadata and return a list of potentially relevant skills. Same concept as tool selection; this can save context bandwidth when there are a large number of skills.
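A sketch of the tool-call variant in Python; the function names and storage layout are illustrative, not any particular framework's API:

```python
import json
from pathlib import Path

SKILLS_DIR = Path("skills")  # assumed layout: skills/<id>/SKILL.md

# The lightweight index; descriptions could equally be parsed out of each
# SKILL.md's frontmatter.
SKILL_INDEX = {
    "create-new-endpoint": "Scaffold a new API route end-to-end.",
    "web-design-guidelines": "Steering for typography, layout, and components.",
}

def discover_skills() -> str:
    """Tool: return skill IDs plus one-line descriptions (the cheap index)."""
    return json.dumps(SKILL_INDEX)

def load_skill(skill_id: str) -> str:
    """Tool: return the full instructions for one skill, on demand."""
    if skill_id not in SKILL_INDEX:
        return f"error: unknown skill '{skill_id}'; call discover_skills first"
    return (SKILLS_DIR / skill_id / "SKILL.md").read_text(encoding="utf-8")
```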
Whenever there's an agent best practice (skill) or 'pre-prompt' that you want to use all the time, turn it into a text expansion snippet so that it works no matter where you are.
As an example, I have a design 'pre-prompt' that dictates a bunch of steering for agents re: how to pick style components, typography, layout, etc. It's a few paragraphs long and I always send it alongside requests for design implementation to get way-better-than-average output.
I could turn it into a skill, but then I'd have to make sure whatever I'm using supported skills -- and install it everywhere, or in some way that was universally visible on my system (no, symlinking doesn't really solve this).
So I use AutoHotkey (you might use Raycast, Espanso, etc.) so that every time I type '/dsn', it auto-expands into my pre-prompt snippet.
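In AutoHotkey that's a one-line hotstring; the replacement text below is a stand-in for the real pre-prompt:

```autohotkey
; the * option fires the expansion immediately, without waiting for an ending character
:*:/dsn::Prefer a restrained type scale, a consistent spacing system, and existing components over bespoke CSS.
```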
Now, no matter whether I'm using an agent on the web/cloud, in my terminal window, or in an IDE, I've memorized my most important 'pre-prompts' and they're a few seconds away.
It's anti-fragile steering by design. Call it universal skill injection.
I liked that idea of having something more CLI-agnostic.
I have been trying to build skills to do various things on our internal tools, and more often than not, when it doesn't work, it is as much a problem with _our tools_ as it is with the LLM. You can't do obvious things, the documentation sucks, APIs return opaque error messages. These are problems that humans can work around because of tribal knowledge, but LLMs absolutely cannot, and fixing it for LLMs also improves it for your human users, who probably have been quietly dealing with friction and bullshit without complaining -- or not dealing with it and going elsewhere.
If you are building a product today, the feature you are working on _is not done_ until Claude Code can use it. A skill and an MCP server aren't a "nice to have"; they are going to be as important as SEO and accessibility, with very similar work required to enable them.
Your product might as well not exist in a few years if it isn't discoverable by agents and usable by agents.
You can have the perfect scraping skill, but if the target blocks your requests, you're stuck. The hard problems are downstream.
Neo: Ju jitsu? I'm gonna learn Ju jitsu.
[Tank winks and loads the program]

Neo: Holy shit!
…including, apparently, the clueless enthusiasm for people to “share” skills.
MCP is also perfectly fine when you run your own MCP locally. It’s bad when you install some arbitrary MCP from some random person. It fails when you have too many installed.
Same for skills.
It’s only a matter of time (maybe it already exists?) until someone makes a “package manager” for skills that has all of the stupid of MCP.