Showboat seems like it could actually be quite useful for humans too, just for making quick notes from a CLI without opening an editor. The "pop" command makes me wonder if there would be a benefit to also having an array-like interface in addition to the stack-like one. It seems like it would be fairly trivial to generate an index of markdown blocks so that they could be edited individually.
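A rough sketch of what that index could look like, purely as an illustration. It assumes blocks are separated by blank lines and that fenced code blocks should stay whole; how Showboat actually stores its document is not something this relies on, and `split_blocks` / `replace_block` are hypothetical names.

```python
FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable


def split_blocks(markdown: str) -> list[str]:
    """Split a markdown document into an indexable list of blocks.

    Assumption: a block ends at a blank line, except inside a fenced
    code block, where blank lines are kept as part of the block.
    """
    blocks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not in_fence and not line.strip():
            # A blank line outside a fence closes the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def replace_block(markdown: str, index: int, new_text: str) -> str:
    """Return the document with the block at `index` swapped for `new_text`."""
    blocks = split_blocks(markdown)
    blocks[index] = new_text
    return "\n\n".join(blocks) + "\n"
```

With something like this, "pop" is just the special case of deleting the last index, and an edit command could address any block by number.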
I like the idea of Rodney, but I wonder if you might actually have better results by asking the agent to generate equivalent Selenium scripts instead. I'm specifically suggesting Selenium because it's been around so long that I assume there's a lot of it in the LLMs' training data, but there are other options that might work too.
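For a sense of scale, the kind of script an agent might generate could look roughly like this. It is only a sketch: the `add_todos` helper, the default `.new-todo` selector, and the screenshot filename are assumptions for illustration, and it presumes Selenium plus a local Chrome are installed.

```python
def add_todos(url, items, selector=".new-todo", screenshot_path="todos.png"):
    """Open `url`, type each item into the matched input, then save a screenshot.

    The selector default is an assumption about the page under test.
    """
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver = webdriver.Chrome()  # assumes Chrome is available locally
    try:
        driver.implicitly_wait(5)  # give the page a few seconds to render
        driver.get(url)
        for item in items:
            # Re-find the input each time in case the page re-renders it.
            box = driver.find_element(By.CSS_SELECTOR, selector)
            box.send_keys(item, Keys.ENTER)
        driver.save_screenshot(screenshot_path)
    finally:
        driver.quit()  # always close the browser, even on failure
```

The trade-off is that every interaction becomes a throwaway script rather than a single CLI call, which may or may not suit an agent better.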
Passing tests in your repo are great documentation of the tool at a microscopic level. And rerunning tests only burns tokens on failures (since passed tests just print a dot) so it’s token efficient too.
Some other neat tricks:
- For greater efficiency, configure your test runner to print nothing (not even a dot or filename) for test successes. Agents don’t need progress dots, only the exit code and failure details.
- Have your agent implement a 10ms timeout per test. pytest has hooks to do this. The agent will see tests time out and mock out all I/O and third-party code - why test what one assumes third parties tested already! The result is a CPU-bound test suite with no shared database, no shared data, and no tests that interfere with or depend on each other, so tests can run in parallel.
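The two tricks above could be sketched in a `conftest.py` along these lines. This is a minimal, Unix-only illustration (it relies on `SIGALRM`), and `PerTestTimeout` and `time_limit` are hypothetical helper names, not part of pytest.

```python
# conftest.py (sketch)
import signal
from contextlib import contextmanager

TEST_TIMEOUT_SECONDS = 0.010  # the 10ms budget suggested above


class PerTestTimeout(Exception):
    """Raised when a test runs longer than its time budget."""


@contextmanager
def time_limit(seconds=TEST_TIMEOUT_SECONDS):
    """Abort the enclosed code with PerTestTimeout after `seconds`.

    Uses a real-time interval timer, so it only works on Unix and only
    in the main thread.
    """
    def _on_alarm(signum, frame):
        raise PerTestTimeout(f"test exceeded {seconds * 1000:.0f} ms")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def pytest_report_teststatus(report, config):
    """Print nothing for plain passes; defer to pytest for everything else."""
    if report.when == "call" and report.passed and not hasattr(report, "wasxfail"):
        # (category, short letter, verbose word): empty strings hide the dot.
        return report.outcome, "", ""
    return None
```

One way to apply `time_limit` to every test would be to wrap the test call in a `pytest_runtest_call` hook wrapper; the third-party pytest-timeout plugin is another option if you would rather not maintain this yourself.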
It's basically an automated test, but at a higher abstraction level and with manual verification--using CLI tools rather than a test harness. Really great work!
It's also interesting that you've shifted to Go for your agent-coded CLI tools, Simon.
https://github.com/microsoft/playwright-cli
Different from the CLI for running tests etc. that comes bundled with Playwright.
Sample use:
playwright-cli open https://demo.playwright.dev/todomvc/ --headed
playwright-cli type "Buy groceries"
playwright-cli press Enter
playwright-cli type "Water flowers"
playwright-cli press Enter
playwright-cli check e21
playwright-cli check e35
playwright-cli screenshot
- E2E testing of browser components
- Taking screenshots before and after and having Claude look at them to double check things
- Driving it with an API and CLI as a headless browser
Will definitely give Rodney a look.
Also, I am sure you must already know about Playwright MCP, so why this? If your goal isn't to make the CLI human-friendly, which is the only advantage CLIs have over MCPs doing the same thing, then why not just use the MCP? It doesn't even handle multiple sessions and has a single global state file. This is slop.
Or alternatively, it could just be a skill rather than a tool.
My “agents” already demo stuff all the time by just being prompted to do so. I have notations in my standard Agents.md for how I want my documentation, testing etc.