Remix Hacker News Clone

Launch HN: Twill.ai (YC S25) – Delegate to cloud agents, get back PRs

The agent-agnostic approach is interesting, but I think the bigger architectural question is what happens when you move beyond code generation into domains where the agent's output has real-world consequences — booking a flight, executing a trade, dispensing medical advice.

For code, the worst case is a bad PR that gets caught in review. For domain-specific agents handling real transactions, you need a fundamentally different trust model. The LLM can't be making the decisions — it needs to be constrained to intent parsing while deterministic logic handles execution. Sandboxing the runtime (what you're doing) is necessary but not sufficient. You also need to sandbox the decision space.
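To make "sandbox the decision space" concrete, here is a minimal sketch for the trade example. Everything in it is hypothetical (the `TradeIntent` schema, the symbol allowlist, the quantity cap, and the `execute` stub are all assumptions for illustration): the model's output is treated as untrusted JSON, validated against a closed schema, and only then handed to deterministic code.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BUY = "buy"
    SELL = "sell"


@dataclass(frozen=True)
class TradeIntent:
    action: Action
    symbol: str
    quantity: int


ALLOWED_SYMBOLS = {"AAPL", "MSFT"}  # hypothetical allowlist
MAX_QUANTITY = 100                  # hypothetical per-order cap


def parse_intent(llm_output: str) -> TradeIntent:
    """Validate untrusted model output against a closed schema."""
    data = json.loads(llm_output)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"action", "symbol", "quantity"}:
        raise ValueError("intent must contain exactly action, symbol, quantity")
    action = Action(data["action"])  # ValueError for anything outside the enum
    symbol, quantity = data["symbol"], data["quantity"]
    if not isinstance(symbol, str) or symbol not in ALLOWED_SYMBOLS:
        raise ValueError("symbol not allowed")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")
    if not 0 < quantity <= MAX_QUANTITY:
        raise ValueError("quantity out of range")
    return TradeIntent(action, symbol, quantity)


def execute(intent: TradeIntent) -> str:
    """Deterministic execution path; a stand-in for a real order call."""
    return f"{intent.action.value} {intent.quantity} {intent.symbol}"
```

The model can only choose among intents that the schema can express, so a prompt injection can at worst produce a valid, bounded order rather than an arbitrary action.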

Curious whether you've seen demand for non-SWE agent workloads, or if the "prompt to PR" pattern is where most of the traction is right now.

by telivity-real 1775851967

I’ve been developing an open-source version of something similar[1] and have used it quite extensively (well over 1k PRs)[2]. I’m definitely a believer in the “prompt to PR” model. It's very liberating not to have to think about managing the agent sessions. It seems you have built a lot of useful tooling (e.g., session videos) around this core idea.

A couple of learnings to share that I hope will be of use:

1) Execution sandboxing is just the start. For any enterprise usage you also want fairly tight network egress control, to limit the chance of accidental leaks or malicious exfiltration if there's any risk of untrusted material getting into the model context. Speaking as a decision maker at a tech company, we do actually review things like this when evaluating tools.

2) Once you have proper network sandboxing, you can secure credentials much better: give the agent only dummy surrogates and swap them for the real creds on the way out.

3) Sandboxed agents with automatic provisioning of a workspace from git can be used for more than just development tasks. In fact, it might be easier to find initial traction with more constrained, and thus more predictable, tasks, e.g., “ask my codebase” or “debug CI failures”.
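Points 1) and 2) can be sketched together as a single check in an egress proxy. This is only an illustration under assumptions: the host allowlist, the surrogate token name, the placeholder secret, and the `rewrite_outbound` helper are all hypothetical, and a real proxy would also have to handle TLS interception, other auth schemes, and logging.

```python
from urllib.parse import urlsplit

# Hypothetical egress allowlist: only these hosts are reachable from the sandbox.
ALLOWED_HOSTS = {"api.github.com"}

# Hypothetical surrogate table: the agent only ever sees the key. Each real
# credential is bound to one host so it cannot be replayed elsewhere.
SURROGATES = {
    "SURROGATE_TOKEN_1": ("api.github.com", "real-secret-token"),
}


def rewrite_outbound(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return headers for the upstream request, or raise if egress is denied."""
    host = urlsplit(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {host!r} blocked")

    out = dict(headers)  # never mutate the caller's headers
    scheme, _, token = out.get("Authorization", "").partition(" ")
    if token in SURROGATES:
        bound_host, real_token = SURROGATES[token]
        if bound_host != host:
            raise PermissionError("surrogate credential not valid for this host")
        out["Authorization"] = f"{scheme} {real_token}"
    return out
```

Because the swap happens outside the sandbox, a leaked surrogate is useless on its own, and binding each surrogate to a host keeps the proxy from forwarding a real credential to the wrong destination.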

[1] https://airut.org [2] https://haulos.com/blog/building-agents-over-email/

by hardsnow 1775846118

Congrats on the launch. The agentbox-sdk looks interesting, but since the first commit was only 3 days ago, I feel a little wary of using it just yet!

One question: do you have plans for any other forms of sandboxing that are a little more "lightweight"?

Also, how do you add more agent types? Do you support only ACP?

by dennisy 1775850955

So instead of using my Claude Code subscription, I can pay the vastly higher API rates to you so you can run Claude Code for me?

by gbnwl 1775847775

Does it support running Docker images inside the sandbox?

by a_t48 1775851812

Coding agents that run 24/7 are pretty clearly the direction the industry is going now. I think we'll need either on-premises or cloud solutions, since an agent that has to run 24/7 can't live on your laptop.

Obviously cloud is better for making money, and some kind of VPC or local cloud solution is best for enterprise, but perhaps for individual devs, a self-hosted system on a home desktop computer running 24/7 (hybrid desktop / server) would be the best solution?

by 2001zhaozhao 1775847237

How does this compare to Claude Managed Agents?

by Mr_P 1775845403

> Run the same agent n times to increase success rate.

Are there benchmarks out there that back this claim?
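This is not a benchmark, only the arithmetic behind the claim under two strong assumptions: each run succeeds independently with the same probability p, and something (tests, a reviewer) can reliably pick out a successful run. The function name is hypothetical.

```python
def success_at_least_once(p: float, n: int) -> float:
    """Probability that at least one of n independent runs succeeds.

    Assumes every run has the same success probability p and that runs
    are independent, which real agent runs on the same task may not be.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n


print(success_at_least_once(0.5, 3))  # 0.875
```

If failures are correlated (for example, every run misreads the same ambiguous requirement), the real gain will be smaller than this formula suggests, which is why measured results would matter here.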

by hmokiguess 1775845901
