The demonstration-based approach is interesting for the handoff problem. The hardest part of agentic automation isn't the first run; it's making the agent robust to the cases the demonstrator never showed it. How do you handle edge cases or failures mid-task? Does it fall back to asking the user, or does it have some recovery heuristic? Asking because we found that the failure-mode surface matters more than happy-path coverage once you actually deploy these in production.