Teaching Claude to QA a mobile app

(christophermeiklejohn.com)

47 points | by azhenley 3 hours ago

4 comments

maxbeech 16 minutes ago
The worktree discipline failure is the most interesting part of this post to me. When Claude is interactive, "cd into the wrong repo" is catchable. When it's running unattended on a schedule, you find out in the morning.
The abstraction is right: isolated worktree, scoped task, commit only what belongs. The failure is in enforcement. Git worktrees don't prevent a process from running `cd ../main-repo`; that requires something external to impose the boundary, rather than relying on the agent to respect it.
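A minimal sketch of what "something external" could look like: the scheduler, not the agent, fingerprints the main checkout before the run and compares it afterwards. The helper names `snapshot` and `changed_paths` are hypothetical. This only detects an escape after the fact; it doesn't prevent one. It also skips `.git`, because commits made inside a linked worktree legitimately write objects into the main repo's shared `.git` directory, so this catches working-tree edits only.

```python
import hashlib
import os
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to a SHA-256 of its bytes."""
    digests: dict[str, str] = {}
    base = Path(root)
    for dirpath, dirnames, filenames in os.walk(base):
        # Prune .git: worktree commits write into the shared object store there,
        # so including it would flag every legitimate run as an escape.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return files that were added, removed, or modified between two snapshots."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))
```

In use, the scheduler would snapshot the main checkout before launching the job, snapshot it again when the job exits, and fail the run (or page someone) if `changed_paths` returns anything.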
What you've built (the 8:47 sweep) is a narrow-scope autonomous job: well-defined inputs, deterministic outputs, bounded time. These work well because the scope is clear enough that any deviation is obvious. The harder category is "fix any failing tests": that task requires judgment about what's in scope, and judgment is exactly where worktree escapes happen.
I've been working on tooling for scheduling this kind of Claude work (openhelm.ai), and the isolation problem is front and center: separate working directories per run, and no write access to the main repo unless that's the explicit task. Your experience here is exactly the failure mode that design is trying to prevent.
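A rough sketch of the "separate working directory per run, bounded time" part, using only the standard library. `run_isolated` is a hypothetical helper, not openhelm's API. Note that `cwd` only sets where the process starts, so by itself this has the same weakness as a worktree; actually denying write access to the main repo would need an OS-level boundary (a separate user, a container, a read-only mount), which isn't shown here.

```python
import subprocess
import tempfile


def run_isolated(cmd: list[str], timeout_s: float) -> subprocess.CompletedProcess:
    """Run cmd in a fresh scratch directory that is deleted when it finishes.

    The scratch dir is a starting cwd, not a sandbox: the process can still
    cd elsewhere. The timeout bounds the run; subprocess.run kills the child
    and raises subprocess.TimeoutExpired if it is exceeded.
    """
    with tempfile.TemporaryDirectory(prefix="agent-run-") as workdir:
        return subprocess.run(
            cmd,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
```

A scheduler would call this once per job with the agent's command line and treat `TimeoutExpired` as a failed run rather than retrying it silently.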
devmor 1 hour ago
Reading through this reminds me of how bot farms often consist of stripped-down phones that are essentially just the mainboard hooked up to a controller that simulates the external components.
When I've tried and failed to reverse engineer the mobile apps for smart home devices, I've considered setting up something like this for a single device.