Building an internal agent: Code-driven vs. LLM-driven workflows

(lethain.com)

37 points | by pavel_lishin 3 hours ago

7 comments

galaxyLogic 2 minutes ago
What I'm struggling with is that when you ask AI to do something, the answer is always nondeterministically different, more or less.
If I start out with a "spec" that tells the AI what I want, it can create working software for me. Seems great. But let's say some weeks, months, or even years later I realize I need to change my spec a bit. I would like to give the new spec to the AI and have it produce an improved version of "my" software. But there seems to be no way to then evaluate how, where, and by how much the solution has changed or improved because of the changed spec. Because the AI's outputs are nondeterministic, the new solution might be totally different. So AI would not seem to support "iterative development", does it?
My question then really is, why can't there be an LLM that always gives the exact same output for the exact same input? I could then still explore multiple answers by changing my input incrementally. It just seems to me that a small change in inputs/specs should produce only a small change in outputs. Does any current LLM support this way of working?
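As a rough illustration of where the variation comes from, here is a toy sketch of one decoding step. The logits are made-up numbers, not output from any real model. Greedy decoding (always taking the top-scoring token) returns the same token for the same input, while sampling varies unless the random seed is fixed. Hosted models may still vary slightly even with greedy settings, so treat this as a simplified picture only.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Always take the highest-scoring token: same input, same output."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sampled_pick(logits, temperature, rng):
    """Draw one token at random in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for a 4-token vocabulary at a single decoding step.
logits = [2.0, 1.5, 0.3, -1.0]

# 100 greedy runs collapse to a single choice.
greedy_runs = {greedy_pick(logits) for _ in range(100)}

# 100 sampled runs with different seeds spread over several tokens.
sampled_runs = {sampled_pick(logits, 1.0, random.Random(seed)) for seed in range(100)}

# Reusing one seed makes sampling repeatable as well.
repeat_a = sampled_pick(logits, 1.0, random.Random(7))
repeat_b = sampled_pick(logits, 1.0, random.Random(7))
```

Note that determinism alone does not give the "small change in, small change out" property: a slightly different prompt produces different logits, and the choices can diverge from the first differing token onward.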
David 2 hours ago
> We still start all workflows using the LLM, which works for many cases. When we do rewrite, Claude Code can almost always rewrite the prompt into the code workflow in one-shot.
Why always start with an LLM to solve problems? Using an LLM adds a judgment call, and (at least for now) those judgment calls are not reliable. For something like the motivating example in this article of "is this PR approved", it seems straightforward to get the deterministic right answer using the GitHub API without muddying the waters with an LLM.
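A minimal sketch of what the deterministic version might look like. It assumes the input is the JSON list returned by GitHub's "list reviews for a pull request" endpoint (`GET /repos/{owner}/{repo}/pulls/{pull_number}/reviews`), in chronological order. The function name `is_pr_approved` and the approval rule are assumptions for illustration; a real repository's branch protection settings may define "approved" differently.

```python
def is_pr_approved(reviews):
    """Decide approval from a chronological list of review objects.

    Assumed rule (not GitHub's own definition): the PR counts as approved
    when at least one reviewer's latest decisive review is APPROVED and
    no reviewer's latest decisive review is CHANGES_REQUESTED.
    """
    decisive = {"APPROVED", "CHANGES_REQUESTED", "DISMISSED"}
    latest_by_user = {}
    for review in reviews:
        state = review["state"]
        # Plain comments and pending reviews do not change a reviewer's verdict.
        if state in decisive:
            latest_by_user[review["user"]["login"]] = state

    states = set(latest_by_user.values())
    return "APPROVED" in states and "CHANGES_REQUESTED" not in states


# Hand-written sample data shaped like the API response (logins are made up).
sample_reviews = [
    {"user": {"login": "alice"}, "state": "CHANGES_REQUESTED"},
    {"user": {"login": "bob"}, "state": "COMMENTED"},
    {"user": {"login": "alice"}, "state": "APPROVED"},
]
approved = is_pr_approved(sample_reviews)
```

Fetching the reviews and posting the Slack reaction are left out here, since they depend on credentials and client libraries not described in the thread.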
- soccernee 28 minutes ago
  Likely because it's just easier to see if the LLM solution works. When it doesn't, then it makes more sense to move into deterministic workflows (which aren't all that hard to build, to be honest, with Claude Code).
  It's the old principle of avoiding premature optimization.
jaynate 1 hour ago
It’s sort of difficult to understand why this is even a question - LLM-based / judgment dependent workflows vs script-based / deterministic workflows.
In mapping out the problems that need to be solved with internal workflows, it’s wise to clarify upfront where probabilistic judgments are helpful / required vs. not. If the process is fixed and requires determinism, why not just write scripts (code-gen’ed, of course)?
- David 26 minutes ago
  This bothered me at first but I think it's about ease of implementation. If you've built a good harness with access to lots of tools, it's very easy to plug in a request like "if the linked PR is approved, please react to the slack message with :checkmark:". For a lot of things I can see how it'd actually be harder to generate a script that uses the APIs correctly than to rely on the LLM to figure it out, and maybe that lets you figure out if it's worth spending an hour automating properly.
  Of course the specific example in the post seems like it could be one-shotted pretty easily, so it's a strange motivating example.
mayop100 1 hour ago
This is the basic idea we built Tasklet.ai on. LLMs are great at problem solving but less great at cost and reliability — but they are great at writing code that is!
So we gave the Tasklet agent a filesystem, shell, code runtime, general purpose triggering system, etc so that it could build the automation system it needed.
Edmond 2 hours ago
There is a third option, letting AI write workflow code:
https://youtu.be/zzkSC26fPPE
You get the benefit of AI CodeGen along with the determinism of conventional logic.
dmarwicke 1 hour ago
hit this with support ticket filtering. llm kept missing weird edge cases. wrote some janky regex instead, works fine
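For a sense of what the regex route can look like, here is a small sketch. The labels and patterns are hypothetical, invented for illustration, and not the commenter's actual rules.

```python
import re

# Hypothetical rules: the first pattern that matches decides the label.
RULES = [
    ("refund", re.compile(r"\b(refund|charge ?back|money back)\b", re.I)),
    ("login", re.compile(r"\b(log ?in|sign ?in|password|2fa)\b", re.I)),
    ("spam", re.compile(r"\b(seo services|crypto giveaway)\b", re.I)),
]

def classify_ticket(text):
    """Return the label of the first matching rule, or "other" if none match."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "other"
```

Each new edge case becomes one more pattern in the list, and the same ticket always gets the same label.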
retinaros 49 minutes ago
it's just a form of structured output. you still need an env to run the code, secure it, maintain it, upgrade it. it's some work. easier to build a rule-based workflow for simple stuff like this.