AI Automation vs Ad-Hoc Tasks with LLM
When developing solutions on top of agent frameworks, you face a dilemma: how do you make the agent's work reliable and predictable without going overboard on tool integrations, since those integrations are what largely determine the agent's capabilities?

Those of you choosing the easy path of ready-made agents extended through MCP are martyrs. This kind of instant-fix mess falls apart quickly and causes major indigestion.

[[file:AI_Automation_vs_Ad-Hoc_Tasks_with_LLM.jpg|500px]]

Need repeatable results, stability, predictability, consistency, and easy debugging? Forget about it. An agent loop based on ReAct + MCP tools is pure alchemy; only prayers and a lucky rabbit's foot will help you here. The main reasons:

🔹 The model itself is non-deterministic, yet you are asking it to make the same decision given identical input data. There will be no idempotency: it will periodically go down different paths, calling tools with the wrong parameters or entirely wrong tools (especially when many tools come from the same domain).

🔹 You end up with far too many failure points. Each MCP tool is a wrapper on top of an API, and the ReAct agent makes LLM calls in a loop. At minimum you get these failure points: the API internals, the API↔MCP-server integration, the MCP-server↔agent-loop integration, failures inside the agent loop itself (wrong tool-call decisions, bad reflection, infinite loops, model API failures), plus plain code bugs.

If you multiply the failure probabilities of all these nodes, you get depressing numbers. Say you are a former pacemaker developer and every service you depend on was also built by people like you, so every node has 99.99% reliability. Multiply all the per-node reliabilities and raise the product to the power of the number of steps in the agent flow, and you get the expected workflow reliability. Roughly, for a 5-step flow:

(0.9999 (API) × 0.9999 (MCP) × 0.9999 (agent) × 0.9999 (framework))^5 ≈ 0.998, i.e. 99.8%.

Not even "three nines" anymore. Realistically, though, each of these components will be lucky to reach 95% reliability, and then the total reliability of such a flow comes out to around 35%. Only about one request in three completes successfully; the other two fail. This is roughly what we see in quick-fix agent solutions today. (The arithmetic is spelled out in a short sketch below.)

The problem is that business doesn't need 35% reliability; it needs at least 90%. So if you are aiming for B2B success, you need to look straight at the complex, long-term approach. What does that mean?

✓ Forget ReAct and CodeAct: build your own step planner (for example, as a DAG) and a runner to execute the steps (a minimal sketch of such a planner follows below).

✓ Forget MCP tools: build your own integrations with the APIs you need.

✓ Don't let the model make decisions; make them in code. Use the model for the three things it excels at: 1) extracting facts from unstructured data and returning them in structured form, 2) transforming one data representation into another, 3) generating content (see the last sketch below).

Overall, frame it as an engineering problem: "I need to keep everything under control." Make it rule-based rather than alchemical and AI-based.

What are the downsides? You will have to buckle down, do some design work, and write code. An agent solution built on your own business tools is harder to scale than an agent built on MCP tools, of which there are millions now (though 90% of them are garbage).
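To make the reliability arithmetic above reproducible, here is a minimal sketch. The per-node reliabilities and the step count are the example's assumptions, not measurements.

<syntaxhighlight lang="python">
# Expected end-to-end reliability of a multi-step agent flow,
# assuming independent failures at every node on every step.

def flow_reliability(node_reliabilities: list[float], steps: int) -> float:
    """Product of per-node reliabilities, raised to the number of steps."""
    per_step = 1.0
    for r in node_reliabilities:
        per_step *= r
    return per_step ** steps

# "Pacemaker-grade" nodes: API, MCP server, agent loop, framework.
print(flow_reliability([0.9999] * 4, steps=5))  # ~0.998 -> 99.8%

# Realistic nodes at 95% reliability each.
print(flow_reliability([0.95] * 4, steps=5))    # ~0.36 -> roughly one request in three succeeds
</syntaxhighlight>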
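And here is a minimal sketch of what "your own step planner as a DAG plus a runner" could look like. The step names and the `Step` structure are hypothetical; the point is that the execution order is fixed in code and the model never decides what runs next.

<syntaxhighlight lang="python">
# A deterministic plan: a DAG of steps executed in topological order.
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib since Python 3.9
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]   # takes the shared context, returns updates
    deps: tuple[str, ...] = ()    # names of steps that must run first

def run_dag(steps: list[Step]) -> dict:
    by_name = {s.name: s for s in steps}
    order = TopologicalSorter({s.name: set(s.deps) for s in steps}).static_order()
    context: dict = {}
    for name in order:            # same order on every run, by construction
        context.update(by_name[name].run(context))
    return context

# Hypothetical 3-step flow: fetch -> extract -> report.
plan = [
    Step("fetch",   lambda ctx: {"raw": "ticket #123: printer on fire"}),
    Step("extract", lambda ctx: {"facts": ctx["raw"].split(": ")[1]}, deps=("fetch",)),
    Step("report",  lambda ctx: {"report": f"ALERT: {ctx['facts']}"}, deps=("extract",)),
]
print(run_dag(plan)["report"])
</syntaxhighlight>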
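Finally, a sketch of the "decisions in code, extraction in the model" split. `call_llm` is a hypothetical stand-in for whatever model client you use: the model only turns unstructured text into a fixed JSON shape, and the branching logic stays in ordinary code.

<syntaxhighlight lang="python">
import json

def call_llm(prompt: str) -> str:
    """Hypothetical LLM client; replace with your provider's API call."""
    raise NotImplementedError

EXTRACT_PROMPT = (
    "Extract from the ticket below and return ONLY JSON with keys "
    '"customer", "severity" (low|medium|high), "summary".\n\nTicket:\n{ticket}'
)

def handle_ticket(ticket: str) -> str:
    # 1) The model does what it is good at: unstructured text -> structured facts.
    facts = json.loads(call_llm(EXTRACT_PROMPT.format(ticket=ticket)))

    # 2) Code, not the model, makes the routing decision.
    if facts["severity"] == "high":
        return f"page on-call about {facts['customer']}: {facts['summary']}"
    return f"queue for support: {facts['summary']}"
</syntaxhighlight>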
But if clients come to you saying, "Why don't you just build an agent based on Claude Desktop + MCP tools?", first clarify what scenarios they are planning for and what level of reliability they expect. It is quite possible they don't need high accuracy and reliability, and the main use case for the agent is routine ad-hoc tasks. In that case, Claude Desktop + MCP or ReAct + MCP tools could very well be the answer. Read the full article here: https://medium.com/@mne/ai-automation-vs-ad-hoc-tasks-with-llm-dfb6867ca64c