Concordal
The Decision Dialectic · Concord

Multi-Agent LLM vs Single-Prompt ChatGPT for Stock Analysis

Why does asking ChatGPT 'should I buy AAPL?' produce a hand-wavy answer? Because one prompt can't do five specialist jobs. Here's what changes when you separate roles.

2026-05-09 · 7 min read

If you've ever pasted "should I buy AAPL?" into ChatGPT, you got back a paragraph that hedges in every direction and ends with "consult a financial advisor". That isn't because the model can't reason about stocks. It's because one prompt is trying to do five jobs and ends up doing none of them well.

The single-prompt failure mode

When you ask a single model to analyse a stock end-to-end, it tries to be a fundamentals analyst, a chart reader, a news scanner, a macro economist, and a portfolio manager simultaneously — in one context window. Three things go wrong:

First, attention is finite. The model spends its "thinking budget" on whichever angle the prompt emphasised most, neglecting the others.

Second, conflicting signals get smoothed. If fundamentals say BUY and technicals say SELL, a single-prompt answer averages them into a HOLD with low confidence, which is rarely the optimal trade.

Third, there's no adversarial pressure. A bull case never gets seriously challenged because the same model wrote both the bull and the bear sides at the same time.

This isn't a flaw in the model — it's a flaw in how you're using it. Putting a brilliant generalist in five jobs at once doesn't make them five times more productive.

What changes with role separation

Our pipeline runs five specialist analysts with separate prompts, separate context windows, and separate sources of evidence:

The fundamentals analyst sees only SEC EDGAR filings and computed ratios. It is not allowed to look at price action.

The technical analyst sees only OHLCV and our Alpha158-lite factor library. It is not allowed to look at news.

The sentiment analyst sees only Reddit and Eastmoney Guba (东方财富股吧, a Chinese retail-investor stock forum) posts. It is not allowed to see fundamentals.

The news analyst sees only headlines and dates. It is not allowed to see chart patterns.

The macro analyst sees only FRED + OpenBB time-series. It is not allowed to see ticker-specific data.
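One way to enforce these visibility rules is an allowlist applied before each analyst's prompt is assembled. The sketch below is illustrative only: the source keys (`sec_edgar_filings`, `ohlcv`, and so on) and the `build_context` helper are hypothetical names, not taken from our codebase.

```python
# Hypothetical source keys per analyst role. The roles and sources follow the
# list above; the key names themselves are assumptions for illustration.
ALLOWED_SOURCES: dict[str, set[str]] = {
    "fundamentals": {"sec_edgar_filings", "computed_ratios"},
    "technical": {"ohlcv", "alpha158_lite_factors"},
    "sentiment": {"reddit_posts", "guba_posts"},
    "news": {"headlines"},
    "macro": {"fred_series", "openbb_series"},
}


def build_context(role: str, evidence: dict[str, str]) -> dict[str, str]:
    """Return only the evidence items that `role` is allowed to see."""
    allowed = ALLOWED_SOURCES[role]  # raises KeyError for an unknown role
    return {name: payload for name, payload in evidence.items() if name in allowed}
```

Filtering before the prompt is built, rather than asking the model to ignore what it has seen, is what keeps each context window free of the other analysts' evidence.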

Each one produces a structured opinion: thesis, evidence, confidence. Then — and this is the key step — a bull persona and a bear persona read all five reports and write opposing pitches. Then a trader synthesises. Then a risk committee kills the trade if leverage or position size violates limits. Finally a manager signs off with a confidence-weighted BUY / HOLD / SELL.
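As a rough sketch, the structured opinion and the risk veto could be modelled like this. The field names mirror the thesis / evidence / confidence triple described above. The `TradeProposal` type, the stage labels, and the numeric limits are assumptions made for the example, not our production values.

```python
from dataclasses import dataclass

# Stage order as described in the text; the labels are illustrative.
PIPELINE_STAGES = ("analysts", "bull_bear_debate", "trader", "risk_committee", "manager")


@dataclass(frozen=True)
class Opinion:
    role: str                  # e.g. "fundamentals"
    thesis: str                # one-paragraph claim
    evidence: tuple[str, ...]  # supporting facts the analyst cites
    confidence: float          # 0.0 to 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass(frozen=True)
class TradeProposal:           # hypothetical shape of the trader's output
    action: str                # "BUY", "HOLD" or "SELL"
    position_pct: float        # fraction of the portfolio
    leverage: float


def risk_committee_approves(
    proposal: TradeProposal,
    max_position_pct: float = 0.10,  # assumed limit, for illustration only
    max_leverage: float = 1.0,       # assumed limit, for illustration only
) -> bool:
    """Return False when position size or leverage violates the limits."""
    if proposal.action == "HOLD":
        return True  # nothing to size, nothing to veto
    return proposal.position_pct <= max_position_pct and proposal.leverage <= max_leverage
```

Keeping the veto as a plain function rather than another model call is a design choice in this sketch: hard limits are easier to audit when they are deterministic.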

Why the debate step matters

This is the piece a single prompt cannot do. When you ask one model to argue both sides, it produces symmetrical, hedge-y arguments. When you ask two separate instances — one explicitly told it's a bull, one explicitly told it's a bear — you get the strongest version of each case. The trader role then has real material to weigh, instead of synthesising from a pre-smoothed average.
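A minimal sketch of the two-instance debate, assuming a generic `complete(system_prompt, user_prompt)` callable as a stand-in for whatever LLM client is in use. The persona wording and the `run_debate` name are hypothetical.

```python
from typing import Callable

# Stand-in for an LLM client: (system_prompt, user_prompt) -> reply text.
LLM = Callable[[str, str], str]

PERSONAS = {
    "bull": "You are the bull. Make the strongest possible case FOR the trade.",
    "bear": "You are the bear. Make the strongest possible case AGAINST the trade.",
}


def run_debate(reports: list[str], complete: LLM) -> dict[str, str]:
    """Give both personas the same analyst reports, each in its own call."""
    briefing = "\n\n".join(reports)
    # One independent call per persona: neither instance sees the other's
    # system prompt or pitch, so neither can pre-hedge against it.
    return {side: complete(system, briefing) for side, system in PERSONAS.items()}
```

The point of the structure is that the two pitches come from separate conversations over identical evidence, so the trader receives two committed arguments rather than one balanced summary.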

In our own A/B testing over 78 trading weeks and 20 tickers, the multi-agent pipeline produced calibrated confidence values (the system's 70%-confidence calls were right roughly 70% of the time), while a single-prompt baseline was systematically over-confident. The results are published at /track-record.
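For readers who want to check calibration on their own logs, here is a minimal sketch. It assumes each past call is recorded as a `(confidence, was_correct)` pair; the `calibration_table` function and that record format are illustrative, not the code behind /track-record.

```python
from collections import defaultdict
from typing import Iterable


def calibration_table(
    calls: Iterable[tuple[float, bool]], n_buckets: int = 10
) -> dict[int, tuple[int, float]]:
    """Group calls by stated confidence and report the realised hit rate.

    Returns {bucket_lower_bound_in_percent: (number_of_calls, hit_rate)}.
    """
    buckets: dict[int, list[bool]] = defaultdict(list)
    for confidence, was_correct in calls:
        # Work in whole percent to avoid float surprises such as
        # 0.7 / 0.1 evaluating to 6.999... and landing in the wrong bucket.
        pct = int(round(confidence * 100))
        idx = min(pct * n_buckets // 100, n_buckets - 1)  # 100% joins the top bucket
        buckets[idx].append(bool(was_correct))
    return {
        idx * 100 // n_buckets: (len(hits), sum(hits) / len(hits))
        for idx, hits in sorted(buckets.items())
    }
```

A well-calibrated system shows hit rates close to each bucket's stated confidence. An over-confident one shows hit rates consistently below it.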

"Just use a longer prompt"

People try this. The problem isn't prompt length; it's context contamination. Once a model has seen the price chart, its reading of the 10-K is anchored. Role separation is the mechanism that prevents anchoring — it's the same reason real investment committees give analysts independent assignments before convening.

The cost trade-off

Honest answer: multi-agent is more expensive. A single ChatGPT prompt costs <$0.01. Our pipeline averages $0.04–$0.10 per decision because it runs 8–11 model calls. That's still pennies per decision, but it's roughly an order of magnitude more than a single prompt.

The cost is bounded by our LLM router's fallback chain (Gemini → OpenAI → Anthropic → DeepSeek → Qwen → GLM). Whichever provider has spare quota at the time gets the work. Daily caps prevent runaway spend. See the cost model →
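A fallback chain with a spend cap can be sketched as follows. The provider order comes from the paragraph above. Everything else is an assumption for illustration: the `Router` class, the `QuotaExhausted` exception, the client signature returning `(text, cost_usd)`, and the choice of one global daily cap rather than per-provider caps.

```python
from typing import Callable

PROVIDER_ORDER = ["gemini", "openai", "anthropic", "deepseek", "qwen", "glm"]

# Hypothetical client signature: prompt -> (reply_text, cost_in_usd).
Client = Callable[[str], tuple[str, float]]


class QuotaExhausted(Exception):
    """Raised by a client when its provider has no spare quota."""


class Router:
    def __init__(self, clients: dict[str, Client], daily_cap_usd: float) -> None:
        self.clients = clients
        self.daily_cap_usd = daily_cap_usd
        self.spent_usd = 0.0  # a real system would reset this each day

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, reply) from the first provider with quota."""
        if self.spent_usd >= self.daily_cap_usd:
            raise RuntimeError("daily spend cap reached")
        for name in PROVIDER_ORDER:
            client = self.clients.get(name)
            if client is None:
                continue  # provider not configured
            try:
                text, cost = client(prompt)
            except QuotaExhausted:
                continue  # fall through to the next provider in the chain
            self.spent_usd += cost
            return name, text
        raise RuntimeError("no provider has spare quota")
```

Checking the cap before any call is made means a runaway loop fails fast instead of walking the whole chain on every attempt.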

When single-prompt is fine

Asking ChatGPT "explain what a P/E ratio is" or "summarise Apple's latest 10-K": single prompt is the right tool. Conceptual lookup, summarisation, definitions — one model, one prompt, done.

Asking for a directional trading decision with calibrated confidence: that's where role separation pays for itself.