Why you can trust the output
Every claim below is backed by code in the public repo. No black box.
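Calibration error comparison · ECE / Brier / mean confidence / hit rate
Concordal-20×78 evaluation (1,560 decisions). Lower is better for ECE and Brier. The two rows before the last are ablations of the full pipeline.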
| System | ECE | Brier | Confidence | Hit rate |
|---|---|---|---|---|
| Single-prompt baseline (Claude 3.5 Sonnet) | 0.281 | 0.241 | 0.806 | 0.554 |
| Single-prompt baseline (GPT-4o) | 0.272 | 0.238 | 0.815 | 0.553 |
| CoT prompting | 0.247 | 0.227 | 0.789 | 0.568 |
| Self-Consistency | 0.214 | 0.212 | 0.762 | 0.589 |
| FinGPT-7B (domain-specific) | 0.305 | 0.258 | 0.778 | 0.520 |
| 5-agent, analysts only (no debate) | 0.142 | 0.198 | 0.731 | 0.601 |
| 5+2-agent (no risk panel) | 0.074 | 0.181 | 0.696 | 0.651 |
| Full 7-agent pipeline (this system) | 0.037 | 0.172 | 0.683 | 0.673 |
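For readers who want to recompute the two headline metrics, here is a minimal sketch. It assumes each decision carries a stated confidence in [0, 1] and a 0/1 hit flag, and it uses 10 equal-width bins for ECE. The bin count and the function names are illustrative assumptions, not necessarily what the repo or paper uses.

```python
from typing import List, Sequence, Tuple


def brier_score(confidences: Sequence[float], hits: Sequence[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    n = len(confidences)
    return sum((c - h) ** 2 for c, h in zip(confidences, hits)) / n


def expected_calibration_error(
    confidences: Sequence[float], hits: Sequence[int], n_bins: int = 10
) -> float:
    """Size-weighted average of |hit rate - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, h in zip(confidences, hits):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls into the top bin
        bins[idx].append((c, h))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        hit_rate = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(hit_rate - mean_conf)
    return ece
```

A system whose mean confidence sits far above its hit rate, as in the single-prompt rows, scores a high ECE under this definition even if its hit rate is respectable.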
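Hit rate by confidence bucket: the calibration signature
The single-prompt baseline is statistically flat across confidence levels (with a slight dip); this system rises monotonically, which is the signature of a well-calibrated system.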
| Confidence bucket | Single prompt | This system | Δ (pp) |
|---|---|---|---|
| [0.5, 0.6) | 53.1% | 59.8% | +6.7 |
| [0.6, 0.7) | 54.9% | 65.4% | +10.5 |
| [0.7, 0.8) | 56.4% | 71.2% | +14.8 |
| [0.8, 0.9) | 55.4% | 78.9% | +23.5 |
| [0.9, 1.0] | 57.2% | 84.3% | +27.1 |
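The bucket table can be rebuilt from raw decisions with a sketch like the one below. The bucket edges are taken from the table; the function names are hypothetical. The monotonicity check works on point estimates only, so it says nothing about statistical significance.

```python
from typing import List, Optional, Sequence

# Bucket edges as in the table; the last bucket is closed at 1.0.
BUCKETS = [(0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 0.9), (0.9, 1.0)]


def hit_rate_by_bucket(
    confidences: Sequence[float], hits: Sequence[int]
) -> List[Optional[float]]:
    """Hit rate per confidence bucket, or None for an empty bucket."""
    rates: List[Optional[float]] = []
    last = len(BUCKETS) - 1
    for i, (lo, hi) in enumerate(BUCKETS):
        members = [
            h
            for c, h in zip(confidences, hits)
            if lo <= c < hi or (i == last and c == hi)
        ]
        rates.append(sum(members) / len(members) if members else None)
    return rates


def is_strictly_increasing(values: Sequence[float]) -> bool:
    """True when every bucket's rate exceeds the previous bucket's rate."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hit rates (%) copied from the table above.
single_prompt = [53.1, 54.9, 56.4, 55.4, 57.2]
this_system = [59.8, 65.4, 71.2, 78.9, 84.3]
print(is_strictly_increasing(single_prompt))  # False: dips at [0.8, 0.9)
print(is_strictly_increasing(this_system))    # True
```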
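The system is not perfect: these are our known issues
Fully disclosed in §11 of the paper. Stating the limitations up front is far better than leaving users disappointed later.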
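Limited sample size
Paper §11.1. At 20 tickers × 78 weeks = 1,560 decisions, this is the largest publicly available dataset at present, but it is still small next to academic finance backtests (30+ years, 1,000+ tickers). Statistical power to distinguish ECE improvements below 1% is limited.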
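A rough back-of-the-envelope check shows why sub-1% differences are hard to resolve at this sample size. The sketch treats the 1,560 decisions as independent coin flips, which is an optimistic assumption because weekly decisions on the same ticker are likely correlated. It looks at the overall hit rate (0.673 from the table) rather than ECE itself, since per-bucket hit rates are the ingredients of ECE.

```python
import math


def hit_rate_standard_error(hit_rate: float, n_decisions: int) -> float:
    """Binomial standard error of a hit rate, assuming independent decisions."""
    return math.sqrt(hit_rate * (1.0 - hit_rate) / n_decisions)


se = hit_rate_standard_error(0.673, 1560)
half_width_95 = 1.96 * se  # normal-approximation 95% interval half-width
print(f"standard error: {se:.4f}, 95% half-width: {half_width_95:.4f}")
```

Even under the independence assumption, the standard error is about 1.2 percentage points, so the sampling noise alone is already larger than a 1% effect.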
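LLM training-cutoff leakage
Paper §11.2. Price and market commentary on the tickers in the evaluation period already appears in frontier models' training data. Our asof guard prevents data-level leakage but cannot prevent knowledge-level leakage. A preliminary evaluation on a 100-decision subset of post-cutoff IPOs shows an ECE of 3.9%, consistent with the headline result.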
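Hallucination by specialist-role LLMs
Paper §11.3. Analysts occasionally fabricate evidence citations (wrong fiscal quarter, news that never happened). Dual-LLM consensus plus reflective memory catches some of these but does not eliminate them. Production use should treat agent rationales as evidence to be cross-checked.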
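Regulation and disclosure
Paper §11.4. Concordal is decision support, not investment advice. The production UI shows a risk disclaimer on every decision page. Regulatory frameworks differ widely across jurisdictions; any production deployment in a specific region requires legal review.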
Real data sources
Each of the 5 analyst stages reads from genuinely public data sources: not mocks, not aggregator middleware, not paid APIs. We're a thin layer on top of upstream sources operated by the OSS community, governments, and exchanges.
- OpenBB SDK + FRED REST → Macro analyst (CPI, unemployment, Fed funds, yield curve)
- SEC EDGAR XBRL → US fundamentals, point-in-time by filing date
- akshare → A-share OHLCV + fundamentals
- CCXT (Binance default) → Crypto OHLCV + technicals
- Reddit JSON → US/crypto news + sentiment, no API key
- Eastmoney Guba (东方财富股吧) → A-share retail discussion mining
Strict no-lookahead
Every adapter enforces strict no-lookahead at the boundary. yfinance.info (current snapshot only) and the akshare realtime endpoints return empty stubs when asof is more than 7 days in the past, and the analyst prompt explicitly tells the LLM not to fabricate numbers. SEC EDGAR is filtered by filing date, so nothing filed after asof leaks through. Reddit and Guba posts are filtered by created_utc.
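The boundary filters described above can be sketched as follows. This is an illustration only: `Post`, `Filing`, and the three helper names are hypothetical, and reading "asof > 7 days" as "asof more than 7 days before today" is an assumption. The `created_utc` field is the Unix-seconds timestamp found in Reddit's JSON.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Post:
    title: str
    created_utc: float  # Unix seconds, as in Reddit's JSON


@dataclass(frozen=True)
class Filing:
    form: str
    filed: date  # date the filing became public


def posts_visible_at(posts: Sequence[Post], asof: datetime) -> List[Post]:
    """Keep only posts created at or before asof (asof must be timezone-aware)."""
    cutoff = asof.timestamp()
    return [p for p in posts if p.created_utc <= cutoff]


def filings_visible_at(filings: Sequence[Filing], asof: date) -> List[Filing]:
    """Point-in-time filter: a filing counts only once it has been filed."""
    return [f for f in filings if f.filed <= asof]


def snapshot_or_stub(
    fetch: Callable[[], dict], asof: date, today: date, max_age_days: int = 7
) -> dict:
    """Snapshot-only sources cannot be replayed, so old asof dates get an empty stub."""
    if (today - asof).days > max_age_days:
        return {}
    return fetch()
```

Returning an empty stub rather than the current snapshot means a stale asof yields missing data instead of silently leaked data.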
Cross-validated against Backtrader
Our own backtest engine is 200 lines, so it could have bugs. Every result is therefore independently replayed through Backtrader (14k★, battle-tested since 2014). Any disagreement above 0.5pp in annualised return is auto-flagged in the report.
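The flagging rule can be sketched in a few lines. The geometric annualisation shown here is a common convention and is an assumption about the formula in use; the function names and the weekly default of 52 periods per year are also illustrative. The sketch compares two already-computed numbers and does not call Backtrader itself.

```python
def annualised_return(
    total_return: float, n_periods: int, periods_per_year: int = 52
) -> float:
    """Geometric annualisation of a cumulative return over n_periods."""
    return (1.0 + total_return) ** (periods_per_year / n_periods) - 1.0


def flag_disagreement(
    own_annualised: float, reference_annualised: float, tolerance_pp: float = 0.5
) -> bool:
    """True when the two engines differ by more than tolerance_pp percentage points."""
    gap_pp = abs(own_annualised - reference_annualised) * 100.0
    return gap_pp > tolerance_pp
```

For example, 12.3% from one engine against 12.0% from the other is a 0.3pp gap and passes, while 13.0% against 12.0% is a 1.0pp gap and would be flagged.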
Honest cost model
Backtest cost defaults are intentionally pessimistic: 5bp commission + 5bp slippage = 10bp per side, and A-share sells add a 5bp stamp tax. This is higher than the industry-typical 3bp because under-charging produces misleadingly 'good' backtest results that disappoint in live trading.
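The default cost arithmetic works out as in the sketch below. The basis-point values come from the paragraph above; the constant names, function names, and the `"a_share"` / `"us"` market labels are hypothetical.

```python
COMMISSION_BP = 5.0
SLIPPAGE_BP = 5.0
A_SHARE_STAMP_TAX_BP = 5.0  # charged on A-share sells only


def trade_cost_bp(side: str, market: str) -> float:
    """Cost of one side of a trade, in basis points of notional."""
    cost = COMMISSION_BP + SLIPPAGE_BP
    if market == "a_share" and side == "sell":
        cost += A_SHARE_STAMP_TAX_BP
    return cost


def round_trip_cost_bp(market: str) -> float:
    """Buy plus sell cost in basis points."""
    return trade_cost_bp("buy", market) + trade_cost_bp("sell", market)


def trade_cost_cash(notional: float, side: str, market: str) -> float:
    """Cost of one side in currency units (1bp = 1/10,000 of notional)."""
    return notional * trade_cost_bp(side, market) / 10_000
```

Under these defaults a round trip costs 20bp in other markets and 25bp for A-shares, so selling a 100,000 A-share position is charged 150 in costs.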
Fully open source
The entire backend and frontend are on public GitHub. Self-host it, fork the prompts, plug in your own adapters, or use it as an SDK. The Pro subscription buys hosting and real-LLM quota; access has always been free.
Tests + CI
25 unit tests run on every push via GitHub Actions. Coverage: lookahead enforcement, cost model arithmetic, annualisation formula, EDGAR PIT filter, Reddit lookback filter, and LLM router family routing. A green build is the merge gate.
Free forever (mock LLM); Pro $29/mo unlocks real LLM