fix(BLOCKER): consecutive LLM failures, all 4 design violations fixed

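The commander's audit found the real cause of the flywheel going silent: four bugs, each violating the established architecture design, hit at the same time.

P0a: Ollama timeout violated the GAP-B4 design
  config.py: OPENCLAW_TIMEOUT changed from 120s to 30s.
  The previous 120s violated the ADR-052 GAP-B4 design (25s LLM hard timeout), so when Ollama was overloaded, threads starved for 120s before degrading.

P0b: AI Router silent-skip observability fix
  ai_router.py: not_registered / circuit_open / rate_limit / privacy_skip are now all accumulated into the errors array, so the all_providers_failed log shows why each provider was skipped.
  Previously errors=["ollama: Timeout"] while tried=4, which could not be diagnosed.

P1a: send_text method does not exist
  ai_router.py:1005 tg.send_text() → tg.send_notification(parse_mode=HTML)
  TelegramGateway only has send_notification, not send_text, so the fallback-failure notification itself failed (double silence).

P1b: resend_stale_ready_tokens concurrency explosion
  decision_manager.py: added asyncio.Semaphore(5) + 200ms throttle.
  Previously fire_and_forget ran N tasks at once; at N=108 every Ollama embedding call timed out, and my own live-fire run was crowded out as well.
  Now: at most 5 concurrent, with a 200ms pause after each completion.

CD pipeline review (Blocker 1): fully consistent with the ADR-039 design. 10-15 min is expected and needs no fix; the design requires that much time.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>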
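
For illustration only, the P1b pattern (asyncio.Semaphore(5) plus a 200ms pause after each completion) might look roughly like the sketch below. This is not the code in decision_manager.py: run_throttled, worker, and the parameter names are hypothetical, and only the limit of 5 and the 200ms pause come from the commit message.

```python
import asyncio

MAX_CONCURRENCY = 5      # limit named in the commit message
THROTTLE_SECONDS = 0.2   # 200 ms pause after each completed task


async def run_throttled(items, worker,
                        max_concurrency=MAX_CONCURRENCY,
                        throttle=THROTTLE_SECONDS):
    """Run worker(item) for every item with at most max_concurrency in flight.

    Hypothetical helper: `worker` stands in for whatever coroutine resends
    a single token.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            try:
                return await worker(item)
            finally:
                # Keep holding the slot during the pause so the backend
                # gets a breather before the next task starts.
                await asyncio.sleep(throttle)

    # return_exceptions=True keeps one failing item from cancelling the rest.
    return await asyncio.gather(
        *(guarded(item) for item in items), return_exceptions=True
    )
```

With N=108 items, a helper like this would keep at most 5 calls in flight at any moment instead of launching all 108 at once; results come back in input order, with exceptions returned in place rather than raised.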
2026-04-14 19:37:02 +08:00
parent 35736315ce
commit 8b7e9cbfb8
3 changed files with 49 additions and 12 deletions
--- a/apps/api/src/core/config.py
+++ b/apps/api/src/core/config.py
@@ -358,8 +358,10 @@ class Settings(BaseSettings):
        description="Default Ollama model for RCA analysis",
    )
    OPENCLAW_TIMEOUT: int = Field(
-        default=120,  # 2026-04-08 ogt: deepseek-r1:14b measured slowest at 54s; 120s includes buffer
-        description="Timeout for OpenClaw AI calls (seconds)",
+        default=30,  # 2026-04-14 Claude Sonnet 4.6: changed from 120s to 30s to match ADR-052 GAP-B4
+        # 25s LLM hard timeout + 5s buffer. The previous 120s violated the defense-in-depth design,
+        # so when Ollama was overloaded, threads starved for 120s before degrading to fallback.
+        description="Timeout for OpenClaw AI calls (seconds, aligned with GAP-B4 25s)",
    )

    # ==========================================================================