Commit Graph

7 Commits

Author SHA1 Message Date
Your Name
9ef9633aff fix(alerts): bypass proxy timeout for GCP Ollama 2026-05-06 08:55:14 +08:00
Your Name
c2c0b1ec82 fix(alerts): let GCP Ollama finish before cloud fallback
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m9s
CD Pipeline / build-and-deploy (push) Successful in 4m21s
CD Pipeline / post-deploy-checks (push) Successful in 1m16s
2026-05-06 05:27:55 +08:00
Your Name
7f200aff5f fix(solver): 注入告警 labels 讓 params 模板填充真實值
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 10m45s
根因:Solver LLM 不知道 namespace/pod/deployment/instance 真實值,
      recommended_actions.params 模板({labels.namespace} 等)填不出來
      → Telegram 顯示 kubectl scale deployment  --replicas=(空白)

修復:
- solver.run() 加 incident_labels 參數
- _build_prompt() 把 labels 顯式列出給 LLM 參考
- orchestrator 從 snapshot.alert_info.labels 取出後傳入

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 15:05:06 +08:00
OG T
7e3cc8b3b0 fix(agents): 移除人工 per-agent timeout,LLM 必須等完整回應
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
原設計 asyncio.wait_for(timeout_sec=25s) 是任意截斷,
只要 LLM 超過時限就降級為 confidence=20%,根本沒有分析。

正確做法:
- 移除所有 4 個 agent 的 asyncio.wait_for() 包裝
- 只留 except Exception 捕真實異常(連線失敗、模型崩潰)
- 全流程由 Orchestrator GLOBAL_TIMEOUT_SEC=90s 防掛死
- _PER_AGENT_TIMEOUT_SEC 常數廢棄移除

影響:LLM 推理多久就等多久,不再人工截斷,
      deepseek-r1:14b 等模型得以完整輸出分析結果。

2026-04-16 ogt + Claude Sonnet 4.6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:54:34 +08:00
OG T
9538f6cca4 fix(agents): 修正 Agent 5s timeout 導致 LLM 推理全部失敗
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 16m14s
根本原因: deepseek-r1:14b 實測推理 2.2-27.3s avg 10.6s
但 Diagnostician/Critic/Solver/Reviewer 全部使用 timeout_sec=5.0 (開發機測試值)
→ 67% 的 Agent 推理 timeout → 降級 confidence=20% → 自動修復從不觸發

修復:
- _PER_AGENT_TIMEOUT_SEC: 5s → 25s (覆蓋 avg 的 2.3x buffer)
- GLOBAL_TIMEOUT_SEC: 30s → 90s (3個序列Agent × 25s + buffer)
- 明確傳遞 timeout_sec 給所有 4 個 Agent 呼叫

預期效果: 正常告警 AI 分析 confidence ≥ 0.5 → 觸發自動修復

2026-04-16 Claude Sonnet 4.6 Asia/Taipei

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:28:05 +08:00
OG T
66c4eda27a feat(Phase 3): AgentSession 學習接線 — record_agent_session() + orchestrator 辯證訊號
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- learning_service.py: 新增 record_agent_session() — 5-Agent 辯證結果 → Redis analytics
  Critic 質疑 + matched_playbook_id → 輕度負向 EWMA;all_agents_degraded 記錄治理事件
- agent_orchestrator.py: run_agent_debate() 完成後 best-effort 呼叫 record_agent_session()
  Phase 3 L7×D2 學習訊號全部接線完成

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:00:18 +08:00
OG T
5ddba6d6e0 feat(adr-082): Phase 2 多 Agent 協作 — 5 角色辯證系統骨架上線
新增 5 個 Agent + Orchestrator + DecisionManager 接線:
- protocol.py: DiagnosisReport / ActionPlan / ReviewVerdict / CriticReport / DecisionPackage 型別系統
- DiagnosticianAgent: RCA 根因分析,confidence < 0.4 → ABSTAIN
- SolverAgent: 修復方案軍師,blast_radius 評分 + 降級 rule-based mock
- ReviewerAgent: 安全審查,HARD_RULES 靜態 pattern + blast_radius 閾值 (>50 revision, >80 reject)
- CriticAgent: 刻意唱反調,強制 3 問批判性思維,critical challenge → REJECT
- CoordinatorAgent: 純規則聚合,6 級決策閘,REQUEST_REVISION → 強制人工
- AgentOrchestrator: 30s 全局超時,Reviewer ‖ Critic 並行,DB Immutable Event Sourcing + Redis Streams
- DecisionManager: AIOPS_P2_ENABLED gate + _package_to_proposal_data 橋接既有 proposal_data 格式
- AgentSession DB table + 4 個複合 index
- ADR-082 決策記錄

Gate 2 修復(7 項):
- CRITICAL: DELETE FROM regex lookahead 位置錯誤(移至 FROM 後)
- CRITICAL: REQUEST_REVISION 可抵達 auto-execute 路徑(改回 requires_human_approval=True)
- IMPORTANT: _extract_json flat regex 不支援巢狀 JSON(改 find/rfind 邊界提取)
- IMPORTANT: all_degraded 遺漏 verdict.degraded(補全 4 個 Agent)
- IMPORTANT: Solver ABSTAIN guard 放行降級假設(改為無論 hypotheses 有無均跳過)
- IMPORTANT: dataclasses.asdict() Enum 未序列化導致 DB 寫入靜默失敗(加 json.dumps default handler)
- IMPORTANT: P2 gate 直讀屬性繞過父 Phase 守衛(改用 is_phase_enabled(2))

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 13:48:55 +08:00