awoooi

Author	SHA1	Message	Date
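OG T	0388e50d0e	fix(p1-backlog): fix the "pending analysis" (待分析) dead end and Telegram message truncation [CI: CD Pipeline / build-and-deploy (push) successful in 30m25s] Problem 1: REQUEST_REVISION → "pending analysis". Root cause: safe_candidates=[] → selected=None → recommended_action=None → decision_manager action="" → the TG card shows "pending analysis" (the information flow breaks). Fix in coordinator_agent.py: when there are no safe candidates, fall back to the Solver's original best plan, labeled "[Reviewer 未核准，僅供參考] {action}" ("not approved by Reviewer, for reference only"), so the SRE always sees the AI's suggestion and the information flow is never interrupted. Problem 2: debate_summary is cut off in the middle of "(blast_radius..." and displays "(bl". Root cause: root_cause=reasoning[:150]; 150 characters is too short for a Chinese debate_summary. Fix in decision_manager.py: root_cause truncation 150 → 300; suggested_action truncation 80 → 120. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 11:12:02 +08:00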
OG T	0077ff9758	fix(solver): pass the hypothesis to OPENCLAW_NEMO as alert_context [CI: CD Pipeline / build-and-deploy (push) cancelled] Root cause: solver called openclaw.call(prompt) without passing context → the nemo fallback treated prompt[:500] (the system instructions, "軍師 Agent", i.e. "strategist Agent") as the signal description → the LLM returned garbage plan descriptions. Fix: put top.description into alert_context.signals so nemo sees the real root-cause hypothesis (same pattern as diagnostician, 7eb8375). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 22:51:30 +08:00
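OG T	7eb837567d	fix(diagnostician): fix garbage root causes shown under "AI 深度診斷" ("AI deep diagnosis") [CI: CD Pipeline / build-and-deploy (push) cancelled] Root-cause chain: 1. openclaw.call(prompt) passed no context 2. the OPENCLAW_NEMO fallback treated prompt[:500] (the system instruction text) as the signal description 3. the Nemo LLM returned action_title="調查 AWOOOI SRE 系統的偵探 Agent" ("the detective Agent that investigates the AWOOOI SRE system", i.e. the task description) 4. _extract_hypotheses() used action_title as the root-cause hypothesis description → Telegram showed garbage. Fix: - openclaw.call() gains an optional alert_context parameter, passed through to _call_with_fallback - diagnostician._analyze() builds alert_context (incident_id + evidence_summary as signal) → nemo receives real sensor data through the structured API instead of the system instruction text - _extract_hypotheses() nemo format conversion: prefer reasoning (the why) over action_title (the what) as the hypothesis description, since reasoning is closer to root-cause analysis. 2026-04-16 ogt + Claude Sonnet 4.6 (Taipei time zone) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 22:34:48 +08:00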
OG T	d294caf830	fix(solver): accept the openclaw_nemo response format → convert to candidates format [CI: CD Pipeline / build-and-deploy (push) failed after 3m51s] In sync with diagnostician: openclaw_nemo returns action_title/risk_level/confidence, so solver's _extract_candidates() found no candidates key → empty plan → no_candidates. Fix: when action_title is present, convert to the candidates format. risk_level → blast_radius mapping: critical=60, high=40, medium=25, low=10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 14:34:50 +08:00
OG T	c27709d11b	fix(diagnostician): accept the openclaw_nemo response format → end the blanket ABSTAIN. Root cause: AI Router DIAGNOSE→openclaw_nemo returns the ClawBot format: {"action_title":"...","risk_level":"...","confidence":0.85}. Diagnostician only parsed {"hypotheses":[...]} → always 0 hypotheses → ABSTAIN. Fix: _extract_hypotheses() now detects and converts the openclaw_nemo format: action_title→description, confidence→confidence, risk_level→category. Impact: every critical alert received since 2026-04-15 resulted in ABSTAIN, with no remediation action taken. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 14:32:32 +08:00
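OG T	7e3cc8b3b0	fix(agents): remove the artificial per-agent timeout; the LLM's full response must be awaited [CI: CD Pipeline / build-and-deploy (push) cancelled] The original design, asyncio.wait_for(timeout_sec=25s), was an arbitrary cutoff: whenever the LLM exceeded the limit, the result was downgraded to confidence=20% with no analysis at all. Correct approach: - remove the asyncio.wait_for() wrapper from all 4 agents - keep only except Exception to catch real failures (connection failure, model crash) - the whole flow is protected from hanging by the Orchestrator's GLOBAL_TIMEOUT_SEC=90s - the deprecated _PER_AGENT_TIMEOUT_SEC constant is removed. Impact: each agent waits as long as LLM inference takes, with no artificial cutoff, so models such as deepseek-r1:14b can emit their complete analysis. 2026-04-16 ogt + Claude Sonnet 4.6 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 02:54:34 +08:00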
OG T	14a02263ae	feat(Phase 4): proactive inspection + trend prediction + 8D sensing upgrade all complete [CI: CD Pipeline / build-and-deploy (push) failed after 12m32s] ## Phase 4 full delivery (ADR-084) ### New services - trend_predictor.py: numpy linear regression, 4h threshold-breach early warning, R² confidence score - proactive_inspector.py: coordinator for proactive inspection every 5 minutes - DynamicBaselineService (3σ deviation) - LogAnomalyDetector (new Drain3 patterns) - TrendPredictor (slope extrapolation, 4h forecast) - Shadow Mode + 30-minute deduplication + Holt-Winters background retraining ### 8D sensing upgrade (EvidenceSnapshot Phase 4 enhancements) - PreDecisionInvestigator._collect_phase4_anomalies(): before a decision, reads ProactiveInspector's most recent inspection cache + new LogAnomalyDetector patterns - EvidenceSnapshot.anomaly_context: new field, Phase 4 dynamic anomaly context - DiagnosticianAgent._build_prompt(): the prompt includes anomaly_context, so LLM RCA can reference dynamic-baseline deviations and trend warnings ### Database migration - incident_evidence: ADD COLUMN anomaly_context JSONB (idempotent) ### main.py - starts the run_proactive_inspector_loop() asyncio task 2026-04-15 ogt + Claude Sonnet 4.6 (Asia-Pacific): Phase 4 all complete Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 15:47:05 +08:00
OG T	42bc1df9f9	fix(phase2): fix two security holes found during verification [CI: CD Pipeline / build-and-deploy (push) cancelled] Found during manual verification runs: 1. reviewer_agent.py: the force-push regex only covered the word order "force push" and missed git's actual form "git push --force" (push first, --force/-f after) → changed to a bidirectional pattern: (?:force.{0,5}push\|push.{0,30}(?:--force\|-f\b)).{0,30}main 2. coordinator_agent.py: a Critic critical challenge applied only a 0.3 penalty; when the original confidence was > 0.7 (e.g. 0.82), it stayed above the 0.4 threshold after the penalty, so a critical challenge leaked through to the auto-execute path (verified: 0.82→0.52>0.4) → added a Critic REJECT hard gate (same force as Reviewer REJECT) that forces requires_human_approval=True ahead of the penalty logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 13:48:55 +08:00
OG T	5ddba6d6e0	feat(adr-082): Phase 2 multi-agent collaboration: skeleton of the 5-role debate system goes live. Adds 5 Agents + Orchestrator + DecisionManager wiring: - protocol.py: DiagnosisReport / ActionPlan / ReviewVerdict / CriticReport / DecisionPackage type system - DiagnosticianAgent: RCA root-cause analysis, confidence < 0.4 → ABSTAIN - SolverAgent: remediation-plan strategist, blast_radius scoring + degraded rule-based mock - ReviewerAgent: safety review, HARD_RULES static patterns + blast_radius thresholds (>50 revision, >80 reject) - CriticAgent: deliberate devil's advocate, forces 3 critical-thinking questions, critical challenge → REJECT - CoordinatorAgent: pure rule-based aggregation, 6-level decision gate, REQUEST_REVISION → forced human review - AgentOrchestrator: 30s global timeout, Reviewer ‖ Critic in parallel, DB Immutable Event Sourcing + Redis Streams - DecisionManager: AIOPS_P2_ENABLED gate + _package_to_proposal_data bridging to the existing proposal_data format - AgentSession DB table + 4 composite indexes - ADR-082 decision record. Gate 2 fixes (7 items): - CRITICAL: DELETE FROM regex lookahead was in the wrong position (moved to after FROM) - CRITICAL: REQUEST_REVISION could reach the auto-execute path (changed back to requires_human_approval=True) - IMPORTANT: the flat regex in _extract_json did not support nested JSON (switched to find/rfind boundary extraction) - IMPORTANT: all_degraded omitted verdict.degraded (now covers all 4 Agents) - IMPORTANT: the Solver ABSTAIN guard let degraded hypotheses through (now skips regardless of whether hypotheses exist) - IMPORTANT: dataclasses.asdict() left Enums unserialized, causing silent DB write failures (added a json.dumps default handler) - IMPORTANT: the P2 gate read the attribute directly, bypassing the parent Phase guard (switched to is_phase_enabled(2)) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 13:48:55 +08:00
OG T	938df7f291	fix(api): remove all fake confidence scores, per feedback_confidence_truthfulness.md 🔴 Violation fix: rule matching / Expert System is not AI analysis, so confidence must = 0.0. Files changed: - agents/action_planner.py: 0.9 → 0.0 - agents/blast_radius.py: 0.85/0.5/0.9 → 0.0 - agents/security.py: calculation formula → 0.0 - signoz_webhook.py: 0.7 → 0.0 - auto_approve.py: default 0.5 → 0.0 - ci_auto_repair.py: entire calculation function → return 0.0 - error_analyzer_service.py: default 0.5 → 0.0 - intent_classifier.py: calculation formula → 0.0 - openclaw.py: default 0.5 → 0.0 - resource_resolver.py: 0.8 → 0.0 - k8s_naming.py: 0.9/0.7 → 0.0 Only a confidence returned by genuine LLM analysis may be > 0. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-29 16:00:46 +08:00
OG T	6f049877fc	fix(lint): ruff auto-fix + add lewooogo-core src to git - Python: ruff --fix fixed 280 lint errors - lewooogo-core: the src/ directory was untracked, causing CI eslint to fail Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:51:37 +08:00
OG T	7478dc0254	feat(phase6-9): Complete modular architecture and Agent Teams Phase 6.4 - Modular Architecture: - Add lewooogo-brain adapters for LLM providers - Add lewooogo-data dual memory (Redis + PostgreSQL) - Implement consensus engine for multi-agent decisions - Add incident memory service for historical context Phase 9 - Agent Teams (Claude Agent SDK): - Add base agent class with Claude Sonnet 4 integration - Implement action planner, blast radius, and security agents - Add agent API endpoints and proposal workflow - Integrate ADR-009 OpenClaw Agent Teams architecture DevOps & CI/CD: - Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml) - Add pre-commit hooks and secrets baseline - Add docker-compose for local development - Update Kubernetes network policies Frontend Improvements: - Add auto-healing error boundary component - Update i18n messages for agent features - Enhance dual-state incident card with execution feedback Documentation: - Add 7 ADRs covering MCP, design system, architecture decisions - Update ARCHITECTURE_MEMORY.md with modular design - Add GLOBAL_RULES.md and SOUL.md for project identity Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 18:40:36 +08:00
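
Commits d294caf830 and c27709d11b describe converting the flat openclaw_nemo response (action_title / risk_level / confidence) into the shape the agents expect. The following is a minimal sketch of the solver side only, not the repository's code: the function name `nemo_to_candidates`, the candidate keys `action`, `blast_radius` and `confidence`, and the default used for an unknown risk_level are assumptions made for illustration. Only the four mapping values come from the commit message.

```python
from typing import Any

# Mapping values as stated in commit d294caf830.
RISK_TO_BLAST_RADIUS = {"critical": 60, "high": 40, "medium": 25, "low": 10}

# Assumption: unknown risk levels fall back to the "medium" score.
DEFAULT_BLAST_RADIUS = 25


def nemo_to_candidates(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return a candidates list, converting the flat nemo shape if needed."""
    if "candidates" in payload:
        # Already in the expected shape: pass through unchanged.
        return list(payload["candidates"])
    if "action_title" not in payload:
        # Neither shape recognised: the caller would report no_candidates.
        return []
    risk = str(payload.get("risk_level", "")).lower()
    return [
        {
            # Candidate key names here are hypothetical.
            "action": payload["action_title"],
            "blast_radius": RISK_TO_BLAST_RADIUS.get(risk, DEFAULT_BLAST_RADIUS),
            "confidence": float(payload.get("confidence", 0.0)),
        }
    ]


if __name__ == "__main__":
    print(nemo_to_candidates(
        {"action_title": "restart pod", "risk_level": "high", "confidence": 0.85}
    ))
```

Checking for the `candidates` key first keeps the existing format untouched, so the conversion only runs for responses that would otherwise have produced an empty plan.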

12 Commits
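
The two holes described in commit 42bc1df9f9 can be reproduced in a few lines. This is a sketch under stated assumptions rather than the code in reviewer_agent.py or coordinator_agent.py: the new pattern is the one quoted in the commit message with `\|` read as plain alternation, `OLD_PATTERN` is a hypothetical reconstruction of the earlier one-directional rule, and the function `needs_human` with its `<=` boundary is invented for illustration. The 0.3 penalty and 0.4 threshold are the values given in the message.

```python
import re

# Bidirectional pattern quoted in commit 42bc1df9f9.
FORCE_PUSH_MAIN = re.compile(
    r"(?:force.{0,5}push|push.{0,30}(?:--force|-f\b)).{0,30}main"
)
# Hypothetical reconstruction of the earlier "force push" only pattern.
OLD_PATTERN = re.compile(r"force.{0,5}push.{0,30}main")

CRITIC_PENALTY = 0.3          # value from the commit message
AUTO_EXECUTE_THRESHOLD = 0.4  # value from the commit message


def needs_human(confidence: float, critical_challenge: bool,
                hard_gate: bool = True) -> bool:
    """Return True when the decision must go to a human."""
    if critical_challenge and hard_gate:
        # The hard gate runs before any penalty arithmetic.
        return True
    if critical_challenge:
        confidence -= CRITIC_PENALTY
    # Assumption: confidence at or below the threshold requires approval.
    return confidence <= AUTO_EXECUTE_THRESHOLD


if __name__ == "__main__":
    cmd = "git push --force origin main"
    print(bool(OLD_PATTERN.search(cmd)), bool(FORCE_PUSH_MAIN.search(cmd)))
    # Without the gate, 0.82 - 0.3 is still above 0.4 and auto-executes.
    print(needs_human(0.82, True, hard_gate=False), needs_human(0.82, True))
```

The penalty-only path depends on how high the starting confidence is, which is why a gate that ignores the score closes the leak for every input.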