Commit Graph

4 Commits

Author SHA1 Message Date
Your Name
7f4088bcd0 fix(aiops-p0): 六大病根 P0 全面修復(ADR-092 B4)
【P0.1】knowledge_extractor_service.py:210 — AttributeError 修復
- Signal.description 欄位不存在(100% 失敗,KM 每天+5 根因)
- 改用 alert_name + annotations.summary 拼接文字

【P0.2+P0.3】Gate 9+11 唯讀指令鬆綁
- blast_radius_calculator: kubectl get/top/describe/logs/version → score=1(非 50)
- operation_parser: 增加 INVESTIGATE 類型識別(唯讀 kubectl 不回 None)
- executor.py: OperationType 新增 INVESTIGATE enum
- approval_execution.py: INVESTIGATE 路徑直接呼叫 execute_kubectl_command

【P0.4】MCP SSH/K8s Provider 修復
- decision_manager: params= → parameters=(符合 MCPToolProvider.execute 簽名)
- decision_manager: MCPToolResult .get() → .success/.output(dataclass 用法)
- decision_manager + ssh_provider: 補入 hosts 120/121(原 default 缺失)
- auto_approve: phase2_agent_debate source bypass confidence 閾值

【P0.5】告警規則語義矛盾修復
- alert_rules.yaml: 8 條 kubectl 查詢規則 RESTART_DEPLOYMENT → NO_ACTION
  (CrashLoopBackOff/PostgreSQL 連線/慢查詢/MinIO 磁碟/K3s 節點/告警鏈路/SSL/CoreDNS 等)
- incident_service.py: cAdvisor/CoreDNS 從 general 拆出獨立分類

【P0.6】proactive_inspector 動態基線 PromQL 全修
- 5 個 MONITORED_METRICS PromQL 全部修正(cadvisor label/datname/blackbox)
- db_connection_pool: datname="awoooi" → "awoooi_prod"
- http_error_rate: 無效 http_requests_total → blackbox probe_success
- cpu/memory: namespace label → name=~"k8s_api_awoooi-api.*"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 15:32:23 +08:00
OG T
8582439d2d fix(kb): Signal 無 description 欄位,改用 alert_name + annotations
knowledge_extractor_service 兩處直接訪問 s.description:
- L87 signals_text 組裝:改用 alert_name + annotations.summary/description
- L198 Fallback 標題:改用 alert_name[:60]

Signal model 只有 alert_name, annotations(dict),無 description 屬性。
此修復防止 KB 萃取時 AttributeError 導致草稿無法建立。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 08:54:11 +08:00
OG T
27ba97e586 fix(ollama): 清除所有硬寫 188:11434 fallback — 全部改指向 111 GPU
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 15m59s
- decision_manager.py: 兩處 getattr fallback 188 → 111
- routes/agent.py: OLLAMA_BASE_URL 188 → 111
- knowledge_extractor_service.py: _OLLAMA_BASE 188 → 111

config.py 預設早已是 111,此次清掉 code 層殘留的 188 硬寫值。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:01:31 +08:00
OG T
c1834a7156 feat(kb+apm): KB Phase 2-A 自動萃取 + KB-D Markdown 詳情面板 + APM 趨勢圖
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m28s
- KB-A: 新增 knowledge_extractor_service.py (Ollama llama3.2:3b 本地推理)
- KB-A: incident_service.py resolve hook (fire-and-forget asyncio.create_task)
- KB-D: 引入 react-markdown + remark-gfm,知識庫詳情面板 Markdown 渲染
- KB-D: 批准/封存按鈕串接 API (POST /knowledge/{id}/approve, PATCH status)
- KB-D: i18n 新增 approving/archiving 載入狀態文字
- APM: apm/page.tsx 整合 TimeSeriesChart sparkline (使用 trend[] 欄位)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 14:40:27 +08:00