OG T
|
2a6977343a
|
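fix(telegram): pass incident_id at every _push_to_telegram_background call site
CD Pipeline / build-and-deploy (push) Has been cancelled
Rule matching shows six buttons but the Ollama/OpenClaw path shows only three. Root cause:
the alertmanager and fallback paths called _push_to_telegram_background
without incident_id, so the Details/Re-diagnose/History buttons were not shown.
- _push_to_telegram_background: add incident_id parameter
- alertmanager main path: pass incident_id
- alertmanager fallback path: store the return value and pass it along
- /alerts path: no incident exists yet, so pass an empty string explicitly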
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 22:40:22 +08:00 |
|
OG T
|
ef17720dfe
|
fix(web): home page tab-switch sync fix - activeTabId now tracks URL query changes
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-04-08 22:36:39 +08:00 |
|
OG T
|
286df4b3e3
|
fix(web): Sidebar section label fix - main shows no heading, legacy uses a divider
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-04-08 22:33:17 +08:00 |
|
OG T
|
4aa7c179c1
|
feat(k8s): Sprint 5.1 Guardrail - mount the service-registry ConfigMap into the API container
CD Pipeline / build-and-deploy (push) Successful in 16m36s
Problem: the Docker container has no ops/ directory, so service_registry.py cannot find the YAML → everything degrades to AUTO
Fix: mount service-registry.yaml via ConfigMap at /app/ops/config/
Changes:
- k8s/awoooi-prod/15-service-registry-configmap.yaml (new ConfigMap)
- k8s/awoooi-prod/06-deployment-api.yaml (volumeMount + volume)
- .gitea/workflows/cd.yaml (Step 1c apply ConfigMap)
Effect: _find_registry_path() can find the YAML → BLOCK/CRITICAL_HITL/STANDARD_HITL take effect
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 22:12:29 +08:00 |
|
OG T
|
9188e499cc
|
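feat(web): Sprint 5 Phase 3+4 - consolidated pages complete + legacy routes kept alongside
CD Pipeline / build-and-deploy (push) Has been cancelled
Phase 3: 5 consolidated pages (lazy import of existing content)
Phase 4: legacy routes temporarily remain independently usable; old and new coexist
- /monitoring still accessible (original page)
- /observability?tab=monitoring (consolidated entry point)
- avoids redirect loops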
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 22:10:46 +08:00 |
|
OG T
|
1413804378
|
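feat(web): Sprint 5 Phase 3 - 5 consolidated pages + Sidebar route update
New pages:
- /observability: service monitoring + APM + error tracking + applications + service catalog (5 tabs)
- /automation: auto-repair + Neural Command + Drift (3 tabs)
- /operations: deployments + tickets + cost + action log + billing (5 tabs)
- /security-compliance: security + compliance (2 tabs)
- /knowledge: knowledge base
All tabs load existing page content via React.lazy + Suspense
Zero fake data: every tab is an existing real page
Sidebar routes updated to point to the new consolidated pages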
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 22:09:53 +08:00 |
|
OG T
|
8b5db2f58e
|
feat(infra): switch Ollama to M1 Pro 192.168.0.111 + NetworkPolicy update
CD Pipeline / build-and-deploy (push) Has been cancelled
- OLLAMA_URL: 188 → 111 (M1 Pro, 40+ tok/s vs 0.45 tok/s)
- OPENCLAW_DEFAULT_MODEL: qwen2.5:7b-instruct → deepseek-r1:14b (strongest reasoning for SRE)
- OPENCLAW_TIMEOUT: 90s → 120s (deepseek-r1:14b measured worst case 54s)
- NetworkPolicy v1.3: add 192.168.0.111:11434 egress, remove the Ollama port for 188
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 22:05:14 +08:00 |
|
OG T
|
c9f1bcd122
|
fix(api): service_registry safe degradation - no crash when the YAML is missing in Docker, fallback to AUTO
CD Pipeline / build-and-deploy (push) Successful in 11m37s
|
2026-04-08 21:47:38 +08:00 |
|
OG T
|
3cab16a681
|
fix(cd): force-trigger CD - deploy the service_registry path fix + OLLAMA_URL=192.168.0.111
|
2026-04-08 21:42:42 +08:00 |
|
OG T
|
db4b28c49d
|
fix(ci): force-trigger CD - the service_registry.py Docker path fix is already included in 1f9eea5
CD Pipeline / build-and-deploy (push) Failing after 8m45s
Pod CrashLoopBackOff: IndexError parents[5]
Fix: _find_registry_path() safe search (parents[4]/parents[3]/absolute path)
1f9eea5 already fixed this but did not trigger CI; this commit forces a rebuild
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
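
The crash above came from indexing `Path.parents` past the depth available inside the container. A minimal sketch of a lookup that cannot raise IndexError follows. It is an illustration only: the function name `find_registry_path`, its parameters, and the choice to walk every ancestor (rather than the specific parents[4]/parents[3] order named in the commit) are assumptions, not the project's actual code.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_registry_path(
    start: Path,
    relative: str = "ops/config/service-registry.yaml",
    extra_candidates: Sequence[Path] = (),
) -> Optional[Path]:
    """Return the first existing registry file, or None if nothing is found.

    Hypothetical helper: walks every ancestor of `start` instead of indexing
    parents[N], so a shallow directory tree cannot raise IndexError.
    """
    resolved = start.resolve()
    # Iterating over .parents is safe at any depth, unlike parents[5].
    for parent in resolved.parents:
        candidate = parent / relative
        if candidate.is_file():
            return candidate
    # Optional absolute fallbacks (for example a ConfigMap mount point).
    for candidate in extra_candidates:
        if candidate.is_file():
            return candidate
    # The caller decides how to degrade when no file exists.
    return None
```

Returning None instead of raising leaves the degradation policy to the caller, which matches the "no crash when the YAML is missing" behaviour described in the neighbouring commits.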
|
2026-04-08 21:37:49 +08:00 |
|
OG T
|
1f9eea5b74
|
fix(api): service_registry.py Path index fix - compatible with the Docker container environment
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-04-08 21:34:40 +08:00 |
|
OG T
|
f7c1c46f96
|
chore: trigger CD to deploy the Sprint 5 frontend
CD Pipeline / build-and-deploy (push) Failing after 10m29s
|
2026-04-08 21:23:13 +08:00 |
|
OG T
|
3c6807d79c
|
ops(monitoring): trigger deploy-alerts - redeploy the 6 database_detail_alerts rules
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 39s
d9e0fab added 6 detailed DB alert rules, but deploy-alerts failed because pyyaml was not installed
0f86c5c fixed the workflow; this commit triggers a redeploy
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 21:17:26 +08:00 |
|
OG T
|
14cb015826
|
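fix(openclaw): Nemotron retry logic + exhausted log key (previously uncommitted changes)
CD Pipeline / build-and-deploy (push) Has been cancelled
- generate_incident_proposal_with_tools: single try/except → 2-iteration retry loop
- failure log key: nemotron_collaboration_failed → nemotron_collaboration_exhausted
- on failure, nemotron_enabled=True (so the Commander can see the failed state)
- _call_nemotron_tools: a timeout now raises an exception (so the outer loop retries)
- these are local changes from a previous session; they fix a mismatch between the tests and the actual implementation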
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
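
The retry behaviour listed above can be sketched roughly as below. This is a hedged illustration, not the repository's implementation: `call_with_retry`, its parameters, and the shape of the returned dict are hypothetical. Only the log key `nemotron_collaboration_exhausted` and the `nemotron_enabled=True` on failure come from the commit message.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def call_with_retry(
    call: Callable[[], Awaitable[Any]],
    attempts: int = 2,
    timeout_s: float = 1.0,
) -> Dict[str, Any]:
    """Try `call` up to `attempts` times; a timeout counts as a failed try."""
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            # wait_for raises on timeout, so the loop can retry.
            result = await asyncio.wait_for(call(), timeout=timeout_s)
            return {"ok": True, "nemotron_enabled": True,
                    "result": result, "attempts": attempt}
        except Exception as exc:  # includes asyncio.TimeoutError
            last_error = exc
    # All tries failed: keep nemotron_enabled=True so the failure stays visible.
    return {
        "ok": False,
        "nemotron_enabled": True,
        "log_key": "nemotron_collaboration_exhausted",
        "error": repr(last_error),
        "attempts": attempts,
    }
```

Raising from the inner call and handling the failure in one outer loop keeps the retry policy in a single place, which appears to be the intent of the change to _call_nemotron_tools.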
|
2026-04-08 21:16:34 +08:00 |
|
OG T
|
d276b39bd5
|
feat(web): Sprint 5 Phase 2 - React Flow topology component (wired to the real dashboard API)
7 new files:
- ServiceTopology.tsx: main component (ReactFlow + Controls + MiniMap + empty state)
- GroupNode.tsx: group node (memo + collapsed summary + CPU/RAM metrics)
- ServiceNode.tsx: service node (memo + status light + port + latency)
- TopologyEdge.tsx: custom edge (gradient + dashed line)
- useTopologyData.ts: reads real data from the dashboard store → nodes/edges
- index.ts: exports
Data source: useDashboardStore → hosts[] (HostAggregator real TCP/HTTP probes)
Dependencies: statically defined (matching the ConfigMap environment variables)
Zero fake data: all node data comes from the real API
TypeScript: zero new errors
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 21:14:29 +08:00 |
|
OG T
|
eaa6102e69
|
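feat(web): Sprint 5 Phase 1.3 - Sidebar slimmed 25→6+2+classic
Navigation reorganization (approved by the Commander 2026-04-08):
- Command Center / → merges: dashboard + authorization + alerts + reports (4 tabs)
- Observability → merges: monitoring + APM + errors + applications + services (5 tabs)
- Automation → merges: auto-repair + Neural Command + Drift (3 tabs)
- Operations → merges: deployments + tickets + cost + action log + billing (5 tabs)
- Security & Compliance → merges: security + compliance (2 tabs)
- Knowledge → knowledge base
- Legacy: classic AI center (/classic)
- Bottom: terminal + settings
i18n: 7 new navigation keys in zh-TW + en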
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 21:10:11 +08:00 |
|
OG T
|
0f86c5c2fb
|
fix(ci): add a pyyaml install step to deploy-alerts
CD Pipeline / build-and-deploy (push) Failing after 1m35s
The "Validate alerts YAML" step fails because the runner's python3 has no yaml module
Add a pip3 install pyyaml prerequisite step so the environment is ready
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 21:09:53 +08:00 |
|
OG T
|
b380b6a34c
|
fix(ci): fix truncation of the nemotron test function body, 5000→10000 characters
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 21:09:19 +08:00 |
|
OG T
|
d9e0fab3fe
|
feat(monitoring): Sprint 5.2 Plan B - detailed database alert rules
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Failing after 17s
New database_detail_alerts rule group:
PostgreSQL:
- PostgreSQLSlowQueries: slow queries >60s
- PostgreSQLDeadlocks: deadlock occurred
- PostgreSQLTooManyConnections: connections >50
Redis:
- RedisKeyEviction: key eviction
- RedisConnectionsHigh: connections >100
- RedisCommandLatencyHigh: command latency >10ms
Prerequisite: postgres-exporter:9187 + redis-exporter:9121 already deployed on 188 ✅
Prometheus scrape config updated ✅
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 18:19:03 +08:00 |
|
OG T
|
170ce2f11d
|
fix(ci): fix tests and the Sprint 5.2 deployment scripts
CD Pipeline / build-and-deploy (push) Failing after 1m38s
tests/test_auto_repair_service.py:
- update 3 tests to match the Commander's 2026-04-07 directive removing the thresholds
- APPROVED Playbooks pass directly (low similarity / low quality / high risk all pass)
tests/test_phase22_nemotron_collab.py:
- update log key: nemotron_collaboration_failed → exhausted
ops/monitoring/docker-compose.exporters.yaml:
- fix postgres DSN: awoooi:awoooi_prod_2026@localhost:5432/awoooi_prod
New Sprint 5.2 scripts:
- scripts/sprint51_e2e_validation.py: L7 E2E acceptance script (T1-T5)
- scripts/ops/deploy-docker-health-monitor.sh: Plan A one-click deployment script
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 18:17:48 +08:00 |
|
OG T
|
4f2f9e176f
|
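feat(web): Sprint 5 Phase 1.2 - home page 4-tab structure (all wired to real APIs)
Tab 1 Situation Overview: keeps all existing home page elements (MetricsStrip + IncidentCard + OpenClaw + HostGrid + MonitoringTools)
Tab 2 Alerts & Authorization: wired to /api/v1/incidents + /api/v1/approvals (real data)
Tab 3 Activity Stream: wired to SSE /api/v1/dashboard/stream (EventSource, real time)
Tab 4 Disposition Stats: wired to /api/v1/stats/disposition (Sprint 4 API)
Zero fake data: every tab shows an empty state when there is no data; no mocks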
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 18:17:10 +08:00 |
|
OG T
|
46ca2eadc3
|
feat(web): Sprint 5 Phase 1.1 - PageTabs shared tab component
|
2026-04-08 18:12:43 +08:00 |
|
OG T
|
11ff517406
|
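feat(web): Sprint 5 Phase 0 - install React Flow + elkjs + keep the classic home page
Phase 0:
- install @xyflow/react 12.10.2 + elkjs 0.11.1
- import verification passed
Classic home page kept:
- copied the existing home page to /classic/page.tsx (815 lines)
- Commander's instruction: after the new Command Center is deployed, keep the old version for comparison
Zero-fake-data iron rule: all new pages must be wired to real APIs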
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 18:07:59 +08:00 |
|
OG T
|
39499c6be3
|
design: Sprint 5 Command Center design mockup - Commander-approved version
|
2026-04-08 18:03:51 +08:00 |
|
OG T
|
18452ceb9f
|
fix(ci): add pyyaml dependency + sync Sprint 5.1 Pydantic → TypeScript types
CD Pipeline / build-and-deploy (push) Failing after 1m43s
Type Sync Check / check-type-sync (push) Successful in 57s
- pyproject.toml: add pyyaml>=6.0.0 (required by service_registry.py)
- shared-types: sync three new PlaybookAction fields
(requires_approval_level / stateful_targets / requires_pre_backup)
- shared-types: sync three new ApprovalRecord fields
(approval_level / approval_votes / required_votes)
Fixes: build-and-deploy failing on import yaml + check-type-sync failing because the models were out of sync
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 17:06:44 +08:00 |
|
OG T
|
0847fa3a60
|
feat(sprint5.1): L2-2 - add DockerContainerUnhealthy/Exited rules to alerts-unified.yml
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Failing after 19s
New docker_health_alerts group:
- DockerContainerUnhealthy: container_health_status==0, for 2m, auto_repair=true
- DockerContainerExited: container_running_status==0, for 1m, auto_repair=true
The auto_repair=true label sends the alert into the AWOOOI API Guardrail decision chain;
the actual repair action is determined by the Service Registry tier (ADR-062).
docker-health-monitor.sh (pure sensing layer) sends its webhook, and this rule supplements it with a Prometheus path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 16:40:44 +08:00 |
|
OG T
|
0af5c2e89c
|
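docs(sprint5.1): LOGBOOK + ADR-062 + Skill 02 update (Chief Architect review record)
- docs/LOGBOOK.md: current status updated to L1-L5 + review complete; milestones now include the review-fix record
- docs/adr/ADR-062: add an implementation record section (execution checklist + review issues + fixes)
- .agents/skills/02-lewooogo-backend-core.md v2.5→v2.6:
add the Sprint 5.1 Service Registry pattern
add the Guardrail conservative principle (on failure, block rather than allow)
add the standard template for new Services (structlog/now_taipei/DI setter/try-except)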
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 16:38:31 +08:00 |
|
OG T
|
0f5fecfef5
|
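fix(sprint5.1): Chief Architect review fixes - S1×4 S2×2 S3×1
CD Pipeline / build-and-deploy (push) Failing after 1m40s
S1-1: service_registry/velero_client/preflight_service switched to structlog
S1-2: velero_client datetime.now(UTC) replaced with now_taipei() (Taipei time zone iron rule)
S1-3: Guardrail failure now rejects conservatively (the previous allow-on-failure behaviour contradicted the safety goal)
S1-4: service_registry import moved to the top of the module (in-function import removed)
S2-1: telegram_gateway T1-T6: all six notification methods now wrapped in try/except
S2-2: webhooks.py Langfuse URL now uses settings.LANGFUSE_URL (hardcoded internal IP removed)
S3-3: velero_client trigger_emergency_backup now uses kubectl apply with a Backup CRD
(the original kubectl create backup syntax does not exist; the review found a silent-failure risk)
Review score: 70/100 → expected 90+/100 after the fixes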
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
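
Item S1-3 describes a fail-closed guardrail: when the check itself breaks, the action is refused rather than allowed. A minimal sketch of that principle, assuming a hypothetical `guardrail_allows` wrapper and an injected `is_blocked` callable (the real service_registry API may differ):

```python
from typing import Callable


def guardrail_allows(service: str, is_blocked: Callable[[str], bool]) -> bool:
    """Return True only when the registry positively says the service is not blocked.

    Fail-closed: any error while consulting the registry is treated as "blocked".
    """
    try:
        return not is_blocked(service)
    except Exception:
        # Conservative default: if safety cannot be proven, refuse the action.
        return False
```

The earlier fail-open variant would have returned True in the except branch, which is the direction the review flagged as contradicting the safety goal.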
|
2026-04-08 16:36:18 +08:00 |
|
OG T
|
88696dba9b
|
feat(sprint5.1): Data Safety Guardrails full-chain integration (L1-L5)
CD Pipeline / build-and-deploy (push) Failing after 1m33s
Type Sync Check / check-type-sync (push) Failing after 58s
Layer 0 - K8s RBAC:
- k8s/rbac/api-velero-reader.yaml: awoooi-executor SA Velero backup reader
Layer 1 - DB Migration (already run on 188):
- M-002: approval_records adds approval_level/votes/required_votes
- M-003: alert_event_type ENUM adds 8 values
Layer 2 - IaC:
- ops/config/service-registry.yaml: stateful tier list for all services (BLOCK/CRITICAL_HITL/STANDARD_HITL/AUTO)
Layer 3 - Python Services:
- service_registry.py: reads the YAML, provides is_blocked/requires_multisig/get_required_votes
- velero_client.py: queries Velero backup age via kubectl; on failure falls back to 999h
- preflight_service.py: pre-flight safety checks (Q2/Q4 decisions)
Layer 1-M001 - Playbook model:
- playbook.py: add requires_approval_level/stateful_targets/requires_pre_backup
Layer 4 - Business logic:
- alert_operation_log_repository.py: add 8 event_types (Guardrail/Pre-flight/MultiSig/backup)
- auto_repair_service.py: inject the Service Registry Guardrail check (BLOCK → reject immediately)
- webhooks.py: ALERT_RECEIVED provenance record + auto_repair flag Q9 + Langfuse trace_id Q10
- db/models.py: ApprovalRecord synced with the approval_level/votes/required_votes fields
- docker-health-monitor.sh: reworked into a pure sensing layer (all docker restart logic removed)
Layer 5 - Telegram notifications:
- telegram_gateway.py: six new notification methods T1-T6 (Guardrail/Pre-flight/Backup/MultiSig/ChangeApplied)
References: ADR-062 Data Safety Guardrails, ADR-063 Service Registry IaC
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 16:24:09 +08:00 |
|
OG T
|
6f7a4be2c7
|
docs: Sprint 5.1 data safety guardrails - ADR-062/063 + plan compliance validation
- ADR-062: Data Safety Guardrails (service tiers/Pre-flight/MultiSig)
- ADR-063: Service Registry IaC design specification
- Sprint 5.1 plan document: compliance validation passed, issues P1-P5 fixed
- P1: Playbooks are stored in Redis (not SQL), so M-001 becomes a Pydantic model change
- P2: velero_client.py name kept (consistent with the signoz_client convention)
- P3: docker-health-monitor status clarified
- P4/P5: DI setter + Deployment Verification added
- LOGBOOK: current focus updated to Sprint 5.1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 16:07:12 +08:00 |
|
OG T
|
83e9d3eef8
|
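docs(specs): four Sprint 5 technical documents - tab spec / route mapping / component extraction / API changes
1. Tab structure spec: tab configuration, block layout, and component reuse for each new page
2. Route mapping table: exact mapping of 26 old URLs → new locations + redirect implementation
3. Component extraction plan: steps and directory structure for extracting 17 pages into Panel components
4. API change spec: DashboardResponse +3 fields + SSE +1 event (no new APIs)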
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-08 16:03:58 +08:00 |
|
OG T
|
bb6a57dd87
|
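docs(plan): Sprint 5 frontend information architecture reorganization - complete solution
Covers:
- Chapter 1: full asset inventory of the existing 26 pages + 62 components
- Chapter 2: reorganization mapping table (25→6+2 navigation, zero loss of functionality)
- Chapter 3: tab structure and component integration for the 6 new pages
- Chapter 4: backward compatibility for old routes (20+ redirects)
- Chapter 5: shared tab container component spec
- Chapter 6: new navigation Sidebar structure
- Chapter 7: interaction pattern guidelines (Tab/Drawer/Modal/Toggle)
- Chapter 8: detailed implementation steps (6 Phases, 30 Steps)
- Chapter 9: file impact list (15 added + 5 modified)
- Chapter 10: list of 8 technical documents
- Chapter 11: risk matrix
- Chapter 12: schedule estimate (~10 days, 3 delivery batches)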
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-08 16:01:38 +08:00 |
|
OG T
|
8788c720e4
|
docs(plan): Sprint 5 complete solution - detailed implementation plan integrated with the existing architecture
|
2026-04-08 12:22:05 +08:00 |
|
OG T
|
f2b3a7129f
|
docs(plan): Sprint 5 Command Center redesign - complete solution and detailed implementation steps
|
2026-04-08 12:01:14 +08:00 |
|
OG T
|
876aa9a441
|
docs(adr): ADR-060 React Flow + elkjs topology engine technology selection (Option D+ approved)
|
2026-04-08 11:56:58 +08:00 |
|
OG T
|
a421d2c5b8
|
feat(ops): Plan A docker-health-monitor.sh - Docker container health monitoring and auto-repair
- detects unhealthy / exited / dead containers
- exclusion list: DB (PG/Redis), Gitea, monitoring stack
- Prometheus/Grafana/Alertmanager exited → docker start (protects the WAL)
- mandatory three-stage notification: Intent→Action→Result (Chief Architect ruling)
- HMAC-SHA256 signature → AWOOOI API /api/v1/webhooks/custom-alert
- Fallback: API down → direct Telegram Bot API
- 300s cooldown to prevent repeated repairs
Deployment: cron */5 * * * * on 192.168.0.110 + 192.168.0.188
Config: /etc/awoooi-ops/secrets.env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
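
The HMAC-SHA256 signing step listed above can be illustrated from the receiving side. This is a sketch under assumptions: the commit does not say how the signature is encoded or which header carries it, so hex encoding and the helper names `sign_payload` / `verify_signature` are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)
```

Signing the exact bytes that go over the wire (not a re-serialised copy) avoids mismatches caused by key ordering or whitespace differences between sender and receiver.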
|
2026-04-08 11:48:39 +08:00 |
|
OG T
|
f525e657ca
|
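docs: ADR-060/061 architecture decision records for full monitoring + Event Sourcing
- ADR-060: comprehensive infrastructure monitoring plan (Plan A/B/C/D/E)
- ADR-061: Alert Operation Log Event Sourcing architecture
- LOGBOOK: 2026-04-08 milestone record updated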
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 11:44:06 +08:00 |
|
OG T
|
f20121ad41
|
feat(audit): Phase 11 full alert-operation provenance - alert_operation_log + historical backfill
CD Pipeline / build-and-deploy (push) Failing after 1m29s
Commander's directive: "write every alert message to the database and record the related operations"
Changes:
- phase11_alert_operation_log.sql: new table (Event Sourcing, immutable)
- phase11b_backfill_alert_operation_log.sql: historical backfill of 654 rows
- 14 ALERT_RECEIVED rows (incidents)
- 265 TELEGRAM_SENT rows (approval_records)
- 265 USER_ACTION rows (approval_records)
- 110 EXECUTION_COMPLETED rows (audit_logs)
- db/models.py: AlertOperationLog SQLAlchemy model
- repositories/alert_operation_log_repository.py: append/list_by_incident/get_stats
- webhooks.py: _try_auto_repair_background writes AUTO_REPAIR_TRIGGERED + EXECUTION_COMPLETED + TELEGRAM_RESULT_SENT
- webhooks.py: _push_to_telegram_background writes TELEGRAM_SENT
- telegram.py: handle_callback writes USER_ACTION (approve/reject)
Migration executed: awoooi_prod@192.168.0.188 ✅
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 11:22:03 +08:00 |
|
OG T
|
eee6f06215
|
feat(auto-repair): force all operations to be written to the DB - auto_repair_executions table
CD Pipeline / build-and-deploy (push) Failing after 1m32s
Commander's directive: all auto-repair operations (success/failure) must be persisted
Changes:
- migrations/phase10_auto_repair_executions.sql: new table + 4 indexes
- db/models.py: add AutoRepairExecution SQLAlchemy model
- repositories/audit_log_repository.py: add AutoRepairExecutionRepository (create/list_by_incident/get_stats)
- auto_repair_service.py: execute_auto_repair writes to the DB on both the success and failure branches
- add similarity_score parameter passing
- AutoRepairDecision adds a similarity_score field
- webhooks.py: pass similarity_score into execute_auto_repair
Migration executed: awoooi_prod@192.168.0.188:5432 ✅
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 11:16:37 +08:00 |
|
OG T
|
68a2fff746
|
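feat(auto-repair): remove all blocking thresholds - everything goes straight to auto-repair
CD Pipeline / build-and-deploy (push) Failing after 1m38s
Commander's directive: all APPROVED Playbooks execute directly, no longer checking:
- similarity threshold (MIN_SIMILARITY_SCORE 0.7 → 0.0)
- is_high_quality quality threshold
- cold-start trust mechanism
- action risk level threshold (both the evaluate and execute layers)
Kept: manual review for P0/P1 severity, global cooldown circuit breaker, APPROVED status check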
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 11:10:09 +08:00 |
|
OG T
|
8fcb66eb52
|
chore(api): trigger CD — Sprint 3+4+F deploy
CD Pipeline / build-and-deploy (push) Successful in 11m28s
E2E Health Check / e2e-health (push) Successful in 34s
|
2026-04-07 16:00:12 +08:00 |
|
OG T
|
4c45961c4f
|
chore: trigger CD deploy (Sprint 3+4+F)
|
2026-04-07 13:25:36 +08:00 |
|
OG T
|
b7ea362efc
|
fix(api): Review #2 technical debt cleanup - I1/S1/S2/S3 all fixed
CD Pipeline / build-and-deploy (push) Successful in 12m13s
I1: fill in the error_type field
- add AnomalyCounter.derive_key_from_incident()
extracts reason/error_type from signal.labels so all four fields are complete
S1: unify the signature construction logic in three places
- auto_repair_service._derive_anomaly_key() → delegates to derive_key_from_incident()
- approval_execution._get_anomaly_key_from_approval() → same as above
- incident_service.resolve_incident() B4 → same as above
- removes 3 duplicated copies of the signature construction code
S2: Redis Pipeline batch queries
- get_all_disposition_stats() changed from N+1 hgetall calls to 2 Pipelines
- Pipeline 1: batch hgetall for all disposition keys
- Pipeline 2: batch hget for metadata (alert_name)
- Redis round trips drop from 2N to 2
S3: fix the get_incident AttributeError in auto_repair.py
- get_incident() → get_from_working_memory() (pre-existing bug)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
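
The S2 change batches per-key reads into two pipelines. The sketch below shows the idea with an in-memory stand-in so it runs without a Redis server. Everything here is illustrative: `FakeRedis` and `FakePipeline` are stubs whose method names mirror redis-py's `pipeline()`, `hgetall`, `hget`, and `execute`, and the `disposition:` / `meta:` key prefixes are hypothetical, not the project's real key layout.

```python
from typing import Any, Dict, List, Tuple


class FakePipeline:
    """Queues commands and runs them in one simulated round trip."""

    def __init__(self, client: "FakeRedis") -> None:
        self._client = client
        self._ops: List[Tuple[str, ...]] = []

    def hgetall(self, key: str) -> "FakePipeline":
        self._ops.append(("hgetall", key))
        return self

    def hget(self, key: str, field: str) -> "FakePipeline":
        self._ops.append(("hget", key, field))
        return self

    def execute(self) -> List[Any]:
        self._client.round_trips += 1  # one network round trip per execute()
        results: List[Any] = []
        for op in self._ops:
            stored = self._client.data.get(op[1], {})
            results.append(dict(stored) if op[0] == "hgetall" else stored.get(op[2]))
        self._ops = []
        return results


class FakeRedis:
    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self.data = data
        self.round_trips = 0

    def pipeline(self) -> FakePipeline:
        return FakePipeline(self)


def get_all_disposition_stats(client: FakeRedis, keys: List[str]) -> Dict[str, Dict[str, Any]]:
    # Pipeline 1: every disposition hash in a single round trip.
    pipe = client.pipeline()
    for key in keys:
        pipe.hgetall(f"disposition:{key}")
    dispositions = pipe.execute()
    # Pipeline 2: every alert_name in a single round trip.
    pipe = client.pipeline()
    for key in keys:
        pipe.hget(f"meta:{key}", "alert_name")
    names = pipe.execute()
    return {
        key: {"alert_name": name, "disposition": disp}
        for key, disp, name in zip(keys, dispositions, names)
    }
```

With this shape the number of round trips stays at two regardless of how many anomaly keys exist, whereas a per-key loop grows linearly with the key count.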
|
2026-04-07 13:13:42 +08:00 |
|
OG T
|
b20a619a3d
|
fix(ci): CD fix - shared-types type sync + test cold-start conflict
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Successful in 1m2s
1. pnpm shared-types generate: sync the Pydantic models added in Sprint 4
2. test_evaluate_not_high_quality fix: add a MEDIUM risk step to avoid
accidentally taking the cold-start path (Redis not initialized → COLD_START_DAILY_LIMIT)
11/11 auto_repair tests pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 13:09:17 +08:00 |
|
OG T
|
3a3f9cf70c
|
docs(logbook): Sprint 4 full-stack completion record - 6 Phases / 19 work items
|
2026-04-07 13:02:59 +08:00 |
|
OG T
|
de3935d1d4
|
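feat: Sprint 4 Phase E+F - frontend disposition stats + weekly report disposition breakdown
CD Pipeline / build-and-deploy (push) Failing after 1m26s
Type Sync Check / check-type-sync (push) Failing after 1m2s
Phase E: frontend pages
- E1: /reports full disposition stats dashboard (already done in Sprint F)
- E2: home page Metrics Strip: gets the real automation rate from the disposition API
prefers /stats/disposition auto_rate, falls back to an estimate from incidents
- E3: /auto-repair disposition overview cards (already done in Sprint F)
- E4: /neural-command stats tab disposition breakdown (already done in Sprint F)
- E5: i18n translations zh-TW + en (already done in Sprint F)
Phase F: weekly report + docs
- F1: WeeklyReportMessage adds 5 disposition fields
weekly report format adds a "📋 Disposition Breakdown" block (auto / cold-start / human review / manual + automation rate)
weekly_report_service integrates get_all_disposition_stats()
- message length limit raised from 900 to 1200 characters (to fit the disposition block)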
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 13:02:20 +08:00 |
|
OG T
|
37bddbb430
|
docs(logbook): Sprint 4 Phase E frontend disposition stats completion record
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 13:01:22 +08:00 |
|
OG T
|
22bc384b28
|
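feat(web): Sprint 4 Phase E - frontend disposition stats dashboard
E1: /reports page upgraded to a full disposition stats dashboard
- 3 KPIs at the top (total dispositions / automation rate / human intervention rate)
- four counter cards (auto-repair / human review / manual handling / cold-start trust)
- stacked distribution bar (percentage visualization)
- detail table by anomaly type
- wired to GET /api/v1/stats/disposition
E3: /auto-repair page adds 4 disposition overview cards
E4: /neural-command stats tab adds a disposition breakdown block
E5: 25+ new i18n translation keys (zh-TW + en)
All pages pass next build. Commander's iron rule: no fake data; show '--' when there is no data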
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 13:00:41 +08:00 |
|
OG T
|
246587a401
|
fix(web): Sprint F frontend fake-data purge - all 29 instances of fake data removed (Chief Architect 98/100)
P0: the three Neural Command subcomponents drop all MOCK constants and take real API props
- NeuralLiveCenter: fake history / fake KPIs / fake radar → computed live from stats/history/incidents
- NeuralStats: MOCK_HISTORY/SCHEME_STATS/PLAYBOOK_RANKINGS → useMemo aggregation
- NeuralApprovalPanel: MOCK_PENDING → real /api/v1/approvals sign-off operations
P1: 10+ fake user identities (demo-user/user-001/War Room User) → unified under the CURRENT_USER constant
P2: removed 6 Demo exports (GlobalPulseChartDemo/MOCK_APPROVAL/DEMO_DECISION_CHAIN)
P3: /demo page guarded by the NEXT_PUBLIC_ENABLE_DEMO environment variable
i18n: 22 new translation keys (zh-TW + en)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 12:53:52 +08:00 |
|
OG T
|
561bcb638b
|
fix(api): Sprint 4 Chief Architect Review P0 fixes - unified hash + modular-design compliance
P0-1: unify anomaly_key hash derivation
- B1: add _derive_anomaly_key() using AnomalyCounter.hash_signature()
in place of symptoms.compute_hash()
- B3/B4: namespace now uses signal.labels.get("namespace", "")
fixes getattr(signal, "namespace", "") always returning an empty string
P0-2: Router-layer modular-design compliance
- C1/C2: encapsulate get_all_disposition_stats() in AnomalyCounter
- Router no longer accesses counter.redis directly
- stats.py removes the unused days/stats parameters
P1: get_frequency() populates the disposition fields
- consistent with _record_anomaly_impl(), returns full disposition stats
Chief Architect score: 82/100 → all P0 issues fixed
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
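
The B3/B4 fix above hinges on where the namespace lives: as a key inside `labels`, not as an attribute on the signal object. A small sketch of why the old lookup always produced an empty string, assuming a hypothetical `Signal` dataclass that stands in for the project's real model:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Signal:
    """Hypothetical stand-in: the namespace is stored only inside labels."""
    labels: Dict[str, str] = field(default_factory=dict)


def namespace_buggy(signal: Signal) -> str:
    # The object has no `namespace` attribute, so the default is always returned.
    return getattr(signal, "namespace", "")


def namespace_fixed(signal: Signal) -> str:
    # Read the value from the labels mapping where it is actually stored.
    return signal.labels.get("namespace", "")
```

Because `getattr` with a default never raises, the original bug was silent: every anomaly key was derived with an empty namespace instead of failing loudly.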
|
2026-04-07 12:53:12 +08:00 |
|