OG T
|
f2b3a7129f
|
docs(plan): Sprint 5 command center redesign: complete solution and detailed implementation steps
|
2026-04-08 12:01:14 +08:00 |
|
OG T
|
876aa9a441
|
docs(adr): ADR-060 React Flow + elkjs topology graph engine technology selection (Option D+ approved)
|
2026-04-08 11:56:58 +08:00 |
|
OG T
|
a421d2c5b8
|
feat(ops): Plan A docker-health-monitor.sh: automatic health monitoring and repair for Docker containers
- Detects unhealthy / exited / dead containers
- Exclusion list: DB (PG/Redis), Gitea, monitoring stack
- Prometheus/Grafana/Alertmanager exited → docker start (protects the WAL)
- Mandatory three-stage notification: Intent→Action→Result (Chief Architect ruling)
- HMAC-SHA256 signature → AWOOOI API /api/v1/webhooks/custom-alert
- Fallback: API down → call the Telegram Bot API directly
- 300s cooldown to prevent repeated repairs
Deployment: cron */5 * * * * on 192.168.0.110 + 192.168.0.188
Config: /etc/awoooi-ops/secrets.env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
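Note: the HMAC-SHA256 signing step above can be sketched in Python as follows. This is a minimal illustration only: the actual monitor is a shell script, and the payload fields, the secret value, and the helper names `sign_payload` / `verify_signature` are assumptions, not taken from the repository.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the digest and compare it in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical alert payload; the field names are illustrative only.
body = json.dumps({"container": "example", "stage": "Intent"}, sort_keys=True).encode()
secret = b"example-secret"  # placeholder, not a real key
signature = sign_payload(secret, body)
```

The receiver would recompute the digest over the exact bytes it received; any change to the body invalidates the signature.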
|
2026-04-08 11:48:39 +08:00 |
|
OG T
|
f525e657ca
|
docs: ADR-060/061 full monitoring + Event Sourcing architecture decision records
- ADR-060: full infrastructure monitoring plan (Plan A/B/C/D/E)
- ADR-061: Alert Operation Log Event Sourcing architecture
- LOGBOOK: 2026-04-08 milestone record updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 11:44:06 +08:00 |
|
OG T
|
f20121ad41
|
feat(audit): Phase 11 full traceability for alert operations: alert_operation_log + historical backfill
CD Pipeline / build-and-deploy (push) Failing after 1m29s
Commander directive: "Write every alert message to the database and record the related operations"
Changes:
- phase11_alert_operation_log.sql: new table (Event Sourcing, immutable)
- phase11b_backfill_alert_operation_log.sql: historical backfill of 654 rows
- 14 ALERT_RECEIVED rows (incidents)
- 265 TELEGRAM_SENT rows (approval_records)
- 265 USER_ACTION rows (approval_records)
- 110 EXECUTION_COMPLETED rows (audit_logs)
- db/models.py: AlertOperationLog SQLAlchemy model
- repositories/alert_operation_log_repository.py: append/list_by_incident/get_stats
- webhooks.py: _try_auto_repair_background writes AUTO_REPAIR_TRIGGERED + EXECUTION_COMPLETED + TELEGRAM_RESULT_SENT
- webhooks.py: _push_to_telegram_background writes TELEGRAM_SENT
- telegram.py: handle_callback writes USER_ACTION (approve/reject)
Migration executed: awoooi_prod@192.168.0.188 ✅
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 11:22:03 +08:00 |
|
OG T
|
eee6f06215
|
feat(auto-repair): force every operation to be written to the DB: auto_repair_executions table
CD Pipeline / build-and-deploy (push) Failing after 1m32s
Commander directive: all auto-repair operations (success or failure) must be persisted
Changes:
- migrations/phase10_auto_repair_executions.sql: new table + 4 indexes
- db/models.py: new AutoRepairExecution SQLAlchemy model
- repositories/audit_log_repository.py: new AutoRepairExecutionRepository (create/list_by_incident/get_stats)
- auto_repair_service.py: execute_auto_repair writes to the DB on both the success and failure branches
- Passes the new similarity_score parameter through
- AutoRepairDecision gains a similarity_score field
- webhooks.py: passes similarity_score into execute_auto_repair
Migration executed: awoooi_prod@192.168.0.188:5432 ✅
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 11:16:37 +08:00 |
|
OG T
|
68a2fff746
|
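feat(auto-repair): remove all blocking thresholds: everything goes straight to auto-repair
CD Pipeline / build-and-deploy (push) Failing after 1m38s
Commander directive: all APPROVED Playbooks execute directly, no longer checking:
- Similarity threshold (MIN_SIMILARITY_SCORE 0.7 → 0.0)
- is_high_quality quality threshold
- Cold-start trust mechanism
- Action risk-level threshold (both the evaluate and execute layers)
Retained: manual review for P0/P1 severity, global cooldown circuit breaker, APPROVED status check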
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 11:10:09 +08:00 |
|
OG T
|
8fcb66eb52
|
chore(api): trigger CD — Sprint 3+4+F deploy
CD Pipeline / build-and-deploy (push) Successful in 11m28s
E2E Health Check / e2e-health (push) Successful in 34s
|
2026-04-07 16:00:12 +08:00 |
|
OG T
|
4c45961c4f
|
chore: trigger CD deploy (Sprint 3+4+F)
|
2026-04-07 13:25:36 +08:00 |
|
OG T
|
b7ea362efc
|
fix(api): Review #2 technical debt cleanup: I1/S1/S2/S3 all fixed
CD Pipeline / build-and-deploy (push) Successful in 12m13s
I1: complete the error_type field
- Added AnomalyCounter.derive_key_from_incident()
Extracts reason/error_type from signal.labels so that all four fields are populated
S1: unify the signature-building logic in three places
- auto_repair_service._derive_anomaly_key() → delegates to derive_key_from_incident()
- approval_execution._get_anomaly_key_from_approval() → same as above
- incident_service.resolve_incident() B4 → same as above
- Removes 3 duplicated copies of the signature-building code
S2: Redis Pipeline batch queries
- get_all_disposition_stats() changed from N+1 hgetall calls to 2 Pipeline calls
- Pipeline 1: batch hgetall for all disposition keys
- Pipeline 2: batch hget for metadata (alert_name)
- Redis round-trips reduced from roughly 2N to 2
S3: fix the AttributeError from get_incident in auto_repair.py
- get_incident() → get_from_working_memory() (pre-existing bug)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 13:13:42 +08:00 |
|
OG T
|
b20a619a3d
|
fix(ci): CD fix: shared-types type sync + test cold-start conflict
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Successful in 1m2s
1. pnpm shared-types generate: syncs the Pydantic models added in Sprint 4
2. Fix test_evaluate_not_high_quality: add a MEDIUM risk step to avoid
accidentally taking the cold-start path (Redis not initialized → COLD_START_DAILY_LIMIT)
11/11 auto_repair tests pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 13:09:17 +08:00 |
|
OG T
|
3a3f9cf70c
|
docs(logbook): Sprint 4 full-stack completion record: 6 Phases / 19 work items
|
2026-04-07 13:02:59 +08:00 |
|
OG T
|
de3935d1d4
|
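feat: Sprint 4 Phase E+F: frontend disposition statistics + disposition breakdown in the weekly report
CD Pipeline / build-and-deploy (push) Failing after 1m26s
Type Sync Check / check-type-sync (push) Failing after 1m2s
Phase E: frontend pages
- E1: /reports full disposition statistics dashboard (already completed in Sprint F)
- E2: home page Metrics Strip: reads the real automation rate from the disposition API
Prefers auto_rate from /stats/disposition, falling back to an estimate derived from incidents
- E3: /auto-repair disposition overview cards (already completed in Sprint F)
- E4: /neural-command stats tab disposition breakdown (already completed in Sprint F)
- E5: i18n translations zh-TW + en (already completed in Sprint F)
Phase F: weekly report + documentation
- F1: WeeklyReportMessage gains 5 disposition fields
Weekly report format gains a "📋 Disposition breakdown" block (auto / cold start / human review / manual + automation rate)
weekly_report_service integrates get_all_disposition_stats()
- Message length limit raised from 900 to 1200 characters (to fit the disposition block)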
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 13:02:20 +08:00 |
|
OG T
|
37bddbb430
|
docs(logbook): Sprint 4 Phase E frontend disposition statistics completion record
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 13:01:22 +08:00 |
|
OG T
|
22bc384b28
|
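feat(web): Sprint 4 Phase E: frontend disposition statistics dashboard
E1: /reports page upgraded to a full disposition statistics dashboard
- 3 KPIs at the top (total dispositions / automation rate / human intervention rate)
- Four counter cards (auto-repair / human review / manual handling / cold-start trust)
- Stacked distribution bar (percentage visualization)
- Detail table by anomaly type
- Wired to GET /api/v1/stats/disposition
E3: /auto-repair page gains 4 disposition overview cards
E4: /neural-command stats tab gains a disposition breakdown block
E5: 25+ new i18n translation keys (zh-TW + en)
All pages pass next build. Commander's iron rule: no fake data; show '--' when there is no data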
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 13:00:41 +08:00 |
|
OG T
|
246587a401
|
fix(web): Sprint F frontend fake-data purge: all 29 instances of fake data removed (Chief Architect 98/100)
P0: removed all MOCK constants from the three Neural Command subcomponents and wired them to real API props
- NeuralLiveCenter: fake history / fake KPIs / fake radar → computed live from stats/history/incidents
- NeuralStats: MOCK_HISTORY/SCHEME_STATS/PLAYBOOK_RANKINGS → aggregated with useMemo
- NeuralApprovalPanel: MOCK_PENDING → real /api/v1/approvals approval actions
P1: 10+ fake user identities (demo-user/user-001/War Room User) → unified under the CURRENT_USER constant
P2: deleted 6 Demo exports (GlobalPulseChartDemo/MOCK_APPROVAL/DEMO_DECISION_CHAIN)
P3: /demo page guarded by the NEXT_PUBLIC_ENABLE_DEMO environment variable
i18n: 22 new translation keys (zh-TW + en)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 12:53:52 +08:00 |
|
OG T
|
561bcb638b
|
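fix(api): Sprint 4 Chief Architect Review P0 fixes: unified hash + modularity compliance
P0-1: unify anomaly_key hash derivation
- B1: added _derive_anomaly_key(), which uses AnomalyCounter.hash_signature()
in place of symptoms.compute_hash()
- B3/B4: namespace is now read via signal.labels.get("namespace", "")
This fixes getattr(signal, "namespace", "") always returning an empty string
P0-2: Router-layer modularity compliance
- C1/C2: encapsulated get_all_disposition_stats() in AnomalyCounter
- Router no longer accesses counter.redis directly
- stats.py: removed the unused days/stats parameters
P1: get_frequency() populates the disposition fields
- Consistent with _record_anomaly_impl(); returns the full disposition statistics
Chief Architect score: 82/100 → all P0 items fixed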
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
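Note: the B3/B4 namespace bug above can be reproduced with a small sketch. The `Signal` class here is a hypothetical stand-in that assumes, as the commit implies, that the namespace is stored inside the `labels` mapping rather than as an attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    """Hypothetical stand-in: namespace lives inside labels, not as an attribute."""
    labels: dict = field(default_factory=dict)


signal = Signal(labels={"namespace": "awoooi-prod"})

# There is no "namespace" attribute, so getattr silently returns the default.
via_getattr = getattr(signal, "namespace", "")

# Reading from the labels mapping finds the actual value.
via_labels = signal.labels.get("namespace", "")
```

Because `getattr` with a default never raises, the original code failed silently, which is likely why the bug went unnoticed.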
|
2026-04-07 12:53:12 +08:00 |
|
OG T
|
a85e9ced08
|
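feat(api+telegram): Sprint 4 Phase C+D: API endpoints + Telegram disposition statistics
Phase C: API endpoints
- C1: GET /api/v1/stats/disposition: full disposition breakdown statistics
- DispositionSummary: auto/human/manual/cold_start + auto_rate
- DispositionByAnomaly: detail by anomaly type (up to 20 entries)
- Redis SCAN + HGETALL aggregation
- C2: GET /api/v1/auto-repair/stats extended with disposition_summary
Phase D: Telegram alert format
- D1: alert card gains a disposition statistics line
- 🤖 Auto: N | 👤 Review: N | 🔧 Manual: N
- Automation rate percentage
- D2: history button enhanced with disposition breakdown details
- All 5 counters + automation rate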
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 12:17:20 +08:00 |
|
OG T
|
9253281d46
|
feat(api): Sprint 4 Phase A+B: data layer + write layer for alert disposition statistics
Phase A: data layer
- A1: IncidentFrequencyStats gains 4 fields (human_approved/manual_resolved/cold_start_trust/total_resolution)
- A2: AnomalyCounter.record_disposition(): atomic increment via Redis HINCRBY
- A3: get_disposition_stats(): HGETALL returns the disposition breakdown
- AnomalyFrequency dataclass extended, with to_dict() kept in sync
- _record_anomaly_impl() integrates disposition stats
Phase B: wiring the write-layer trigger points
- B1: auto-repair succeeded → record_disposition("auto_repair")
- B2: cold-start trust succeeded → record_disposition("cold_start_trust")
- AutoRepairDecision gains an is_cold_start flag
- execute_auto_repair() receives it and distinguishes the disposition type
- B3: human-approved execution succeeded → record_disposition("human_approved")
- Added the _get_anomaly_key_from_approval() helper
- B4: manual handling inferred → resolve_incident() decides by elimination
- If resolved with no auto/human/cold_start record → manual_resolved
Safety design: every disposition write is wrapped in try/except, so a failure does not block the main flow
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 11:54:46 +08:00 |
|
OG T
|
e82d3802c5
|
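docs: Sprint 4 alert disposition statistics system: complete plan document + LOGBOOK update
The Sprint 4 plan comprises 6 Phases / 19 work items:
- Phase A: data layer (IncidentFrequencyStats + Redis counters)
- Phase B: write layer (4 trigger points: auto_repair/cold_start/human/manual)
- Phase C: API endpoint (/stats/disposition)
- Phase D: Telegram alert card statistics
- Phase E: frontend (/reports dashboard + home page + auto-repair + neural-command)
- Phase F: weekly report + documentation
Chief Architect review: 100% Fully Approved
Conflict check: all dependencies correct, DAG is acyclic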
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 11:37:21 +08:00 |
|
OG T
|
53b2daeaca
|
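feat(api): first-time trust mechanism: breaking the chicken-and-egg cold-start problem in auto-repair
Problem: a Playbook needs success_count >= 3 to count as is_high_quality,
but without auto-repair there are no success records → the threshold can never be reached.
Option C: first-time trust (Cold Start Trust)
- APPROVED status + every step risk=LOW + execution count < 3 → allowed automatically
- A Redis counter limits first-time-trust auto-repairs to 5 per day
- After 3 accumulated successes, the normal is_high_quality threshold applies again automatically
Safety boundaries:
- Only LOW risk steps qualify for first-time trust (restarting a container, etc.)
- HIGH/CRITICAL still require manual review
- P0/P1 severity still requires manual review
- The daily cap prevents runaway behavior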
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
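Note: the cold-start trust rules listed above can be sketched as a single predicate. This is an illustrative sketch only: the `Playbook` dataclass and `qualifies_for_cold_start` are hypothetical names, and the daily counter (kept in Redis according to the commit) is passed in as a plain integer to keep the example self-contained.

```python
from dataclasses import dataclass

COLD_START_MIN_RUNS = 3     # from the commit: execution count < 3
COLD_START_DAILY_LIMIT = 5  # from the commit: at most 5 per day


@dataclass
class Playbook:
    """Hypothetical minimal view of a playbook."""
    status: str
    step_risks: list
    execution_count: int


def qualifies_for_cold_start(playbook: Playbook, severity: str, used_today: int) -> bool:
    """Return True only when every safety boundary from the commit holds."""
    if severity in ("P0", "P1"):
        return False  # high severity still needs manual review
    if playbook.status != "APPROVED":
        return False
    if any(risk != "LOW" for risk in playbook.step_risks):
        return False  # only all-LOW playbooks qualify
    if playbook.execution_count >= COLD_START_MIN_RUNS:
        return False  # normal is_high_quality threshold applies from here
    return used_today < COLD_START_DAILY_LIMIT
```

Keeping the checks as early returns makes each safety boundary correspond to one line, which is easy to audit against the list in the commit message.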
|
2026-04-07 11:21:00 +08:00 |
|
OG T
|
2fe8062fb8
|
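refactor(api): Re-Review S1/S2/S3 improvements: deduplication + defensive validation + test isolation
S1: extract the shared _execute_and_observe() method
- Removes 3 duplicated copies of the execute+audit+langfuse logic in repair_by_uri
- Unifies the AuditLog + Langfuse trace write path
S2: defensive validation of the SSH username
- Added validate_ssh_user() + the _SSH_USER_RE regex
- Validates the user parameter on entry to _ssh_execute()
- Prevents unexpected behavior from user@host concatenation
- Added 8 username validation tests
S3: Singleton reset for tests
- Added the _reset_for_test() classmethod
- Avoids state leaking across tests
- Added 2 singleton reset tests
Tests: 55/55 passing (45 existing + 10 new)
Chief Architect Re-Review: 91/100 ✅ passed; all 3 Suggestions implemented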
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
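Note: the S3 item above (a `__new__`-based singleton with a test-only reset) might look roughly like this. `HostRepairAgentSketch` is a hypothetical stand-in for the real HostRepairAgent; only `_reset_for_test` and `_in_process_locks` are names taken from the commit messages.

```python
class HostRepairAgentSketch:
    """Hypothetical stand-in for a singleton that owns shared lock state."""

    _instance = None

    def __new__(cls):
        # Create the single instance lazily and attach the shared state once.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._in_process_locks = {}
        return cls._instance

    @classmethod
    def _reset_for_test(cls) -> None:
        """Drop the cached instance so the next test starts from clean state."""
        cls._instance = None
```

A test fixture could call `_reset_for_test()` in its teardown so that locks registered by one test are not visible to the next.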
|
2026-04-07 11:17:40 +08:00 |
|
OG T
|
78a8d3dfa5
|
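fix(api): add whitelist validation for the ansible control node to prevent bypass via environment variable (Re-Review Important)
Chief Architect Re-Review noted that ANSIBLE_CONTROL_HOST comes from an environment variable (ConfigMap);
if the ConfigMap is tampered with, SSH_TARGET_WHITELIST can be bypassed.
Added validate_ssh_target_host(host) at the start of _execute_ansible() to close the loop.
Re-Review score: 91/100 ✅ passed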
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-07 11:13:49 +08:00 |
|
OG T
|
0dec007673
|
docs(logbook): record completion of Sprint 3 P0 critical security fixes
CD Pipeline / build-and-deploy (push) Successful in 11m37s
|
2026-04-07 11:10:48 +08:00 |
|
OG T
|
f8d4772abf
|
fix(api): Sprint 3 P0-1/P0-2/P0-3/P0-4 Critical Security Fixes
P0-1: Complete shell metacharacter regex detection
- Enhanced _SHELL_METACHAR_RE to detect: >, <, \n, ${}, $()
- Prevents all shell injection vectors (redirects, variable expansion, newlines)
- Added 5 new validation tests
P0-2: Add shlex.quote() protection for ansible playbook path
- Wraps playbook_path in shlex.quote() before SSH command construction
- Prevents shell injection if path contains special characters
- Applied in _execute_ansible() method
P0-3: Add SSH target host whitelist validation
- Introduces validate_ssh_target_host() function
- Only allows SSH to: 192.168.0.110, 192.168.0.188
- Prevents unauthorized SSH target exploitation
- Added 5 new whitelist validation tests
P0-4: Convert HostRepairAgent to singleton pattern
- Implements __new__() singleton with shared _in_process_locks dict
- Ensures in-process locks persist across multiple auto_repair_service calls
- Previously created new instance per call, making locks ineffective
- Added singleton persistence test
Test Results: 45/45 passing (34 existing + 11 new P0 tests)
All security validations verified via comprehensive unit test coverage.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
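Note: P0-2 and P0-3 above can be illustrated with a short sketch. The names `validate_ssh_target_host` and `SSH_TARGET_WHITELIST` appear in the commit messages, but their signatures and the `build_ansible_command` helper are assumptions made for this example.

```python
import shlex

# Hosts taken from the commit message; the constant's real type is assumed.
SSH_TARGET_WHITELIST = frozenset({"192.168.0.110", "192.168.0.188"})


def validate_ssh_target_host(host: str) -> str:
    """Reject any SSH target that is not explicitly whitelisted."""
    if host not in SSH_TARGET_WHITELIST:
        raise ValueError(f"SSH target not allowed: {host!r}")
    return host


def build_ansible_command(playbook_path: str) -> str:
    """Quote the path so shell metacharacters are treated as literal text."""
    return f"ansible-playbook {shlex.quote(playbook_path)}"
```

`shlex.quote()` leaves plain paths untouched and wraps anything containing special characters in single quotes, so a path such as `a.yml; id` reaches the remote shell as one argument instead of two commands.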
|
2026-04-07 11:09:45 +08:00 |
|
OG T
|
af07c23675
|
fix(k8s): mount known_hosts at the dedicated /etc/repair-known-hosts directory to fix a mount conflict
CD Pipeline / build-and-deploy (push) Successful in 12m11s
E2E Health Check / e2e-health (push) Successful in 34s
/etc/repair-ssh is already used by repair-ssh-key, so the subPath file mount conflicts
Switched to the dedicated /etc/repair-known-hosts directory and updated KNOWN_HOSTS_PATH to match
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 15:06:28 +08:00 |
|
OG T
|
d56aae135d
|
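fix(k8s): repair-known-hosts secret optional:true: the Pod no longer blocks waiting for the secret to be created
CD Pipeline / build-and-deploy (push) Failing after 8m35s
The secret is only created on the first CD run; optional lets the Pod start first
It is mounted automatically once CD has created the secret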
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:48:45 +08:00 |
|
OG T
|
93bcfb4ce8
|
docs: update LOGBOOK: Sprint 3 SSH_COMMAND chain of command complete
|
2026-04-06 14:48:11 +08:00 |
|
OG T
|
ee187dcb79
|
ci(cd): CD automatically creates the awoooi-repair-known-hosts Secret (closes the loop on Sprint 3 T2)
CD Pipeline / build-and-deploy (push) Has been cancelled
Each deployment runs ssh-keyscan against .110/.188 and kubectl apply on the secret
Replaces StrictHostKeyChecking=no: Security Fix A1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:45:20 +08:00 |
|
OG T
|
1644fe6474
|
feat(api): auto_repair_service integrates repair_by_uri (Sprint 3 T6)
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:39:03 +08:00 |
|
OG T
|
a4e11bfa92
|
feat(api): AuditLog + Langfuse Trace for SSH_COMMAND (Sprint 3 T5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:38:59 +08:00 |
|
OG T
|
02510d3d93
|
feat: /api/v1/auto-repair/history endpoint + neural-command wired to the real API (Sprint 3)
CD Pipeline / build-and-deploy (push) Failing after 8m50s
- Added RepairHistoryItem/RepairHistoryResponse Pydantic models
- GET /api/v1/auto-repair/history?limit=N derives repair history from incidents working memory
- Frontend fetchData() fetches history + approvals/pending together; removed the hard-coded pendingApprovals=0
- Wrapped in try/except so any error returns an empty list without breaking the frontend
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:28:55 +08:00 |
|
OG T
|
4561f141bb
|
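feat(api): Redis idempotency lock to prevent duplicate repairs (Sprint 3 T4)
Two-layer lock design: in-process asyncio.Lock (always in effect) + Redis distributed lock (cross-Pod, best-effort)
A second repair call for the same URI returns an "already running" error immediately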
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:26:53 +08:00 |
|
OG T
|
1a654aa37d
|
feat(api): HostRepairAgent three execution paths + known_hosts + Ansible whitelist (Sprint 3 T3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:22:54 +08:00 |
|
OG T
|
d4cb9a4ac5
|
ops(k8s): known_hosts Secret + Ansible whitelist ConfigMap (Sprint 3 T2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:20:14 +08:00 |
|
OG T
|
5e8b2a6894
|
feat(api): URI scheme parser + shell injection protection (Sprint 3 T1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:18:21 +08:00 |
|
OG T
|
9197994d51
|
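feat(neural-command): add Sprint 3 chain-of-command visualization + T1-T7 task progress monitoring
CD Pipeline / build-and-deploy (push) Successful in 11m15s
- Node flow diagram: SSH Gateway → URI parser → shell injection guard → Redis idempotency lock → Ansible Playbook DB
- T1-T7 task cards (T1/T2 marked complete, T3-T7 pending)
- 4-metric panel: implementation velocity / security level / observability / architecture health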
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:13:58 +08:00 |
|
OG T
|
1a8021bfaa
|
docs(plans): Sprint 3 SSH_COMMAND chain-of-command implementation plan (7 tasks)
|
2026-04-06 14:08:28 +08:00 |
|
OG T
|
0b1ceb8618
|
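feat(web): add the Neural Command Center page /neural-command
CD Pipeline / build-and-deploy (push) Successful in 12m22s
Sprint 3 SSH_COMMAND chain-of-command UI, full frontend implementation:
- Pre-Flight review panel: 8/8 security checks (three categories A/B/C) + pass status + feature toggles
- Live command center: OpenClaw 🦞 + NemoTron ⚡ status + neural conduction path animation + execution stream
- Statistics & history: 5 KPIs + URI scheme distribution + Playbook effectiveness ranking + timeline
- Nuclear key authorization panel: diagnoses from the two commanders + execution path details + NuclearKeyButton long-press confirmation
Technical details:
- Route: /neural-command (a separate new page, not a replacement for /auto-repair)
- sidebar: BrainCircuit icon, directly below auto-repair
- i18n: full zh-TW + en support (neuralCommand namespace)
- TypeScript: type definitions split out into components/neural-command/types.ts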
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 14:01:31 +08:00 |
|
OG T
|
0da827beef
|
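perf(web): Dockerfile adds --mount=type=cache to persist the Next.js build cache
CD Pipeline / build-and-deploy (push) Successful in 13m37s
CACHE_BUST still forces the source layer to be invalidated (so code changes make it into the bundle),
but .next/cache persists across builds on the runner host via a BuildKit cache mount.
Next.js incremental compilation rebuilds only the pages that changed, with an expected saving of 3-4 minutes.
# 2026-04-06 ogt: Web build drops from 5 min to ~1-2 min (from the second build onward)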
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 12:45:43 +08:00 |
|
OG T
|
a4ae74f767
|
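fix(cd): correct the Playwright version detection path ../package.json → ./package.json
CD Pipeline / build-and-deploy (push) Has been cancelled
The step runs in the apps/web directory, where ../package.json does not exist, so it returned unknown every time,
causing every deployment to re-download the 110MB Chromium.
Switched to ./package.json to read the @playwright/test version of apps/web correctly.
# 2026-04-06 ogt: saves about 2 minutes of CD time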
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 12:44:45 +08:00 |
|
OG T
|
cd37befbe6
|
fix(models): replace datetime.UTC → timezone.utc everywhere for Python 3.10 compatibility
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Successful in 59s
terminal.py, incident.py, and utils/timezone.py had the same problem.
The CI runner's Python 3.10 has no UTC constant, which caused every model to fail to import silently.
# 2026-04-06 ogt: complete fix, nothing slips through anymore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
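Note: the portable form referred to above looks like this. `datetime.UTC` is an alias introduced in Python 3.11, while `datetime.timezone.utc` also exists on 3.10 and earlier; the `utc_now` helper is a hypothetical name used only for illustration.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return a timezone-aware timestamp that imports cleanly on Python 3.10."""
    return datetime.now(timezone.utc)


now = utc_now()
```

Writing `from datetime import UTC` at module level raises ImportError on Python 3.10, which fits the import failures described in the commit.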
|
2026-04-06 12:40:27 +08:00 |
|
OG T
|
59c3dfb910
|
fix(models): approval.py switched to timezone.utc for Python 3.10 compatibility
CD Pipeline / build-and-deploy (push) Successful in 12m12s
Type Sync Check / check-type-sync (push) Failing after 52s
The CI runner uses Python 3.10; datetime.UTC was only added in 3.11.
Switched to datetime.timezone.utc, which works on all versions, fixing the across-the-board CI type-sync failure.
# 2026-04-06 ogt: root cause: CI Python 3.10 cannot import UTC
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 12:19:23 +08:00 |
|
OG T
|
b416ab6577
|
ci(debug): type-sync-check now prints the diff to diagnose the cause of the CI failure
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 12:17:36 +08:00 |
|
OG T
|
8235f91bc6
|
fix(scripts): generate-schemas adds both apps/api and apps/api/src to sys.path
Type Sync Check / check-type-sync (push) Failing after 56s
Problem: CI type-sync-check kept failing
Cause: adding only apps/api/src is not enough; the model files internally use from src.utils.X import Y,
so apps/api must be on the path for the src package to resolve
Result: all 51 types generated correctly
# 2026-04-06 ogt: fix CI type-sync blocking deployment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 12:00:18 +08:00 |
|
OG T
|
f6332b4b2f
|
fix(telegram): fix approval_id UUID conversion error: support the INC-xxx format
CD Pipeline / build-and-deploy (push) Successful in 12m24s
_execute_approval_action calls UUID(approval_id), but approval_id is INC-xxx,
which raises a 'badly formed hexadecimal UUID string' error and prevents the approval from executing.
Fix: try the UUID conversion first; if it fails, use incident_id to look up the corresponding pending approval UUID.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
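Note: the try-then-fallback approach described above can be sketched like this. `resolve_approval_uuid` is a hypothetical helper, and the dictionary stands in for whatever database lookup the real code performs to find the pending approval for an incident.

```python
from typing import Optional
from uuid import UUID


def resolve_approval_uuid(approval_id: str, pending_by_incident: dict) -> Optional[UUID]:
    """Accept either a UUID string or an incident ID such as INC-xxx."""
    try:
        return UUID(approval_id)
    except ValueError:
        # Not a UUID: treat the value as an incident ID and look up
        # the pending approval that belongs to it (None if absent).
        return pending_by_incident.get(approval_id)
```

`UUID()` raises ValueError for malformed input, so catching that one exception is enough to route INC-style identifiers to the fallback lookup.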
|
2026-04-06 11:53:48 +08:00 |
|
OG T
|
71715506c3
|
chore(types): regenerate TypeScript types: Phase 26 ApprovalRequest + namespace fix
Type Sync Check / check-type-sync (push) Failing after 51s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 11:50:43 +08:00 |
|
OG T
|
8d496e84e2
|
fix(test): update action_parsing tests: default namespace without the -n flag is now awoooi-prod
CD Pipeline / build-and-deploy (push) Has been cancelled
action_planner.py default_namespace is already awoooi-prod; test expectations updated to match.
kubectl commands that explicitly specify -n default are unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 11:49:24 +08:00 |
|
OG T
|
b133631b2d
|
feat(scripts): Phase 26 backfill script: rebuild KM entries from approval_records
All 225 historical alert-handling records backfilled into knowledge_entries (INCIDENT_CASE)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 11:47:47 +08:00 |
|
OG T
|
658337ec18
|
fix(phase26): connect the full Incident→DB→KM chain + namespace fix
CD Pipeline / build-and-deploy (push) Failing after 1m29s
Type Sync Check / check-type-sync (push) Failing after 52s
Root causes:
1. create_incident_for_approval stored only to Redis, not PostgreSQL
→ entries vanish after the 7-day TTL, so Playbook extraction can never find the Incident
2. ApprovalRecord had no incident_id field
→ _trigger_playbook_extraction relied on a regex scanning Chinese text for INC-, which always failed
3. operation_parser namespace fallback was "default"
→ all deployments live in awoooi-prod, so all 203 executions failed
Fixes:
- Incident is written to both Redis and PostgreSQL (save_to_episodic_memory)
- ApprovalRecord gains an incident_id field (model + ORM + migration)
- alertmanager_webhook writes incident_id back after creating the Approval
- _trigger_playbook_extraction uses approval.incident_id directly
- operation_parser DEFAULT_NAMESPACE = "awoooi-prod"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-06 11:46:05 +08:00 |
|