Commit Graph

47 Commits

Author SHA1 Message Date
Your Name
8cf559215c docs(awooop): add Phase 1 Isolation Foundation implementation plan (ADR-106 P1) 2026-05-02 12:28:33 +08:00
Your Name
04ff22563e fix(aiops-p1): fix all 5 breakpoints in the Playbook learning loop + DB migration (ADR-092 B4)
Some checks failed
run-migration / migrate (push) Failing after 14s
CD Pipeline / build-and-deploy (push) Failing after 2m7s
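[P0.4 patch] pre_decision_investigator was missing the Prometheus query field
- _build_tool_params() now adds the "query" field (a required parameter of the prometheus_query tool)
- Added _build_prometheus_query(), which generates PromQL by alert type (CPU/Memory/Crash/Disk/HTTP/Pod/fallback)
- After the fix, the D3_METRICS sensing dimension actually returns data (previously 100% of calls returned missing_query_parameter)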
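
As a rough illustration of the P0.4 patch, the sketch below picks a PromQL template by alert type and always fills the "query" parameter. The function bodies, the templates, and the `instance` label are assumptions for illustration only; the commit does not show the real _build_prometheus_query().

```python
# Hypothetical templates; the metric names are common node_exporter
# metrics, not taken from the commit.
_PROMQL_BY_KIND = {
    "cpu": 'avg(rate(node_cpu_seconds_total{{mode!="idle",instance="{instance}"}}[5m]))',
    "memory": 'node_memory_MemAvailable_bytes{{instance="{instance}"}}',
    "disk": 'node_filesystem_avail_bytes{{instance="{instance}"}}',
}
_FALLBACK = 'up{{instance="{instance}"}}'


def build_prometheus_query(alert_name: str, instance: str) -> str:
    """Return a PromQL string for the alert, falling back to `up`."""
    name = alert_name.lower()
    for kind, template in _PROMQL_BY_KIND.items():
        if kind in name:
            return template.format(instance=instance)
    return _FALLBACK.format(instance=instance)


def build_tool_params(alert_name: str, instance: str) -> dict:
    """Always include "query", so the tool never sees it missing."""
    return {"query": build_prometheus_query(alert_name, instance)}
```

The fallback branch is the important part: an unrecognized alert type still produces a valid query rather than an empty parameter.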

[P1 Playbook learning loop: B1-B5 all fixed]
- B2 db/models.py: ApprovalRecord gains a matched_playbook_id column + ix_approval_matched_playbook index
- B2 db/models.py: TimelineEvent gains an incident_id column (for MCP auditing) + index
- B3 approval_db.py: the record→ApprovalRequest conversion restores incident_id + matched_playbook_id
- B4 approval_repository.py: same as B3 (the two conversion functions must stay in sync)
- B5 approval_db.py: approval_request_to_record_data now includes matched_playbook_id, so the DB can actually store the value

[P1.5 KM write] approval_execution.py: fire-and-forget → await wait_for(30s)
- Root cause: the task started with asyncio.create_task was killed on Pod recycle, so KM writes were lost silently
- Fix: await asyncio.wait_for(..., timeout=30.0) + TimeoutError log
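
The change from fire-and-forget to a bounded await can be sketched as follows. `write_km_entry` is a hypothetical stand-in for the real KM write, and the logger name is an assumption; only asyncio.wait_for with a 30-second timeout comes from the commit message.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("approval_execution")


async def write_km_entry(payload: dict) -> str:
    """Hypothetical stand-in for the real KM write."""
    await asyncio.sleep(payload.get("delay", 0))
    return "written"


async def write_km_bounded(payload: dict, timeout: float = 30.0) -> Optional[str]:
    """Await the write instead of detaching it with create_task().

    The caller stays alive until the write finishes or times out,
    so a failure is logged rather than lost silently.
    """
    try:
        return await asyncio.wait_for(write_km_entry(payload), timeout=timeout)
    except asyncio.TimeoutError:
        logger.error("KM write timed out after %.1fs", timeout)
        return None
```

The trade-off is latency: the caller now waits up to the timeout, in exchange for knowing whether the write landed.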

[Migration file] adr092_p1_learning_chain_fix.sql
- ALTER TABLE approval_records ADD COLUMN matched_playbook_id VARCHAR(36)
- ALTER TABLE timeline_events ADD COLUMN incident_id VARCHAR(64)
- Run: psql $DATABASE_URL -f apps/api/migrations/adr092_p1_learning_chain_fix.sql

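[Incidental Agent changes]
- decision_manager: Phase 2 YAML NO_ACTION priority gate (host-level / external-service alerts skip the Agent Debate)
- alert_rules.yaml: new rules for Sentry/ClickHouse + HostDiskUsageHigh/Critical
- solver_agent: action_title falls back to semantic synthesis (replacing the silent drop)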

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 15:41:35 +08:00
Your Name
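994817a23a docs: ADR-092 Appendices A+B + LOGBOOK + MASTER §8 record the four fixes and the C1-C4 end-to-end wiring
- ADR-092: Appendix A (B1-B4 four fixes: root cause + commit) + Appendix B (C1-C4 breakpoint fix table + architecture hard rules)
- LOGBOOK: added the 2026-04-20 evening C1-C4 section (breakpoint list + commits + acceptance steps)
- MASTER §8: appended the C1-C4 changelog (§3/§1.1 alignment + description of post-fix behavior)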

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 20:24:41 +08:00
Your Name
39ac292c90 docs(master): §8 appends the ADR-092 four-fix record + project_current_status update
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:01:50 +08:00
Your Name
803b389f6b security(secrets): replace the real TG bot token in test fixtures with a fake value
Some checks failed
run-migration / migrate (push) Failing after 20s
CD Pipeline / build-and-deploy (push) Successful in 9m10s
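## Incident
The aider-watch v1 session wrote the real production TG bot token (NEMOTRON_BOT_TOKEN)
into the following tracked files as a test fixture (all already pushed to Gitea):
- apps/api/tests/test_secret_redactor.py
- docs/superpowers/plans/2026-04-19-aider-watch.md (3 occurrences)
- docs/superpowers/plans/2026-04-20-aider-watch-v2.md

This violates the L2 zero-trust rule in feedback_secrets_leak_incidents_2026-04-18.md (no secrets in source control).

## Response
- Commander's decision: do not revoke the token (risk accepted)
- Replaced with the fake value 111222333:A*35 (an obvious placeholder that still matches the format the redactor detects)
- Reduces future exposure via search engines / forks (the git history still contains the token)

## Verification
All 8 tests for secret_redactor.py pass; the telegram regex still recognizes the new fake-value format.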
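
A minimal sketch of the kind of check described above, assuming a Telegram bot token looks like 8 to 10 digits, a colon, and 35 URL-safe characters, and reading "A*35" as the letter A repeated 35 times. The regex and function names are illustrative; the real secret_redactor.py is not shown in the commit.

```python
import re

# Assumed token shape; not copied from the project's redactor.
TELEGRAM_TOKEN_RE = re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}(?![A-Za-z0-9_-])")

# Placeholder in the style the commit describes: 111222333:A*35.
FAKE_TOKEN = "111222333:" + "A" * 35


def redact(text: str) -> str:
    """Replace anything shaped like a Telegram bot token."""
    return TELEGRAM_TOKEN_RE.sub("[REDACTED_TG_TOKEN]", text)
```

A placeholder that still matches the detection pattern keeps the redactor tests meaningful without a live credential in the repository.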

## P1 backlog
- Git history cleanup (git filter-repo) needs the Commander's approval for a force push
- Pre-commit hook to prevent future leaks (grep for the TG token format / detect-secrets)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:23:09 +08:00
Your Name
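8d40bbff2b docs(aider-watch v2): fill 4 big-picture blind spots
The Commander's 2026-04-20 reminder, "never lose the big picture on any update", prompted a second check before execution.
It found 4 blind spots the plan had not handled; they are now filled in: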

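Blind spot 1: reaching the server from the Mac over the public internet
  - spec §8 + §8b add three options to choose from: Tailscale/nginx/VPN
  - plan Task B5 install.sh reminds the user up front to pick a configuration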

Blind spot 2: incident flooding (many errors in one session)
  - spec §8 adds a coalesce strategy (60s window per session_id)
  - plan Task A5 service implements create_incident_for_event with the coalesce logic
  - adds 2 test cases verifying reuse within a session + separation across sessions
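
The coalesce strategy for blind spot 2 might look like the sketch below. The class and method names are hypothetical, and it assumes the 60s window is measured from the first error of a session; the commit does not say whether the window is fixed or sliding.

```python
import time
import uuid

COALESCE_WINDOW_S = 60.0  # from the commit: 60s window per session_id


class IncidentCoalescer:
    """Reuse one incident per session_id inside the window."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._open = {}  # session_id -> (incident_id, first_seen)

    def incident_for(self, session_id: str) -> str:
        now = self._clock()
        entry = self._open.get(session_id)
        if entry is not None and now - entry[1] < COALESCE_WINDOW_S:
            return entry[0]  # same session, still inside the window
        incident_id = uuid.uuid4().hex
        self._open[session_id] = (incident_id, now)
        return incident_id
```

Injecting the clock makes the two test cases named in the commit (reuse within a session, separation across sessions) easy to write without sleeping.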

Blind spot 3: risk of the first AI Router feedback rollout
  - spec §8 adds the USE_AIDER_FEEDBACK flag, default false; enable after a 7-day canary period
  - plan Task A8 wraps the route() hook in an if settings.USE_AIDER_FEEDBACK block
  - plan Task A9 config adds USE_AIDER_FEEDBACK: bool = False

Blind spot 4: obtaining the AWOOOI_PG_PW secret
  - spec §8c adds the kubectl get secret → env → shred flow
  - plan Task A0 Step 1 spells out reading the K8s Secret + destroying the file immediately

Follows the big-picture thinking discipline in feedback_ai_autonomous_direction.md.
Execution strategy: fully subagent-driven (approved by the Commander).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:04:13 +08:00
Your Name
345e6832da docs(aider-watch): v2 implementation plan — 18 tasks across server/client/E2E
Corresponds to the v2 spec 2026-04-20-aider-watch-v2-design.md:

Phase A (server, 10 tasks, TDD):
  A0 HMAC secret + env setup
  A1 adr091 migration
  A2 secret_redactor util
  A3 Pydantic AiderEventIn/AiderBatchIn
  A4 AiderEventRepository
  A5 aider_event_service (classify/incident/pattern)
  A6 API webhook HMAC-verified
  A7 Redis stream consumer job + daily pattern extract
  A8 ai_router feedback_from_aider_events hook
  A9 config settings + main.py lifespan register

Phase B (Mac client, 5 tasks):
  B1 scaffolding (parsers/config/redactor moved over from v1)
  B2 api_client HMAC + retry
  B3 JSONL buffer + flush
  B4 aiderw wrapper + cli
  B5 install.sh + launchd plist

Phase C (E2E, 3 tasks):
  C1 happy path Mac → awoooi
  C2 degradation + buffer flush
  C3 AI Router feedback verification (fixture-driven)

Self-review: 100% spec coverage, no placeholders, consistent types.
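
Tasks A6 and B2 both rely on an HMAC-signed request. A minimal sketch of the sign and verify halves follows, assuming HMAC-SHA256 over the raw request body with a hex-encoded signature; the plan summary does not state the digest or the encoding, so both are assumptions.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Client side: hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Server side: recompute and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Comparing with hmac.compare_digest rather than `==` avoids leaking how many leading characters matched.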

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:04:13 +08:00
Your Name
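8ce8efad29 docs(aider-watch): v2 design draft, fully integrated with the awoooi AI autonomy flywheel
The Commander's 2026-04-20 directive was "route C + bot A". The v1 route, a standalone personal tool,
was cut off from the big picture of the awoooi MASTER blueprint and violated the feedback_ai_autonomous_direction
north star (it only recorded; it was not autonomous). v2 realigns: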

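- DB: goes into the main PG, with a new aider_events table from migration adr091
- Telegram: goes through the existing telegram_gateway @tsenyangbot + Redis dedup
- Incident: an aider error automatically creates an incident that follows the existing alert chain
- AI learning loop: symptom_pattern extraction + AI Router feedback hook
- Mac client: thin-shell HTTP POST + local JSONL fallback buffer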

Fate of the v1 artifacts: events.py/redactor.py move into awoooi; the rest are retired.
@NemoTronAwoooI_Bot becomes a sandbox bot and is not deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:04:13 +08:00
Your Name
55486ce2fd docs: aider-watch implementation plan (15 tasks, TDD + frequent commits)
A full breakdown of §1-§7 of the spec 2026-04-19-aider-watch-design.md:
scaffold → events schema → redactor → config → tg format/send → PG DDL
→ storage → parsers → wrapper → CLI → reporter → launchd → install → E2E.

Each task includes TDD steps (test first → verify failure → implement → verify pass → commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 22:42:41 +08:00
Your Name
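8603bce23b docs: aider-watch design draft (final §1-§7 approved by the Commander)
End-to-end monitoring for the aider CLI: a Python wrapper intercepts aider stdout + chat history
→ real-time Telegram DM notifications (session start/end/file edit/error/commit/silent
timeout) + cumulative storage in PG 192.168.0.188/aider_watch + daily (23:50) and weekly
(Sunday 22:00) launchd reports.

Graceful degradation: if PG is unreachable, fall back to a local JSONL buffer + 5min
flush job; Telegram 429 uses exponential backoff without blocking aider; secret patterns are masked automatically.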

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 22:39:40 +08:00
OG T
0670fe4d76 docs(master): §8 appends the Phase 7 Round 3 Telegram subsystem repair record
Round 3 Changelog entries:
- Inventory of 9 bugs + list of 5 commits
- git tag v7.3.0
- Handoff guide for the next session

Early morning, 2026-04-19: ogt + Claude Opus 4.7
2026-04-19 01:32:52 +08:00
OG T
5ae82d1d1f feat(db): ADR-090 L4 AIOps foundation: asset inventory × 7-item automation coverage matrix persisted in the DB
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Afternoon of 2026-04-18 (Taipei time): ogt + Claude Opus 4.7 (1M)

The RCA of the MoWoooWorkDown false alarm exposed a three-fold structural failure:
- Hosts 110/188 at load 18/16 × 13 days / cadvisor at 288% / K3s 120/121 unmonitored
- Prometheus had only 35 targets / 58 rules (under 30% coverage)
- HostHighCpuLoad measured the wrong dimension (CPU idle vs load_avg)

The Commander's strategic directive:
- Full asset panorama × seven automations × persisted in the DB
- Four-way AI division of labor (OpenClaw × NemoTron × Hermes × Claude LLM)
- Every automated operation's history must go into the DB, not MD files (MD drifts)

This commit delivers:

1. SQL migration (apps/api/migrations/adr090_asset_inventory_foundation.sql)
   - 11 tables + 33 indexes + 20 CHECK + 3 UNIQUE + 16 FK
   - pgcrypto extension dependency
   - Fully idempotent (CREATE IF NOT EXISTS + single transaction)
   - Already applied to awoooi_prod (PG on 188) and verified

2. ADR-090 (docs/adr/ADR-090-monitoring-blindspot-governance.md)
   - Decision record + mapping to the 7 engines + 4 rejected alternatives

3. Main strategy document (docs/superpowers/specs/2026-04-18-blindspot-governance-capacity-l4.md)
   - §0-§14: background / root cause / Schema DDL / 4 layers of defense / 7-Phase rollout /
     HARD_RULES / AI division-of-labor matrix / acceptance metrics / tech debt / rollback / handoff protocol

4. MASTER §8 Living Changelog appends the Phase 7 kickoff entry

The 11 tables:
  asset_inventory / asset_discovery_run / asset_coverage_snapshot /
  asset_relationship / alert_rule_catalog / asset_change_event /
  asset_compliance_snapshot / host_capacity_snapshot /
  capacity_violation_event / automation_operation_log /
  ai_collaboration_trace

The first bootstrap record has been seeded into asset_discovery_run
(run_id=6760c5bf-57e5-4a40-b82d-31b794464652)

Related Memory (not committed; stored under ~/.claude/...):
  - project_blindspot_governance.md (cross-session pointer)
  - feedback_monitor_self_monitoring.md (monitoring tools must themselves be monitored)
  - feedback_secrets_leak_incidents_2026-04-18.md (three lines of defense against credential leaks)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:18:46 +08:00
OG T
e465ee1936 docs(Phase 3): Evolver drill complete; exit condition #6 passed
- MASTER spec §3/§7/§8: the Evolver drill is checked off in all three places
- LOGBOOK: drill result recorded + next step updated to 7 days of production monitoring

Drill result: POST /api/v1/learning/evolver/run → HTTP 200 errors:[] 2026-04-15

ADR-083 Phase 3, 2026-04-15: ogt + Claude Sonnet 4.6 (APAC)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:24:33 +08:00
OG T
4718c7667c feat(Phase 3): Evolver loop schedule + manual trigger endpoint; merge-drill gate complete
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- playbook_evolver.py: added run_evolver_loop() (infinite 24h loop)
- main.py: mounts run_evolver_loop via asyncio.create_task
- api/v1/learning.py: POST /api/v1/learning/evolver/run (Phase 3 exit #6 drill endpoint)
- MASTER §8: back-filled 66c4eda AgentSession + the full Evolver exit-condition checklist from this change

ADR-083 Phase 3, 2026-04-15: ogt + Claude Sonnet 4.6 (APAC)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:07:56 +08:00
OG T
fb1bbd0e20 feat(Phase 3): learning loop completion: Root cause 3 + diagnosis feedback + knowledge forgetting + fine-tune pipeline
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- approval_execution.py: _run_post_execution_verify() now calls record_verification_result()
  Root cause 3 closed: environment verification results (success/degraded/failed/timeout) are no longer orphaned
- learning_service.py: added record_verification_result(): verification result → Redis + Playbook EWMA
- learning_service.py: added record_diagnosis_outcome(): writes back a negative signal on misdiagnosis (L3×D4)
- jobs/knowledge_decay_job.py: new 30d knowledge-forgetting job (unreferenced draft/review → archived)
- services/finetune_exporter.py: new weekly JSONL export (EvidenceSnapshot × AgentSession)
- main.py: mounts knowledge_decay_loop (24h) + finetune_export_loop (7d)
- MASTER §8: records that all Phase 3 core rework items have landed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 20:57:43 +08:00
OG T
05b774386b feat(Phase 6): AI SLO REST API: GET /api/v1/ai/slo wraps up the phase
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
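ADR-087 Phase 6, the last piece of the self-governance loop:

1. api/v1/ai_slo.py: GET /api/v1/ai/slo
   - Service-layer cache first (TTL 5min, AiSloCalculator.get_cached_report)
   - force_refresh=true forces a recomputation (AiSloCalculator.run)
   - The router layer never touches Redis directly (leWOOOgo building-block hard rule)

2. main.py: mounts the ai_slo_v1.router route (prefix=/api/v1)

3. MASTER §8 Living Changelog appends:
   - Full RCA record of the 3 root causes of the P0 alert silence
   - Summary of the P2 flywheel broken-chain repair
   - Checklist of all completed Phase 6 components

Phase 6 exit conditions: 5/6 met (production verification awaits the image rollout)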

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:57:26 +08:00
OG T
7da64eaad2 feat(Phase 3): learning loop rebuild: three root-cause fixes + 2x EWMA + Evolver Agent
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 19m7s
Type Sync Check / check-type-sync (push) Failing after 1m18s
ADR-083 Phase 3 learning loop rebuild:

**Three root-cause fixes**
- approval_execution.py: fire-and-forget create_task → await asyncio.wait_for(timeout=30) × 2
  (success path L265 + failure path L353; a timeout records the learning_trigger_timeout metric and does not crash the main flow)
- models/approval.py: ApprovalRequestBase gains a matched_playbook_id field
- decision_manager.py: _auto_execute fills in matched_playbook_id when it creates the ApprovalRequest
- learning_service.py: two-path lookup for _matched_pb_id (matched_playbook_id + metadata fallback)

**2x EWMA negative reinforcement**
- models/playbook.py: added trust_score: float = 0.3 (dynamic EWMA trust field)
- repositories/playbook_repository.py: update_stats adds the EWMA
  Success: trust = 0.9 × old + 0.1 × 1.0
  Failure: trust = 0.8 × old + 0.2 × 0.0 (decays 2x as fast)
  trust < 0.1 → log warning, wait for the Evolver to archive it
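
The two update rules above reduce to one EWMA with a different smoothing factor per outcome. The sketch below uses the constants from the commit message; the function names are illustrative and are not the project's update_stats.

```python
INITIAL_TRUST = 0.3       # default trust_score from the commit
ARCHIVE_THRESHOLD = 0.1   # below this the Evolver archives the playbook


def update_trust(old: float, success: bool) -> float:
    """EWMA step: alpha 0.1 toward 1.0 on success, 0.2 toward 0.0 on failure."""
    alpha, target = (0.1, 1.0) if success else (0.2, 0.0)
    return (1 - alpha) * old + alpha * target


def failures_until_archive(trust: float = INITIAL_TRUST) -> int:
    """Count consecutive failures needed to fall below the threshold."""
    count = 0
    while trust >= ARCHIVE_THRESHOLD:
        trust = update_trust(trust, success=False)
        count += 1
    return count
```

Under these constants a fresh playbook at 0.3 falls below 0.1 after five consecutive failures (0.3 × 0.8^5 ≈ 0.098), while a single success only moves it from 0.3 to 0.37.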

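**Evolver Agent (new)**
- services/playbook_evolver.py: three functions, all static rules
  1. Low-trust archiving: trust < 0.1 → DEPRECATED
  2. Dormancy archiving: unused for 30d AND trust < 0.5 → DEPRECATED
  3. Similarity merge: symptom Jaccard > 0.9 → keep the higher-trust one, archive the lower-trust one
  AIOPS_P3_EVOLVER_ENABLED=False (off by default)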
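
The three static rules can be sketched as a single pass over the playbooks. The `Playbook` dataclass, its field names, and the choice of `>= 30` idle days are assumptions for illustration; the real playbook_evolver.py is not shown in the commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:  # hypothetical shape, not the project's model
    name: str
    trust: float
    idle_days: int
    symptoms: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def to_deprecate(playbooks: list) -> set:
    """Return the names the three rules would mark DEPRECATED."""
    # Rules 1 and 2: low trust, or dormant with mediocre trust.
    doomed = {
        p.name for p in playbooks
        if p.trust < 0.1 or (p.idle_days >= 30 and p.trust < 0.5)
    }
    # Rule 3: among the survivors, near-duplicates lose the lower-trust one.
    live = [p for p in playbooks if p.name not in doomed]
    for i, a in enumerate(live):
        for b in live[i + 1:]:
            if jaccard(a.symptoms, b.symptoms) > 0.9:
                doomed.add(min(a, b, key=lambda p: p.trust).name)
    return doomed
```

Running the merge rule only on survivors is a design choice in this sketch: a playbook that is already being archived should not cause a healthy near-duplicate to be archived as well.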

**Docs**
- ADR-083 learning loop rebuild
- MASTER §8 Phase 3 completion record

AIOPS_P3_ENABLED=False (default); the skeleton is in place, awaiting the Commander's approval to enable it

Co-Authored-By: Claude Sonnet 4.6 (APAC) <noreply@anthropic.com>
2026-04-15 14:01:37 +08:00
OG T
db9e304a14 feat(adr-080): Phase 0 guardrails established; AI autonomy flywheel kicks off
- docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
  (1456 lines, §0-§8 fully filled in: 42-cell tactical matrix, 7-Phase plan, 7 ADR summaries,
   15 KPIs, 21 Feature Flags, 10 risk scenarios)

- docs/adr/ADR-080-ai-autonomy-flywheel-overview.md
  (7-Phase structure + 4 north stars + 7 architect Review Gates + Phase exit conditions)

- apps/api/src/core/feature_flags.py
  (AIOpsFeatureFlags: P1~P6 master switches all False + 15 fine-grained sub-switches
   is_phase_enabled() / is_sub_flag_enabled() + safe bool cast)

- apps/api/src/jobs/__init__.py + baseline_snapshot.py
  (Phase 0 baseline snapshot job: MCP calls / Playbook confidence / general ratio
   / learning loop rate / auto_repair, written to aiops:baseline:latest)

- apps/api/tests/test_feature_flags.py  (21 tests, all green)

- docs/HARD_RULES.md → v1.9
  (new hard rule on Phase exit conditions: declaring a Phase complete before its exit conditions pass is forbidden)

- CLAUDE.md anti-amnesia gate 1: mandatory read of the MASTER §0 Session Resume Protocol

Gate 0 Pass — 21/21 tests green

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 12:44:53 +08:00
OG T
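50edeaa9ea docs(Phase 5): complete the categorized buttons: full solution and implementation steps
The Commander asked for "a complete solution and detailed implementation steps"; this plan is the reply.

Contents:
- Full action → MCP tool mapping table for 28 buttons (3 categories: read/write/secops)
- Work breakdown into 6 Sprints (5.0 spec → 5.1 dispatch → 5.2 read → 5.3 write → 5.4 secops → 5.5 E2E)
- Architecture design decision (callback_dispatcher registry pattern)
- Dependency and risk matrix
- 5 E2E acceptance cases
- Rollout strategy (read category ships first; observe for 24h before shipping the write category)

Estimate: 3-5 days (5.5 working days in total)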

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 20:22:03 +08:00
OG T
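d32d494320 docs: detailed four-stage implementation steps + architecture transition finalized per screenshot + anti-deviation rules
Added in spec v2.0:
- §11 Detailed four-stage implementation steps (stages 1~4 each include an acceptance checklist)
  - Stage 1: CD unblock + debounce + alertname + cold-start Playbook + KM vectorization (9 steps)
  - Stage 2: DB Migration + classify_alert_early + outcome write (5 steps)
  - Stage 3: triage station + SSH routing + TYPE-1/E/F + action parsing + risk_level (Tier3, 7 steps)
  - Stage 4: KMConversionService + manual repair records (4 steps)
- §12 Anti-deviation rules (no skipping steps / Tier3 authorization / no scope changes / report anomalies immediately)

ADR-073 update: architecture transition finalized per screenshot (old architecture interrupted → new architecture triage flywheel)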

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:30:37 +08:00
OG T
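d3ddaafcfd docs(spec): v2.2 adds §15 Subsystem 1 core flywheel repair roadmap (2026-04-12)
- Four-stage roadmap finalized (matching the screenshot): CD unblock → data integrity → routing and user experience → knowledge engine
- Unlock conditions and Tier marks for each stage
- Integrates references to ADR-073/ADR-074
- Flywheel stall statistics (trigger causes)
- Prerequisites for subsequent subsystems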

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:23:45 +08:00
OG T
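77771c16b1 docs(spec): ADR-073/074 AIOps flywheel full-repair integrated spec v1.0
Integrates a complete solution across four layers:
- Layer 1 ADR-073-A: emergency unblock (CD fix/alertname/debounce/Playbook cold start/KM vectorization)
- Layer 2 ADR-073-B: routing fixes (triage station/SSH path/action parsing/KMConversionService)
- Layer 3 ADR-074: monitoring gaps filled (flywheel health Exporter/network/DNS/Gitea CI/backup restore tests)
- Layer 4 ADR-073-C: real-time frontend flywheel (real API/WebSocket/KPI panel)

Integrated sources: ADR-073 inventory + v2.2 spec §14.11 ADR-071 work order + monitoring gap inventory + finalized flywheel screenshot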

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:21:02 +08:00
OG T
09982fdfaa docs(session6): full Telegram audit + ADR-072 bug list + spec consolidation
- LOGBOOK: Session 6 Redis DB10 audit results (8 systemic issues, graded P0-P2)
- ADR-072: AIOps loop bug-fix list (drift_interpreter/deployment_name/KM vectorization, etc.)
- Spec document v2.2: confirms Sprint A/B/C + MCP 1-4 + ADR-071 are all complete; marks ADR-072 as the next step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 20:04:50 +08:00
OG T
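fa7b763689 docs(infra): ADR-069 infrastructure rebuild plan spec v1.3: full design for Sprint A/B/C
Adds Sprint A (remove dead components, fix errors) + Sprint B (Ansible+ArgoCD GitOps) + Sprint C (Velero+rsync DR)
Full technical investigation: Sentry snuba DNS root cause, wrong Harbor port, need to Dockerize bitan, volumes inventory
Adds Section 12 (integration with the existing project) + Section 13 (document update schedule)
LOGBOOK updated; project_master_workplan adds ADR-069 Sprint A/B/C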

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 00:01:07 +08:00
OG T
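6f7a4be2c7 docs: Sprint 5.1 data safety guardrails: ADR-062/063 + plan compliance validation
- ADR-062: Data Safety Guardrails (service tiering/Pre-flight/MultiSig)
- ADR-063: Service Registry IaC design guidelines
- Sprint 5.1 plan document: compliance validation passed, issues P1-P5 corrected
  - P1: Playbooks are stored in Redis (not SQL); M-001 changed to a Pydantic model modification
  - P2: the velero_client.py name stays (consistent with the signoz_client convention)
  - P3: docker-health-monitor status clarified
  - P4/P5: DI setter + Deployment Verification added
- LOGBOOK: current focus updated to Sprint 5.1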

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 16:07:12 +08:00
OG T
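83e9d3eef8 docs(specs): four Sprint 5 technical documents: Tab spec / route mapping / component extraction / API changes
1. Tab structure spec: Tab configuration, block layout, and component reuse for each new page
2. Route mapping table: exact mapping of 26 old URLs → new locations + redirect implementation
3. Component extraction plan: steps and directory structure for extracting 17 pages into Panel components
4. API change spec: DashboardResponse +3 fields + SSE +1 event (no new APIs)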

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 16:03:58 +08:00
OG T
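bb6a57dd87 docs(plan): Sprint 5 frontend information architecture reorganization: complete solution
Covers:
- Chapter 1: full asset inventory of the existing 26 pages + 62 components
- Chapter 2: reorganization mapping table (25→6+2 navigation, zero functionality lost)
- Chapter 3: Tab structure and component integration for the 6 new pages
- Chapter 4: backward compatibility for old routes (20+ redirects)
- Chapter 5: shared Tab container component spec
- Chapter 6: new navigation Sidebar structure
- Chapter 7: interaction pattern guidelines (Tab/Drawer/Modal/Toggle)
- Chapter 8: detailed implementation steps (6 Phases, 30 Steps)
- Chapter 9: file impact list (15 added + 5 modified)
- Chapter 10: list of 8 technical documents
- Chapter 11: risk matrix
- Chapter 12: schedule estimate (~10 days, 3 delivery batches)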

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-08 16:01:38 +08:00
OG T
8788c720e4 docs(plan): Sprint 5 complete solution: detailed implementation plan integrated with the existing architecture 2026-04-08 12:22:05 +08:00
OG T
f2b3a7129f docs(plan): Sprint 5 command center redesign: complete solution and detailed implementation steps 2026-04-08 12:01:14 +08:00
OG T
246587a401 fix(web): Sprint F frontend fake-data purge: all 29 instances of fake data removed (chief architect 98/100)
P0: the three Neural Command subcomponents drop all MOCK constants and take real API props
- NeuralLiveCenter: fake history/fake KPIs/fake radar → computed live from stats/history/incidents
- NeuralStats: MOCK_HISTORY/SCHEME_STATS/PLAYBOOK_RANKINGS → useMemo aggregation
- NeuralApprovalPanel: MOCK_PENDING → real /api/v1/approvals sign-off operations

P1: 10+ fake user identities (demo-user/user-001/War Room User) → unified under the CURRENT_USER constant
P2: removed 6 Demo exports (GlobalPulseChartDemo/MOCK_APPROVAL/DEMO_DECISION_CHAIN)
P3: the /demo page is now guarded by the NEXT_PUBLIC_ENABLE_DEMO environment variable
i18n: added 22 translation keys (zh-TW + en)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 12:53:52 +08:00
OG T
e82d3802c5 docs: Sprint 4 alert disposition statistics system: full plan document + LOGBOOK update
The Sprint 4 plan has 6 Phases / 19 work items:
- Phase A: data layer (IncidentFrequencyStats + Redis counters)
- Phase B: write layer (4 trigger points: auto_repair/cold_start/human/manual)
- Phase C: API endpoint (/stats/disposition)
- Phase D: Telegram alert card statistics
- Phase E: frontend (/reports dashboard + home page + auto-repair + neural-command)
- Phase F: weekly report + docs

Chief architect review: 100% Fully Approved
Conflict check: all dependencies correct, the DAG is acyclic

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-07 11:37:21 +08:00
OG T
1a8021bfaa docs(plans): Sprint 3 SSH_COMMAND chain-of-command implementation plan (7 tasks) 2026-04-06 14:08:28 +08:00
OG T
be60ec1507 docs(plan): ADR-059 Gitea Webhook migration implementation plan (9 Tasks)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:22:29 +08:00
OG T
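5cd67d372f docs(spec): ADR-059 Gitea Webhook migration design spec
Migrates from the GitHub Webhook (Phase 13.1) to a Gitea Webhook
Minimal-change strategy: swap header constants, leave the business-logic layer untouched
Retires workflow_run CI diagnosis (the CD pipeline's TG notifications already cover it)
Incorporates the chief architect's guardrails: defensive payload parsing + Content-Type setting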

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:17:13 +08:00
OG T
0db9b41808 docs(plan): Observability + Auto-healing full implementation plan (15 Tasks, 3 Sprints)
Sprint 1 (P0): unified Prometheus alert rules + Sentry startup + CD sync
Sprint 2 (P1): SigNoz log alerts + Sentry SDK tags
Sprint 3 (P2): SSH HostRepairAgent infrastructure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 02:24:23 +08:00
OG T
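de33abe0e3 docs(spec): system-wide self-healing loop design spec v1.0
A complete solution covering three major problems:
1. Prometheus rules not deployed (13 → 40+ rules, including SentryDown/AlertChain)
2. Logs are collected but there is no log-based alerting
3. Auto-repair is limited to the K8s layer, with no Host Docker/systemd repair capability

Includes:
- Unified label convention (layer/component/team/host)
- Sprint 1: rule deployment + Sentry startup + CD sync
- Sprint 2: SigNoz log alert + Sentry integration
- Sprint 3: SSH HostRepairAgent + Playbooks
- SOP v4.0 integration update points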

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 02:14:01 +08:00
OG T
2243a21b96 fix(ai-router): v4.3 NIM protection: timeouts do not count as CB failures; every request tries NIM first before switching to Gemini
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 20s
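Requirement: the router must wait for NIM to respond before switching; NIM must not be blocked by the CB and routed to Gemini just because it is slow

Changes:
- Timeout exceptions do not accumulate CB failures (only real connection errors count)
- NIM CB: failure_threshold=10, recovery_timeout=30s (more lenient than the defaults)
- Design document v4.3: updates direction 2, removes the wrong assumption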
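
A minimal sketch of a circuit breaker that ignores timeouts, using the thresholds from the commit. The class and its methods are hypothetical, and resetting the counter once the recovery timeout elapses is a simplification of a real half-open state; the project's own CB implementation is not shown.

```python
import time


class CircuitBreaker:
    """Opens after repeated connection errors; timeouts never count."""

    def __init__(self, failure_threshold=10, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        """Return True if a call may go through."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.recovery_timeout:
            self._opened_at = None  # simplified recovery: close and reset
            self._failures = 0
            return True
        return False

    def record(self, exc=None) -> None:
        """Record the outcome of a call; exc=None means success."""
        if exc is None:
            self._failures = 0
        elif isinstance(exc, TimeoutError):
            return  # slow is not the same as down
        elif isinstance(exc, ConnectionError):
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
```

Note that on Python versions before 3.11, asyncio.TimeoutError is not a subclass of the built-in TimeoutError, so an asyncio caller there would need to check for it as well.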

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:51:12 +08:00
OG T
0c180dec86 docs(spec): direction 2 implementation correction record: Nemotron privacy_level=cloud (P0) 2026-04-04 17:42:53 +08:00
OG T
0b41df45d6 docs(plans): three-direction implementation plan P0/P1/P2
- P0: DIAGNOSE Privacy-First Routing (local chain isolation + REJECT protection)
- P1: Knowledge Auto-Harvesting (Anti-Pattern loop + Runbook generation)
- P2: Config Drift Detection (GitOps gatekeeper + Nemotron intent analysis)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:31:36 +08:00
OG T
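035cb9cd0d docs(spec): Nemotron proactive defense three-direction design document
- Direction 1: Knowledge Auto-Harvesting (Anti-Pattern loop + automatic Runbook generation)
- Direction 2: DIAGNOSE Privacy-First Routing (Local-Only Fallback Chain)
- Direction 3: Config Drift Detection (GitOps gatekeeper + Nemotron intent analysis)

Chief architect ogt: 100% technical endorsement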

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:18:11 +08:00
OG T
51961b9f03 docs: Phase O observability final completion plan design spec
SigNoz-unified architecture, resolving 6 major blind spots (Event/Log/Metrics/Descheduler/kubectl/MinIO-Kali)
+ Monitoring Master Plan Wave A-D wrap-up
+ 5 chief architect Review checkpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:45:23 +08:00
OG T
db2a2852b8 docs: frontend refactor acceptance report 87/100
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Playwright browser screenshots + KB API endpoint tests + Console analysis
- 24/24 routes with zero 404s
- 7 complete pages + 15 ComingSoon
- All 7 KB API endpoints working
- 1 Low bug (archived entry still accessible via GET)
- Metrics Strip [object Object] still to be fixed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:20:27 +08:00
OG T
25889d4b8e docs: archive the ADR-050 reanalyze implementation plan (completed)
Some checks failed
CD Pipeline (Dev) / build-and-deploy-dev (push) Failing after 9s
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:38:03 +08:00
OG T
5959855a71 feat(web): font system upgrade + NemoClaw SVG restored + Knowledge Base design document
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
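- Fonts: Syne (headings) + DM Mono (body) + VT323 (brand bitmap), replacing Inter
- Tailwind: fontFamily updated + 5-level text color tokens (primary→disabled)
- Sidebar: NemoClaw white-porcelain lobster claw SVG + AWOOOI enlarged in VT323
- OpenClaw Panel: restored the NemoClaw 3D white-porcelain lobster claw (replacing NemoNodeAnimation)
- Knowledge Base design document (B separation/A K8s Job/Phase1 skips vector search)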

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:48:42 +08:00
OG T
8845377a6d docs: update the AI Center redesign spec (deprecated components + authorization logic record)
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 22:28:32 +08:00
OG T
0b04abf990 docs(plan): add AI Center v6 redesign implementation plan (13 tasks) 2026-04-01 19:39:41 +08:00
OG T
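4b84e95723 docs: AI Center UI redesign spec document v6
- Anthropic Warmth (#f5f4ed) + OpenClaw Blue (#4A90D9) color system
- 3-column layout: Sidebar(200px) | Feed(50%) | RightPanel(50%)
- Full sidebar: 4 sections, 19 items (integrating all wooo-aiops menus)
- Event card flowchart + chibi lobster (natural orange-red #E85530)
- NemoClaw white-background node animation (screenshot style)
- Rounded corners applied throughout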

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 19:19:03 +08:00