Commit Graph

166 Commits

Author SHA1 Message Date
OG T
de6dcd181a docs(logbook): Session 結尾更新 — Backlog 清零 + 全站真實數據驗收
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 22:19:38 +08:00
OG T
41ec9efc32 docs(logbook): 更新至 2026-04-10 深夜 — ADR-067全完成 + CI B5通過 + SOUL v5.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 22:03:41 +08:00
OG T
8157d139a7 docs(logbook): 飛輪 Telegram 回饋閉環 + 心跳排除記錄 2026-04-10 15:56:58 +08:00
OG T
b52e2de968 docs(adr068): 飛輪冷啟動修復結案文件 + Skills v2.8
- ADR-068: 完整記錄五根因、四階段修復、首席架構師審查、E2E 驗收、驗證 Runbook
- LOGBOOK: 更新當前狀態,標記全閉環
- Skill 02 v2.8: 新增「自動修復飛輪六大鐵律」章節(affected_services/alert_name/Router層/Jaccard/alertname變體/Embedding雙軌)

2026-04-10 Asia/Taipei — Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:39:42 +08:00
OG T
0077eee452 docs(logbook): Phase O-6 視覺化驗收 + 全 Backlog 閉環記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 10:48:29 +08:00
OG T
5d2bf6ce18 docs(logbook): C1 修正閉環 — rag_chunk_repository 完成
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 10:46:19 +08:00
OG T
98c450d10a docs(logbook): Phase 33 架構審查+修正閉環記錄 (2026-04-10)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 10:39:53 +08:00
OG T
2065665c9b docs(adr): ADR-067 Ollama 五大本地 AI 應用實施規格
批准五大應用 Phase 30-34,依序執行:
1. Drift 報告中文摘要 (qwen2.5:7b-instruct)
2. Log 異常摘要 (deepseek-r1:14b)
3. PR 自動審查 (qwen2.5-coder:7b)
4. RAG 知識庫 pgvector (nomic-embed-text)
5. 圖片分析 (llava:latest)

pgvector 0.8.2 已確認在 prod 就緒

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:35:07 +08:00
OG T
25412807f5 docs(logbook): B1 Sensor Agent 兩台主機部署完成 2026-04-10 00:16:45 +08:00
OG T
c803e94370 docs(logbook): Sprint 5R Phase 3 閉環記錄 2026-04-10 00:04:46 +08:00
OG T
5b42bd34e6 docs(logbook): Sprint 5R Phase 2 完整閉環記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 23:24:50 +08:00
OG T
3bdac2e68e docs(logbook): Sprint 5R 架構審查+QA全驗收閉環記錄
- 首席架構師審查 9 項修復完成
- CORS/sign/host_aggregator 修復
- QA 9/9 頁面通過,無假資料

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 20:33:55 +08:00
OG T
309fe04698 docs(adr066): 批准執行閉環修復記錄 — LOGBOOK + ADR-066 + Skill 02 更新
- LOGBOOK.md: 新增 2026-04-09 批准執行閉環修復狀態區塊
- ADR-066: 記錄根本問題鏈條、決策與受影響檔案
- Skills/02: v2.7 新增 Nemotron tool→kubectl_command 回填鐵律 + 教訓

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:23:55 +08:00
OG T
2779233b25 docs: Sprint 5R 實施完成紀錄更新
- LOGBOOK: 13/14 步驟全部完成,CD 部署中
- ADR-065: 狀態更新為「實施完成」
- Skills 01 v1.8: Sprint 5R 完成記錄
- Memory: project_current_status + sprint5r_plan 已更新

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:19:57 +08:00
OG T
a39647d793 docs(logbook): 自動修復全鏈路完整閉環記錄 — 雙主機 E2E 驗證通過
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
docker-110: SentryDown → REPAIR_OK:sentry (6208ms)
docker-188: MoWoooWorkDown → REPAIR_OK:momo-app (3791ms)
20 Playbooks (8 auto-generated), repair-bot 雙主機白名單更新

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:14:17 +08:00
OG T
c180bdaaac docs: Sprint 5R 前端重構批准 — ADR-065 + 設計稿 + Skills + LOGBOOK
- ADR-065: Sprint 5R 前端重構決策(版本 A 批准)
- sprint5r-approved-design.html: 統帥批准的設計稿存檔
- Skills 01 v1.7: 品牌 Logo/AwoooI 一致性鐵律
- LOGBOOK: Sprint 5R 開始實施

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 15:15:43 +08:00
OG T
aa2eb486ce docs(logbook): 自動修復 L7 閉環記錄 — 12 Bug 全修 E2E 6208ms 成功
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:55:40 +08:00
OG T
6615432471 docs(logbook): Sprint 5.2 自動修復閉環完成記錄 2026-04-09 12:01:33 +08:00
OG T
afe52c2c70 docs(logbook): Sprint 5 全面完成 + 監控告警全部修復
- C1-C4 + I1-I5 審查修正清零
- node-exporter Docker 部署 110+188
- RedisMemoryHigh 除以零誤報修正
- Prometheus 0 firing alerts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 11:48:58 +08:00
OG T
12b084e2e0 docs(logbook): 2026-04-09 Telegram 截斷修復 + Panel 抽取全完成
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 11:21:59 +08:00
OG T
f51ef5e089 docs(logbook): 首席架構師審查 P0 修正完成記錄 2026-04-09 11:08:51 +08:00
OG T
428e66c111 fix(arch-review): 首席架構師審查 S1×3 S2×3 S3×3 全修復 + ADR-064
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
S1 Critical:
- S1-1: asyncio 觸發移至 _call_with_fallback async 上下文,移除 sync 中的 get_event_loop()
- S1-2: _append_rule_to_yaml 加 textwrap.dedent() 正規化 LLM 輸出縮排
- S1-3: _matches() 對 alertname=["*"] 直接回傳 False,防意外命中

S2 Major:
- S2-1: auto_generate_rule() 改為 DI 參數注入 (ollama_url/model/gemini_api_key),移除 import settings
- S2-4: _generate_mock_response docstring 澄清為規則引擎生產路徑,非假數據
- S2-5: suggested_action .strip() 防空白字串繞過 or

S3 Minor:
- S3-2: priority 上界 min(next, 890)
- S3-3: alertname sanitize re.sub([{}]) 防 format KeyError
- S3-4: model_registry.py 最後修改時間戳更新

文件:
- ADR-064: Alert Rule Engine YAML 驅動 + AI 自動學習
- Skills 02: 告警規則引擎 DI 規範 + asyncio 禁止事項
- Skills 03: _generate_mock_response 語意澄清 + 規則引擎降級流程
- LOGBOOK: 本次 Session 完整記錄

2026-04-09 ogt: 首席架構師審查修正

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 10:52:40 +08:00
OG T
9af281cc98 docs(logbook): Sprint 5 前端重設計完成記錄 2026-04-09 09:15:20 +08:00
OG T
0af5c2e89c docs(sprint5.1): LOGBOOK + ADR-062 + Skill 02 更新(首席架構師審查記錄)
- docs/LOGBOOK.md: 當前狀態更新至 L1-L5+審查完成,里程碑補充審查修正記錄
- docs/adr/ADR-062: 新增實施記錄章節(執行清單+審查問題+修正方式)
- .agents/skills/02-lewooogo-backend-core.md v2.5→v2.6:
    加入 Sprint 5.1 Service Registry 模式
    加入 Guardrail 保守原則(失敗 block 不放行)
    加入新 Service 標準樣板(structlog/now_taipei/DI setter/try-except)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 16:38:31 +08:00
OG T
6f7a4be2c7 docs: Sprint 5.1 資料安全護欄 — ADR-062/063 + 方案規範驗證
- ADR-062: Data Safety Guardrails (服務分級/Pre-flight/MultiSig)
- ADR-063: Service Registry IaC 設計規範
- Sprint 5.1 方案文件: 規範驗證通過,P1-P5 問題修正
  - P1: Playbook 存 Redis(非 SQL),M-001 改為 Pydantic model 修改
  - P2: velero_client.py 命名維持(與 signoz_client 慣例一致)
  - P3: docker-health-monitor 狀態釐清
  - P4/P5: DI setter + Deployment Verification 補充
- LOGBOOK: 當前焦點更新為 Sprint 5.1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 16:07:12 +08:00
OG T
f525e657ca docs: ADR-060/061 全面監控+Event Sourcing架構決策記錄
- ADR-060: 全面基礎設施監控規劃 (Plan A/B/C/D/E)
- ADR-061: Alert Operation Log Event Sourcing 架構
- LOGBOOK: 2026-04-08 里程碑記錄更新

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:44:06 +08:00
OG T
3a3f9cf70c docs(logbook): Sprint 4 全棧完成記錄 — 6 Phase / 19 工作項 2026-04-07 13:02:59 +08:00
OG T
37bddbb430 docs(logbook): Sprint 4 Phase E 前端處置統計完成記錄
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:01:22 +08:00
OG T
246587a401 fix(web): Sprint F 前端打假行動 — 29處假數據全面清除 (首席架構師 98/100)
P0: Neural Command 三個子組件移除所有 MOCK 常數,接上真實 API props
- NeuralLiveCenter: 假歷史/假KPI/假雷達 → 從 stats/history/incidents 即時計算
- NeuralStats: MOCK_HISTORY/SCHEME_STATS/PLAYBOOK_RANKINGS → useMemo 聚合
- NeuralApprovalPanel: MOCK_PENDING → 真實 /api/v1/approvals 簽核操作

P1: 10+處假用戶身份 (demo-user/user-001/War Room User) → CURRENT_USER 常數統一
P2: 刪除 6 個 Demo 匯出 (GlobalPulseChartDemo/MOCK_APPROVAL/DEMO_DECISION_CHAIN)
P3: /demo 頁面加 NEXT_PUBLIC_ENABLE_DEMO 環境變數保護
i18n: 新增 22 個翻譯鍵 (zh-TW + en)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 12:53:52 +08:00
OG T
e82d3802c5 docs: Sprint 4 告警處置統計系統 — 完整計畫文件 + LOGBOOK 更新
Sprint 4 計畫包含 6 Phase / 19 工作項:
- Phase A: 資料層 (IncidentFrequencyStats + Redis 計數器)
- Phase B: 寫入層 (4 觸發點: auto_repair/cold_start/human/manual)
- Phase C: API 端點 (/stats/disposition)
- Phase D: Telegram 告警卡片統計
- Phase E: 前端 (/reports 儀表板 + 首頁 + auto-repair + neural-command)
- Phase F: 週報 + 文件

首席架構師審查: 100% Fully Approved
衝突檢查: 所有依賴正確,DAG 無環

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-07 11:37:21 +08:00
OG T
0dec007673 docs(logbook): 記錄 Sprint 3 P0 critical security fixes 完成
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 11m37s
2026-04-07 11:10:48 +08:00
OG T
93bcfb4ce8 docs: 更新 LOGBOOK — Sprint 3 SSH_COMMAND 指揮權鏈完成 2026-04-06 14:48:11 +08:00
OG T
b9ee58f752 fix(cd): 移除 parse_mode=HTML 避免 commit message 特殊字元造成 400 (non-fatal)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 13m15s
E2E Health Check / e2e-health (push) Successful in 36s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:32:02 +08:00
OG T
e51a68d309 docs(logbook): 記錄 Telegram/CD 顯示修復 + ADR-059 全部完成 2026-04-05 14:49:10 +08:00
OG T
35d37111f0 docs(logbook): ADR-059 全計劃執行完畢 (Task 1-9)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:47:05 +08:00
OG T
d9af8e1c7a docs(logbook): ADR-059 Gitea Webhook 遷移完成記錄
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:45:02 +08:00
OG T
6937238174 docs(logbook): 記錄 Telegram 按鈕修復 + SRE 群組格式升級 2026-04-05 14:17:11 +08:00
OG T
a49faf7baa docs: ADR-058 Host Auto-Repair SSH 白名單 + LOGBOOK 更新
首席架構師 Review 結果: 72→88/100
已修正: C1 C2 C3 M3 m1 m2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:09:58 +08:00
OG T
25e2e45353 docs(logbook): Telegram 格式重設計 + 按鈕修復首席架構師 R1 通過記錄
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:08:13 +08:00
OG T
2a2a1fac8b docs(logbook): Sprint 3 Host Auto-Repair 全閉環完成記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 12:31:19 +08:00
OG T
3136fc5ea0 feat(backup): 全面自動化備份 + AWOOOI DB + GFS 延長保留
首席架構師備份審計 — 全部自動化完成:

- backup-awoooi.sh:新增 AWOOOI PostgreSQL 備份腳本
  - awoooi_prod (KB/事故/AutoRepair/Drift) + k3s_datastore
  - 從 110 SSH 到 188 執行 pg_dump,整合進 restic
  - 首次執行:680K,9s,snapshot 8750748f 

- backup-all.sh v2.0:整合第 4 個服務 AWOOOI DB

- GFS 保留策略延長:
  - 每日 7→30 份(覆蓋最近 30 天)
  - 每週 4→12 份(覆蓋最近 3 個月)
  - 每月 6→24 份(覆蓋最近 2 年)

- BACKUP-STATUS.md:更新為全自動化狀態總覽

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:11:31 +08:00
OG T
84cfdb6195 docs(backup): 備份審計完整盤點 + 新增 AWOOOI DB 與 Gitea DB 備份腳本
首席架構師備份審計結論:
- awoooi_prod PostgreSQL: 無備份 (P0 缺口)
- Gitea SQLite DB: 無備份 (今日已損壞,人工修復耗時 2h+)

新增:
- scripts/backup/backup-awoooi-db.sh (188 部署,02:00 daily)
- scripts/backup/backup-gitea-db.sh (110 部署,01:00 daily)
- docs/runbooks/BACKUP-STATUS.md (全景表 + 部署步驟 + SOP)
- LOGBOOK.md 備份審計段落

待手動部署:統帥需 scp 腳本至 188/110 並設定 crontab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:01:58 +08:00
OG T
ddb75b69c5 docs(logbook): Phase 25 Review R2 通過 + ADR-054~057 記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:25:31 +08:00
OG T
c4923b6908 docs(logbook): Phase 22.4 + Phase 25 全部驗證通過記錄
- Phase 22.4 tests 18/18 PASSED (b6e12f7)
- embed-all 7/7 prod 成功
- semantic-search E2E score=0.6867 驗證通過
- drift /scan E2E 正常回應
- drift-scanner CronJob 每小時執行
- dev/prod DB migration (symptoms_hash + enum) 完成
- 53 integration tests PASSED

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:00:33 +08:00
OG T
369413f87d docs: 更新 LOGBOOK KB Phase 2 全修完成 + 5 tests PASSED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:55:40 +08:00
OG T
69a9218723 docs: 更新 LOGBOOK KB Phase 2 + 首席架構師 Review 紀錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:49:31 +08:00
OG T
cddc4cb1fc fix(knowledge): 首席架構師 Review 修復 C1+C2+I1+I2 (71→~88/100)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m16s
C1: IKnowledgeRepository Protocol 補齊 save_embedding + semantic_search +
    list_unembedded_entries,恢復 Interface 先行保護層

C2: embed_all_entries Service 層 raw SQL 移至 Repository.list_unembedded_entries()
    Service 改透過 Protocol 呼叫,符合 leWOOOgo 積木化原則

I1: asyncio.create_task 加入 _pending_tasks set 持有引用,防 GC 回收與
    Shutdown 時 Task 遺失;task done 後自動 discard

I2: OllamaEmbeddingService 從每次 new 改為 KnowledgeService.__init__ 注入,
    單一實例重用

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:22:38 +08:00
OG T
15aabd6ac5 fix(chat+nim): 修復首席架構師 Review I1-I4 + S3 四項重要問題
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m9s
I1: chat_manager._call_openclaw timeout=30.0 → 讀 settings.OPENCLAW_TIMEOUT
I2: nvidia_provider.py stale comment "45" → "55" 對齊 ConfigMap
I3: asyncio.shield 移除 — shield 超時後 task 繼續跑但無人等待 (silent leak)
I4: ChatManager.__init__ 移除 repo 實例 (leWOOOgo 禁 Service 持有 repository)
S3: _check_nemotron_health probe 10s → 25s + /v1/models 輕量端點

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 16:36:16 +08:00
OG T
dc232ebb49 docs: LOGBOOK 更新 — KB Phase 1 + monitoring + I1/I3 完成
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 13:22:54 +08:00
OG T
0b83707697 feat(web): APM/Apps/Deployments/Tickets 頁面升級 — 串接真實 API 數據
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- apm/page.tsx: Golden Signals 真實數據 (SignOz ClickHouse)
- apps/page.tsx: 主機服務狀態 (/api/v1/dashboard 真實數據)
- deployments/page.tsx: K8s 部署狀態串接
- tickets/page.tsx: Incidents 列表串接
- i18n: apm/apps/deployments/tickets namespace 雙語補齊

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 00:08:11 +08:00