Your Name
|
7795f027d2
|
fix(aiops): persist emergency intervention traces
CD Pipeline / tests (push) Successful in 2m56s
Code Review / ai-code-review (push) Failing after 39s
CD Pipeline / build-and-deploy (push) Successful in 12m54s
CD Pipeline / post-deploy-checks (push) Successful in 4m40s
|
2026-05-01 20:34:33 +08:00 |
|
Your Name
|
8e49f2ea88
|
fix(ci): preserve ssh mcp known hosts [skip ci]
|
2026-05-01 17:18:32 +08:00 |
|
Your Name
|
433f7b068e
|
fix(aiops): close ssh and telegram remediation gaps
CD Pipeline / tests (push) Successful in 2m7s
Code Review / ai-code-review (push) Successful in 42s
CD Pipeline / build-and-deploy (push) Successful in 13m14s
CD Pipeline / post-deploy-checks (push) Successful in 4m29s
|
2026-05-01 16:53:02 +08:00 |
|
Your Name
|
b0da6da1e9
|
feat(aiops): structure agent loop shadow output
CD Pipeline / tests (push) Successful in 2m50s
Code Review / ai-code-review (push) Successful in 33s
CD Pipeline / build-and-deploy (push) Failing after 25m48s
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-01 15:09:57 +08:00 |
|
Your Name
|
f8e44971c1
|
feat(aiops): enable read-only agent loop canary
CD Pipeline / tests (push) Successful in 1m43s
Code Review / ai-code-review (push) Successful in 31s
CD Pipeline / build-and-deploy (push) Successful in 10m22s
CD Pipeline / post-deploy-checks (push) Successful in 4m3s
|
2026-05-01 14:20:16 +08:00 |
|
Your Name
|
7e4d995e4b
|
feat(aiops): add mcp agent loop foundation
CD Pipeline / tests (push) Successful in 1m59s
Code Review / ai-code-review (push) Successful in 28s
run-migration / migrate (push) Failing after 24s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-01 13:21:19 +08:00 |
|
Your Name
|
9db87f177e
|
fix(aiops): suppress repeated llm alert loops
CD Pipeline / tests (push) Successful in 1m37s
Code Review / ai-code-review (push) Successful in 28s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-01 13:02:07 +08:00 |
|
Your Name
|
11673d80ea
|
fix(aiops): route backup decisions through ssh
CD Pipeline / tests (push) Successful in 1m35s
Code Review / ai-code-review (push) Successful in 34s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-01 12:50:01 +08:00 |
|
Your Name
|
7cd53c0228
|
fix(monitoring): switch memory alerts to working_set to stop page-cache false alarms
- alerts-unified.yml:
- SentryClickHouseMemoryPressure: usage_bytes → working_set_bytes, 0.8 → 0.85
- GiteaMemoryPressure: same fix (same root cause: usage inflated by page cache)
- ops/monitoring/tests/clickhouse_memory_test.yml: promtool 4 cases
- 04-awoooi-devops-commander.md v2.8: Prometheus metric selection guidelines + Gitea HMAC webhook guidelines
- LOGBOOK: record the five parallel T0 tasks (A buttons / B ClickHouse / C Gitea webhook / D ElephantAlpha / F Code review)
Evidence: 2026-04-23 23:13 sentry-clickhouse usage_bytes=88.5% vs working_set=7.8%
Root cause: container_memory_usage_bytes includes the OS page cache, which the OOM killer does not treat as memory pressure
Fix: switch to working_set_bytes (RSS + active cache), the metric K8s/cadvisor recognize, with a 0.85 threshold
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-26 20:16:12 +08:00 |
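The threshold change in commit 7cd53c0228 can be illustrated with a small Python sketch. It is not the alert rule itself (that lives in alerts-unified.yml); `MemorySample`, `memory_pressure`, and the 1 GiB limit are hypothetical names and values chosen for illustration, and only the two ratios and the 0.8/0.85 thresholds come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    usage_bytes: float        # includes OS page cache
    working_set_bytes: float  # the metric the alert was switched to
    limit_bytes: float


def memory_pressure(sample: MemorySample, threshold: float = 0.85) -> bool:
    """Return True when the working set exceeds `threshold` of the limit."""
    return sample.working_set_bytes / sample.limit_bytes > threshold


# Ratios from the commit message; the 1 GiB limit is an assumed value.
limit = float(1024 ** 3)
sample = MemorySample(
    usage_bytes=0.885 * limit,
    working_set_bytes=0.078 * limit,
    limit_bytes=limit,
)

old_alert = sample.usage_bytes / sample.limit_bytes > 0.8  # previous rule
new_alert = memory_pressure(sample)                         # rule after the fix
```

With the figures recorded on 2026-04-23, the old usage-based rule fires while the working-set rule stays quiet, which matches the false alarm the commit describes.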
|
Your Name
|
7d1c85eb86
|
fix(hermes): inject ANTHROPIC_API_KEY + solver confidence fix A + 12-Agent governance docs
CD Pipeline / build-and-deploy (push) Has been cancelled
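- nl_gateway.py: ClaudeAgentOptions injects ANTHROPIC_API_KEY via env= (aliased from CLAUDE_API_KEY),
fixing the SDK failing to find an API key (the SDK reads ANTHROPIC_API_KEY, but the K8s secret is named CLAUDE_API_KEY)
- solver_agent.py: fix A: the kubectl_command field takes the priority path, so when OpenClaw Nemo returns a complete command,
semantic synthesis no longer compresses confidence (the 0.9 → min(0.5) bug); 9 tests pass
- AGENTS.md: Codex CLI counterpart of CLAUDE.md (used at Codex session startup)
- docs/12-agent-game-rules.md: 12-Agent task classification + lead/collaborator assignment + mapping to the 9 skills (v1.0)
- .agents/skills/06-awoooi-monorepo-master.md: v1.6, adds a 12-agent collaboration governance section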
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 02:33:43 +08:00 |
|
Your Name
|
712d146129
|
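docs(adr+skills): ADR-092 AI Decision LLM layer + Skill 03 update with the unified LLM pattern
Full documentation following the chief architect's 2026-04-19 review (92/100, Grade A):
**ADR-092 created (AI Decision LLM extension architecture)**:
- Background: 8 of the 14 scanners are purely threshold-based, violating feedback_ai_autonomous_direction
- Decision: 4 LLM services + a unified pattern (llm_json_parser)
- Constraints, 5 iron rules: never raise on failure / AI only suggests, never acts / openclaw as the single entry point /
leave a trace in aol / Traditional Chinese + JSON schema
- Throttling: daily cron + event trigger (red_ratio>30% and scanned>=50)
- autonomy_score 0-100 for quantitative tracking
- Implementation results + remaining P1 items + rollback plan
**Skill 03 openclaw-cognitive-expert updated**:
- Adds a "2026-04-19 AI Decision LLM extension layer" section
- Pattern code template (instead of rewriting the 3-path parse every time)
- Mapping table of the 4 LLM services + required_key
- Adds the 5 iron rules checklist
- Usage notes for autonomy_score tracking
When Claude picks up the next session, it can find the LLM service pattern quickly and avoid reinventing the wheel.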
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-19 22:42:58 +08:00 |
|
OG T
|
99cc420429
|
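docs(review): after the chief architect's code review: complete the ADR-064/067 + Skill 02 records
ADR-064: add the I1 integration record (get_incident_type three-tier fallback; design decision that rule.id ≠ incident_type)
ADR-067: add the D1 centralization completion record (mapping table for the 9 purpose keys)
Skill 02: add get_incident_type usage guidelines + the Ollama D1 model-centralization prohibition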
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:35:25 +08:00 |
|
OG T
|
a1432c03ed
|
docs: ADR-070/071 + ssh-mcp-setup runbook + Skill-04 v2.7
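- ADR-070: decision record for the fully automated AIOps closed loop, MCP Phase 1-4
- ADR-071: decision record for the four alert notification types + the KM three-stage data closed loop
- docs/runbooks/ssh-mcp-setup.md: SOP for SSH MCP setup/verification/rotation
- Skill-04: v2.7 adds the full record of Sprint C DR + the 10 ADR-070 MCP providers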
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:04:47 +08:00 |
|
OG T
|
a63c586d9a
|
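docs: LOGBOOK + Skill04 update: Sprint B-1 + Architecture Review records
- LOGBOOK: add the Sprint B-1 completion entry + the architecture review fix list
- Skill04 v2.6: add the Ansible IaC directory layout + SSH MCP security rules
Records the results of carrying out the chief architect's 2026-04-11 architecture review directives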
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 02:52:29 +08:00 |
|
OG T
|
b52e2de968
|
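docs(adr068): closing record for the flywheel cold-start fix + Skills v2.8
- ADR-068: full record of the five root causes, the four-phase fix, the chief architect review, E2E acceptance, and the verification runbook
- LOGBOOK: update current status, mark the loop fully closed
- Skill 02 v2.8: adds the "Six iron rules of the auto-remediation flywheel" section (affected_services/alert_name/Router layer/Jaccard/alertname variants/dual-track embedding)
2026-04-10 Asia/Taipei, Claude Sonnet 4.6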
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-10 11:39:42 +08:00 |
|
OG T
|
309fe04698
|
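docs(adr066): record of the approve-and-execute closed-loop fix: LOGBOOK + ADR-066 + Skill 02 updates
- LOGBOOK.md: add the 2026-04-09 approve-and-execute closed-loop fix status block
- ADR-066: record the root problem chain, the decisions, and the affected files
- Skills/02: v2.7 adds the Nemotron tool→kubectl_command backfill iron rule + lessons learned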
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-09 18:23:55 +08:00 |
|
OG T
|
c01026be9b
|
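docs(skills+adr): end-to-end auto-remediation knowledge update: ADR-058 Appendix A + Skills v2.5
ADR-058: complete the 188 whitelist + Appendix A (12 bug-fix records + E2E verification + Playbook coverage matrix)
Skill-04 DevOps v2.5: SSH auto-remediation architecture section (whitelist/SOP/pitfalls)
Skill-05 SRE: auto-remediation E2E acceptance guidelines + diagnostic table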
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-09 18:21:24 +08:00 |
|
OG T
|
2779233b25
|
docs: update the Sprint 5R implementation-complete record
- LOGBOOK: 13/14 steps completed, CD deployment in progress
- ADR-065: status updated to "Implemented"
- Skills 01 v1.8: Sprint 5R completion record
- Memory: project_current_status + sprint5r_plan updated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-09 18:19:57 +08:00 |
|
OG T
|
c180bdaaac
|
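docs: Sprint 5R frontend refactor approved: ADR-065 + design mockup + Skills + LOGBOOK
- ADR-065: Sprint 5R frontend refactor decision (version A approved)
- sprint5r-approved-design.html: archived copy of the design approved by the commander
- Skills 01 v1.7: iron rule on brand Logo/AwoooI consistency
- LOGBOOK: Sprint 5R implementation started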
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-09 15:15:43 +08:00 |
|
OG T
|
428e66c111
|
fix(arch-review): fix all chief architect review findings (S1×3, S2×3, S3×3) + ADR-064
CD Pipeline / build-and-deploy (push) Has been cancelled
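S1 Critical:
- S1-1: move the asyncio trigger into the _call_with_fallback async context; remove get_event_loop() from sync code
- S1-2: _append_rule_to_yaml now applies textwrap.dedent() to normalize the indentation of LLM output
- S1-3: _matches() returns False outright for alertname=["*"] to prevent accidental matches
S2 Major:
- S2-1: auto_generate_rule() now takes DI-injected parameters (ollama_url/model/gemini_api_key); import settings removed
- S2-4: _generate_mock_response docstring clarified: it is the rule engine's production path, not fake data
- S2-5: suggested_action gets .strip() so a whitespace-only string cannot bypass the `or` fallback
S3 Minor:
- S3-2: priority upper bound min(next, 890)
- S3-3: sanitize alertname with re.sub([{}]) to prevent a format KeyError
- S3-4: model_registry.py last-modified timestamp updated
Docs:
- ADR-064: YAML-driven Alert Rule Engine + AI auto-learning
- Skills 02: alert rule engine DI guidelines + asyncio prohibitions
- Skills 03: _generate_mock_response semantics clarified + rule engine fallback flow
- LOGBOOK: full record of this session
2026-04-09 ogt: chief architect review fixes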
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-09 10:52:40 +08:00 |
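Two of the fixes in commit 428e66c111 (S1-2 and S2-5) lend themselves to a short Python sketch. The function names `normalize_rule_snippet` and `pick_action` and the sample rule are hypothetical; only the use of `textwrap.dedent()` and `.strip()` before an `or` fallback comes from the commit message, and the project's real `_append_rule_to_yaml` may differ.

```python
import textwrap
from typing import Optional


def normalize_rule_snippet(llm_output: str) -> str:
    """Remove the indentation shared by every line of an LLM-generated YAML snippet."""
    return textwrap.dedent(llm_output).strip("\n") + "\n"


def pick_action(suggested_action: Optional[str], default: str) -> str:
    """Fall back to `default` when the suggestion is missing or whitespace-only."""
    return (suggested_action or "").strip() or default


# Hypothetical LLM output, indented as if copied out of a nested block.
raw = "    - id: demo_rule\n      priority: 100\n"
normalized = normalize_rule_snippet(raw)
```

Without the `.strip()`, a value such as `"   "` is truthy and would slip past the `or` fallback, which is the bypass S2-5 describes.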
|
OG T
|
0af5c2e89c
|
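docs(sprint5.1): LOGBOOK + ADR-062 + Skill 02 updates (chief architect review record)
- docs/LOGBOOK.md: current status updated to L1-L5 + review complete; milestones now include the review fix record
- docs/adr/ADR-062: adds an implementation record section (execution checklist + review findings + fixes)
- .agents/skills/02-lewooogo-backend-core.md v2.5→v2.6:
adds the Sprint 5.1 Service Registry pattern
adds the conservative Guardrail principle (on failure, block rather than allow)
adds the standard template for new Services (structlog/now_taipei/DI setter/try-except)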
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 16:38:31 +08:00 |
|
OG T
|
22de22c989
|
refactor(phase-s): Phase S tech-debt cleanup - five architecture improvements
S-01: generate_alert_fingerprint() moved to alert_analyzer_service (Router→Service)
S-02: remove the deprecated USE_NEW_ENGINE config (it served its purpose in Phase R)
S-03: github_webhook.py linter cleanup (Field unused + delivery_id noqa)
S-04: Pydantic v2 migration - approval/incident models (class Config → ConfigDict)
S-05: Skill 09 v1.1 update (USE_NEW_ENGINE deprecation note)
Tests: 393 passed, zero failures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-01 13:12:02 +08:00 |
|
OG T
|
cd91560e0b
|
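docs: Phase R-R2 completion doc updates + ADR-046 type unification
- ADR-024: update execution progress (R1✅ R2✅ R3✅ R4 pending)
- ADR-046: adds cross-package Incident type unification governance (decision pending)
Recommended: Option B, an IncidentConverter conversion layer
- Skill 02: v2.5 records Phase R-R2 + R-R2.1 + ADR-046
- LOGBOOK: update current status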
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-31 22:17:44 +08:00 |
|
OG T
|
b14d1110fd
|
docs(skill): Skill 02 v2.4 - Phase 22 passed chief architect review
E2E Health Check / e2e-health (push) Successful in 18s
Changelog update: mock-violation fixes + layering fixes all complete
2026-03-31 Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 19:01:36 +08:00 |
|
OG T
|
dd526684ab
|
feat(ai): Phase 22 OpenClaw + Nemotron collaboration architecture (ADR-044)
E2E Health Check / e2e-health (push) Successful in 17s
Commander-approved implementation of the "arbiter/executor split" architecture:
- OpenClaw = arbiter (Why + Risk Level)
- Nemotron = executor (How + kubectl Command)
New features:
- config.py: ENABLE_NEMOTRON_COLLABORATION Feature Flag
- openclaw.py: generate_incident_proposal_with_tools()
- openclaw.py: _call_nemotron_tools() Nemotron call
- telegram_gateway.py: TelegramMessage Nemotron fields
- telegram_gateway.py: format_with_nemotron() two-block format
- decision_manager.py: integrates the collaboration method
- proposal_service.py: integrates the collaboration method
Trigger conditions:
- LOW risk → OpenClaw only
- MEDIUM/HIGH/CRITICAL → OpenClaw + Nemotron dual track
Chief architect review: 83/100, conditional pass
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 18:52:53 +08:00 |
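The trigger conditions listed in commit dd526684ab can be sketched as a small routing function. This is an illustrative Python sketch only: `RiskLevel`, `engines_for`, and the engine labels are hypothetical names, and treating a disabled ENABLE_NEMOTRON_COLLABORATION flag as "OpenClaw only" is an assumption, not something the commit message states.

```python
from enum import Enum
from typing import Tuple


class RiskLevel(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


def engines_for(risk: RiskLevel, collaboration_enabled: bool = True) -> Tuple[str, ...]:
    """Pick which engines handle an incident of the given risk level."""
    # Assumption: with the feature flag off, every incident stays on OpenClaw alone.
    if not collaboration_enabled or risk is RiskLevel.LOW:
        return ("openclaw",)
    # MEDIUM/HIGH/CRITICAL go through both tracks.
    return ("openclaw", "nemotron")
```

Keeping the decision in one pure function makes the LOW versus MEDIUM-and-above split easy to unit test independently of either model call.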
|
OG T
|
9aa76ecf75
|
docs: update Skills + LOGBOOK (K0 complete + OTEL gRPC/HTTP distinction)
E2E Health Check / e2e-health (push) Successful in 15s
Skills v2.4:
- OTEL gRPC (24317) vs HTTP (24318) endpoints clearly distinguished
- K8s API uses gRPC, CI/CD uses HTTP
LOGBOOK:
- K0.1/3/4/6/7 low-risk items complete
- #33 Sentry + OTEL fix
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 12:23:10 +08:00 |
|
OG T
|
bcd33e854f
|
docs: ADR-042 frontend performance optimization patterns (DOM Bypass + Optimistic Updates)
E2E Health Check / e2e-health (push) Successful in 16s
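Adds ADR-042:
- Pattern 1: DOM Bypass (bypasses React rendering, 100x performance gain)
- Pattern 2: Optimistic Updates (0 ms UI latency + rollback on failure)
- Pattern 3: SSE Incremental Updates (incremental updates, fewer API requests)
- Pattern 4: AbortController (prevents memory leaks)
Updates Skills 01:
- Version bump to v1.6
- Adds a performance optimization patterns section
- References ADR-042
Chief architect review: 96-98/100 OUTSTANDING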
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 11:36:21 +08:00 |
|
OG T
|
f25e94e8c4
|
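fix(web): #17 i18n hydration guard (NEXT_LOCALE Cookie)
Phase D #17: fix the hydration crash on i18n locale switch
Problem: a locale mismatch between client and server rendering causes a Hydration Mismatch
Fix: the middleware force-binds the NEXT_LOCALE cookie
Implementation:
- Extract the current locale from the URL path
- Force-set the NEXT_LOCALE cookie (1-year TTL)
- Ensure server and client locales match
@see QA Report, section 3.1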
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 11:18:53 +08:00 |
|
OG T
|
b31e079e41
|
docs: update LOGBOOK - Phase A/B/C P1 complete (97/100)
CD Pipeline / build-and-deploy (push) Successful in 3m42s
E2E Health Check / e2e-health (push) Has been cancelled
- LOGBOOK: Phase A/B/C chief architect review: OUTSTANDING
- Skills: DevOps Commander updated
- ADR-033: K3s HA architecture additions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-31 11:03:10 +08:00 |
|
OG T
|
5a3f539fe5
|
docs: full update of Memory/Skills/LOGBOOK
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-03-30 ogt: status sync after the chief architect review (94/100)
Updated items:
- project_current_status.md: overview of today's completed work
- LOGBOOK.md: sudoers NOPASSWD fix
- feedback_ai_fallback_order.md: NVIDIA priority order
- feedback_cd_security_nopasswd.md: adds a security iron rule
- MEMORY.md: adds index entries
- 02-lewooogo-backend-core.md v2.3: AI Fallback section
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-30 01:42:08 +08:00 |
|
OG T
|
bf3a21d88e
|
docs: chief architect review 94/100 OUTSTANDING
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- Skills v2.2: adds the Phase 19.4 API integration pattern
- ADR-030: adds §5.3 Playbook automatic state-transition thresholds
- LOGBOOK: update review results
Review scope: 18 commits (Phase 19.4 + ADR-039 + AI arbitration)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-30 01:38:41 +08:00 |
|
OG T
|
4f06115497
|
docs: chief architect review - ban on internal IPs in the frontend (90/100)
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
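Review results:
- P0 security fix: sudo password now comes from a secret ✅
- P1 identified: Sentry DSN build-arg still pending
- P2 identified: 3 minor issues recorded
Updated:
- Skills 01 v1.5: internal IPs forbidden in frontend builds
- Skills 04 v2.1: CD security guidelines + internal IP ban
- ADR-022: adds a section on the frontend internal IP ban
- MEMORY.md: adds a review record index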
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-30 01:32:48 +08:00 |
|
OG T
|
f87c30b1c7
|
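docs(skills): add the ADR-038/039 OpenClaw safety net section
Skill 02 updates after the Wave 1 deployment:
- Circuit Breaker two-layer protection pattern (Layer 1 circuit breaking + Layer 2 rate limiting)
- Global remediation cooldown (15 min / 5 times → freeze)
- StatefulSet blacklist (auto-remediation forbidden for postgres/redis/clickhouse)
- Worker XCLAIM configuration for reclaiming orphaned messages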
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-29 16:12:47 +08:00 |
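The global cooldown and StatefulSet blacklist named in commit f87c30b1c7 could look roughly like the following Python sketch. `RemediationGuard` is a hypothetical class, and the reading of "15 min / 5 times → freeze" as "a sixth attempt inside a 15-minute sliding window freezes remediation" is an assumption; the commit message does not spell out the exact semantics or how a freeze is lifted.

```python
from collections import deque
from typing import Deque

# Workloads named in the commit message as excluded from auto-remediation.
STATEFULSET_BLACKLIST = frozenset({"postgres", "redis", "clickhouse"})


class RemediationGuard:
    """Sliding-window cooldown shared by all auto-remediation attempts."""

    def __init__(self, window_s: float = 15 * 60, max_attempts: int = 5) -> None:
        self.window_s = window_s
        self.max_attempts = max_attempts
        self.frozen = False
        self._attempts: Deque[float] = deque()

    def allow(self, workload: str, now: float) -> bool:
        """Return True if a remediation on `workload` may run at time `now` (seconds)."""
        if self.frozen or workload in STATEFULSET_BLACKLIST:
            return False
        # Drop attempts that have fallen out of the window.
        while self._attempts and now - self._attempts[0] >= self.window_s:
            self._attempts.popleft()
        if len(self._attempts) >= self.max_attempts:
            self.frozen = True  # assumed to stay frozen until a human resets it
            return False
        self._attempts.append(now)
        return True
```

Passing `now` explicitly keeps the guard deterministic in tests; a caller would typically supply `time.monotonic()`.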
|
OG T
|
3bfb9c51f5
|
chore: Skills + CLAUDE.md + Playwright config updates
- SRE-QA Skills expanded
- CLAUDE.md guidance updated
- playwright.config.ts tuned
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-29 16:04:43 +08:00 |
|
OG T
|
50c055b547
|
feat(api): Phase D-G P0 fix - turn the Learning Repository into a building block
Added:
- ILearningRepository Protocol (interfaces.py)
- LearningRepository (Redis persistence layer)
- Learning API endpoints (/api/v1/learning/*)
- LearningService.get_recommended_fix() method
- LearningService.get_learning_summary() method
Fixed:
- The Service no longer depends on the Redis client directly (it goes through the Repository)
- Conforms to the leWOOOgo building-block principle
- Chief architect review: 74/100 → 92/100
Updated:
- ADR-030: adds a Phase D-G P0 fix section
- Skill 02: v1.9 → v2.0
- Runner fix: serialized builds resolve the _runner_file_commands conflict
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-29 11:03:51 +08:00 |
|
OG T
|
4f7282a97a
|
fix(ai): Phase 20 P2 fixes - Protocol + boundary tests + model_registry
P2-1: define the INvidiaProvider Protocol (@runtime_checkable)
P2-2: expand boundary tests from 15 to 25 cases
P2-3: model_registry adds NVIDIA + tool_calling_fallback_order
Chief architect score: 82 → 86 → 90/100
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-29 01:24:17 +08:00 |
|
OG T
|
ee2bceefff
|
feat(monitoring): Phase 19.6 test docs + P1-P3 improvements + chief architect review
Phase 19.6 test documentation wrap-up:
- E2E tests expanded to 18 (Terminal/GenUI verification)
- Adds PHASE19-VERIFICATION-CHECKLIST.md (full verification checklist)
P1 verification:
- ArgoCD Metrics NodePort monitoring (30883/30884)
- TLS certificate monitoring (Blackbox Exporter 9115)
P2 improvements:
- waitForTimeout → waitForLoadState('networkidle')
- Cross-platform shortcuts (Meta+J / Control+J)
- Controlled by the SKIP_MULTISIG_TESTS environment variable
- Prometheus GitOps deployment script
P3 improvements:
- HPA maxReplicas 4 → 6 (API/Web)
Chief architect review: 47/50 OUTSTANDING (94%)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-29 01:19:26 +08:00 |
|
OG T
|
6a38c0c968
|
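fix(cd): ADR-035 three-layer protection for automatic Telegram Secrets injection
🔴 Incident root cause: K8s Secrets were never injected, so Telegram alerts were down for an extended period
- kustomization.yaml says "handled by CI/CD", but CD never did it
🛡️ Three-layer protection:
- Layer 1: pre-flight check that the GitHub Secrets exist
- Layer 2: automatic injection via kubectl patch secret at deploy time
- Layer 3: post-deploy E2E test verifies alerting
📄 Doc updates:
- ADR-035: docs/adr/ADR-035-telegram-alert-chain-enforcement.md
- DevOps Skill v1.9: adds the Secrets injection iron rule
- CLAUDE.md: adds an alert chain section
- LOGBOOK: incident record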
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-28 21:47:49 +08:00 |
|
OG T
|
9fa996c9fe
|
fix(cicd): fix OTEL endpoint config 192.168.0.121→188
Problem: the CI/CD workflows pointed at the wrong OTEL endpoint
- ci.yaml: 121:4318 → 188:24318
- cd.yaml: 121:4318 → 188:24318
SignOz actually runs on 192.168.0.188 (AI+Web hub)
Updated:
- Skill 04 v1.8 adds observability endpoint guidelines
- LOGBOOK records the config fix
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-28 18:47:23 +08:00 |
|
OG T
|
7b9b0c490b
|
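feat(phase19): Omni-Terminal 100% complete + chief architect review 47/50
## Phase 19 Omni-Terminal (Waves 0-6 all complete)
### Core features
- SSE state machine (7-state design, scored 10/10)
- GenUI dynamic rendering (6 cards + Zod Schema validation)
- Nuclear-key UX (long-press authorization + risk tiers)
- Terminal Telemetry (Sentry integration)
### P0-P2 fixes
- P0: Singleton → FastAPI Depends dependency injection
- P1: Zod Schema upgrade (7 validation schemas)
- P1: error classification code aggregation (Sentry fingerprint)
- P2: Slow Query monitoring (5s warning / 10s critical)
### Tests
- test_terminal_service.py: all 54 tests pass
- Intent classification: 42 test cases (9 IntentType values)
### Docs
- ADR-031: SSE architecture implementation record
- ADR-032: GenUI rendering implementation record
- Skills: v1.9 (backend Terminal section)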
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-28 18:04:12 +08:00 |
|
OG T
|
3e5315aaf8
|
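docs(k3s): chief architect review complete 46/50 (92%)
Review of the K3s optimization work complete:
- ADR-033: Phase K0 + K-NET marked complete
- 09-pdb.yaml: comments explaining the Worker PDB design
- DevOps Skill: adds a keepalived quick reference
Review results:
- Architecture compliance: 9/10
- Runbook completeness: 10/10 ⭐
- Modularity compliance: 9/10
- Risk control: 9/10
- Documentation completeness: 9/10
P2 issues fixed; no P0/P1 blockers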
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-28 18:00:07 +08:00 |
|
OG T
|
abc21c735e
|
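feat(api): P1 Telegram button improvements - snooze/silence
New buttons:
- ⏰ Later (snooze): remind again after 30 minutes
- 🔕 Silence 1h (silence): silence alerts for the same resource for 1 hour
Implementation details:
- telegram_gateway.py: adds _handle_snooze/_handle_silence
- decision_manager.py: checks the silence state before sending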
- Redis Key: telegram_snooze:{approval_id}, telegram_silence:{resource_name}
- Skill 03 v1.5 → v1.6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-27 09:50:28 +08:00 |
|
OG T
|
4ee5376bd1
|
docs: alerting optimization plan + ADR-030 Phase 6 + Skill 03 v1.5
- LOGBOOK: adds the full alerting review record
- ADR-030: adds the Phase 6 async analysis optimization section
- Skill 03: v1.5 unified Stream Key + Telegram deduplication
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-27 09:42:53 +08:00 |
|
OG T
|
309a019cc3
|
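docs: record the fix for the Telegram alert flood incident
Updated:
- ADR-027: adds an emergency incident fix section
- LOGBOOK: records the 2026-03-26 incident timeline
- Skill 02 v1.6: adds a Telegram deduplication section
Root cause: Phase 6.5 changes + duplicated INC- prefix
Fix: Redis deduplication (10 minutes) + prefix check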
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 20:13:07 +08:00 |
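The two-part fix recorded in commit 309a019cc3 (a 10-minute deduplication window plus an INC- prefix check) can be sketched in Python as follows. `TelegramDeduper` and `ensure_inc_prefix` are hypothetical names, and an in-memory dict stands in for Redis so the sketch runs on its own; the production code presumably uses a Redis key with a TTL instead.

```python
import time
from typing import Callable, Dict


class TelegramDeduper:
    """In-memory stand-in for a Redis key with a 10-minute TTL."""

    def __init__(self, ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        """Return True the first time `key` is seen within the TTL window."""
        now = self._clock()
        expiry = self._expiry.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window: suppress
        self._expiry[key] = now + self._ttl_s
        return True


def ensure_inc_prefix(incident_id: str) -> str:
    """Add the INC- prefix only when it is not already present."""
    return incident_id if incident_id.startswith("INC-") else f"INC-{incident_id}"
```

Normalizing the ID before using it as the deduplication key matters here: if `INC-42` and `INC-INC-42` were treated as different keys, the same incident could still be sent twice.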
|
OG T
|
a9f8ad56c1
|
chore: tidy up uncommitted changes (API core + docs + scripts)
API core:
- constants.py: system constants
- unit_of_work.py: Unit of Work pattern
- incident_approval_service.py: Incident-Approval sync service
Doc updates:
- LOGBOOK.md: progress update
- AWOOOI_AGENTIC_WORKSPACE_ROADMAP.md: roadmap
- 2026-03-26_llm_testing_evaluation.md: LLM testing evaluation
- phase5_telemetry_architecture.md: telemetry architecture
- SECRETS_REFERENCE.md: secrets reference
Config/scripts:
- Skill 02 v1.x: leWOOOgo backend update
- .dependency-cruiser.cjs: dependency rules
- demo-multisig-flow.sh: demo script
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 19:10:12 +08:00 |
|
OG T
|
30145c7d7e
|
docs: ADR-025 CI/CD AI integration architecture + Skill 07 update
- ADR-025: documents the Phase 13.1 CI/CD AI integration architecture decision
- GitHub Webhook event-driven flow
- Risk-tiered execution decisions (AUTO/TELEGRAM/APPROVAL/BLOCKED)
- SignOz Log integration
- Skill 07 v1.3: adds Grafana MCP + SignOz query_logs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 15:41:26 +08:00 |
|
OG T
|
14c81f728f
|
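docs: add ADR-025 alert chain E2E verification + update Skills
Added:
- ADR-025: alert chain E2E verification architecture (lessons from the 2026-03-26 incident)
Updated:
- ADR-011: adds DNS rule best practices (Appendix B)
- Skill 04: adds NetworkPolicy DNS rules + CoreDNS configuration
- Skill 05: adds the alert chain Smoke Test requirement
- CLAUDE.md: adds alert chain verification to the pre-task required reading
Incident root causes:
1. Wrong URL path (webhook vs webhooks)
2. NetworkPolicy DNS rule label mismatch
3. CoreDNS upstream DNS depends on systemd-resolved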
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 15:34:12 +08:00 |
|
OG T
|
579da38b8b
|
feat(api): Phase 13 智能路由 + CI/CD 整合 (#74-88)
Phase 13.1 CI/CD Integration:
- #76 workflow_run handler for CI failure diagnosis
- #77 SignOz log query (query_logs, error_logs_summary MCP)
- #78 CIAutoRepairService with risk-based execution decisions
Phase 13.3 Smart Routing:
- #85 Intent Classifier v2.0 (rule engine + LLM fallback)
- #86 Complexity Scorer (9-dimension scoring)
- #87 AI Router v3.0 (routing decision matrix)
- #88 Token Counter (OTEL + Langfuse integration)
New files:
- services/ci_auto_repair.py (risk stratification)
- services/model_registry.py (centralized model config)
- services/token_counter.py (677 lines)
- Skill 08: Model Router Expert
- Skill 09: Strangler Pattern Expert
- ADR-023: Smart Routing Architecture
- ADR-024: API Layer Architecture
Tests:
- phase11-conversational.spec.ts (E2E tests)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 15:32:52 +08:00 |
|
OG T
|
96c3ddd8c4
|
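feat(api): Phase 18.1 K8s resource name validation (ADR-016)
A three-layer defense ensures kubectl commands are valid:
1. Normalization at the webhook entry point (webhooks.py)
2. Validation before OpenClaw generates a command (openclaw.py)
3. Static mapping table + fuzzy matching (k8s_naming.py, resource_resolver.py)
Added:
- src/utils/k8s_naming.py: RFC 1123 normalization + static mapping
- src/services/resource_resolver.py: dynamic validation via the MCP K8s Tool
- docs/adr/ADR-016-k8s-resource-naming.md: contract document
- scripts/e2e_tool_call_verification.py: E2E verification script v2.0
Changed:
- webhooks.py: Phase 18.1.7 entry-point normalization
- openclaw.py: Phase 18.1.6 validation before command generation
- Skill 03 v1.4: adds a K8s resource validation section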
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 11:22:47 +08:00 |
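The "RFC 1123 normalization" that commit 96c3ddd8c4 attributes to src/utils/k8s_naming.py might resemble the Python sketch below. `to_rfc1123_label` is a hypothetical function written for illustration, not the project's code; it targets the RFC 1123 label form that Kubernetes applies to many resource names (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters).

```python
import re

MAX_LABEL_LENGTH = 63  # RFC 1123 label length limit


def to_rfc1123_label(name: str) -> str:
    """Normalize an arbitrary string into an RFC 1123 label."""
    label = name.lower()
    # Collapse every run of disallowed characters into a single hyphen.
    label = re.sub(r"[^a-z0-9-]+", "-", label)
    # Trim leading/trailing hyphens, truncate, then trim again in case the cut ended on a hyphen.
    label = label.strip("-")[:MAX_LABEL_LENGTH].rstrip("-")
    if not label:
        raise ValueError(f"cannot derive an RFC 1123 label from {name!r}")
    return label
```

For example, `to_rfc1123_label("Sentry_ClickHouse.Prod")` yields `sentry-clickhouse-prod`. Raising on an empty result, rather than returning a placeholder, keeps an unusable alert label from silently turning into a kubectl target.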
|
OG T
|
604e38cf07
|
docs: Phase 14 red-zone governance + Skills 01/03 updates
- CLAUDE.md: red-zone governance section
- Skills 01/03: version bumps
- ADR/Architecture: standardization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 09:55:47 +08:00 |
|