Commit Graph

699 Commits

Author SHA1 Message Date
OG T
e93a50a4b4 feat(pages): 全部 ComingSoon 頁面升級為真實 UI — 串接真實 API / 空狀態頁面
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m47s
- services/topology: 串接 /api/v1/dashboard,顯示服務清單表格與主機拓撲卡片 grid
- notifications: 串接 /api/v1/notifications/channels,404 時顯示空列表
- reports: 串接 /api/v1/stats/incident-summary + /api/v1/stats/resolution-stats,顯示統計卡片
- apm: 乾淨空狀態頁(SignOz 待整合)
- apps/tickets/users/deployments: 空列表表格結構
- billing/compliance/cost/security: 空狀態卡片結構
- help: 靜態系統版本資訊頁
- zh-TW.json + en.json: 新增所有頁面 i18n key(零 hardcode 字串)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:49:24 +08:00
OG T
6266a4fc01 fix(test): 更新 AIProviderEnum 測試 — NVIDIA → NEMOTRON (Phase 24 B3)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- test_nvidia_provider_in_router: 改為驗證 NEMOTRON enum
- test_tool_calling_route: 改為期望 NEMOTRON provider
- test_existing_routing_not_affected: 排除 NEMOTRON (非一般路由)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:39:46 +08:00
OG T
e9a1ac6276 fix(ui): 對齊 figma-v2 設計稿 — IncidentCard + OpenClawPanel 視覺修正
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 35s
IncidentCard:
- 背景 #fff、圓角 12px、頂邊條 4px(對齊設計稿)
- P1 嚴重度色修正為 #F59E0B(amber,非 orange)
- Severity badge 改為 4px 圓角 uppercase 樣式
- Impact 指標列移除灰底方塊,改為細邊框分隔線
- AI 提案按鈕改為全寬居中橙色風格

OpenClawPanel:
- 移除多餘 rounded-xl/backdrop/border(由父層卡片容器提供)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:36:59 +08:00
OG T
97d86861ed fix(ai_router): C1 修復 — AIProviderEnum 對齊 Registry 實際 Provider 名稱
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 37s
問題: AIProviderEnum.NVIDIA = "nvidia" 在 Registry 無對應 Provider
      OpenClawNemoProvider.name = "openclaw_nemo"
      NemotronProvider.name = "nemotron"
      → 高複雜度/Tool Calling 路由永遠 skip,靜默 fallback 到 Gemini/Ollama

修復:
- 新增 OPENCLAW_NEMO = "openclaw_nemo" (一般推理, via .188 → NVIDIA NIM)
- 新增 NEMOTRON = "nemotron" (Tool Calling, direct NVIDIA NIM)
- 移除 NVIDIA = "nvidia" (Registry 無對應)
- 規則 4 (複雜度>=4/HIGH風險): NVIDIA → OPENCLAW_NEMO
- route_tool_calling: NVIDIA → NEMOTRON
- Rate Limiter check: "nvidia" → "openclaw_nemo"
- _full_fallback_chain: OPENCLAW_NEMO 首位
- _tool_calling_fallback_chain: NEMOTRON 首位

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:31:31 +08:00
OG T
a3f02888a1 feat(ui): 加入 chibi 龍蝦游泳列 + 主頁卡片式佈局對齊設計稿
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- Metrics Strip 頂部加入龍蝦游泳動畫列
- 主體 Feed 和 Right Panel 改為圓角卡片式(背景白/陰影)
- Section header 加入橘點裝飾,對齊 figma-v2 設計稿
- 所有資料串接真實 API,無假資料

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:31:01 +08:00
OG T
ef5b1ab85a fix(knowledge-base): 串接 NEXT_PUBLIC_API_URL 取代相對路徑
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- /api/v1/knowledge 改用 process.env.NEXT_PUBLIC_API_URL 前綴
- 確保 Docker build 後能正確連到後端 API,不再打到 Next.js app server

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:19:14 +08:00
OG T
2d87eca5f6 fix(ci): 移除 e2e-health push 觸發 — 根治「每 commit 兩個 run」問題
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
根本原因:
  cd.yaml + e2e-health.yaml 都監聽 push main
  → 每次 push 產生兩個 run,互相 cancel,code commit 被跳過

解法:
  e2e-health.yaml 移除 push trigger,只保留排程(每日00:00)和手動觸發
  CD 本身已有 smoke test,E2E 不需要每次 push 重複跑

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-04-02 23:17:31 +08:00
OG T
cde61b06ae fix(ci): CD 改搶佔模式 — cancel-in-progress: true
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Successful in 17s
問題: 多個 commit 快速推版時排隊堆積;docker build 卡住阻塞整條 queue
根因: cancel-in-progress:false 讓每個 commit 都排隊等,新的無法取消舊的
修復: cancel-in-progress:true — 新 push 立即取消舊 build,只部署最新 commit
安全: concurrency group 保證同時只有一個 job 跑,kubectl rollout status 防半部署

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:16:24 +08:00
OG T
1e1d7e34cd fix(ci): 加入 timeout-minutes:45 防止 CD job 無限卡住
Some checks are pending
CD Pipeline / build-and-deploy (push) Waiting to run
E2E Health Check / e2e-health (push) Successful in 18s
問題: task 288 卡住 71 分鐘 (docker build/push Harbor 網路問題)
影響: 後續 task 排隊無法執行
修復: job 超過 45 分鐘自動 fail,下次 push 重新觸發

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:15:05 +08:00
OG T
58002e6bf4 feat(phase24-b3): NemotronProvider 抽取 + incident-card 重構
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Phase 24 B3:
- 新增 ai_providers/nemotron.py: NemotronProvider 封裝 K8s Tool Calling
  搬移自 openclaw.py _call_nemotron_tools (L1623-1785)
  capabilities=tool_calling, privacy_level=cloud
- ai_router.py: 加入 NemotronProvider 到 Registry
- ai_providers/__init__.py: 匯出 NemotronProvider

Phase R-UI2 (架構師 Warning):
- incident-card.tsx: 抽取 useApprovalAction hook
  handleApprove/handleReject 60行重複邏輯 → 共用 hook
  行為完全不變,維護性提升

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:12:42 +08:00
OG T
5a8aae89c4 fix(phase24): 首席架構師 Review C1/C2/C3/I4 修復
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m12s
E2E Health Check / e2e-health (push) Successful in 18s
C1 (P0): AIRouterExecutor.execute() 補 Langfuse Trace (D5)
  - 建立 langfuse_trace("ai_router_execute") 包住整個執行鏈
  - 成功時記錄 generation (model/input/output/tokens/cost)
  - prod 所有 AI 呼叫現在有 LLMOps 追蹤

C2 (P0): 絞殺者改為呼叫 AIRouter.route() 智慧路由
  - 先取得 RoutingDecision (意圖分類 + 複雜度評分)
  - provider_order 從 selected_provider + fallback_chain 動態生成
  - D1 意圖路由矩陣、D7 隱私保護 (DIAGNOSE 強制 local) 生效

C3 (P1): 型別標注 typo 修復
  - AIProviderEnumEnum → AIProviderEnum
  - AIProviderEnumProtocol → AIProviderProtocol

I4 (P1): interfaces.py AIProvider Protocol 補 close() 定義

S1: ai_router.py 模組版本標頭更新至 v4.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:47:06 +08:00
OG T
9d00b0389e fix(ci): CD path filter — 只有 apps/k8s/workflows 變更才觸發部署
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
問題: docs/memory/ADR commit 也觸發 CD,擠掉 code commit 的 run
      導致線上版本 (28bd06d) 落後 main (2d5f1a7) 6個 commit

解法: push paths filter,排除不影響部署的路徑
     workflow_dispatch 手動觸發永遠可用

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-04-02 21:43:27 +08:00
OG T
2d5f1a71ad chore(observability): ClickHouse TTL 設定完成 — Phase O 全驗收
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
signoz_logs: 30天 (已內建 _retention_days DEFAULT 30)
signoz_metrics 8個表: 233280000s(2700天) → 7776000s(90天)
  - samples_v4, samples_v4_agg_5m, samples_v4_agg_30m
  - exp_hist, time_series_v4, time_series_v4_6hrs
  - time_series_v4_1day, time_series_v4_1week

Phase O 驗收清單全部打勾 

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-04-02 21:38:39 +08:00
OG T
ba4ee46514 fix(ui): 架構師 Review 修復 — i18n/keyframe/型別/版面
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Critical:
- flow-pipeline.tsx: 移除 4 個重複 lobster-bob keyframe,統一在父元件注入
  修正 isResolved 路由邏輯,保留嚴重度視覺識別 (P0 resolved 仍用 StyleA)
- incident-card.tsx: 修復 4 個硬編碼中文字串 (affectedServices/signalCount/statusLabel/aiProposal)
  新增對應 i18n key 到 zh-TW.json + en.json

Warning:
- page.tsx: MetricItem type 提升至 module scope,pendingApprovals null 安全檢查
  Metrics Strip 移除固定 height:68px 改為 auto + padding:8px

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:36:51 +08:00
OG T
08f73dfce8 docs: Phase O-5 Wave 5.4 告警鏈路 E2E 驗證 Runbook
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- 架構圖、手動測試步驟、smoke test 清單
- generate_monitoring.py 用法說明
- 已知問題豁免清單、回滾指令
- 首次驗收記錄 2026-04-02 8/8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:34:43 +08:00
OG T
234f7febd0 feat(ci): Phase O-5 Wave C.2 加入 monitoring coverage check step
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- cd.yaml 新增 Monitoring Coverage Check step (generate_monitoring.py --check)
- continue-on-error: true — 不阻塞部署
- Telegram 通知加入 📊 Monitoring 覆蓋率狀態

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:33:59 +08:00
OG T
827923b9b9 feat(monitoring): Phase O-5 Wave C.1 generate_monitoring.py 自動發現
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- 查詢 Prometheus targets API 取得全量 scrape 狀態
- 10 個預期服務覆蓋率計算 (門檻 70%)
- 已知 DOWN targets 豁免清單 (不影響健康判斷)
- --json 機器可讀輸出 / --check CI 模式 (exit 1 if coverage < threshold)
- 首次執行: 100% 覆蓋率,無真實問題

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:33:28 +08:00
OG T
28bd06d7b3 feat(homepage): Metrics Strip 7指標視覺強化 + 真實資料串接
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- 新增 podHealth/allRunning i18n key (zh-TW + en)
- Metrics Strip: 6個指標全部串接真實 API
  - 活躍事件: incidents count + P0 badge
  - 服務健康: dashboard services healthy/total + RPS sparkline
  - 待簽核: dashboard pendingApprovals + 橘色 badge
  - 自動處置率: incidents resolved rate + error rate sparkline
  - MTTR 均值: incidents resolved avg duration
  - POD 健康: dashboard services up/total + 顏色狀態
- Right panel 固定 530px 寬度 (55/45 比例)
- 禁止假數據: 無 API 資料時顯示 "--"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:27:59 +08:00
OG T
48c65756da chore(config): USE_AI_ROUTER=true 寫入 ConfigMap (Phase 24 B2)
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
防止下次 CD deploy 覆蓋 kubectl set env 的設定。
B2 觀察期 48h, 截止 2026-04-04 18:40 台北時間。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:26:53 +08:00
OG T
3f339110dd fix(observability): 同步 .188 實際部署調整至 repo
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
與原始計畫的差異:

1. MinIO Bearer Token 認證
   - 原計畫: MINIO_PROMETHEUS_AUTH_TYPE=public (此版本不支援)
   - 實際: mc admin prometheus generate 產生 Bearer Token
   - 更新: prometheus-config-phase-o.yaml 加入 bearer_token

2. remote_write 廢棄 → OTEL Collector Prometheus scrape
   - 原計畫: Prometheus remote_write → SigNoz OTEL /api/v1/write
   - 實際: SigNoz OTEL Collector 不支援 Prometheus remote_write 格式 (404)
   - 改用: OTEL Collector prometheus receiver 直接 scrape node-exporter + kube-state-metrics
   - 新增: ops/signoz/otel-collector-config-phase-o.yaml (版本控管副本)

3. ADR-053 驗收清單更新為實際結果

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-04-02 21:23:47 +08:00
OG T
93e3aa6811 feat(ui): 四種嚴重度管線動畫 + WoooClaw 命名更新
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
- flow-pipeline.tsx: 新增 severity prop,四種管線樣式
  - P0 → Style A: 脈衝光波 + 流動光效 (#cc2200)
  - P1 → Style B: 進度條,龍蝦站在進度端點 (#F59E0B)
  - P2 → Style C: 卡片步驟,龍蝦浮在 active 卡片上方 (#4A90D9)
  - P3 → Style D: 時間軸,虛線流動動畫 (#22C55E)
- incident-card.tsx: FlowPipeline 傳入 severity={sev}
- openclaw-panel.tsx: NemoClaw→WoooClaw, OpenClaw Pipeline→WoooClaw Pipeline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:18:22 +08:00
OG T
04978995c1 fix(metrics): 實際呼叫 record_alert_chain_success (Wave A.5)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m47s
E2E Health Check / e2e-health (push) Successful in 17s
alert_chain_last_success_timestamp 指標已定義但從未被 set。
在 alertmanager_webhook 兩個主要成功路徑呼叫 record_alert_chain_success():
- CI/CD 告警成功處理後
- LLM 分析完成後

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 20:10:58 +08:00
OG T
f5b8738185 fix(wave-a): Wave A 告警鏈路驗收修復
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
- sentry_webhook: 加入 GET /health endpoint (smoke test 探測用)
- smoke_test: alertmanager 路徑改為 /webhooks/health (已存在)
- smoke_test: Prometheus URL 改為正確的 110:9090
- smoke_test: Alert chain metric 標記 critical=False (初始化期正常)

Wave A.6 smoke test 現在 6/8 → 7/8 checks pass (sentry health deploy 後 8/8)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 20:08:26 +08:00
OG T
5a7919f55c fix(test): AIProvider → AIProviderEnum (Phase 24 C1 rename fix)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m11s
E2E Health Check / e2e-health (push) Successful in 16s
C1 修復 (3ad7b60) 重命名 AIProvider Enum 為 AIProviderEnum
test_nvidia_provider.py 未同步更新,導致 CD 測試失敗。

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 19:38:04 +08:00
OG T
9afb518ea6 fix(ui): 修復事件卡片溢出框 + 基礎架構資料欄位錯誤對應
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 49s
E2E Health Check / e2e-health (push) Successful in 21s
- incident-card: AI提案按鈕 width 100% + margin 造成右側懸浮框,改為 calc(100%-20px)
- page.tsx: useHosts() 返回 Host[] 但直接傳入 HostGrid 期望的 HostInfo[],
  補上 mapper (name→hostname, metrics.cpu_percent→cpuPct, service.status→healthy)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 19:01:07 +08:00
OG T
9c01ed85a9 chore: trigger CD rebuild for Phase 24 (3e4612f not yet built)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 35s
E2E Health Check / e2e-health (push) Successful in 18s
2026-04-02 18:32:39 +08:00
OG T
3e4612f259 docs(observability): ADR-053 SigNoz 統一架構 + Phase O 驗收
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 36s
E2E Health Check / e2e-health (push) Successful in 16s
- 新增 ADR-053: 可觀測性統一架構決策記錄
- 更新 service-registry.yaml: 補齊 MinIO/Kali 監控入口
- 更新 LOGBOOK: Phase O 完成狀態

Phase O 驗收清單:
 kubectl Mac 本機免密碼
 OTEL Collector 2 Pod Running
 Event Exporter 1 Pod Running
 Descheduler CronJob Completed
 MinIO + Kali 告警規則
 Alert Chain Smoke Test
 CD Pipeline 整合
 ClickHouse TTL / remote_write / SigNoz rules (待 .188 手動)

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-04-02 18:26:57 +08:00
OG T
d2b337430a feat(cd): Phase O-4 Wave A 收尾 — Sentry Token 注入 + Alert Chain Smoke Test
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 35s
E2E Health Check / e2e-health (push) Successful in 17s
Wave A.1: SENTRY_AUTH_TOKEN CD 自動注入 K8s Secret
  - 每次部署自動 kubectl patch (遵循 ADR-035 鐵律)
  - Token 缺失時 warn 不 fail (降級保護)

Wave A.6 + B.2: Alert Chain Smoke Test
  - scripts/alert_chain_smoke_test.py (新建)
  - 檢查: API Health / Alert Chain Metric / 3 Webhook /
          SigNoz / OTEL Collector / Event Exporter
  - 整合進 cd.yaml (Alert Chain Smoke Test 步驟)
  - continue-on-error: true (不阻塞部署,結果顯示在 TG)
  - TG 部署通知新增 Alert Chain 狀態欄

Wave A.2/A.3/A.4: SignOz/Sentry 程式碼已在 2026-03-29 實作完成
  - signoz_webhook.py / sentry_webhook.py 均已部署
  - 待手動部署 SignOz 告警規則到 .188

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:22:13 +08:00
OG T
99be215e83 fix(monitoring): R1 Review 修正 — Blackbox DNS/PSA label/告警閾值
Critical: Blackbox Exporter replacement 從 K8s DNS 改為主機 IP (192.168.0.188:9115)
Important: Descheduler namespace 顯式宣告 PSA restricted labels
Suggestion: failedJobsHistoryLimit 3→1, 新增 MinioDiskUsageCritical 5% 告警

R1 Review by: 首席架構師 (Phase O-1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:02:50 +08:00
OG T
41bf0681cf feat(observability): Phase O-2/O-3 OTEL Log管線 + Event Exporter + Remote Write
O-2.1: OTEL Collector DaemonSet (filelog receiver)
  - 收集所有 K3s 節點 Pod stdout/stderr → SigNoz ClickHouse
  - CRI log parser (Go time layout for +08:00 timezone)
  - filter processor 排除 kube-system debug noise
  - observability namespace PSA privileged (log 目錄需 root)
  - 資源限制: 50m-200m CPU / 64-128Mi Memory

O-2.2: kubernetes-event-exporter
  - K8s Event → 結構化 JSON Log → SigNoz
  - Warning/Error 全量保留, Normal 過濾高頻事件
  - 解決: Event 預設僅保留 ~1hr 的致命盲區

O-3: Prometheus remote_write 配置模板
  - 白名單: ~50 關鍵 metric series (node/container/kube/api/db)
  - 目標: 90 天長期儲存於 SigNoz ClickHouse

已部署驗證: 3 Pod Running, 0 error, filelog 正常監控所有 namespace

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:01:42 +08:00
OG T
1dd0ff8cf4 fix(cd): runs-on 改回 ubuntu-latest (Gitea runner label 不支援 self-hosted)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 43s
E2E Health Check / e2e-health (push) Successful in 19s
根因: Gitea act_runner 只有 ubuntu-latest/24.04/22.04 labels
     改為 self-hosted 後 runner 無法匹配 → CD 靜默失敗
     所有 Phase 24 代碼都沒部署到 K8s

Gitea ≠ GitHub: GitHub 有內建 self-hosted label
                Gitea 需要明確匹配 runner 註冊的 label

2026-04-02 ogt: CD 失敗根因修復

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:59:58 +08:00
OG T
1ec342db0c fix(web): 首席架構師審查修復 (82/100 → Pass)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
- 字體遷移遺漏: host-grid (2處), sidebar (1處) → var(--font-body)
- time-series-chart tick → var(--font-mono) (圖表軸標籤保留等寬意圖)
- i18n key 重複: 移除 incident.anomaly, 保留 incident.card.anomaly
- 全站 inline fontFamily: 'monospace' 歸零

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:56:43 +08:00
OG T
f0f9cc87a1 fix(web): monitoring 頁 QA 修復 — NAN% + HostGrid + i18n
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
- HostGrid 接 useHosts() SSE 數據(不再傳空陣列)
- HealthSummary NAN% 修復:total_count=0 時顯示 0% 而非 NaN%
- 8 處硬編碼中文改 i18n (正常/警告/異常/黃金指標/主機狀態/服務清單/表頭)
- 新增 monitoring namespace i18n keys (11 keys × 2 langs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:55:29 +08:00
OG T
6ce82ff883 fix(k3s): Phase O-1 基礎設施修復 — Descheduler + MinIO/Kali 監控
O-1.1: Descheduler securityContext 修復 (PodSecurity restricted 合規)
  - 新增 pod securityContext (runAsNonRoot, runAsUser:65534, seccompProfile)
  - 新增 container securityContext (allowPrivilegeEscalation:false, drop ALL)
  - 補齊 RBAC: namespaces + replicasets list 權限
  - 已部署驗證: CronJob 成功執行 (Status: Completed)

O-1.3: MinIO Prometheus scrape 配置 + 告警規則
O-1.4: Kali Blackbox TCP probe + 告警規則
  - MinioDown, MinioDiskUsageHigh, MinioOfflineDisk
  - KaliScannerDown

待手動部署: Prometheus config → .188, kubectl kubeconfig → 120/121

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:55:26 +08:00
OG T
95343de782 chore: trigger CD (Phase 24 Review 修復已 push)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-02 13:52:23 +08:00
OG T
51961b9f03 docs: Phase O 可觀測性終極補完計畫設計規格
SigNoz 統一派架構,解決 6 大盲區 (Event/Log/Metrics/Descheduler/kubectl/MinIO-Kali)
+ Monitoring Master Plan Wave A-D 收尾
+ 5 個首席架構師 Review 節點

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:45:23 +08:00
OG T
3ad7b60f68 fix(ai): Phase 24 R1+R2 首席架構師 Review 修復 (C1-C3 + I1-I5)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
Critical 修復:
- C1: AIProvider Enum 改名為 AIProviderEnum (避免與 Protocol 同名衝突)
- C2: 共用 Circuit Breaker → per-provider _SimpleCircuitBreaker
  (避免 Gemini 掛掉時 Ollama 也被擋)
- C3: cache_key 移到 try 外面 (避免 UnboundLocalError)

Important 修復:
- I1: Claude hardcode model → 用 get_model_registry()
- I2: Claude 追蹤 tokens/cost (input_tokens + output_tokens)
- I3: Ollama 追蹤 tokens (eval_count + prompt_eval_count)
- I4: Gemini temperature → 用 model_registry
- I5: AIProviderRegistry.close_all() shutdown hook

2026-04-02 ogt: Phase 24 首席架構師審查通過後修復

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:40:58 +08:00
OG T
1f174e1268 fix(web): 首頁全面 QA 修復 — hosts 數據 + incident 標題 + i18n + 字體
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
- HostGrid 接 useHosts() SSE 數據(不再傳空陣列)
- IncidentCard 標題從 description?? '--' 改為 decision.action ?? services + 異常
- 6 處硬編碼中文改 i18n (活躍事件/載入中/系統穩定/OpenClaw認知引擎/基礎架構)
- fontFamily: Inter/monospace → var(--font-body) 全部替換
- 新增 dashboard.openclawEngine / infrastructure i18n keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:33:48 +08:00
OG T
1628f659e3 fix(web): tDashboard is not defined — 補上 useTranslations('dashboard')
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
ReferenceError 導致 web pod crash loop。
page.tsx 用了 tDashboard() 但沒宣告。

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:17:32 +08:00
OG T
73e8f8ab77 feat(ai): Phase 24-A+B1 — AI Provider Registry + 絞殺者包裝 (ADR-052)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Brain Layer 雙軌 Registry 架構:
- 新建 src/services/ai_providers/ 目錄 (interfaces + 4 providers)
  - OllamaProvider (local, rca/chat/code_review)
  - GeminiProvider (cloud, rca/chat)
  - ClaudeProvider (cloud, rca/chat/code_review)
  - OpenClawNemoProvider (cloud, rca — 委派 188→NIM)
- 擴展 ai_router.py 加入:
  - AIProviderRegistry (動態註冊/啟停)
  - AIRouterExecutor (Cache + 閘門 CB/RL/Sem + 執行)
- openclaw.py 絞殺者包裝: USE_AI_ROUTER=true 走新路徑
- config.py + ConfigMap 加入 USE_AI_ROUTER=false (安全預設)
- ADR-052 正式文件 (14 項決策 D1-D14)
- HARD_RULES v1.7 加入 AI Router 規範

安全: USE_AI_ROUTER=false 預設不啟用,需手動開啟觀察
回滾: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false

2026-04-02 ogt: Phase 24 首批實作

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:16:09 +08:00
OG T
1123eb4107 feat(web): Metrics Strip 自動處置率 + MTTR 真實計算
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
- autoRemediationRate: resolved+closed / total incidents
- mttrAvg: 平均 (updated_at - created_at) 分鐘/小時
- 替換原本的 '--' 靜態值

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:03:20 +08:00
OG T
05cd9cbab4 fix(web): 驗收報告 6 個問題修復
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
1. [Medium] Metrics Strip [object Object] — 移除 pendingApprovals 陣列直接渲染
   + label 硬編碼改 i18n (activeIncidents/serviceHealth/todayIncidents 等)
2. [Low] KB GET /{id} 不過濾 archived — get_by_id 加 status != ARCHIVED
3. [Low] favicon.ico 404 — 新增 NemoClaw SVG favicon + layout metadata
4. [Medium] auto-repair console errors — fetchEval 加 try-catch 靜默處理

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:30:43 +08:00
OG T
db2a2852b8 docs: 前端重構驗收報告 87/100
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Playwright 瀏覽器截圖 + KB API 端點測試 + Console 分析
- 24/24 路由零 404
- 7 完整頁面 + 15 ComingSoon
- KB API 7 端點全部正常
- 1 Low bug (archived entry still accessible via GET)
- Metrics Strip [object Object] 待修

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:20:27 +08:00
OG T
25889d4b8e docs: 歸檔 ADR-050 reanalyze 實作計畫 (已完成)
Some checks failed
CD Pipeline (Dev) / build-and-deploy-dev (push) Failing after 9s
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:38:03 +08:00
OG T
4d46e6b9a7 style(web): 全站 font-mono → font-body (DM Mono 設計系統套用)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
45 個 component + 6 個 page 統一從舊 font-mono 遷移到
font-body (DM Mono),確保設計系統一致性。

font-body = DM Mono (等寬),視覺效果相同但走新設計 token。
保留: font-heading (Syne)、font-dot-matrix (VT323/DSEG7)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:37:03 +08:00
OG T
db1aed81d9 fix(db): C1 時區統一遷移 — utc_now → taipei_now (全 5 table)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 18s
CD Pipeline / build-and-deploy (push) Has been cancelled
🔴 首席架構師審查 C1: 全系統禁止 UTC,必須台北時區 +8

- utc_now() → taipei_now() (調用 src.utils.timezone.now_taipei)
- 影響: ApprovalRecord, TimelineEvent, AuditLog, IncidentRecord, KnowledgeEntryRecord
- 13 處 default/onupdate 全部替換
- 移除 datetime.UTC import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:13:36 +08:00
OG T
628387de8c fix: risklevel migration 自動化 + Telegram Whitelist 注入
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
1. init_db() 啟動時自動確保 risklevel enum 包含 'high' 值
   (Phase 23 新增,避免舊 DB 缺值導致 InvalidTextRepresentation)

2. CD Pipeline 新增 OPENCLAW_TG_USER_WHITELIST 自動注入
   (之前為 CHANGE_ME,已更新為實際 user ID 5619078117)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:13:13 +08:00
OG T
3ecfe7b3f5 chore: 清理 NemoNodeAnimation 殘留 + 修復 Migration YAML
Some checks failed
E2E Health Check / e2e-health (push) Successful in 19s
CD Pipeline / build-and-deploy (push) Has been cancelled
- 移除 nemo-node-animation.tsx (無人引用,已被 NemoClaw 取代)
- Migration YAML: 修復 $$ 在 YAML heredoc 被 shell 解析問題
  改用單引號字串 DO '' 語法

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:09:25 +08:00
OG T
d2bad44173 fix(api): KB 架構審查修復 I3-I5
Some checks failed
E2E Health Check / e2e-health (push) Successful in 17s
CD Pipeline / build-and-deploy (push) Has been cancelled
- I3: Service 層加 IKnowledgeRepository Protocol 型別標注
- I4: search 方法加入 tags JSONB 搜尋 (cast→String→ilike)
- I5: get_categories 獨立方法,不再繞道 list_entries(limit=0)

首席架構師審查 87/100 → 全部 Important issues 已修復

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:05:54 +08:00
OG T
48a0bc66f7 fix(api): KB 首席架構師審查修復 (I1 tags filter + I2 type annotation)
Some checks failed
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
- I1: Repository list_entries 實作 tags JSONB @> 篩選 (之前聲明未實作)
- I2: ORM tags 型別從 list[dict[str, Any]] 修正為 list[str]

首席架構師審查: 87/100
C1 時區(UTC→Taipei) 為既有系統性問題,另開 task 統一遷移

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:04:41 +08:00