Commit Graph

187 Commits

Author SHA1 Message Date
AWOOOI CD
9a4fa5edf5 chore(cd): deploy 27ba97e [skip ci] 2026-04-15 19:12:19 +00:00
OG T
5f9c9d84a2 fix(configmap): Ollama 改指向 111 GPU + fallback 順序調整
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- OLLAMA_URL: 188(CPU-only) → 111(RTX GPU,avg 10s)
- AI_FALLBACK_ORDER: nvidia→gemini→ollama→claude
  改為 ollama→nvidia→gemini→claude
  本地 GPU 優先,外部 API 備援,雲端 Claude 最終兜底

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:00:16 +08:00
AWOOOI CD
44ecf609e0 chore(cd): deploy 9538f6c [skip ci] 2026-04-15 18:39:05 +00:00
AWOOOI CD
381be78344 chore(cd): deploy f5e33da [skip ci] 2026-04-15 17:55:11 +00:00
AWOOOI CD
644cae33c3 chore(cd): deploy 9bfa6fc [skip ci] 2026-04-15 17:37:10 +00:00
OG T
0760315059 fix(decision-manager): 修正 CAST 語法 + 關閉 shadow_mode
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
decision_chain_persist_failed 根因:
  asyncpg 不支援 :dc::json 語法 (:: 與具名參數 : 衝突)
  改為 CAST(:dc AS jsonb) — asyncpg 標準寫法

configmap:
  AIOPS_P4_SHADOW_MODE: true → false
  真實主動監控啟用 (proactive_inspector 輸出供 PreDecisionInvestigator 讀取)

2026-04-16 Claude Sonnet 4.6 Asia/Taipei

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 01:24:48 +08:00
AWOOOI CD
3fc2c41216 chore(cd): deploy 457018c [skip ci] 2026-04-15 17:18:47 +00:00
AWOOOI CD
34dd20298a chore(cd): deploy d258a1f [skip ci] 2026-04-15 16:22:45 +00:00
OG T
d4fed639f6 fix(ai-router): DIAGNOSE 恢復 OPENCLAW_NEMO 路由,修復全部 timeout 降級問題
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 2m16s
根因: 2026-04-12 patch 把 DIAGNOSE 改為 None(複雜度路由)
→ 落入 Rule 6 → Ollama deepseek-r1:14b (CPU 238s) → timeout
→ 降級 20% 信心 + 「待分析」→ 全部 unknown

修復:
1. ai_router.py: DIAGNOSE → OPENCLAW_NEMO(via 188:8088 NVIDIA NIM, 2-27s)
2. ai_router.py: 移除 Rule 6 的 DIAGNOSE deepseek 特殊case(已無用)
3. 04-configmap.yaml: AI_FALLBACK_ORDER 改為 nvidia 優先
   gemini→ollama→nvidia(舊)→ nvidia→gemini→ollama(新)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 00:03:04 +08:00
AWOOOI CD
b55575b56b chore(cd): deploy c9efaa3 [skip ci] 2026-04-15 15:59:47 +00:00
AWOOOI CD
7d3391cb69 chore(cd): deploy 800ab16 [skip ci] 2026-04-15 15:41:49 +00:00
AWOOOI CD
4bee14ae08 chore(cd): deploy 77a92eb [skip ci] 2026-04-15 14:39:13 +00:00
AWOOOI CD
65c8eb587c chore(cd): deploy 256a24e [skip ci] 2026-04-15 14:20:27 +00:00
OG T
76558a3cd9 feat(AIOps): 全開 P1-P6 feature flags + Nemotron + offline replay loop
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- configmap: 啟用 AIOPS_P1~P6 全部總開關與子開關
- configmap: ENABLE_NEMOTRON_COLLABORATION=true(回歸 120s timeout)
- feature_flags.py: 補齊 AIOPS_P6_GOVERNANCE_ENABLED 缺失欄位
- main.py: 掛載 run_offline_replay_loop(ADR-087 Phase 6)

2026-04-15 ogt + Claude Sonnet 4.6(亞太)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:59:51 +08:00
AWOOOI CD
e449b275aa chore(cd): deploy e5e94f5 [skip ci] 2026-04-15 13:19:00 +00:00
AWOOOI CD
e23e49c13b chore(cd): deploy ff448ad [skip ci] 2026-04-15 12:47:59 +00:00
AWOOOI CD
d46f230c1f chore(cd): deploy 6583870 [skip ci] 2026-04-15 12:18:44 +00:00
AWOOOI CD
2d85b49cc0 chore(cd): deploy f9ba200 [skip ci] 2026-04-15 11:47:40 +00:00
AWOOOI CD
160689a110 chore(cd): deploy f045506 [skip ci] 2026-04-15 11:31:02 +00:00
AWOOOI CD
586602e7ff chore(cd): deploy f31b4e3 [skip ci] 2026-04-15 11:18:56 +00:00
AWOOOI CD
1d22376b86 chore(cd): deploy fab65e7 [skip ci] 2026-04-15 11:06:22 +00:00
AWOOOI CD
37f4553349 chore(cd): deploy 4e2e665 [skip ci] 2026-04-15 08:22:53 +00:00
AWOOOI CD
53344c201e chore(cd): deploy 14a0226 [skip ci] 2026-04-15 07:57:10 +00:00
AWOOOI CD
9126c594a4 chore(cd): deploy 0f2ec79 [skip ci] 2026-04-15 07:28:25 +00:00
AWOOOI CD
8997ba70cb chore(cd): deploy a142e6e [skip ci] 2026-04-15 07:11:37 +00:00
AWOOOI CD
d493fb9b78 chore(cd): deploy 7da64ea [skip ci] 2026-04-15 06:18:11 +00:00
AWOOOI CD
7edb298a75 chore(cd): deploy 42bc1df [skip ci] 2026-04-15 05:58:38 +00:00
AWOOOI CD
d51705b4ec chore(cd): deploy b6cb199 [skip ci] 2026-04-15 05:40:15 +00:00
AWOOOI CD
40aa7ceba8 chore(cd): deploy 6c7f648 [skip ci] 2026-04-15 03:10:45 +00:00
AWOOOI CD
a52b550607 chore(cd): deploy a92562d [skip ci] 2026-04-14 13:50:09 +00:00
AWOOOI CD
44545633a8 chore(cd): deploy 208c28e [skip ci] 2026-04-14 12:53:38 +00:00
AWOOOI CD
a120cc45b8 chore(cd): deploy 10e3043 [skip ci] 2026-04-14 12:29:34 +00:00
AWOOOI CD
094aa957b2 chore(cd): deploy ca862c5 [skip ci] 2026-04-14 12:16:45 +00:00
AWOOOI CD
2a37d1c06f chore(cd): deploy 8b7e9cb [skip ci] 2026-04-14 11:46:35 +00:00
AWOOOI CD
35736315ce chore(cd): deploy 9b9ff5b [skip ci] 2026-04-14 11:31:31 +00:00
AWOOOI CD
3f8d087aee chore(cd): deploy 72dd0c5 [skip ci] 2026-04-14 11:13:00 +00:00
AWOOOI CD
e7171a4ac8 chore(cd): deploy aa4e575 [skip ci] 2026-04-14 10:56:28 +00:00
AWOOOI CD
88a33eb4d7 chore(cd): deploy f54dea4 [skip ci] 2026-04-14 10:42:20 +00:00
OG T
b8b124c917 chore: 收工存檔 — LOGBOOK + 封存過時 flywheel-alerts CRD
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-14 傍晚 Session 收尾:
- LOGBOOK.md: 記錄本日 6 commits + Mission C E2E 驗證 + MASTER 藍圖 P0+P1 全綠
- k8s/monitoring/flywheel-alerts.yaml: 加 🔴 DEPRECATED 標記
  (11/11 規則已遷入 ops/monitoring/alerts-unified.yml,此 CRD 檔無 Operator 支援)

Backlog 清剿盤點:
 C2 hasType4 前端硬編(已接真實 API)
 C3 WebSocket 無重連(指數退避 + polling fallback)
 flywheel-alerts Docker 方式改寫(已完成,僅舊檔未清理 → 本 commit 封存)
 risk_level YAML 優先邏輯(decision_manager:1663)
 SMTP_USER/SMTP_PASSWORD CHANGE_ME(需統帥提供帳密)
 各類 E2E 驗證(需真實告警觸發)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 18:21:08 +08:00
AWOOOI CD
0f48a507c0 chore(cd): deploy dd0a778 [skip ci] 2026-04-14 08:01:04 +00:00
AWOOOI CD
a71f09e30a chore(cd): deploy 2c6ed4e [skip ci] 2026-04-14 07:38:35 +00:00
OG T
2c6ed4e9cf fix(k8s): 修復 ArgoCD probe 失敗 + drift-scanner egress 封鎖
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 14m36s
問題 1 — ArgoCD "All connection attempts failed":
- ARGOCD_URL 指向 192.168.0.120:30443,但 node 120 kube-proxy 對
  30443 有路由 bug(ArgoCD pod 在 121)
- 修復: ARGOCD_URL → 192.168.0.121:30443
- NetworkPolicy: 補白名單 192.168.0.121/32:30443
- NetworkPolicy: 補白名單 192.168.0.125/32:30443 (keepalived VIP)

問題 2 — drift-scanner Error x5 / 系統沉默 9.4h:
- CronJob pod template 缺少 system=awoooi label
- default-deny-all 封鎖所有 egress,allow-required-egress 僅對
  system=awoooi pods 生效
- 修復: drift-cronjob pod template 新增 system: awoooi

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 15:28:52 +08:00
AWOOOI CD
dd378ac698 chore(cd): deploy 684d6cf [skip ci] 2026-04-14 06:50:00 +00:00
AWOOOI CD
5d8feaad2a chore(cd): deploy 38ff2bb [skip ci] 2026-04-12 15:01:47 +00:00
AWOOOI CD
6dec8ce491 chore(cd): deploy db4d428 [skip ci] 2026-04-12 14:32:47 +00:00
OG T
3de45aa2c3 fix(k8s): deployment env 同步 + 停用 ENABLE_NEMOTRON_COLLABORATION
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
將 live-patch 的 env: 覆蓋同步回 Git,避免 ArgoCD drift:
- ENABLE_NEMOTRON_COLLABORATION: false (Ollama timeout 修復)
- USE_AI_ROUTER, OLLAMA_URL, OPENCLAW_* 等覆蓋值納入 GitOps 管理

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:06:10 +08:00
AWOOOI CD
b6caabd8e3 chore(cd): deploy b3d4b9c [skip ci] 2026-04-12 13:29:40 +00:00
OG T
efca6f816a fix(nemotron): 停用 Nemotron 協作 — Ollama timeout 導致 confidence=0.0
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 2m1s
Ollama 111 tool_call 60s×2 > asyncio.wait_for 30s
→ expert_system fallback → confidence=0.0 → 告警卡顯示規則匹配 0%

暫停 ADR-044 直到 Ollama 穩定,OpenClaw LLM 仍正常運作

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:06:27 +08:00
AWOOOI CD
f64393e4cb chore(cd): deploy eda0cfd [skip ci] 2026-04-12 12:30:49 +00:00
AWOOOI CD
f4675872f9 chore(cd): deploy c3fea26 [skip ci] 2026-04-12 12:17:06 +00:00