Your Name
0e14935351
fix(ops): classify systemd runner alerts as host resources
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
2026-05-05 14:28:18 +08:00
Your Name
54a4e59af9
fix(auto-approve): exempt host-alert SSH diagnostic commands from bad_target validation, fixing no_executable_action
...
Root cause: the host_resource_alert rule uses {host} (derived from the instance label),
which is unrelated to {target}. Host alerts carry no K8s deployment label, so target=unknown and
_is_bad_target=True → kubectl_command is cleared → auto_approve rejects with
no_executable_action → three manual interventions per day.
Fix:
- alert_rule_engine.py: SSH commands (startswith "ssh ") skip bad_target validation
- prompts.py: add Host* alert SSH diagnostic rules to the main and Nemo prompts, so the LLM fallback path does not emit kubectl
- ssh_command_whitelist.py: new read-only SSH command whitelist module (validated before _ssh_execute() runs)
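A minimal sketch of the exemption described above, assuming the gate receives the already-rendered command plus the result of the bad-target check. The function name `gate_rule_command` and its signature are hypothetical; only the `startswith("ssh ")` test comes from this commit.

```python
def gate_rule_command(command: str, target_is_bad: bool) -> str:
    """Return the command to keep, or "" when it must be discarded.

    Hypothetical helper: the real check lives in alert_rule_engine.py.
    """
    if command.startswith("ssh "):
        # Host diagnostics address {host}, so an unresolved {target} is irrelevant.
        return command
    if target_is_bad:
        # kubectl commands built on a garbage target are dropped.
        return ""
    return command
```

With this shape, a host alert whose target resolved to "unknown" still keeps its SSH diagnostic command, while a kubectl command on the same alert is cleared.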
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:15:05 +08:00
Your Name
ed2a4838f2
fix(auto): use action parser for repair gates
CD Pipeline / tests (push) Failing after 1m2s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 24s
2026-04-30 14:06:09 +08:00
Your Name
d845d53257
fix(security): keep Gemini key out of request URLs
CD Pipeline / build-and-deploy (push) Successful in 15m5s
2026-04-29 22:56:12 +08:00
Your Name
6878e62af7
feat(flywheel): W1 PR-P1 + ADR-091 T1: flywheel 80→90, first wave
...
Based on the 10 broken links uncovered by the onboarder end-to-end closed-loop audit and the critic's full survey of iron-rule violations,
this first wave of W1 repairs C1, the core broken link behind flywheel hard evidence 1 and 2.
## W1 PR-P1: matched_playbook_id four-breakpoint guard (C1 fix)
fullstack exploration found that the 4 breakpoints were already fixed in an earlier session; this PR adds:
- ENABLE_PLAYBOOK_MATCHING feature flag (default=true)
  rollback: kubectl set env deployment/awoooi-api ENABLE_PLAYBOOK_MATCHING=false
- flag check at the entry of proposal_service._try_playbook_match_id
- 7 e2e tests as a safety net (previously no test coverage)
Evidence chain for broken link C1: proposal_service.generate_proposal() → matched_playbook_id
→ approval_db → approval_repository → learning_service._update_playbook_stats
After 24h, playbooks.trust_score should show real EWMA updates.
## ADR-091 T1: auto_generate_rule dual-writes to the DB (hard evidence 1, step one)
Flywheel hard evidence 1: alert_rule_catalog.source='ai_generated' has 0 rows across the whole codebase.
auto_generate_rule() writes alert_rules.yaml but not the DB → AI-learned results are decoupled from the catalog.
Fix (per ADR-091 §1 D1):
- Add _insert_catalog_ai_generated(): dual-write after the YAML write succeeds, with
  source='ai_generated', confidence=0.5, review_status='draft', created_by_agent
- Add the _parse_for_to_seconds() helper ("30s"/"5m"/"2h" → seconds)
- ON CONFLICT (rule_name) DO NOTHING guarantees idempotency
- Transaction strategy: YAML and DB are not in the same transaction (YAML is already the SoT; a DB failure is only logged)
- ENABLE_AI_RULE_CATALOG_WRITE feature flag (default=true)
  rollback: kubectl set env deployment/awoooi-api ENABLE_AI_RULE_CATALOG_WRITE=false
13 tests: parse helper 8 + business logic 5 (success/db_fail/idempotent/flag/SQL_lit)
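The duration helper and the idempotent dual-write can be sketched roughly as follows. This is an illustration under stated assumptions, not the commit's code: sqlite3 stands in for the real database, the table layout and the `for_seconds` column are hypothetical, and returning `None` for an unparseable duration is a guess at the helper's contract.

```python
import logging
import re
import sqlite3
from typing import Optional

log = logging.getLogger("rule_catalog_sketch")

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_FOR_RE = re.compile(r"^(\d+)([smh])$")


def parse_for_to_seconds(value: str) -> Optional[int]:
    """Convert "30s" / "5m" / "2h" to seconds; None if the form is not recognised."""
    match = _FOR_RE.match(value.strip())
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


# Hypothetical schema: only the column values named in the commit are taken from it.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS alert_rule_catalog (
    rule_name TEXT PRIMARY KEY,
    source TEXT, confidence REAL, review_status TEXT,
    created_by_agent TEXT, for_seconds INTEGER
)
"""

INSERT_SQL = """
INSERT INTO alert_rule_catalog
    (rule_name, source, confidence, review_status, created_by_agent, for_seconds)
VALUES (?, 'ai_generated', 0.5, 'draft', ?, ?)
ON CONFLICT (rule_name) DO NOTHING
"""


def insert_catalog_ai_generated(
    conn: sqlite3.Connection, rule_name: str, for_value: str, agent: str
) -> bool:
    """Write the catalog row after the YAML write; True only if a row was inserted.

    A DB failure is logged and swallowed, because the YAML file stays the
    source of truth and must not be rolled back.
    """
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                INSERT_SQL, (rule_name, agent, parse_for_to_seconds(for_value))
            )
        return cur.rowcount == 1
    except sqlite3.Error as exc:
        log.warning("catalog write failed for %s: %s", rule_name, exc)
        return False
```

Calling the insert twice with the same rule_name leaves a single row, which is the idempotency the ON CONFLICT clause is meant to provide.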
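## Verification
1572 unit tests all green (+20 new: PR-P1 7 + ADR-091 T1 13)
## Expected impact
Flywheel autonomy score: 42 → 65 (+23 = C1 +3 + hard evidence 1 +20)
## Known debt (surfaced by the critic PR review; to be handled in the next commit)
- 3 caller paths bypass the KMWriter unified contract (C1/M1/M2)
- KMWriter's idempotency claim does not match the implementation (M3 lacks ON CONFLICT)
- Alertmanager equal:[] inhibition blow-up + unverified version (M4/M5)
- drift checker regex is fragile (M7 should switch to AST)
- governance health score is distorted by skipped checks (M6)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>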
2026-04-29 10:44:39 +08:00
Your Name
45dbe07188
fix(flywheel): repair six automation flywheel capabilities (ADR-092 B3)
...
run-migration / migrate (push) Failing after 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 53s
Type Sync Check / check-type-sync (push) Successful in 2m54s
CD Pipeline / build-and-deploy (push) Has been cancelled
Ansible Lint / lint (push) Has been cancelled
[Root-cause chain fix]
MCP Provider bugs → PreDecisionInvestigator fails → Agent Debate has no context
→ LLM timeout → description="待分析" ("pending analysis") → blocked by the ADR-091 iron gate → tg_sent not set
→ W-2 Watchdog falsely reports a "silent failure"
[Six fixes]
1. Three MCP Provider bug fixes
  - ssh_provider: asyncssh.run() → conn.run()
  - prometheus_provider: KeyError 'query' → tolerant .get()
  - k8s_provider: empty pod_name → early return of an error dict
2. Agent Debate / decision quality
  - decision_manager: timeout fallback text changed to an explicit description (bypasses the ADR-091 iron gate)
  - intent_classifier: on LLM timeout, fall back to keyword classification (instead of None)
3. Watchdog false-positive fix (ADR-092 B3)
  - W-2: tg_sent Redis TTL → telegram_message_id IS NULL (DB as source of truth)
  - W-5 added: suggested_action IN (empty / "待分析" / NO_ACTION) + tg_id IS NULL
  - approval_timeout_resolver: 60min → 15min, batch 50 → 200
4. Config drift automation
  - drift_adopt_service: auto_adopt_if_safe() six-condition safety gate
  - drift.py: the background task tries auto-adoption first, then sends the manual Telegram card
5. Playbook flywheel stability
  - playbook_seed_service: fix idempotency (deprecated no longer counts as missing)
  - playbook_evolver: load only DRAFT+APPROVED (not all 294 rows)
6. Observability
  - alert_rule_engine: auto_rule structured logs + Redis counters (pipeline)
  - auto_approve: Redis counters for reject reasons
  - heartbeat_report_service: new "⚙️ Automation stats (today)" section
[To be run manually]
psql $DATABASE_URL -f apps/api/migrations/cleanup_duplicate_deprecated_playbooks.sql
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 10:55:50 +08:00
Your Name
de2d34d4cd
fix(playbook): wire up the full C1-C4 flow: evolver protection + seeder revival + immediate rule creation + watchdog W-4
...
CD Pipeline / build-and-deploy (push) Has been cancelled
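C1: playbook_evolver: add a YAML_RULE guard for playbooks with source yaml_rule,
  so the evolver no longer archives APPROVED playbooks created by the seeder, protecting the auto-repair chain
C2: playbook_seed_service: the idempotency SQL excludes DEPRECATED records,
  so yaml_rule playbooks archived by the evolver can be revived after a restart
C3: alert_rule_engine: call seed_playbooks_from_rules() immediately after an AI-generated rule succeeds,
  so the matching APPROVED playbook is created without waiting for the next restart
C4: ai_slo_watchdog_job: add W-4, an alert when the number of APPROVED playbooks is 0;
  a broken chain immediately raises TYPE-8M; total checks go from 3 to 4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>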
2026-04-20 20:18:11 +08:00
Your Name
54d60d04f5
feat(drift+target): three fixes P0.1+P0.2+P0.3: drift pagination and categorization + AI recommendation + target tracing
...
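Decision on the commander's three questions: do all of them; the AI recommendation's 0.85 threshold is display-only, with no automatic action; check the aol first, then fix
## RCA: source of the awoooi-service failure
- /api/v1/aiops/kpi shows 1 playbook_executed record with actor=approval_execution status=failed in the past 24h
- grep of the codebase: no code hardcodes awoooi-service (only historical comments)
- Most likely source: alert_rule_engine._extract_vars takes labels.service as the Deployment name
- cf5050c/4f2e122 (2026-04-18) already fixed two NEMOTRON hallucination paths; this commit fixes the third
## Fix
### P0.3a alert_rule_engine._extract_vars
- labels.service downgraded: a trailing -service suffix is stripped first and the remainder is treated as the base name
- match_rule's return value gains a target_source field for tracing
- If awoooi-service recurs, the source is directly visible (label.service(stripped), etc.)
### P0.3c approval_execution._log_aol_started.input
- Add parsed_target/operation/namespace fields
- Future aol queries for failed records show the target directly, with no guesswork
### P0.1 telegram_gateway._send_drift_diff_detail
- Pagination (10 items/page) replaces dumping 30 items at once
- Header counts in 3 buckets: manual high-risk / general change / K8s automatic
- ⬅️/➡️ paging buttons at the bottom (callback: drift_view_page:{report_id}_{page})
- security_interceptor INFO_ACTIONS adds drift_view_page to the whitelist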
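The P0.1 paging scheme might look like the sketch below. The page size and the `drift_view_page:{report_id}_{page}` callback format come from the list above; the function names, the 1-based page numbering, and the `prev`/`next` keys are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

PAGE_SIZE = 10  # items per page, per P0.1


def paginate(items: Sequence, page: int, page_size: int = PAGE_SIZE) -> Tuple[List, int, int]:
    """Return (items on the page, clamped page number, total pages); pages are 1-based."""
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return list(items[start:start + page_size]), page, total_pages


def nav_callbacks(report_id: str, page: int, total_pages: int) -> Dict[str, str]:
    """Build callback data for the previous/next buttons that apply on this page."""
    buttons = {}
    if page > 1:
        buttons["prev"] = f"drift_view_page:{report_id}_{page - 1}"
    if page < total_pages:
        buttons["next"] = f"drift_view_page:{report_id}_{page + 1}"
    return buttons


def parse_callback(data: str) -> Tuple[str, int]:
    """Split callback data back into (report_id, page).

    rpartition is used so a report_id that itself contains "_" still parses.
    """
    _, _, payload = data.partition(":")
    report_id, _, page = payload.rpartition("_")
    return report_id, int(page)
```

Clamping the page number keeps a stale button press (for example after the report shrank) from indexing past the end.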
### P0.2 drift_narrator recommendation
- LLM prompt adds a recommendation field (action/confidence/reason)
- action ∈ {adopt, revert, ignore, investigate}
- The card top shows "🎯 AI recommendation: ⏪ Revert (85%), reason"
- If the LLM fails, use _fallback_recommendation (rule-based mapping by intent)
- Card diff_summary limit 500 → 1500 characters to fit recommendation + narrative + items
- Commander's instruction: display only, no automatic execution (the 0.85 threshold is reserved for the future)
## Verification
- 90 pytest tests all pass (drift + rule_engine + approval_execution)
- AST syntax check passes on 5 files
## Next acceptance checks
1. Next drift trigger → the card top shows "🎯 AI recommendation"
2. Press drift_view → 3-bucket header + ⬅️/➡️
3. If awoooi-service recurs → query automation_operation_log.input.parsed_target directly
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:04:13 +08:00
OG T
83ab5e32d7
fix(happy-path): harden the whole Happy Path: INVALID_TARGET + critical NO_ACTION + empty-command interception
...
CD Pipeline / build-and-deploy (push) Has been cancelled
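Problem 1 (P0): invalid restart of deployment/unknown:
- alert_rule_engine: track the _invalid_target flag and return blocked_reason="INVALID_TARGET-..."
- decision_manager: the auto_execute path detects INVALID_TARGET → early return + TYPE-4 manual confirmation
- auto_approve: new condition 1c: an empty-string action is rejected outright, preventing a false "about to execute" report
Problem 2 (P1): critical + NO_ACTION goes silent:
- decision_manager: refactor of the blocked_reason awareness layer
  ① INVALID_TARGET → TYPE-4
  ② NO_ACTION + critical → TYPE-4 (escalated; SRE must not miss it)
  ③ NO_ACTION + non-critical → TYPE-1 (stays a plain info card)
Problem 3 (P1): rule-match confidence black hole:
- auto_approve condition 1c ensures an empty action never passes auto-approve;
  even with is_rule_based=True, nothing runs automatically without a command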
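The routing in Problem 2 and condition 1c can be pictured with the sketch below. The card types and the three cases are from this message; the function names, the exact string values of `blocked_reason` and `severity`, and treating whitespace-only actions as empty are assumptions.

```python
from typing import Optional


def card_type_for(blocked_reason: str, severity: str) -> Optional[str]:
    """Map a blocked_reason to the Telegram card type, or None when nothing is blocked."""
    if blocked_reason.startswith("INVALID_TARGET"):
        return "TYPE-4"  # manual confirmation
    if blocked_reason == "NO_ACTION":
        # Critical alerts are escalated so they cannot be missed.
        return "TYPE-4" if severity == "critical" else "TYPE-1"
    return None


def passes_condition_1c(action: Optional[str]) -> bool:
    """Condition 1c: an empty action can never be auto-approved.

    Whitespace-only strings are also treated as empty here (an assumption).
    """
    return bool(action and action.strip())
```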
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:57:50 +08:00
OG T
6c7f648b60
fix: 3 silent, unconnected flywheel nodes, found from the commander's screenshots
...
CD Pipeline / build-and-deploy (push) Successful in 18m56s
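Commander's screenshot evidence (Telegram MEDIUM alerts still go through manual review):
INC-20260411-A03B2E / A2BB29 show "[rule match]" + action=unknown-service
Node 1: AutoApprovePolicy blocks rule matches (main cause of the stalled flywheel)
- ADR-073 rule matches have confidence=0.0 (anti-forgery)
- AutoApprovePolicy.min_confidence=0.50 → blocked
- Result: MEDIUM rule matches always go to manual review, so the flywheel does not turn
Fix: auto_approve.py adds an _is_rule_based check
(is_rule_based / source=expert_system / rule_id / matched_rule)
→ bypass the min_confidence check
→ verified: should_auto_approve=True ✅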
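A rough sketch of the Node 1 bypass, assuming proposals are plain dicts. The four rule-based markers and the 0.50 threshold are from this message; the dict shape and function names are hypothetical, and the policy's other conditions are left out.

```python
from typing import Any, Mapping

MIN_CONFIDENCE = 0.50


def is_rule_based(proposal: Mapping[str, Any]) -> bool:
    """True when any of the four rule-match markers is present."""
    return bool(
        proposal.get("is_rule_based")
        or proposal.get("source") == "expert_system"
        or proposal.get("rule_id")
        or proposal.get("matched_rule")
    )


def passes_confidence_gate(proposal: Mapping[str, Any]) -> bool:
    """Rule matches carry confidence=0.0 by design, so they skip the threshold."""
    if is_rule_based(proposal):
        return True
    return float(proposal.get("confidence", 0.0)) >= MIN_CONFIDENCE
```

Only the confidence gate is modelled here; a proposal that passes it would still have to satisfy whatever other checks the policy applies.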
Node 2: _is_bad_target misses the unknown-service magic string
- The _resolve_target_from_k8s fallback produces unknown-service / unknown-pod
- GAP-A4 Phase 1/2 only blocks the exact string 'unknown', not the prefix
Fix: alert_rule_engine.py adds a prefix blacklist: unknown-/none-/null-/undefined-
→ verified: all 4 magic strings are bad ✅
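The prefix blacklist could be expressed as below. The four prefixes are from this message; the exact-match set, the case-insensitive comparison, and the function name are assumptions.

```python
_BAD_EXACT = {"", "unknown", "none", "null"}
_BAD_PREFIXES = ("unknown-", "none-", "null-", "undefined-")


def has_magic_target(target: str) -> bool:
    """True for placeholder-like targets such as "unknown" or "unknown-service"."""
    value = target.strip().lower()
    # str.startswith accepts a tuple, so all prefixes are checked in one call.
    return value in _BAD_EXACT or value.startswith(_BAD_PREFIXES)
```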
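Node 3: stale_ready_tokens_resend has no age filter
- The screenshot shows an alert from 2026-04-11 (4 days ago)
- The old labels have expired, so reprocessing cannot produce a new target either
- It overloads Ollama and pollutes Telegram cards
Fix: decision_manager.py skips stale incidents older than 3 days
→ skip + log stale_ready_token_skipped_too_old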
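The age filter for Node 3 might be sketched like this. The 3-day cutoff is from this message; the incident dict shape, the `created_at` key, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

MAX_INCIDENT_AGE = timedelta(days=3)


def is_too_old(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True when the incident is older than the resend cutoff."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > MAX_INCIDENT_AGE


def select_for_resend(incidents: List[Dict[str, Any]], now: datetime) -> List[Dict[str, Any]]:
    """Keep only incidents that are still fresh enough to reprocess."""
    return [inc for inc in incidents if not is_too_old(inc["created_at"], now)]
```

Passing `now` explicitly keeps the cutoff testable; both datetimes are assumed to be timezone-aware.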
Regression: 113/113
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-15 10:56:48 +08:00
OG T
10b74affcf
fix(GAP-A4): fix placeholder resolution in rule action templates, ending 8.3h of flywheel silence
...
CD Pipeline / build-and-deploy (push) Has been cancelled
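🚨 Root-cause diagnosis (caught by the commander):
API logs show a burst of auto_execute_blocked_unresolved_placeholder in the last hour:
- action: "kubectl rollout restart deployment HostHighCpuLoad" ← target=alertname
- action: "kubectl rollout restart deployment unknown"
- action: "kubectl scale deployment unknown --replicas=3"
Root cause: the target resolution logic in alert_rule_engine._extract_vars() is not robust enough.
When a Prometheus alert has no deployment label, it falls back to the alertname or "unknown"
and produces garbage commands. The GAP-A1 anti-injection gate correctly blocks them, but the auto-repair path stalls as a result,
KM is not written → the flywheel goes silent.
Fix (three layers of protection):
1. New _strip_pod_suffix(): recover the Deployment base name from a K8s Pod name
  - Deployment format: awoooi-api-7d6b776f78-4sgjl → awoooi-api
  - StatefulSet: postgresql-0 → postgresql
  - Legacy: my-job-x2m4k → my-job
2. New _is_bad_target(): identify garbage targets
  - empty string / "unknown" / "none" / "null"
  - target == the alertname itself
  - IP:port format, bare IP, contains whitespace/brackets/quotes
  - unresolved {placeholder}
3. Rewritten _extract_vars(): multi-level label lookup (most authoritative first):
  deployment > app > statefulset > pod (suffix stripped) > container > service > target_resource
  Every level passes _is_bad_target validation; if all fail → target="unknown"
4. match_rule() post-validation, two checks:
  - bad target → clear kubectl_command (fall back to the LLM)
  - leftover { or } → clear kubectl_command (template not fully filled)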
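Steps 1 to 3 above can be approximated as follows. The label order, the bad-target criteria, and the three pod-name examples are from this message; the regular expressions are heuristics written for this sketch (including the requirement that a legacy 5-character suffix contain a digit, added to avoid stripping ordinary words), and the un-prefixed function names are stand-ins.

```python
import re
from typing import Mapping

_DEPLOY_POD_RE = re.compile(r"^(.+)-[a-z0-9]{8,10}-[a-z0-9]{5}$")  # name-<rs hash>-<pod id>
_STATEFUL_POD_RE = re.compile(r"^(.+)-\d+$")                        # name-<ordinal>
_LEGACY_POD_RE = re.compile(r"^(.+)-([a-z0-9]{5})$")                # name-<pod id>
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$")
_FORBIDDEN_CHARS_RE = re.compile(r"[\s()\[\]{}'\"]")
_LABEL_ORDER = (
    "deployment", "app", "statefulset", "pod", "container", "service", "target_resource",
)


def strip_pod_suffix(pod_name: str) -> str:
    """Recover the workload base name from a pod name (heuristic)."""
    for pattern in (_DEPLOY_POD_RE, _STATEFUL_POD_RE):
        match = pattern.match(pod_name)
        if match:
            return match.group(1)
    match = _LEGACY_POD_RE.match(pod_name)
    # Assumption: only strip when the suffix looks random (contains a digit).
    if match and any(ch.isdigit() for ch in match.group(2)):
        return match.group(1)
    return pod_name


def is_bad_target(target: str, alertname: str) -> bool:
    """True for empty, placeholder, alertname-equal, IP-like, or malformed targets."""
    value = target.strip()
    if value.lower() in {"", "unknown", "none", "null"}:
        return True
    if value == alertname or _IP_RE.match(value):
        return True
    return _FORBIDDEN_CHARS_RE.search(value) is not None


def extract_target(labels: Mapping[str, str], alertname: str) -> str:
    """Walk the labels from most to least authoritative; "unknown" if all fail."""
    for key in _LABEL_ORDER:
        value = labels.get(key, "")
        if key == "pod":
            value = strip_pod_suffix(value)
        if not is_bad_target(value, alertname):
            return value
    return "unknown"
```

A caller following step 4 would then clear the kubectl command whenever `extract_target` returns "unknown" or the rendered command still contains `{` or `}`.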
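Test coverage:
- 33 new unit tests (all four GAP-A4 scenarios covered)
- 214/214 regression tests pass
Impact:
- The path that used to produce "kubectl rollout restart deployment HostHighCpuLoad"
  → now logs `rule_kubectl_command_discarded_bad_target` and falls back to the LLM
- If the LLM can infer the real deployment from the error logs, the flywheel resumes normal operation
- If the LLM cannot resolve it either, the incident goes to the TYPE-4 manual escalation path
2026-04-14 Claude Sonnet 4.6 (eliminating a hidden bug outside the MASTER blueprint)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>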
2026-04-14 18:43:29 +08:00
OG T
cc42aa0bdb
feat(adr-076): Task 2.2 + 2.3: rule expansion + kubectl injection protection
...
CD Pipeline / build-and-deploy (push) Has been cancelled
Task 2.2: alert_rules.yaml adds 3 rule types (priority 125-127)
- gitea_down: Gitea CI/CD offline → NO_ACTION (priority 125, critical)
- ssl_cert_expiring: SSL certificate expiring → NO_ACTION (priority 126, medium)
- external_site_down: MoWoooWork/Dev/Blackbox probe → NO_ACTION (priority 127, medium)
Total rules: 21 → 24
Task 2.3: alert_rule_engine.py kubectl injection protection
- _RULE_ENGINE_DESTRUCTIVE_RE: blocks delete pvc/namespace/statefulset/deployment,
  drain/cordon, --replicas=0, rm -rf, DROP TABLE, $() and backticks
- validate_kubectl_command(): public API; SSH commands and empty strings pass straight through
- match_rule() integration: validate after variable substitution; on a block, clear the command + log a warning
- test_alert_rule_engine_validation.py: 34 tests (100% passing)
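The Task 2.3 guard could be approximated with the sketch below. The list of blocked patterns and the pass-through for SSH commands and empty strings are from this message; the exact regular expression and the boolean return contract are assumptions, so the real `_RULE_ENGINE_DESTRUCTIVE_RE` may differ.

```python
import re

_DESTRUCTIVE_RE = re.compile(
    r"""
      \bdelete\s+(pvc|namespace|statefulset|deployment)\b
    | \b(drain|cordon)\b
    | --replicas=0\b
    | \brm\s+-rf\b
    | \bdrop\s+table\b
    | \$\(            # command substitution
    | `               # backtick substitution
    """,
    re.IGNORECASE | re.VERBOSE,
)


def validate_kubectl_command(command: str) -> bool:
    """Return True when the command may be kept, False when it must be cleared."""
    if not command or command.startswith("ssh "):
        # Empty strings and SSH commands are outside this guard's scope.
        return True
    return _DESTRUCTIVE_RE.search(command) is None
```

Validation runs on the command after variable substitution, so an injected label value such as `x; rm -rf /` is caught in the rendered string.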
Tests: 776 passed, 26 skipped, 0 failed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 15:10:10 +08:00
OG T
c09521a1c6
fix(cr): full Code Review P0/P1 fixes: modularization + SSH routing + security guard ordering
...
CD Pipeline / build-and-deploy (push) Failing after 2m30s
P0-1: classify_alert_early moved to incident_service (Service layer); webhooks.py import fixed
P0-2: _ssh_execute() now uses self._ssh; removed the redundant SSHProvider() instantiation
P1-1: infrastructure SSH routing moved ahead of the kubectl safety guard, so docker commands are no longer blocked
P1-2: alert_rule_engine adds the get_risk_for_alertname() public API
P1-3: classify_notification() docstring corrected, ORM→Pydantic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:51:12 +08:00
OG T
d77b2add73
fix(review): chief architect Code Review fixes: I1 get_incident_type logic fix + test completion
...
CD Pipeline / build-and-deploy (push) Failing after 8m13s
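Code Review found 2 Critical + 2 Important issues:
Critical:
- rule.id semantically means "rule identifier", a different namespace from incident_type; the two must not be mixed.
  Removed the rule_id fallback path; when a YAML match has no incident_type, fall through to the static dict
- The get_incident_type() critical path had no test coverage.
  Added test_get_incident_type.py: 11 tests in 4 categories (static/YAML priority/YAML error/custom), all passing
Important:
- ALERTNAME_TO_TYPE deferred import moved to the module top level (no circular-import risk)
- alert_types.py TODO was outdated → updated to the correct description after the I1 integration
Tech debt noted: NetworkPolicy ArgoCD egress ClusterIP 10.43.16.201/32 must be updated after ArgoCD is reinstalled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>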
2026-04-11 21:33:19 +08:00
OG T
615822dcf3
feat(I1): ADR-064 Rule Engine integration: infer incident_type dynamically
...
CD Pipeline / build-and-deploy (push) Successful in 11m28s
- alert_rule_engine.py: add get_incident_type(alertname)
  Look up incident_type/rule_id from the YAML rules' match.alertname first
  Fallback: ALERTNAME_TO_TYPE static dict → "custom"
- webhooks.py: alert_type now uses get_incident_type(alertname),
  replacing the static ALERTNAME_TO_TYPE.get() lookup
- alertname coverage from the 19 YAML rules takes effect automatically (no manual dict edits)
- A new alertname triggers generic_fallback → auto_generate_rule(), then is added to the YAML automatically
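The lookup chain in this commit might be sketched as below, with rules passed in as already-parsed dicts. The order (YAML rule, then static dict, then "custom") is from the list above; the rule schema, the parameter list, and the sample names in the usage are hypothetical. The rule-id fallback shown here is the one that the later review commit d77b2add73 removes.

```python
from typing import Any, List, Mapping


def get_incident_type(
    alertname: str,
    rules: List[Mapping[str, Any]],
    static_types: Mapping[str, str],
) -> str:
    """Resolve incident_type: YAML rule first, then the static dict, then "custom"."""
    for rule in rules:
        names = rule.get("match", {}).get("alertname", [])
        if alertname in names:
            # As described in this commit: incident_type, else the rule id.
            found = rule.get("incident_type") or rule.get("id")
            if found:
                return found
    return static_types.get(alertname, "custom")
```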
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 21:21:41 +08:00
OG T
2554ac1e60
fix: E2E test alert recognition + Telegram notification of auto-repair results
...
CD Pipeline / build-and-deploy (push) Has been cancelled
**alert_rule_engine.py**
- _matches() adds instance_prefix matching (highest priority)
- match_rule() passes the instance label to _matches
- Purpose: e2e-final-* / e2e-test-* instances can be recognized by YAML rules
**alert_rules.yaml**
- Add the e2e_smoke_test rule (priority=120)
- alertname: E2E_SMOKE_TEST / instance_prefix: e2e-final- / e2e-test- / test-host
- suggested_action: NO_ACTION, shows "alert chain verified successfully"
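A minimal sketch of the instance_prefix check, assuming the rule's match block has been parsed into a dict with list values. The prefixes and alertname are the ones listed above; the function signature and the list-typed schema are assumptions.

```python
from typing import Any, Mapping


def matches(rule_match: Mapping[str, Any], alertname: str, instance: str) -> bool:
    """Check instance_prefix first, then fall back to the alertname list."""
    prefixes = rule_match.get("instance_prefix", [])
    if instance and any(instance.startswith(prefix) for prefix in prefixes):
        return True
    return alertname in rule_match.get("alertname", [])


# The e2e_smoke_test match block as described in this commit.
E2E_MATCH = {
    "alertname": ["E2E_SMOKE_TEST"],
    "instance_prefix": ["e2e-final-", "e2e-test-", "test-host"],
}
```

With this ordering, a synthetic alert from `e2e-test-01` is recognised even when its alertname is not E2E_SMOKE_TEST.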
**decision_manager.py**
- _auto_execute() sends a Telegram result notification on success ✅
- _auto_execute() sends a Telegram failure notification on failure ❌
- Add the _push_auto_repair_result() function
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 14:16:15 +08:00
OG T
79a9a514dd
fix(rules): ADR-064 L1 Redis distributed lock prevents duplicate rule generation across Pods
...
CD Pipeline / build-and-deploy (push) Has started running
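Problem: the _generating set is process-level and each Pod has its own, so the same alertname can be
  sent to Ollama/Gemini for rule generation by several Pods at once
Fix: SET NX EX lock_key: only the first Pod acquires the lock; the other Pods skip
Degradation: when Redis is unavailable, fall back to the process-level set (keeps the original behavior)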
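The lock-with-fallback pattern can be sketched as follows. The client is assumed to expose a redis-py style `set(name, value, nx=..., ex=...)` that returns a truthy value only when the key was created; the key prefix, the TTL, and the function name are hypothetical.

```python
from typing import Any, Optional, Set

_generating: Set[str] = set()  # process-level fallback, as in the original behavior


def try_acquire_generation_lock(
    alertname: str, client: Optional[Any], ttl_seconds: int = 300
) -> bool:
    """Return True if this caller should generate the rule for alertname."""
    if client is not None:
        try:
            # SET key value NX EX ttl: succeeds only for the first caller.
            acquired = client.set(
                f"rule_gen_lock:{alertname}", "1", nx=True, ex=ttl_seconds
            )
            return bool(acquired)
        except Exception:
            # Redis unavailable: degrade to the in-process set below.
            pass
    if alertname in _generating:
        return False
    _generating.add(alertname)
    return True
```

The EX expiry means a Pod that crashes mid-generation cannot hold the lock forever; the fallback path only deduplicates within one process.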
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 12:03:51 +08:00
OG T
428e66c111
fix(arch-review): chief architect review, all S1×3 S2×3 S3×3 fixed + ADR-064
...
CD Pipeline / build-and-deploy (push) Has been cancelled
S1 Critical:
- S1-1: asyncio trigger moved into the _call_with_fallback async context; removed get_event_loop() from sync code
- S1-2: _append_rule_to_yaml adds textwrap.dedent() to normalize the indentation of LLM output
- S1-3: _matches() returns False directly for alertname=["*"], preventing accidental hits
S2 Major:
- S2-1: auto_generate_rule() switched to DI parameter injection (ollama_url/model/gemini_api_key); removed import settings
- S2-4: _generate_mock_response docstring clarified: it is the rule engine production path, not fake data
- S2-5: suggested_action .strip() prevents whitespace-only strings from bypassing `or`
S3 Minor:
- S3-2: priority upper bound min(next, 890)
- S3-3: alertname sanitize re.sub([{}]) prevents a format KeyError
- S3-4: model_registry.py last-modified timestamp updated
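Several of the small fixes above (S1-2, S2-5, S3-2, S3-3) are easy to picture in isolation. The helpers below are illustrative only: the function names, the "NO_ACTION" default, and the summary template are hypothetical, while `textwrap.dedent`, the `.strip()` before `or`, the 890 cap, and the brace-stripping `re.sub` follow the list.

```python
import re
import textwrap
from typing import Optional


def normalize_rule_snippet(llm_output: str) -> str:
    """S1-2: remove the common leading indentation from an LLM-produced YAML snippet."""
    return textwrap.dedent(llm_output).strip() + "\n"


def pick_action(suggested: Optional[str], default: str = "NO_ACTION") -> str:
    """S2-5: strip first, so a whitespace-only string cannot bypass the `or` default."""
    return (suggested or "").strip() or default


def next_ai_priority(current_max: int) -> int:
    """S3-2: cap the next AI-generated priority at 890."""
    return min(current_max + 1, 890)


def sanitize_alertname(alertname: str) -> str:
    """S3-3: drop braces so the name cannot act as a str.format placeholder."""
    return re.sub(r"[{}]", "", alertname)


def build_summary(alertname: str, target: str) -> str:
    """Embed the sanitized alertname in a template, then fill {target}."""
    template = "Investigate " + sanitize_alertname(alertname) + " on {target}"
    return template.format(target=target)
```

Without the sanitize step, an alertname such as `Disk{full}` would end up inside the template and make `.format()` raise KeyError.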
Docs:
- ADR-064: Alert Rule Engine, YAML-driven + AI auto-learning
- Skills 02: alert rule engine DI conventions + asyncio prohibitions
- Skills 03: _generate_mock_response semantics clarified + rule engine degradation flow
- LOGBOOK: full record of this session
2026-04-09 ogt: chief architect review fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 10:52:40 +08:00
OG T
89da2d24be
fix(model-registry): update the fallback config to deepseek-r1:14b + gemma3:4b
...
CD Pipeline / build-and-deploy (push) Successful in 13m20s
- model_registry._get_default_config: ollama summary llama3.2:3b → gemma3:4b
- model_registry._get_default_config: ollama default/rca → deepseek-r1:14b
- Fix the failing test_smart_router::test_simple_context (asserts gemma3:4b)
- alert_rule_engine: remove the unused asyncio/time imports
- 2026-04-09 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 09:52:47 +08:00
OG T
71437db0e9
feat(rule-engine): automatic rule generation: generic_fallback triggers AI learning
...
CD Pipeline / build-and-deploy (push) Successful in 11m25s
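Flow:
1. An alert hits the generic_fallback rule
2. auto_generate_rule() is triggered in the background
3. Ollama (deepseek-r1:14b) generates a YAML rule snippet
4. Ollama fails → Gemini as backup
5. Validate the format → append to alert_rules.yaml → clear the lru_cache
6. The next alert of the same kind hits its dedicated rule directly and no longer falls through to the fallback
Dedup: each alertname is generated only once per process
Hand-written rules use priority 1-499, AI-generated 500-899, fallback 999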
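Steps 3 and 4 of the flow and the priority bands can be sketched with injected generator callables, so no LLM client is needed to follow the logic. The band boundaries are from this message; the function names, the callable signature, and the band labels are assumptions.

```python
from typing import Callable, Optional

RuleGenerator = Callable[[str], str]  # alertname -> YAML snippet (hypothetical signature)


def generate_rule_snippet(
    alertname: str, primary: RuleGenerator, backup: RuleGenerator
) -> Optional[str]:
    """Try the primary generator, then the backup; None if both fail or return nothing."""
    for generate in (primary, backup):
        try:
            snippet = generate(alertname)
        except Exception:
            continue  # move on to the next provider
        if snippet and snippet.strip():
            return snippet
    return None


def priority_band(priority: int) -> str:
    """Classify a rule by the priority ranges given in this commit."""
    if 1 <= priority <= 499:
        return "handwritten"
    if 500 <= priority <= 899:
        return "ai_generated"
    if priority == 999:
        return "fallback"
    return "unassigned"
```

In the described flow, `primary` would wrap the Ollama call and `backup` the Gemini call; validation and the YAML append happen after a snippet is returned.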
2026-04-09 ogt: AI self-learning rule engine
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 09:20:33 +08:00
OG T
d1ede7f989
feat(openclaw): alert rule engine: alert_rules.yaml replaces hardcoded if/elif
...
CD Pipeline / build-and-deploy (push) Has been cancelled
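- Add alert_rules.yaml: 6 rules (docker/target_down/oom/cpu/5xx/crash) + a generic fallback
- Add alert_rule_engine.py: YAML loading, matching logic, variable substitution
- openclaw.py _generate_mock_response: refactored to call the rule engine (v8.0)
- Adding a rule only requires editing the YAML and restarting the Pod; no code changes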
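The data-driven idea behind this refactor can be illustrated with a tiny engine. Everything concrete here is hypothetical: the rule schema, the sample rules, the alert names, and the template variables are stand-ins, and the rules are written as Python dicts in the shape a YAML loader would produce rather than read from alert_rules.yaml.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical rules, as they might look after loading the YAML file.
RULES: List[Dict[str, Any]] = [
    {
        "id": "oom",
        "priority": 30,
        "match": {"alertname": ["PodOOMKilled"]},
        "action": "kubectl rollout restart deployment {target} -n {namespace}",
    },
    # An empty match block is used here to mean "matches anything".
    {"id": "generic_fallback", "priority": 999, "match": {}, "action": ""},
]


def _rule_matches(rule: Mapping[str, Any], alertname: str) -> bool:
    names = rule.get("match", {}).get("alertname")
    return names is None or alertname in names


def match_rule(
    alertname: str, labels: Mapping[str, str], rules: List[Dict[str, Any]] = RULES
) -> Dict[str, str]:
    """Pick the lowest-priority-number matching rule and fill its action template."""
    variables = {
        "alertname": alertname,
        "target": labels.get("deployment", "unknown"),
        "namespace": labels.get("namespace", "default"),
    }
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if _rule_matches(rule, alertname):
            return {"rule_id": rule["id"], "action": rule["action"].format(**variables)}
    return {"rule_id": "", "action": ""}
```

Adding a rule then means appending one more entry to the data rather than another if/elif branch.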
- 2026-04-09 ogt: architecture refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 09:05:23 +08:00