AWOOOI CD
|
5d8feaad2a
|
chore(cd): deploy 38ff2bb [skip ci]
|
2026-04-12 15:01:47 +00:00 |
|
OG T
|
38ff2bb7a5
|
fix(heartbeat): 改用 ADR-075 TYPE-1 格式 — 💚 INFO 樹狀結構
CD Pipeline / build-and-deploy (push) Successful in 15m4s
舊平鋪文字 → ├─/└─ 樹狀結構對齊 ACTION REQUIRED 卡片風格
- 標題: 💚/⚠️ INFO | AWOOOI 系統報告
- 加 ────── 分隔線
- AI/MCP/飛輪/基礎設施各節統一 ├─/└─ 格式
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:52:05 +08:00 |
|
OG T
|
f1face4e34
|
fix(alert-rules): HostHighCpuLoad 獨立規則,停止 kubectl scale unknown
CD Pipeline / build-and-deploy (push) Has been cancelled
根因: HostHighCpuLoad 是 node_exporter host 告警,無 pod/deployment label
被分到 K8s high_cpu 規則 → {target}=unknown → auto-repair 安全攔截
修復: 新增 host_cpu_high 規則 (priority=45),NO_ACTION + 正確描述
high_cpu 規則移除 HostHighCpuLoad/NodeCPUUsageHigh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:50:37 +08:00 |
|
OG T
|
1a4b52ed28
|
fix(alert): fingerprint 加 alertname 防跨告警指紋衝突 + 補入缺漏心跳分類
CD Pipeline / build-and-deploy (push) Has been cancelled
問題根因:
1. generate_fingerprint 用 alert_type(大量 alertname 落入 "custom")
→ 不同告警名稱同目標共用指紋 → 30 分鐘 debounce 互相擋截
2. classify_alert_early 漏掉 DeadMansSwitch / NoAlertsReceived /
PrometheusNotConnectedToAlertmanager → 落入 TYPE-3 一般告警
修復:
- alert_analyzer_service.py: 指紋改為 namespace:deployment:alertname:target_resource
alertname 取自 labels(Alertmanager),fallback 到 alert_type(其他來源)
- incident_service.py: DeadMansSwitch → backup/TYPE-1;
NoAlertsReceived + PrometheusNotConnectedToAlertmanager → alertchain_health/TYPE-8M
- 補 2 個測試,全套 627 passed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:50:20 +08:00 |
|
OG T
|
b17a677b97
|
fix(gitea-webhook): analysis.model_dump() 對 dict 失敗
CD Pipeline / build-and-deploy (push) Has been cancelled
_call_openclaw_push_review 回傳 dict,不是 Pydantic model
改用 hasattr 判斷是否有 model_dump()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:45:09 +08:00 |
|
OG T
|
0c88f6702e
|
fix(ai-router): DIAGNOSE 強制用 deepseek-r1:14b,不用 gemma3:4b
CD Pipeline / build-and-deploy (push) Has been cancelled
gemma3:4b (summary model, complexity≤1) 不輸出結構化 JSON
→ _parse_llm_response 無法提取 confidence → confidence=0.0
deepseek-r1:14b (default model) 已驗證可輸出 confidence=0.8
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:43:49 +08:00 |
|
OG T
|
946fe1fa7c
|
fix(monitoring): 合併重複飛輪告警 group + 補 notification_type: TYPE-8M
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 44s
awoooi_flywheel_health (重複) 合入 awoooi_flywheel_meta_alerts:
- 所有 5 條規則加 notification_type: TYPE-8M
- 新增 FlywheelAlertnameNullHigh(原僅在舊 group)
- 刪除重複 group,消除 Prometheus 同名告警衝突
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:43:02 +08:00 |
|
AWOOOI CD
|
6dec8ce491
|
chore(cd): deploy db4d428 [skip ci]
|
2026-04-12 14:32:47 +00:00 |
|
OG T
|
db4d4280f5
|
test(ai-router): 更新 DIAGNOSE routing 測試反映暫停 NEMOTRON 現況
CD Pipeline / build-and-deploy (push) Successful in 14m28s
NEMOTRON 因 confidence=0.0 問題暫停,改走複雜度路由(None)
待 _parse_confidence() 修復後恢復
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:22:52 +08:00 |
|
OG T
|
09134f5c47
|
fix(openclaw): 修復 incident.title + DIAGNOSE→NEMOTRON confidence=0.0
CD Pipeline / build-and-deploy (push) Failing after 2m10s
1. telegram_gateway.py:1169 — classify_notification() 仍用 incident.title
改用 alertname + signal annotations 組合 (同 decision_manager.py 修法)
2. ai_router.py — DIAGNOSE 路由暫停 NEMOTRON
NIM tool_call 返回無 confidence → openclaw_analysis_complete confidence=0.0
改為 None (複雜度路由),讓 Gemini/openclaw_nemo 處理 DIAGNOSE
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:12:55 +08:00 |
|
OG T
|
3de45aa2c3
|
fix(k8s): deployment env 同步 + 停用 ENABLE_NEMOTRON_COLLABORATION
CD Pipeline / build-and-deploy (push) Has been cancelled
將 live-patch 的 env: 覆蓋同步回 Git,避免 ArgoCD drift:
- ENABLE_NEMOTRON_COLLABORATION: false (Ollama timeout 修復)
- USE_AI_ROUTER, OLLAMA_URL, OPENCLAW_* 等覆蓋值納入 GitOps 管理
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 22:06:10 +08:00 |
|
OG T
|
bd75aca727
|
feat(adr-075): 補全 2 個欠缺的 Prometheus 告警規則
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 49s
- MomoScraperSuccessLow: 業務爬蟲成功率 <90% (business group)
- CoreDNSResolutionFailed: CoreDNS SERVFAIL 率 >5% (kubernetes group)
ADR-075 Phase 3 完成
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:59:18 +08:00 |
|
AWOOOI CD
|
b6caabd8e3
|
chore(cd): deploy b3d4b9c [skip ci]
|
2026-04-12 13:29:40 +00:00 |
|
OG T
|
b3d4b9c8a9
|
test(telegram): 修正 test_telegram_message_templates 斷言
CD Pipeline / build-and-deploy (push) Successful in 14m24s
CRITICAL → 嚴重 (ADR-075 中文風險等級)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:20:16 +08:00 |
|
OG T
|
01e6d75ee7
|
test(telegram): 修正測試斷言符合 ADR-075 中文風險等級
CD Pipeline / build-and-deploy (push) Failing after 1m55s
HIGH→高風險, MEDIUM→中風險 (test_sentry / test_github webhook)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:08:48 +08:00 |
|
OG T
|
efca6f816a
|
fix(nemotron): 停用 Nemotron 協作 — Ollama timeout 導致 confidence=0.0
CD Pipeline / build-and-deploy (push) Failing after 2m1s
Ollama 111 tool_call 60s×2 > asyncio.wait_for 30s
→ expert_system fallback → confidence=0.0 → 告警卡顯示規則匹配 0%
暫停 ADR-044 直到 Ollama 穩定,OpenClaw LLM 仍正常運作
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:06:27 +08:00 |
|
OG T
|
9c8dde0951
|
fix(telegram): 修復 Incident 無 title 欄位導致所有 Telegram 推送失敗
CD Pipeline / build-and-deploy (push) Failing after 2m3s
根因: _push_decision_to_telegram() 有兩處引用 incident.title,
但 Incident model 從來沒有此欄位,導致所有告警卡片推送都
拋 AttributeError,事件在 telegram_decision_push_failed 靜默失敗。
修法:
- line 188: message 改用 signal annotation summary/description/alert_name
- line 249: TYPE-1 title 改用 alertname label / signal.alert_name
影響: 自從 decision_manager 加入這兩行以來,所有 Telegram 通知都沒發出
(包含 TYPE-1 資訊通知和 TYPE-3 審批卡)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:02:55 +08:00 |
|
OG T
|
3d8b0e4f90
|
fix(adr075): TYPE-3 格式改用 spec 模板 — ACTION REQUIRED + AI深度診斷 + 建議修復動作
CD Pipeline / build-and-deploy (push) Failing after 2m15s
- 標頭改為 "{emoji} ACTION REQUIRED | {severity_zh}"
- 新增 "🧠 AI 深度診斷" 區塊 (分析/責任/AI來源)
- 新增 "⚡ 建議修復動作" 區塊 (<code> 格式)
- confidence=0 顯示 "📋 規則分析" 取代誤導性 "🔴 0%"
- SignOz 指標區塊補回 Trace 連結
2026-04-12 ogt: ADR-075 TYPE-3 格式標準化
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:00:28 +08:00 |
|
OG T
|
a7f2b9c0f5
|
fix(display): 規則匹配改顯示 ✅ 取代 🔴 0% + 修復 LLM 字串 confidence 解析
CD Pipeline / build-and-deploy (push) Has been cancelled
- telegram_gateway.py: confidence==0 (規則匹配/Expert fallback) 不再顯示
「🔴 0%」,改顯示「⚙️ 規則匹配 ✅」,兩個 card 類型都修正
- openclaw.py: NIM/Ollama 有時回傳字串 "0.85" 而非 float,導致
isinstance(str, int|float)=False → confidence 被強制設 0.0。
現在先嘗試 float() 解析,解析失敗才 fallback 0.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:50:53 +08:00 |
|
AWOOOI CD
|
f64393e4cb
|
chore(cd): deploy eda0cfd [skip ci]
|
2026-04-12 12:30:49 +00:00 |
|
OG T
|
eda0cfd034
|
fix(adr075): drift 通知改用 send_drift_card,補齊所有呼叫點
CD Pipeline / build-and-deploy (push) Successful in 14m13s
- drift.py: 移除死碼 send_text(),改由 narrate_and_notify() 統一發卡片
- drift_narrator_service: _send_telegram() 改呼 send_drift_card() 帶四顆按鈕
- webhooks.py /alerts 路徑: 補傳 alert_category 啟用動態按鈕
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:20:47 +08:00 |
|
AWOOOI CD
|
f4675872f9
|
chore(cd): deploy c3fea26 [skip ci]
|
2026-04-12 12:17:06 +00:00 |
|
OG T
|
c3fea26222
|
fix(adr075): webhooks send_approval_card 補傳 alert_category+notification_type
CD Pipeline / build-and-deploy (push) Has been cancelled
斷點真正根因:_push_to_telegram_background 呼叫 send_approval_card()
時沒有傳入 alert_category 和 notification_type,導致動態按鈕永遠
fallback 到通用 [批准][拒絕][靜默]。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:07:12 +08:00 |
|
OG T
|
0a4b7e9609
|
fix(classify): HostBackupFailed 精確補入 backup/TYPE-1(測試通過)
CD Pipeline / build-and-deploy (push) Has been cancelled
前次修法用 'backup' in alertname_lower 太寬,導致 BackupJobFailed warning
被分到 TYPE-1,破壞 test_backup_keyword_warning_not_type1。
改為精確白名單:
_BACKUP_TYPE1_NAMES = {HostBackupFailed, HostBackupStale, HostBackupMissing,
BackupRestoreTestFailed, BackupRestoreTestStale}
+ alertname.startswith('HostBackup') 兜底
結果:664 passed, 0 failed
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:03:46 +08:00 |
|
OG T
|
f25d82a88a
|
fix(adr075): 修補斷點E — _push_to_telegram_background 補 TYPE-8M routing
CD Pipeline / build-and-deploy (push) Has been cancelled
斷點E:alertmanager webhook 走 _push_to_telegram_background,
未含 TYPE-8M branch,導致 meta alert 從未送出。
- webhooks.py: 新增 alert_category 參數 + TYPE-8M branch
- incident_service.py: 還原 rule 5 僅攔 watchdog/heartbeat,
移除誤加的 backup startswith 規則(VeleroBackup 由 K8s rule 接管)
Tests: 52/52 passed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:01:51 +08:00 |
|
OG T
|
1f7975170a
|
fix(classify): HostBackupFailed 補入 backup/TYPE-1 規則
CD Pipeline / build-and-deploy (push) Failing after 1m51s
classify_alert_early() 的 backup 規則只攔 watchdog/Heartbeat,
HostBackupFailed 先被 Host prefix 規則攔走 → host_resource/TYPE-3 → 跑 LLM → 審批卡。
修法:在 Host prefix 前新增 backup 關鍵字/前綴攔截:
- HostBackup* / Backup* / VeleroBackup* / BackupRestore*
- alertname 含 "backup"(大小寫不敏感)
影響:所有備份相關告警直接走 TYPE-1 info 通知,不進 LLM。
HostHighCpu / HostDown 等非備份的 Host 告警不受影響。
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:52:05 +08:00 |
|
OG T
|
a5f17cea79
|
fix(notification): TYPE-1 backup/info 告警不再發審批卡
CD Pipeline / build-and-deploy (push) Has been cancelled
classify_notification() 不知道 alert_category,對 backup 告警
(confidence=0, auto_executed=False)返回 TYPE-3,覆蓋掉
classify_alert_early() 已設好的 notification_type=TYPE-1。
修法:在路由分支前,讓 incident.notification_type 明確值
(TYPE-1 / TYPE-4D / TYPE-8M)覆蓋 classify_notification()。
影響:backup/info/watchdog 告警只發 send_info_notification(),
不再噴帶按鈕的審批卡到 Telegram。
2026-04-12 ogt (ADR-075 bugfix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:49:31 +08:00 |
|
AWOOOI CD
|
6490c6a885
|
chore(cd): deploy e5791b9 [skip ci]
|
2026-04-12 11:34:56 +00:00 |
|
OG T
|
e5791b9a91
|
perf(cd): 恢復 CACHE_BUST 方案,還原 5m50s Web build
CD Pipeline / build-and-deploy (push) Successful in 16m2s
實測結果:
- --no-cache: 10m50s(最慢)
- buildx registry cache: 不相容(docker driver 限制)
- CACHE_BUST=git_sha + inline cache: 5m50s(最快且安全)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:23:50 +08:00 |
|
OG T
|
7f3e585d6d
|
fix(webhooks): alertmanager handler — alert_type 超範圍改為 custom
CD Pipeline / build-and-deploy (push) Has been cancelled
AlertPayload.alert_type 只接受 8 個 Literal 值
ALERTNAME_TO_TYPE 映射回傳 host_cpu/backup_failure 等不在白名單 → ValidationError
修法:凡不在 Literal 白名單的 alert_type 一律 fallback 為 "custom"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:22:35 +08:00 |
|
OG T
|
edb97fd29b
|
fix(monitoring): 補回 4 個僅存於主機的 Prometheus 規則群組
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 41s
deploy-alerts.sh 部署時覆寫了這 4 個從未進 repo 的群組:
- awoooi_flywheel_health (5條:Playbook/Success/Vectorization/NullRate/Stuck)
- awoooi_backup_restore (2條:RestoreTestFailed/TestStale)
- awoooi_infrastructure_detailed (3條:Container/RedisStream/DiskGrowth)
- awoooi_host_connectivity (1條:NetworkPartition)
從 /home/wooo/monitoring/alerts.yml.bak_20260412_183835 還原。
offset PromQL 已修正為各個 selector 上,而非整個表達式。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:14:39 +08:00 |
|
OG T
|
5fe049de55
|
fix(backfill): 補充 ADR-075 三種新分類 (secops/flywheel_health/business)
_classify_alert() 與 classify_alert_early() 規則對齊,
確保回填腳本正確分類存量 incidents。
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:13:07 +08:00 |
|
OG T
|
bc2665ef6b
|
feat(adr075): Step-5 decision_manager TYPE-5S/TYPE-6B 路由分支
CD Pipeline / build-and-deploy (push) Has been cancelled
- 新增 secops elif:alert_category=secops → send_secops_card()
(resource, threat_behavior 從 incident.signals labels 提取)
- 新增 business elif:alert_category=business → send_business_alert()
(metric_name/current_value/threshold 從 Prometheus labels 提取)
- TYPE-7E escalation_monitor 標記 out-of-scope (ADR-075 範疇外)
- 兩分支均加 2026-04-12 ogt (ADR-075 Step-5) 變更標記
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:12:35 +08:00 |
|
AWOOOI CD
|
9f264ebad1
|
chore(cd): deploy e89d878 [skip ci]
|
2026-04-12 11:07:02 +00:00 |
|
OG T
|
f52dc459e6
|
feat(adr075): Step4 新增4組Prometheus規則 secops/business/flywheel_meta
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 41s
新增規則群組:
- awoooi_secops_alerts: UnauthorizedSSHLogin (5min>10次失敗)
- awoooi_business_alerts: AITokenCostSpike + GeminiAPIErrorRateHigh
- awoooi_flywheel_meta_alerts:
FlywheelPlaybookZero / FlywheelExecutionSuccessLow
FlywheelKMVectorizationLow / FlywheelIncidentsStuck
飛輪 meta 規則依賴 ADR-074 Exporter 指標
secops/business 規則依賴 node_exporter/awoooi custom metrics
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:51:23 +08:00 |
|
OG T
|
e89d878e06
|
fix(cd): 還原 Web build --no-cache,移除不相容的 buildx registry cache
CD Pipeline / build-and-deploy (push) Successful in 20m24s
buildx --cache-to type=registry + --output type=docker 在 docker driver 不支援
Web bundle 禁止快取(ADR-045/feedback_docker_buildkit_cache_poisoning)
快取毒化風險遠高於速度損失
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:51:15 +08:00 |
|
OG T
|
24c1b5677b
|
feat(adr075): Step1-3 classify補丁+新按鈕+TYPE-5S/6B/7E格式函數
Step-1 incident_service.py classify_alert_early():
- 新增 secops (TYPE-5S): UnauthorizedSSH/KubeAudit/CVE/WAFAttack/PodAbnormal
- 新增 business (TYPE-6B): AITokenCost/GeminiAPIError/SLOBurn/MomoScraper
- 新增 flywheel_health MCPProvider/OllamaDown/NemotronDown 前綴
- ssl_cert: 依 days_remaining 決定 TYPE-1(≥14d) vs TYPE-3(<14d)
Step-2 telegram_gateway.py _build_inline_keyboard():
- 新增 secops: [隔離] [封鎖IP] [驅逐] [確認授權]
- 新增 business: [暫停1h] [查SignOz] [忽略]
- 新增 flywheel_health: [觸發診斷] [飛輪面板] [靜默]
Step-3 telegram_gateway.py 新增格式化函數 (Tier 2):
- send_secops_card() — TYPE-5S 防禦按鈕+nonce
- send_business_alert() — TYPE-6B 業務損失速率
- send_escalation_card() — TYPE-7E P0/P1 升級,發 DM+群組
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:50:37 +08:00 |
|
OG T
|
65a5220e16
|
feat(flywheel-c2-c3): C2 hasType4接真實API + C3 WebSocket指數退避重連
CD Pipeline / build-and-deploy (push) Failing after 3m41s
C2: flywheel_stats_service 加 type4_count query → API 回傳
flywheel-diagram.tsx hasType4 改由 type4Count prop 驅動(非 false)
flywheel-kpi-card.tsx 傳入 type4Count={flowData?.type4_count}
C3: WebSocket onclose 加指數退避重連 (1s→2s→4s→最大30s)
cancelled 旗標確保 unmount 後不重連
wsRetryTimer 加入 cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:45:40 +08:00 |
|
OG T
|
079d0e89b9
|
docs(adr-075): 加入實作記錄 + LOGBOOK 更新(Phase 1+2+CR 全完成)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:44:57 +08:00 |
|
OG T
|
1cb654cf59
|
fix(adr-075): CR P0/P1 修補 — TYPE_8M enum + 死碼清理 + docstring 更新
CD Pipeline / build-and-deploy (push) Has been cancelled
P0-2: NotificationType 新增 TYPE_8M = "TYPE-8M"
classify_notification 早期回傳 TYPE-8M
decision_manager 改用 NotificationType.TYPE_8M enum 比較(移除字串字面量)
P1-1: 移除 _CATEGORY_BUTTONS 中不可達的 alertchain_health/flywheel_health 條目
P1-4: test_classify_alert_early.py docstring 更新為 13 條規則/10 分類
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:44:12 +08:00 |
|
OG T
|
561c1d806b
|
feat(adr-075): Phase 2 — TYPE-8M 飛輪/告警鏈路健康通知格式與路由
CD Pipeline / build-and-deploy (push) Failing after 4m0s
新增 send_meta_alert() — ⚙️ META SYSTEM 卡片(觸發診斷/查看面板/靜默)
decision_manager 新增 TYPE-8M elif 分支(在 TYPE-4D 後)
_alert_category 提取提前至 if 鏈前,三個分支共用
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:39:04 +08:00 |
|
OG T
|
2cef2098d3
|
feat(adr-075): 修復 Telegram 動態按鈕 4 個斷點 + 新增 7 種告警分類
CD Pipeline / build-and-deploy (push) Has been cancelled
斷點 A: decision_manager 提取 alert_category/notification_type 傳入 send_approval_card
斷點 B: send_approval_card 新增參數並傳遞至 _build_inline_keyboard
斷點 C: 互動型通知 (TYPE-3/4/4D/8M) 禁止發 SRE 群組,防 nonce 洩漏
斷點 D: _CATEGORY_BUTTONS k8s_workload → kubernetes + 新增 6 類按鈕組
classify_alert_early 新增: alertchain_health, flywheel_health, storage,
devops_tool, external_site, ssl_cert, host_resource (從 infrastructure 分離)
Test: 52 classify + 664 total passed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:35:56 +08:00 |
|
OG T
|
db282cd0e9
|
perf(cd): Web build 加速 — buildx registry cache + turbo cache mount
CD Pipeline / build-and-deploy (push) Has been cancelled
切換 docker buildx + type=registry cache (mode=max):
- 比 inline cache 更可靠,deps/runner 層存入 Harbor web-cache:buildcache
- 移除 BUILDKIT_INLINE_CACHE=1(不再需要)
Dockerfile 補 /root/.cache/turbo mount:
- Turborepo task hash 跨 build 生效,未變動 packages 直接跳過
- 配合既有 .next/cache mount,預期節省 1-2 min
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:33:27 +08:00 |
|
AWOOOI CD
|
022b3cd7d4
|
chore(cd): deploy 7fc1e0a [skip ci]
|
2026-04-12 10:12:04 +00:00 |
|
OG T
|
7fc1e0a767
|
fix(cd): 用 jq 建 JSON 修復中文 commit message 400
CD Pipeline / build-and-deploy (push) Successful in 16m14s
python3 stdin 與 data-urlencode 兩種方式均在 runner 失敗
jq --arg 直接接收 shell 變數,正確序列化 Unicode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:02:06 +08:00 |
|
OG T
|
587d745a50
|
fix(km): 修補 KMConversionService 兩個屬性錯誤
CD Pipeline / build-and-deploy (push) Failing after 28s
- incident.title → getattr(incident, 'title', None) or alertname
(Incident model 無 title 欄位)
- km_entry.entry_id → km_entry.id
(KnowledgeEntry model 主鍵為 id 非 entry_id)
- 補跑後 KM entries 714 → 821 (+107), incidents.vectorized 全部歸零
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:52:57 +08:00 |
|
OG T
|
80cdd36b9d
|
fix(cd): 棄用 python3 JSON 序列化,改用 --data-urlencode
CD Pipeline / build-and-deploy (push) Has been cancelled
runner 容器 Python 3.10 無法正確讀含中文的 stdin
(UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5)
兩個 Notify step 統一改用 --data-urlencode text@-
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:43:51 +08:00 |
|
OG T
|
38dddcc7a2
|
fix(heartbeat): KM向量化改用raw SQL + 格式優化去除空格對齊
CD Pipeline / build-and-deploy (push) Failing after 29s
- KM vectorized 改用 raw SQL (ORM 無 embedding 欄位)
- 移除 {display:<18} 空格對齊(非等寬字體Telegram會錯位)
- 格式: Name: value 每行一項,清楚易讀
- KM向量化加狀態icon (✅ ≥90% / ⚠️ <90%)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:36:01 +08:00 |
|
OG T
|
dd1b5a4364
|
fix(cd): 修補中文 commit message 導致 Notify Pipeline 400
CD Pipeline / build-and-deploy (push) Has been cancelled
PYTHONIOENCODING=utf-8 確保 python3 stdin 正確解碼 UTF-8
影響 Notify Pipeline Start + Notify Pipeline Failure 兩個 step
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:35:00 +08:00 |
|
OG T
|
a1691c41d5
|
fix(flywheel-stats): 修補 FlywheelStatsService 三個欄位錯誤
CD Pipeline / build-and-deploy (push) Failing after 30s
- KnowledgeEntryRecord.vectorized → embedding.is_(None) (欄位不存在)
- IncidentRecord.id → IncidentRecord.incident_id (主鍵名稱)
- 修復後 /api/v1/stats/flywheel nodes 不再全部回傳 unknown
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:27:35 +08:00 |
|