OG T
|
9c8dde0951
|
fix(telegram): fix all Telegram pushes failing because Incident has no title field
CD Pipeline / build-and-deploy (push) Failing after 2m3s
Root cause: _push_decision_to_telegram() references incident.title in two places,
but the Incident model has never had this field, so every alert card push
raised AttributeError and failed silently as a telegram_decision_push_failed event.
Fix:
- line 188: message now uses the signal annotation summary/description/alert_name
- line 249: TYPE-1 title now uses the alertname label / signal.alert_name
Impact: since decision_manager gained these two lines, no Telegram notification has been sent
(including TYPE-1 info notifications and TYPE-3 approval cards)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:02:55 +08:00 |
|
OG T
|
3d8b0e4f90
|
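fix(adr075): TYPE-3 format now follows the spec template: ACTION REQUIRED + AI deep diagnosis + suggested remediation actions
CD Pipeline / build-and-deploy (push) Failing after 2m15s
- Header changed to "{emoji} ACTION REQUIRED | {severity_zh}"
- Added a "🧠 AI 深度診斷" (AI deep diagnosis) block (analysis / responsibility / AI source)
- Added a "⚡ 建議修復動作" (suggested remediation actions) block (<code> formatting)
- confidence=0 now shows "📋 規則分析" (rule-based analysis) instead of the misleading "🔴 0%"
- SignOz metrics block: restored the Trace link
2026-04-12 ogt: ADR-075 TYPE-3 format standardization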
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 21:00:28 +08:00 |
|
OG T
|
a7f2b9c0f5
|
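fix(display): rule matches now show ✅ instead of 🔴 0% + fix parsing of string confidence from the LLM
CD Pipeline / build-and-deploy (push) Has been cancelled
- telegram_gateway.py: confidence==0 (rule match / Expert fallback) no longer shows
"🔴 0%"; it now shows "⚙️ 規則匹配 ✅" (rule match), fixed for both card types
- openclaw.py: NIM/Ollama sometimes return the string "0.85" instead of a float, so the
isinstance(value, int|float) check was False and confidence was forced to 0.0.
float() parsing is now tried first, with a fallback to 0.0 only if parsing fails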
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
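The string-confidence fix described in this commit could look roughly like the sketch below. The function name parse_confidence is hypothetical and is not taken from openclaw.py; only the "try float() first, fall back to 0.0" behavior comes from the commit message.

```python
def parse_confidence(raw: object) -> float:
    """Coerce an LLM-reported confidence to float, defaulting to 0.0.

    Hypothetical helper: the real code in openclaw.py may differ.
    """
    if isinstance(raw, (int, float)):
        return float(raw)
    if isinstance(raw, str):
        try:
            # Handles values such as "0.85" returned as text.
            return float(raw.strip())
        except ValueError:
            return 0.0
    # Anything else (None, dict, list) is treated as "no confidence".
    return 0.0
```

With this shape, a numeric string keeps its value and only unparseable input collapses to 0.0.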
|
2026-04-12 20:50:53 +08:00 |
|
AWOOOI CD
|
f64393e4cb
|
chore(cd): deploy eda0cfd [skip ci]
|
2026-04-12 12:30:49 +00:00 |
|
OG T
|
eda0cfd034
|
fix(adr075): drift notifications now use send_drift_card; all call sites updated
CD Pipeline / build-and-deploy (push) Successful in 14m13s
- drift.py: removed the dead-code send_text(); narrate_and_notify() now sends the card in one place
- drift_narrator_service: _send_telegram() now calls send_drift_card() with four buttons
- webhooks.py /alerts path: now passes alert_category to enable dynamic buttons
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:20:47 +08:00 |
|
AWOOOI CD
|
f4675872f9
|
chore(cd): deploy c3fea26 [skip ci]
|
2026-04-12 12:17:06 +00:00 |
|
OG T
|
c3fea26222
|
fix(adr075): webhooks send_approval_card now passes alert_category + notification_type
CD Pipeline / build-and-deploy (push) Has been cancelled
Actual root cause of the break: _push_to_telegram_background called send_approval_card()
without passing alert_category and notification_type, so the dynamic buttons always
fell back to the generic [批准][拒絕][靜默] (Approve / Reject / Silence).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:07:12 +08:00 |
|
OG T
|
0a4b7e9609
|
fix(classify): add HostBackupFailed to backup/TYPE-1 by exact match (tests pass)
CD Pipeline / build-and-deploy (push) Has been cancelled
The previous fix used 'backup' in alertname_lower, which was too broad: BackupJobFailed warnings
were classified as TYPE-1, breaking test_backup_keyword_warning_not_type1.
Changed to an exact whitelist:
_BACKUP_TYPE1_NAMES = {HostBackupFailed, HostBackupStale, HostBackupMissing,
BackupRestoreTestFailed, BackupRestoreTestStale}
+ alertname.startswith('HostBackup') as a catch-all
Result: 664 passed, 0 failed
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
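A minimal sketch of the whitelist-plus-prefix check described in this commit. The set contents and the startswith('HostBackup') catch-all come from the message; the function name is_backup_type1 is an assumption made for illustration.

```python
_BACKUP_TYPE1_NAMES = frozenset({
    "HostBackupFailed",
    "HostBackupStale",
    "HostBackupMissing",
    "BackupRestoreTestFailed",
    "BackupRestoreTestStale",
})


def is_backup_type1(alertname: str) -> bool:
    """Return True only for known backup alerts or the HostBackup* prefix.

    Hypothetical helper name; an exact match avoids catching alerts such as
    BackupJobFailed that merely contain the word "backup".
    """
    return alertname in _BACKUP_TYPE1_NAMES or alertname.startswith("HostBackup")
```

Compared with a substring test, this keeps BackupJobFailed out of TYPE-1 while still covering future HostBackup* alert names.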
|
2026-04-12 20:03:46 +08:00 |
|
OG T
|
f25d82a88a
|
fix(adr075): patch breakpoint E: add TYPE-8M routing to _push_to_telegram_background
CD Pipeline / build-and-deploy (push) Has been cancelled
Breakpoint E: the alertmanager webhook goes through _push_to_telegram_background,
which had no TYPE-8M branch, so meta alerts were never sent.
- webhooks.py: added the alert_category parameter + a TYPE-8M branch
- incident_service.py: restored rule 5 to intercept only watchdog/heartbeat, and
removed the mistakenly added backup startswith rule (VeleroBackup is handled by the K8s rule)
Tests: 52/52 passed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 20:01:51 +08:00 |
|
OG T
|
1f7975170a
|
fix(classify): add HostBackupFailed to the backup/TYPE-1 rule
CD Pipeline / build-and-deploy (push) Failing after 1m51s
The backup rule in classify_alert_early() only intercepted watchdog/Heartbeat, so
HostBackupFailed was caught first by the Host prefix rule → host_resource/TYPE-3 → LLM run → approval card.
Fix: add a backup keyword/prefix interception before the Host prefix rule:
- HostBackup* / Backup* / VeleroBackup* / BackupRestore*
- alertname contains "backup" (case-insensitive)
Impact: all backup-related alerts go straight to a TYPE-1 info notification and skip the LLM.
Non-backup Host alerts such as HostHighCpu / HostDown are unaffected.
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:52:05 +08:00 |
|
OG T
|
a5f17cea79
|
fix(notification): TYPE-1 backup/info alerts no longer send approval cards
CD Pipeline / build-and-deploy (push) Has been cancelled
classify_notification() is unaware of alert_category, so for backup alerts
(confidence=0, auto_executed=False) it returned TYPE-3, overwriting the
notification_type=TYPE-1 already set by classify_alert_early().
Fix: before the routing branches, let an explicit incident.notification_type value
(TYPE-1 / TYPE-4D / TYPE-8M) override classify_notification().
Impact: backup/info/watchdog alerts only call send_info_notification() and
no longer push approval cards with buttons to Telegram.
2026-04-12 ogt (ADR-075 bugfix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
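The override rule in this commit can be sketched as follows. The three explicit values come from the message; the function name resolve_notification_type and its signature are hypothetical and only illustrate the precedence.

```python
from typing import Optional

# Values that, when already set on the incident, win over the computed type.
_EXPLICIT_TYPES = frozenset({"TYPE-1", "TYPE-4D", "TYPE-8M"})


def resolve_notification_type(explicit: Optional[str], computed: str) -> str:
    """Prefer the type set during early classification over the computed one.

    `explicit` stands in for incident.notification_type and `computed` for
    the result of classify_notification(); both names are assumptions.
    """
    if explicit in _EXPLICIT_TYPES:
        return explicit
    return computed
```

For a backup alert, resolve_notification_type("TYPE-1", "TYPE-3") returns "TYPE-1", so the approval-card branch is never reached.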
|
2026-04-12 19:49:31 +08:00 |
|
AWOOOI CD
|
6490c6a885
|
chore(cd): deploy e5791b9 [skip ci]
|
2026-04-12 11:34:56 +00:00 |
|
OG T
|
e5791b9a91
|
perf(cd): restore the CACHE_BUST approach, bringing the Web build back to 5m50s
CD Pipeline / build-and-deploy (push) Successful in 16m2s
Measured results:
- --no-cache: 10m50s (slowest)
- buildx registry cache: incompatible (docker driver limitation)
- CACHE_BUST=git_sha + inline cache: 5m50s (fastest and safe)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:23:50 +08:00 |
|
OG T
|
7f3e585d6d
|
fix(webhooks): alertmanager handler: out-of-range alert_type now becomes custom
CD Pipeline / build-and-deploy (push) Has been cancelled
AlertPayload.alert_type accepts only 8 Literal values
The ALERTNAME_TO_TYPE mapping returns values such as host_cpu/backup_failure that are not in the whitelist → ValidationError
Fix: any alert_type outside the Literal whitelist falls back to "custom"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
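One way to express the fallback described in this commit is sketched below. The Literal members shown here are placeholders: the commit says there are 8 allowed values but does not list them, so everything except "custom" is hypothetical, as is the helper name.

```python
from typing import Literal, get_args

# Placeholder whitelist; the real AlertPayload.alert_type has 8 values
# that are not listed in the commit message.
AlertType = Literal["cpu", "memory", "disk", "custom"]
_ALLOWED_ALERT_TYPES = frozenset(get_args(AlertType))


def normalize_alert_type(value: str) -> str:
    """Map any value outside the Literal whitelist to "custom"."""
    return value if value in _ALLOWED_ALERT_TYPES else "custom"
```

Deriving the allowed set from the Literal with get_args keeps the whitelist and the type annotation from drifting apart.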
|
2026-04-12 19:22:35 +08:00 |
|
OG T
|
edb97fd29b
|
fix(monitoring): restore 4 Prometheus rule groups that existed only on the host
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 41s
During deployment, deploy-alerts.sh overwrote these 4 groups, which had never been committed to the repo:
- awoooi_flywheel_health (5 rules: Playbook/Success/Vectorization/NullRate/Stuck)
- awoooi_backup_restore (2 rules: RestoreTestFailed/TestStale)
- awoooi_infrastructure_detailed (3 rules: Container/RedisStream/DiskGrowth)
- awoooi_host_connectivity (1 rule: NetworkPartition)
Restored from /home/wooo/monitoring/alerts.yml.bak_20260412_183835.
The PromQL offset modifier is now applied to each selector rather than to the whole expression.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:14:39 +08:00 |
|
OG T
|
5fe049de55
|
fix(backfill): add the three new ADR-075 categories (secops/flywheel_health/business)
_classify_alert() is now aligned with the classify_alert_early() rules,
so the backfill script classifies existing incidents correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:13:07 +08:00 |
|
OG T
|
bc2665ef6b
|
feat(adr075): Step-5 decision_manager TYPE-5S/TYPE-6B routing branches
CD Pipeline / build-and-deploy (push) Has been cancelled
- Added a secops elif: alert_category=secops → send_secops_card()
(resource and threat_behavior are extracted from incident.signals labels)
- Added a business elif: alert_category=business → send_business_alert()
(metric_name/current_value/threshold are extracted from Prometheus labels)
- TYPE-7E escalation_monitor marked out-of-scope (outside ADR-075)
- Both branches carry the 2026-04-12 ogt (ADR-075 Step-5) change marker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 19:12:35 +08:00 |
|
AWOOOI CD
|
9f264ebad1
|
chore(cd): deploy e89d878 [skip ci]
|
2026-04-12 11:07:02 +00:00 |
|
OG T
|
f52dc459e6
|
feat(adr075): Step4 adds 4 groups of Prometheus rules: secops/business/flywheel_meta
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 41s
New rule groups:
- awoooi_secops_alerts: UnauthorizedSSHLogin (more than 10 failures in 5 min)
- awoooi_business_alerts: AITokenCostSpike + GeminiAPIErrorRateHigh
- awoooi_flywheel_meta_alerts:
FlywheelPlaybookZero / FlywheelExecutionSuccessLow
FlywheelKMVectorizationLow / FlywheelIncidentsStuck
The flywheel meta rules depend on ADR-074 Exporter metrics
The secops/business rules depend on node_exporter/awoooi custom metrics
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:51:23 +08:00 |
|
OG T
|
e89d878e06
|
fix(cd): restore Web build --no-cache and remove the incompatible buildx registry cache
CD Pipeline / build-and-deploy (push) Successful in 20m24s
buildx --cache-to type=registry + --output type=docker is not supported on the docker driver
Caching is forbidden for the Web bundle (ADR-045/feedback_docker_buildkit_cache_poisoning)
The cache-poisoning risk far outweighs the speed loss
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:51:15 +08:00 |
|
OG T
|
24c1b5677b
|
feat(adr075): Step1-3 classify patch + new buttons + TYPE-5S/6B/7E formatting functions
Step-1 incident_service.py classify_alert_early():
- Added secops (TYPE-5S): UnauthorizedSSH/KubeAudit/CVE/WAFAttack/PodAbnormal
- Added business (TYPE-6B): AITokenCost/GeminiAPIError/SLOBurn/MomoScraper
- Added flywheel_health prefixes MCPProvider/OllamaDown/NemotronDown
- ssl_cert: days_remaining decides TYPE-1 (≥14d) vs TYPE-3 (<14d)
Step-2 telegram_gateway.py _build_inline_keyboard():
- Added secops: [隔離] [封鎖IP] [驅逐] [確認授權] (Isolate / Block IP / Evict / Confirm authorization)
- Added business: [暫停1h] [查SignOz] [忽略] (Pause 1h / Check SignOz / Ignore)
- Added flywheel_health: [觸發診斷] [飛輪面板] [靜默] (Trigger diagnosis / Flywheel dashboard / Silence)
Step-3 telegram_gateway.py new formatting functions (Tier 2):
- send_secops_card(): TYPE-5S defensive buttons + nonce
- send_business_alert(): TYPE-6B business loss rate
- send_escalation_card(): TYPE-7E P0/P1 escalation, sent to DM + group
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:50:37 +08:00 |
|
OG T
|
65a5220e16
|
feat(flywheel-c2-c3): C2 hasType4 wired to the real API + C3 WebSocket exponential-backoff reconnect
CD Pipeline / build-and-deploy (push) Failing after 3m41s
C2: flywheel_stats_service adds a type4_count query → returned by the API
flywheel-diagram.tsx hasType4 is now driven by the type4Count prop (no longer hard-coded false)
flywheel-kpi-card.tsx passes type4Count={flowData?.type4_count}
C3: WebSocket onclose adds exponential-backoff reconnect (1s→2s→4s→max 30s)
the cancelled flag ensures no reconnect after unmount
wsRetryTimer added to cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
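The 1s→2s→4s→max 30s schedule in C3 can be illustrated with a small delay function. The actual change lives in the frontend component; this Python sketch only shows the arithmetic, and the name reconnect_delay and its parameters are assumptions.

```python
def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay in seconds before reconnect number `attempt` (0-based).

    Doubles on each attempt and is clamped to `cap`. The exponent is bounded
    so very large attempt counts cannot overflow the float multiplication.
    """
    exponent = min(max(attempt, 0), 16)
    return min(cap, base * (2 ** exponent))
```

The first six attempts yield 1, 2, 4, 8, 16 and 30 seconds; every later attempt stays at the 30-second cap.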
|
2026-04-12 18:45:40 +08:00 |
|
OG T
|
079d0e89b9
|
docs(adr-075): add implementation notes + LOGBOOK update (Phase 1+2+CR all complete)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:44:57 +08:00 |
|
OG T
|
1cb654cf59
|
fix(adr-075): CR P0/P1 patches: TYPE_8M enum + dead-code cleanup + docstring update
CD Pipeline / build-and-deploy (push) Has been cancelled
P0-2: NotificationType adds TYPE_8M = "TYPE-8M"
classify_notification returns TYPE-8M early
decision_manager now compares against the NotificationType.TYPE_8M enum (string literal removed)
P1-1: removed the unreachable alertchain_health/flywheel_health entries from _CATEGORY_BUTTONS
P1-4: test_classify_alert_early.py docstring updated to 13 rules / 10 categories
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:44:12 +08:00 |
|
OG T
|
561c1d806b
|
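feat(adr-075): Phase 2: TYPE-8M flywheel / alert-chain health notification format and routing
CD Pipeline / build-and-deploy (push) Failing after 4m0s
Added send_meta_alert(): ⚙️ META SYSTEM card (trigger diagnosis / view dashboard / silence)
decision_manager adds a TYPE-8M elif branch (after TYPE-4D)
_alert_category extraction moved ahead of the if chain and is shared by the three branches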
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:39:04 +08:00 |
|
OG T
|
2cef2098d3
|
feat(adr-075): fix 4 breakpoints in Telegram dynamic buttons + add 7 alert categories
CD Pipeline / build-and-deploy (push) Has been cancelled
Breakpoint A: decision_manager extracts alert_category/notification_type and passes them to send_approval_card
Breakpoint B: send_approval_card gains the new parameters and forwards them to _build_inline_keyboard
Breakpoint C: interactive notifications (TYPE-3/4/4D/8M) must not be sent to the SRE group, to prevent nonce leakage
Breakpoint D: _CATEGORY_BUTTONS k8s_workload → kubernetes + 6 new button groups
classify_alert_early adds: alertchain_health, flywheel_health, storage,
devops_tool, external_site, ssl_cert, host_resource (split out of infrastructure)
Test: 52 classify + 664 total passed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:35:56 +08:00 |
|
OG T
|
db282cd0e9
|
perf(cd): faster Web build: buildx registry cache + turbo cache mount
CD Pipeline / build-and-deploy (push) Has been cancelled
Switched to docker buildx + type=registry cache (mode=max):
- More reliable than inline cache; deps/runner layers are stored in Harbor web-cache:buildcache
- Removed BUILDKIT_INLINE_CACHE=1 (no longer needed)
Dockerfile adds a /root/.cache/turbo mount:
- Turborepo task hashes persist across builds, so unchanged packages are skipped
- Combined with the existing .next/cache mount, expected to save 1-2 min
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:33:27 +08:00 |
|
AWOOOI CD
|
022b3cd7d4
|
chore(cd): deploy 7fc1e0a [skip ci]
|
2026-04-12 10:12:04 +00:00 |
|
OG T
|
7fc1e0a767
|
fix(cd): build JSON with jq to fix the 400 caused by Chinese commit messages
CD Pipeline / build-and-deploy (push) Successful in 16m14s
Both the python3 stdin and data-urlencode approaches failed on the runner
jq --arg takes the shell variable directly and serializes Unicode correctly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:02:06 +08:00 |
|
OG T
|
587d745a50
|
fix(km): fix two attribute errors in KMConversionService
CD Pipeline / build-and-deploy (push) Failing after 28s
- incident.title → getattr(incident, 'title', None) or alertname
(the Incident model has no title field)
- km_entry.entry_id → km_entry.id
(the KnowledgeEntry model's primary key is id, not entry_id)
- After the re-run, KM entries went from 714 to 821 (+107), and incidents.vectorized all went back to zero
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
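The getattr fallback from the first bullet can be exercised in isolation. In this sketch, SimpleNamespace stands in for the Incident model and the wrapper name incident_title is hypothetical; only the getattr(...) or alertname expression comes from the commit.

```python
from types import SimpleNamespace


def incident_title(incident: object, alertname: str) -> str:
    """Use incident.title when present and non-empty, else the alertname."""
    return getattr(incident, "title", None) or alertname


# Stand-in objects for illustration; the real Incident model is not shown here.
without_title = SimpleNamespace(incident_id="inc-1")
with_title = SimpleNamespace(incident_id="inc-2", title="Disk almost full")
```

Because of the trailing `or`, an empty-string title also falls back to the alertname, not only a missing attribute.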
|
2026-04-12 17:52:57 +08:00 |
|
OG T
|
80cdd36b9d
|
fix(cd): drop python3 JSON serialization in favor of --data-urlencode
CD Pipeline / build-and-deploy (push) Has been cancelled
The runner container's Python 3.10 cannot correctly read stdin containing Chinese
(UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5)
Both Notify steps now use --data-urlencode text@-
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:43:51 +08:00 |
|
OG T
|
38dddcc7a2
|
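fix(heartbeat): KM vectorization now uses raw SQL + formatting cleanup removing space alignment
CD Pipeline / build-and-deploy (push) Failing after 29s
- KM vectorized now uses raw SQL (the ORM has no embedding column)
- Removed {display:<18} space alignment (it misaligns in Telegram's proportional font)
- Format: Name: value, one item per line, clear and readable
- KM vectorization gains a status icon (✅ ≥90% / ⚠️ <90%)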
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:36:01 +08:00 |
|
OG T
|
dd1b5a4364
|
fix(cd): fix the Notify Pipeline 400 caused by Chinese commit messages
CD Pipeline / build-and-deploy (push) Has been cancelled
PYTHONIOENCODING=utf-8 ensures python3 decodes stdin as UTF-8
Affects both the Notify Pipeline Start and Notify Pipeline Failure steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:35:00 +08:00 |
|
OG T
|
a1691c41d5
|
fix(flywheel-stats): fix three field errors in FlywheelStatsService
CD Pipeline / build-and-deploy (push) Failing after 30s
- KnowledgeEntryRecord.vectorized → embedding.is_(None) (the column does not exist)
- IncidentRecord.id → IncidentRecord.incident_id (primary key name)
- After the fix, /api/v1/stats/flywheel nodes no longer all return unknown
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:27:35 +08:00 |
|
AWOOOI CD
|
295869d6c7
|
chore(cd): deploy 99b489c [skip ci]
|
2026-04-12 09:25:11 +00:00 |
|
OG T
|
99b489ca63
|
fix(flywheel): fix the remaining P0/P1 defects
CD Pipeline / build-and-deploy (push) Has been cancelled
- CRITICAL-1: TYPE-1 path approval_id=str(alert_id) → uuid.uuid4(),
preventing UUID(approval_id) from raising ValueError and crashing all Heartbeat/Info alerts
- CRITICAL-2: the asyncio.create_task() result is stored in _exec_task with a done_callback,
preventing the GC from collecting the task mid-execution
- FORMAT: _push_to_telegram_background gains notification_type + diff_summary parameters;
TYPE-4D → send_drift_card(), everything else → send_approval_card() (fixes ConfigDrift showing the wrong card)
- notification_type is passed through at both Alertmanager call sites
Final wrap-up of the ADR-073 four-breakpoint fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:14:57 +08:00 |
|
AWOOOI CD
|
cce55d560d
|
chore(cd): deploy f0e1413 [skip ci]
|
2026-04-12 09:10:35 +00:00 |
|
OG T
|
f0e14136ca
|
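fix(flywheel): fix the flywheel's four core breakpoints so the full flow is actually connected end to end
CD Pipeline / build-and-deploy (push) Has been cancelled
1. incident_service.py: save_to_episodic_memory() now writes alertname/notification_type/alert_category
→ previously these 3 columns were always NULL in the DB, the LLM had no alertname, and every Playbook match failed
2. telegram_gateway.py: execute_approved_action() is now called after a Telegram approval
→ previously sign_approval() only changed DB state; of 380 approvals, 0 actually ran a kubectl command
3. approval_execution.py: resolve_incident() is called after successful execution
webhooks.py: resolve_incident() is called after a successful auto-repair
→ previously Incidents stayed in INVESTIGATING forever, KM conversion never triggered, Playbook=0
4. webhooks.py: TYPE-1 alerts short-circuit and skip the LLM
→ previously Heartbeat/Backup/Info still burned LLM tokens and produced junk remediation suggestions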
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 17:01:10 +08:00 |
|
AWOOOI CD
|
d2286ca827
|
chore(cd): deploy 93f9522 [skip ci]
|
2026-04-12 08:42:45 +00:00 |
|
OG T
|
93f9522d5a
|
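fix(heartbeat): align sends to the interval boundary so replicas do not each send + KM vectorization now queries the embedding column
CD Pipeline / build-and-deploy (push) Successful in 14m10s
- _heartbeat_loop: sleep until the next whole-interval boundary before starting the loop,
so that 3 replicas with different start times do not produce several heartbeats within a short window
- heartbeat_report_service: km_vectorized now queries KnowledgeEntryRecord.embedding IS NOT NULL;
it previously queried IncidentRecord.vectorized by mistake, which displayed 0/714 (0%)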
2026-04-12 ogt (ADR-073 heartbeat fix)
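The boundary alignment in _heartbeat_loop amounts to computing how long to sleep until the next multiple of the interval. A minimal sketch, assuming Unix timestamps in seconds; the function name seconds_until_next_boundary is hypothetical.

```python
def seconds_until_next_boundary(now: float, interval: float) -> float:
    """Seconds to sleep so the next wake-up lands on a multiple of `interval`.

    When `now` is already exactly on a boundary, this waits a full interval
    and does not fire immediately.
    """
    return interval - (now % interval)
```

Every replica that sleeps for this duration wakes at the same wall-clock instant regardless of when it started, which is what lets a shared lock pick a single sender per cycle.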
|
2026-04-12 16:33:15 +08:00 |
|
AWOOOI CD
|
c8e9fbb518
|
chore(cd): deploy effd788 [skip ci]
|
2026-04-12 08:23:16 +00:00 |
|
OG T
|
effd78807e
|
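fix(heartbeat): blocking_timeout 5→0 so replicas do not queue for the lock, avoiding duplicate sends
CD Pipeline / build-and-deploy (push) Successful in 14m0s
3 replicas each run the loop; with blocking_timeout=5.0, once the lock was released
the other replicas acquired it in turn, so each heartbeat could be sent up to 3 times.
Changed to blocking_timeout=0: a replica that cannot get the lock skips immediately, so only one message is sent per cycle.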
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 16:13:41 +08:00 |
|
OG T
|
a28625f088
|
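fix(cr): all chief-architect CR P0/P1/P2 items patched
CD Pipeline / build-and-deploy (push) Has been cancelled
P0-1: incident_service.py: removed classify_alert_early dead code at L131-132
P0-2: cron_backup_restore_test.sh: date +%s%3N→+%s, fixing the millisecond timestamp
P1-2: gitea_webhook.py: removed sha_short from the fingerprint, so failures on the same branch collapse together
heartbeat: restored the original space-aligned format (the commander asked for it to stay exactly as it was)
P1-1 (modularization)/P1-3(TYPE-4)/P2-1(timeZone)/P2-2(IP)/P2-3 (WS reconnect) deferred to follow-up work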
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 16:10:46 +08:00 |
|
OG T
|
d72c7d5ac4
|
fix(P0): classify_alert_early parameter name corrected _labels→labels
CD Pipeline / build-and-deploy (push) Has been cancelled
webhooks.py passes labels= but the function was defined with _labels, so every
Alertmanager webhook returned 500 and the alert chain was completely down.
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
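The failure mode behind this P0 is a plain keyword-argument mismatch, which Python reports as a TypeError at call time. The two functions below are simplified stand-ins for classify_alert_early before and after the rename; their bodies and the helper call_with_labels are hypothetical.

```python
def classify_broken(alertname: str, _labels: dict) -> str:
    # Stand-in for the old signature: the parameter is named _labels.
    return "info"


def classify_fixed(alertname: str, labels: dict) -> str:
    # Stand-in for the corrected signature: the parameter is named labels.
    return "info"


def call_with_labels(fn) -> str:
    """Call `fn` the way the webhook does, using the labels= keyword."""
    try:
        return fn("Watchdog", labels={"severity": "none"})
    except TypeError as exc:
        # An unhandled TypeError inside a request handler surfaces as HTTP 500.
        return f"TypeError: {exc}"
```

A test that calls the function with the same keyword names as the production call site would catch this class of error before deployment.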
|
2026-04-12 16:02:25 +08:00 |
|
OG T
|
36f285fb85
|
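fix(heartbeat): remove space alignment and use a plain layout to avoid broken formatting in Telegram
Telegram HTML mode does not render a monospaced font, so space alignment has no effect.
Switched to an unaligned but clear format, with each line showing label + value directly.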
2026-04-12 ogt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 16:01:47 +08:00 |
|
AWOOOI CD
|
444b17513d
|
chore(cd): deploy 9b1812c [skip ci]
|
2026-04-12 07:52:09 +00:00 |
|
OG T
|
2f6859f76f
|
docs(logbook): session wrap-up: Layer 3 M3-M5 + Layer 4 C2-C4 all complete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 15:43:06 +08:00 |
|
OG T
|
9b1812cdef
|
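feat(c4): ADR-073-C C4: visualization of the flywheel's human-intervention paths
CD Pipeline / build-and-deploy (push) Successful in 14m5s
Added the FlywheelDiagram SVG component:
- Six-node flow diagram (monitor → dedupe → diagnose → reason → execute → learn)
- When TYPE-3 fires: red dashed line from reasoning → human handling center
- When TYPE-4 fires: orange dashed line from reasoning → root-cause confirmation
- Active-node highlighting + incident count badges
- Integrated into FlywheelKPICard (consumes /api/v1/stats/flywheel)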
2026-04-12 ogt (ADR-073-C C4)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 15:41:33 +08:00 |
|
OG T
|
0c2892ac19
|
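feat(c3): ADR-073-C C3: real-time flywheel push over WebSocket
Backend:
- stats.py adds @router.websocket('/flywheel/ws')
- Pushes flywheel_summary JSON every 10 seconds
Frontend FlywheelKPICard:
- WebSocket first; falls back automatically to 30s HTTP polling when the WS disconnects
- Stops HTTP polling on onopen and resumes it on onclose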
2026-04-12 ogt (ADR-073-C C3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 15:40:20 +08:00 |
|
OG T
|
4b51f9b60d
|
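feat(c2): ADR-073-C C2: frontend flywheel KPI component wired to the real API
CD Pipeline / build-and-deploy (push) Has been cancelled
- Added the FlywheelKPICard component
- Consumes GET /api/v1/stats/summary with 30-second polling
- Shows Playbooks, remediation success rate, conversions today, and KM vectorization rate
- Warning bar for stuck Incidents
- Inserted in the home page's right column after PendingApprovalsCard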
2026-04-12 ogt (ADR-073-C C2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 15:39:10 +08:00 |
|