docs(logbook): record grouped alert digest deploy [skip ci]
@@ -4874,3 +4874,94 @@ API log:
```
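
Assessment: The third layer of Telegram noise control is live. From the second alert in a group onward, alerts are suppressed: they no longer produce a Telegram main card and instead enter AwoooP Run monitoring as internal channel events. If the group chat still feels cluttered, the next step should be either a periodically updated summary on the parent card or a war-room thread digest, not a return to sending every child alert.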

### 01:59 Parent-card digest: grouped alerts get a summary reply only, no more flooding
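
**Background**:

- The previous round stopped grouped child alerts from producing a Telegram main card and recorded them as AwoooP internal events instead.
- But recording them only in the Console leaves the Telegram war room unaware that the alert group is still active, so this round adds a low-noise digest layer.
- Principles: no new main cards, no per-child alerts, no re-entry into the LLM; only a short summary posted as a reply under the parent Incident card, guarded by a Redis cooldown.

**Changes**:

- `telegram_gateway.py` adds `append_grouped_alert_digest()`:
  - Reads `tg_msg:{incident_id}` to find the parent card's message id.
  - If the parent card is not found, it only writes a structured log and silently degrades to AwoooP-only; nothing is sent to Telegram.
  - Sets the `awoooi:tg_group_digest:{group_key}` NX cooldown only after the parent card is found, so a digest opportunity is not consumed before the parent card exists.
  - Uses Telegram `reply_parameters` to reply under the parent card without modifying the original card's inline buttons.
- `channel_hub.py` adds a grouped alert digest formatter / dispatcher:
  - The summary keeps only type, severity, target, namespace, suppressed count, and short parent/child fingerprint codes.
  - Text is HTML-escaped before it is sent to Telegram.
  - Looks up `ApprovalRecord.incident_id` via `parent_fingerprint`; if no parent Incident is found, it does not top-post.
- `alert_grouping_service.py` adds bytes decoding for Redis members, so the parent fingerprint matches the DB under different Redis client decode settings.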
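
The ordering described above (look up the parent card first, claim the cooldown second, then reply) can be sketched roughly as follows. This is a minimal illustration, not the code in `telegram_gateway.py` or `channel_hub.py`: `FakeRedis`, `as_text`, `format_digest`, `build_digest_reply`, and the 600-second `cooldown_s` default are all hypothetical, and it assumes that `tg_msg:{incident_id}` stores a bare message id. Only the two Redis key patterns, the HTML escaping, and the `reply_parameters` field come from the notes above.

```python
import html
from typing import Optional


class FakeRedis:
    """In-memory stand-in for a Redis client (illustration only, no expiry)."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.store.get(key)

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        # Mirrors redis-py: with nx=True, returns None when the key exists.
        if nx and key in self.store:
            return None
        self.store[key] = value.encode()
        return True


def as_text(value) -> str:
    """Decode Redis bytes so values compare equal under either decode setting."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def format_digest(alert_type: str, severity: str, target: str, suppressed: int) -> str:
    """Build a short HTML digest; every dynamic field is escaped first."""
    return (
        f"<b>{html.escape(alert_type)}</b> [{html.escape(severity)}]\n"
        f"target: {html.escape(target)}\n"
        f"grouped alerts suppressed: {suppressed}"
    )


def build_digest_reply(redis, chat_id, incident_id, group_key, text, cooldown_s=600):
    """Return a sendMessage payload, or None when no digest should be sent."""
    raw = redis.get(f"tg_msg:{incident_id}")
    if raw is None:
        # No parent card: skip Telegram and leave the cooldown untouched.
        return None
    # Claim the cooldown only after the parent card is known to exist.
    if not redis.set(f"awoooi:tg_group_digest:{group_key}", "1", nx=True, ex=cooldown_s):
        return None
    return {
        "chat_id": chat_id,
        "text": text,
        "parse_mode": "HTML",
        "reply_parameters": {"message_id": int(as_text(raw))},
    }
```

Checking for the parent card before issuing the NX `SET` is what keeps an early child alert from burning the cooldown window: in this sketch, a call made before the parent card exists returns `None` and writes nothing, so the first call after the card appears can still send a digest.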

**Verification**:

```text
py_compile:
  apps/api/src/services/channel_hub.py
  apps/api/src/services/telegram_gateway.py
  apps/api/src/services/alert_grouping_service.py
  # passed

pytest:
  DATABASE_URL='postgresql+asyncpg://test:test@127.0.0.1:5432/test' \
  /Users/ogt/awoooi/apps/api/.venv/bin/python -m pytest \
    apps/api/tests/test_telegram_message_templates.py \
    apps/api/tests/test_channel_hub_grouped_alert_events.py \
    apps/api/tests/test_alert_grouping_service.py -q
  # 46 passed

ruff import order:
  channel_hub.py / alert_grouping_service.py /
  test_channel_hub_grouped_alert_events.py / test_alert_grouping_service.py
  # All checks passed

note:
  telegram_gateway.py is a large pre-existing file, so the whole-file ruff
  import-order check was not run; this round verifies the narrow change
  with py_compile and the Telegram/template tests.
```

**Production deployment**:

```text
Commit:
  6ac61ab6 fix(telegram): digest grouped alert storms

Gitea workflows:
  1845 CD Pipeline:
    - tests -> success
    - build-and-deploy -> success
    - post-deploy-checks -> success
  1846 Code Review -> success

CD deploy marker:
  14180182 chore(cd): deploy 6ac61ab [skip ci]

awoooi-api image:
  192.168.0.110:5000/awoooi/api:6ac61ab6d7fa4b623799227150c1f8f0856da9f1

awoooi-web image:
  192.168.0.110:5000/awoooi/web:6ac61ab6d7fa4b623799227150c1f8f0856da9f1

awoooi-worker image:
  192.168.0.110:5000/awoooi/api:6ac61ab6d7fa4b623799227150c1f8f0856da9f1

K8s rollout:
  awoooi-api -> 2/2 ready
  awoooi-web -> 2/2 ready
  awoooi-worker -> 1/1 ready

HTTP:
  /api/v1/health -> 200
  /zh-TW/awooop/runs -> 200
  /api/v1/platform/events/recent?channel_type=internal&provider_prefix=alert-group&limit=1
    -> 200

API / worker log:
  No grouped_alert_event_record_failed / Traceback /
  telegram_request_failed / telegram_api_error seen in the last 10 minutes.
```
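
Assessment: The fourth layer of Telegram noise control is live. The alert path is now: the first parent card makes the incident visible; grouped child alerts are recorded as AwoooP events; and if the parent card exists, Telegram receives only a low-frequency digest reply. Further improvement should pick one of three options (parent-card status editing, AwoooP Run drilldown, or an hourly war-room summary) rather than adding more per-alert Telegram messages.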
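
The three-way path in the assessment above can be read as a small decision function. The sketch below is only an illustration of that reading: the `Route` enum, `route_alert`, and its boolean inputs are hypothetical names and do not correspond to identifiers in the codebase.

```python
from enum import Enum


class Route(Enum):
    PARENT_CARD = "parent_card"            # first alert of a group: new Telegram card
    EVENT_ONLY = "event_only"              # grouped child: AwoooP internal event only
    EVENT_AND_DIGEST = "event_and_digest"  # grouped child plus throttled digest reply


def route_alert(is_first_in_group: bool, parent_card_exists: bool,
                cooldown_free: bool) -> Route:
    """Pick the delivery path for one incoming alert."""
    if is_first_in_group:
        return Route.PARENT_CARD
    # Child alerts always land in AwoooP; Telegram only gets a digest reply
    # when the parent card exists and the cooldown window is open.
    if parent_card_exists and cooldown_free:
        return Route.EVENT_AND_DIGEST
    return Route.EVENT_ONLY
```

Under this reading, no branch sends a child alert as its own Telegram message, which matches the stated rule against per-alert sends.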