docs(logbook): record awooop grouped alert events deploy [skip ci]
@@ -4775,3 +4775,102 @@ API log:
```
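
Assessment: Telegram noise control now has two lines of defense in place: follow-up messages for the same Incident are threaded back onto the original card, and alerts in the same group across Incidents are converged starting from the second firing. The next step is to write the summary and count of grouped child alerts into the AwoooP Timeline / Run Monitor, so that Telegram is not flooded while the Console still keeps the full context.
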
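### 01:42 AwoooP converged-event persistence and Run monitor display

**Background**:

- After the previous round set the AlertGrouping threshold to 2, the second alert in the same group short-circuits the LLM / Telegram path, which solves the flooding and token-cost problems.
- However, that branch originally only wrote an `alertmanager_grouped_skip` log, so the Operator Console could not show which child alerts had been merged; Telegram went quiet but the frontend lost context.
- This round adds a control-plane record that skips Telegram but still lands in the AwoooP event stream.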
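
The threshold behaviour in the first bullet can be illustrated with a minimal sketch. It assumes an in-memory counter per group key; `AlertGrouper`, `GroupingResult.group_count`, and `observe()` are hypothetical names for illustration only (the log itself only confirms a `grouping_result.is_grouped` flag), and the real AlertGrouping service may well work differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GroupingResult:
    is_grouped: bool  # True when the alert should be converged (no LLM / Telegram)
    group_count: int  # how many firing alerts this group has seen so far


class AlertGrouper:
    """Hypothetical in-memory grouper, not the project's actual service."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self._counts = defaultdict(int)

    def observe(self, group_key: str) -> GroupingResult:
        # Count this firing, then converge once the group reaches the threshold.
        self._counts[group_key] += 1
        count = self._counts[group_key]
        return GroupingResult(is_grouped=count >= self.threshold, group_count=count)
```

With a threshold of 2, the first alert of a group goes through the normal path and every later alert in that group comes back with `is_grouped=True`.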

**Changes**:

- `channel_hub.py` adds grouped alert event helpers:
  - `build_grouped_alert_provider_event_id()`: generates the idempotent ID `alert-group:{alert_id}:{fingerprint}`.
  - `format_grouped_alert_event_content()`: assembles alertname, severity, namespace, target, group count, and parent/child fingerprint.
  - `record_grouped_alert_event()`: writes to `awooop_conversation_event` with `channel_type=internal`; a DB failure fails open and does not block the Alertmanager ACK.
- `webhooks.py`: in the `grouping_result.is_grouped` branch, persists the event in the background via `background_tasks.add_task()`, still returns `converged=True` immediately, and does not enter the LLM or send to Telegram.
- The Platform API adds `GET /api/v1/platform/events/recent`, which can query recent events by `project_id`, `channel_type`, and `provider_prefix`.
- `/zh-TW/awooop/runs` adds a "最近告警收斂" (recent alert convergence) block that reads `channel_type=internal&provider_prefix=alert-group`, so grouped child alerts appear in the Operator Console rather than in Telegram.
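
The idempotent ID and the fail-open write can be sketched roughly as below. Only the function names, the ID format, `channel_type=internal`, and the `grouped_alert_event_record_failed` log key come from this entry; the signatures, the `store` callback, and the event dictionary keys are assumptions made for illustration, not the code that was deployed.

```python
import logging
from typing import Callable

logger = logging.getLogger("channel_hub_sketch")


def build_grouped_alert_provider_event_id(alert_id: str, fingerprint: str) -> str:
    # The same (alert_id, fingerprint) pair always yields the same ID, so a
    # retried webhook can be deduplicated by whatever stores the event.
    return f"alert-group:{alert_id}:{fingerprint}"


def record_grouped_alert_event(
    store: Callable[[dict], None],  # hypothetical persistence callback
    alert_id: str,
    fingerprint: str,
    content: str,
) -> bool:
    event = {
        "provider_event_id": build_grouped_alert_provider_event_id(alert_id, fingerprint),
        "channel_type": "internal",
        "content": content,
    }
    try:
        store(event)
        return True
    except Exception:
        # Fail open: log the failure and return, so the caller can still
        # acknowledge the Alertmanager webhook.
        logger.exception("grouped_alert_event_record_failed")
        return False
```

Returning a boolean instead of raising keeps the webhook response independent of the database, which matches the fail-open intent described above.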

**Verification**:

```text
py_compile:
  apps/api/src/services/channel_hub.py
  apps/api/src/api/v1/webhooks.py
  apps/api/src/services/platform_operator_service.py
  apps/api/src/api/v1/platform/events.py
  apps/api/src/api/v1/platform/__init__.py
  apps/api/tests/test_channel_hub_grouped_alert_events.py
  apps/api/tests/test_platform_router_order.py
  # passed

pytest:
  DATABASE_URL='postgresql+asyncpg://test:test@127.0.0.1:5432/test' \
  /Users/ogt/awoooi/apps/api/.venv/bin/python -m pytest \
    apps/api/tests/test_channel_hub_grouped_alert_events.py \
    apps/api/tests/test_platform_router_order.py \
    apps/api/tests/test_alert_grouping_service.py -q
  # 20 passed

ruff import order:
  channel_hub.py / platform_operator_service.py / platform/events.py /
  platform/__init__.py / grouped alert tests / platform router tests
  # All checks passed

frontend:
  pnpm --filter @awoooi/web typecheck
  # passed

  NEXT_PUBLIC_API_URL='https://awoooi.wooo.work' pnpm --filter @awoooi/web build
  # passed
```

**Production deployment**:

```text
Commit:
  251554c0 fix(awooop): record grouped alert events

Gitea workflows:
  1843 CD Pipeline:
    - tests -> success
    - build-and-deploy -> success
    - post-deploy-checks -> success
  1844 Code Review -> success

CD deploy marker:
  e5fd9395 chore(cd): deploy 251554c [skip ci]

awoooi-api image:
  192.168.0.110:5000/awoooi/api:251554c0440f0b6c0f2668dcee7780495c873c57

awoooi-web image:
  192.168.0.110:5000/awoooi/web:251554c0440f0b6c0f2668dcee7780495c873c57

awoooi-worker image:
  192.168.0.110:5000/awoooi/api:251554c0440f0b6c0f2668dcee7780495c873c57

K8s rollout:
  awoooi-api -> successfully rolled out
  awoooi-web -> successfully rolled out
  awoooi-worker -> successfully rolled out
  awoooi-api pods -> 2/2 Running, 0 restarts
  awoooi-web pods -> 2/2 Running, 0 restarts
  awoooi-worker pod -> 1/1 Running, 0 restarts

HTTP:
  /api/v1/health -> 200
  /zh-TW/awooop -> 200, no Application error
  /zh-TW/awooop/runs -> 200, no Application error, contains 最近告警收斂
  /zh-TW/awooop/approvals -> 200, no Application error
  /api/v1/platform/events/recent?channel_type=internal&provider_prefix=alert-group&limit=1
    -> 200, {"events": [], "total": 0, "limit": 1}

API log:
  No grouped_alert_event_record_failed / Traceback seen in the last 10 minutes.
```
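
Assessment: the third layer of Telegram noise control is now live. From the second alert onward, alerts in the same group are converged: they no longer send a Telegram main card, and they enter AwoooP Run monitoring as internal channel events. If the group still feels cluttered, the next step should be a periodically updated summary on the parent card or a war-room thread digest, not a return to sending every child alert.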
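
For reference, the query that the Run monitor block issues against `events/recent` can be approximated by the filter below. This is a sketch under assumptions: the `Event` fields, the newest-first ordering, and the meaning of `total` (count of matches before the limit is applied) are guesses; only the three filter parameters and the `events` / `total` / `limit` response keys appear in this entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Field names are hypothetical; the real table is awooop_conversation_event.
    project_id: str
    channel_type: str
    provider_event_id: str
    created_at: float  # epoch seconds


def recent_events(
    events: list,
    project_id: Optional[str] = None,
    channel_type: Optional[str] = None,
    provider_prefix: Optional[str] = None,
    limit: int = 20,
) -> dict:
    # Each filter is optional; an omitted filter matches every event.
    matched = [
        e for e in events
        if (project_id is None or e.project_id == project_id)
        and (channel_type is None or e.channel_type == channel_type)
        and (provider_prefix is None or e.provider_event_id.startswith(provider_prefix))
    ]
    matched.sort(key=lambda e: e.created_at, reverse=True)  # newest first
    return {"events": matched[:limit], "total": len(matched), "limit": limit}
```

Under these assumptions, a store with no grouped events yet returns the same empty shape as the post-deploy check, and the first converged child alert would show up as a single `alert-group:` entry.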