fix(db): allow metric capacity violation types
Some checks failed
Code Review / ai-code-review (push) Successful in 11s
run-migration / migrate (push) Failing after 9s
CD Pipeline / tests (push) Successful in 1m4s
CD Pipeline / build-and-deploy (push) Successful in 3m29s
CD Pipeline / post-deploy-checks (push) Successful in 1m28s
@@ -0,0 +1,49 @@
-- ADR-090 capacity_violation_event metric violation types
-- Date: 2026-05-07 (Taipei)
-- Purpose: make the fine-grained cpu/mem/swap violations written by
--          capacity_scanner_job.py satisfy the DB constraint.
--
-- Background:
--   capacity_scanner_job.py writes:
--     - cpu_over_threshold
--     - mem_over_threshold
--     - swap_over_threshold
--   but the original ADR-090 DDL only allows the coarser host_saturation, which caused
--   capacity_violation_event_type_valid check violations in production, so
--   capacity governance events went unrecorded.

BEGIN;

ALTER TABLE capacity_violation_event
    DROP CONSTRAINT IF EXISTS capacity_violation_event_type_valid;

ALTER TABLE capacity_violation_event
    ADD CONSTRAINT capacity_violation_event_type_valid
    CHECK (violation_type IN (
        'no_limit_set',
        'over_request',
        'over_limit',
        'host_saturation',
        'over_sla_budget',
        'unauthorized_new_deploy',
        'cpu_over_threshold',
        'mem_over_threshold',
        'swap_over_threshold',
        'load_over_threshold'
    ));

COMMIT;

-- Rollback (run only after manual confirmation):
-- BEGIN;
-- ALTER TABLE capacity_violation_event
--     DROP CONSTRAINT IF EXISTS capacity_violation_event_type_valid;
-- ALTER TABLE capacity_violation_event
--     ADD CONSTRAINT capacity_violation_event_type_valid
--     CHECK (violation_type IN (
--         'no_limit_set',
--         'over_request',
--         'over_limit',
--         'host_saturation',
--         'over_sla_budget',
--         'unauthorized_new_deploy'
--     ));
-- COMMIT;
@@ -5268,3 +5268,59 @@ telegram_request_failed / telegram_api_error。
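- AwoooP + AI automation flywheel, overall closed loop: about 64%.

Assessment: this round resolved the high-pain problem of people being unable to read message status. The next step is to add a status mapping to AwoooP Run detail / Timeline, so that every Telegram reply can be traced in the Console to the full handling context of the same run / incident.

---
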
## 2026-05-07 (Taipei) - ADR-090 capacity governance event constraint fix

**Background**:

- In production, `capacity_scanner_job.py` writes `capacity_violation_event.violation_type=swap_over_threshold`.
- The original ADR-090 DB constraint only allows coarse-grained types such as `host_saturation`, so writes of capacity governance events failed with:
  - `capacity_violation_write_failed`
  - `capacity_violation_event_type_valid`
- As a result, the AI automation flywheel loses a record of capacity anomalies, and AWOOOP / Governance Console cannot show the full event context downstream.
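
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, assuming PostgreSQL and a throwaway temporary table (`capacity_violation_event_demo`, a hypothetical name) that models only the `violation_type` column, typed as `text` by assumption, together with the original six-value constraint. The real table's other columns are not listed in this log and are left out.

```sql
BEGIN;

-- Hypothetical stand-in for capacity_violation_event: only violation_type is
-- modelled, and its text type is an assumption.
CREATE TEMP TABLE capacity_violation_event_demo (
    violation_type text NOT NULL,
    CONSTRAINT capacity_violation_event_type_valid
        CHECK (violation_type IN (
            'no_limit_set',
            'over_request',
            'over_limit',
            'host_saturation',
            'over_sla_budget',
            'unauthorized_new_deploy'
        ))
) ON COMMIT DROP;

-- A coarse-grained type passes the original constraint.
INSERT INTO capacity_violation_event_demo (violation_type)
VALUES ('host_saturation');

-- A metric-level type is rejected. The DO block traps the check_violation
-- so the surrounding transaction stays usable; if the insert unexpectedly
-- succeeds, the RAISE EXCEPTION propagates and aborts the script.
DO $$
BEGIN
    INSERT INTO capacity_violation_event_demo (violation_type)
    VALUES ('swap_over_threshold');
    RAISE EXCEPTION 'expected a check_violation, but the insert succeeded';
EXCEPTION
    WHEN check_violation THEN
        RAISE NOTICE 'swap_over_threshold rejected by capacity_violation_event_type_valid';
END
$$;

-- Nothing is kept: the demo table and its row disappear with the rollback.
ROLLBACK;
```

Running the same script with the ten-value list from the migration in place of the six-value list should make the `DO` block raise instead, which is a quick way to confirm that the widened constraint accepts the scanner's types.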

**Changes**:

- Added a migration:
  - `apps/api/migrations/adr090_capacity_violation_metric_types_2026-05-07.sql`
- `capacity_violation_event_type_valid` now also allows these types:
  - `cpu_over_threshold`
  - `mem_over_threshold`
  - `swap_over_threshold`
  - `load_over_threshold`
- Kept the existing types and the commented-out manual rollback SQL.

**Production rollout and verification**:

```text
DB:
  production PostgreSQL 192.168.0.188:5432 / awoooi_prod

How it was applied:
  Ran the DDL statement by statement from the awoooi-api Pod, using the table owner role.

Note:
  MIGRATION_DATABASE_URL currently connects as awoooi_migrator,
  but the legacy table owner is awoooi; the migrator lacks the owner privilege needed to ALTER capacity_violation_event.
  This round was therefore applied with the owner role from DATABASE_URL.
  Follow-up: add a DB migration governance item that checks whether the migrator role can manage existing legacy tables.

Constraint verification:
  pg_get_constraintdef(capacity_violation_event_type_valid)
  includes cpu_over_threshold / mem_over_threshold / swap_over_threshold / load_over_threshold.

Smoke:
  An insert with violation_type='swap_over_threshold' inside a rolled-back transaction succeeded; no test data was left behind.

Log:
  In the 3 minutes after applying, capacity_violation_event_type_valid /
  capacity_violation_write_failed did not appear again.
```
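
The constraint check and the ownership follow-up noted above can be expressed as catalog queries. This is a sketch under assumptions that this log does not confirm: `capacity_violation_event` is visible on the connection's `search_path`, and the second query is run while connected as the role under test (for example the migrator role).

```sql
-- 1. Show the current definition of the CHECK constraint.
--    The regclass cast resolves capacity_violation_event via search_path
--    (assumption: the table is visible there).
SELECT pg_get_constraintdef(c.oid) AS definition
FROM pg_constraint AS c
WHERE c.conrelid = 'capacity_violation_event'::regclass
  AND c.conname = 'capacity_violation_event_type_valid';

-- 2. List user tables whose owner role the current user cannot act as.
--    ALTER TABLE requires ownership, so each row returned is a table this
--    connection cannot alter. For a superuser the query returns no rows.
SELECT t.schemaname, t.tablename, t.tableowner
FROM pg_tables AS t
WHERE t.schemaname NOT IN ('pg_catalog', 'information_schema')
  AND NOT pg_has_role(t.tableowner, 'USAGE')
ORDER BY t.schemaname, t.tablename;
```

If the second query returns `capacity_violation_event` when run as the migrator, the migration job cannot alter that table on its own, which matches the workaround recorded above. How to resolve that (changing table ownership or granting role membership) is a governance decision this sketch does not make.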
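
**Progress calibration**:

- AwoooP + AI automation flywheel, overall closed loop: about 65%.

Assessment: this round fixed the persistence of governance event data, not display formatting. For AI automation to close the loop, scanner / governance / AWOOOP must first be able to record the facts completely; otherwise even a polished front end is just a console that cannot see the truth.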