docs(awooop): record t37 telegram callback closure

2026-05-18 00:12:08 +08:00
parent 68b20be2b4
commit 40ec5055e1
1 changed files with 37 additions and 0 deletions
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,40 @@
+## 2026-05-17 | T37 Telegram Approval Callback: add the missing executor handoff
+
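+**Background**: Following T36, we investigated the symptoms visible in the Telegram screenshots: "still blocked/manual_required after approval, history statistics 400, and no way to tell whether the AI automation actually handled anything". The live API shows that `INC-20260513-79ED5E` is still `status=investigating` with `resolved_at=null`, and it has an approval id; the Run List contains only legacy Telegram/Webhook evidence rows, and the remediation summary is `no_evidence`. This means at least one callback path may have completed only the visual Telegram/approval stamp without triggering the executor or the incident writeback.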
+
+**Fix**:
+- `/api/v1/telegram/webhook` fallback path:
+  - When a Telegram approve succeeds and `execution_triggered=true`, schedule `ApprovalExecutionService.execute_approved_action()`.
+  - Add `execution_scheduled` to the response, so the caller knows not only that the action was approved but also whether it was handed to the executor.
+  - After a Telegram reject succeeds, call `IncidentApprovalService.on_approval_status_change(..., rejected)` so the Incident no longer stays in investigating.
+- Active long-polling path:
+  - Approve already had the `exec:{approval.id}` Redis lock and executor scheduling; this round adds Incident status sync after a reject.
+  - Add the structured logs `telegram_rejection_incident_synced_via_polling` / `telegram_rejection_incident_sync_failed_via_polling`, so closure can later be queried directly in SigNoz / logs.
+- Verification confirmed that `polling_active=false` is by design for the AWOOOI API pod: `main.py` notes that the API does not do Long Polling and that OpenClaw/188 is the only polling instance. This is not a red flag for production API health, but the 188 polling runtime still needs to be audited in a follow-up.
+
+**Local verification**:
+- `python -m py_compile apps/api/src/api/v1/telegram.py apps/api/src/services/telegram_gateway.py apps/api/tests/test_telegram_webhook_execution_handoff.py apps/api/tests/test_telegram_gateway_polling_handoff.py`: pass.
+- `ruff check --select F,E9 src/api/v1/telegram.py src/services/telegram_gateway.py tests/test_telegram_webhook_execution_handoff.py tests/test_telegram_gateway_polling_handoff.py`: pass.
+- `DATABASE_URL=postgresql+asyncpg://test:test@localhost:5432/test pytest tests/test_telegram_webhook_execution_handoff.py tests/test_telegram_gateway_polling_handoff.py tests/test_telegram_message_templates.py tests/test_telegram_adr050.py tests/test_approval_execution_no_action.py -q`: 74 passed.
+- `git diff --check`: pass.
+
+**Release and production verification**:
+- `913e1abc fix(telegram): execute approved callbacks` pushed to Gitea main; Code Review run `2231` success; CD run `2230` tests / build-and-deploy / post-deploy-checks success; deploy marker `06f64c6d chore(cd): deploy 913e1ab [skip ci]`.
+- `9e1b15da fix(telegram): sync rejected polling callbacks` pushed to Gitea main; Code Review run `2236` success; CD run `2235` tests / build-and-deploy / post-deploy-checks success; deploy marker `68b20be2 chore(cd): deploy 9e1b15d [skip ci]`.
+- Production image: `awoooi-api` and `awoooi-worker` both run `192.168.0.110:5000/awoooi/api:9e1b15dabf80db952a1faa5b00525f0475b93fd8`, replicas ready.
+- `https://awoooi.wooo.work/api/v1/health`: 200 healthy; PostgreSQL / Redis / Ollama / OpenClaw / SigNoz all up.
+- `https://awoooi.wooo.work/api/v1/telegram/health`: 200 configured; bot token / SRE group / whitelist are set; API pod `polling_active=false` matches the current "API does not do Long Polling" design.
+- No live-fire Telegram approval/reject was sent this round, to avoid creating new alerts in the real group or accidentally triggering execution; the deployment was verified through unit/route tests, Gitea CD, and production image/health checks.
+
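+**Current overall progress**:
+- Alertmanager low-risk auto-remediation mainline: about 98%.
+- Productization of full AI automation management: about 99%.
+- Traceability across alert detail / history / main card / frontend deep-links: about 99%.
+- Telegram approval / reject callback closed loop: about 92%.
+- Readability of the Telegram first screen and investigation message flow: about 96%.
+- Frontend AI automation management UI sync: about 98%.
+- T37 fixes the key break between the Telegram callback and executor / incident sync. The next step should be an audit of "188 OpenClaw polling runtime + callback logs + stuck incident reconciliation": confirm that the active polling instance has actually loaded the new code, and decide whether existing stuck incidents such as `INC-20260513-79ED5E` need a manual, safe reconciliation.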
+
 ## 2026-05-17 | T36 Incident Evidence Header synced to detail pages and workbench

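 **Background**: T34/T35 already let Telegram and the Run List use the Incident ID to link to the same set of remediation evidence, but Run Detail, Approval Detail, and Work Items still each present their data separately. When an operator clicks through from an alert to different pages, they still have to work out for themselves whether the page belongs to the same Incident/Run/Approval/Work Item chain. In addition, the Omni Terminal floating button can still cover table information in the bottom-right corner.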