ewoooc/docs/p9_completion_report_20260424.md

# [P9-COMPLETION] Telegram Bot Full-Scope Repair (12 Agents / 16 Dispatches)

**Date**: 2026-04-24
**Mode**: P9 (decomposition) → P7 × 16
**Summarized by**: Planner (Opus 4.7 / 1M)

---

## 1. Scope Covered

| Severity | Fixed | Pending |
|----------|-------|---------|
| Critical | 3 (C1 PoC / C2 handler / C3 fail-closed) | C1 Bot Token revocation (must be done by hand in BotFather) |
| High | 4 (H4 LRU / H6 rate-limit / H7 POSTGRES_PASSWORD / Hook blind spot) | H1 5969-line split, KM dual-write, AIOps outage, dual-Bot architecture |
| Medium | 2 (M: clean up scattered fix files; M: add decision/ops fallback) | M4 multi-worker dict, callback prefix unification |
| Low | 1 (sync docker-compose back to git) | Disposition of untracked files |

---

## 2. File Change Overview (7 unstaged + 1 staged + 8 new files)

| File | +/- | What was fixed |
|------|-----|----------------|
| `.claude/hooks/commit-quality.js` | +91/-43 | Also scans Edit/Write payloads; multiple patterns for Telegram/Gemini/Gitea |
| `.claude/settings.json` | +57/-18 | Hook registered for Edit/Write/MultiEdit |
| `config.py` | +10/-0 | Fail-fast on an empty POSTGRES_PASSWORD (H7) |
| `docker-compose.yml` | +8/-2 | Env fixes made on host 188 synced back to git (Phase 1.5) |
| `routes/openclaw_bot_routes.py` | +146/-105 | C2: three handlers filled in; C3 fail-closed; prefix unification |
| `services/telegram_bot_service.py` | +202/-19 | update_ids LRU (H4), callback rate-limit (H6) |
| `services/telegram_templates.py` | +151/-3 | decision_result / ops_action_result templates (fixes the ImportError BLOCKER) |
| `services/mcp_context_service.py` | +74 (new) | Currently staged |
| `.claude/hooks/__test__/commit-quality.test.sh` | New | 4-case regression test |
| `docs/refactor/callback_prefix_proposal.md` | New | Inventory of 308 buttons; Option C recommended |
| `docs/refactor/openclaw_bot_routes_split_plan.md` | New | Split map: 10 files / 45h |
| `docs/guides/DISK_EXPANSION_GUIDE.md` | New | Disk expansion SOP |
| `scripts/cleanup_harbor_data.sh` / `setup_harbor_cleanup_cron.sh` / `diagnose_env.py` | New | Ops scripts |
| `n8n-workflows/27-hermes-ai-health-monitor.json` | New | Hermes health-monitoring flow |
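
For readers unfamiliar with the H4 item, the following is a minimal sketch of what an LRU-bounded `update_id` dedup could look like. It is not the code in `services/telegram_bot_service.py`; the class name `UpdateIdLRU`, the method `is_duplicate`, and the default capacity are all hypothetical.

```python
from collections import OrderedDict


class UpdateIdLRU:
    """Bounded record of recently seen Telegram update_ids (hypothetical helper)."""

    def __init__(self, capacity: int = 1024) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._seen: "OrderedDict[int, None]" = OrderedDict()

    def is_duplicate(self, update_id: int) -> bool:
        """Return True if update_id was seen recently; otherwise record it."""
        if update_id in self._seen:
            self._seen.move_to_end(update_id)  # refresh recency
            return True
        self._seen[update_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen id
        return False
```

Because the state lives in process memory, a structure like this deduplicates only within a single worker, which is the same limitation the M4 item describes for module-level dicts.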

---

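## 3. Unfinished (Deferred to the Next Wave)

- **C1 Bot Token revocation**: the Commander must run `/revoke` in BotFather personally; an Agent cannot do it on their behalf
- **H1**: split the 5969-line `openclaw_bot_routes.py` (10-file map done; implementation estimated at 45h)
- **Callback prefix unification** (308 buttons; Option C already evaluated)
- **KM dual-write broken link**: 88.9% of `ai_insights` rows have no embedding; `embedding_retry_queue` has 14 pending rows
- **AIOps outage**: no writes to `incidents` / `heal_logs` for 5 days
- **M4 module-level dict** still fails under multiple workers (Redis migration not done)
- **config.py / docker-compose.yml**: piggybacked changes need the Commander's ruling before commit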

---

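## 4. Remaining Risks (Extended Findings Beyond the critic Report)

| Risk | Source Agent | Impact |
|------|--------------|--------|
| **Dual-Bot architecture conflict** | web-researcher | Polling + webhook on the same token triggers 409; it has not blown up so far only because the webhook URL is empty. Enabling either side takes everything down |
| **ADR-013 doc drift** | db-expert | The ADR documents an `autoheal_events` table, but the actual tables are `incidents` + `heal_logs`; newcomers who follow the ADR will trip over this |
| **KM dual-write broken link** | db-expert | `ai_insights.embedding` coverage is only 11.1%; RAG retrieval quality has already degraded |
| **AIOps down for 5 days** | db-expert | `AutoHealService` is suspected to have crashed without alerting; needs debugger involvement |
| **config.py piggyback** | P9 scan | The 10-line fail-fast is outside this P9's scope; possibly a by-product of H7 |
| **`sqlite:/` directory** | git status | Path created by mistake (suspected malformed SQLAlchemy URL); contains a `Users/` subdirectory; recommend deleting |
| **untracked scripts** | git status | Ops scripts are not in git; they will be lost if the deployment host is rebuilt |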
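
The dual-Bot risk hinges on whether a webhook URL is registered for the token. As a rough sketch, the check below reads the Telegram Bot API `getWebhookInfo` response and reports whether long polling can run without colliding with a webhook. The function names `fetch_webhook_info` and `polling_is_safe` are hypothetical and do not exist in this repository.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_webhook_info(token: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Call the Bot API getWebhookInfo method and return the decoded JSON body."""
    url = f"https://api.telegram.org/bot{token}/getWebhookInfo"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def polling_is_safe(webhook_info: Dict[str, Any]) -> bool:
    """Return True when no webhook URL is set, so getUpdates polling will not conflict."""
    if not webhook_info.get("ok"):
        raise ValueError("getWebhookInfo did not return ok=true")
    # An empty "url" field means no webhook is registered for this token.
    return webhook_info.get("result", {}).get("url", "") == ""
```

A check like this could run at bot startup, for example `polling_is_safe(fetch_webhook_info(token))`, and refuse to start polling when it returns `False`.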

---

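## 5. Recommendations for the Next Wave of Dispatches

### Sprint 1 (today / Commander hands-on)
1. BotFather `/revoke` the original Token → update `.env` on 188 → restart the three containers
2. Decide whether to start callback prefix unification (Option C, 2 person-days)
3. Decide the timing of the H1 5969-line split (45h; suggest scheduling it in Sprint 3)
4. Confirm the disposition of `config.py` / `docker-compose.yml` / `sqlite:/`

### Sprint 2 (within one week / Agents)
- `debugger`: root cause of the 5-day AIOps outage (logs + `AutoHealService` init)
- `db-expert`: KM dual-write remediation (embedding backfill + retry queue flush)
- `fullstack-engineer`: M4 dict → Redis (multi-worker dedup)
- `critic`: ADR-013 doc correction (align table names with reality)
- Commit and push, in batches, the 7 unpushed files + 1 staged file from this round

### Sprint 3 (within two weeks)
- `refactor-specialist`: H1 5969-line split (per `docs/refactor/openclaw_bot_routes_split_plan.md`)
- `migration-engineer`: callback prefix unification (if approved in Sprint 1)
- `tool-expert`: dual-Bot architecture decision (polling only, or switch to webhook)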

---

## 6. Recommended Deployment Order (3 Commit Batches, 4 if Batch D Is Pushed Separately)

### Batch A: Safety net first (low risk, can be pushed directly)
- `.claude/hooks/commit-quality.js` + `.claude/settings.json` + `.claude/hooks/__test__/`
- **Rationale**: pure Hook with no runtime impact; every future commit is protected automatically
- **commit msg**: `security(hook): commit-quality also scans Edit/Write + multi-platform Token patterns`

### Batch B: Telegram Bot fixes (medium risk)
- `routes/openclaw_bot_routes.py` + `services/telegram_bot_service.py` + `services/telegram_templates.py`
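- **Rationale**: C2/C3/H4/H6 + decision/ops templates; restart all three containers together to verify
- **smoke test**:
  1. `https://mo.wooo.work/health` returns 200
  2. Telegram `/start` → buttons respond normally
  3. `cmd:ppt:daily` → daily report received
  4. No ImportError in the scheduler logs
  5. Tap the same callback repeatedly within 5 seconds → rate-limit takes effect (handled only once)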
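
To make the expectation in smoke-test step 5 concrete, here is a minimal sketch of a per-user, per-callback rate limiter with a 5-second window. It illustrates the behavior being tested and is not the H6 implementation itself; `CallbackRateLimiter`, `allow`, and the injectable `clock` parameter are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class CallbackRateLimiter:
    """Allow each (user, callback_data) pair at most once per window (hypothetical helper)."""

    def __init__(
        self,
        window_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        # Note: this dict grows without bound; a real service would need eviction.
        self._last_handled: Dict[Tuple[int, str], float] = {}

    def allow(self, user_id: int, callback_data: str) -> bool:
        """Return True if the callback should be handled, False if it is rate-limited."""
        now = self._clock()
        key = (user_id, callback_data)
        last = self._last_handled.get(key)
        if last is not None and now - last < self._window:
            return False
        self._last_handled[key] = now
        return True
```

Passing the clock in as a parameter lets a test advance time deterministically instead of sleeping for 5 seconds.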

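### Batch C: Infrastructure fixes (requires the Commander's ruling)
- `config.py` (fail-fast) + `docker-compose.yml` (env sync) + `services/mcp_context_service.py`
- **Rationale**: changes the DB connection and compose; a failed deployment takes everything down
- **smoke test**: validate with `docker compose config` on host 188 before deploying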
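
For reviewers weighing Batch C, the fail-fast in `config.py` presumably follows a pattern like the sketch below: refuse to start when `POSTGRES_PASSWORD` is empty or unset. This is an assumption about its shape rather than the actual 10 lines; `require_postgres_password` is a hypothetical name.

```python
import os
from typing import Mapping


def require_postgres_password(env: Mapping[str, str] = os.environ) -> str:
    """Return POSTGRES_PASSWORD, raising at startup if it is empty or unset."""
    password = env.get("POSTGRES_PASSWORD", "")
    if not password.strip():
        # Fail fast: a blank password should stop the process before any DB connection.
        raise RuntimeError("POSTGRES_PASSWORD is empty or unset; refusing to start")
    return password
```

Raising at import or startup time means a misconfigured container exits immediately, which is easier to spot than a connection failure later at runtime.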

### Batch D: Docs and scripts (zero risk)
- `docs/` + `scripts/` + `n8n-workflows/`
- Can be merged with Batch A or pushed separately

### Post-deployment smoke test on host 188
```bash
ssh wooo@192.168.0.110 "ssh ollama@192.168.0.188 \"\
  docker ps --format '{{.Names}} | {{.Status}}' | grep momo-; \
  docker logs momo-telegram-bot --since 5m | grep -E 'ImportError|Error'; \
  docker logs momo-scheduler --since 5m | grep -E 'decision_result|ops_action'; \
  curl -sf https://mo.wooo.work/health\""
```

---

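## 7. Self-Review

- **Approach correct**: Yes. The 16 dispatches cover all Critical/High items, and unfinished items are clearly assigned to Sprints
- **Impact complete**: Yes. The three newly found extended risks (KM / AIOps / dual-Bot) have been added to the backlog
- **Regression risk**: Medium. Batch B touches 3 runtime files; suggest smoke-testing locally first with `python -c "from services.telegram_templates import decision_result"`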
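
The one-line import check above covers only `decision_result`. A slightly broader sketch, assuming it is run from the repository root, reports every expected name that a module fails to expose; `missing_names` is a hypothetical helper, not part of the codebase.

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, names: Iterable[str]) -> List[str]:
    """Import module_name and return the expected names it does not define."""
    module = importlib.import_module(module_name)  # raises ImportError if the module is broken
    return [name for name in names if not hasattr(module, name)]
```

For Batch B this could be called as `missing_names("services.telegram_templates", ["decision_result", "ops_action_result"])`, with an empty list indicating that both templates are importable.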

**Remaining risk**: Until it is revoked, the Bot Token remains high-risk (CVSS 9.1); the Commander must handle it as the top priority.