docs: record the BUTTON_DATA_INVALID root-cause fix + Gitea Code Review fix

LOGBOOK + ADR-092 Appendix C: 2026-04-21 fix record

E2E verification: telegram_approval_card_sent message_id=25045 (SignOzDown) ✓

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Your Name
2026-04-21 21:58:48 +08:00
parent 0a72ae21e4
commit e2742ce9f3
2 changed files with 93 additions and 0 deletions


@@ -6,6 +6,69 @@
---
## 📍 2026-04-21 afternoon: BUTTON_DATA_INVALID root-cause fix + Gitea Code Review fix
### Problems
1. **Telegram BUTTON_DATA_INVALID (HTTP 400)**: button nonces for the `devops_tool` category exceeded Telegram's 64-byte limit (`host_restart_service` nonce = 77B)
2. **Gitea Code Review "AI analysis failed" (AI 分析失敗)**: the OpenClaw `/api/v1/analyze/code-review` endpoint was never implemented (404)
3. **Push review `'dict' object has no attribute 'issues'`**: `local_code_review_service.review_push()` returns a dict, but the caller treats it as a Pydantic model
### Root causes & fixes
| Problem | Root cause | Fix |
|------|------|------|
| BUTTON_DATA_INVALID | UUID 36 chars + action name (20) + ts + rand = 77B > 64 | base64url-encode the UUID bytes: 36→22 chars; `host_restart_service` = 63B |
| Code review 404 | OpenClaw only has `/analyze/incident` and `/analyze/error` | `_call_openclaw_code_review` now calls `local_code_review_service.review_pr()` |
| push review AttributeError | review_push() returns a dict; the caller uses attribute access (`analysis.issues`) | `_call_openclaw_push_review` now converts dict→CodeReviewResult |
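
The UUID compression in the first row can be illustrated with a minimal sketch. It assumes the fix encodes the 16 raw UUID bytes as unpadded base64url; the helper names `compress_uuid` and `expand_uuid` are hypothetical and are not the actual `generate_callback_nonce` / `parse_callback_data` code.

```python
import base64
import uuid


def compress_uuid(value: uuid.UUID) -> str:
    """Encode the 16 raw UUID bytes as unpadded base64url (22 chars)."""
    # 16 bytes encode to 24 base64 chars, the last two being '=' padding.
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def expand_uuid(token: str) -> uuid.UUID:
    """Restore the stripped padding and decode back to a UUID."""
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


incident_id = uuid.uuid4()
token = compress_uuid(incident_id)

print(len(str(incident_id)), "->", len(token))
print(expand_uuid(token) == incident_id)
```

The base64url alphabet avoids `+` and `/`, and a colon never appears in it, so a colon-delimited payload can still be split safely after compression.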
### E2E verification
- `host_restart_service` nonce = 63B ✓, all actions ≤ 64B ✓
- round-trip UUID decode = True ✓
- `telegram_approval_card_sent` message_id=25045 (SignOzDown devops_tool) ✓
### Commits
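- `bd73548` BUTTON_DATA_INVALID root-cause fix (nonce exceeded 64B)
- `caeb7a9` base64url UUID compression (definitive fix)
- `acab1cd` Gitea code review switched to the local service
- `8fd31ec` (deployed) pipeline 1009 succeeded
### Side findings
- `KM_CONVERTED` is missing from the `alert_event_type` PG enum (pre-existing, non-blocking)
- SLO watchdog reported 18 PENDING items with no TG confirmation (historical records accumulated while BUTTON_DATA_INVALID was occurring)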
---
## 📍 2026-04-21 early morning: aider-watch v2 complete (ADR-091, full-scope E2E verification)
### Completed work
- **aider CLI installed**: aider v0.86.2, OpenRouter Elephant Alpha ($0 free), OAuth authentication
- **aider-watch v2**: Mac client → awoooi flywheel, full closed loop
  - Server: AiderBatchIn / aider_events table / Redis stream / AiderEventProcessor worker
  - Client: aiderw wrapper / buffer fallback / launchd 5min flush
  - AI Router: feedback_from_aider_events COALESCE SQL (session_end model takes priority)
### E2E verification: all passed (3 tests)
- C1: webhook → Redis → PG ✅ (2 rows written)
- C2: network down → buffer → flush → PG ✅ (1 row after buffer drain)
- C3: model_stats_since COALESCE → `{'openrouter/elephant-alpha': 1.0}`
### Pitfalls hit during the fix (found by full-scope comparison)
| Pitfall | Problem | Fix |
|----|------|------|
| stdlib logging | logger.info("...", count=N) → KeyError | → structlog.get_logger |
| worker pool | get_worker_redis() was not initialized in lifespan → RuntimeError, silent crash | → added init_worker_redis_pool() to start() |
| model=unknown | model is unknown when session_start is emitted; SQL read only session_start | → session_end now carries model+cwd; SQL uses COALESCE |
| False-positive incident | an alert was created whenever error_count>=1 (including normal output such as "no error") | → create an incident only when exit_code!=0 |
| Dead code | get_aider_event_repository() leaked resources | → removed |
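
The `model=unknown` fix can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database as a stand-in for PG, and the column names (`session_id`, `event_type`, `model`) and the one-`session_end`-per-session shape are assumptions, not the real `aider_events` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aider_events (session_id TEXT, event_type TEXT, model TEXT)"
)
conn.executemany(
    "INSERT INTO aider_events VALUES (?, ?, ?)",
    [
        # s1: model unknown at start, filled in by session_end
        ("s1", "session_start", None),
        ("s1", "session_end", "openrouter/elephant-alpha"),
        # s2: no session_end yet, so the session_start value is used
        ("s2", "session_start", "example/other-model"),
        # s3: model never reported, so the literal fallback applies
        ("s3", "session_start", None),
    ],
)

# Prefer the session_end model, then session_start, then a literal fallback.
rows = conn.execute(
    """
    SELECT s.session_id,
           COALESCE(e.model, s.model, 'unknown') AS model
    FROM aider_events AS s
    LEFT JOIN aider_events AS e
           ON e.session_id = s.session_id
          AND e.event_type = 'session_end'
    WHERE s.event_type = 'session_start'
    ORDER BY s.session_id
    """
).fetchall()

for session_id, model in rows:
    print(session_id, model)
```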
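### Git commits (11+ commits in total, mostly feat/fix)
Last commit: `9e9bd86 fix(aider-watch): code-review fixes (4 issues)`
### Next steps (added to Backlog)
- Gradual rollout of `USE_AIDER_FEEDBACK=True` (after 7 days, if the elephant-alpha success_rate is stable)
- Restore model on `session_start` (either emit it after the banner has been parsed, or switch to a patch event)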
---
## 📍 2026-04-20 morning: three Drift/Target fixes (P0.1 + P0.2 + P0.3)
### Decisions after the Commander's Three Questions (統帥三問) RCA


@@ -172,3 +172,33 @@ Grade: mature(90+) / in_progress(70-90) / starter(50-70) / initial(<50)
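| C4 | watchdog does not detect a broken chain | W-4 missing | `_count_approved_playbooks()`; if 0 → TYPE-8M | de2d34d |
**Architecture iron rule**: `PlaybookSource.YAML_RULE` playbooks are the "infrastructure" of the auto-remediation chain; the evolver's trust-based retirement logic must not touch these playbooks.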
---
## Appendix C: 2026-04-21, BUTTON_DATA_INVALID root-cause fix + Gitea Code Review fix
**Trigger**: every Telegram alert card in the `devops_tool` category failed to send (HTTP 400 BUTTON_DATA_INVALID), and Gitea PR Code Review showed "AI analysis failed" (AI 分析失敗).
### Root Cause chain
| Symptom | Break point | Root cause |
|------|------|------|
| Telegram 400 BUTTON_DATA_INVALID | `generate_callback_nonce` | UUID(36) + action(20) + ts(10) + rand(8) + colons = 77B > Telegram's 64B limit |
| Gitea PR "AI analysis failed" | `_call_openclaw_code_review` | OpenClaw only has `/analyze/incident` and `/analyze/error`; `/analyze/code-review` was never implemented (404) |
| Push review AttributeError | `_call_openclaw_push_review` | `local_code_review_service.review_push()` returns a dict; the caller uses attribute access on it (`analysis.issues`) |
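
The byte budget in the first row can be checked with a short calculation. The sketch below assumes a colon-joined layout of `action:id:ts:rand`; the field order, the sample timestamp, and the sample random suffix are illustrative guesses, and only the field widths come from the table above.

```python
import base64
import uuid

TELEGRAM_CALLBACK_DATA_LIMIT = 64  # bytes allowed in an inline button's callback_data


def callback_size(action: str, id_field: str, ts: str, rand: str) -> int:
    """Return the UTF-8 byte length of the colon-joined payload."""
    return len(":".join([action, id_field, ts, rand]).encode("utf-8"))


action = "host_restart_service"  # 20 chars
incident_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
ts = "1776780000"  # 10-digit epoch seconds (illustrative value)
rand = "a1b2c3d4"  # 8-char random suffix (illustrative value)

# Before: canonical 36-char UUID string.
before = callback_size(action, str(incident_id), ts, rand)

# After: 16 raw UUID bytes as unpadded base64url, 22 chars.
short_id = base64.urlsafe_b64encode(incident_id.bytes).rstrip(b"=").decode("ascii")
after = callback_size(action, short_id, ts, rand)

print(before, before <= TELEGRAM_CALLBACK_DATA_LIMIT)
print(after, after <= TELEGRAM_CALLBACK_DATA_LIMIT)
```

With a 20-character action name the compressed payload leaves one byte of headroom, so any longer action name would need a further reduction elsewhere.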
### Fixes
1. **nonce compression** (`security_interceptor.py`): `generate_callback_nonce` base64url-encodes the UUID bytes (36→22 chars); `parse_callback_data` decodes accordingly; `host_restart_service` nonce = 63B
2. **code review switched to local** (`gitea_webhook_service.py`): `_call_openclaw_code_review` now calls `local_code_review_service.review_pr()` (Ollama + Gemini fallback)
3. **push review dict→model** (`gitea_webhook_service.py`): `_call_openclaw_push_review` now converts dict→`CodeReviewResult`
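
The dict→model conversion in fix 3 might look roughly like the sketch below. It uses standard-library dataclasses as a stand-in for the Pydantic model, and the fields (`summary`, `issues`, `severity`, `message`) and the helper `to_review_result` are hypothetical; the real `CodeReviewResult` schema is not shown in this record.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ReviewIssue:
    severity: str
    message: str


@dataclass
class CodeReviewResult:
    summary: str = ""
    issues: list[ReviewIssue] = field(default_factory=list)


def to_review_result(raw: Any) -> CodeReviewResult:
    """Accept either a ready-made result or a plain dict and return a result."""
    if isinstance(raw, CodeReviewResult):
        return raw
    if not isinstance(raw, dict):
        raise TypeError(f"unsupported review payload: {type(raw).__name__}")
    issues = [ReviewIssue(**item) for item in raw.get("issues", [])]
    return CodeReviewResult(summary=raw.get("summary", ""), issues=issues)


analysis = to_review_result(
    {
        "summary": "1 issue",
        "issues": [{"severity": "warning", "message": "unused import"}],
    }
)

# Attribute access now works where a raw dict would raise AttributeError.
print(analysis.issues[0].severity)
```

Normalizing at the call boundary keeps the rest of the caller unchanged, whichever shape the service returns.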
### E2E verification (2026-04-21 21:57, Taipei time)
- `host_restart_service` nonce = 63B ✓, all 7 actions ≤ 64B ✓
- UUID round-trip decode = True ✓
- `telegram_approval_card_sent` message_id=25045 (SignOzDown devops_tool) ✓ no BUTTON_DATA_INVALID
### Commits
- `acab1cd` fix(gitea): code-review switched to local service + push review dict→CodeReviewResult
- `bd73548` fix(telegram): BUTTON_DATA_INVALID root-cause fix for nonce exceeding 64B
- `8fd31ec` fix(telegram): nonce UUID base64url compression (definitive fix)