docs: BUTTON_DATA_INVALID root-cause fix + Gitea Code Review fix log
LOGBOOK + ADR-092 Appendix C: 2026-04-21 fix record. E2E verification: telegram_approval_card_sent message_id=25045 (SignOzDown) ✓
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -6,6 +6,69 @@

---

## 📍 2026-04-21 afternoon: BUTTON_DATA_INVALID root-cause fix + Gitea Code Review fixes

### Problems

1. **Telegram BUTTON_DATA_INVALID (HTTP 400)**: button nonces in the `devops_tool` category exceeded Telegram's 64-byte limit (the `host_restart_service` nonce was 77 B).
2. **Gitea Code Review "AI analysis failed"**: the OpenClaw `/api/v1/analyze/code-review` endpoint was never implemented (404).
3. **Push review `'dict' object has no attribute 'issues'`**: `local_code_review_service.review_push()` returns a dict, but the caller used it as a Pydantic model.

### Root causes & fixes

| Problem | Root cause | Fix |
|------|------|------|
| BUTTON_DATA_INVALID | UUID (36 chars) + action name (20) + timestamp + random suffix + separators = 77 B > 64 B | base64url-encode the UUID bytes (36 → 22 chars); `host_restart_service` nonce = 63 B |
| Code review 404 | OpenClaw only exposes `/analyze/incident` and `/analyze/error` | `_call_openclaw_code_review` now calls `local_code_review_service.review_pr()` |
| Push review AttributeError | `review_push()` returns a dict, while the caller reads the attribute `analysis.issues` | `_call_openclaw_push_review` converts the dict to `CodeReviewResult` |
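
For the third row, a minimal sketch of a dict → `CodeReviewResult` normalization step, under stated assumptions: the real `CodeReviewResult` is presumably a Pydantic model whose fields are not shown in this log, so a stdlib `dataclass` with hypothetical fields `summary` and `issues` stands in for it, and the helper `to_code_review_result` and the dict keys are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CodeReviewResult:
    """Stand-in for the real CodeReviewResult model (fields are hypothetical)."""
    summary: str = ""
    issues: list[dict[str, Any]] = field(default_factory=list)


def to_code_review_result(analysis: Any) -> CodeReviewResult:
    """Normalize whatever review_push() returned into a CodeReviewResult.

    Accepts an already-typed result or a plain dict, so callers can safely
    use attribute access such as `analysis.issues`.
    """
    if isinstance(analysis, CodeReviewResult):
        return analysis
    if isinstance(analysis, dict):
        return CodeReviewResult(
            summary=str(analysis.get("summary", "")),
            issues=list(analysis.get("issues") or []),
        )
    raise TypeError(f"unexpected review result type: {type(analysis).__name__}")


# Usage: this dict shape is illustrative, not the real service payload.
raw = {"summary": "1 issue found", "issues": [{"line": 12, "msg": "unused import"}]}
result = to_code_review_result(raw)
print(len(result.issues))  # 1
```

If `CodeReviewResult` is a Pydantic v2 model, the same step is usually `CodeReviewResult.model_validate(raw)` (v1: `CodeReviewResult.parse_obj(raw)`), which also validates field types.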

### E2E verification

- `host_restart_service` nonce = 63 B ✓; all actions ≤ 64 B ✓
- UUID round-trip decode = True ✓
- `telegram_approval_card_sent` message_id=25045 (SignOzDown, `devops_tool`) ✓
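
The 77 B → 63 B figures can be reproduced with a short check. This sketch assumes the nonce is the four fields joined by `:` in the order UUID, action, timestamp, random suffix, with a 10-digit Unix timestamp and an 8-character hex suffix; the order, separator, and helper `nonce_len` are assumptions, and only the field lengths come from this record.

```python
import base64
import secrets
import time
import uuid

TELEGRAM_CALLBACK_DATA_MAX = 64  # Telegram's callback_data limit, in bytes


def nonce_len(uuid_part: str, action: str) -> int:
    """Byte length of an assumed 'uuid:action:ts:rand' nonce."""
    ts = str(int(time.time()))   # 10 digits for present-day Unix time
    rand = secrets.token_hex(4)  # 8 hex chars
    return len(f"{uuid_part}:{action}:{ts}:{rand}".encode("utf-8"))


u = uuid.uuid4()
canonical = str(u)                                                  # 36 chars
compact = base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode()  # 22 chars

action = "host_restart_service"  # 20 chars, the action named in this record
print(nonce_len(canonical, action))  # 77, over the limit
print(nonce_len(compact, action))    # 63, fits

# Headroom: longest action name that still fits with the compact UUID.
fixed = len(compact) + 10 + 8 + 3  # uuid + ts + rand + three colons
print(TELEGRAM_CALLBACK_DATA_MAX - fixed)  # 21
```

Under these assumptions, any future action name longer than 21 bytes would trip the limit again, so a length assertion at nonce creation is a cheap guard.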

### Commits

- `bd73548` BUTTON_DATA_INVALID root-cause fix (nonce over 64 B)
- `caeb7a9` base64url UUID compression (complete fix)
- `acab1cd` Gitea code review switched to the local service
- `8fd31ec` (deployed) pipeline 1009 succeeded

### Side findings
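
- `KM_CONVERTED` is missing from the `alert_event_type` PG enum (pre-existing, non-blocking).
- The SLO watchdog reports 18 PENDING items with no TG confirmation; these are historical records that accumulated while BUTTON_DATA_INVALID was breaking the cards.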

---

## 📍 2026-04-21 early morning: aider-watch v2 complete (ADR-091, full-scope E2E verification)

### What was completed

- **aider CLI installed**: aider v0.86.2, OpenRouter Elephant Alpha ($0, free), OAuth authentication
- **aider-watch v2**: a fully closed loop from the Mac client into the awoooi flywheel
  - Server: AiderBatchIn / aider_events table / Redis stream / AiderEventProcessor worker
  - Client: aiderw wrapper / buffer fallback / launchd flush every 5 min
  - AI Router: feedback_from_aider_events COALESCE SQL (the session_end model takes priority)
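
The client-side buffer fallback can be sketched as follows. This is an illustration under assumptions, not the `aiderw` implementation: the transport is injected as a `send` callable so the example runs offline, the JSONL buffer format and file name are hypothetical, and the launchd job is represented only by a later call to `flush()`.

```python
import json
import os
import tempfile
from typing import Callable

# Hypothetical transport: returns True if the server accepted the batch.
Sender = Callable[[list], bool]


def _safe_send(send: Sender, events: list) -> bool:
    try:
        return bool(send(events))
    except OSError:  # e.g. ConnectionError while the network is down
        return False


class EventBuffer:
    """Append-only JSONL file holding events the server has not accepted yet."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, events: list) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            for ev in events:
                f.write(json.dumps(ev) + "\n")

    def flush(self, send: Sender) -> int:
        """Resend buffered events; delete the file only after a successful send."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        if not events or not _safe_send(send, events):
            return 0  # keep the buffer for the next scheduled flush
        os.remove(self.path)
        return len(events)


def emit(events: list, send: Sender, buf: EventBuffer) -> None:
    """Send directly; if that fails, fall back to the local buffer."""
    if not _safe_send(send, events):
        buf.append(events)


# Usage: simulate an outage, then a later (e.g. launchd-triggered) flush.
def offline_send(events: list) -> bool:
    raise ConnectionError("network down")


received: list = []


def online_send(events: list) -> bool:
    received.extend(events)
    return True


buf = EventBuffer(os.path.join(tempfile.mkdtemp(), "aider_events.jsonl"))
emit([{"type": "session_end", "exit_code": 0}], offline_send, buf)
drained = buf.flush(online_send)
print(drained, received)  # 1 [{'type': 'session_end', 'exit_code': 0}]
```

Deleting the file only after a successful send means a failed flush loses nothing; a production version would also need to guard against appends racing with a flush, for example by renaming the file before reading it.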

### E2E verification: all passed (3 tests)

- C1: webhook → Redis → PG ✅ (2 rows written)
- C2: network cut → buffer → flush → PG ✅ (1 row after the buffer drained)
- C3: model_stats_since COALESCE → `{'openrouter/elephant-alpha': 1.0}` ✅
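
The "session_end model takes priority" rule can be sketched with `COALESCE`. An in-memory SQLite table stands in for the PostgreSQL `aider_events` table (`COALESCE` behaves the same in both); the columns (`session_id`, `event_type`, `model`, `exit_code`), the rows, and the query are assumptions for illustration, not the actual `feedback_from_aider_events` SQL. It also assumes an unknown model is stored as NULL; a literal `'unknown'` string would additionally need `NULLIF`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aider_events ("
    " session_id TEXT, event_type TEXT, model TEXT, exit_code INTEGER)"
)
# Illustrative rows: the model is unknown (NULL) at session_start and is
# only filled in on session_end.
conn.executemany(
    "INSERT INTO aider_events VALUES (?, ?, ?, ?)",
    [
        ("s1", "session_start", None, None),
        ("s1", "session_end", "openrouter/elephant-alpha", 0),
    ],
)

# Prefer the session_end model, fall back to the session_start model;
# success is assumed to mean exit_code = 0.
QUERY = """
SELECT COALESCE(e.model, s.model) AS model,
       AVG(CASE WHEN e.exit_code = 0 THEN 1.0 ELSE 0.0 END) AS success_rate
FROM aider_events AS s
JOIN aider_events AS e
  ON e.session_id = s.session_id AND e.event_type = 'session_end'
WHERE s.event_type = 'session_start'
GROUP BY COALESCE(e.model, s.model)
"""

stats = {model: rate for model, rate in conn.execute(QUERY)}
print(stats)  # {'openrouter/elephant-alpha': 1.0}
```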

### Pitfalls hit while fixing (found by full-scope comparison)

| Pitfall | Problem | Fix |
|----|------|------|
| stdlib logging | `logger.info("...", count=N)` → KeyError | → `structlog.get_logger` |
| worker pool | `get_worker_redis()` not initialized in lifespan → RuntimeError, silent crash | → add `init_worker_redis_pool()` to `start()` |
| model=unknown | The model is unknown when session_start is emitted, and the SQL read only session_start | → session_end backfills model + cwd; SQL uses COALESCE |
| False-positive incidents | An alert was created whenever error_count >= 1 (including normal output such as "no error") | → create an incident only when exit_code != 0 |
| Dead code | `get_aider_event_repository()` leaked resources | → removed |

### Git commits (11+ in total, mostly feat/fix)

Last commit: `9e9bd86 fix(aider-watch): code-review fixes (4 issues)`

### Next steps (in the backlog)

- Gradual rollout of `USE_AIDER_FEEDBACK=True` (after 7 days, if elephant-alpha's success_rate is stable)
- Restore the model in `session_start` (either wait until the banner has been parsed before emitting, or switch to a patch event)

---

## 📍 2026-04-20 morning: three Drift/Target fixes (P0.1 + P0.2 + P0.3)

### Decisions after the commander's three-question RCA
@@ -172,3 +172,33 @@ Grade: mature(90+) / in_progress(70-90) / starter(50-70) / initial(<50)

| C4 | watchdog does not detect a broken chain | W-4 missing | `_count_approved_playbooks()`; if 0 → TYPE-8M | de2d34d |

**Architectural iron rule**: `PlaybookSource.YAML_RULE` playbooks are the "infrastructure" of the auto-remediation chain; the evolver's trust-based retirement logic must never touch them.
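
The rule can be enforced as a guard in the retirement step. A minimal sketch: only `PlaybookSource.YAML_RULE` comes from this log, while the `LEARNED` member, the `Playbook` fields, the trust threshold, the playbook names, and `select_for_retirement` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PlaybookSource(Enum):
    YAML_RULE = "yaml_rule"
    LEARNED = "learned"  # hypothetical: playbooks produced by the evolver


@dataclass(frozen=True)
class Playbook:
    name: str
    source: PlaybookSource
    trust: float  # hypothetical trust score in [0, 1]


def select_for_retirement(playbooks: list[Playbook], min_trust: float = 0.3) -> list[Playbook]:
    """Return low-trust playbooks to retire, never touching YAML_RULE ones."""
    return [
        p
        for p in playbooks
        if p.source is not PlaybookSource.YAML_RULE and p.trust < min_trust
    ]


pbs = [
    Playbook("restart-signoz", PlaybookSource.YAML_RULE, trust=0.1),
    Playbook("learned-cache-flush", PlaybookSource.LEARNED, trust=0.1),
    Playbook("learned-disk-cleanup", PlaybookSource.LEARNED, trust=0.9),
]
print([p.name for p in select_for_retirement(pbs)])  # ['learned-cache-flush']
```

Checking the source before the trust score keeps the invariant independent of how trust values evolve: a YAML_RULE playbook stays in place even at trust 0.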

---

## Appendix C: 2026-04-21, BUTTON_DATA_INVALID root-cause fix + Gitea Code Review fixes

**Trigger**: every `devops_tool`-category alert card sent to Telegram failed (HTTP 400 BUTTON_DATA_INVALID), and Gitea PR Code Review showed "AI analysis failed".

### Root cause chain

| Symptom | Break point | Root cause |
|------|------|------|
| Telegram 400 BUTTON_DATA_INVALID | `generate_callback_nonce` | UUID (36) + action (20) + ts (10) + rand (8) + colons (3) = 77 B > Telegram's 64 B limit |
| Gitea PR "AI analysis failed" | `_call_openclaw_code_review` | OpenClaw only has `/analyze/incident` and `/analyze/error`; `/analyze/code-review` was never implemented (404) |
| Push review AttributeError | `_call_openclaw_push_review` | `local_code_review_service.review_push()` returns a dict, and the caller does attribute access on it (`analysis.issues`) |

### Fixes

1. **Nonce compression** in `security_interceptor.py`: `generate_callback_nonce` base64url-encodes the UUID bytes (36 → 22 chars) and `parse_callback_data` decodes them correspondingly; the `host_restart_service` nonce is now 63 B.
2. **Code review moved to the local service** in `gitea_webhook_service.py`: `_call_openclaw_code_review` now uses `local_code_review_service.review_pr()` (Ollama, with Gemini as fallback).
3. **Push review dict → model** in `gitea_webhook_service.py`: `_call_openclaw_push_review` now converts the dict to `CodeReviewResult`.
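
A sketch of an encode/decode pair in the spirit of fix 1. It assumes the nonce is `uuid:action:ts:rand` joined with colons; that layout, the helper names `encode_nonce`/`decode_nonce`/`uuid_to_b64`/`b64_to_uuid`, the `approval_id` name, and the `ParsedNonce` shape are hypothetical and are not the actual `generate_callback_nonce`/`parse_callback_data` signatures.

```python
import base64
import secrets
import time
import uuid
from typing import NamedTuple

MAX_CALLBACK_BYTES = 64  # Telegram callback_data limit


def uuid_to_b64(u: uuid.UUID) -> str:
    """16 raw bytes -> 22 URL-safe chars (padding stripped)."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def b64_to_uuid(s: str) -> uuid.UUID:
    """Inverse of uuid_to_b64: restore padding to a multiple of 4, then decode."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


class ParsedNonce(NamedTuple):
    approval_id: uuid.UUID
    action: str
    ts: int
    rand: str


def encode_nonce(approval_id: uuid.UUID, action: str) -> str:
    nonce = f"{uuid_to_b64(approval_id)}:{action}:{int(time.time())}:{secrets.token_hex(4)}"
    size = len(nonce.encode("utf-8"))
    if size > MAX_CALLBACK_BYTES:
        raise ValueError(f"callback data too long: {size} bytes")
    return nonce


def decode_nonce(data: str) -> ParsedNonce:
    uid, action, ts, rand = data.split(":")
    return ParsedNonce(b64_to_uuid(uid), action, int(ts), rand)


approval_id = uuid.uuid4()
nonce = encode_nonce(approval_id, "host_restart_service")
parsed = decode_nonce(nonce)
print(len(nonce.encode()), parsed.approval_id == approval_id)  # 63 True
```

Because the base64url alphabet (`A-Z`, `a-z`, `0-9`, `-`, `_`) never contains `:`, splitting on the colon stays unambiguous, and failing loudly in `encode_nonce` turns an oversized nonce into a local error instead of a Telegram 400.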

### E2E verification (2026-04-21 21:57, Taipei time)

- `host_restart_service` nonce = 63 B ✓; all 7 actions ≤ 64 B ✓
- UUID round-trip decode = True ✓
- `telegram_approval_card_sent` message_id=25045 (SignOzDown, `devops_tool`) ✓, no BUTTON_DATA_INVALID

### Commits

- `acab1cd` fix(gitea): move code review to the local service + push review dict → CodeReviewResult
- `bd73548` fix(telegram): BUTTON_DATA_INVALID root-cause fix for nonces over 64 B
- `8fd31ec` fix(telegram): compress the nonce UUID with base64url (complete fix)