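docs(aider-watch v2): cover 4 big-picture blind spots

On 2026-04-20 the Commander (統帥) reminded us to "keep the big picture in view on every update". A second review before execution
found 4 blind spots the plan did not address; this commit fills them in:

Blind spot 1: Mac reachability from outside the LAN
  - spec §8 + §8b: add three options (Tailscale / nginx / VPN), choose one
  - plan Task B5: install.sh pre-step reminds the user to choose a network configuration

Blind spot 2: incident flooding (multiple errors in the same session)
  - spec §8: add a coalesce strategy (60s window per session_id)
  - plan Task A5: the service implementation of create_incident_for_event adds coalesce logic
  - add 2 test cases verifying same-session reuse and separation across sessions

Blind spot 3: AI Router feedback first-rollout risk
  - spec §8: add a USE_AIDER_FEEDBACK flag, default false; enable after a 7-day canary
  - plan Task A8: wrap the route() hook in an if settings.USE_AIDER_FEEDBACK block
  - plan Task A9: config adds USE_AIDER_FEEDBACK: bool = False

Blind spot 4: obtaining the AWOOOI_PG_PW secret
  - spec §8c: add the kubectl get secret → env → shred flow
  - plan Task A0 Step 1: spell out reading the K8s Secret and destroying the file immediately

Consistent with the big-picture thinking discipline in feedback_ai_autonomous_direction.md.
Execution strategy: fully subagent-driven (approved by the Commander).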

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Your Name
2026-04-20 03:52:49 +08:00
parent 345e6832da
commit 8d40bbff2b
2 changed files with 177 additions and 27 deletions


@@ -56,35 +56,50 @@
# Phase A: Server-side (10 tasks)
## Task A0: Generate HMAC secret + local dev env
## Task A0: Prerequisites (fetch PG password, generate HMAC secret, local env)
**Files:**
- Create: `/tmp/aider-watch-dev.env` (local machine only, not committed to git)
- Create: `/tmp/aider_webhook_secret.txt` (shred after installation)
- [ ] **Step 1: Generate the HMAC secret (32 bytes, hex)**
- [ ] **Step 1: Fetch the PG password from the K8s Secret** (🆕 blind spot 4: make the secret source explicit)
```bash
# Requires kubectl access to the awoooi cluster (~/.kube/config must point at the correct context)
kubectl config current-context # expect awoooi-* or k3s-188
kubectl get secret awoooi-secrets -n awoooi \
-o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d > /tmp/pg_pw.txt
export AWOOOI_PG_PW="$(cat /tmp/pg_pw.txt)"
shred -u /tmp/pg_pw.txt # destroy the file immediately
test -n "$AWOOOI_PG_PW" && echo "PG PW loaded (${#AWOOOI_PG_PW} chars)" || exit 1
```
If `kubectl` is unavailable or the context is wrong, the Commander must supply `AWOOOI_PG_PW` manually; do not auto-generate it and do not fall back.
- [ ] **Step 2: Generate the HMAC secret (32 bytes, hex)**
```bash
openssl rand -hex 32 > /tmp/aider_webhook_secret.txt
cat /tmp/aider_webhook_secret.txt # record it; it goes into the K8s secret later
chmod 600 /tmp/aider_webhook_secret.txt
cat /tmp/aider_webhook_secret.txt # note it down; it goes into the K8s secret and the Mac's ~/.aider-watch.env
```
- [ ] **Step 2: Set up the local dev env**
- [ ] **Step 3: Set up the local-machine dev env**
```bash
# These env vars are for local development/testing only; production deployments use the K8s Secret
export AIDER_WEBHOOK_SECRET="$(cat /tmp/aider_webhook_secret.txt)"
export AIDER_EVENTS_STREAM_KEY="signals:aider:events"
export AIDER_PATTERN_EXTRACT_INTERVAL_HOURS="24"
export USE_AIDER_FEEDBACK="false" # 🆕 blind spot 3: off by default; enable after validation
```
- [ ] **Step 3: Verify Redis + PG are reachable (existing awoooi services)**
- [ ] **Step 4: Verify Redis + PG are reachable**
```bash
redis-cli -h 192.168.0.120 -p 6380 ping # expect PONG
PGPASSWORD="$AWOOOI_PG_PW" psql -h 192.168.0.188 -U awoooi_rw -d awoooi -c 'SELECT 1'
```
Expected: Both commands succeed. If not, stop and resolve before proceeding.
Expected: both commands succeed. If either fails → STOP and resolve it before continuing.
---
@@ -503,12 +518,14 @@ git commit -m "feat(repo): AiderEventRepository CRUD + model_stats + pattern can
---
## Task A5: Service — classify + incident + pattern
## Task A5: Service — classify + incident coalesce + pattern
**Files:**
- Create: `apps/api/src/services/aider_event_service.py`
- Create: `apps/api/tests/test_aider_event_service.py`
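**🆕 Blind spot 2 (incident flooding)**: for the same `session_id` + `aider_activity`, only 1 incident is created within a 60s window; the 2nd and later errors are merged into that incident's `signals` array. The Redis key `aider_incident_coalesce:{session_id}` (TTL=60s) stores the `incident_id`.
- [ ] **Step 1: Write tests (mock incident_service, since it is the service's unit boundary)**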
```python
@@ -559,6 +576,56 @@ def test_session_start_no_incident():
def test_error_creates_incident():
    assert should_create_incident(_ev("error")) is True


@pytest.mark.asyncio
async def test_coalesce_reuses_existing_incident():
    """Blind spot 2: within 60s, one session creates only 1 incident."""
    from apps.api.src.services.aider_event_service import create_incident_for_event
    # Fake incident_svc + append_signal
    append_calls = []

    class FakeIncSvc:
        async def create_incident(self, **kw): return "inc-first"
        async def append_signal(self, inc_id, sig): append_calls.append((inc_id, sig))

    # Fake redis: the first get returns None and set stores the id; the second get returns "inc-first"
    store = {}

    class FakeRedis:
        async def get(self, k): return store.get(k)
        async def set(self, k, v, ex=None): store[k] = v

    svc = FakeIncSvc(); redis = FakeRedis()
    e1 = _ev("error"); e2 = _ev("error")
    id1 = await create_incident_for_event(e1, svc, redis_client=redis)
    id2 = await create_incident_for_event(e2, svc, redis_client=redis)
    assert id1 == id2 == "inc-first"
    assert len(append_calls) == 1  # the second event only appends; no new incident
    assert append_calls[0][1]["event_type"] == "error"


@pytest.mark.asyncio
async def test_coalesce_different_session_id_builds_new():
    from apps.api.src.services.aider_event_service import create_incident_for_event
    seen = []

    class FakeIncSvc:
        async def create_incident(self, **kw):
            seen.append(kw); return f"inc-{len(seen)}"
        async def append_signal(self, *_): pass

    store = {}

    class FakeRedis:
        async def get(self, k): return store.get(k)
        async def set(self, k, v, ex=None): store[k] = v

    ea = AiderEventIn(ts=datetime.now(TAIPEI), session_id="sA",
                      host="m", type="error", payload={"cwd":"/x","model":"m1"})
    eb = AiderEventIn(ts=datetime.now(TAIPEI), session_id="sB",
                      host="m", type="error", payload={"cwd":"/x","model":"m1"})
    id_a = await create_incident_for_event(ea, FakeIncSvc(), redis_client=FakeRedis())
    id_b = await create_incident_for_event(eb, FakeIncSvc(), redis_client=FakeRedis())
    # different sessions each create their own incident
    assert id_a == "inc-1"
    # note: `seen` is a closure shared by every FakeIncSvc instance,
    # so the second create_incident call returns inc-2, not inc-1
    assert id_b == "inc-2"
```
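The two tests above never exercise the 60s expiry itself, because their `FakeRedis.set` ignores `ex`. The following is a minimal sketch of how the window could be checked with a fake clock. The `coalesce` helper is a hypothetical reduction of the rule (reuse the stored id, otherwise create one and store it with a TTL), not the real `create_incident_for_event`; only the key prefix and the 60s TTL come from the plan.

```python
import asyncio


class FakeRedis:
    """In-memory stand-in for the two Redis calls the coalesce logic uses."""

    def __init__(self):
        self.store = {}
        self.now = 0.0  # fake clock in seconds, advanced manually by the test

    async def get(self, key):
        value, expires_at = self.store.get(key, (None, 0.0))
        return value if self.now < expires_at else None

    async def set(self, key, value, ex=None):
        ttl = ex if ex is not None else float("inf")
        self.store[key] = (value, self.now + ttl)


async def coalesce(session_id, redis, created, ttl=60):
    """Hypothetical reduction of the rule: reuse an incident id or create one."""
    key = f"aider_incident_coalesce:{session_id}"
    existing = await redis.get(key)
    if existing:
        return existing
    created.append(session_id)
    incident_id = f"inc-{len(created)}"
    await redis.set(key, incident_id, ex=ttl)
    return incident_id


async def demo():
    redis, created = FakeRedis(), []
    first = await coalesce("sA", redis, created)
    second = await coalesce("sA", redis, created)  # same session, inside the window
    other = await coalesce("sB", redis, created)   # different session
    redis.now = 61.0                               # the 60s window has expired
    later = await coalesce("sA", redis, created)
    return first, second, other, later


result = asyncio.run(demo())
```

Sharing one `FakeRedis` and one `created` list across all calls is what makes the separation visible: a second session gets a new id, and the original session gets a new id again once its key has expired.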
- [ ] **Step 2: Run the tests and confirm they fail**
@@ -611,15 +678,46 @@ class IncidentServiceLike(Protocol):
signals: dict[str, Any]) -> str: ...
COALESCE_TTL_SEC = 60
COALESCE_KEY_PREFIX = "aider_incident_coalesce:"


async def create_incident_for_event(ev: AiderEventIn,
                                    incident_svc: IncidentServiceLike) -> str | None:
    """If the event should trigger an incident, create it and return the incident_id; otherwise return None."""
                                    incident_svc: IncidentServiceLike,
                                    redis_client=None) -> str | None:
    """If the event should trigger an incident, create it and return the incident_id; otherwise return None.
    🆕 Blind spot 2: within 60s, the same session_id creates only 1 incident; later events are merged into its signals."""
    sev = classify_severity(ev)
    if not sev:
        return None
    p = redact(ev.payload)
    repo = p.get("cwd") or "<unknown>"
    model = p.get("model") or "<unknown>"
    # Coalesce check
    existing_id: str | None = None
    if redis_client is not None:
        key = f"{COALESCE_KEY_PREFIX}{ev.session_id}"
        try:
            raw = await redis_client.get(key)
            if raw:
                existing_id = raw.decode() if isinstance(raw, bytes) else raw
        except Exception:
            pass  # a Redis failure must not block; fall back to creating a new incident
    if existing_id:
        # this session already has an incident → merge into its signals, do not create a new one
        if hasattr(incident_svc, "append_signal"):
            try:
                await incident_svc.append_signal(existing_id, {
                    "ts": ev.ts.isoformat(),
                    "event_type": ev.type,
                    "payload_excerpt": p,
                })
            except Exception:
                pass
        return existing_id
    title_map = {
        "error": f"[aider] error in {repo}",
        "silent_timeout": f"[aider] silent timeout in {repo}",
@@ -638,6 +736,13 @@ async def create_incident_for_event(ev: AiderEventIn,
"payload_excerpt": p,
},
)
    # Write the Redis coalesce key
    if redis_client is not None and inc_id:
        try:
            await redis_client.set(f"{COALESCE_KEY_PREFIX}{ev.session_id}",
                                   inc_id, ex=COALESCE_TTL_SEC)
        except Exception:
            pass
    return inc_id
@@ -1017,28 +1122,36 @@ grep -n "def route\|class AIRouter\|AIProvider" apps/api/src/services/ai_router.
return out
```
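- [ ] **Step 3: Add the feedback influence inside AIRouter.route() (soft boost)**
- [ ] **Step 3: Add the feedback influence inside AIRouter.route() (🆕 blind spot 3: flag-guarded)**
Find the decision body of `def route(...)` / `async def route(...)` and add the following after the provider priority is computed and before the return: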
```python
# Phase 24 feedback hook: aider real-world success rate × original weight
try:
    feedback = await self.feedback_from_aider_events(days=7)
    # feedback = {"elephant-alpha": 0.85, "gemini-pro": 0.92}
    # if the provider name matches, multiply by success_rate (floored at 0.1)
    if feedback:
        for p in providers:  # assumes providers is list[ProviderCandidate]
            key = p.name.lower()
            rate = next((v for k, v in feedback.items()
                         if key in k.lower()), None)
            if rate is not None:
                p.weight *= max(rate, 0.1)
except Exception:
    logger.debug("ai_router: feedback hook skipped (non-critical)")
# 🆕 blind spot 3: the USE_AIDER_FEEDBACK flag guards the first rollout
from apps.api.src.core.config import settings
if getattr(settings, "USE_AIDER_FEEDBACK", False):
    try:
        feedback = await self.feedback_from_aider_events(days=7)
        # feedback = {"elephant-alpha": 0.85, "gemini-pro": 0.92}
        if feedback:
            for p in providers:  # assumes providers is list[ProviderCandidate]
                key = p.name.lower()
                rate = next((v for k, v in feedback.items()
                             if key in k.lower()), None)
                if rate is not None:
                    p.weight *= max(rate, 0.1)  # floor at 10%
            logger.info("ai_router: aider feedback applied (%d models)", len(feedback))
    except Exception:
        logger.debug("ai_router: feedback hook skipped (non-critical)")
else:
    logger.debug("ai_router: feedback disabled via USE_AIDER_FEEDBACK=false")
```
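**Note**: the actual names `providers` / `p.weight` must be adapted to the current route() code; if route() uses a different data structure internally, follow its conventions. Do not make large changes to the route() logic; only add this try block.
**Note**:
1. The actual names `providers` / `p.weight` must be adapted to the current route() code
2. **Flag off by default**: after go-live, collect 7 days of real aider data and check that `ai_router` decisions are reasonable (SigNoz trace difference < 5%) before turning the flag on
3. With the flag off, the hook has no effect on existing route decisions, so this is a zero-risk rollout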
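If route() turns out to be hard to test directly, the weighting rule can be pulled out into a pure function. The sketch below assumes weights live in a plain dict; `apply_feedback`, its parameters, and the sample provider names are hypothetical and would need adapting to the real `ProviderCandidate` structure.

```python
def apply_feedback(weights, feedback, enabled, floor=0.1):
    """Return new provider weights scaled by aider success rates.

    weights:  {provider_name: weight}
    feedback: {model_name: success_rate}, matched by case-insensitive substring
    enabled:  value of the USE_AIDER_FEEDBACK flag
    """
    if not enabled or not feedback:
        return dict(weights)  # flag off or no data: weights are unchanged
    adjusted = {}
    for name, weight in weights.items():
        key = name.lower()
        rate = next((v for k, v in feedback.items() if key in k.lower()), None)
        # Unmatched providers keep their weight; matched ones never drop below the floor
        adjusted[name] = weight if rate is None else weight * max(rate, floor)
    return adjusted


weights = {"gemini": 1.0, "elephant": 2.0, "local": 1.5}
feedback = {"elephant-alpha": 0.05, "gemini-pro": 0.92}
on = apply_feedback(weights, feedback, enabled=True)
off = apply_feedback(weights, feedback, enabled=False)
```

Returning a new dict instead of mutating `p.weight` in place also avoids compounding the multiplier if the same candidate objects are reused across route() calls.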
- [ ] **Step 4: Write a minimal test (if route is hard to test, testing the feedback method is enough)**
@@ -1088,7 +1201,7 @@ git commit -m "feat(ai_router): feedback_from_aider_events hook (Phase 24 ADR-05
- Modify: `apps/api/src/core/config.py`
- Modify: `apps/api/src/main.py`
- [ ] **Step 1: Add 3 settings to config.py**
- [ ] **Step 1: Add 4 settings to config.py (🆕 blind spot 3 adds USE_AIDER_FEEDBACK)**
Find `class Settings(BaseSettings)` or similar and add:
@@ -1097,6 +1210,7 @@ git commit -m "feat(ai_router): feedback_from_aider_events hook (Phase 24 ADR-05
    AIDER_WEBHOOK_SECRET: str = ""
    AIDER_EVENTS_STREAM_KEY: str = "signals:aider:events"
    AIDER_PATTERN_EXTRACT_INTERVAL_HOURS: float = 24.0
    USE_AIDER_FEEDBACK: bool = False  # 🆕 off by default; enable manually after a 7-day canary
```
- [ ] **Step 2: Register the aider job in the main.py lifespan**
@@ -1720,6 +1834,18 @@ git commit -m "feat(client): aiderw main wrapper + CLI (doctor/flush)"
</dict></plist>
```
**🆕 Blind spot 1 (Mac reachability from outside the LAN): choose one of three configurations**
Before installing, the Commander must decide where `AIDER_API_URL` points:
| Option | Example URL | Prerequisites |
|------|---------|---------|
| **a.** awoooi nginx, public TLS | `https://awoooi.example/api/v1/aider/events` | add a location block to the awoooi nginx that proxy_passes to K3s svc 32334; enable the cert |
| **b.** Tailscale (if already installed) | `http://<tailscale-100-ip>:32334/api/v1/aider/events` | `tailscale up`; confirm the awoooi 188 host is also on the tailnet |
| **c.** VPN (WireGuard) | `http://192.168.0.120:32334/api/v1/aider/events` | WireGuard peer config |
Before running install.sh, write the chosen URL into `~/.aider-watch.env`.
- [ ] **Step 2: Write install.sh**
```bash


@@ -291,14 +291,38 @@ def feedback_from_aider_events(repo: str, task_kind: str) -> dict[str, float]:
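| Failure scenario | Handling | Result |
|---------|------|------|
| **AWOOOI API unreachable** | Mac client writes to `~/aider-watch/buffer/*.jsonl`; a launchd flush job retries every 5 min | No data lost |
| **Mac leaves the LAN** (🆕 blind spot 1) | `AIDER_API_URL` goes through the awoooi nginx public endpoint (`https://awoooi.example/api/v1/aider/events`) or VPN/Tailscale; the buffer serves as backup | Still usable away from home or while travelling |
| **HMAC verification fails** | API returns 401 and logs to `audit_log`; Mac client keeps the buffer and waits for manual intervention | Prevents forgery |
| **Redis stream full** | Auto-trim at a 10k-entry cap; overflow writes an `audit_log` warning | API is not blocked |
| **aider_event_processor dies** | The existing main.py lifespan supervisor restarts it; the consumption position is recovered from the Redis stream consumer group | Self-healing |
| **incident_service fails to create an incident** | Degrade: write only to the aider_events table and create no incident; a missing TG push is acceptable (the event is still stored) | No data lost |
| **Incident flooding** (🆕 blind spot 2: consecutive errors in one session create many incidents) | The service layer adds a **coalesce mechanism**: the same `session_id` + `aider_activity` creates only 1 incident within a 60s window; later errors are merged into that incident's signals array | TG is not drowned in aider noise |
| **AI Router feedback first-rollout risk** (🆕 blind spot 3) | Add a `USE_AIDER_FEEDBACK` flag, default `false`; switch to `true` only after canary validation (7 days of real aider data, comparing `ai_router.route()` decision differences) | Keeps aider data from affecting other route decisions |
| **Secret leak** | Mac client redacts before writing to the buffer; the server redacts again (defense in depth) | Two layers of defense |
| **Wrapper crashes** | aiderw wraps everything in an outer try/except; atexit emits the missing session_end | aider keeps running |
| **aider itself crashes** | Wrapper receives SIGCHLD → sends an error event + session_end with exit≠0 | The failure is clearly recorded |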
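
For the HMAC row in the table above, the check could look roughly like the sketch below. It assumes the client signs the raw request body with HMAC-SHA256 and sends the hex digest alongside it; the function names and the exact signing scheme are illustrative assumptions, since this section does not specify them.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Client side: hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, signature: str) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature)


def status_for(secret: str, body: bytes, signature: str) -> int:
    """Map the verification result to the HTTP status the table describes."""
    return 200 if verify(secret, body, signature) else 401


body = b'{"type": "error", "session_id": "sA"}'
good = sign("dev-secret", body)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a forged signature were correct.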
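**🆕 §8b Mac client network configuration options** (expands blind spot 1)
| Option | Example `AIDER_API_URL` value | Pros / cons |
|------|------------------------|------|
| **Tailscale** (if the Commander already has it installed) | `http://<tailscale-ip>:32334/...` | Zero config, encrypted; requires staying logged in to Tailscale |
| **awoooi nginx, public** | `https://<domain>/api/v1/aider/events` | Reachable from the public internet; requires adding a reverse proxy route in nginx and enabling TLS |
| **VPN (WireGuard)** | `http://192.168.0.120:32334/...` | Strong privacy; requires a persistent VPN connection |
**Default: awoooi nginx, public.** awoooi already runs nginx:443 on the 188 host, so adding one location block is enough, and the Mac needs no extra software.
**🆕 §8c How to obtain `AWOOOI_PG_PW`** (blind spot 4)
A0 prerequisite step: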
```bash
# Fetch the PG password from the K8s secret (one-time, on the deployment side)
kubectl get secret awoooi-secrets -n awoooi -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d > /tmp/pg_pw.txt
export AWOOOI_PG_PW="$(cat /tmp/pg_pw.txt)"
shred -u /tmp/pg_pw.txt # leave no trace
```
This step follows the zero-trust three-layer defense in `feedback_secrets_leak_incidents_2026-04-18.md`, L4 (local shell env, not persisted to a file).
---
## §9 Test strategy