From 8d40bbff2ba384ec85cbd3d06ab6901c73d42df7 Mon Sep 17 00:00:00 2001
From: Your Name
Date: Mon, 20 Apr 2026 03:52:49 +0800
Subject: [PATCH] docs(aider-watch v2): fill 4 big-picture blind spots
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 2026-04-20 the Commander reminded us to "keep the big picture in view on every update", so a second check was run before execution.
It found 4 blind spots that the plan did not handle; they are filled in here:

Blind spot 1: Mac reachability from outside the LAN
- spec §8 + §8b: add a pick-one choice of Tailscale / nginx / VPN
- plan Task B5: reminder to choose a configuration before running install.sh

Blind spot 2: incident flooding (multiple errors in the same session)
- spec §8: add a coalesce strategy (60s window per session_id)
- plan Task A5: the service implementation of create_incident_for_event gains coalesce logic
- add 2 test cases verifying same-session reuse and different-session separation

Blind spot 3: risk in the first rollout of AI Router feedback
- spec §8: add the USE_AIDER_FEEDBACK flag, default false; enable it after a 7-day gradual rollout
- plan Task A8: wrap the route() hook in an if settings.USE_AIDER_FEEDBACK block
- plan Task A9: config adds USE_AIDER_FEEDBACK: bool = False

Blind spot 4: obtaining the AWOOOI_PG_PW secret
- spec §8c: add the kubectl get secret → env → shred flow
- plan Task A0 Step 1: spell out reading the K8s Secret and destroying the file immediately

This follows the big-picture thinking discipline in feedback_ai_autonomous_direction.md.
Execution strategy: fully subagent-driven (approved by the Commander).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../plans/2026-04-20-aider-watch-v2.md        | 180 +++++++++++++++---
 .../specs/2026-04-20-aider-watch-v2-design.md |  24 +++
 2 files changed, 177 insertions(+), 27 deletions(-)

diff --git a/docs/superpowers/plans/2026-04-20-aider-watch-v2.md b/docs/superpowers/plans/2026-04-20-aider-watch-v2.md
index f1c27635..393d9872 100644
--- a/docs/superpowers/plans/2026-04-20-aider-watch-v2.md
+++ b/docs/superpowers/plans/2026-04-20-aider-watch-v2.md
@@ -56,35 +56,50 @@
 
 # Phase A: Server-side (10 tasks)
 
-## Task A0: Generate HMAC secret + local dev env
+## Task A0: Prerequisites (fetch PG password + generate HMAC secret + local env)
 
 **Files:**
-- Create: `/tmp/aider-watch-dev.env` (local use only, not committed to git)
+- Create: `/tmp/aider_webhook_secret.txt` (shredded after installation)
 
-- [ ] **Step 1: Generate the HMAC secret (32 bytes hex)**
+- [ ] **Step 1: Fetch the PG password from the K8s Secret** (🆕 blind spot 4: make the secret source explicit)
+
+```bash
+# Requires kubectl access to the awoooi cluster (~/.kube/config pointing at the correct context)
+kubectl config current-context  # expect awoooi-* or k3s-188
+kubectl get secret awoooi-secrets -n awoooi \
+  -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d > /tmp/pg_pw.txt
+export AWOOOI_PG_PW="$(cat /tmp/pg_pw.txt)"
+shred -u /tmp/pg_pw.txt  # destroy the file immediately
+test -n "$AWOOOI_PG_PW" && echo "PG PW loaded (${#AWOOOI_PG_PW} chars)" || exit 1
+```
+
+If `kubectl` is unavailable or the context is wrong, the Commander must supply `AWOOOI_PG_PW` manually; do not auto-generate it and do not fall back.
+
+- [ ] **Step 2: Generate the HMAC secret (32 bytes hex)**
 
 ```bash
 openssl rand -hex 32 > /tmp/aider_webhook_secret.txt
-cat /tmp/aider_webhook_secret.txt  # record it; it goes into the K8s secret later
+chmod 600 /tmp/aider_webhook_secret.txt
+cat /tmp/aider_webhook_secret.txt  # note it down! It later goes into the K8s secret + the Mac's ~/.aider-watch.env
 ```
 
-- [ ] **Step 2: Set up the local dev env**
+- [ ] **Step 3: Set up the local dev env**
 
 ```bash
-# These env vars are for local development/testing only; production deployments use the K8s Secret
 export AIDER_WEBHOOK_SECRET="$(cat /tmp/aider_webhook_secret.txt)"
 export AIDER_EVENTS_STREAM_KEY="signals:aider:events"
 export AIDER_PATTERN_EXTRACT_INTERVAL_HOURS="24"
+export USE_AIDER_FEEDBACK="false"  # 🆕 blind spot 3: off by default, enable after validation
 ```
 
-- [ ] **Step 3: Verify Redis + PG are reachable (existing awoooi services)**
+- [ ] **Step 4: Verify Redis + PG are reachable**
 
 ```bash
 redis-cli -h 192.168.0.120 -p 6380 ping  # expect PONG
 PGPASSWORD="$AWOOOI_PG_PW" psql -h 192.168.0.188 -U awoooi_rw -d awoooi -c 'SELECT 1'
 ```
 
-Expected: Both commands succeed. If not, stop and resolve before proceeding.
+Expected: both commands succeed. If either fails → STOP and resolve it before continuing.
 
 ---
 
@@ -503,12 +518,14 @@ git commit -m "feat(repo): AiderEventRepository CRUD + model_stats + pattern can
 
 ---
 
-## Task A5: Service - classify + incident + pattern
+## Task A5: Service - classify + incident coalesce + pattern
 
 **Files:**
 - Create: `apps/api/src/services/aider_event_service.py`
 - Create: `apps/api/tests/test_aider_event_service.py`
 
+**🆕 Blind spot 2 (incident flooding)**: for the same `session_id` + `aider_activity`, only 1 incident is created within a 60s window; the 2nd and later errors are merged into that incident's `signals` array. The Redis key `aider_incident_coalesce:{session_id}` (TTL=60s) stores the `incident_id`.
+
 - [ ] **Step 1: Write tests (mock incident_service, since this is the service's unit boundary)**
 
 ```python
@@ -559,6 +576,56 @@ def test_session_start_no_incident():
 
 def test_error_creates_incident():
     assert should_create_incident(_ev("error")) is True
+
+
+@pytest.mark.asyncio
+async def test_coalesce_reuses_existing_incident():
+    """Blind spot 2: only 1 incident is created per session within 60s."""
+    from apps.api.src.services.aider_event_service import create_incident_for_event
+
+    # Fake incident_svc + append_signal
+    append_calls = []
+    class FakeIncSvc:
+        async def create_incident(self, **kw): return "inc-first"
+        async def append_signal(self, inc_id, sig): append_calls.append((inc_id, sig))
+
+    # Fake redis: the first get returns None and set stores the id; the second get returns "inc-first"
+    store = {}
+    class FakeRedis:
+        async def get(self, k): return store.get(k)
+        async def set(self, k, v, ex=None): store[k] = v
+
+    svc = FakeIncSvc(); redis = FakeRedis()
+    e1 = _ev("error"); e2 = _ev("error")
+    id1 = await create_incident_for_event(e1, svc, redis_client=redis)
+    id2 = await create_incident_for_event(e2, svc, redis_client=redis)
+    assert id1 == id2 == "inc-first"
+    assert len(append_calls) == 1  # the second event only appends; no new incident
+    assert append_calls[0][1]["event_type"] == "error"
+
+
+@pytest.mark.asyncio
+async def test_coalesce_different_session_id_builds_new():
+    from apps.api.src.services.aider_event_service import create_incident_for_event
+    seen = []
+    class FakeIncSvc:
+        async def create_incident(self, **kw):
+            seen.append(kw); return f"inc-{len(seen)}"
+        async def append_signal(self, *_): pass
+    store = {}
+    class FakeRedis:
+        async def get(self, k): return store.get(k)
+        async def set(self, k, v, ex=None): store[k] = v
+
+    ea = AiderEventIn(ts=datetime.now(TAIPEI), session_id="sA",
+                      host="m", type="error", payload={"cwd":"/x","model":"m1"})
+    eb = AiderEventIn(ts=datetime.now(TAIPEI), session_id="sB",
+                      host="m", type="error", payload={"cwd":"/x","model":"m1"})
+    id_a = await create_incident_for_event(ea, FakeIncSvc(), redis_client=FakeRedis())
+    id_b = await create_incident_for_event(eb, FakeIncSvc(), redis_client=FakeRedis())
+    # Different sessions each create their own incident
+    assert id_a == "inc-1"
+    assert id_b == "inc-2"  # `seen` is shared by both FakeIncSvc instances, so the second create yields inc-2
 ```
 
 - [ ] **Step 2: Run the tests and confirm they fail**
@@ -611,15 +678,46 @@ class IncidentServiceLike(Protocol):
                               signals: dict[str, Any]) -> str: ...
 
 
+COALESCE_TTL_SEC = 60
+COALESCE_KEY_PREFIX = "aider_incident_coalesce:"
+
+
 async def create_incident_for_event(ev: AiderEventIn,
-                                    incident_svc: IncidentServiceLike) -> str | None:
-    """If the event should trigger an incident, create it and return the incident_id; otherwise None."""
+                                    incident_svc: IncidentServiceLike,
+                                    redis_client=None) -> str | None:
+    """If the event should trigger an incident, create it and return the incident_id; otherwise None.
+    🆕 Blind spot 2: within 60s, the same session_id creates only 1 incident; later events are merged into signals."""
     sev = classify_severity(ev)
     if not sev:
         return None
     p = redact(ev.payload)
     repo = p.get("cwd") or ""
     model = p.get("model") or ""
+
+    # Coalesce check
+    existing_id: str | None = None
+    if redis_client is not None:
+        key = f"{COALESCE_KEY_PREFIX}{ev.session_id}"
+        try:
+            raw = await redis_client.get(key)
+            if raw:
+                existing_id = raw.decode() if isinstance(raw, bytes) else raw
+        except Exception:
+            pass  # a Redis failure must not block; fall back to creating a new incident
+
+    if existing_id:
+        # This session already has an incident → merge into signals, do not create a new one
+        if hasattr(incident_svc, "append_signal"):
+            try:
+                await incident_svc.append_signal(existing_id, {
+                    "ts": ev.ts.isoformat(),
+                    "event_type": ev.type,
+                    "payload_excerpt": p,
+                })
+            except Exception:
+                pass
+        return existing_id
+
     title_map = {
         "error": f"[aider] error in {repo}",
         "silent_timeout": f"[aider] silent timeout in {repo}",
@@ -638,6 +736,13 @@ async def create_incident_for_event(ev: AiderEventIn,
             "payload_excerpt": p,
         },
     )
+    # Write the Redis coalesce key
+    if redis_client is not None and inc_id:
+        try:
+            await redis_client.set(f"{COALESCE_KEY_PREFIX}{ev.session_id}",
+                                   inc_id, ex=COALESCE_TTL_SEC)
+        except Exception:
+            pass
     return inc_id
 
 
@@ -1017,28 +1122,36 @@ grep -n "def route\|class AIRouter\|AIProvider" apps/api/src/services/ai_router.
     return out
 ```
 
-- [ ] **Step 3: Add the feedback influence inside AIRouter.route() (soft boost)**
+- [ ] **Step 3: Add the feedback influence inside AIRouter.route() (🆕 blind spot 3: flag-guarded)**
 
 Locate the decision body of `def route(...)` or `async def route(...)`, and add the following after the provider priority order is computed and before the return:
 
 ```python
         # Phase 24 feedback hook: aider real-world success rate × original weight
-        try:
-            feedback = await self.feedback_from_aider_events(days=7)
-            # feedback = {"elephant-alpha": 0.85, "gemini-pro": 0.92}
-            # If the provider name matches, multiply by success_rate (floor of 0.1)
-            if feedback:
-                for p in providers:  # assumes providers is list[ProviderCandidate]
-                    key = p.name.lower()
-                    rate = next((v for k, v in feedback.items()
-                                 if key in k.lower()), None)
-                    if rate is not None:
-                        p.weight *= max(rate, 0.1)
-        except Exception:
-            logger.debug("ai_router: feedback hook skipped (non-critical)")
+        # 🆕 Blind spot 3: the USE_AIDER_FEEDBACK flag guards the first rollout
+        from apps.api.src.core.config import settings
+        if getattr(settings, "USE_AIDER_FEEDBACK", False):
+            try:
+                feedback = await self.feedback_from_aider_events(days=7)
+                # feedback = {"elephant-alpha": 0.85, "gemini-pro": 0.92}
+                if feedback:
+                    for p in providers:  # assumes providers is list[ProviderCandidate]
+                        key = p.name.lower()
+                        rate = next((v for k, v in feedback.items()
+                                     if key in k.lower()), None)
+                        if rate is not None:
+                            p.weight *= max(rate, 0.1)  # floor at 10%
+                    logger.info("ai_router: aider feedback applied (%d models)", len(feedback))
+            except Exception:
+                logger.debug("ai_router: feedback hook skipped (non-critical)")
+        else:
+            logger.debug("ai_router: feedback disabled via USE_AIDER_FEEDBACK=false")
 ```
 
-**Note**: the actual `providers` and `p.weight` names must be adjusted to the current route() code; if route() uses a different data structure internally, follow its conventions. Do not rework the route() logic; only add this try block.
+**Note**:
+1. The actual `providers` and `p.weight` names must be adjusted to the current route() code
+2. **The flag is off by default**: after go-live, run 7 days of real aider data, check that the `ai_router` decisions are reasonable (SigNoz trace difference < 5%), then turn the flag on
+3. If the flag is off → the hook has no effect on existing route decisions, so the rollout itself changes no routing behavior
 
 - [ ] **Step 4: Write a minimal test (if route is hard to test, testing the feedback method is enough)**
 
@@ -1088,7 +1201,7 @@ git commit -m "feat(ai_router): feedback_from_aider_events hook (Phase 24 ADR-05
 - Modify: `apps/api/src/core/config.py`
 - Modify: `apps/api/src/main.py`
 
-- [ ] **Step 1: Add 3 settings to config.py**
+- [ ] **Step 1: Add 4 settings to config.py (🆕 blind spot 3 adds USE_AIDER_FEEDBACK)**
 
 Locate `class Settings(BaseSettings)` or similar, and add:
 
@@ -1097,6 +1210,7 @@ git commit -m "feat(ai_router): feedback_from_aider_events hook (Phase 24 ADR-05
     AIDER_WEBHOOK_SECRET: str = ""
     AIDER_EVENTS_STREAM_KEY: str = "signals:aider:events"
     AIDER_PATTERN_EXTRACT_INTERVAL_HOURS: float = 24.0
+    USE_AIDER_FEEDBACK: bool = False  # 🆕 off by default; enable manually after a 7-day gradual rollout
 ```
 
 - [ ] **Step 2: Register the aider job in the main.py lifespan**
@@ -1720,6 +1834,18 @@ git commit -m "feat(client): aiderw main wrapper + CLI (doctor/flush)"
 ```
 
+**🆕 Blind spot 1 (Mac reachability from outside the LAN): pick one of three configurations**:
+
+Before installing, the Commander must decide where `AIDER_API_URL` points:
+
+| Option | Example URL | Prerequisites |
+|------|---------|---------|
+| **a.** awoooi nginx public TLS | `https://awoooi.example/api/v1/aider/events` | Add a location block to the awoooi nginx that proxy_passes to K3s svc 32334; enable the cert |
+| **b.** Tailscale (if already installed) | `http://:32334/api/v1/aider/events` | tailscale up; confirm the awoooi 188 host is also on the tailnet |
+| **c.** VPN (WireGuard) | `http://192.168.0.120:32334/api/v1/aider/events` | WireGuard peer config |
+
+Before running install.sh, write the chosen URL into `~/.aider-watch.env`.
+
 - [ ] **Step 2: Write install.sh**
 
 ```bash
diff --git a/docs/superpowers/specs/2026-04-20-aider-watch-v2-design.md b/docs/superpowers/specs/2026-04-20-aider-watch-v2-design.md
index cdcad192..eae877e9 100644
--- a/docs/superpowers/specs/2026-04-20-aider-watch-v2-design.md
+++ b/docs/superpowers/specs/2026-04-20-aider-watch-v2-design.md
@@ -291,14 +291,38 @@ def feedback_from_aider_events(repo: str, task_kind: str) -> dict[str, float]:
 | Failure scenario | Handling | Outcome |
 |---------|------|------|
 | **AWOOOI API unreachable** | Mac client writes to `~/aider-watch/buffer/*.jsonl`; a launchd flush job retries every 5 min | No data loss |
+| **Mac leaves the LAN** (🆕 blind spot 1) | `AIDER_API_URL` goes through the awoooi nginx public endpoint (`https://awoooi.example/api/v1/aider/events`) or VPN/Tailscale; the buffer serves as fallback | Still usable away from home or while traveling |
 | **HMAC verification fails** | API returns 401 and writes to `audit_log`; Mac client keeps the buffer pending manual intervention | Prevents forgery |
 | **Redis stream full** | Auto-trim at a 10k-entry cap; overflow writes an `audit_log` warning | API is not blocked |
 | **aider_event_processor dies** | The existing main.py lifespan supervisor restarts it; the consume position is recovered from the Redis stream consumer group | Self-healing |
 | **incident_service fails to create an incident** | Degrade: write only to the aider_events table and create no incident; no TG push (acceptable, the event is still stored) | No data loss |
+| **Incident flooding** (🆕 blind spot 2): consecutive errors in one session create multiple incidents | Add a **coalesce mechanism** in the service layer: for the same `session_id` + `aider_activity`, only 1 incident is created within a 60s window, and later errors are merged into that incident's signals array | TG is not drowned in aider noise |
+| **Risk in the first rollout of AI Router feedback** (🆕 blind spot 3) | Add the `USE_AIDER_FEEDBACK` flag, default `false`; switch it to `true` only after gradual-rollout validation (run 7 days of real aider data and compare the differences in `ai_router.route()` decisions) | Keeps aider data from dragging down other route decisions |
 | **Secret leak** | Mac client redacts before writing to the buffer; the server redacts once more (defense in depth) | Two layers of defense |
 | **Wrapper crashes** | aiderw wraps everything in an outer try/except; atexit emits the missing session_end | aider keeps running |
 | **aider itself crashes** | Wrapper receives SIGCHLD → sends an error event + session_end with exit≠0 | Fails with a clear record |
 
+**🆕 §8b Mac client network configuration options** (blind spot 1 in detail)
+
+| Option | Example `AIDER_API_URL` value | Pros / cons |
+|------|------------------------|------|
+| **Tailscale** (if the Commander already has it installed) | `http://:32334/...` | Zero configuration, encrypted; requires staying logged in to Tailscale |
+| **awoooi nginx public endpoint** | `https:///api/v1/aider/events` | Reachable from the public internet; requires adding a reverse proxy route in nginx and enabling TLS |
+| **VPN (WireGuard)** | `http://192.168.0.120:32334/...` | Strong privacy; requires a persistent VPN connection |
+
+**Default: the awoooi nginx public endpoint.** awoooi already runs nginx:443 on the 188 host, so adding one location block is enough; the Mac needs no extra software.
+
+**🆕 §8c How to obtain `AWOOOI_PG_PW`** (blind spot 4)
+
+A0 prerequisite step:
+```bash
+# Fetch the PG password from the K8s secret (one-time, on the deployment side)
+kubectl get secret awoooi-secrets -n awoooi -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d > /tmp/pg_pw.txt
+export AWOOOI_PG_PW="$(cat /tmp/pg_pw.txt)"
+shred -u /tmp/pg_pw.txt  # leave no trace
+```
+This step complies with layer L4 of the zero-trust three-layer defense in `feedback_secrets_leak_incidents_2026-04-18.md` (local shell env, not persisted to a file).
+
 ---
 
 ## §9 Testing strategy
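
The coalesce row in the §8 table (one incident per `session_id` within a 60s window) can be exercised in isolation. The following is only a minimal sketch under stated assumptions, not the service code from the plan: `InMemoryTTLStore`, `coalesce_incident`, and `demo` are hypothetical names, and the store merely imitates Redis GET and SET-with-expiry using a manually advanced clock. As in the plan, the key is written only when an incident is created, so the window is measured from the first error and is not extended by later ones.

```python
import asyncio
from typing import Awaitable, Callable, Optional

COALESCE_TTL_SEC = 60
COALESCE_KEY_PREFIX = "aider_incident_coalesce:"


class InMemoryTTLStore:
    """Hypothetical stand-in for Redis GET / SET-with-expiry, driven by a fake clock."""

    def __init__(self) -> None:
        self.now = 0.0  # fake clock in seconds; the caller advances it by hand
        self._data: dict[str, tuple[str, float]] = {}

    async def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.now >= expires_at:  # key has expired, behave as if it was never set
            del self._data[key]
            return None
        return value

    async def set(self, key: str, value: str, ex: int) -> None:
        self._data[key] = (value, self.now + ex)


async def coalesce_incident(
    session_id: str,
    store: InMemoryTTLStore,
    create: Callable[[], Awaitable[str]],
) -> tuple[str, bool]:
    """Return (incident_id, created). Reuse the stored id while the window is open."""
    key = f"{COALESCE_KEY_PREFIX}{session_id}"
    existing = await store.get(key)
    if existing is not None:
        return existing, False  # same session inside the window: reuse
    incident_id = await create()
    await store.set(key, incident_id, ex=COALESCE_TTL_SEC)
    return incident_id, True


async def demo() -> list[tuple[str, bool]]:
    store = InMemoryTTLStore()
    counter = 0

    async def create() -> str:
        nonlocal counter
        counter += 1
        return f"inc-{counter}"

    results = [await coalesce_incident("sA", store, create)]       # first error in sA
    store.now = 30
    results.append(await coalesce_incident("sA", store, create))   # sA again, inside the window
    results.append(await coalesce_incident("sB", store, create))   # a different session
    store.now = 61
    results.append(await coalesce_incident("sA", store, create))   # sA after the window expired
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Injecting the clock keeps the expiry case testable without sleeping for 60 seconds; a test against a real Redis client would need either a shorter TTL or a mocked time source.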