Your Name
bb5f16f8ef
fix(aiops-p2): P2.1 three LLM quality fixes: Evidence-First + consensus confidence + raw_evidence injection
...
CD Pipeline / build-and-deploy (push) Has been cancelled
Root causes:
- consensus_engine: all four ExpertAgents returned confidence=0.0 → weighted vote total=0 → always returned NO_ACTION
- prompts.py: no Evidence-First instruction → the LLM reasoned from memory, unconstrained by the real environment
- openclaw.py: analyze_alert built the prompt without injecting MCP evidence (diagnosis_context)
Fixes:
- consensus_engine: SRE/Security/Cost/Performance agents set confidence to 0.45~0.80 according to signal strength
- consensus_engine: _normalize_action adds the alias 「重新啟動」 ("restart") → RESTART
- consensus_engine: SecurityAgent removes the unused _target variable
- prompts.py: add Evidence-First Protocol + Skepticism Rules blocks
- openclaw.py: analyze_alert extracts diagnosis_context → injected into full_prompt as <raw_evidence>
Verification: consensus score went from 0.0 → 0.744 (CrashLoop test case)
P2.1 fix 2026-04-24 ogt + Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 15:52:25 +08:00
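The zero-confidence failure described in this commit can be reproduced with a small weighted-vote sketch. It is an illustration only: `weighted_vote` and its `(action, confidence)` input shape are assumptions, not the project's consensus_engine API. When every agent reports confidence 0.0 the total weight is zero, so the fallback is the only possible result.

```python
from collections import defaultdict


def weighted_vote(votes, fallback="NO_ACTION"):
    """Pick the action with the highest summed confidence.

    votes: list of (action, confidence) pairs, one per expert agent (hypothetical shape).
    Returns (action, score), where score is the winner's share of the total weight.
    """
    totals = defaultdict(float)
    for action, confidence in votes:
        totals[action] += confidence

    total = sum(totals.values())
    if total == 0:
        # No agent expressed any confidence: nothing can win, so fall back.
        return fallback, 0.0

    best = max(totals, key=totals.get)
    return best, totals[best] / total
```

With four agents all voting RESTART at confidence 0.0 this returns the fallback, which matches the "always returned NO_ACTION" symptom; any non-zero confidences let the vote resolve normally.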
Your Name
359a6ee495
fix(test-schema): add matched_playbook_id column to approval_records
...
CD Pipeline / build-and-deploy (push) Has been cancelled
Root cause of the CI B5 integration test failure: 04ff225 added matched_playbook_id to the ORM model,
but tests/integration/setup_test_schema.sql was not updated to match, so
test_approval_lifecycle / test_incident_approval_association raised
UndefinedColumnError and blocked CD Pipeline build-and-deploy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 15:48:37 +08:00
Your Name
04ff22563e
fix(aiops-p1): fix all 5 breakpoints in the Playbook learning loop + DB migration (ADR-092 B4)
...
run-migration / migrate (push) Failing after 14s
CD Pipeline / build-and-deploy (push) Failing after 2m7s
[P0.4 patch] pre_decision_investigator: missing Prometheus query field
- _build_tool_params() adds the "query" field (required parameter of the prometheus_query tool)
- Add _build_prometheus_query(): generates PromQL by alert type (CPU/Memory/Crash/Disk/HTTP/Pod/fallback)
- After the fix, the D3_METRICS sensing dimension actually retrieves data (previously 100% returned missing_query_parameter)
[P1 Playbook learning loop: B1-B5 all fixed]
- B2 db/models.py: ApprovalRecord gains a matched_playbook_id column + ix_approval_matched_playbook index
- B2 db/models.py: TimelineEvent gains an incident_id column (for MCP auditing) + index
- B3 approval_db.py: record→ApprovalRequest restores incident_id + matched_playbook_id
- B4 approval_repository.py: same as B3 (the two conversion functions must stay in sync)
- B5 approval_db.py: approval_request_to_record_data adds matched_playbook_id → the value is now actually persisted to the DB
[P1.5 KM write] approval_execution.py: fire-and-forget → await wait_for(30s)
- Root cause: the asyncio.create_task task was killed on Pod recycle, so KM writes were silently lost
- Fix: await asyncio.wait_for(..., timeout=30.0) + log on TimeoutError
[Migration file] adr092_p1_learning_chain_fix.sql
- ALTER TABLE approval_records ADD COLUMN matched_playbook_id VARCHAR(36)
- ALTER TABLE timeline_events ADD COLUMN incident_id VARCHAR(64)
- Run: psql $DATABASE_URL -f apps/api/migrations/adr092_p1_learning_chain_fix.sql
[Incidental Agent changes]
- decision_manager: Phase 2 YAML NO_ACTION priority gate (host-level / external-service alerts skip Agent Debate)
- alert_rules.yaml: new rules for Sentry/ClickHouse + HostDiskUsageHigh/Critical
- solver_agent: action_title semantic-synthesis fallback (replaces silent drop)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 15:41:35 +08:00
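The P1.5 change in this commit (fire-and-forget → awaited with a deadline) follows a common asyncio pattern, sketched below. `write_with_deadline` is a hypothetical helper name, not the function in approval_execution.py; only `asyncio.wait_for` and the 30-second timeout come from the commit message.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def write_with_deadline(write_coro, timeout=30.0):
    """Await a write instead of detaching it with asyncio.create_task.

    Awaiting keeps the caller alive until the write finishes or the deadline
    passes, so a failure is logged rather than silently lost.
    Returns True on success, False on timeout.
    """
    try:
        await asyncio.wait_for(write_coro, timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising.
        logger.error("write did not finish within %.1fs", timeout)
        return False
```

The trade-off is latency: the caller now waits up to `timeout` seconds, in exchange for the write no longer depending on a background task surviving process shutdown.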
Your Name
7f4088bcd0
fix(aiops-p0): comprehensive P0 fix for six root problems (ADR-092 B4)
...
[P0.1] knowledge_extractor_service.py:210: AttributeError fix
- The Signal.description field does not exist (100% failure; root cause of KM +5 per day)
- Use the concatenated text of alert_name + annotations.summary instead
[P0.2+P0.3] Gate 9+11: relax restrictions on read-only commands
- blast_radius_calculator: kubectl get/top/describe/logs/version → score=1 (not 50)
- operation_parser: recognize the INVESTIGATE type (read-only kubectl no longer returns None)
- executor.py: OperationType adds the INVESTIGATE enum member
- approval_execution.py: the INVESTIGATE path calls execute_kubectl_command directly
[P0.4] MCP SSH/K8s Provider fixes
- decision_manager: params= → parameters= (matches the MCPToolProvider.execute signature)
- decision_manager: MCPToolResult .get() → .success/.output (dataclass usage)
- decision_manager + ssh_provider: add hosts 120/121 (missing from the original defaults)
- auto_approve: the phase2_agent_debate source bypasses the confidence threshold
[P0.5] Fix semantic contradictions in alert rules
- alert_rules.yaml: 8 kubectl query rules changed from RESTART_DEPLOYMENT → NO_ACTION
(CrashLoopBackOff / PostgreSQL connections / slow queries / MinIO disk / K3s node / alert pipeline / SSL / CoreDNS, etc.)
- incident_service.py: split cAdvisor/CoreDNS out of general into their own categories
[P0.6] proactive_inspector: fix all dynamic-baseline PromQL
- All 5 MONITORED_METRICS PromQL expressions corrected (cadvisor label/datname/blackbox)
- db_connection_pool: datname="awoooi" → "awoooi_prod"
- http_error_rate: invalid http_requests_total → blackbox probe_success
- cpu/memory: namespace label → name=~"k8s_api_awoooi-api.*"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 15:32:23 +08:00
Your Name
45dbe07188
fix(flywheel): fix six automation-flywheel capabilities (ADR-092 B3)
...
run-migration / migrate (push) Failing after 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 53s
Type Sync Check / check-type-sync (push) Successful in 2m54s
CD Pipeline / build-and-deploy (push) Has been cancelled
Ansible Lint / lint (push) Has been cancelled
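[Root-cause chain fix]
MCP Provider bugs → PreDecisionInvestigator fails → Agent Debate has no context
→ LLM timeout → description="待分析" ("pending analysis") → blocked by the ADR-091 hard gate → tg_sent not set
→ W-2 Watchdog falsely reports a "silent failure"
[Six fixes]
1. Three MCP Provider bug fixes
- ssh_provider: asyncssh.run() → conn.run()
- prometheus_provider: KeyError 'query' → tolerant .get()
- k8s_provider: empty pod_name → early return of an error dict
2. Agent Debate / decision quality
- decision_manager: timeout-degradation text changed to an explicit description (bypasses the ADR-091 hard gate)
- intent_classifier: on LLM timeout, degrade to keyword classification (not None)
3. Watchdog false-positive fix (ADR-092 B3)
- W-2: tg_sent Redis TTL → telegram_message_id IS NULL (DB ground truth)
- W-5 added: suggested_action IN empty/待分析/NO_ACTION + tg_id IS NULL
- approval_timeout_resolver: 60min → 15min, batch 50 → 200
4. Config Drift automation
- drift_adopt_service: auto_adopt_if_safe() six-condition safety gate
- drift.py: the background task tries auto-adoption first, then sends the manual Telegram card
5. Playbook flywheel stability
- playbook_seed_service: fix idempotency (deprecated is not treated as missing)
- playbook_evolver: load only DRAFT+APPROVED (not all 294 records)
6. Observability
- alert_rule_engine: auto_rule structured logging + Redis counters (pipeline)
- auto_approve: Redis counter for reject reasons
- heartbeat_report_service: add the "⚙️ Automation stats (today)" section
[Pending manual execution]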
psql $DATABASE_URL -f apps/api/migrations/cleanup_duplicate_deprecated_playbooks.sql
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-24 10:55:50 +08:00
Your Name
9244c5e845
feat(heartbeat): add 5 dynamic sections to the system report
...
CD Pipeline / build-and-deploy (push) Successful in 13m50s
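Add alert pipeline (24h), DB/Redis status, K8s Pods, Scanner status, and Telegram Bot sections
Each section is probed in parallel via asyncio.gather(return_exceptions=True); a failure in one does not affect the others
Add AlertPipelineStats/DbRedisStats/PodInfo/ScannerStats/TelegramBotStats dataclasses
_build_warnings() adds checks for DB/Redis anomalies, PENDING>10, and Pods not ready / high restart count
report_to_telegram_html() outputs the 5 corresponding new HTML sections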
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 09:29:16 +08:00
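The parallel-probe approach mentioned in this commit can be sketched as follows. `run_probes` and the probe names are hypothetical; the only detail taken from the commit is the use of `asyncio.gather(return_exceptions=True)` so that one failing probe does not abort the rest.

```python
import asyncio


async def run_probes(probes):
    """Run independent async probes concurrently and isolate failures.

    probes: dict mapping a section name to a zero-argument async callable
    (hypothetical shape). Returns a dict of per-section results.
    """
    names = list(probes)
    results = await asyncio.gather(
        *(probes[name]() for name in names),
        return_exceptions=True,  # exceptions come back as values, not raised
    )

    report = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            report[name] = {"ok": False, "error": str(result)}
        else:
            report[name] = {"ok": True, "data": result}
    return report
```

Because `gather` preserves argument order, zipping the results back onto the names is safe even when probes finish out of order.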
AWOOOI CD
3bd105be9a
chore(cd): deploy 88af639 [skip ci]
2026-04-22 01:18:56 +00:00
Your Name
88af639651
fix(report): fix inconsistent casing of approval_records.status
...
CD Pipeline / build-and-deploy (push) Successful in 9m46s
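The DB stores the enum name via SQLEnum (EXECUTION_FAILED, uppercase),
not the enum value (execution_failed, lowercase).
The SQL now applies UPPER(status::text) so it matches regardless of case.
Verification: live DB query returns success=0, failed=2 (previously always 0/0)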
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 09:10:39 +08:00
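The name-versus-value mismatch behind this fix is easy to see in plain Python. The `ApprovalStatus` class below is a hypothetical stand-in that only borrows the two member names quoted in the commit; `matches_status` mirrors the idea of `UPPER(status::text)` rather than the project's actual query.

```python
from enum import Enum


class ApprovalStatus(Enum):
    # Member names are uppercase; member values are lowercase.
    EXECUTION_SUCCESS = "execution_success"
    EXECUTION_FAILED = "execution_failed"


def matches_status(stored: str, status: ApprovalStatus) -> bool:
    """Case-insensitive comparison against the enum *name*.

    Works whether the column holds the name ("EXECUTION_FAILED")
    or the value ("execution_failed").
    """
    return stored.upper() == status.name
```

A filter written against `status.value` would compare "EXECUTION_FAILED" with "execution_failed" and never match, which is consistent with the "always 0/0" symptom in the commit message.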
Your Name
6810ab359d
fix(report): fix two root causes: duplicate daily reports + auto-repair rate stuck at 0%
...
CD Pipeline / build-and-deploy (push) Has been cancelled
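Problem 1: the daily inspection report was sent multiple times (each Pod ran its own daily job)
- Root cause: run_daily_report_loop was not wired to the leader lock
The other scanners (capacity/hermes/compliance) all call
try_acquire_daily_lock; only the daily report loop was missing it
- Fix: call try_acquire_daily_lock("daily_report") after asyncio.sleep
Pods that fail to acquire the lock simply continue and wait for the next 08:00
Problem 2: the auto-repair success rate was always 0.0%
- Root cause: _collect_repair_stats queried incidents.outcome->>'execution_success'
but the whole execution path (approval_execution.py NO_ACTION + real execution)
never wrote execution_success back into the incidents.outcome JSON,
so the query always returned 0
- Fix: query approval_records.status (EXECUTION_SUCCESS / EXECUTION_FAILED) instead
It is the only source of truth that is reliably written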
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 09:03:44 +08:00
AWOOOI CD
757a58cc60
chore(cd): deploy 1625e7b [skip ci]
2026-04-21 18:10:42 +00:00
Your Name
1625e7bd19
fix(telegram): fix two root causes of silent button replies
...
CD Pipeline / build-and-deploy (push) Successful in 17m40s
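Problem 1: ai_advisory_* buttons (capacity forecast / compliance, etc.)
- Pressing one only showed a toast (gone in 2-3 seconds); the group never received a reply
- Fix: _handle_ai_advisory_action takes a message_id parameter and,
after answer_callback, additionally sends a sendMessage reply to the original card
Problem 2: pressing "Approve" again on an already-resolved alert
- sign_approval returns early (status != pending) but
_notify_approval_result still sent "⚡ Executing..." → with no follow-up ever
- Fix: send "Executing..." only when approval.status == APPROVED;
other terminal states send "ℹ️ This alert has already been handled (status: ...)" and return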
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:57:55 +08:00
AWOOOI CD
ca8361e0bc
chore(cd): deploy 6d5f070 [skip ci]
2026-04-21 17:56:34 +00:00
Your Name
6d5f07045d
fix(ci): add DATABASE_URL to the B5 integration tests: fix for the required Settings field
...
CD Pipeline / build-and-deploy (push) Successful in 10m56s
The B5 step only set TEST_DATABASE_URL, but the import chain initializes
Settings() during the collection phase, which crashed with "DATABASE_URL Field required".
Add DATABASE_URL with the same value so Pydantic validation passes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:46:04 +08:00
Your Name
a6788c2baa
fix(tests): move DB tests to the integration layer to fix the CI asyncpg password error
...
CD Pipeline / build-and-deploy (push) Failing after 1m55s
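The three real-DB tests in test_aider_event_processor.py aborted in the CI unit-test layer
(tests/) because connecting to the awoooi_dev DB failed (password mismatch).
Correct layout:
tests/: unit tests, run directly by CI, no DB
tests/integration/: integration tests, CI --ignore, covered by K8s E2E
Fixes:
- tests/test_aider_event_processor.py keeps only the DB-free malformed payload test
- The three DB tests move to tests/integration/test_aider_event_processor_integration.py
and use the conftest db_session fixture instead of building their own engine (avoids a hard-coded password)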
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:41:34 +08:00
Your Name
5e353407f7
fix(ci): fix ValidationError in CI unit tests after DATABASE_URL became required
...
CD Pipeline / build-and-deploy (push) Failing after 41s
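After security fix C4 removed the changeme default, Pydantic Settings could not find
DATABASE_URL in the CI environment, so the import chain crashed (pydantic_core.ValidationError).
The unit tests themselves do not connect to a DB; they only need Settings to initialize. Add a CI placeholder:
DATABASE_URL="${DATABASE_URL:-postgresql+asyncpg://ci:ci@localhost/ci}"
If CI has injected the real secret, that value is used; otherwise the localhost placeholder is used.
Scope: cd.yaml Run API Tests, cd-dev.yaml Run API Tests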
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:35:19 +08:00
Your Name
479f8d8971
refactor(tests): clear technical debt: remove FakeRepo/FakeSession mock-DB violations
...
CD Pipeline / build-and-deploy (push) Failing after 35s
## ai_router.py
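- Extract the pure function _aggregate_feedback_stats(); feedback_from_aider_events calls it
## aider_event_processor.py
- _process_one gains a _session_factory=None DI parameter (defaults to get_session_factory())
- A test factory can be injected without changing existing production logic
## test_ai_router_feedback.py (full rewrite)
- Remove FakeRepo/FakeSession; test the pure function _aggregate_feedback_stats directly
- Add the test_feedback_skips_missing_model edge case
- Keep the DB-failure degradation test (only patches get_session_factory, no FakeRepo)
## test_aider_event_processor.py (full rewrite)
- Remove FakeRepo/FakeSession; use real PostgreSQL (real_factory fixture)
- Redis xack + IncidentEngine remain mocked (external broker / AI service, an allowed exception)
- Roll back after each test so the dev DB is not polluted
## setup_test_schema.sql
- Add the aider_events_payload_gin GIN index (consistent with the adr091 production migration)
## integration/conftest.py
- Add a comment explaining the historical confusion around the password name awoooi_prod_2026
- Fix the assert logic: check the DB name rather than the URL string, so a password containing prod does not trigger a false positive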
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:33:30 +08:00
Your Name
d0591c54b0
fix(security): health-check remediation: all 7 Critical/Major security issues fixed
...
CD Pipeline / build-and-deploy (push) Failing after 35s
## Critical fixes (C1-C5)
- C1: git rm --cached 03-secrets.yaml (the CHANGE_ME template is no longer tracked)
- C2: git rm --cached awoooi.db + add *.db to .gitignore (SQLite HARD_RULES violation)
- C3: sentry-tunnel SENTRY_HOST changed to a process.env fallback
- C4: config.py DATABASE_URL: remove the changeme default and make it required
- C5: run_migration.py now uses os.environ["DATABASE_URL"]
## Major fixes (M1-M4)
- M1: auto_repair /execute gains CSRF protection + AutoRepairPanel.tsx updated to match
- M2: drift /rollback /adopt gain CSRF protection (/internal/scan stays without CSRF)
- M3: terminal /intent gains CSRF protection + terminal.store.ts updated to match
- M4: live-dashboard HOST_IPS + host-grid VIP moved to env vars
## Other
- Add apps/web/.env.example (documents 6 env vars)
- K8s deployment-web adds 3 new env vars
- Integration tests: add real-DB tests for aider_event_repository + ai_router_feedback
- test_terminal.py: fix the CSRF dependency override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:27:39 +08:00
Your Name
3dbb3d70b4
feat(claude): add the awoooi-guard.js guard hook
...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 00:24:18 +08:00
Your Name
8f15c57019
feat(claude): apply the ty-ai-standards Global-Local architecture
...
- Add .claude/agents/: 12 standardized subagents (critic / debugger / planner, etc.)
- Add .claude/hooks/secrets.local.json: AWOOOI-specific token detection patterns
- Add .claude/hooks/branch-protection.local.json: protects the production branch
- Update .claude/settings.json: add a hooks section (runs stacked on top of the global hooks)
- Update CLAUDE.md: add a global reference line + security architecture notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 00:18:14 +08:00
AWOOOI CD
49e465954c
chore(cd): deploy 4fc1f49 [skip ci]
2026-04-21 14:35:32 +00:00
Your Name
4fc1f49dca
fix(pipeline): three breakpoint fixes: SLO formula + NO_ACTION backlog + hallucination-downgrade risk
...
CD Pipeline / build-and-deploy (push) Successful in 14m3s
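D1 flywheel_stats_service: the execution_count field does not exist → read
success_count+failure_count instead; removes the illusion that the flywheel execution success rate is always 0.0%
D2 openclaw._validate_deployment_inventory: after a hallucinated deployment is downgraded,
the original HIGH/CRITICAL risk was not reset → add result.risk_level = AIRiskLevel.LOW
D3 webhooks.py (two alert paths): the three non-destructive actions NO_ACTION/INVESTIGATE/OBSERVE
force risk_level = LOW, skipping Telegram approval and going straight to auto-approve
→ the NO_ACTION handler in approval_execution.py immediately marks EXECUTION_SUCCESS
Root-cause chain: after the BUTTON_DATA_INVALID fix, TG buttons could be sent, but the 35
backlogged NO_ACTION PENDING records piled up because HIGH risk could not take the auto-approve path.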
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-21 22:26:07 +08:00
Your Name
e2742ce9f3
docs: record the BUTTON_DATA_INVALID root fix + Gitea Code Review fix
...
LOGBOOK + ADR-092 Appendix C: 2026-04-21 fix record
E2E verification: telegram_approval_card_sent message_id=25045 (SignOzDown) ✓
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-21 21:59:00 +08:00
AWOOOI CD
0a72ae21e4
chore(cd): deploy 8fd31ec [skip ci]
2026-04-21 13:38:44 +00:00
Your Name
8fd31eca66
fix(telegram): compress the nonce UUID with base64url: fully resolves BUTTON_DATA_INVALID
...
CD Pipeline / build-and-deploy (push) Successful in 9m45s
The previous fix (truncate random) was incomplete: host_restart_service (20 chars) is still
68 bytes > the 64-byte limit even with random dropped.
Root fix: UUID (36 chars) → base64url-encode the UUID bytes → 22 chars
nonce format: {action}:{b64url_uuid}:{timestamp}:{random}
Longest case: host_restart_service(20)+22+10+8+3 colons = 63 bytes
generate_callback_nonce: UUID → base64url 22 chars
parse_callback_data: 22-char b64url → restores the full UUID; handlers need no changes
All actions verified: approve/silence/reject/docker_restart/host_restart_service/renew_cert
are all ≤ 63 bytes, and the UUID round-trips correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-21 21:30:20 +08:00
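The UUID compression described in this commit can be sketched with the standard library. The function names here (`compress_uuid`, `expand_uuid`, `build_nonce`) are hypothetical and are not the project's `generate_callback_nonce` / `parse_callback_data`; the sketch only demonstrates that 16 UUID bytes encode to 22 base64url characters once the `==` padding is stripped, and that the round trip is lossless.

```python
import base64
import uuid


def compress_uuid(value: str) -> str:
    """Encode a 36-char UUID string as 22 base64url characters."""
    raw = uuid.UUID(value).bytes  # 16 bytes
    # 16 bytes encode to 24 chars ending in "=="; drop the padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def expand_uuid(token: str) -> str:
    """Restore the canonical UUID string from its 22-char form."""
    raw = base64.urlsafe_b64decode(token + "==")  # re-add the stripped padding
    return str(uuid.UUID(bytes=raw))


def build_nonce(action: str, approval_id: str, timestamp: int, rand: str) -> str:
    """Assemble {action}:{b64url_uuid}:{timestamp}:{random} (format from the commit)."""
    return f"{action}:{compress_uuid(approval_id)}:{timestamp}:{rand}"
```

For the longest action named in the commit, a 10-digit timestamp, and an 8-character random part, the assembled nonce is 20 + 22 + 10 + 8 + 3 = 63 bytes, which fits under Telegram's 64-byte callback_data limit.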
AWOOOI CD
4bc183742f
chore(cd): deploy bd73548 [skip ci]
2026-04-21 13:26:51 +00:00
Your Name
bd735482f7
fix(telegram): BUTTON_DATA_INVALID: root-cause fix for the nonce exceeding 64 bytes
...
CD Pipeline / build-and-deploy (push) Has been cancelled
Root cause: Telegram callback_data is limited to 64 bytes.
5 long action names (docker_restart/host_restart_service, etc.) + UUID approval_id
= 71-77 bytes → BUTTON_DATA_INVALID.
Fixes:
1. security_interceptor.generate_callback_nonce: if the nonce is > 63 bytes,
use the 3-part format (drop random); the timestamp still provides time uniqueness.
2. security_interceptor.parse_callback_data: accept the 3-part or 4-part format.
3. telegram_gateway: remove debug payload logging (diagnosis complete).
Affected actions: docker_restart / host_restart_service / host_clear_log /
reload_nginx / renew_cert (all > 7 chars + UUID = 64 bytes or more).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-21 21:17:49 +08:00
AWOOOI CD
a2777aee04
chore(cd): deploy 685f5c6 [skip ci]
2026-04-21 13:05:41 +00:00
Your Name
685f5c684f
debug(telegram): log full payload on 4xx to diagnose BUTTON_DATA_INVALID
...
CD Pipeline / build-and-deploy (push) Successful in 13m29s
The previous response_body logging confirmed the error code; this change logs the full payload (payload_preview, first
1000 bytes) to find the exact field that triggers BUTTON_DATA_INVALID.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-21 20:56:28 +08:00
AWOOOI CD
4bc52a9bdc
chore(cd): deploy acab1cd [skip ci]
2026-04-21 07:29:25 +00:00
Your Name
acab1cd95e
fix(gitea-review): PR/push AI analysis always failing: two root-cause fixes
...
CD Pipeline / build-and-deploy (push) Successful in 17m26s
Root cause 1 (push review): local_code_review_service.review_push() returns a
dict, but the caller accessed analysis.issues directly → AttributeError.
Fix: _call_openclaw_push_review converts the dict into a CodeReviewResult.
Root cause 2 (PR review): openclaw_http_service calls
/api/v1/analyze/code-review, but OpenClaw never implemented this endpoint (404).
Fix: _call_openclaw_code_review now goes through local_code_review_service.review_pr()
(Ollama qwen2.5-coder + Gemini fallback).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-21 15:19:14 +08:00
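Root cause 1 in this commit is the usual dict-versus-attribute mismatch, and the fix amounts to converting at the boundary. A minimal sketch, assuming a hypothetical `CodeReviewResult` shape: only the `issues` attribute is named in the commit, so the `summary` field and the `to_review_result` helper are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class CodeReviewResult:
    # Hypothetical fields; only `issues` appears in the commit message.
    summary: str = ""
    issues: list = field(default_factory=list)


def to_review_result(payload: dict) -> CodeReviewResult:
    """Convert a raw dict into a typed result so callers can use attributes.

    Missing or null keys fall back to empty defaults instead of raising.
    """
    return CodeReviewResult(
        summary=str(payload.get("summary", "")),
        issues=list(payload.get("issues") or []),
    )
```

Accessing `payload.issues` on the raw dict raises AttributeError, whereas `to_review_result(payload).issues` always yields a list.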
AWOOOI CD
3c266190cf
chore(cd): deploy 3323a90 [skip ci]
2026-04-20 17:13:47 +00:00
Your Name
3323a9052c
debug: log telegram 400 response body to diagnose card send failure
CD Pipeline / build-and-deploy (push) Successful in 12m38s
2026-04-21 01:05:21 +08:00
Your Name
9e9bd8679f
fix(aider-watch): code-review fixes (4 issues)
...
CD Pipeline / build-and-deploy (push) Has been cancelled
1. aiderw: session_end now includes model+cwd (unblocks the AI Router feedback loop)
2. repository: model_stats_since SQL now uses COALESCE(session_end, session_start) model
3. aider_event_service: classify_severity no longer raises alerts on error_count (prevents false positives)
4. worker: run_aider_event_processor_loop wraps proc.start() in try/except (prevents silent crashes)
2026-04-20 @ Asia/Taipei
2026-04-21 00:59:21 +08:00
AWOOOI CD
e60c064bdc
chore(cd): deploy 9a44516 [skip ci]
2026-04-20 12:29:49 +00:00
Your Name
994817a23a
docs: ADR-092 Appendices A+B + LOGBOOK + MASTER §8 record the four fixes and the C1-C4 end-to-end wiring
...
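- ADR-092: Appendix A (B1-B4 four fixes: root cause + commit) + Appendix B (C1-C4 breakpoint fix table + architecture hard rules)
- LOGBOOK: add the 2026-04-20 evening C1-C4 section (breakpoint list + commits + acceptance steps)
- MASTER §8: append the C1-C4 changelog (§3/§1.1 alignment + notes on post-fix behavior)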
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-20 20:24:41 +08:00
Your Name
9a44516bf8
fix(aider-processor): init_worker_redis_pool before XREADGROUP
...
CD Pipeline / build-and-deploy (push) Successful in 9m35s
The worker pool was not initialized in the main.py lifespan (same issue as signal_worker).
AiderEventProcessor.start() now calls init_worker_redis_pool() idempotently,
ensuring get_worker_redis() in _consume_loop() does not raise RuntimeError.
2026-04-20 @ Asia/Taipei
2026-04-20 20:21:15 +08:00
Your Name
de2d34d4cd
fix(playbook): C1-C4 end-to-end wiring: evolver protection + seeder revival + immediate rule creation + watchdog W-4
...
CD Pipeline / build-and-deploy (push) Has been cancelled
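C1: playbook_evolver: add a YAML_RULE guard for playbooks whose source is yaml_rule;
the evolver no longer archives APPROVED playbooks created by the seeder, protecting the auto-repair chain
C2: playbook_seed_service: the idempotency SQL excludes DEPRECATED records,
so yaml_rule playbooks archived by the evolver can be revived after a restart
C3: alert_rule_engine: call seed_playbooks_from_rules() immediately after an AI-generated rule succeeds,
creating the matching APPROVED Playbook without waiting for the next restart
C4: ai_slo_watchdog_job: add W-4, an alert when the APPROVED playbook count is 0;
a broken chain immediately raises TYPE-8M; total checks go from 3 to 4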
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-20 20:18:11 +08:00
Your Name
7ca6d12ce2
fix(aider): remove dead get_aider_event_repository factory (resource leak)
...
get_db_context import unused after removing broken factory function.
Worker manages its own session via get_session_factory(). 2026-04-20 @ Asia/Taipei
2026-04-20 20:18:11 +08:00
AWOOOI CD
f9ff23f007
chore(cd): deploy 156a52f [skip ci]
2026-04-20 12:09:31 +00:00
Your Name
39ac292c90
docs(master): §8 append the ADR-092 four-fix record + update project_current_status
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-20 20:01:50 +08:00
Your Name
156a52f807
fix(aiops): ADR-092 three fixes: Playbook enum crash + Telegram permanent silence + adoption failure + AI self-diagnosis
...
CD Pipeline / build-and-deploy (push) Successful in 13m33s
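B1 playbook_service.py: the evolver's setattr passed a str instead of a PlaybookStatus enum
→ playbook.status.value in _pg_upsert crashed (163 times/48h); fix: update_with_validation forces enum coercion
B2 approval_db.py + webhooks.py: find_by_fingerprint wrongly converged on PENDING records
→ PENDING ≠ Telegram sent; fix: after a successful push, mark tg_sent:{fingerprint} in Redis (24h TTL)
→ a PENDING record outside the find_by_fingerprint debounce window must be confirmed in Redis before converging
drift_adopt_service.py: telegram_gateway calls adopt_drift(report_id) but the method did not exist
→ add an adopt_drift() wrapper: load the DriftReport from the DB, then delegate to adopt(); fixes the adoption failure
B3 ai_slo_watchdog_job.py + main.py: the AI could not detect its own failures (MASTER §1.1 blind spot)
→ add a self-diagnosis every 15 minutes: W-1 SLO violation, W-2 TG silence detection, W-3 flywheel success rate
→ any anomaly → TYPE-8M send_meta_alert; Redis dedup for 1h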
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-20 20:00:06 +08:00
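The B1 crash in this commit (a plain str where an enum was expected, so `.value` fails) can be guarded with a coercion step. The sketch below is an assumption-laden illustration: `coerce_status` is a hypothetical helper, the member names DRAFT/APPROVED/DEPRECATED appear elsewhere in this log, and the lowercase values are invented for the example.

```python
from enum import Enum


class PlaybookStatus(Enum):
    # Member names appear in this log; the values are assumed.
    DRAFT = "draft"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


def coerce_status(value):
    """Return a PlaybookStatus, accepting either an enum member or its name as a str."""
    if isinstance(value, PlaybookStatus):
        return value
    if isinstance(value, str):
        try:
            return PlaybookStatus[value.upper()]  # lookup by member name
        except KeyError:
            raise ValueError(f"unknown playbook status: {value!r}") from None
    raise TypeError(f"expected str or PlaybookStatus, got {type(value).__name__}")
```

Coercing once at the update boundary means downstream code can rely on `.value` being available, and an unknown string fails loudly at write time rather than later during the upsert.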
Your Name
1744b1e923
fix(aider): stdlib logging → structlog + typing-extensions dep (E2E fix)
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- aider_events.py: logging.getLogger → structlog.get_logger (keyword args compatible)
- pyproject.toml: add typing-extensions>=4.0 (python-ulid 3.x requires Self)
2026-04-20 @ Asia/Taipei
2026-04-20 19:59:35 +08:00
AWOOOI CD
72aea671b3
chore(cd): deploy ce918ee [skip ci]
2026-04-20 11:48:59 +00:00
Your Name
ce918ee44e
feat(client): B5 install.sh + launchd aider-flush plist
...
CD Pipeline / build-and-deploy (push) Successful in 10m18s
Mac-side install script: pipx install aider-watch-client → symlink into /opt/homebrew/bin →
verify the required keys in ~/.aider-watch.env → create the ~/aider-watch working directory →
load launchd com.awoooi.aider-flush (flush buffer every 5min) → run aider-watch doctor.
Uses route a (LAN direct AIDER_API_URL=http://192.168.0.120:32334/api/v1/aider/events).
Full-picture check: home-use scenario; the B3 buffer + 5min flush already cover brief network outages, so Tailscale is not needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-20 19:40:02 +08:00
Your Name
b7d612526a
chore(client): gitignore egg-info + remove accidentally committed generated files
2026-04-20 19:40:02 +08:00
Your Name
36610e2744
feat(client): Mac aider-watch client (B1-B4: scaffolding + api_client + buffer + aiderw)
2026-04-20 19:40:02 +08:00
Your Name
e1539a813e
feat(config+main): aider-watch v2 settings + router + lifespan register
...
- Add 4 settings to config.py: AIDER_WEBHOOK_SECRET, AIDER_EVENTS_STREAM_KEY, AIDER_PATTERN_EXTRACT_INTERVAL_HOURS, USE_AIDER_FEEDBACK (ADR-091)
- Import aider_events_v1 router in main.py imports (alphabetical after ai_slo_v1)
- Register aider_events_v1.router in include_router block (after alert_operation_logs_v1)
- Register run_aider_event_processor_loop() in lifespan (after compliance_scanner_loop)
- All 65 tests pass (24 action_parsing + 41 aider-watch tests)
Co-Authored-By: Claude Haiku 4.5 (1M context) <noreply@anthropic.com >
2026-04-20 19:40:02 +08:00
Your Name
40771cda6d
feat(ai_router): feedback_from_aider_events read-only hook (Phase 24 A8)
2026-04-20 19:40:01 +08:00
Your Name
df72da69e2
feat(worker): AiderEventProcessor — Redis stream consumer + incident + DB write
...
- Implement Task A7: background worker consuming signals:aider:events stream
- Parse AiderEventIn from Redis XREADGROUP messages
- Call IncidentEngine.process_signal for incident-worthy events
- Persist aider_events to PostgreSQL with optional incident_id FK
- XACK on success, preserve in pending list on DB failure (retry)
- ACK on parse failure (bad JSON avoids pending list jam)
- Match signal_worker.py pattern: no Active Sweeper (MVP)
- Unit tests: 4 tests covering incident creation, non-incident events, malformed payloads, engine failures
Tests: 37 passed (4 new + 33 existing regression)
2026-04-20 19:40:01 +08:00
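The acknowledgement policy listed in this commit (ACK on success, ACK on parse failure, leave pending on DB failure) can be expressed as one small decision function. This is a sketch under assumptions: `should_ack` and the `persist` callback are hypothetical, and no Redis client is involved; the caller would issue XACK only when the function returns True.

```python
import json


def should_ack(raw: str, persist) -> bool:
    """Decide whether a stream message should be acknowledged.

    raw: the message payload as a JSON string.
    persist: callable that writes the parsed event to storage and raises on failure.
    """
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed payload: retrying cannot help, so ack to keep the
        # pending list from jamming on a poison message.
        return True

    try:
        persist(event)
    except Exception:
        # Storage failure is assumed transient: leave the message pending for retry.
        return False

    return True
```

Separating the decision from the Redis call keeps the policy testable without a broker.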
Your Name
cd894310dc
feat(api): POST /api/v1/aider/events HMAC webhook + Redis stream push
...
- Router layer: HTTP validation + HMAC-SHA256 signature verification
- Service layer: Redis stream push (aider_event_service.push_aider_batch_to_stream)
- Follows leWOOOgo building-block layering: Router → Service → Redis
- All 6 tests passing (signature validation, batch limits, edge cases)
2026-04-20 19:40:01 +08:00
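HMAC-SHA256 webhook verification of the kind this commit mentions typically looks like the sketch below. The function names and the hex-digest encoding are assumptions; the commit does not state the signature header or encoding the endpoint actually uses.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a client-supplied signature against the raw body.

    compare_digest runs in constant time, so the comparison does not leak
    how many leading characters matched.
    """
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, signature)
```

The signature should be computed over the exact bytes received, before any JSON parsing, since re-serializing the body can change whitespace or key order and invalidate the digest.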