Commit Graph

12 Commits

Author SHA1 Message Date
Your Name
ee2cc2bfc3 fix(alerts): 收斂 Telegram 告警到 SRE 戰情室
Some checks failed
CD Pipeline / tests (push) Failing after 1m23s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 15s
2026-06-12 11:06:16 +08:00
Your Name
7f3722c7f7 fix(ai): improve docker repair verification signals
Some checks failed
CD Pipeline / tests (push) Successful in 1m22s
Code Review / ai-code-review (push) Successful in 13s
CD Pipeline / build-and-deploy (push) Successful in 4m10s
CD Pipeline / post-deploy-checks (push) Has been cancelled
2026-06-01 19:27:36 +08:00
Your Name
1a2b04f5cf fix(awooop): persist signal metadata and auto-repair prestate
Some checks failed
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m14s
CD Pipeline / build-and-deploy (push) Failing after 3m33s
CD Pipeline / post-deploy-checks (push) Has been skipped
2026-05-18 10:59:54 +08:00
Your Name
813d088339 feat(auto-repair): route ssh diagnostics through mcp gateway
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
run-migration / migrate (push) Successful in 9s
CD Pipeline / tests (push) Successful in 1m11s
CD Pipeline / build-and-deploy (push) Successful in 3m17s
CD Pipeline / post-deploy-checks (push) Successful in 1m16s
2026-05-14 21:13:05 +08:00
Your Name
5fb73a5612 fix(verifier): recognize rollout success evidence
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m3s
CD Pipeline / build-and-deploy (push) Successful in 3m38s
CD Pipeline / post-deploy-checks (push) Successful in 1m30s
2026-05-14 00:37:32 +08:00
Your Name
518a16e895 fix(awooop): persist auto repair verification fallback
Some checks failed
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m10s
CD Pipeline / build-and-deploy (push) Failing after 3m16s
CD Pipeline / post-deploy-checks (push) Has been skipped
2026-05-13 18:47:46 +08:00
Your Name
1a03bceb5c feat(awooop): route post verify mcp through gateway
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m7s
CD Pipeline / build-and-deploy (push) Successful in 10m15s
CD Pipeline / post-deploy-checks (push) Successful in 1m54s
2026-05-13 10:30:03 +08:00
Your Name
ecc65be6e1 fix(telegram): route direct sends through gateway
All checks were successful
Code Review / ai-code-review (push) Successful in 13s
CD Pipeline / tests (push) Successful in 1m10s
CD Pipeline / build-and-deploy (push) Successful in 3m29s
CD Pipeline / post-deploy-checks (push) Successful in 1m20s
2026-05-07 00:33:27 +08:00
Your Name
fa0e956c0e fix(mcp): tag legacy provider calls with audit context
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 59s
CD Pipeline / build-and-deploy (push) Successful in 3m22s
CD Pipeline / post-deploy-checks (push) Successful in 1m19s
2026-05-06 17:18:52 +08:00
Your Name
61f5a6a419 fix(telegram): route alerts to SRE war room
Some checks failed
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
2026-04-30 15:01:23 +08:00
Your Name
3668d49f2f feat(flywheel): W2 三件 + KMWriter critic 修法(1635 tests 全綠)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m38s
W2 (onboarder 4 週飛輪 80→90 路徑第二週) + critic PR review 5 個 critical/major
全部修完,default flag=false 安全無爆炸風險。

## W2 三件 PR

### PR-R2 — AOL → catalog confidence EWMA 回灌(修飛輪斷鏈 C2)
- 新檔 `apps/api/src/jobs/aol_to_catalog_writeback_job.py`
- 邏輯:每小時掃 AOL 計算 EWMA confidence (alpha=0.3) 回灌 alert_rule_catalog
- 失敗閾值 N=5 連續低成功率 → review_status='draft'
- Hermes _fetch_noisy_rules SQL 加 OR review_status='draft'
- ENABLE_AOL_WRITEBACK_JOB=false (default)
- 8 個測試(mock path 修正:lazy import → patch src.db.base.get_db_context)

### PR-V1 — self_healing_validator 串接 (修飛輪斷鏈 C6)
- 新檔 `apps/api/src/services/self_healing_validator.py`(純函數 assess_self_healing)
- post_execution_verifier.py step 5 串接(feature flag gate)
- evidence_snapshot.py 加 self_healing_score / self_healing_detail 欄位
- db/models.py + base.py ALTER IF NOT EXISTS
- score < 0.5 → 觸發 rollback 提案 Telegram alert(不自動執行)
- ENABLE_SELF_HEALING_VALIDATOR=false (default)
- 7 個測試

### PR-L1 — KM ↔ Playbook 雙向回路 (修飛輪斷鏈 C3+C4)
- learning_service.py 三條新邏輯:
  1. _write_playbook_evolution_km:promote/demote 寫 KM 演化條目
  2. _check_and_mark_playbook_review:N=5 累積觸發 review_required
  3. _demote_alert_rule_catalog_confidence:DEPRECATED → confidence×=0.5
- PlaybookRecord 加 review_required 欄位(schema migration via base.py)
- ENABLE_KM_PLAYBOOK_FEEDBACK_LOOP=false (default)
- KM_PLAYBOOK_REVIEW_THRESHOLD=5 可調
- 6 個測試

## KMWriter Critic 5 個 Critical/Major 修復(之前 critic PR review 發現)
之前 push commit c5753e1c 已修,本 commit 補回 stash 中的對應檔案:
- C1 km_writer.py:194 backfill 自打臉(已修:同步 await + DLQ)
- C2 km_writer.py:391 KM_WRITE_AWAIT=false 路徑收緊
- M1 decision_manager.py:2178/2203 移除 _fire_and_forget
- M2 incident_service.py:1099 自製 path 加 retry+DLQ
- M3 km_writer.py:166 冪等聲明對齊(UPSERT + partial unique index)

## 驗證
- 1635 unit tests 全綠(+27 from 1608)
- 與 fb0c72db (推翻 A2 Ollama primary) 共存無衝突
- 所有新 Job/Service default flag=false(不爆炸)

## 期望影響
飛輪斷鏈 C2 + C3 + C4 + C6 全修
飛輪自主化評分:65 → 85 預估(W2 完成後)

啟用順序(待 prod fb0c72db 驗證 OLLAMA primary 跑得起來後):
1. ENABLE_AOL_WRITEBACK_JOB=true
2. ENABLE_KM_PLAYBOOK_FEEDBACK_LOOP=true
3. ENABLE_SELF_HEALING_VALIDATOR=true

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 19:44:04 +08:00
OG T
f1cbf6db7d feat(adr-081): Phase 1 感官縱深 — 8D 情報蒐集 + 執行後驗證
成品:
- IncidentEvidence DB model(8D 感官 + pre/post 執行狀態)
- EvidenceSnapshot dataclass(build_summary → LLM 上下文)
- SanitizationService(Prompt Injection 0-tolerance,12 pattern)
- MCPToolRegistry(動態工具登記,suggest_tools 不寫死告警類型)
- PreDecisionInvestigator(8D 並行感官,P99 < 8s,Redis 30s 快取)
- PostExecutionVerifier(warmup 10s → 後狀態評估 success/degraded/failed)
- decision_manager + approval_execution 接線(feature flag 守衛)

Gate 1 修復:D4/D5/D7/D8 補 sanitize_dict_values;移除裸 "error" failure
signal 防 error_rate key 誤判;evidence_snapshot rowcount 零行警告。

測試:130 passed(+111 新增)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 13:08:38 +08:00