awoooi

Author	SHA1	Message	Date
Your Name	6e04fe9c8a	feat(playbook): generate drafts with local llm	2026-04-30 23:04:58 +08:00
Your Name	9908fdf50d	feat(p3.1-t2-patha): DiagnosisAggregator Path A + Solver F4 critical reject + test alignment. Wave 8 P3.1-T2: enable PathA, harden Solver F4 safety, align tests. PathA (DiagnosisAggregator signal-classification layer, complementing PDI): - ENABLE_DIAGNOSIS_AGGREGATOR default=False → True · PathA is a pure signal-classification layer (business logic such as OOMKilled/CrashLoop) · does not call the K8s/SignOz APIs again (uses only the raw data PDI has already collected) · safe to enable by default: pure logic, no overlapping external dependencies - diagnosis_aggregator.py +155 lines (PathA implementation) - pre_decision_investigator.py already wired in (commit `3a2cd151`) F4 (Solver critical risk reject): - solver_agent.py: _validate_recommended_action rejects risk=critical · hard rule: critical actions must go through manual approval and must never become Telegram buttons · log warning + return None (filtered out by _extract) - _extract_recommended_actions now returns a (list, status_str) tuple · status="ok"/"empty"/"all_invalid" lets the caller decide - protocol.py +16 / metrics.py +9 / ai_router.py +18: supporting metric + protocol field Test alignment: - test_solver_recommended_actions.py splits test_all_valid → low/medium/high accepted + test_critical_rejected - result tuple unpack: result, _ = _extract_recommended_actions(...) - test_diagnosis_aggregator_stub.py: feature flag default changed to True to match PathA Tests: 51 passed (solver 28 + aggregator 16 + router fallback 8) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: Multiple Engineers (Wave 8 P3.1-T2 PathA + F4) <noreply@anthropic.com>	2026-04-27 14:42:29 +08:00
Your Name	fb130c9a28	feat(p3.1-t2): DiagnosisAggregator stub tests + sanitization hardening + additional metrics fields. Wave 8 P3.1-T2 follow-up tests and supporting changes. New tests: - test_diagnosis_aggregator_stub.py (238 lines), 15 tests · stub fixture verifies the _collect_diagnosis_aggregator wiring · feature flag default off: no call is made · timeout boundaries / exception fail-soft Modified: - core/metrics.py +23: new DiagnosisAggregator-related Prometheus metrics - sanitization_service.py +24: hardens prompt sanitization boundaries (companion to vuln #4) - RUNBOOK-AGENT-STEP-LATENCY.md / agent_step_latency_rules.yaml: minor tweaks Tests: 15 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 08:30:26 +08:00
Your Name	fefe4c21cd	fix(inc-20260425): A1+A2 follow-up: Solver/Critic timeout + auto_repair wiring + Runbook + Grafana. Continues the INC-20260425 fix in `595629c0`, adding the three Agent stages and end-to-end observability. A1 follow-up (Solver/Critic three-stage timeout wiring): - solver_agent.py: AGENT_SOLVER_TIMEOUT_SEC=20.0 (env override) - critic_agent.py: AGENT_CRITIC_TIMEOUT_SEC=15.0 (env override) - protocol.py: all three Agents wrap their calls in a shared observe_agent_step() · success/timeout/error outcome label · histogram written to aiops_agent_step_duration_seconds A2 follow-up (auto_repair_service switches to _diagnose_fallback_chain): - auto_repair_service.py +46 lines: switches DIAGNOSE routing to the new chain (NEMO→GEMINI→CLAUDE) - fully avoids the 238s second timeout on Ollama CPU New metrics: - core/metrics.py +59 lines: histogram buckets + label cardinality for observe_agent_step New tests (862 lines): - test_agent_step_timeouts.py (475): timeout boundaries + outcome label for each of the three Agents - test_ai_router_diagnose_fallback.py (387): correct ordering of _diagnose_fallback_chain New supporting files: - docs/runbooks/RUNBOOK-AGENT-STEP-LATENCY.md (350): INC troubleshooting + observability guide - ops/monitoring/grafana/agent_step_latency_rules.yaml (160) · histogram alert rules for the three Agents (p99 > 80% of timeout → warning) Acceptance: 33 tests pass (test_agent_step_timeouts 22 + test_ai_router_diagnose_fallback 11) INC-20260425 total effort across both fixes (595629c0 + this commit): · 5 service/agent files modified · 1 new observability module · 4 new test/supporting files · 1372+187 = 1559 lines added Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: Claude Sonnet 4.6 (INC-20260425 follow-up) <noreply@anthropic.com>	2026-04-27 08:15:53 +08:00
Your Name	2c57b71db9	feat(wave5-p2): GovernanceAgent 4 self-checks + Ollama health alert rules + Prometheus metrics integration. MASTER plan_complete_v3.md Wave 5 P2.2 + P2.3 complete (multiple engineers finished the code before hitting the quota limit; this commit backfills it): P2.2 (GovernanceAgent 4 self-checks): - governance_agent.py (342 lines): self-check loop every 1 hour: · trust_drift (trust drift detection) · knowledge_degradation (knowledge degradation detection) · llm_hallucination (LLM hallucination detection) · execution_blast_radius (execution blast radius detection) - main.py lifespan: started with asyncio.create_task(run_governance_loop()), wrapped in try/except so a scheduling failure does not block the main flow - failover_alerter.py: alert_governance(event_type, payload) with 1h dedup; four event types → Telegram MarkdownV2 alerts P2.3 (Ollama health rules + Prometheus metrics): - ops/monitoring/ollama_health_rules.yaml (148 lines): · OllamaHealthDegraded / OllamaPrimaryDown · OllamaFailoverTriggered / GeminiQuotaExceeded · adds alert rules over the data Prometheus scrapes - core/metrics.py (57 lines): · GEMINI_DAILY_CALL_COUNT / GEMINI_DAILY_QUOTA Gauge · OLLAMA_FAILOVER_TRIGGERED_TOTAL Counter · OLLAMA_CURRENT_PRIMARY_IS_OLLAMA Gauge - ollama_failover_manager.py: · _check_gemini_quota: updates the Gauge on every check (so Prometheus scrapes the latest value) · select_provider: on failover, inc the Counter + flip the Primary Gauge · wrapped in try/except so a metric failure does not block main routing E2E tests: - test_failover_e2e_dispatch.py (365 lines), full dispatch path: health check → failover decide → alerter → metrics Tests: 54 passed (e2e_dispatch + failover_manager + failover_alerter) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: Multiple Engineers (previous session, Wave 5) <noreply@anthropic.com>	2026-04-26 20:56:19 +08:00
OG T	d89f0520f9	fix(api): fix 34 Ruff lint errors - auto-fixed import ordering and unused imports - manually fixed raise from, isinstance union, unused variable - scripts/ left as-is for now (does not block CI) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-29 15:27:49 +08:00

6 Commits