Commit Graph

240 Commits

Author SHA1 Message Date
Your Name
6868a9a93d feat(awooop): link telegram alerts to incident runs
Some checks failed
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Failing after 1m58s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
2026-05-17 22:17:21 +08:00
Your Name
665e72ba33 feat(awooop): filter runs by remediation evidence
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m4s
CD Pipeline / build-and-deploy (push) Successful in 4m9s
CD Pipeline / post-deploy-checks (push) Successful in 1m26s
2026-05-17 21:13:54 +08:00
Your Name
0592402779 feat(awooop): surface remediation evidence in run lists
All checks were successful
Code Review / ai-code-review (push) Successful in 12s
CD Pipeline / tests (push) Successful in 1m9s
CD Pipeline / build-and-deploy (push) Successful in 4m1s
CD Pipeline / post-deploy-checks (push) Successful in 1m54s
2026-05-17 20:26:03 +08:00
Your Name
392cfb9025 feat(governance): surface remediation dry run history
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m1s
CD Pipeline / build-and-deploy (push) Successful in 3m45s
CD Pipeline / post-deploy-checks (push) Successful in 1m39s
2026-05-14 22:56:51 +08:00
Your Name
04fdaee83a feat(governance): add remediation dry run entrypoint
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m5s
CD Pipeline / build-and-deploy (push) Successful in 3m43s
CD Pipeline / post-deploy-checks (push) Successful in 1m33s
2026-05-14 22:20:34 +08:00
Your Name
809bc9670b feat(governance): surface adr100 slo states
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m0s
CD Pipeline / build-and-deploy (push) Successful in 4m0s
CD Pipeline / post-deploy-checks (push) Successful in 1m55s
2026-05-14 19:57:32 +08:00
Your Name
5604dd0256 fix(auto-repair): mark approval execution status
All checks were successful
Code Review / ai-code-review (push) Successful in 9s
CD Pipeline / tests (push) Successful in 1m10s
CD Pipeline / build-and-deploy (push) Successful in 3m27s
CD Pipeline / post-deploy-checks (push) Successful in 1m30s
2026-05-14 01:00:49 +08:00
Your Name
d835b666cf fix(alertmanager): keep auto repair moving on ai fallback
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m10s
CD Pipeline / build-and-deploy (push) Successful in 3m25s
CD Pipeline / post-deploy-checks (push) Successful in 1m30s
2026-05-14 00:06:49 +08:00
Your Name
77aace7515 feat(awooop): show inbound event dossiers
All checks were successful
Code Review / ai-code-review (push) Successful in 17s
CD Pipeline / tests (push) Successful in 1m19s
CD Pipeline / build-and-deploy (push) Successful in 3m30s
CD Pipeline / post-deploy-checks (push) Successful in 1m34s
2026-05-13 21:59:16 +08:00
Your Name
795085170a feat(awooop): persist inbound source envelopes
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m23s
CD Pipeline / build-and-deploy (push) Successful in 3m37s
CD Pipeline / post-deploy-checks (push) Successful in 1m34s
2026-05-13 21:29:04 +08:00
Your Name
c2d01eb6f1 feat(awooop): mirror alertmanager events into truth chain
All checks were successful
Code Review / ai-code-review (push) Successful in 19s
CD Pipeline / tests (push) Successful in 2m10s
CD Pipeline / build-and-deploy (push) Successful in 3m22s
CD Pipeline / post-deploy-checks (push) Successful in 1m17s
2026-05-13 20:16:42 +08:00
Your Name
596f2f6820 fix(awooop): link auto approved execution evidence
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m17s
CD Pipeline / build-and-deploy (push) Successful in 3m42s
CD Pipeline / post-deploy-checks (push) Successful in 1m21s
2026-05-13 19:14:17 +08:00
Your Name
518a16e895 fix(awooop): persist auto repair verification fallback
Some checks failed
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m10s
CD Pipeline / build-and-deploy (push) Failing after 3m16s
CD Pipeline / post-deploy-checks (push) Has been skipped
2026-05-13 18:47:46 +08:00
Your Name
356bfce2c8 fix(awooop): expose quality summary aggregate
All checks were successful
Code Review / ai-code-review (push) Successful in 16s
CD Pipeline / tests (push) Successful in 1m7s
CD Pipeline / build-and-deploy (push) Successful in 3m27s
CD Pipeline / post-deploy-checks (push) Successful in 1m32s
2026-05-13 16:34:20 +08:00
Your Name
ae7c7cbd23 feat(awooop): summarize automation quality
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m14s
CD Pipeline / build-and-deploy (push) Successful in 3m43s
CD Pipeline / post-deploy-checks (push) Successful in 1m29s
2026-05-13 15:56:42 +08:00
Your Name
af9798a62e feat(awooop): surface reconciliation in incident timeline
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m9s
CD Pipeline / build-and-deploy (push) Successful in 5m4s
CD Pipeline / post-deploy-checks (push) Successful in 1m15s
2026-05-13 09:22:51 +08:00
Your Name
f7c84530d6 feat(awooop): 新增 truth-chain 查詢 API
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m16s
CD Pipeline / build-and-deploy (push) Successful in 3m29s
CD Pipeline / post-deploy-checks (push) Successful in 1m19s
2026-05-12 22:55:36 +08:00
Your Name
ad8ead2546 fix(awooop): route ci notifications through event mirror
Some checks failed
Code Review / ai-code-review (push) Successful in 14s
CD Pipeline / tests (push) Successful in 1m18s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-12 13:58:08 +08:00
Your Name
66e22e26cb feat(awooop): add run detail timeline
All checks were successful
Code Review / ai-code-review (push) Successful in 12s
CD Pipeline / tests (push) Successful in 1m18s
CD Pipeline / build-and-deploy (push) Successful in 3m58s
CD Pipeline / post-deploy-checks (push) Successful in 1m25s
2026-05-07 03:55:01 +08:00
Your Name
251554c044 fix(awooop): record grouped alert events
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m6s
CD Pipeline / build-and-deploy (push) Successful in 3m48s
CD Pipeline / post-deploy-checks (push) Successful in 1m25s
2026-05-07 01:35:09 +08:00
Your Name
edef1aa4c7 fix(incidents): batch decision token lookup
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m5s
CD Pipeline / build-and-deploy (push) Successful in 3m20s
CD Pipeline / post-deploy-checks (push) Successful in 1m19s
2026-05-06 21:27:46 +08:00
Your Name
a0179cec6e fix(incidents): keep list endpoint pure read
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m7s
CD Pipeline / build-and-deploy (push) Successful in 3m26s
CD Pipeline / post-deploy-checks (push) Successful in 1m17s
2026-05-06 21:17:25 +08:00
Your Name
7473a01322 fix(awooop): route runs list before dynamic run lookup
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m3s
CD Pipeline / build-and-deploy (push) Successful in 3m22s
CD Pipeline / post-deploy-checks (push) Successful in 1m16s
2026-05-06 17:29:56 +08:00
Your Name
4111ea4f9f fix(ai): remove 188 ollama provider
All checks were successful
Code Review / ai-code-review (push) Successful in 12s
CD Pipeline / tests (push) Successful in 1m13s
CD Pipeline / build-and-deploy (push) Successful in 3m36s
CD Pipeline / post-deploy-checks (push) Successful in 1m20s
2026-05-06 14:34:48 +08:00
OG T
c696b99ccf fix(awooop): authenticate approval decisions
All checks were successful
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m3s
CD Pipeline / build-and-deploy (push) Successful in 3m28s
CD Pipeline / post-deploy-checks (push) Successful in 1m25s
2026-05-06 13:05:51 +08:00
Your Name
09256be62c fix(rag): use bge embeddings on GCP Ollama lane
Some checks failed
Code Review / ai-code-review (push) Successful in 11s
CD Pipeline / tests (push) Successful in 1m22s
CD Pipeline / build-and-deploy (push) Failing after 2h14m5s
CD Pipeline / post-deploy-checks (push) Has been cancelled
2026-05-06 05:49:37 +08:00
Your Name
3b64d66836 fix(alerts): guard approval actions and wire playbook learning
All checks were successful
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 42s
CD Pipeline / build-and-deploy (push) Successful in 3m31s
CD Pipeline / post-deploy-checks (push) Successful in 1m18s
2026-05-06 03:34:24 +08:00
Your Name
a2c4b3d47e fix(awooop): align console with flywheel execution metrics
Some checks failed
Code Review / ai-code-review (push) Has been cancelled
CD Pipeline / tests (push) Successful in 2m22s
CD Pipeline / build-and-deploy (push) Successful in 3m54s
CD Pipeline / post-deploy-checks (push) Successful in 1m17s
2026-05-06 00:46:08 +08:00
Your Name
e22b8e7ab2 feat(awooop): Operator Console API + 前端(leWOOOgo 積木化修復)
All checks were successful
Code Review / ai-code-review (push) Successful in 42s
後端:
- 新增 platform_operator_service.py(DB 存取集中 Service 層)
- Router 層移除 Depends(get_db),改呼叫 Service 函數
- tenants/contracts/operator_runs 三個 Router 符合 leWOOOgo 規範
- __init__.py 整合四個 platform router

前端:
- apps/web/src/app/[locale]/awooop/ 完整建立(7 個頁面)
- layout.tsx:四分頁導覽(tenants/contracts/runs/approvals)
- 全部使用 @/i18n/routing(Link/usePathname/useRouter)避免 i18n 路徑問題
- approvals page:10s 自動刷新、timeout 倒數、緊急紅色高亮

ADR-106/107/112/114/115/116

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:00:20 +08:00
Your Name
8629ac709b feat(awooop): Phase 1-8 完整實作 — AwoooP Agent Platform 六平面架構
Some checks failed
run-migration / migrate (push) Failing after 59s
Code Review / ai-code-review (push) Successful in 1m8s
Type Sync Check / check-type-sync (push) Successful in 2m27s
## Phase 1-3: Control Plane + Contract System
- awooop_phase1_control_plane_2026-05-04.sql: 12 張核心表 + RLS
- awooop_phase1_batch1_rls_2026-05-04.sql: 全部 FORCE RLS + GRANT
- packages/awooop-contracts/: 六合約 JSON Schema + golden fixtures
- src/models/awooop_contracts.py: Pydantic v2 contract models(extra=forbid)
- src/repositories/contract_repository.py: contract lifecycle(draft→published→active)
- src/services/contract_service.py: HMAC publish sig + Redis multi-sig activate
- src/services/schema_validator.py: LLM output validator(retry×3, E-SCHEMA-001)

## Phase 2: Tenant Isolation
- awooop_phase2_budget_ledger_2026-05-04.sql: budget_ledger + RLS
- src/services/budget_service.py: Token Budget Hard Kill 三層防線
- src/core/context.py: PROJECT_ID ContextVar(31 background loop 自動繼承)
- src/db/base.py + models.py: project_id 欄位 + RLS set_config 注入
- src/hermes/nl_gateway.py: project_id Redis key 前綴(Phase A 雙寫)
- src/services/anomaly_counter.py: per-project 改造(Phase A fallback)

## Phase 4: Platform Shell in Shadow Mode
- awooop_phase4_run_state_2026-05-04.sql: run_state + step_journal + idempotency
- src/services/run_state_machine.py: 8-state FSM + SKIP LOCKED + stale reaper
- src/services/platform_runtime.py: UUID v7 + W3C trace_id + shadow_execute
- src/services/audit_sink.py: PII/secret redaction 9 patterns
- src/api/v1/platform/runs.py: POST/GET /v1/platform/runs(Router→Service 架構)
- src/workers/platform_worker.py: SKIP LOCKED worker + heartbeat + reaper loop
- src/main.py: platform router + lifespan worker start/stop

## Phase 5: MCP Gateway 五閘門
- awooop_phase5_mcp_gateway_2026-05-04.sql: 4 表 + RLS
- src/plugins/mcp/gateway.py: McpGateway(Gate 1~5, E-MCP-GATE-001~009)
- src/plugins/mcp/redaction_middleware.py: 雙層 redaction + 16K 截斷
- src/plugins/mcp/registry.py: __provider name mangling(ADR-116)
- src/plugins/mcp/credential_resolver.py: k8s secret ref 解析
- tests/test_mcp_credential_isolation.py: 10 個迴歸測試(secret leak 防再現)

## Phase 6-8: EwoooC + Channel Hub + Approval Token
- awooop_phase6_ewoooc_onboarding_2026-05-04.sql: ewoooc tenant + 4 read-only MCP tools
- awooop_phase7_channel_hub_2026-05-04.sql: conversation_event + outbound_message
- src/services/provider_proxy.py: ProviderProxy + PlatformEnvelope(ADR-115)
- src/services/channel_hub.py: Telegram inbound mirror + Progressive Feedback(30s)
- src/services/awooop_approval_token.py: HS256 + jti NX replay 防護 + suggest mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 19:31:53 +08:00
Your Name
f6b698c873 fix(aiops): Critic 修復 — PromQL 注入防線 + flag=False escalation bug + 計數虛報
All checks were successful
Code Review / ai-code-review (push) Successful in 53s
Bug 1 (drift.py): DRIFT_AUTO_ADOPT_ENABLED=false 時仍設 auto_block_reason
  → 導致 escalation 被觸發,把「停用」誤判為「阻擋事故」
  修法: flag=False 不設 auto_block_reason,視為靜默停用

Bug 2 (coverage_evaluator_job.py): asset name/host/namespace/ip 直接 f-string
  進 PromQL,無白名單驗證
  → 髒資料可生成語意污染規則或讓 Prometheus reload 失敗
  修法: 加 _safe_label_val 正規表達式白名單(^[a-zA-Z0-9._\-]+$),
        不合法直接 skip + debug log

Bug 3 (coverage_evaluator_job.py): ON CONFLICT DO NOTHING 衝突時 created 仍 +1
  → stats["rules_auto_created"] 計數虛高,Redis 冷卻被誤設
  修法: 改用 INSERT ... RETURNING rule_name,fetchone() 確認實際插入才計數和設冷卻

附加: Redis RuntimeError 單獨 catch + log(不再靜默 pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:31:53 +08:00
Your Name
72cd79ed8b fix(aiops): Task2 drift auto-adopt 根因修復 + Task3 coverage gap 規則自動生成
All checks were successful
Code Review / ai-code-review (push) Successful in 48s
Task 2 — Drift 自動採納修根因:
  根因: _analyze_and_notify() 中 report 是 in-memory 物件,
        update_interpretation() 只更新 DB,不回寫 report.interpretation,
        導致 auto_adopt_if_safe() 永遠看到 None → 觸發「尚無 Nemotron 意圖分析」
        → Drift 自動採納 0 筆
  修法: report.interpretation = interpretation(DB 寫入後立即回寫記憶體)
  附加: DRIFT_AUTO_ADOPT_ENABLED flag(default=True,回滾: kubectl set env ...=false)

Task 3 — Coverage Gap → AI 規則自動生成執行器:
  根因: evaluate_once() 只分析 red 缺口,但無執行器將分析轉為實際規則
        → alert_rule_catalog 的 ai_generated source 永遠為 0 條
  修法: 新增 _auto_create_rules_for_uncovered_assets(run_id)
    · 查 auto_alerting=red 的 top 5 host/k8s_workload asset
    · 依 asset_type 生成範本化 PromQL rule(host→up, k8s→replicas_available)
    · UPSERT 進 alert_rule_catalog(source='ai_generated', review_status='pending_review')
    · Redis 24h 冷卻防重複,Redis 不可用時降級繼續
  附加: COVERAGE_AUTO_RULE_ENABLED flag(default=True,回滾: kubectl set env ...=false)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:22:51 +08:00
Your Name
f2f5148ca6 fix(awooop): Phase 2 第二批 P0 安全強化 + Redis key 命名空間修正
## P0-05 Callback Nonce 防偽造(ADR-116)
- security_interceptor.py:generate_callback_nonce() 新增 HMAC-SHA256[:16] 附加
  - 新 5-part 格式:{action}:{short_id}:{ts}:{rand}:{hmac16}
  - CALLBACK_HMAC_SECRET 未設定時降級 warning(向後相容)
- security_interceptor.py:parse_callback_data() 新增 5-part 分支 + HMAC 驗證
- config.py:新增 CALLBACK_HMAC_SECRET: str = Field(default="")

## P0-06 Webhook HMAC Replay 防護(ADR-116)
- security_interceptor.py:新增 check_webhook_nonce()(Service 層,get_redis 在此層合法)
- webhooks.py:verify_webhook_signature() 新增兩個可選 Header
  - X-Webhook-Timestamp:±300s 窗口驗證(若提供)
  - X-Webhook-Nonce:呼叫 check_webhook_nonce()(Redis NX dedup,fail open)
  - 移除直接 get_redis import(leWOOOgo 積木化修正)

## P0-11 ollama:current_primary Redis key 遷移 Phase A(ADR-110)
- ollama_auto_recovery.py:_REDIS_PRIMARY_KEY = "platform:ollama:current_primary"
  - 雙寫舊 key "ollama:current_primary"(Phase A 30 天)
  - 讀取以新 key 為主,fallback 舊 key

## P0-12 consensus Redis key 加 project namespace Phase A
- consensus_engine.py:新增 _consensus_key() / _consensus_legacy_key() helper
  - 新 key:{project_id}:consensus:{consensus_id}
  - project_id=None 時 fallback __platform__:consensus:{consensus_id}
  - Phase A 雙寫 + fallback 讀取,現有呼叫方零修改

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 13:54:38 +08:00
Your Name
e45b055e0e feat(governance): AI 治理事件處理鏈四軌交付(C/D/B/A)
Some checks failed
Code Review / ai-code-review (push) Successful in 48s
run-migration / migrate (push) Failing after 45s
CD Pipeline / tests (push) Successful in 3m46s
Type Sync Check / check-type-sync (push) Successful in 2m8s
CD Pipeline / build-and-deploy (push) Failing after 31m14s
CD Pipeline / post-deploy-checks (push) Has been skipped
【十二人專家團隊全景掃描 + 並行四軌實施】

統帥質疑「有讓 12-agent 一起協作嗎」後,依照團隊規則完成全鏈路交付:
onboarder + critic + db-expert + debugger + frontend-designer 並行掃描,
找到 6 大 Gap,再由 fullstack-engineer × 4、refactor-specialist 協作落地。

【Track C — trust_drift 雙寫整併】

兩條獨立寫 event_type=trust_drift 路徑互不呼叫,下游 consumer 拿到雙份資料
無法判定 source-of-truth。整併保留 governance_agent.check_trust_drift(功能
更全:auto-deprecate + Telegram + PG),TrustDriftDetector 降為純統計 lib,
W-6 watchdog 改呼叫 governance_agent。新增 TestSinglePgWritePerDriftScenario
驗證同一 drift 場景只觸發一次 PG 寫入。

  變更:
    - apps/api/src/services/trust_drift_detector.py(lib only,不再寫 PG)
    - apps/api/tests/test_trust_drift_watchdog.py(W-6 改 mock governance_agent)

【Track D — governance_remediation_dispatch 派遣表】

ai_governance_events 是不可變 Event Sourcing,不能塞執行狀態。新建派遣表
作為投影層:1 event → 0..N dispatches,狀態可變、可重試、可審計。

  - PgEnum 5 種 event_type + 7 階段狀態機(pending → dispatched → executing →
    succeeded/failed/cancelled/skipped)
  - 失敗重試 INSERT 新 row(不改舊 row 的 status,保留審計痕跡)
  - Partial unique index ux_grd_one_active_per_event 強制「同事件唯一活躍」
  - 4 個複合 index 支援 worker poll、去重查詢、觀測面板
  - FK 對應 ai_governance_events / playbooks / incidents / approval_records
    全部 SET NULL(avoid cascade lock,但 governance_event 用 RESTRICT)

  變更:
    - apps/api/src/db/models.py(GovernanceRemediationDispatch ORM class)
    - apps/api/migrations/governance_remediation_dispatch_2026-05-03.sql
    - apps/api/src/repositories/governance_remediation_dispatch_repo.py
      (6 個 async 函式 + 3 個自訂例外:DispatchAlreadyActive /
       InvalidStatusTransition / DispatchNotFound)
    - apps/api/src/models/governance_dispatch.py(DecisionContextV1 等 4 schema)
    - apps/api/tests/test_governance_remediation_dispatch.py(29 tests)

【Track B — /governance 頁面】

後端 PR1 三個 endpoint + 前端 PR2-5 完整三 Tab。

PR1 後端:
  - GET /api/v1/ai/governance/events(events_tab,含 event_type/severity/
    狀態/時間範圍篩選 + 分頁)
  - GET /api/v1/ai/governance/queue(queue_tab,含 graceful fallback:
    dispatch 表不存在時回 table_pending=True 不拋 500)
  - GET /api/v1/ai/governance/summary(slo_tab 30d 違反時序圖)
  - severity 映射規則寫死(critic 建議未來移 settings)

PR2-5 前端:
  - /governance 路由 + AppLayout + Compliance Badge 橫幅 + PageTabs
  - SLO Tab:3 KPI 卡片(Syne 28px + StatusOrb + 7d sparkline)+
    30d 違反 stacked BarChart
  - Events Tab:篩選列 + 表格 + inline 展開行(JSON / 修復建議 / 派遣記錄)
  - Queue Tab:HITL 待辦卡片 + 信任度進度條 + 批准/拒絕按鈕(本 PR console.log)
  - Sidebar 加入「AI 治理」入口(ShieldCheck icon)
  - i18n 雙語完整(governance namespace + nav.governance)
  - 7 個新元件:slo-kpi-card / slo-violation-chart / events-table /
    events-filter-bar / event-detail-drawer / queue-item-card / queue-history-tabs

  變更:
    - apps/api/src/api/v1/ai_governance.py(router)
    - apps/api/src/services/governance_query_service.py
    - apps/api/src/models/governance.py(Pydantic V2 schemas)
    - apps/api/tests/test_ai_governance_endpoints.py(21 tests)
    - apps/web/src/app/[locale]/governance/(page + 3 tabs)
    - apps/web/src/components/governance/(7 元件)
    - apps/web/messages/{zh-TW,en}.json(governance namespace)
    - apps/web/src/components/layout/sidebar.tsx(+1 行)
    - apps/api/src/main.py(router include)

【Track A — GovernanceDispatcher 決策融合】

把治理事件接到 remediation 執行器,走北極星方向決策融合(LLM × Playbook trust
× MCP),符合「禁寫死規則」鐵律。

  - 設計鐵律:DecisionFusionAdapter 是新增 wrapper,**不修改任何 Tier 3 檔**
    (decision_manager / learning_service / trust_engine),只 consume 既有 API
  - 三維融合公式:confidence = 0.4×llm + 0.3×playbook_trust + 0.3×mcp_consistency
    (權重加 TODO 標明未來由 AI 自學調整)
  - 三分支決策路徑:
    confidence ≥ 0.85 → auto_dispatch(status=dispatched)
    0.65 ≤ confidence < 0.85 → pending_approval(HITL)
    confidence < 0.65 → skip + log
  - decision_context JSONB 完整記錄三維輸入快照(給未來 fine-tune 用)
  - poll 30s 掃 unresolved 事件,仿 governance loop 模式
  - 重複事件擋去重(呼叫 get_active_for_event)

  變更:
    - apps/api/src/services/governance_dispatcher.py
    - apps/api/src/services/decision_fusion_adapter.py
    - apps/api/tests/test_governance_dispatcher.py(14 tests)
    - apps/api/src/main.py(lifespan task 接 run_governance_dispatcher_loop)

【驗證】

1836 個 unit test 全過(29 skipped 為既有 PG integration env 問題)

【調度教訓 — 已記入 memory】

- vuln-verifier 應在 fullstack-engineer **之前**跑(避免並行讀到已修代碼誤判)
- critic 雙輪審查不可省(第二輪抓到 NaN sentinel + Prom rule 連鎖)
- 北極星「禁寫死規則」搭配 decision-fusion 確實實施

【未動 Tier 3 — 已驗證】

git diff 確認本 commit 完全沒改 decision_manager.py / learning_service.py /
trust_engine.py,只新增 wrapper service consume 既有 API。

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:42:40 +08:00
Your Name
dedb12085b chore(governance,watchdog): enrich alerts and enable prometheus multiproc
Some checks failed
CD Pipeline / tests (push) Failing after 1m22s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 43s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 57s
2026-05-02 23:44:12 +08:00
Your Name
9db87f177e fix(aiops): suppress repeated llm alert loops
Some checks failed
CD Pipeline / tests (push) Successful in 1m37s
Code Review / ai-code-review (push) Successful in 28s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-01 13:02:07 +08:00
Your Name
97be5dedd7 fix(aiops): escalate failed host verification
Some checks failed
CD Pipeline / tests (push) Successful in 1m27s
Code Review / ai-code-review (push) Successful in 29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-01 10:47:42 +08:00
Your Name
ca22ec2fd2 fix(aiops): route backup failures rule-first
All checks were successful
CD Pipeline / tests (push) Successful in 1m51s
Code Review / ai-code-review (push) Successful in 30s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 42s
CD Pipeline / build-and-deploy (push) Successful in 8m21s
CD Pipeline / post-deploy-checks (push) Successful in 4m18s
2026-05-01 10:11:10 +08:00
Your Name
f0d14ab6c4 fix(aiops): escalate blocked auto repair
Some checks failed
CD Pipeline / tests (push) Successful in 1m33s
Code Review / ai-code-review (push) Successful in 28s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 40s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-30 23:49:17 +08:00
Your Name
61f5a6a419 fix(telegram): route alerts to SRE war room
Some checks failed
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
2026-04-30 15:01:23 +08:00
Your Name
80defbed7c fix(aiops): fallback and escalate automation blockers
Some checks failed
CD Pipeline / tests (push) Successful in 2m41s
Code Review / ai-code-review (push) Successful in 24s
CD Pipeline / build-and-deploy (push) Successful in 7m51s
CD Pipeline / post-deploy-checks (push) Failing after 2m15s
2026-04-30 14:13:57 +08:00
Your Name
ed2a4838f2 fix(auto): use action parser for repair gates
Some checks failed
CD Pipeline / tests (push) Failing after 1m2s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 24s
2026-04-30 14:06:09 +08:00
Your Name
4a57c2d04f feat(flywheel): expose incident processing timeline
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 10m56s
2026-04-29 23:38:30 +08:00
Your Name
dc18b0ebd6 fix(prometheus_url): drift 殘存追修 — kured 守門員 + monitoring API
debugger 全 codebase 追根溯源後揪出 5 處 PROMETHEUS_URL drift 殘存
(根因:docs/reference/SERVICE-ENDPOINTS.md 早期把 Prometheus 標在 188
是整個 codebase drift 的源頭)。

本次修最急的 2 處:

## 🔴🔴 kured.yaml:132(守門員失效風險)
- 188 → 110
- kured 跑 reboot 前會查 Prometheus alerts,連錯主機 = 跳過保護直接 reboot 主機
- 對齊 ConfigMap + config.py PROMETHEUS_URL

## 🟡 monitoring.py:67(單一事實源)
- 寫死 110:9090 改用 settings.PROMETHEUS_URL
- 主機巧合正確但繞過 ConfigMap 注入機制
- 未來 Prometheus 再遷移避免再次 drift

## 暫不修
- k3s_monitor_service.py:38 用 121:30090 是 K3s NodePort 內網端點
  與外部 PROMETHEUS_URL 概念不同,需新增 PROMETHEUS_INTERNAL_URL setting
- 其他 docstring + 文件 drift(SERVICE-ENDPOINTS.md 等)留待後續

## 驗證
1552 unit tests 全綠(無回歸)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:44:39 +08:00
Your Name
a0502b778e feat(auto-execute): CS3 alertmanager AI path 高信心自動執行(修法3擴展)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 9m41s
- CS3(alertmanager AI path)補入與 CS1 相同的 5 safety gate 自動執行邏輯
  - confidence >= 0.85 + !CRITICAL + kubectl非空 + !NO_ACTION + !DESTRUCTIVE
  - 使用 _cs3_destr_patterns(from auto_approve)做破壞性指令攔截
  - 例外包覆 try/except,不影響主流程
- 新增 test_cs3_auto_execute.py,9 tests 全通過
- CS4(LLM fallback)action=OBSERVE/confidence=0.0 → 不需要 auto-execute,維持現狀

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 19:46:56 +08:00
Your Name
e3bad58842 feat(auto-rate): CS1 LLM 高信心度路徑自動執行(confidence ≥ 0.85)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 9m53s
繼 CS2 rule_engine 後,CS1 LLM 路徑也開啟自動執行:
- confidence >= 0.85 + low/medium risk + kubectl 有值 → auto-execute
- CRITICAL / DESTRUCTIVE_PATTERNS / NO_ACTION → 絕對不執行
- 例外降級到 PENDING,不 crash
- 9 tests 驗收(1469 passed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 16:12:30 +08:00
Your Name
e5f8d90451 feat(auto-rate): rule_engine 路徑開啟自動執行,預計 42% → 70%+
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
修法 3(debugger 建議):CS2 is_rule_based=True + kubectl 有值 + 非 CRITICAL/DESTRUCTIVE → 直接 auto-execute,不建 PENDING record

安全防線(5 層):
- CRITICAL risk → 絕對不自動執行
- _DESTRUCTIVE_PATTERNS 命中 → 絕對不自動執行
- NO_ACTION → 不執行
- kubectl 空字串 → 不執行
- 任何例外 → catch + 降級到 PENDING,不 crash

15 tests 驗收(1487 passed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 16:08:50 +08:00
Your Name
a184b82ed1 feat(webhook): shadow-run auto_approve.evaluate + 補 metadata kwarg
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
4 個 webhook call site 問題修復(debugger 根因分析 2026-04-27):
- 補 metadata kwarg → extra_metadata 不再為 NULL(source/confidence_score/is_rule_based/playbook_id)
- shadow-run policy.evaluate() → logger.info 觀測 should_auto_approve
- 不改任何執行決策:status 仍 pending,Telegram 推送不變
- 9 tests 驗收 metadata 非 null + shadow log 格式 + 例外不 propagate

下一步:shadow 觀測 1-2 天後開啟修法 3(rule_based 路徑自動執行)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 16:00:00 +08:00
Your Name
025a493f06 feat(p3.2+adr-100): Model Version Tracker + SLO 自治 + KB rot cleaner
Some checks failed
run-migration / migrate (push) Failing after 12s
CD Pipeline / build-and-deploy (push) Has been cancelled
Wave 8 P3.2 模型版本追蹤 + ADR-100 SLO 自我治理 + 配套:

P3.2 — Model Version Tracking:
- model_version_probe.py (268 行) — 探測 Ollama / OpenRouter 等 provider 的 model version
- model_version_tracker.py (101 行) — 對齊 PG provider_version_history 表
- migrations/p3_2_provider_version_history.sql + rollback — 25 行 schema
- db/models.py +32 行 — ProviderVersionHistory ORM

ADR-100 — AI 自主化 SLO:
- docs/adr/ADR-100-ai-autonomous-slo.md (167 行) — 飛輪 SLO 設計與閾值
- ops/monitoring/slo-rules.yml (254 行) — Prometheus SLO recording rules + alerts
- ops/monitoring/tests/test_slo_rules.yaml (242 行) — promtool unit tests

整合修改:
- main.py +72 行 — Lifespan 啟動 model_version_probe + KB rot cleaner schedule
- gitea_webhook.py +45 行 — webhook 接收 model 版本變化通知
- ci_auto_repair.py / evidence_snapshot.py / pre_decision_investigator.py — 配合接線

新測試:
- test_kb_rot_cleaner_schedule.py (120 行) — 9 tests pass
- test_slo_rules.yaml — promtool 驗收

Tests: 9 passed (test_kb_rot_cleaner_schedule)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Multiple Engineers (P3.2 + ADR-100) <noreply@anthropic.com>
2026-04-27 14:54:19 +08:00
Your Name
3a2cd15144 feat(p3.1-t2): Tier-2 三服務感知強化 — Sentry 簽章 + DiagnosisAggregator + Solver actions test
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Wave 8 P3.1-T2 三項感知強化(多 engineer 補完):

Sentry Webhook 簽章驗證:
- sentry_webhook.py: 接入 SentryWebhookService.verify_sentry_signature()
- 拒絕無效 sentry-hook-signature → 401 → 防偽造攻擊

DiagnosisAggregator Pod 深診斷整合:
- pre_decision_investigator.py: 新增 _collect_diagnosis_aggregator()
- ENABLE_DIAGNOSIS_AGGREGATOR feature flag 守衛(default=False)
- evidence_snapshot.py: extra_diagnosis 欄位 + build_summary 顯示
- timeout=3.0s + try/except 隔離(fail-soft)
- Conservative 策略:待重疊分析確認 vs PreDecisionInvestigator 不重複

config.py:
- 新增 ENABLE_DIAGNOSIS_AGGREGATOR Field(default=False,K8s ConfigMap 動態啟用)

Solver B1 補測(commit 7c726ebc 對應):
- test_solver_recommended_actions.py — 20 tests + 3 skipped
- 驗證結構化 recommended_actions(北極星 §1.1 修復多樣性 ≥ 40%)
- LLM 失敗 graceful degraded(candidates=[], degraded=True)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Multiple Engineers (Wave 8 P3.1-T2) <noreply@anthropic.com>
2026-04-27 08:24:15 +08:00