Your Name
|
f154ac022e
|
feat(playbook): version generated playbooks
CD Pipeline / tests (push) Successful in 1m34s
Code Review / ai-code-review (push) Successful in 28s
Type Sync Check / check-type-sync (push) Successful in 1m10s
CD Pipeline / build-and-deploy (push) Successful in 10m19s
CD Pipeline / post-deploy-checks (push) Successful in 3m1s
|
2026-04-30 23:59:39 +08:00 |
|
Your Name
|
6e04fe9c8a
|
feat(playbook): generate drafts with local llm
CD Pipeline / tests (push) Successful in 1m28s
Code Review / ai-code-review (push) Successful in 29s
Type Sync Check / check-type-sync (push) Failing after 2m41s
CD Pipeline / build-and-deploy (push) Successful in 8m40s
CD Pipeline / post-deploy-checks (push) Successful in 3m10s
|
2026-04-30 23:04:58 +08:00 |
|
OG T
|
800ab1685f
|
fix(playbook+flywheel): 修復 PlaybookSource enum + repair_steps 相容 + KM stats raw SQL
CD Pipeline / build-and-deploy (push) Successful in 14m58s
Type Sync Check / check-type-sync (push) Failing after 1m17s
修復三個串聯 bug,讓 Playbook seed 能正常執行:
1. PlaybookSource 新增 YAML_RULE enum(alert_rules.yaml 匯入專用)
2. playbook_seed_service: source=YAML_RULE,dedup 改用 raw SQL by name,
不再呼叫 list_playbooks(舊格式 repair_steps 會 validation error)
3. playbook_repository._orm_to_pydantic: 舊格式 repair_steps 補齊
step_number/action_type 必填欄位(向下相容)
4. flywheel_stats_service: embedding IS NULL 改用 raw SQL,
修復 KnowledgeEntryRecord ORM 無 embedding 屬性的 AttributeError
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-15 23:32:04 +08:00 |
|
OG T
|
7da64eaad2
|
feat(Phase 3): 學習閉環重建 — 三根因修復 + 2x EWMA + Evolver Agent
CD Pipeline / build-and-deploy (push) Failing after 19m7s
Type Sync Check / check-type-sync (push) Failing after 1m18s
ADR-083 Phase 3 學習閉環重建:
**三根因修復**
- approval_execution.py: fire-and-forget create_task → await asyncio.wait_for(timeout=30) × 2
(成功路徑 L265 + 失敗路徑 L353,超時記錄 learning_trigger_timeout metric,主流程不 crash)
- models/approval.py: ApprovalRequestBase 新增 matched_playbook_id 欄位
- decision_manager.py: _auto_execute 建立 ApprovalRequest 時填充 matched_playbook_id
- learning_service.py: 雙路徑查找 _matched_pb_id(matched_playbook_id + metadata fallback)
**2x EWMA 負向強化**
- models/playbook.py: 新增 trust_score: float = 0.3(EWMA 動態信任度欄位)
- repositories/playbook_repository.py: update_stats 加 EWMA
成功: trust = 0.9 × old + 0.1 × 1.0
失敗: trust = 0.8 × old + 0.2 × 0.0(衰減速度 2x)
trust < 0.1 → log warning,等 Evolver 封存
**Evolver Agent(新建)**
- services/playbook_evolver.py: 三功能全靜態規則
1. 低信任封存: trust < 0.1 → DEPRECATED
2. 休眠封存: 30d 未使用 AND trust < 0.5 → DEPRECATED
3. 相似合併: 症狀 Jaccard > 0.9 → 保留高 trust,封存低 trust
AIOPS_P3_EVOLVER_ENABLED=False 預設關閉
**文件**
- ADR-083 學習閉環重建
- MASTER §8 Phase 3 完工記錄
AIOPS_P3_ENABLED=False(預設),骨架就位等統帥批准開啟
Co-Authored-By: Claude Sonnet 4.6(亞太)<noreply@anthropic.com>
|
2026-04-15 14:01:37 +08:00 |
|
OG T
|
88696dba9b
|
feat(sprint5.1): Data Safety Guardrails 全鏈路整合 (L1-L5)
CD Pipeline / build-and-deploy (push) Failing after 1m33s
Type Sync Check / check-type-sync (push) Failing after 58s
Layer 0 - K8s RBAC:
- k8s/rbac/api-velero-reader.yaml: awoooi-executor SA Velero backup reader
Layer 1 - DB Migration (已在 188 執行):
- M-002: approval_records 新增 approval_level/votes/required_votes
- M-003: alert_event_type ENUM 新增 8 個值
Layer 2 - IaC:
- ops/config/service-registry.yaml: 全服務 Stateful 分級清單 (BLOCK/CRITICAL_HITL/STANDARD_HITL/AUTO)
Layer 3 - Python Services:
- service_registry.py: 讀取 YAML,提供 is_blocked/requires_multisig/get_required_votes
- velero_client.py: kubectl 查詢 Velero 備份年齡,失敗 fallback 999h
- preflight_service.py: Pre-flight 安全檢查 (Q2/Q4 決策)
Layer 1-M001 - Playbook model:
- playbook.py: 新增 requires_approval_level/stateful_targets/requires_pre_backup
Layer 4 - 業務邏輯:
- alert_operation_log_repository.py: 新增 8 個 event_type (Guardrail/Pre-flight/MultiSig/備份)
- auto_repair_service.py: 注入 Service Registry Guardrail 檢查 (BLOCK → 直接拒絕)
- webhooks.py: ALERT_RECEIVED 溯源記錄 + auto_repair flag Q9 + Langfuse trace_id Q10
- db/models.py: ApprovalRecord 同步 approval_level/votes/required_votes 欄位
- docker-health-monitor.sh: 純感知層改造(移除所有 docker restart 邏輯)
Layer 5 - Telegram 通知:
- telegram_gateway.py: T1-T6 六個新通知方法 (Guardrail/Pre-flight/Backup/MultiSig/ChangeApplied)
參考: ADR-062 Data Safety Guardrails, ADR-063 Service Registry IaC
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-08 16:24:09 +08:00 |
|
OG T
|
5499169996
|
feat(auto-repair): 打通自動修復閉環 (ADR-058)
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 53s
問題: 告警鏈路從未呼叫 auto_repair_service,機制完全死路
修正:
1. webhooks.py: alertmanager_webhook 建立 Incident 後觸發 _try_auto_repair_background
2. playbook.py: is_high_quality 門檻降低 (冷啟動期)
- success_count: 10 → 3
- success_rate: 95% → 80%
3. tests: test_evaluate_not_high_quality 更新為新門檻
流程: Alertmanager → API → Incident → evaluate → P2以下+高品質Playbook → 自動執行 → Telegram通知
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 22:08:08 +08:00 |
|
OG T
|
bf4f81412c
|
feat(api): ActionType.SSH_COMMAND + auto_repair_service SSH分支 (Task 12)
- playbook.py: 新增 SSH_COMMAND ActionType
- auto_repair_service._execute_step: SSH_COMMAND 分支,格式 layer/component
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-05 11:47:00 +08:00 |
|
OG T
|
3455044457
|
feat(phase25): Nemotron 主動防禦三方向 P0+P1+P2 完整實作
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE 意圖 override 改為 NEMOTRON (原 OLLAMA)
- DIAGNOSE fallback 使用 local-only 鏈,不觸碰雲端
- 全部失敗時 REJECT + Telegram 通知
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: 根據 context[task_type] 選擇 timeout
P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN 直接發布,無需審核)
- models/playbook.py: SymptomPattern.compute_hash() (16字元確定性 hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
- generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram 審核 card
- generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram 通知
- 使用 nvidia.chat() (正確介面),Nemotron 超時時 Minimal fallback
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() 支援 symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate 在 decide() + runbook hook 在 execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint
P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: 白名單過濾 + DriftLevel 分級
- services/drift_interpreter.py: NemotronDriftInterpreter(意圖分析,不生成修復指令)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports 表
- k8s/drift-cronjob.yaml: 每小時自動掃描 CronJob
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-04 12:35:05 +08:00 |
|
OG T
|
f5b19cf108
|
feat(learning): 實作 Playbook 信心度調整機制 (ADR-030)
- 新增 _promote_playbook: 高評分提升信心度 +0.1
- 新增 _demote_playbook: 低評分降低信心度 -0.15
- 新增 find_by_source_incident: 按 incident_id 查詢 Playbook
- 新增 adjust_confidence: 信心度調整 + 狀態自動轉換
- 新增 Playbook.failure_rate 屬性
自動狀態轉換:
- ai_confidence >= 0.9 + DRAFT → 自動 APPROVED
- ai_confidence < 0.3 + failure_rate > 50% → 自動 DEPRECATED
測試: 13 案例全部通過
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-29 22:10:49 +08:00 |
|
OG T
|
30153496d1
|
fix(api): 修復全部 lint 錯誤 (ruff --fix)
- Import sorting (I001)
- Unused imports (F401)
- f-string without placeholders (F541)
- Loop variable unused (B007)
- zip() strict parameter (B905)
- Exception chaining (B904)
- collections.abc imports (UP035)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 16:06:20 +08:00 |
|
OG T
|
698687f092
|
feat(api): #7 Playbook 萃取功能 (Phase 7.1-7.4)
實作內容:
- models/playbook.py: Playbook 資料模型 + Request/Response
- repositories/playbook_repository.py: Redis 雙層儲存
- repositories/interfaces.py: IPlaybookRepository Protocol
- services/playbook_service.py: 業務邏輯 (萃取/推薦/核准)
- api/v1/playbooks.py: REST API 端點
API 端點:
- POST /playbooks/extract/{incident_id} - 從成功案例萃取
- POST /playbooks/recommend - 症狀匹配推薦
- POST /playbooks/{id}/approve - 人工核准
- GET/PATCH/DELETE /playbooks/{id} - CRUD
遵循 leWOOOgo 積木化: Router → Service → Repository
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-26 10:54:13 +08:00 |
|