Commit Graph

5 Commits

Author SHA1 Message Date
Your Name
01ba1e6f13 fix(drift): read git state from gitea main
All checks were successful
Code Review / ai-code-review (push) Successful in 19s
CD Pipeline / tests (push) Successful in 1m29s
CD Pipeline / build-and-deploy (push) Successful in 3m40s
CD Pipeline / post-deploy-checks (push) Successful in 2m5s
2026-05-19 02:14:26 +08:00
Your Name
107c4f11cc fix(drift): normalize kustomize runtime defaults
All checks were successful
Code Review / ai-code-review (push) Successful in 21s
CD Pipeline / tests (push) Successful in 2m31s
CD Pipeline / build-and-deploy (push) Successful in 3m38s
CD Pipeline / post-deploy-checks (push) Successful in 1m53s
2026-05-19 02:02:43 +08:00
OG T
02a276127e fix(sensors+drift+repair-card): 全景修復三個節點問題
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 1h1m39s
Fix 1: sensors 7/8 失敗 — SSH host 短名展開 (pre_decision_investigator.py)
  根因: Prometheus instance label 為 "110:9100",split(":")[0]="110"
        SSH_MCP_ALLOWED_HOSTS 存完整 IP "192.168.0.110" → 7 個 SSH 工具全部失敗
  修復: 加入 _SHORT_HOST_MAP,"110"→"192.168.0.110",四台主機全覆蓋

Fix 2: Config Drift 誤報 — K8s 預設欄位加入白名單 (drift_detector.py)
  根因: kubectl rollout restart 後 restartedAt annotation 被偵測為 "medium" drift
        restartPolicy/dnsPolicy/terminationGracePeriodSeconds 等 K8s 自動填入欄位未白名單
  修復: _DEFAULT_ALLOWLIST_FIELDS 加入 13 個 K8s 執行時自動填入欄位

Fix 3: 修復請求卡內容垃圾 — fallback 帶入真實 error context (failure_watcher.py)
  根因: LLM 分析失敗時 root_cause = "規則引擎分類: K8S_ERROR"(無任何有用資訊)
  修復: fallback 改為 "[K8S_ERROR] {operation_type} 在 {target_resource} 失敗\n錯誤:{error_message[:200]}"

2026-04-16 ogt + Claude Sonnet 4.6(亞太)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:50:06 +08:00
OG T
53e1ae7ad7 fix(phase25): I2 NIM system prompt + I4 field_path 正則匹配修正
Some checks failed
CD Pipeline / Deploy Prometheus Alert Rules (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
I2: nemotron.analyze() 補上 system role (NIM 標準 message format)
    - 舊: messages=[{role:user, ...}]
    - 新: messages=[{role:system, ...}, {role:user, ...}]
    - 效果: K8s operator 角色定義,改善 tool calling 品質

I4: drift_detector._is_allowlisted/_is_critical 用正則取代 strip
    - 舊: replace('[*]','') 後 startswith/in → 無法匹配 containers[0]
    - 新: [*] → \[\d+\] 正則,正確匹配所有索引
    - 修復: containers[*].image 現在能匹配 containers[0].image
2026-04-05 12:11:05 +08:00
OG T
3455044457 feat(phase25): Nemotron 主動防禦三方向 P0+P1+P2 完整實作
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE 意圖 override 改為 NEMOTRON (原 OLLAMA)
- DIAGNOSE fallback 使用 local-only 鏈,不觸碰雲端
- 全部失敗時 REJECT + Telegram 通知
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: 根據 context[task_type] 選擇 timeout

P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN 直接發布,無需審核)
- models/playbook.py: SymptomPattern.compute_hash() (16字元確定性 hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
  - generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram 審核 card
  - generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram 通知
  - 使用 nvidia.chat() (正確介面),Nemotron 超時時 Minimal fallback
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() 支援 symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate 在 decide() + runbook hook 在 execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint

P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: 白名單過濾 + DriftLevel 分級
- services/drift_interpreter.py: NemotronDriftInterpreter(意圖分析,不生成修復指令)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports 表
- k8s/drift-cronjob.yaml: 每小時自動掃描 CronJob

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:35:05 +08:00