diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index c8206bb5..1349893d 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -6,6 +6,67 @@
 
 ---
 
+## 📍 2026-04-19 evening 18:00 - Deep review: Phase 7 completion (all 8 tables written + coverage upgrade + Hermes AI recommendations) 🎖️🎖️
+
+### Commander's directive activated: "keep pushing forward + keep reviewing the existing approach + move toward AI autonomy"
+
+### Bugs found and fixed in this review round
+1. **asset_scanner K8sProvider call bug**: `kubectl_get` passed `--all-namespaces` as the `-n` namespace argument → asset_inventory=0
+   - Fix: call subprocess directly (commit 0226344)
+2. **asset_scanner blind spot, pods only**: the scan covered just 39 pods
+   - Fix: v3 extends the scan to pods+deployments+services+nodes+configmaps (commit d11b09c)
+3. **ReplicaSet bridge missing**: Pod.ownerReferences points to a ReplicaSet, which was skipped → every Pod→Deployment relationship was lost
+   - Fix: scan ReplicaSets first to build an rs_to_deployment map, then resolve each Pod through it (commit e677773)
+4. **coverage_evaluator KM column error**: `ke.body does not exist` (the actual column is `ke.content`)
+   - Fix: switch to `ke.content ILIKE` and also match on `ke.title` (commit c8b263d)
+5. **drift diff HTTP 400**: `_full[:3950]` cut the message in the middle of an HTML tag
+   - Fix: accumulate length item by item so no tag is cut (commit c0f3509)
+
+### Verified DB activation (before → after review)
+| Table | Before review | After review | Key verification |
+|---|---|---|---|
+| asset_inventory | 39 pods | **140+** (45 pods + 22 workloads + 52 k8s_resources + 2 hosts) | v3 expansion succeeded |
+| asset_relationship | 52 (no Pod→Deployment at all) | **114** (54+ Pod→Deployment rows) | ReplicaSet bridge in effect |
+| asset_coverage_snapshot | all unknown | **74 non-unknown rows** (22 green + 52 red auto_alerting) | coverage_evaluator upgraded statuses for the first time |
+| alert_rule_catalog.noise_rate | all NULL | **12 rows with noise_rate** (2 rules at 100% noise) | rule_stats_updater ran for the first time |
+
+### New scanners/evaluators/advisors (11 cumulative across this round and the previous one)
+| Service | File | Schedule | Unlocks |
+|---|---|---|---|
+| asset_scanner v3 | `asset_scanner_job.py` | hourly | 5 resource types + 3 relationship types |
+| rule_catalog_sync | `rule_catalog_sync_job.py` | hourly | sync of 68 Prometheus rules |
+| capacity_scanner | `capacity_scanner_job.py` | daily 02:00 | host_capacity_snapshot + violation |
+| compliance_scanner | `compliance_scanner_job.py` | daily 03:00 | 7 compliance dimensions (secret_rotated uses real data) |
+| **coverage_evaluator** | `coverage_evaluator_job.py` | hourly | unknown → green/red/yellow |
+| **rule_stats_updater** | `rule_stats_updater_job.py` | hourly | noise_rate/TP/FP derived from incidents |
+| **asset_change_tracker** | `asset_change_tracker_job.py` | hourly | added/removed/lifecycle_changed |
+| **hermes_rule_quality** | `hermes_rule_quality_job.py` | daily 04:00 | AI suggestions to deprecate noisy rules (conservative version) |
+
+### Coverage of the 8 tables that previously had 0 writers: **8/8 = 100%** ✅
+
+### Noisy rules found (Hermes will recommend them for review)
+- `PostgreSQLDiskGrowthRate`: noise rate 100% (tp=0 fp=2)
+- `NoAlertsReceived2Hours`: noise rate 100% (tp=0 fp=1)
+- `MoWoooWorkDown`: 33% (tp=4 fp=2)
+- `KubePodCrashLooping`: 25% (tp=3 fp=1)
+
+### Commits this round (6)
+- `0226344`: asset_scanner kubectl subprocess fix
+- `d11b09c→fdf8b73`: asset_scanner v3 multi-resource + relationship expansion
+- `007c7ef→5052323`: coverage_evaluator initial version
+- `df71c9a`: rule_stats_updater
+- `6b14194→92349bc`: asset_change_tracker
+- `c8b263d`: coverage_evaluator KM column fix
+- `e677773`: ReplicaSet bridge fix
+- `9ed135e→6ab0ce9`: Hermes rule quality advisor
+
+### Candidates for the next phase
+- LLM analysis of why noisy rules produce false alarms (upgrade Hermes from threshold checks to AI judgment)
+- Implement SSL/CVE/backup compliance (fills in the 6 compliance dimensions that are still unknown)
+- Implement the auto_playbook / auto_remediation / auto_rule_matching coverage dimensions
+
+---
+
 ## 📍 2026-04-19 afternoon 16:30 - Phase 7 full implementation: 4 new scanner services + CI fix 🎖️
 
 ### Commander's iron rules activated