docs(logbook): Phase 7 completion log - 8/8 tables written + 5 bug fixes + Hermes E3

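Records the 5 bugs uncovered by this round's in-depth review, plus 8 new scanners/evaluators/advisors.
Coverage of the 8 ADR-090 tables that previously had 0 writers: 100%.
2 rules with 100% noise are awaiting a manual decision once Hermes posts its recommendation.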

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OG T
2026-04-19 19:27:10 +08:00
parent e84338e615
commit c015a77011


@@ -6,6 +6,67 @@
---
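## 📍 2026-04-19 18:00 (evening) - In-depth review (Phase 7 completion: all 8 tables written + coverage upgrade + Hermes AI recommendations) 🎖️🎖️
### Commander's directive activated: "keep pushing forward + keep reviewing the existing approach + move toward AI autonomy"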
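### Bugs found and fixed in this round's review
1. **asset_scanner K8sProvider call bug**: `kubectl_get` mishandled `--all-namespaces` in combination with `-n`, so asset_inventory stayed at 0
   - Fix: call kubectl directly via subprocess (commit 0226344)
2. **asset_scanner blind spot, pods only**: coverage was limited to 39 pods
   - Fix: v3 extends the scan to pods + deployments + services + nodes + configmaps (commit d11b09c)
3. **Missing ReplicaSet bridge**: a Pod's ownerReferences point to a ReplicaSet, which was skipped, so every Pod→Deployment relationship was lost
   - Fix: scan ReplicaSets first to build an rs_to_deployment map, then resolve each Pod through it (commit e677773)
4. **coverage_evaluator KM column error**: `ke.body does not exist` (the actual column is `ke.content`)
   - Fix: use `ke.content ILIKE` and also match on `ke.title` (commit c8b263d)
5. **drift diff HTTP 400**: `_full[:3950]` cut the text in the middle of an HTML tag
   - Fix: accumulate length item by item so no tag is split (commit c0f3509)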
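Fixes 3 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from commits e677773 or c0f3509: the function names `build_rs_to_deployment`, `pod_to_deployment` and `join_within_limit`, the `namespace/name` map key, and the input shape (dicts shaped like `kubectl get ... -o json` items) are hypothetical. Only the 3950-character limit comes from the log.

```python
"""Hypothetical sketches of fixes 3 and 5; not the actual asset_scanner / drift diff code."""
from typing import Dict, List, Optional


def build_rs_to_deployment(replicasets: List[dict]) -> Dict[str, str]:
    """Map "namespace/replicaset" -> Deployment name via each ReplicaSet's ownerReferences."""
    rs_map: Dict[str, str] = {}
    for rs in replicasets:
        meta = rs["metadata"]
        for ref in meta.get("ownerReferences", []):
            if ref.get("kind") == "Deployment":
                rs_map[f"{meta['namespace']}/{meta['name']}"] = ref["name"]
    return rs_map


def pod_to_deployment(pod: dict, rs_map: Dict[str, str]) -> Optional[str]:
    """Follow Pod -> ReplicaSet -> Deployment; None when no Deployment owner can be resolved."""
    meta = pod["metadata"]
    for ref in meta.get("ownerReferences", []):
        if ref.get("kind") == "ReplicaSet":
            return rs_map.get(f"{meta['namespace']}/{ref['name']}")
    return None


def join_within_limit(items: List[str], limit: int = 3950) -> str:
    """Concatenate whole HTML fragments, stopping before the one that would exceed `limit`.

    Unlike slicing the joined string (`_full[:3950]`), this never cuts inside a tag.
    """
    out: List[str] = []
    total = 0
    for item in items:
        if total + len(item) > limit:
            break  # drop this and all later items rather than splitting one
        out.append(item)
        total += len(item)
    return "".join(out)
```

Keying the map by namespace avoids collisions between same-named ReplicaSets in different namespaces; dropping whole items instead of slicing the joined string is what keeps the HTML well-formed.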
### Verified DB activation (before → after review)
| Table | Before review | After review | Key verification |
|---|---|---|---|
| asset_inventory | 39 pods | **140+** (45 pods + 22 workloads + 52 k8s_resources + 2 hosts) | v3 extension succeeded |
| asset_relationship | 52 (no Pod→Deployment at all) | **114** (54+ Pod→Deployment) | ReplicaSet bridge works |
| asset_coverage_snapshot | all unknown | **74 non-unknown rows** (22 green + 52 red, auto_alerting) | coverage_evaluator upgraded statuses for the first time |
| alert_rule_catalog.noise_rate | all NULL | **12 rows with a noise_rate** (2 at 100% noise) | rule_stats_updater ran for the first time |
### New scanners/evaluators/advisors (11 cumulative across this round and the previous one)
| Service | File | Schedule | Unlocks |
|---|---|---|---|
| asset_scanner v3 | `asset_scanner_job.py` | hourly | 5 resource kinds + 3 relationship kinds |
| rule_catalog_sync | `rule_catalog_sync_job.py` | hourly | syncs 68 Prometheus rules |
| capacity_scanner | `capacity_scanner_job.py` | daily 02:00 | host_capacity_snapshot + violation |
| compliance_scanner | `compliance_scanner_job.py` | daily 03:00 | 7-dimension compliance (secret_rotated is real) |
| **coverage_evaluator** | `coverage_evaluator_job.py` | hourly | unknown → green/red/yellow |
| **rule_stats_updater** | `rule_stats_updater_job.py` | hourly | infers noise_rate/TP/FP from incidents |
| **asset_change_tracker** | `asset_change_tracker_job.py` | hourly | added/removed/lifecycle_changed |
| **hermes_rule_quality** | `hermes_rule_quality_job.py` | daily 04:00 | AI recommends deprecating noisy rules (conservative version) |
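The added/removed/lifecycle_changed events recorded by asset_change_tracker could be derived by diffing two consecutive scans, roughly as sketched below. The snapshot shape (asset key mapped to a lifecycle state) and the function name `diff_snapshots` are hypothetical; the real job's schema is not shown in this log.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: asset key -> lifecycle state
Snapshot = Dict[str, str]


def diff_snapshots(prev: Snapshot, curr: Snapshot) -> List[Tuple[str, str]]:
    """Return (change_type, asset_key) events between two scans, keys sorted within each type."""
    events: List[Tuple[str, str]] = []
    for key in sorted(curr.keys() - prev.keys()):  # present now, absent before
        events.append(("added", key))
    for key in sorted(prev.keys() - curr.keys()):  # present before, absent now
        events.append(("removed", key))
    for key in sorted(prev.keys() & curr.keys()):  # in both scans, state differs
        if prev[key] != curr[key]:
            events.append(("lifecycle_changed", key))
    return events
```

Because the hourly scan rebuilds the full inventory each run, a set difference over keys is enough to classify every change without tracking individual Kubernetes events.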
### Coverage of the 8 tables that previously had 0 writers: **8/8 = 100%** ✅
### Noisy rules found (Hermes will recommend review)
- `PostgreSQLDiskGrowthRate`: noise rate 100% (tp=0 fp=2)
- `NoAlertsReceived2Hours`: noise rate 100% (tp=0 fp=1)
- `MoWoooWorkDown`: 33% (tp=4 fp=2)
- `KubePodCrashLooping`: 25% (tp=3 fp=1)
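The noise rates above are consistent with noise_rate = fp / (tp + fp) (e.g. `MoWoooWorkDown`: 2 / 6 ≈ 33%). The sketch below assumes that formula plus a threshold-only, conservative advisor in the spirit of hermes_rule_quality; `RuleStats`, `deprecation_candidates`, and its defaults `threshold=1.0` and `min_fp=1` are hypothetical choices, picked so that only the two 100%-noise rules are flagged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuleStats:
    name: str
    tp: int  # incidents judged true positive
    fp: int  # incidents judged false positive


def noise_rate(stats: RuleStats) -> Optional[float]:
    """fp / (tp + fp); None when the rule has no labelled incidents yet (column stays NULL)."""
    total = stats.tp + stats.fp
    return stats.fp / total if total else None


def deprecation_candidates(rules: List[RuleStats], threshold: float = 1.0, min_fp: int = 1) -> List[str]:
    """Hypothetical conservative rule: flag only rules at or above `threshold` with >= `min_fp` FPs."""
    flagged = []
    for r in rules:
        rate = noise_rate(r)
        if rate is not None and rate >= threshold and r.fp >= min_fp:
            flagged.append(r.name)
    return flagged


rules = [
    RuleStats("PostgreSQLDiskGrowthRate", tp=0, fp=2),
    RuleStats("NoAlertsReceived2Hours", tp=0, fp=1),
    RuleStats("MoWoooWorkDown", tp=4, fp=2),
    RuleStats("KubePodCrashLooping", tp=3, fp=1),
]
print(deprecation_candidates(rules))  # ['PostgreSQLDiskGrowthRate', 'NoAlertsReceived2Hours']
```

With samples this small (fp=1 or 2), a high threshold keeps the advisor from recommending deprecation of rules that still catch real incidents, which is why the final decision stays manual.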
### Commits in this round (6)
- `0226344`: asset_scanner kubectl subprocess fix
- `d11b09c→fdf8b73`: asset_scanner v3 extension (multiple resource types + relationships)
- `007c7ef→5052323`: coverage_evaluator initial version
- `df71c9a`: rule_stats_updater
- `6b14194→92349bc`: asset_change_tracker
- `c8b263d`: coverage_evaluator KM column fix
- `e677773`: ReplicaSet bridge fix
- `9ed135e→6ab0ce9`: Hermes rule quality advisor
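### Next-phase candidates
- LLM analysis of why noisy rules fire false positives (upgrade Hermes from thresholds to AI judgment)
- SSL/CVE/backup compliance implementation (filling in the 6 compliance dimensions still at unknown)
- Implement the auto_playbook / auto_remediation / auto_rule_matching coverage dimensions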
---
## 📍 2026-04-19 16:30 (afternoon) - Phase 7 full implementation (4 new scanner services + CI fix) 🎖️
### Commander's iron rule activated