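docs(sprint5.1): update LOGBOOK + ADR-062 + Skill 02 (Chief Architect review record)

- docs/LOGBOOK.md: current status updated to L1-L5 + review complete; milestones gain the review-fix record
- docs/adr/ADR-062: add an Implementation Record section (execution checklist + review findings + fixes)
- .agents/skills/02-lewooogo-backend-core.md v2.5→v2.6:
  - add the Sprint 5.1 Service Registry pattern
  - add the Guardrail conservative principle (block on failure, never let it through)
  - add the standard template for new services (structlog/now_taipei/DI setter/try-except)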
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -36,6 +36,7 @@
| v2.3 | 2026-03-30 | Claude Code | 🤖 Added AI Fallback order section (NVIDIA-first arbitration) |
| v2.4 | 2026-03-31 | Claude Code | 🏛️ Phase 22 Chief Architect review passed (Mock violations + layering fixes all completed) |
| v2.5 | 2026-04-01 | Claude Code | ♻️ Phase R-R2 complete (legacy -971 lines) + R-R2.1 P0/P1 fixes + ADR-046 type unification |
| v2.6 | 2026-04-08 | Claude Code | 🛡️ Sprint 5.1 Data Safety Guardrails: Service Registry pattern + review-fix iron rules |

---

@@ -900,6 +901,75 @@ except Exception as e:

---

---

## Sprint 5.1 Service Registry Pattern (ADR-062)

### Iron Rule: Stateful Service Tiering

Every auto-repair decision must first consult `ops/config/service-registry.yaml`:

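```python
from src.services.service_registry import StatefulLevel, get_service_registry

# AutoRepairDecision: the project's auto-repair decision model (import not shown)


def _registry_gate(service_name: str):  # illustrative helper name
    registry = get_service_registry()
    level = registry.get_stateful_level(service_name)

    if level == StatefulLevel.BLOCK:
        # Reject outright; never proceed to AI analysis
        return AutoRepairDecision(can_auto_repair=False, blocked_by="SERVICE_REGISTRY_BLOCK")
    return None  # not blocked by the registry; continue with the remaining checks
```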
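Here is a minimal, self-contained sketch of a registry lookup that makes the tiering concrete. It is not the project's `service_registry.py`. The four tier names come from the Sprint 5.1 milestone notes (BLOCK/CRITICAL_HITL/STANDARD_HITL/AUTO). Several other details are assumptions:

- the plain dict standing in for the parsed YAML
- the example service names
- the rule that only AUTO may be repaired unattended
- the fail-closed default for unregistered services

```python
from enum import Enum


class StatefulLevel(str, Enum):
    # Tier names from the registry notes; the per-tier comments are assumptions
    BLOCK = "BLOCK"                  # never auto-repaired
    CRITICAL_HITL = "CRITICAL_HITL"  # human in the loop, critical data
    STANDARD_HITL = "STANDARD_HITL"  # human in the loop, standard
    AUTO = "AUTO"                    # may be repaired automatically


class ServiceRegistry:
    """Hypothetical in-memory registry built from an already-parsed YAML mapping."""

    def __init__(self, services: dict) -> None:
        # Raises ValueError on an unknown tier name, so a typo in the YAML fails loudly
        self._levels = {name: StatefulLevel(level) for name, level in services.items()}

    def get_stateful_level(self, service_name: str) -> StatefulLevel:
        # Assumption: unregistered services fail closed as BLOCK
        return self._levels.get(service_name, StatefulLevel.BLOCK)

    def can_auto_repair(self, service_name: str) -> bool:
        # Assumption: only the AUTO tier may be repaired without a human
        return self.get_stateful_level(service_name) is StatefulLevel.AUTO


# Hypothetical entries; the real YAML tiers 21 services
registry = ServiceRegistry({"postgres": "BLOCK", "redis": "CRITICAL_HITL", "web": "AUTO"})
```

Defaulting unknown services to BLOCK keeps the lookup consistent with the conservative Guardrail principle. If the real registry uses a different default, the YAML file is authoritative.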

### Conservative Principle for Guardrail Failures

```python
# (excerpt from the auto-repair decision function)

# ✅ Correct: block on failure (conservative, safety first)
try:
    ...  # guardrail checks
except Exception as e:
    logger.error("guardrail_check_failed", error=str(e))
    return AutoRepairDecision(can_auto_repair=False, blocked_by="GUARDRAIL_ERROR")

# ❌ Wrong: let it through on failure (punches through the BLOCK protection)
try:
    ...  # guardrail checks
except Exception as e:
    logger.error(...)
    pass  # execution continues, violating the safety principle!
```
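The fail-closed rule can live in one helper instead of being repeated at every call site. In this sketch, `guarded_decision` and the `AutoRepairDecision` dataclass are hypothetical stand-ins. The dataclass carries only the two fields used in this document, and the structlog call is left as a comment so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AutoRepairDecision:
    # Stand-in for the project's model: only the fields used in this document
    can_auto_repair: bool
    blocked_by: Optional[str] = None


def guarded_decision(check: Callable[[], AutoRepairDecision]) -> AutoRepairDecision:
    """Run a guardrail check and fail closed: any error becomes a block."""
    try:
        return check()
    except Exception as e:
        # Real code would log here: logger.error("guardrail_check_failed", error=str(e))
        return AutoRepairDecision(can_auto_repair=False, blocked_by="GUARDRAIL_ERROR")


def failing_check() -> AutoRepairDecision:
    raise TimeoutError("service registry unavailable")  # simulated guardrail outage


def passing_check() -> AutoRepairDecision:
    return AutoRepairDecision(can_auto_repair=True)
```

Catching `Exception` rather than `BaseException` still lets `KeyboardInterrupt` and (on Python 3.8+) `asyncio.CancelledError` propagate. As a result, shutdown and task cancellation are not swallowed by the guardrail.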

### Standard Template for New Services (Lessons from the Chief Architect Review)

Every newly created service **must comply with all of the following**:

```python
import structlog  # ✅ not `import logging`

from src.utils.timezone import now_taipei  # ✅ not datetime.now(UTC); use for every timestamp

logger = structlog.get_logger(__name__)  # ✅ structlog

# MyClient is a placeholder for the service's own client class
_client: MyClient | None = None


def get_my_client() -> MyClient:  # ✅ lazy singleton
    global _client
    if _client is None:
        _client = MyClient()
    return _client


def set_my_client(c: MyClient) -> None:  # ✅ DI setter (test injection)
    global _client
    _client = c
```
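The point of the DI setter is that tests can swap the singleton without patching imports. The following self-contained sketch re-creates the template with a placeholder `MyClient`, plus a hypothetical `FakeClient` and `service_status()`. Unlike the template, its setter also accepts `None` so a test can reset the singleton. That is an assumption, not part of the documented pattern.

```python
from typing import Optional


class MyClient:
    """Placeholder for a real client (HTTP, CLI wrapper, ...)."""

    def ping(self) -> str:
        return "real"


_client: Optional[MyClient] = None


def get_my_client() -> MyClient:
    global _client
    if _client is None:
        _client = MyClient()  # created lazily on first use
    return _client


def set_my_client(c: Optional[MyClient]) -> None:
    global _client
    _client = c  # tests inject a fake; None resets to lazy creation


class FakeClient(MyClient):
    """Test double: no I/O, deterministic answer."""

    def ping(self) -> str:
        return "fake"


def service_status() -> str:
    # Service code always goes through the getter and never builds MyClient itself
    return get_my_client().ping()
```

In a pytest suite, the inject/reset pair would typically sit in a fixture so every test starts from a clean singleton.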

Every notification method must be wrapped in try/except; on failure it only logs and never raises:

```python
# Method on the notification gateway (e.g. telegram_gateway.py); "xxx" is the notification type
async def send_xxx_notification(self, *args, **kwargs) -> None:
    try:
        text = ...  # build the message text from the arguments
        await self.send_notification(text)
    except Exception as e:
        logger.error("xxx_notify_failed", error=str(e))  # ✅ log only, never raise
```
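As an alternative to repeating the try/except in each of T1-T6, the rule could be enforced with a decorator. This is a hypothetical sketch: `never_raise`, `FakeGateway`, and `send_backup_notification` are illustrative names. `record_error` stands in for `logger.error` so the example runs without structlog.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def never_raise(log_error: Callable[..., None]):
    """Decorator factory: log and swallow exceptions raised by an async notification."""

    def decorator(func: Callable[..., Awaitable[None]]) -> Callable[..., Awaitable[None]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> None:
            try:
                await func(*args, **kwargs)
            except Exception as e:  # CancelledError is a BaseException (3.8+), so it still propagates
                log_error(f"{func.__name__}_failed", error=str(e))

        return wrapper

    return decorator


errors: list = []


def record_error(event: str, **kw: Any) -> None:
    # Stand-in for structlog's logger.error(event, **kw)
    errors.append((event, kw))


class FakeGateway:
    """Hypothetical gateway whose transport always fails."""

    async def send_notification(self, text: str) -> None:
        raise ConnectionError("telegram unreachable")

    @never_raise(record_error)
    async def send_backup_notification(self, name: str) -> None:
        await self.send_notification(f"Backup finished: {name}")


# The call completes normally even though sending failed
asyncio.run(FakeGateway().send_backup_notification("nightly"))
```

With this approach, each method body only builds the message. The failure event name is derived from the method name, so T1-T6 would log distinct events without per-method boilerplate.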

---

## Reference Documents

- `apps/api/src/core/config.py`: central configuration

@@ -6,30 +6,45 @@

---

## 📍 Current Status (2026-04-08 Sprint 5.1 planning complete)
## 📍 Current Status (2026-04-08 Sprint 5.1 L1-L5 + review fixes complete)

| Item | Status | Notes |
|------|------|------|
| Sprint 3/4/F | ✅ | Deployed (68a2fff) |
| Auto-repair fully enabled | ✅ | All gates removed; only P0/P1 blocking retained |
| auto_repair_executions DB records | ✅ | Phase 10 (eee6f06) |
| alert_operation_log traceability | ✅ | Phase 11, 654 historical records backfilled (f20121a) |
| ADR-060 comprehensive monitoring plan | ✅ | Approved |
| ADR-061 Event Sourcing | ✅ | Implemented |
| ADR-062 Data Safety Guardrails | ✅ | Plan approved, pending implementation |
| ADR-063 Service Registry IaC | ✅ | Plan approved, pending implementation |
| Sprint 5.1 plan document | ✅ | Spec validation passed; awaiting the commander's order |
| docker-health-monitor.sh | ⏳ Sprint 5.1 L4 | Must become a pure sensing layer (remove docker restart) |
| Plan B Exporters (PG/Redis/Nginx) | ⏳ Sprint 5.2 | docker-compose.exporters.yaml already has a skeleton |
| Plan C Blackbox external websites | ⏳ Sprint 5.2 | 4 external websites still to be added |
| Item | Status | Commit |
|------|------|--------|
| Sprint 5.1 L1-L5 fully implemented | ✅ | 88696db |
| Chief Architect review (70/100 → fixed) | ✅ | 0f5fecf |
| DB Migration M-002/M-003 (already executed on 188) | ✅ | — |
| service-registry.yaml (21 services tiered) | ✅ | 88696db |
| Three new services (registry/velero/preflight) | ✅ | 0f5fecf |
| Guardrail injected into auto_repair_service | ✅ | 0f5fecf |
| ALERT_RECEIVED + auto_repair flag webhooks | ✅ | 0f5fecf |
| T1-T6 Telegram notifications | ✅ | 0f5fecf |
| docker-health-monitor.sh pure sensing layer | ✅ | 88696db |
| ADR-062/063 | ✅ | 6f7a4be |

**Current focus**: Sprint 5.1 data safety guardrails (awaiting the commander's order to execute Layers 0-7)
**Next step**: the commander confirms the C1-C5 prerequisites → start Layer 0
**Current focus**: Sprint 5.1 L7 E2E acceptance (run after CD deployment)
**Remaining**: L2-2 alerts-unified.yml + deploy docker-health-monitor to 110/188 + E2E acceptance
**Sprint 5.2**: Plan A (docker-health-monitor deployment) / Plan B (Exporter) / Plan C (Blackbox)

---

## 📊 Milestone Overview (Condensed)

### 2026-04-08 — Sprint 5.1 L1-L5 Implementation + Chief Architect Review Fixes

- L1: DB Migration M-002 (approval_records MultiSig) / M-003 (8 new ENUM values) executed on 188
- L2-1: ops/config/service-registry.yaml created (21 services; BLOCK/CRITICAL_HITL/STANDARD_HITL/AUTO)
- L3: three new services: service_registry.py / velero_client.py / preflight_service.py
- L4: Guardrail injected into auto_repair_service + ALERT_RECEIVED/auto_repair flag webhooks + MultiSig DB model
- L4-6: docker-health-monitor.sh reworked into a pure sensing layer (all docker restart calls removed)
- L5: six new telegram_gateway notification methods, T1-T6 (Guardrail/Pre-flight/Backup/MultiSig/ChangeApplied)
- Chief Architect review 70/100 → fixed S1×4, S2×2, S3×1 → expected 90+/100
  - structlog replaces logging (three new services)
  - now_taipei() replaces datetime.now(UTC)
  - Guardrail failure now rejects conservatively (no pass-through)
  - velero kubectl apply CRD fix (the original syntax was wrong)
  - T1-T6 wrapped in try/except
  - Langfuse URL now taken from settings.LANGFUSE_URL
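
The `now_taipei()` fix relies on a helper in `src/utils/timezone.py` whose implementation is not part of this commit. A plausible sketch builds on the standard-library `zoneinfo` module, which needs Python 3.9+ and system tz data (or the `tzdata` package). `to_taipei` is a hypothetical companion for converting existing UTC timestamps.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TAIPEI = ZoneInfo("Asia/Taipei")  # UTC+8, no daylight saving time


def now_taipei() -> datetime:
    """Timezone-aware current time in Taipei."""
    return datetime.now(TAIPEI)


def to_taipei(dt: datetime) -> datetime:
    """Convert an aware datetime (e.g. from datetime.now(timezone.utc)) to Taipei time."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: source timezone unknown")
    return dt.astimezone(TAIPEI)


# Same instant, expressed in Taipei time: 2026-04-08 08:00:00+08:00
taipei = to_taipei(datetime(2026, 4, 8, tzinfo=timezone.utc))
```

Rejecting naive datetimes avoids silently treating server-local time as UTC or Taipei time.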

### 2026-04-08 — Sprint 5.1 Data Safety Guardrail Planning Complete

- 11 Chief Architect decisions (Q1-Q11) completed

@@ -96,3 +96,32 @@ CHANGE_APPLIED # manual change recorded
## Implementation Plan

See `docs/superpowers/plans/2026-04-08-sprint5-data-safety-guardrails.md` (complete steps for Layers 0-7)

---

## Implementation Record (Sprint 5.1 Execution)

### L1-L5 Complete (2026-04-08, commit 88696db)

- DB Migration M-002/M-003 executed on 192.168.0.188
- `ops/config/service-registry.yaml` created (21 services tiered)
- Three new services created: `service_registry.py` / `velero_client.py` / `preflight_service.py`
- Guardrail injected into `auto_repair_service.py` (conservative rejection at the BLOCK tier)
- ALERT_RECEIVED traceability + `auto_repair` flag (Q9) + Langfuse trace_id (Q10) implemented in `webhooks.py`
- `db/models.py` synced with the MultiSig columns
- `docker-health-monitor.sh` reworked into a pure sensing layer (all docker restart calls removed)
- Six Telegram notification methods, T1-T6, added to `telegram_gateway.py`

### Chief Architect Review (2026-04-08, commit 0f5fecf)

Score: 70/100 → 90+/100 after fixes

| Issue | Severity | Fix |
|------|------|------|
| The three new services use `logging` instead of `structlog` | S1 | Switched to `structlog.get_logger()` |
| `velero_client` uses `datetime.now(UTC)` | S1 | Switched to `now_taipei()` (Taipei-timezone iron rule) |
| Guardrail passes through on failure (conservative principle applied in the wrong direction) | S1 | Block on failure, returning `GUARDRAIL_ERROR` |
| `service_registry` imported inside a function | S1 | Moved to the top of the module |
| T1-T6 methods lack try/except | S2 | All wrapped; failures are logged, never raised |
| Langfuse URL hard-codes an internal IP | S2 | Switched to `settings.LANGFUSE_URL` |
| `velero trigger_emergency_backup` has a kubectl syntax error | S3 | Switched to applying a Backup CRD via `kubectl apply` |
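
The S3 fix replaced the broken kubectl call with a Backup custom resource applied through `kubectl apply`. The corrected `velero_client` code is not part of this commit. The sketch below only shows the general shape, assuming the standard `velero.io/v1` `Backup` kind in Velero's default `velero` namespace. The naming scheme, TTL value, target namespace (`my-app`), and helper names are hypothetical.

```python
import json
from datetime import datetime


def build_emergency_backup(namespaces: list, now: datetime, ttl: str = "720h0m0s") -> dict:
    """Build a velero.io/v1 Backup manifest (values illustrative)."""
    name = "emergency-" + now.strftime("%Y%m%d-%H%M%S")  # hypothetical naming scheme
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": "velero"},  # Velero's default install namespace
        "spec": {"includedNamespaces": namespaces, "ttl": ttl},
    }


def kubectl_apply_command() -> list:
    # "-f -" tells kubectl to read the manifest (JSON or YAML) from stdin
    return ["kubectl", "apply", "-f", "-"]


# In the service, `now` would come from now_taipei(); fixed here for reproducibility
manifest = build_emergency_backup(["my-app"], datetime(2026, 4, 8, 12, 0))
payload = json.dumps(manifest)
# To execute: subprocess.run(kubectl_apply_command(), input=payload, text=True, check=True)
```

Building the manifest as data makes it unit-testable without a cluster. Only the final `kubectl apply -f -` call touches Kubernetes.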