chore: tidy up uncommitted changes (API core + docs + scripts)

API core:
- constants.py: system constant definitions
- unit_of_work.py: Unit of Work pattern
- incident_approval_service.py: Incident-Approval synchronization service

Documentation updates:
- LOGBOOK.md: progress update
- AWOOOI_AGENTIC_WORKSPACE_ROADMAP.md: roadmap
- 2026-03-26_llm_testing_evaluation.md: LLM testing evaluation
- phase5_telemetry_architecture.md: telemetry architecture
- SECRETS_REFERENCE.md: secrets reference

Config/scripts:
- Skill 02 v1.x: leWOOOgo backend update
- .dependency-cruiser.cjs: dependency rules
- demo-multisig-flow.sh: demo script

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -10,10 +10,10 @@

| Field | Value |
|------|-----|
-| **Version** | v1.4 |
+| **Version** | v1.5 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
-| **Last modified** | 2026-03-26 00:20 (Taipei) |
+| **Last modified** | 2026-03-26 19:30 (Taipei) |
| **Modified by** | Claude Code |

### Changelog
@@ -25,6 +25,7 @@
| v1.2 | 2026-03-25 | Claude Code | Added the document info block |
| v1.3 | 2026-03-26 | Claude Code | 🔴🔴🔴 Added the modular building-block enforcement section (after an audit found 32 violations) |
| v1.4 | 2026-03-26 | Claude Code | 📊 Added the Langfuse LLMOps integration section (Phase 15.1) |
| v1.5 | 2026-03-26 | Claude Code | 🔴🔴 Added the UnitOfWork + Saga Pattern section (ADR-027) |

---

@@ -376,6 +377,75 @@ packages/lewooogo-data/src/lewooogo_data/

---

## 🔴🔴 UnitOfWork + Saga Pattern (ADR-027)

> **Purpose**: Guarantee atomicity of the two-layer Incident-Approval write
> **Approved**: 2026-03-26
> **Memory**: `feedback_incident_approval_sync.md`, `project_incident_approval_sync.md`

### Background

Creating an Incident and its Approval touches two storage layers (PostgreSQL + Redis). The system must ensure that:
1. Both writes succeed together or fail together
2. Status changes are synchronized in both directions

### Core Pattern

```python
# ✅ Correct: go through IncidentApprovalService
from src.services.incident_approval_service import IncidentApprovalService

async def handle_alert(data: AlertData):
    service = IncidentApprovalService(session_factory, redis_client)
    incident, approval = await service.create_with_approval(
        incident_data, approval_data
    )

# ❌ Forbidden: creating the Incident and the Approval separately
incident = await incident_repo.create(...)  # PostgreSQL
approval = await approval_repo.create(...)  # Redis may fail, leaving an orphaned Incident
```

### UnitOfWork Pattern

```python
from redis.exceptions import RedisError

from src.core.unit_of_work import UnitOfWork

async with UnitOfWork(session_factory) as uow:
    # All database operations run in the same transaction
    incident = await self.incident_repo.create(uow.session, data)
    approval = await self.approval_repo.create(uow.session, data)

    # Redis write (outside the transaction, needs Saga compensation)
    try:
        await self._write_to_redis(incident, approval)
    except RedisError:
        await uow.rollback()  # Saga compensation
        raise
```
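
The compensation flow can be tried out without PostgreSQL or Redis. The sketch below is only an illustration under stated assumptions: `FakeUnitOfWork`, `FakeRedisError`, `create_pair`, and `run_demo` are hypothetical in-memory stand-ins, not part of the codebase. It stages two rows, attempts the cache write, and rolls the staged rows back when that write fails.

```python
import asyncio


class FakeRedisError(Exception):
    """Hypothetical stand-in for redis.exceptions.RedisError."""


class FakeUnitOfWork:
    """Hypothetical in-memory stand-in: stages rows, then commits or rolls back."""

    def __init__(self, committed_rows: list):
        self._committed_rows = committed_rows
        self._staged: list = []
        self._finished = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            await self.rollback()
        elif not self._finished:
            await self.commit()

    def add(self, row: str) -> None:
        self._staged.append(row)

    async def commit(self) -> None:
        if not self._finished:
            self._committed_rows.extend(self._staged)
            self._finished = True

    async def rollback(self) -> None:
        self._staged.clear()
        self._finished = True


async def create_pair(db_rows: list, cache: dict, *, redis_up: bool) -> None:
    async with FakeUnitOfWork(db_rows) as uow:
        uow.add("incident-1")
        uow.add("approval-1")
        try:
            if not redis_up:
                raise FakeRedisError("connection refused")
            cache["incident:incident-1"] = "{}"
            cache["approval:approval-1"] = "{}"
        except FakeRedisError:
            await uow.rollback()  # Saga compensation: discard the staged rows
            raise


def run_demo():
    db_ok, cache_ok = [], {}
    asyncio.run(create_pair(db_ok, cache_ok, redis_up=True))
    db_fail, cache_fail = [], {}
    try:
        asyncio.run(create_pair(db_fail, cache_fail, redis_up=False))
    except FakeRedisError:
        pass
    return db_ok, cache_ok, db_fail, cache_fail
```

In the failing run neither store ends up with data, which is the "both succeed or both fail" property the section asks for.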

### Iron Rules

| Rule | Description |
|------|------|
| No standalone creation | Incidents and Approvals must be created through IncidentApprovalService |
| Status synchronization | Whenever an Approval changes, the linked Incident must be updated as well |
| Unified TTLs | Use the TTLs defined in `src/core/constants.py` |
| Roll back on Redis failure | Use the Saga Pattern to compensate the PostgreSQL transaction |
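
As an illustration of the status-synchronization and unified-TTL rules, here is a minimal sketch. The mapping and TTL values are copied from `constants.py` in this commit; the helper `sync_incident_status` is a hypothetical name introduced only for this example, and its fallback to `"investigating"` mirrors the default used in `on_approval_status_change`.

```python
# Values copied from apps/api/src/core/constants.py in this commit
INCIDENT_TTL_SECONDS = 7 * 24 * 3600
APPROVAL_TTL_SECONDS = 7 * 24 * 3600

APPROVAL_TO_INCIDENT_STATUS = {
    "pending": "investigating",
    "approved": "resolved",
    "rejected": "rejected",
    "expired": "expired",
}
INCIDENT_ACTIVE_STATUSES = frozenset({"investigating", "mitigating"})


def sync_incident_status(approval_status: str) -> tuple[str, bool]:
    """Hypothetical helper: map an Approval status to (Incident status, is_active)."""
    incident_status = APPROVAL_TO_INCIDENT_STATUS.get(approval_status, "investigating")
    return incident_status, incident_status in INCIDENT_ACTIVE_STATUSES
```

Keeping both TTLs equal means a linked Incident and Approval expire from Redis at the same time.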

### File Locations

```
apps/api/src/
├── core/
│   ├── unit_of_work.py    # Transaction management
│   └── constants.py       # TTL constants
└── services/
    └── incident_approval_service.py  # Atomic synchronization service
```

---

## 🧰 MCP Tool Implementation Guidelines (Phase 13.2)

> **Goal**: Upgrade the mock MCP Tools to real system connections
@@ -488,7 +558,9 @@ api/v1/*.py (Router) → services/*.py (Service) → packages/lewooogo-*/ (積
- `packages/lewooogo-data/`: memory Provider building block
- `packages/lewooogo-brain/`: AI engine building block
- `memory/feedback_lewooogo_modular_enforcement.md`: iron rules for modular building-block enforcement
- `memory/feedback_incident_approval_sync.md`: iron rules for Incident-Approval synchronization
- ADR-001: Adoption of the MCP Protocol
- ADR-005: BFF gateway architecture
- ADR-006: AI fallback strategy
- ADR-008: Modular, independent Python building-block architecture
- ADR-027: Incident-Approval synchronization architecture (UnitOfWork + Saga)

@@ -120,6 +120,18 @@ module.exports = {
      severity: "warn",
      from: { path: "apps/web/src/lib" },
      to: { path: "apps/web/src/components" }
    },

    // =========================================================================
    // #94: Stores must not import the API client directly
    // Reason: the state-management layer should not call the API itself;
    //         it should go through the hooks or components layer
    // =========================================================================
    {
      name: "stores-no-api-import",
      comment: "stores must not reference api-client directly (go through the hooks layer)",
      severity: "error",
      from: { path: "apps/web/src/stores" },
      to: { path: "apps/web/src/lib/api-client" }
    }
  ],

45
apps/api/src/core/constants.py
Normal file
@@ -0,0 +1,45 @@
"""
Core Constants
==============
ADR-027: Incident-Approval synchronization architecture

Central definitions of system constants, so magic numbers are not scattered across the codebase.

Version: v1.0
Created: 2026-03-26 (Taipei time)
"""

# =============================================================================
# TTL Settings (seconds)
# =============================================================================

# Working Memory TTL: 7 days
INCIDENT_TTL_SECONDS = 7 * 24 * 3600  # 604800
APPROVAL_TTL_SECONDS = 7 * 24 * 3600  # 604800

# Decision Token TTL: 24 hours
DECISION_TTL_SECONDS = 24 * 3600  # 86400

# =============================================================================
# Redis Key Prefixes
# =============================================================================

REDIS_KEY_INCIDENT = "incident:"
REDIS_KEY_APPROVAL = "approval:"
REDIS_KEY_PENDING = "pending_approvals"
REDIS_KEY_DECISION = "decision:"

# =============================================================================
# Status Mappings (ADR-027)
# =============================================================================

# Approval status → Incident status
APPROVAL_TO_INCIDENT_STATUS = {
    "pending": "investigating",
    "approved": "resolved",
    "rejected": "rejected",
    "expired": "expired",
}

# Incident statuses that count as active
INCIDENT_ACTIVE_STATUSES = frozenset({"investigating", "mitigating"})
152
apps/api/src/core/unit_of_work.py
Normal file
@@ -0,0 +1,152 @@
"""
Unit of Work Pattern
====================
ADR-027: Incident-Approval synchronization architecture

PostgreSQL transaction manager that keeps multi-table operations atomic.

Design principles:
- Uses an async context manager
- Supports explicit commit/rollback
- Compatible with FastAPI Depends

Version: v1.0
Created: 2026-03-26 (Taipei time)
"""

from typing import Any

import structlog
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

logger = structlog.get_logger(__name__)


class UnitOfWork:
    """
    PostgreSQL transaction manager (Unit of Work Pattern)

    Purpose:
    - Keep multiple database operations in the same transaction
    - Support manual commit/rollback
    - Work with the Saga Pattern to handle failures in external systems

    Usage:
        async with UnitOfWork(session_factory) as uow:
            # All operations share one transaction
            incident = await repo.create(uow.session, data)
            approval = await repo.create(uow.session, data)

            # External system operation
            try:
                await redis.set(key, value)
            except RedisError:
                await uow.rollback()  # Saga compensation
                raise

    Warning:
    - Do not call get_db_context() inside a UnitOfWork; it creates nested transactions
    - rollback() must be called manually when a Redis operation fails
    """

    def __init__(self, session_factory: async_sessionmaker[AsyncSession]):
        """
        Initialize the UnitOfWork

        Args:
            session_factory: SQLAlchemy async session factory
        """
        self._session_factory = session_factory
        self._session: AsyncSession | None = None
        self._committed = False

    @property
    def session(self) -> AsyncSession:
        """
        Return the current session

        Raises:
            RuntimeError: if accessed outside the context manager
        """
        if self._session is None:
            raise RuntimeError("UnitOfWork must be used within async context manager")
        return self._session

    async def __aenter__(self) -> "UnitOfWork":
        """Enter the transaction"""
        self._session = self._session_factory()
        self._committed = False
        logger.debug("uow_started")
        return self

    async def __aexit__(
        self,
        exc_type: type[BaseException] | None,
        exc_val: BaseException | None,
        exc_tb: Any,
    ) -> None:
        """
        Leave the transaction

        Behavior:
        - Exception raised: roll back automatically
        - No exception and no manual commit: commit automatically
        - Already committed manually: do nothing
        """
        try:
            if exc_type is not None:
                # An exception occurred: roll back
                await self.rollback()
                logger.warning(
                    "uow_rollback_on_exception",
                    exc_type=exc_type.__name__ if exc_type else None,
                    exc_val=str(exc_val) if exc_val else None,
                )
            elif not self._committed:
                # No exception and no manual commit: commit automatically
                await self.commit()
        finally:
            # Close the session even if the commit or rollback above raised
            if self._session:
                await self._session.close()
                self._session = None

    async def commit(self) -> None:
        """
        Commit the transaction

        Note:
        - Sets _committed = True, so __aexit__ will not commit again
        - Safe to call more than once; only the first call has an effect
        """
        if self._session and not self._committed:
            await self._session.commit()
            self._committed = True
            logger.debug("uow_committed")

    async def rollback(self) -> None:
        """
        Roll back the transaction

        Used for:
        - Saga Pattern: compensation when an external system (Redis) fails
        - Manual rollback: when business logic requires it

        Note:
        - After this call the transaction is rolled back; later operations run in a new transaction
        - _committed is set to True so __aexit__ takes no further action
        """
        if self._session:
            await self._session.rollback()
            self._committed = True  # Prevents __aexit__ from committing again
            logger.debug("uow_rolled_back")

    async def flush(self) -> None:
        """
        Flush without committing

        Used for:
        - Obtaining database-generated IDs (before commit)
        - Validating database constraints
        """
        if self._session:
            await self._session.flush()
425
apps/api/src/services/incident_approval_service.py
Normal file
@@ -0,0 +1,425 @@
"""
Incident-Approval Synchronization Service
==========================================
ADR-027: Incident-Approval synchronization architecture

Keeps Incident and Approval atomically in sync:
1. Atomic synchronization on creation (UnitOfWork + Saga)
2. Two-way propagation of status changes

Design principles:
- A PostgreSQL transaction (UnitOfWork) provides database atomicity
- The Saga Pattern handles Redis write failures
- Status-sync hooks maintain consistency

Version: v1.0
Created: 2026-03-26 (Taipei time)
"""

from datetime import UTC, datetime
from typing import TYPE_CHECKING
from uuid import UUID, uuid4

import structlog
from redis.exceptions import RedisError
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

from src.core.constants import (
    APPROVAL_TO_INCIDENT_STATUS,
    APPROVAL_TTL_SECONDS,
    INCIDENT_TTL_SECONDS,
    REDIS_KEY_APPROVAL,
    REDIS_KEY_INCIDENT,
)
from src.core.unit_of_work import UnitOfWork
from src.db.models import ApprovalRecord, IncidentRecord
from src.models.approval import ApprovalRequest, ApprovalRequestCreate, ApprovalStatus
from src.models.incident import Incident, IncidentStatus

if TYPE_CHECKING:
    from redis.asyncio import Redis

logger = structlog.get_logger(__name__)


class IncidentApprovalSyncError(Exception):
    """Incident-Approval synchronization failed"""

    pass


class IncidentApprovalService:
    """
    Incident-Approval synchronization service

    Responsibilities:
    1. Atomically create an Incident + Approval (create_with_approval)
    2. Synchronize status (on_approval_status_change)
    3. Manage TTLs in one place

    Usage:
        service = IncidentApprovalService(session_factory, redis_client)

        # Atomic creation
        incident, approval = await service.create_with_approval(
            incident_data, approval_data
        )

        # Status-change synchronization
        await service.on_approval_status_change(approval_id, "approved")

    Warning:
        Do not bypass this service to create an Incident or an Approval directly
    """

    def __init__(
        self,
        session_factory: async_sessionmaker[AsyncSession],
        redis_client: "Redis",
    ):
        """
        Initialize the IncidentApprovalService

        Args:
            session_factory: SQLAlchemy async session factory
            redis_client: Redis async client
        """
        self._session_factory = session_factory
        self._redis = redis_client

    # =========================================================================
    # Atomic creation (Phase 1 Core)
    # =========================================================================

    async def create_with_approval(
        self,
        incident_data: dict,
        approval_data: ApprovalRequestCreate,
        *,
        link_metadata: bool = True,
    ) -> tuple[IncidentRecord, ApprovalRecord]:
        """
        Atomically create an Incident + Approval

        Guarantees:
        1. PostgreSQL transaction: both succeed or both fail
        2. Saga compensation: PostgreSQL is rolled back when Redis fails

        Args:
            incident_data: Incident data (dict for IncidentRecord)
            approval_data: Approval creation request
            link_metadata: whether to create two-way links in metadata

        Returns:
            tuple[IncidentRecord, ApprovalRecord]: the created records

        Raises:
            IncidentApprovalSyncError: creation failed (already rolled back)

        Warning:
            Do not call incident_service.create() or approval_db.create_approval() directly
        """
        async with UnitOfWork(self._session_factory) as uow:
            try:
                # 1. Create the Incident (PostgreSQL)
                incident_id = incident_data.get("incident_id") or str(uuid4())
                incident_data["incident_id"] = incident_id

                incident_record = IncidentRecord(
                    id=incident_id,
                    title=incident_data.get("title", ""),
                    description=incident_data.get("description", ""),
                    severity=incident_data.get("severity", "P2"),
                    status=incident_data.get("status", "investigating"),
                    source=incident_data.get("source", "alertmanager"),
                    fingerprint=incident_data.get("fingerprint"),
                    extra_metadata=incident_data.get("metadata", {}),
                )
                uow.session.add(incident_record)
                await uow.flush()  # Obtain the ID

                # 2. Create the Approval (PostgreSQL)
                # Prepare metadata and set up the two-way link
                # (copied so the caller's metadata dict is not mutated)
                approval_metadata = dict(approval_data.metadata or {})
                if link_metadata:
                    approval_metadata["incident_id"] = incident_id

                approval_record = ApprovalRecord(
                    action=approval_data.action,
                    description=approval_data.description,
                    status=ApprovalStatus.PENDING,
                    risk_level=approval_data.risk_level or "MEDIUM",
                    required_signatures=1,
                    current_signatures=0,
                    signatures=[],
                    blast_radius=approval_data.blast_radius.model_dump()
                    if approval_data.blast_radius
                    else {},
                    dry_run_checks=[
                        c.model_dump() for c in (approval_data.dry_run_checks or [])
                    ],
                    requested_by=approval_data.requested_by,
                    expires_at=approval_data.expires_at,
                    extra_metadata=approval_metadata,
                    fingerprint=incident_data.get("fingerprint"),
                )
                uow.session.add(approval_record)
                await uow.flush()  # Obtain the ID

                # Update the Incident metadata to link the Approval
                if link_metadata:
                    incident_record.extra_metadata = {
                        **(incident_record.extra_metadata or {}),
                        "approval_id": approval_record.id,
                    }

                # 3. Write to Redis (outside the transaction, needs Saga compensation)
                try:
                    await self._write_to_redis(incident_record, approval_record)
                except RedisError as e:
                    # Saga compensation: roll back PostgreSQL
                    logger.error(
                        "redis_write_failed_saga_rollback",
                        incident_id=incident_id,
                        approval_id=approval_record.id,
                        error=str(e),
                    )
                    await uow.rollback()
                    raise IncidentApprovalSyncError(f"Redis write failed: {e}") from e

                # 4. Commit PostgreSQL
                # NOTE: if this commit fails, the Redis keys written in step 3
                # are not removed here and remain until their TTL expires.
                await uow.commit()

                logger.info(
                    "incident_approval_created",
                    incident_id=incident_id,
                    approval_id=approval_record.id,
                )

                return incident_record, approval_record

            except IncidentApprovalSyncError:
                raise
            except Exception as e:
                logger.exception(
                    "incident_approval_create_error",
                    error=str(e),
                )
                raise IncidentApprovalSyncError(f"Create failed: {e}") from e

    async def _write_to_redis(
        self,
        incident: IncidentRecord,
        approval: ApprovalRecord,
    ) -> None:
        """
        Write to Redis (Working Memory)

        Both keys must succeed together, so the writes are batched in a pipeline.

        Args:
            incident: Incident record
            approval: Approval record
        """
        incident_key = f"{REDIS_KEY_INCIDENT}{incident.id}"
        approval_key = f"{REDIS_KEY_APPROVAL}{approval.id}"

        # Serialize
        incident_json = self._serialize_incident(incident)
        approval_json = self._serialize_approval(approval)

        # Batched write through a pipeline
        async with self._redis.pipeline() as pipe:
            pipe.set(incident_key, incident_json, ex=INCIDENT_TTL_SECONDS)
            pipe.set(approval_key, approval_json, ex=APPROVAL_TTL_SECONDS)
            await pipe.execute()

        logger.debug(
            "redis_written",
            incident_key=incident_key,
            approval_key=approval_key,
            ttl=INCIDENT_TTL_SECONDS,
        )

    def _serialize_incident(self, record: IncidentRecord) -> str:
        """Serialize an IncidentRecord to JSON"""
        import json

        return json.dumps({
            "incident_id": record.id,
            "title": record.title,
            "description": record.description,
            "severity": record.severity,
            "status": record.status,
            "source": record.source,
            "fingerprint": record.fingerprint,
            "metadata": record.extra_metadata,
            "created_at": record.created_at.isoformat() if record.created_at else None,
        })

    def _serialize_approval(self, record: ApprovalRecord) -> str:
        """Serialize an ApprovalRecord to JSON"""
        import json

        # status and risk_level may be enums or plain strings
        status_val = record.status.value if hasattr(record.status, "value") else record.status
        risk_val = record.risk_level.value if hasattr(record.risk_level, "value") else record.risk_level

        return json.dumps({
            "id": record.id,
            "action": record.action,
            "description": record.description,
            "status": status_val,
            "risk_level": risk_val,
            "required_signatures": record.required_signatures,
            "current_signatures": record.current_signatures,
            "signatures": record.signatures,
            "blast_radius": record.blast_radius,
            "requested_by": record.requested_by,
            "created_at": record.created_at.isoformat() if record.created_at else None,
            "metadata": record.extra_metadata,
        })

    # =========================================================================
    # Status synchronization (Phase 2)
    # =========================================================================

    async def on_approval_status_change(
        self,
        approval_id: str,
        new_status: ApprovalStatus | str,
    ) -> None:
        """
        Synchronize the Incident when an Approval's status changes

        Behavior:
        1. Update the Approval status (PostgreSQL)
        2. Update the linked Incident's status to match
        3. Refresh the Redis TTLs

        Args:
            approval_id: Approval ID
            new_status: new status (ApprovalStatus or string)

        Note:
            Call this after sign_approval() succeeds
        """
        if isinstance(new_status, str):
            new_status = ApprovalStatus(new_status)

        async with UnitOfWork(self._session_factory) as uow:
            # 1. Load the Approval
            result = await uow.session.execute(
                select(ApprovalRecord).where(ApprovalRecord.id == approval_id)
            )
            approval = result.scalar_one_or_none()

            if not approval:
                logger.warning("approval_not_found_for_sync", approval_id=approval_id)
                return

            # 2. Update the Approval status
            approval.status = new_status
            if new_status in (ApprovalStatus.APPROVED, ApprovalStatus.REJECTED):
                approval.resolved_at = datetime.now(UTC)

            # 3. Look up the linked Incident ID
            incident_id = (approval.extra_metadata or {}).get("incident_id")
            if not incident_id:
                logger.debug(
                    "no_linked_incident",
                    approval_id=approval_id,
                )
                await uow.commit()
                return

            # 4. Update the Incident status
            result = await uow.session.execute(
                select(IncidentRecord).where(IncidentRecord.id == incident_id)
            )
            incident = result.scalar_one_or_none()

            if incident:
                new_incident_status = APPROVAL_TO_INCIDENT_STATUS.get(
                    new_status.value if hasattr(new_status, "value") else new_status,
                    "investigating",
                )
                incident.status = new_incident_status
                if new_incident_status == "resolved":
                    incident.resolved_at = datetime.now(UTC)

                logger.info(
                    "incident_status_synced",
                    incident_id=incident_id,
                    approval_id=approval_id,
                    new_status=new_incident_status,
                )

            # 5. Refresh the Redis TTLs
            try:
                await self._refresh_redis_ttl(incident_id, approval_id)
            except RedisError as e:
                # A Redis failure does not block the main flow
                logger.warning(
                    "redis_ttl_refresh_failed",
                    incident_id=incident_id,
                    error=str(e),
                )

            await uow.commit()

    async def _refresh_redis_ttl(
        self,
        incident_id: str,
        approval_id: str,
    ) -> None:
        """
        Refresh the Redis TTLs

        Ensures that a linked Incident and Approval expire together.
        """
        incident_key = f"{REDIS_KEY_INCIDENT}{incident_id}"
        approval_key = f"{REDIS_KEY_APPROVAL}{approval_id}"

        async with self._redis.pipeline() as pipe:
            pipe.expire(incident_key, INCIDENT_TTL_SECONDS)
            pipe.expire(approval_key, APPROVAL_TTL_SECONDS)
            await pipe.execute()

    # =========================================================================
    # Query helpers
    # =========================================================================

    async def get_incident_by_approval_id(
        self,
        approval_id: str,
    ) -> IncidentRecord | None:
        """
        Fetch the Incident linked to an Approval ID

        Args:
            approval_id: Approval ID

        Returns:
            IncidentRecord | None
        """
        async with UnitOfWork(self._session_factory) as uow:
            # Load the Approval
            result = await uow.session.execute(
                select(ApprovalRecord).where(ApprovalRecord.id == approval_id)
            )
            approval = result.scalar_one_or_none()

            if not approval:
                return None

            # Look up the Incident ID
            incident_id = (approval.extra_metadata or {}).get("incident_id")
            if not incident_id:
                return None

            # Load the Incident
            result = await uow.session.execute(
                select(IncidentRecord).where(IncidentRecord.id == incident_id)
            )
            return result.scalar_one_or_none()
181
docs/AWOOOI_AGENTIC_WORKSPACE_ROADMAP.md
Normal file
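@@ -0,0 +1,181 @@
# AWOOOI Paradigm Shift: AI-Native Frontend Product Blueprint (Agentic Workspace Roadmap)

> **Document status**: Living Document (continuously updated)
> **Created**: 2026-03-26
> **Core proposition**: This must never turn into a traditional monitoring console; it has to be a genuine AI-driven management platform.

---

## 🌟 Core Philosophy: From "Dashboard" to "Agentic Workspace"

The underlying logic of a traditional monitoring console is: "the system lays out the data, and humans search for the answer."
As a genuine AI platform, AWOOOI must run on a different logic: **"humans state an intent, and the AI finds the data, reasons its way to a solution, and requests authorization."**

The SRE of the future works the way one would with Jarvis. The frontend UI must carry three AI-native genes:

1. **Generative UI (GenUI)**
   This is the key to breaking away from the traditional console. The frontend should not hard-code a "CPU chart" or a "Memory chart". When OpenClaw determines that the cause is a slow database query, the frontend renders on the fly a purpose-built card containing the SQL statement and its execution plan; for a network anomaly, it generates a network topology diagram instead. The screen grows dynamically out of the AI's line of reasoning.
2. **Spatial Investigation Canvas**
   Abandon traditional pagination. When a P0 disaster hits, the SRE needs an infinitely extending whiteboard: alerts on the left, the AI's analysis and reasoning tree in the middle, and the blast-radius topology on the right. All related information unfolds on a single visual plane, which we call "mind-map navigation".
3. **Zero-trust "Nuclear Key" UX**
   When the AI proposes a high-risk command (such as `kubectl delete pod`), the authorization UI cannot be just an ordinary blue [Submit] button. The UI must convey danger (for example red-and-black warning stripes, a slide-to-unlock that requires a 3-second long press, a visible countdown), so that at the moment of approval the human feels the weight of the decision.

---
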
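## 🗺️ The Ultimate AWOOOI Frontend Map: Evolution of the 6 Cognitive Hubs

Injecting these three AI genes into the six traditional modules changes them qualitatively:

### 1. Global Cognitive Hub (The Nexus): The System's Brainwaves
* **Evolution**: Drop the traditional pie charts and line charts.
* **Presentation**:
    * **Autonomy Index**: Shows "the share of today's disasters that the AI intercepted and repaired successfully".
    * **Thinking Stream**: In the minimalist plain-text style of Nothing.tech, the background streams the small tasks OpenClaw is working on in real time (such as `[Investigator] Scanning Redis queue... OK`), which makes the system feel alive.
    * **Pulse Aggregator**: Only when the system needs the Commander to step in does a high-contrast "decision briefing card" surface at the center of the screen.

### 2. Multi-Signature War Room (The War Room): The Last Line of Defense for Decisions
* **Evolution**: Not a "to-do list" but a "military briefing".
* **Presentation**:
    * **Red-vs-blue report**: The screen shows the outcome of the debate between the Strategist Agent (which proposes a fix) and the Guardrail Agent (which blocks risk).
    * **Dry-Run preview**: Before the Commander authorizes, show a before/after Diff View of how the K8s state machine would change once the command runs.

### 3. Topological Canvas: An X-Ray of the Disaster
* **Evolution**: Combined with GraphRAG, this is more than a static architecture diagram.
* **Presentation**:
    * **Blast-radius highlight**: When an anomaly occurs, the canvas automatically focuses on the core node and renders the affected services outward with a pulse animation.
    * **Time-Travel**: A timeline slider lets the Commander scrub back and watch how the disaster spread from a small error into a global collapse over the past 15 minutes.

### 4. Episodic Memory Vault: Accountability and Evolution
* **Evolution**: Not reading logs, but reading a "case investigation report".
* **Presentation**: After each incident is closed, a rich illustrated report is generated automatically, including the screenshots the AI captured at the time, metrics snapshots, and execution logs. SREs can highlight and tag the report, turning it into training material for next time.

### 5. AI Arsenal (Tool Forge): Training and Mounting
* **Evolution**: Turn "configuring APIs" into "issuing weapons".
* **Presentation**: Card-based UI. Each MCP Tool (such as K8s-Operator or DB-Admin) is an equipment card. The Commander can drag and drop tools to assign them to specific Agent roles.

### 6. Brain Throttle (Engine Control): Shifting Gears and Tuning
* **Evolution**: Turn complex JSON configuration into intuitive gauges.
* **Presentation**:
    * **Dual-Track Toggle**: A large, clearly legible lever with a physical feel that switches between the "cloud brain" and the "local brain" in one action.
    * **Token Burn Rate**: Shows the cost of API calls in real time to keep the budget from running away.

---
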
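## 🎨 Visual and Interaction Vocabulary: Cyber-Tactical Meets Nothing.tech Aesthetics

To deliver the best experience, we must hold to rigorous **restraint**:

* **Color Palette**: 90% absolute black and white (Dark Mode First / Glassmorphism), 5% matte gray (for secondary information), and, only for key states, very high-contrast neon green (Success), alert red (Critical), or neural blue (AI prompt).
* **Typography**: Monospace fonts (such as VT323 / JetBrains Mono) throughout for data and logs, paired with a crisp sans-serif for headings. This creates the "professional, ruthless, precise" engineer's romance.
* **Micro-interactions**: Every animation must be "calm and quick" (kept within 150 ms). Drop fancy bounces in favor of opacity fades, cursor scan lines, and similar high-tech elements.

---
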
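## 🚀 Starting Point: The Omni-Terminal, an AI Terminal Docked at the Bottom of Every Page

**First implementation task: the Omni-Terminal (global cognitive center).**

**Strategic rationale**:
1. **Highest leverage for changing user habits**: The terminal changes how humans and the machine interact. Once SREs are used to typing `/incident show latest` on any page and getting a GenUI card back, the system truly has an "AI soul".
2. **Development cost and ROI**: Building one interactive dialog backed by WebSocket/SSE immediately exposes the full reasoning power of the backend OpenClaw to the frontend.

**Current state (v1.0.0)**: The OmniTerminal shell and the white-glass component are already built into the frontend framework, and the Zustand SSE state store has been extended. It can go live as soon as the backend FastAPI endpoint is wired up.

---
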
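## 🎭 Cyber-Tactical Theatre: A Five-Dimensional Entity Visual Matrix

If all we offer is a pile of cold data charts, this is merely "another Grafana". To make AWOOOI's inner workings tangible while staying true to the Nothing.tech minimalist aesthetic, we must use **symbolism, terminal vocabulary, and a dynamic sense of breathing**, so that every alert handled feels like a science-fiction film in which humans fight alongside a cast of AI entities.

Below is the **five-dimensional entity UI/UX visual matrix** tailored for AWOOOI:

### 🦞 1. OpenClaw the Lobster (The Orchestrator)
It is the top-level command network responsible for "grabbing, breaking down, and dispatching".
* **Visual symbol**: Minimal geometric lines or ASCII art that hint at a pair of claws, such as `>--[ DATA ]--<` or `[ 爪 ]`.
* **Motion**: When the system receives a complex alert and is about to dispatch tasks, blinking terminal text appears at the top of the screen: `[OpenClaw Protocol: ENGAGED]`, accompanied by two hairline scan lines closing in from both edges of the screen toward the center, symbolizing "the lobster tearing the problem apart and parsing it".

### 🧠 2. AI (The Core Brain)
Represents the abyss of compute behind the scenes (Ollama / Claude). It is quiet, vast, and unfathomable.
* **Visual symbol**: A Dot Matrix background.
* **Motion**: While the AI is reasoning deeply, the dot-matrix background shows a faint breathing-light effect (Pulsing Opacity). Card edges take on a 1px glow, conveying the vitality of "compute burning".

### 🕵️‍♂️ 3. Agent (The Council)
Experts with distinct personalities and specialties (investigator, strategist, commissar). Users must feel that "a group of experts is in a meeting backstage".
* **Visual symbol**: In the Omni-Terminal, each gets a dedicated code-name badge, such as `@Investigator` or `@Strategist`.
* **Motion**: Their "conversation trail" scrolls quickly across the screen, showing the AIs arguing with one another, for strong visual impact.
  > `@Investigator`: Redis memory has spiked to 98%.
  > `@Strategist`: Proposing a cache-flush command.
  > `@Guardrail`: ⚠️ Warning: this command will cause an immediate burst of cache misses; recommend gradual eviction instead.

### 🤖 4. Robot (The Executor)
Represents K8s workers and Redis consumers. No thoughts, only absolute obedience and power.
* **Visual symbol**: Rigid progress bars, gear metaphors, pure mechanical status codes.
* **Motion**: When a command is authorized for execution, the screen switches instantly to a cold terminal mode. The progress bar is blocky, `[██████░░░]`, accompanied by `[EXECUTING]`, `[NODE_RESTARTED]`, `[EXIT_CODE_0]`, showing highly industrialized mechanical precision.

### 👑 5. Human (The Commander, the Final Arbiter)
The human is the most honored "nuclear key holder".
* **Visual symbol**: An absolutely high-contrast visual focus. When the human is needed, every other element dims to 30% brightness, leaving only the central authorization card lit.
* **Motion**: The prompt wording is always dignified and carries a sense of responsibility: `[ AWAITING COMMANDER OVERRIDE ]` or `[ 請求統帥親核 ]`. The action requires a "3-second long press" or a "slide right to unlock the red line", establishing a sense of absolute control over the system.

---
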
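## 📐 Appendix A: Chief Architect's Visual and Kinetic Baseline (Architectural Visual & Kinetic Directives)

To turn the Cyber-Tactical Theatre above from concept into rigorous frontend code, we need a set of inviolable **Cold Engineering Aesthetic** parameters. My architectural recommendations follow:

### 1. Z-Index Matrix of Depth and Privilege (Z-Index Hierarchy of Trust)
* **Z-0 (system abyss)**: The dot-matrix background and OpenClaw's scan lines. This is the subconscious layer, rendered very faintly at `opacity-20`.
* **Z-10 (council dialogue)**: The Omni-Terminal message stream where Agents argue. Uses semi-transparent glass: `bg-white/40 backdrop-blur-[10px]`.
* **Z-100 (nuclear-key authorization)**: The human Commander's decision card. It must be physically isolated from noise, using the top-tier `bg-white/90 backdrop-blur-[40px] drop-shadow-2xl` with a solid 2px black border, `border-2 border-nothing-black`.

### 2. Cyber-Tactical Color Palette
Reject cheap, highly saturated colors; every color must serve a warning function:
* **Claw-Blue (brain blue)**: `#4A90E2`. Reserved for AI reasoning.
* **Agent-Orange (council orange)**: `#F5A623`. Signals caution and discussion in progress (`WARNING`).
* **Commander-Red (nuclear-key red)**: `#FF3B30`. Allowed only for extremely dangerous, irreversible operations (Delete Pod / Drop Table). Must be paired with a zebra-stripe background.
* **Executor-Green (execution green)**: `#34C759`. May flash once, and only after K8s confirms `EXIT_CODE_0` (success), signaling that the crisis is over.

### 3. Kinetic Debounce Rule
* If 50 Pods die at once, the screen must never flash red 50 times (that would break the SRE).
* The frontend must implement `requestAnimationFrame`-based debouncing or throttling, aggregating all animations into **one deep pulse per second**, conveying the calm of "the situation is serious, but the system is still in control".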
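
### 4. Typographic Separation of Concerns
* **VT323 (bitmap font)**: Only for the "OpenClaw thinking stream" and background noise. It has a built-in retro feel but reads poorly in long passages.
* **JetBrains Mono (monospace)**: Only for "K8s Executor logs" and SQL statements. Its clearly distinguished 0 and O are a safety net during emergency debugging.
* **Inter (sans-serif)**: Reserved for the human Commander's buttons, headings, and confirmation dialogs, so they can be read at a glance even at peak fatigue.

---
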
## ⚡ Appendix B: Neural Link Integration Protocol

For the Omni-Terminal to become a true "cognitive center", the API contract at the frontend/backend boundary must implement the following three architectural designs:

### 1. GenUI "Dynamic Rendering Contract" (The Rendering Protocol)
The backend AI generates output quickly, so the frontend relies on custom Server-Sent Events (SSE) event types to decide when to render plain text and when to present a visual card.

**API Payload Contract (draft)**:
* **`event: thought`**: Reasoning trace (fast-scrolling background text). Example: `{"agent": "Investigator", "msg": "Analyzing Redis latency..."}`
* **`event: tool_call`**: Tool invocation (shows a micro-animation). Example: `{"tool": "K8s-Operator", "status": "executing"}`
* **`event: render_ui`**: Mounts a GenUI component. When the frontend Zustand store receives this event, it immediately replaces the text stream with a Nothing.tech-styled React component.
  ```json
  {"component": "MetricsCard", "props": {"cpu": 95, "memory": 88, "pod": "worker-1"}}
  ```
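
To make the draft contract concrete, the sketch below frames payloads in the standard SSE wire format (an `event:` line, a `data:` line, then a blank line). It is an illustration only: `format_sse`, `parse_sse`, and `ALLOWED_EVENTS` are hypothetical names, and the allowed set simply lists the three event types from the draft above.

```python
import json

# Event types taken from the draft contract; the set itself is an assumption
ALLOWED_EVENTS = {"thought", "tool_call", "render_ui"}


def format_sse(event: str, payload: dict) -> str:
    """Serialize one payload as a Server-Sent Events frame."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event type: {event}")
    # Compact JSON never contains a raw newline, so one data line is enough
    data = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def parse_sse(frame: str) -> tuple[str, dict]:
    """Parse a frame produced by format_sse back into (event, payload)."""
    event, data_lines = "message", []
    for line in frame.strip().split("\n"):
        name, _, value = line.partition(": ")
        if name == "event":
            event = value
        elif name == "data":
            data_lines.append(value)
    return event, json.loads("\n".join(data_lines))
```

A client can then dispatch on the returned event name: append text for `thought`, animate for `tool_call`, and mount a component for `render_ui`.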
|
||||
### 2. 空間感知的上下文注入 (Spatial Context Awareness)
|
||||
當使用者在 Omni-Terminal 輸入指令時,前端發出的 Request 必須帶有「隱形夾帶 (Ghost Payload)」,讓 AI 知道使用者當前的視線落在哪裡。
|
||||
* **戰術設計**:按下 Enter 瞬間,前端抓取目前的「路由 (Current Route)」與「畫面焦點 (Active Entity ID)」。
|
||||
```json
|
||||
{
|
||||
"intent": "/restart",
|
||||
"context": {
|
||||
"current_page": "/war-room",
|
||||
"focused_incident_id": "INC-20260326-001"
|
||||
}
|
||||
}
|
||||
```
|
||||
這賦予了系統強大的感知能力:「統帥正在看 INC-001,他說的 restart 就是指重啟這個事件的 Pod。」
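
How the backend might use the Ghost Payload to resolve an ambiguous command can be sketched as follows. `GhostContext` and `resolve_target` are hypothetical names, and the rule "an explicit argument wins, otherwise fall back to the focused entity" is an assumption rather than part of the documented contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GhostContext:
    """Mirrors the `context` object of the request payload."""
    current_page: str
    focused_incident_id: Optional[str] = None


def resolve_target(intent: str, explicit_target: Optional[str], ctx: GhostContext) -> str:
    """Pick the entity a command applies to: explicit argument first, then the focused entity."""
    if explicit_target:
        return explicit_target
    if ctx.focused_incident_id:
        return ctx.focused_incident_id
    raise ValueError(f"{intent}: no explicit target and nothing is focused")


request = {
    "intent": "/restart",
    "context": {"current_page": "/war-room", "focused_incident_id": "INC-20260326-001"},
}
ctx = GhostContext(**request["context"])
target = resolve_target(request["intent"], None, ctx)
```

Raising when neither source names a target keeps a destructive command such as `/restart` from silently picking a default.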
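
### 3. The "Interrupt" Override

AI generation can take several seconds. If a new P0 alert fires during that time, the system must abandon the current loop immediately.

* **User interrupt**: The frontend binds an `AbortController`. The moment the `[ 🛑 HALT ]` button is pressed, it drops the connection and, through FastAPI, kills the underlying thread to free resources.
* **Priority Preemption**: FastAPI supports multiplexed streams. When Redis delivers a P0 alert, an `event: urgent_alert` is force-inserted directly into the current SSE channel. On receipt, the frontend greys out the existing text and raises a red alert card: `[ SYSTEM OVERRIDE ] P0 Alert: DB Crash`.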
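
The preemption behaviour can be sketched with plain `asyncio`, independent of FastAPI. The example below assumes the AI output arrives as an async iterator of tokens and that alerts arrive on an `asyncio.Queue` (standing in for the Redis feed). `multiplex`, `collect`, and the event tuples are hypothetical names for illustration only.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple

_DONE = object()  # sentinel: the token stream is exhausted


async def _next_or_done(stream: AsyncIterator[str]):
    """Fetch the next token, or return the sentinel when the stream ends."""
    try:
        return await stream.__anext__()
    except StopAsyncIteration:
        return _DONE


async def multiplex(
    tokens: AsyncIterator[str], alerts: "asyncio.Queue[str]"
) -> AsyncIterator[Tuple[str, str]]:
    """Yield ("thought", token) until an alert arrives, then emit ("urgent_alert", ...) and stop."""
    alert_task = asyncio.create_task(alerts.get())
    token_task = asyncio.create_task(_next_or_done(tokens))
    try:
        while True:
            done, _ = await asyncio.wait(
                {alert_task, token_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if alert_task in done:  # the alert wins even if a token is also ready
                yield ("urgent_alert", alert_task.result())
                return
            token = token_task.result()
            if token is _DONE:
                return
            yield ("thought", token)
            token_task = asyncio.create_task(_next_or_done(tokens))
    finally:
        # Abandon whatever is still in flight (the current reasoning loop or the alert wait).
        alert_task.cancel()
        token_task.cancel()


async def collect(token_list: List[str], alert: Optional[str] = None) -> List[Tuple[str, str]]:
    """Demo driver: stream `token_list`, optionally with an alert already queued."""
    async def token_stream() -> AsyncIterator[str]:
        for token in token_list:
            yield token

    alerts: "asyncio.Queue[str]" = asyncio.Queue()
    if alert is not None:
        alerts.put_nowait(alert)
    return [event async for event in multiplex(token_stream(), alerts)]


calm = asyncio.run(collect(["Analyzing", "Redis", "latency"]))
preempted = asyncio.run(collect(["Analyzing", "Redis"], alert="P0 Alert: DB Crash"))
```

Cancelling the pending token task in the `finally` block is what abandons the current loop; the same cleanup path runs when the client disconnects and the generator is closed.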
@@ -5,19 +5,46 @@

---

## 📍 Current Status (2026-03-27 01:10 Taipei)
## 📍 Current Status (2026-03-26 19:30 Taipei)

| Item | Status |
|------|------|
| **Current Phase** | **Phase 13.2 ✅ Complete** |
| **Current Phase** | **Phase 17.1 Incident-Approval Sync** |
| **Day** | Day 10 |
| **AI Fallback** | ✅ **Gemini first (11/500 daily used)** |
| **Phase 13.2 #84** | ✅ **RAGProvider deployment complete (f1117a3)** |
| **Phase 16** | 🔄 R1.3 verification period until 2026-03-27 16:04 |
| **Architecture Review** | ✅ **ADR-025 CI/CD AI integration** |
| **Skills** | ✅ **Skill 07 v1.3 updated** |
| **ADR-027** | ✅ **Approved** (Incident-Approval sync architecture) |
| **ADR-026** | ✅ **Approved** (CoreDNS GitOps) |
| **Telegram Alerts** | ✅ **Fixed** (NetworkPolicy + CoreDNS) |
| **Architecture Review** | ✅ **Complete** (5 key issues identified) |
| **Runner Issue** | ⚠️ **actions/checkout file-write problem (needs investigation)** |

### 🔴 2026-03-26 Chief Architect Full Review + ADR-027 Approval (Day 9 evening, 19:30)

**Review trigger**: Active incidents showing 0 + abnormal Telegram alerts (requested by the Commander)

**Chief Architect review results**:

| Issue ID | Description | Severity |
|---------|------|--------|
| CRITICAL-001 | Incident-Approval creation is not an atomic transaction | 🔴🔴 |
| CRITICAL-002 | Two-layer write is not atomic (Redis + PostgreSQL) | 🔴🔴 |
| HIGH-001 | Approval status changes are not synced to Incident | 🔴 |
| HIGH-002 | Redis TTL expiry causes data loss | 🟡 |
| HIGH-003 | Frontend state consistency issue | 🟡 ✅ Fixed |

**Solution** (ADR-027):
- UnitOfWork pattern: PostgreSQL transaction management
- Saga Pattern: rollback when the Redis write fails
- IncidentApprovalService: encapsulates the atomic operation
- Status sync hook: syncs Incident when Approval changes
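
The interaction between the first two items can be illustrated with an in-memory sketch. It is not the ADR-027 implementation: the dictionaries stand in for PostgreSQL and Redis, the code is synchronous for brevity, and every name below (`UnitOfWork`, `Cache`, `create_incident_with_approval`, the key format) is an assumption made for illustration.

```python
from typing import Dict, List, Optional


class UnitOfWork:
    """Stages rows and applies them to `table` all at once, or not at all."""

    def __init__(self, table: Dict[str, dict]):
        self._table = table
        self._staged: Dict[str, dict] = {}

    def add(self, key: str, row: dict) -> None:
        self._staged[key] = row

    def __enter__(self) -> "UnitOfWork":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self._table.update(self._staged)  # commit
        self._staged = {}  # on failure this discards the staged rows (rollback)
        return False  # never swallow the exception


class Cache:
    """Dict-backed stand-in for Redis; `fail_on` simulates a failed write to that key."""

    def __init__(self, fail_on: Optional[str] = None):
        self.data: Dict[str, dict] = {}
        self.fail_on = fail_on

    def set(self, key: str, value: dict) -> None:
        if key == self.fail_on:
            raise ConnectionError(f"cache write failed for {key}")
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


def create_incident_with_approval(
    db: Dict[str, dict], cache: Cache, incident_id: str, approval_id: str
) -> None:
    """Write both records to both layers, or leave both layers untouched."""
    rows = {
        f"incident:{incident_id}": {"status": "open", "approval_id": approval_id},
        f"approval:{approval_id}": {"status": "pending", "incident_id": incident_id},
    }
    cached: List[str] = []
    try:
        with UnitOfWork(db) as uow:
            for key, row in rows.items():
                uow.add(key, row)
                cache.set(key, row)  # may raise; the unit of work then rolls back
                cached.append(key)
    except Exception:
        for key in reversed(cached):  # saga compensation: undo completed cache writes
            cache.delete(key)
        raise


ok_db: Dict[str, dict] = {}
ok_cache = Cache()
create_incident_with_approval(ok_db, ok_cache, "INC-1", "APR-1")

bad_db: Dict[str, dict] = {}
bad_cache = Cache(fail_on="approval:APR-1")
try:
    create_incident_with_approval(bad_db, bad_cache, "INC-1", "APR-1")
    failed = False
except ConnectionError:
    failed = True
```

In the failing run, the incident key had already been written to the cache before the approval write raised, so the compensation step is what restores the cache to its prior state.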

**New ADR**:
- `docs/adr/ADR-027-incident-approval-sync.md` - Incident-Approval sync architecture

**Estimate**: 9-12h (four-phase implementation)

---

### ✅ 2026-03-26 Telegram Alert Chain Fix + CoreDNS GitOps (Day 9 late afternoon, 18:45)

**Problem**: No Telegram alerts for two days + content disappears after sign-off

@@ -4,7 +4,7 @@
|------|-----|
| **Evaluation Date** | 2026-03-26 |
| **Evaluator** | Chief Architect |
| **Status** | 📋 Under evaluation |
| **Status** | ✅ Option A implemented |

---

@@ -212,8 +212,31 @@ result = await openclaw.analyze_alert(...)

## 8. Follow-up Tracking

- [ ] Ollama seed test complete
- [ ] Commander confirms option choice
- [ ] Option A implemented
- [ ] Effect verification
- [x] Ollama seed test complete (2026-03-26)
- [x] Commander confirmed option choice (2026-03-26, Option A approved)
- [x] Option A implemented (2026-03-26)
  - `test_model_regression.py`: added `temperature: 0.0`, `seed: 42`
  - `test_prompt_validation.py`: added `temperature: 0.0`, `seed: 42`
  - System Prompt: strengthened the Traditional Chinese instruction
- [x] CPU-mode timeout adjustment (2026-03-26, approved by the Commander)
  - Timeout raised from 120 s to 300 s
  - Calculation basis: 300 tokens ÷ 0.45 tok/s ≈ 667 s
- [x] Traditional Chinese fix (2026-03-26)
  - OpenClaw System Prompt v7.0 → v7.1
  - Added a Language Requirement section
  - Explicitly prohibits Simplified Chinese characters
- [ ] Effect verification (pending CI run)
- [ ] Decide whether Option B/C is needed
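
The timeout figures in the checklist follow from dividing a token budget by the measured throughput. A small worked sketch of that arithmetic is below; the function names are hypothetical, and the numbers are the ones quoted above (0.45 tok/s, 300 tokens, 120 s and 300 s timeouts).

```python
def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` at a steady throughput: tokens / (tokens per second)."""
    return tokens / tok_per_s


def max_tokens_within(timeout_s: float, tok_per_s: float) -> int:
    """Largest whole number of tokens that fits inside the timeout at that throughput."""
    return int(timeout_s * tok_per_s)


CPU_TOK_PER_S = 0.45  # measured CPU-mode throughput quoted in this document

full_response_s = generation_seconds(300, CPU_TOK_PER_S)  # about 666.7 seconds
old_budget = max_tokens_within(120, CPU_TOK_PER_S)         # tokens that fit in the old timeout
new_budget = max_tokens_within(300, CPU_TOK_PER_S)         # tokens that fit in the new timeout
```

Under these figures, the 300-second timeout covers roughly 135 generated tokens, so whether it is sufficient depends on how long the actual responses are.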

## 9. GPU Diagnosis Results (2026-03-26)

**192.168.0.188 has no GPU hardware**

| Check | Result |
|---------|------|
| `lspci \| grep nvidia` | No output |
| NVIDIA Driver | Not installed |
| NVIDIA Libs | Not found |

**Conclusion**: This server is a CPU-only machine; Ollama runs in CPU mode (0.45 tok/s).
Migration to a GPU-equipped host needs to be evaluated.

@@ -226,7 +226,7 @@ OLLAMA_MAX_RETRIES: int = 2
Ollama needs sufficient context to make an accurate judgment. The prompt is structured as follows:

```python
RCA_SYSTEM_PROMPT = """You are ClawBot, an AI-powered Kubernetes operations assistant for AWOOOI platform.
RCA_SYSTEM_PROMPT = """You are OpenClaw, an AI-powered Kubernetes operations assistant for AWOOOI platform.

Your role is to:
1. Analyze infrastructure alerts and determine root cause
@@ -463,7 +463,7 @@ async def create_approval_from_rca(
        risk_level=RiskLevel(rca.risk_level),
        blast_radius=rca.blast_radius,
        dry_run_checks=rca.dry_run_checks,
        requested_by=f"ClawBot (via {alert.source})",
        requested_by=f"OpenClaw (via {alert.source})",
        metadata={
            "alert_type": alert.alert_type,
            "source": alert.source,
@@ -514,7 +514,7 @@ async def notify_new_approval(approval: ApprovalRequest) -> None:
### 5.4 Frontend Subscription Mechanism

```typescript
// Frontend SSE subscription (implemented in ClawBotStateMachine)
// Frontend SSE subscription (implemented in OpenClawStateMachine)
useEffect(() => {
  const eventSource = new EventSource(`${apiBaseUrl}/api/v1/dashboard/stream`);

@@ -23,8 +23,8 @@
| **Gitea** | `192.168.0.110:3001` | (legacy) Code hosting |
| **Kali Scanner** | `192.168.0.112:8080` | Security scanning |
| **Ollama** | `192.168.0.188:11434` | LLM inference |
| **ClawBot Legacy** | `192.168.0.188:8088` | (legacy) AI Agent |
| **ClawBot AWOOOI** | `192.168.0.188:8089` | (new) AI Agent |
| **OpenClaw Legacy** | `192.168.0.188:8088` | (legacy) AI Agent |
| **OpenClaw AWOOOI** | `192.168.0.188:8089` | (new) AI Agent |
| **Redis Stack** | `192.168.0.188:6380` | Cache / vector store |
| **SigNoz** | `192.168.0.188:3301` | Observability platform |
| **K8s API** | `192.168.0.120:6443` | K3s cluster |
@@ -110,7 +110,7 @@ JWT_ALGORITHM=HS256

# AI services
OLLAMA_URL=http://192.168.0.188:11434
CLAWBOT_URL=http://192.168.0.188:8089
OPENCLAW_URL=http://192.168.0.188:8088

# External services
HARBOR_URL=http://192.168.0.110:5000

@@ -41,7 +41,7 @@ APPROVAL_RESPONSE=$(curl -s -X POST "$API_URL/api/v1/approvals" \
    {"name": "Syntax Check", "passed": true},
    {"name": "Backup Available", "passed": false, "message": "No recent backup!"}
  ],
  "requested_by": "ClawBot"
  "requested_by": "OpenClaw"
}')

APPROVAL_ID=$(echo "$APPROVAL_RESPONSE" | jq -r '.id')