feat(phase6-9): Complete modular architecture and Agent Teams

Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context

Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture

DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies

Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback

Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
OG T
2026-03-23 18:40:36 +08:00
parent 6eccb45757
commit 7478dc0254
169 changed files with 24613 additions and 247 deletions

View File

@@ -0,0 +1,72 @@
# Automation 01: 開發循環自動化
> **觸發**: 修改 `apps/` 或 `packages/` 下的程式碼
> **目標**: 自動執行檢查,減少手動 Allow
---
## ✅ 自動執行 (Tier 0/1) - 無需確認
### 前端修改後
```bash
# TypeScript 靜態檢查
cd apps/web && pnpm exec tsc --noEmit
# 如有疑慮,執行 build
cd apps/web && pnpm build
```
### 後端修改後
```bash
# Python 語法檢查
cd apps/api && python -c "from src.main import app; print('✅ Import OK')"
# 或完整檢查
cd apps/api && python -m py_compile src/**/*.py
```
### 完成任務後
- 自動更新相關 Memory MD
- 自動更新 LOGBOOK.md (重大里程碑)
- 自動回報驗證結果
---
## ⚡ 快速確認 (Tier 2) - 一次 Y 即可
| 操作 | 說明 |
|------|------|
| `git add` + `git commit` | 提交變更 |
| `pnpm build` (耗時) | 完整建置 |
| `docker-compose up` | 本地測試 |
---
## 🔐 必須詳細確認 (Tier 3)
| 操作 | 說明 |
|------|------|
| `git push` | 推送到遠端 |
| `kubectl apply` | 部署到 K8s |
| 修改 `.env` / secrets | 機密操作 |
---
## 自動化流程圖
```
修改程式碼
[自動] 靜態檢查 (tsc/py_compile)
[自動] 更新 Memory
[確認] git commit?
[確認] git push?
[確認] kubectl apply?
```

View File

@@ -0,0 +1,56 @@
# Automation 02: 部署驗證自動化
> **觸發**: 部署完成後
> **目標**: 自動執行全鏈路驗證
---
## ✅ 自動執行 (Tier 0) - 無需確認
### 部署後立即執行
```bash
# 1. K8s Rollout 狀態
kubectl rollout status deployment/awoooi-api -n awoooi-prod
kubectl rollout status deployment/awoooi-web -n awoooi-prod
# 2. Pod 狀態
kubectl get pods -n awoooi-prod
# 3. API Health Check
curl -s https://awoooi.wooo.work/api/v1/health | jq '.'
# 4. 前端可達性
curl -s -o /dev/null -w "%{http_code}" https://awoooi.wooo.work/
```
### 自動產出驗證報告
```markdown
## 部署驗證報告
| 項目 | 狀態 | 證據 |
|------|------|------|
| API Rollout | ✅/❌ | ... |
| Web Rollout | ✅/❌ | ... |
| API Health | ✅/❌ | HTTP xxx |
| Web 可達 | ✅/❌ | HTTP xxx |
```
---
## ⚡ 異常時自動通報
如果任一項失敗:
1. 立即通報統帥
2. 建議回滾指令
3. 記錄到 RCA Memory
---
## 🔐 回滾需確認 (Tier 3)
```bash
# 回滾需要統帥授權
kubectl rollout undo deployment/awoooi-api -n awoooi-prod
```

View File

@@ -0,0 +1,63 @@
# Automation 03: Memory 同步自動化
> **觸發**: 任務完成時
> **目標**: 自動更新 Memory確保跨 Session 連續性
---
## ✅ 自動執行 (Tier 1) - 無需確認
### 任務完成後自動執行
1. **判斷是否需要更新 Memory**
- 新功能完成 → 更新 project_* MD
- 修復 Bug → 更新 RCA MD
- 學到教訓 → 更新 feedback_* MD
2. **更新對應 Memory 檔案**
```bash
# 自動寫入 Memory
Write(~/.claude/projects/*/memory/*.md)
```
3. **更新 MEMORY.md 索引**
```bash
# 確保索引同步
Edit(~/.claude/projects/*/memory/MEMORY.md)
```
4. **更新 LOGBOOK.md (重大里程碑)**
```bash
# 追加進度紀錄
Edit(docs/LOGBOOK.md)
```
---
## 自動判斷 Memory 類型
| 情境 | Memory 類型 | 檔案命名 |
|------|-------------|----------|
| 用戶回饋/糾正 | feedback | `feedback_*.md` |
| 功能完成 | project | `project_phase*.md` |
| 生產事故 | project | `project_*_rca_*.md` |
| 新增參考資料 | reference | `reference_*.md` |
| 用戶資訊 | user | `user_*.md` |
---
## Session 結束前檢查清單
```
□ 相關 Memory MD 已更新?
□ MEMORY.md 索引已同步?
□ LOGBOOK.md 已記錄?
□ 下一步已標記?
```
---
## 禁止自動化
- 刪除現有 Memory 檔案
- 修改他人建立的 Memory (需確認)

View File

@@ -0,0 +1,76 @@
# Automation 04: Memory 審計
> **觸發**: 每週一次 / 統帥要求時 / Session 啟動時
> **目標**: 確保 Memory 不過期、不衝突、不幻覺
---
## ✅ 自動執行 (Tier 0)
### 審計清單
#### 1. Project Memory 驗證
```bash
# 檢查 Phase 狀態是否與實際一致
kubectl get pods -n awoooi-prod
curl -s https://awoooi.wooo.work/api/v1/health | jq '.status'
```
#### 2. Reference Memory 驗證
```bash
# 檢查 IP/Port 是否正確
ping -c 1 192.168.0.188
curl -s http://192.168.0.188:8089/health
```
#### 3. Feedback Memory 檢查
- 規則是否仍然適用?
- 是否與新規則衝突?
- 是否已被更新的規則取代?
---
## 過期標記格式
如果 Memory 過期,在 frontmatter 加入:
```yaml
---
name: xxx
status: DEPRECATED
deprecated_date: 2026-XX-XX
deprecated_reason: 已被 yyy 取代
---
```
---
## 審計報告格式
```markdown
## Memory 審計報告 (YYYY-MM-DD)
### 驗證通過
- [x] project_phases.md - Phase 狀態一致
- [x] reference_four_hosts.md - IP 正確
### 需要更新
- [ ] project_xxx.md - 狀態已變更
### 已過期
- [x] feedback_xxx.md - 標記 DEPRECATED
```
---
## 審計頻率
| 類型 | 頻率 |
|------|------|
| Project | 每日 |
| Reference | 每週 |
| Feedback | 每月 |
| User | 按需 |

View File

@@ -211,9 +211,77 @@ grep -rn "old_function_name" apps/api/src/
---
## 🧱 leWOOOgo Memory Providers (Phase 6.4d - 2026-03-23)
> **新架構**: 雙層記憶體 (Working + Episodic)
### 記憶層級
| 層級 | Provider | 儲存 | TTL |
|------|----------|------|-----|
| Working Memory | `RedisMemoryProvider` | Redis | 7 天 |
| Episodic Memory | `PgMemoryProvider` | PostgreSQL | 永久 |
| 雙層整合 | `DualMemoryProvider` | 兩者同步 | - |
### 使用方式
```python
from lewooogo_data.providers import (
RedisMemoryProvider,
PgMemoryProvider,
DualMemoryProvider,
init_redis_pool,
init_pg_engine,
)
# 初始化連線池 (啟動時執行)
await init_redis_pool()
await init_pg_engine()
# 建立 Provider
from your_models import Incident
# 單層使用
redis_memory = RedisMemoryProvider(Incident, key_prefix="incidents")
pg_memory = PgMemoryProvider(Incident)
# 雙層使用 (推薦)
dual_memory = DualMemoryProvider(Incident, key_prefix="incidents")
# CRUD 操作
await dual_memory.save("inc-001", incident)
data = await dual_memory.load("inc-001") # Working 優先Episodic 備援
```
### 鐵律
| 規則 | 說明 |
|------|------|
| TTL 必須設定 | Redis 所有 key 必須有 TTL禁止無限累積 |
| 雙層同步 | 寫入時 Working + Episodic 同步 |
| 優雅降級 | Redis 斷線不影響主流程 |
| 禁止直接存取 | 所有記憶體操作必須透過 Provider |
### 檔案位置
```
packages/lewooogo-data/src/lewooogo_data/
├── interfaces/
│ └── memory_provider.py # IMemoryProvider, IDualMemoryProvider
└── providers/
├── redis_memory.py # RedisMemoryProvider
├── pg_memory.py # PgMemoryProvider
└── dual_memory.py # DualMemoryProvider
```
---
## 參考文檔
- `apps/api/src/core/config.py`: 設定中心
- `apps/api/src/main.py`: FastAPI 應用入口
- `packages/lewooogo-data/`: 記憶體 Provider 積木
- `packages/lewooogo-brain/`: AI 引擎積木
- ADR-005: BFF 閘道架構
- ADR-006: AI 備援策略
- ADR-008: Python 模組化獨立積木架構

View File

@@ -0,0 +1,17 @@
---
description: 基礎設施與主機管理員 (DevOps & Infrastructure)
---
# awoooi-devops-commander
## 管轄範圍
Docker, K3s, Nginx, Host Networking
## 核心約束 (AWOOOI 憲法)
1. **防止腦分裂 (Split Brain Prevention)**:
- 牢記四主機架構:`.110` (金庫)、`.112` (安全)、`.120/.121` (K3s 資源)、`.188` (唯一大腦,包含 Nginx/Ollama/ClawBot/SigNoz)。
- 嚴禁在 `.188` 以外的主機部署會做決策的 AI 模型。
2. **授權分級 (Authorization Tiers)**:
- **Tier 1 (直接執行)**: 查詢日誌 (`docker logs`)、編譯程式碼。可以完全自主執行無須過問。
- **Tier 2 (請求一次授權)**: 重啟常規容器 `docker restart`。詢問統帥一次後即可連續執行相關修復。
- **Tier 3 (嚴格簽核)**: 生產環境 `kubectl apply` 或丟棄資料庫。必須提供風險報告並等待人類二次簽核。

View File

@@ -0,0 +1,21 @@
---
description: 前端開發與 Nothing.tech 美學規範 (Frontend Development & Aesthetics)
---
# awoooi-frontend-aesthetics
## 管轄範圍
`apps/web` (Next.js 14, Zustand, Tailwind)
## 核心約束 (AWOOOI 憲法)
1. **Nothing.tech 純白工業風**: 絕對禁止使用深色漸層或遮蔽數據的色塊。必須使用 `bg-white/70 backdrop-blur-[20px]` (白玻璃)、`VT323` 點陣字體,以及 `claw-blue` (`#4A90D9`) 作為 AI 提示色。
2. **狀態與串流防護**: 必須使用 Zustand 處理 SSE (Server-Sent Events) 的 Buffer 與 Exponential Backoff。
3. **禁止虛假數據**: 絕對禁止使用 Mock Data 隱瞞 API 錯誤,必須直接渲染 404/500。
## 強制交付驗證 (Pre-Commit Verification)
當你修改 `apps/web/` 下的任何程式碼後,**必須**自主執行以下命令以確認沒有 Hydration Error 或是宣告錯誤。
// turbo-all
```bash
cd apps/web && pnpm exec tsc --noEmit
cd apps/web && pnpm run build
```

View File

@@ -0,0 +1,17 @@
---
description: Turborepo 架構協調與依賴管理 (Monorepo Orchestration)
---
# awoooi-monorepo-master
## 管轄範圍
`packages/*`, Workspace dependencies, Git
## 核心約束 (AWOOOI 憲法)
1. **禁止遺毒 (No Legacy Import)**: 絕對禁止在現有模組 import 舊專案 `wooo-aiops` 的程式碼,若需資料則一律走獨立的 REST API。
2. **唯一映像標籤**: 嚴禁在 K8s YAML 中寫死 `latest` 標籤,必須要求 CI 動態注入 `{sha}-{run_id}` 標籤防止 Ghost Rollback。
## 自動化驗收
// turbo-all
```bash
pnpm install
```

View File

@@ -0,0 +1,19 @@
---
description: 全鏈路驗收與無人測試員 (SRE QA & Verification)
---
# awoooi-sre-qa
## 管轄範圍
Playwright, API Testing
## 核心約束 (AWOOOI 憲法)
1. **禁止人工 QA (No Human QA Protocol)**: 絕對禁止對統帥說出「請按 F12 查看 Console」或「請幫我刷新畫面看長怎樣」。
2. **強制雙端驗證報告**: 任務結束時,必須產出 Markdown 表格,證明 API Health、SSE Stream、與 Frontend 無報錯皆為綠燈。
## 瀏覽器自動化測試
若懷疑前端渲染異常,你必須自主執行測試腳本,抓出紅字 Error 再自行修復。
// turbo
```bash
docker logs awoooi-web --tail 20
```

View File

@@ -0,0 +1,19 @@
---
description: 後端引擎與 API 開發規範 (Backend Core & API Development)
---
# lewooogo-backend-core
## 管轄範圍
`apps/api` (FastAPI, Python 3.11, asyncpg)
## 核心約束 (AWOOOI 憲法)
1. **四大鐵律**: Async-First 全域非同步、CORS 嚴格白名單、Pydantic 強型別、`structlog` 結構化日誌(禁止使用 print
2. **可觀測性強制注入**: 所有 API 必須包含 OpenTelemetry traces並將日誌打向唯一端點 `192.168.0.188:4317` (SigNoz)。
## 自動化驗收
修改後端程式碼完成後,請確保 Docker 容器運行中,並執行以下健康度掃描,若未出現 200 則必須繼續修復:
// turbo-all
```bash
curl -s http://localhost:8000/api/v1/health
```

View File

@@ -0,0 +1,12 @@
---
description: AI 認知覺醒與演算法防護 (AI Cognitive & Algorithms)
---
# openclaw-cognitive-expert
## 管轄範圍
`Incident Engine`, `GraphRAG`, `Multi-Sig` 模組
## 核心約束 (AWOOOI 憲法)
1. **大腦架構**: 負責維護 Working Memory (Redis Hash) 與 Episodic Memory (PostgreSQL) 的資料同步,以及透過 Redis Streams 實作 Event Bus 事件匯流排。
2. **資安防護 (TOCTOU)**: 當處理 Multi-Sig 簽核模組時,在任何執行動作前,必須強制調用 `dry_run` 來確認 K8s 狀態沒有被篡改。
3. **演算法維護**: 負責 BFS/DFS 演算法尋找 Blast Radius (爆炸半徑) 與 Root Cause (根本原因)。

223
.claude/settings.json Normal file
View File

@@ -0,0 +1,223 @@
{
"permissions": {
"allow": [
"Read(**)",
"Glob(**)",
"Grep(**)",
"Bash(curl *)",
"Bash(kubectl get *)",
"Bash(kubectl describe *)",
"Bash(kubectl logs *)",
"Bash(kubectl rollout status *)",
"Bash(docker ps *)",
"Bash(docker logs *)",
"Bash(ls *)",
"Bash(cat *)",
"Bash(head *)",
"Bash(tail *)",
"Bash(grep *)",
"Bash(find *)",
"Bash(pwd)",
"Bash(which *)",
"Bash(echo *)",
"Bash(git status *)",
"Bash(git log *)",
"Bash(git diff *)",
"Bash(git branch *)",
"Bash(git remote *)",
"Edit(**)",
"Write(apps/**)",
"Write(packages/**)",
"Write(docs/**)",
"Write(.agents/**)",
"Write(k8s/**)",
"Write(scripts/**)",
"Bash(pnpm *)",
"Bash(npm *)",
"Bash(npx *)",
"Bash(node *)",
"Bash(python *)",
"Bash(python3 *)",
"Bash(pip *)",
"Bash(cd *)",
"Bash(mkdir *)",
"Bash(touch *)",
"Bash(cp *)",
"Bash(mv *)",
"Bash(chmod *)",
"Bash(pytest *)",
"Bash(playwright *)",
"Bash(git add *)",
"Bash(git commit *)",
"Bash(git stash *)",
"Bash(ssh *)",
"Bash(scp *)",
"Bash(export KUBECONFIG=*)",
"Bash(git push:*)",
"Bash(claude --version)",
"Bash(git check-ignore:*)",
"WebSearch",
"Bash(claude plugin:*)",
"Bash(claude --channels)",
"Bash(claude --channels plugin:telegram@claude-plugins-official --help)",
"Bash(bash)",
"Bash(source ~/.zshrc)",
"Bash(~/.bun/bin/bun --version)",
"Bash(env)",
"Bash(claude upgrade:*)",
"Bash(/Users/ogt/.local/bin/claude --help)",
"Bash(CLAUDE_CODE_EXPERIMENTAL_CHANNELS=1 claude --help)",
"Bash(claude --channels plugin:telegram@claude-plugins-official --print \"hello\")",
"Bash(mkdir -p ~/.claude/channels/telegram)",
"Bash(~/.claude/channels/telegram/.env)",
"Bash(~/.bun/bin/bun run:*)",
"Bash(sudo ln:*)",
"Bash(ln -sf ~/.bun/bin/bun /opt/homebrew/bin/bun)",
"Bash(xargs python:*)",
"Bash(uv --version)",
"Bash(pip3 install:*)",
"Bash(pip3 show:*)",
"Bash(ruff *)",
"Bash(mypy *)",
"Bash(black *)",
"Bash(isort *)",
"Bash(timeout *)",
"Bash(wc *)",
"Bash(sort *)",
"Bash(uniq *)",
"Bash(awk *)",
"Bash(sed *)",
"Bash(tr *)",
"Bash(tee *)",
"Bash(xargs *)",
"Bash(test *)",
"Bash([ *)",
"Bash(true)",
"Bash(false)",
"Bash(date *)",
"Bash(sleep *)",
"Bash(kill *)",
"Bash(pkill *)",
"Bash(ps *)",
"Bash(top *)",
"Bash(htop *)",
"Bash(df *)",
"Bash(du *)",
"Bash(free *)",
"Bash(uname *)",
"Bash(hostname *)",
"Bash(whoami)",
"Bash(id *)",
"Bash(groups *)",
"Bash(stat *)",
"Bash(file *)",
"Bash(realpath *)",
"Bash(dirname *)",
"Bash(basename *)",
"Bash(type *)",
"Bash(command *)",
"Bash(hash *)",
"Bash(alias *)",
"Bash(set *)",
"Bash(unset *)",
"Bash(printenv *)",
"Bash(diff *)",
"Bash(cmp *)",
"Bash(comm *)",
"Bash(join *)",
"Bash(paste *)",
"Bash(cut *)",
"Bash(rev *)",
"Bash(nl *)",
"Bash(fmt *)",
"Bash(fold *)",
"Bash(pr *)",
"Bash(expand *)",
"Bash(unexpand *)",
"Bash(od *)",
"Bash(xxd *)",
"Bash(hexdump *)",
"Bash(strings *)",
"Bash(base64 *)",
"Bash(md5sum *)",
"Bash(sha256sum *)",
"Bash(jq *)",
"Bash(yq *)",
"Bash(gh *)",
"Bash(docker build *)",
"Bash(docker run *)",
"Bash(docker exec *)",
"Bash(docker compose *)",
"Bash(docker-compose *)",
"Bash(docker images *)",
"Bash(docker inspect *)",
"Bash(docker network *)",
"Bash(docker volume *)",
"Bash(kubectl apply *)",
"Bash(kubectl create *)",
"Bash(kubectl exec *)",
"Bash(kubectl port-forward *)",
"Bash(kubectl config *)",
"Bash(helm *)",
"Bash(terraform *)",
"Bash(ansible *)",
"Bash(bun *)",
"Bash(deno *)",
"Bash(cargo *)",
"Bash(rustc *)",
"Bash(go *)",
"Bash(java *)",
"Bash(javac *)",
"Bash(gradle *)",
"Bash(mvn *)",
"Bash(make *)",
"Bash(cmake *)",
"Bash(ninja *)",
"Bash(uv *)",
"Bash(poetry *)",
"Bash(pipx *)",
"Bash(virtualenv *)",
"Bash(venv *)",
"Bash(conda *)",
"Bash(brew *)",
"Bash(apt *)",
"Bash(apt-get *)",
"Bash(yum *)",
"Bash(dnf *)",
"Bash(pacman *)",
"Bash(snap *)",
"Bash(flatpak *)",
"Bash(systemctl status *)",
"Bash(journalctl *)",
"Bash(service * status)",
"Bash(nc *)",
"Bash(netstat *)",
"Bash(ss *)",
"Bash(lsof *)",
"Bash(nmap *)",
"Bash(dig *)",
"Bash(nslookup *)",
"Bash(host *)",
"Bash(ping *)",
"Bash(traceroute *)",
"Bash(mtr *)",
"Bash(wget *)",
"Bash(http *)",
"Bash(httpie *)",
"Bash(hadolint apps/api/Dockerfile)",
"Bash(docker info:*)",
"Bash(kubectl cluster-info:*)",
"Read(//var/run/**)",
"Bash(open -a Docker)",
"Bash(git rm:*)",
"Bash(git reset:*)"
],
"deny": [
"Bash(rm -rf *)",
"Bash(git push --force *)",
"Bash(git reset --hard *)",
"Bash(kubectl delete *)",
"Bash(docker rm -f *)"
]
}
}

View File

@@ -0,0 +1,827 @@
{
"permissions": {
"allow": [
"Bash(pnpm install:*)",
"Bash(npm --version)",
"Bash(npm install:*)",
"Bash(pnpm --version)",
"Bash(pnpm dev:*)",
"Bash(pnpm add:*)",
"Bash(ls -la /Users/ogt/awoooi/apps/web/next.config.*)",
"Bash(pkill -f \"next dev\")",
"Bash(curl -sL http://localhost:3000/zh-TW)",
"Bash(curl -s http://localhost:3000/zh-TW)",
"Bash(pnpm --filter web build)",
"Bash(curl -s http://localhost:3001/zh-TW)",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" http://localhost:3000/zh-TW)",
"Bash(kubectl apply:*)",
"Bash(chmod +x /Users/ogt/awoooi/deploy-infra.sh)",
"Bash(./deploy-infra.sh)",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"mkdir -p /tmp/awoooi-k8s\")",
"Bash(sshpass -p '0936223270' scp -o StrictHostKeyChecking=no /Users/ogt/awoooi/k8s/awoooi-prod/01-namespace-quota.yaml /Users/ogt/awoooi/k8s/awoooi-prod/02-network-policy.yaml /Users/ogt/awoooi/k8s/awoooi-prod/04-configmap.yaml wooo@192.168.0.120:/tmp/awoooi-k8s/)",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"sudo kubectl apply -f /tmp/awoooi-k8s/01-namespace-quota.yaml\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl apply -f /tmp/awoooi-k8s/01-namespace-quota.yaml 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl apply -f /tmp/awoooi-k8s/02-network-policy.yaml 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl apply -f /tmp/awoooi-k8s/04-configmap.yaml 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl get ns awoooi-prod -o wide 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl get networkpolicy -n awoooi-prod 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl get resourcequota,limitrange,configmap -n awoooi-prod 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"rm -rf /tmp/awoooi-k8s\")",
"Bash(PYTHONPATH=. python -c \"from src.main import app; print\\(''Import OK''\\)\")",
"Bash(curl -s http://localhost:8000/api/v1/health/ready)",
"Bash(curl -s http://localhost:8000/api/v1/health/live)",
"Bash(curl -s http://localhost:8000/)",
"Bash(pkill -f \"uvicorn src.main:app\")",
"Bash(pkill -f \"node.*next\")",
"Bash(curl -s http://localhost:8000/api/v1/health)",
"Read(//Users/ogt/awoooi/apps/api/**)",
"Bash(pnpm typecheck:*)",
"Read(//Users/ogt/awoooi/apps/web/**)",
"Bash(curl -s -X POST http://localhost:8000/api/v1/dashboard/demo/spike/clear)",
"Read(//Users/ogt/awoooi/=== 驗證英文頁面 \\(/en/**)",
"Bash(jq \".devDependencies | keys | map\\(select\\(startswith\\(\"\"@playwright\"\"\\) or startswith\\(\"\"playwright\"\"\\)\\)\\)\")",
"Bash(npx playwright:*)",
"Bash(curl -s http://localhost:3000/zh-TW/demo -o /dev/null -w \"Frontend: HTTP %{http_code}\\\\n\")",
"Bash(__NEW_LINE_ef548029029cdfac__ echo:*)",
"Bash(curl -s http://localhost:8000/api/v1/health -o /dev/null -w \"Backend: HTTP %{http_code}\\\\n\")",
"Bash(echo '=== 已產出的截圖 ===' find /Users/ogt/awoooi/apps/web/test-results -name *.png)",
"Bash(echo '=== Playwright E2E 測試結果 ===' echo echo '📸 截圖證據 \\(test-results/screenshots/\\):' ls -la /Users/ogt/awoooi/apps/web/test-results/screenshots/ __NEW_LINE_db74e5f56e34db17__ echo echo '🎬 錄影證據 \\(.webm\\):' find /Users/ogt/awoooi/apps/web/test-results -name *.webm -exec ls -la {})",
"Bash(__NEW_LINE_db74e5f56e34db17__ echo:*)",
"Bash(source .venv/bin/activate)",
"Bash(python scripts/demo_multisig.py)",
"Bash(python -c \"from src.api.v1.approvals import router; print\\(''✅ Approvals router loaded:'', len\\(router.routes\\), ''routes''\\)\")",
"Bash(npx tsc:*)",
"Bash(chmod +x /Users/ogt/awoooi/scripts/demo-multisig-flow.sh)",
"Bash(python -c \"from src.main import app; print\\(''✅ API loads successfully''\\)\")",
"Bash(jq)",
"Bash(/Users/ogt/awoooi/scripts/demo-multisig-flow.sh)",
"Bash(curl -s -X POST \"http://localhost:8000/api/v1/approvals\" -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s http://localhost:8000/api/v1/openapi.json)",
"Bash(python -c \":*)",
"Bash(curl -s http://localhost:3000 -o /dev/null -w \"%{http_code}\")",
"Bash(lsof -ti:3000,3001,8000)",
"Bash(curl -s http://localhost:8000/health)",
"Bash(curl -s http://localhost:8000/api/v1/approvals/pending)",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" http://localhost:3001/zh-TW/demo)",
"Bash(ls -la test-results/*.png)",
"Bash(cp test-results/cpo102-*.png /Users/ogt/awoooi/docs/screenshots/)",
"Bash(ssh ogt@192.168.0.120 'cat /etc/rancher/k3s/k3s.yaml')",
"Bash(python -c \"from src.main import app; print\\(''✅ main.py imports OK''\\)\")",
"Bash(curl -s http://localhost:8000/api/v1/approvals/k8s-test)",
"Bash(sqlite3 awoooi.db \".tables\")",
"Bash(sshpass -p 0936223270 ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 'sudo cat /etc/rancher/k3s/k3s.yaml')",
"Bash(kubectl --kubeconfig=/Users/ogt/awoooi/apps/api/k3s-prod.yaml get deployments -n awoooi-prod)",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl get deployments -n awoooi-prod 2>/dev/null\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl get deployments -A 2>/dev/null\")",
"Bash(curl -s -X POST http://localhost:8000/api/v1/approvals -H \"Content-Type: application/json\" -d '{:*)",
"Bash(APPROVAL_ID=\"b58a0d86-fa4e-43ca-881c-02e978cd7943\")",
"Bash(curl -s -X POST \"http://localhost:8000/api/v1/approvals/$APPROVAL_ID/sign\" -H \"Content-Type: application/json\" -d '{:*)",
"Bash(sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT operation_type, target_resource, namespace, success, dry_run_passed, dry_run_message, error_message, execution_duration_ms, created_at FROM audit_logs ORDER BY created_at DESC LIMIT 1;\" -header -column)",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S kubectl get pods -n monitoring -l app=grafana 2>/dev/null\")",
"Bash(curl -s http://192.168.0.188:11434/api/tags)",
"Bash(python -c \"from src.main import app; print\\(''✅ Compile OK''\\)\")",
"Bash(curl -s http://localhost:8000/api/v1/ai/status)",
"Bash(curl -s -X POST http://localhost:8000/api/v1/ai/analyze-and-propose -H \"Content-Type: application/json\" -d '{}')",
"Bash(curl -s -X POST http://192.168.0.188:11434/api/generate -H \"Content-Type: application/json\" -d '{\"\"\"\"model\"\"\"\":\"\"\"\"llama3.2:1b\"\"\"\",\"\"\"\"prompt\"\"\"\":\"\"\"\"Output only JSON: {\\\\\"\"\"\"action\\\\\"\"\"\":\\\\\"\"\"\"test\\\\\"\"\"\"}\"\"\"\",\"\"\"\"stream\"\"\"\":false,\"\"\"\"format\"\"\"\":\"\"\"\"json\"\"\"\"}' --max-time 30)",
"Bash(curl -s -X POST http://localhost:8000/api/v1/ai/analyze-and-propose -H \"Content-Type: application/json\" -d '{}' --max-time 60)",
"Bash(PROMPT='你是 ClawBot AI。分析以下監控數據輸出純 JSON無其他文字。:*)",
"Bash(curl -s -X POST http://192.168.0.188:11434/api/generate -H \"Content-Type: application/json\" -d \"{\"\"model\"\":\"\"llama3.2:1b\"\",\"\"prompt\"\":\"\"$PROMPT\"\",\"\"stream\"\":false,\"\"format\"\":\"\"json\"\",\"\"options\"\":{\"\"num_predict\"\":256,\"\"temperature\"\":0.1}}\" --max-time 60)",
"Bash(curl -s -X POST http://192.168.0.188:11434/api/generate -H \"Content-Type: application/json\" -d '{\"\"\"\"model\"\"\"\":\"\"\"\"llama3.2:1b\"\"\"\",\"\"\"\"prompt\"\"\"\":\"\"\"\"Harbor service returning 404. Output JSON: {\\\\\"\"\"\"suggested_action\\\\\"\"\"\":\\\\\"\"\"\"RESTART_DEPLOYMENT\\\\\"\"\"\",\\\\\"\"\"\"target_resource\\\\\"\"\"\":\\\\\"\"\"\"harbor\\\\\"\"\"\",\\\\\"\"\"\"namespace\\\\\"\"\"\":\\\\\"\"\"\"default\\\\\"\"\"\",\\\\\"\"\"\"risk_level\\\\\"\"\"\":\\\\\"\"\"\"medium\\\\\"\"\"\",\\\\\"\"\"\"reasoning\\\\\"\"\"\":\\\\\"\"\"\"Service down\\\\\"\"\"\",\\\\\"\"\"\"confidence\\\\\"\"\"\":0.8,\\\\\"\"\"\"affected_services\\\\\"\"\"\":[]}\"\"\"\",\"\"\"\"stream\"\"\"\":false,\"\"\"\"format\"\"\"\":\"\"\"\"json\"\"\"\",\"\"\"\"options\"\"\"\":{\"\"\"\"num_predict\"\"\"\":128,\"\"\"\"temperature\"\"\"\":0.1}}' --max-time 30)",
"Bash(curl -v -X POST http://192.168.0.188:11434/api/generate -H \"Content-Type: application/json\" -d '{\"\"\"\"model\"\"\"\":\"\"\"\"llama3.2:1b\"\"\"\",\"\"\"\"prompt\"\"\"\":\"\"\"\"Say hello\"\"\"\",\"\"\"\"stream\"\"\"\":false}' --max-time 30)",
"Bash(curl -s -X POST http://localhost:8000/api/v1/ai/analyze-and-propose -H \"Content-Type: application/json\" -d '{}' --max-time 120)",
"Bash(curl -s http://localhost:8000/api/v1/ai/analyze-and-propose -X POST -H \"Content-Type: application/json\")",
"Bash(curl -s http://localhost:8000/api/v1/dashboard)",
"Bash(ls -la ~/Downloads/image*.png)",
"Bash(ls -la ~/Desktop/image*.png)",
"Bash(ls -la /Users/ogt/awoooi/apps/web/public/*.png)",
"WebFetch(domain:openclaw.ai)",
"Bash(ls -la /Users/ogt/Downloads/*.png)",
"Bash(ls -la /Users/ogt/.gemini/antigravity/brain/*/image*.png)",
"Bash(ls -lat /Users/ogt/Downloads/*.png)",
"Bash(curl -s http://localhost:8000/api/v1/approvals)",
"Bash(curl -s -X GET http://localhost:8000/api/v1/approvals/)",
"Bash(APPROVAL_ID=\"4989729e-e518-4e7e-8dff-5c3269e0c82b\")",
"Bash(curl -s -X POST \"http://localhost:8000/api/v1/approvals/$APPROVAL_ID/sign\" -H \"Content-Type: application/json\" -d '{\"\"\"\"signer_id\"\"\"\": \"\"\"\"ciso-001\"\"\"\", \"\"\"\"signer_name\"\"\"\": \"\"\"\"Demo CISO\"\"\"\", \"\"\"\"comment\"\"\"\": \"\"\"\"資安確認,核准執行\"\"\"\"}')",
"Bash(curl -s http://localhost:8000/api/v1/webhooks/health)",
"Bash(curl -s -X POST http://localhost:8000/api/v1/webhooks/alerts -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s http://localhost:3000)",
"Bash(ls -la apps/web/test-results/*.png)",
"Bash(curl -s http://localhost:3000/zh-TW/demo)",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" http://localhost:3333/zh-TW/demo)",
"Bash(curl -s http://localhost:8001/api/v1/approvals/pending)",
"Bash(curl -s -X POST http://localhost:8001/api/v1/approvals -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s http://localhost:8001/openapi.json)",
"Bash(curl -s http://localhost:8001/docs)",
"Bash(curl -s http://localhost:8001/api/v1/webhooks/grafana -X OPTIONS)",
"Bash(pnpm run:*)",
"Bash(node scripts/screenshot-rbac.mjs)",
"Bash(pnpm exec:*)",
"Bash(curl -s http://localhost:3333 -o /dev/null -w \"%{http_code}\")",
"Bash(curl -s http://localhost:3333/zh-TW/demo -o /dev/null -w \"%{http_code}\")",
"Bash(python3 -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(f''''Count: {d[count]}''''\\); [print\\(f''''- {a[id][:8]}... risk={a[risk_level]}''''\\) for a in d[''''approvals''''][:3]]\")",
"Bash(curl -s http://localhost:3000/zh-TW/demo -o /dev/null -w \"%{http_code}\")",
"Bash(python -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(f'''' Connected: {d[\"\"success\"\"]}''''\\); print\\(f'''' Namespaces: {d[\"\"namespaces\"\"][:3]}...''''\\)\" __NEW_LINE_57ae1c1c812968e7__ echo \"\" echo \"3. 資料庫持久化:\" sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT COUNT\\(*\\) as approvals FROM approval_records;\" sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT COUNT\\(*\\) as timeline FROM timeline_events;\" sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT COUNT\\(*\\) as audits FROM audit_logs;\")",
"Bash(head -2 __NEW_LINE_9bf9481fbdf30d4e__ echo \"\" echo \"2. 告警收斂跳過 LLM 日誌 \\(應該有 4 次\\):\" grep -c \"alert_converged_skip_llm\" /tmp/api-server.log)",
"Bash(python -m json.tool)",
"Bash(__NEW_LINE_7463bff94cecc20f__ echo:*)",
"Bash(__NEW_LINE_13846c8488c5fa9a__ echo:*)",
"Bash(__NEW_LINE_13846c8488c5fa9a__ ls:*)",
"Bash(python -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(f'''' Status: {d[\"\"status\"\"]}''''\\)\" __NEW_LINE_32366ca1bb050259__ echo \"\" echo \"2. 待簽核記錄 \\(含 hit_count\\):\" curl -s http://localhost:8000/api/v1/approvals/pending)",
"Read(//Users/ogt/awoooi/**)",
"Bash(curl -s http://localhost:8000/api/v1/timeline/events?limit=10)",
"Bash(curl -s http://localhost:8000/api/v1/timeline/events?limit=5)",
"Bash(ls -la /Users/ogt/awoooi/apps/api/*.txt /Users/ogt/awoooi/apps/api/*.toml)",
"Bash(ls -la /Users/ogt/awoooi/docker-compose*.yml)",
"Bash(ls /Users/ogt/awoooi/k8s/awoooi-prod/*rbac* /Users/ogt/awoooi/k8s/awoooi-prod/*service-account*)",
"Bash(kubectl kustomize:*)",
"Bash(docker compose:*)",
"Bash(docker info:*)",
"Bash(python3 -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(''''API Status:'''', d.get\\(''''status'''', ''''unknown''''\\)\\)\")",
"Bash(pkill -9 -f uvicorn)",
"Bash(lsof -ti:8000)",
"Bash(open -a Docker)",
"Bash(docker stop:*)",
"Bash(lsof -ti:3000)",
"Bash(docker start:*)",
"Bash(docker ps:*)",
"Bash(curl -s http://localhost:3000 -o /dev/null -w 'HTTP Status: %{http_code}\\\\n')",
"Bash(curl -I http://localhost:8000/api/v1/dashboard/stream)",
"Bash(curl -s http://localhost:8000/openapi.json)",
"Bash(curl -s http://localhost:8000/api/v1/dashboard/stream --max-time 3 -w \"\\\\n--- HTTP Status: %{http_code} ---\\\\n\")",
"Bash(curl -s http://localhost:8000/api/v1/dashboard/stream --max-time 3)",
"Bash(curl -s http://localhost:3000/zh-TW -o /dev/null -w \"HTTP Status: %{http_code}\\\\n\")",
"Bash(curl -s -D - http://localhost:8000/api/v1/dashboard/stream --max-time 2)",
"Bash(chmod +x /Users/ogt/awoooi/scripts/deploy-infra.sh)",
"Bash(./scripts/deploy-infra.sh)",
"Bash(pnpm --filter @awoooi/web build)",
"Bash(timeout 10 env MOCK_MODE=true OTEL_ENABLED=false uvicorn src.main:app --host 0.0.0.0 --port 8099)",
"Bash(timeout 8 pnpm --filter @awoooi/web dev)",
"Bash(git diff:*)",
"Bash(curl -s -I http://localhost:8000/api/v1/dashboard/stream)",
"Bash(timeout 3 curl -s -N http://localhost:8000/api/v1/dashboard/stream)",
"Bash(grep -n \"NEXT_PUBLIC\\\\|API_URL\\\\|localhost\" /Users/ogt/awoooi/apps/web/.env*)",
"Bash(timeout 2 curl -s -D - -N http://localhost:8000/api/v1/dashboard/stream)",
"Bash(curl -s http://localhost:3000/)",
"Bash(python -m py_compile scripts/fire_test_alert.py)",
"Bash(python -m scripts.fire_test_alert --help)",
"Bash(python -m scripts.fire_test_alert)",
"Bash(python -m scripts.fire_test_alert --type k8s_pod_crash)",
"Bash(timeout 3 curl -s -N -H \"Origin: http://localhost:3000\" http://localhost:8000/api/v1/dashboard/stream)",
"Bash(python -m scripts.fire_test_alert --type disk_full)",
"Bash(docker restart:*)",
"Bash(curl -s -w \"\\\\nHTTP_CODE: %{http_code}\\\\n\" http://localhost:3000)",
"Bash(docker exec:*)",
"Bash(docker rmi:*)",
"Bash(timeout 5 curl -s -N http://localhost:8000/api/v1/dashboard/stream)",
"Bash(curl -s http://localhost:3000 -w \"\\\\nHTTP: %{http_code}\\\\n\")",
"Bash(timeout 120 docker logs awoooi-api -f --since 1s)",
"Bash(curl -s -I -H \"Origin: http://localhost:3000\" http://localhost:8000/api/v1/dashboard/stream)",
"Bash(curl -s -X OPTIONS -H \"Origin: http://localhost:3000\" -H \"Access-Control-Request-Method: GET\" http://localhost:8000/api/v1/dashboard/stream -I)",
"Bash(node /Users/ogt/awoooi/scripts/verify-sse.js)",
"Bash(python -m scripts.fire_test_alert --type db_connection_timeout)",
"Bash(npm run:*)",
"Bash(docker-compose down:*)",
"Bash(docker-compose build:*)",
"Bash(docker-compose up:*)",
"Bash(pkill -f 'next dev')",
"Bash(node /Users/ogt/awoooi/scripts/test-approval-flow.js)",
"Bash(python -m scripts.fire_test_alert --type pod_crash)",
"Bash(node /Users/ogt/awoooi/scripts/test-k8s-executor.js)",
"Bash(kubectl cluster-info:*)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl cluster-info)",
"Bash(ls -la /Users/ogt/awoooi/apps/web/src/app/[locale]/)",
"Bash(python -c \"from src.api.v1 import audit_logs; print\\(''API module loads OK''\\)\")",
"Bash(curl -s http://localhost:3000/zh-TW/action-logs)",
"Bash(pnpm build:*)",
"Bash(curl -s http://localhost:8000/api/v1/audit-logs)",
"Bash(xargs -r kill -9 2)",
"Bash(/dev/null source:*)",
"Bash(python -c \"from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor; print\\(''''httpx ok''''\\)\")",
"Bash(sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 5;\")",
"Bash(sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT name FROM sqlite_master WHERE type=''table'';\")",
"Bash(sqlite3 /Users/ogt/awoooi/apps/api/awoooi.db \"SELECT id, event_type, status, title, created_at FROM timeline_events ORDER BY created_at DESC LIMIT 5;\")",
"Bash(curl -s http://localhost:8000/api/v1/audit-logs/stats)",
"Bash(curl -s http://localhost:8000/api/v1/timeline?limit=10)",
"Bash(curl -s \"http://localhost:8000/api/v1/timeline\")",
"Bash(curl -s http://localhost:8000/api/v1/docs)",
"Bash(chmod +x /Users/ogt/awoooi/scripts/setup-guardrails.sh /Users/ogt/awoooi/scripts/ai_code_reviewer.py)",
"Bash(ls -la /Users/ogt/awoooi/apps/web/.eslintrc*)",
"Bash(ls -la scripts/*.py scripts/*.sh .pre-commit-config.yaml .secrets.baseline apps/web/.eslintrc.js)",
"Bash(python -m src.services.test_context_gatherer)",
"Bash(python -m pytest src/services/test_context_gatherer.py -v)",
"Bash(grep -r \"ClawBot\\\\|clawbot\\\\|CLAWBOT\" --include=*.py --include=*.ts --include=*.tsx apps/)",
"Bash(python scripts/e2e_openclaw_test.py)",
"Bash(python -m pytest tests/e2e_network_test.py -v --tb=short)",
"Bash(chmod +x /Users/ogt/awoooi/apps/api/scripts/apply_prometheus_config.sh /Users/ogt/awoooi/apps/api/scripts/fire_live_alert.py)",
"Bash(./scripts/apply_prometheus_config.sh)",
"Bash(python scripts/fire_live_alert.py oomkilled)",
"Bash(python scripts/fire_live_alert.py oomkilled --api-url http://localhost:8000)",
"Bash(python scripts/fire_live_alert.py highcpu --api-url http://localhost:8000)",
"Bash(python scripts/fire_live_alert.py podcrash --api-url http://localhost:8000)",
"Bash(python -m pytest tests/test_webhook_telegram_integration.py -v)",
"Bash(ls -la /Users/ogt/awoooi/apps/api/.env*)",
"Bash(ls -la /Users/ogt/wooo-aiops/.env*)",
"Bash(ls -la /Users/ogt/AIOps/.env*)",
"Bash(/Users/ogt/awoooi/apps/api/.env:*)",
"Bash(/tmp/deploy-188-home.sh:*)",
"Bash(chmod +x /tmp/deploy-188-home.sh)",
"Bash(scp /tmp/awoooi-api-deploy.tar.gz /tmp/deploy-188-home.sh ollama@192.168.0.188:/tmp/)",
"Bash(ssh ollama@192.168.0.188 \"bash /tmp/deploy-188-home.sh\")",
"Bash(ssh ollama@192.168.0.188 \"curl -s http://localhost:8000/api/v1/webhooks/health\")",
"Bash(ssh ollama@192.168.0.188 \"tail -50 /tmp/openclaw.log\")",
"Bash(ssh ollama@192.168.0.188 \"cd /home/ollama/awoooi-api && source .venv/bin/activate && pip install sqlalchemy aiosqlite -q && pip install httpx python-dotenv pydantic-settings -q\")",
"Bash(ssh ollama@192.168.0.188 \"cd /home/ollama/awoooi-api && pkill -f ''uvicorn src.main:app'' 2>/dev/null; sleep 1; source .venv/bin/activate && nohup uvicorn src.main:app --host 0.0.0.0 --port 8000 > /tmp/openclaw.log 2>&1 & sleep 3 && curl -s http://localhost:8000/api/v1/webhooks/health\")",
"Bash(ssh ollama@192.168.0.188:*)",
"Bash(pkill -f ngrok)",
"Bash(pkill -f \"ssh -fN.*8001\")",
"Bash(ssh -fN -L 8001:localhost:8000 ollama@192.168.0.188)",
"Bash(curl -s http://localhost:8001/api/v1/webhooks/health)",
"Bash(BOT_TOKEN=\"8569720657:AAHdvKf_P2ms-QKFTyqTLtLiqEggz8cpjMk\" curl -s \"https://api.telegram.org/bot$BOT_TOKEN/getWebhookInfo\")",
"Bash(curl -s https://api.telegram.org/bot$BOT_TOKEN/getWebhookInfo)",
"Bash(curl -s http://localhost:8001/api/v1/webhooks/)",
"Bash(curl -s http://localhost:8001/)",
"Bash(curl -s http://localhost:8001/api/v1/health)",
"Bash(scp /tmp/awoooi-api-v7.tar.gz ollama@192.168.0.188:/tmp/)",
"Bash(tar -czvf /tmp/awoooi-api-v7.1.tar.gz src/ requirements.txt pyproject.toml)",
"Bash(scp /tmp/awoooi-api-v7.1.tar.gz ollama@192.168.0.188:/tmp/)",
"Bash(ssh ollama@192.168.0.188 \"tail -10 /tmp/openclaw.log | grep -E ''''clickhouse|signoz_gold''''\")",
"Bash(ssh ogt@192.168.0.188 \"cd /home/ollama/awoooi-api && tail -50 nohup.out 2>/dev/null || journalctl -u awoooi-api --no-pager -n 50 2>/dev/null || echo ''請手動檢查日誌''\")",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8123/ -d \"SELECT 1 FORMAT JSONEachRow\")",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:11434/api/tags)",
"Bash(ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5 ollama@192.168.0.188 \"echo ok\")",
"Bash(ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5 wooo@192.168.0.188 \"echo ok\")",
"Bash(ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5 root@192.168.0.188 \"echo ok\")",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8001/health)",
"Bash(ssh root@192.168.0.188 \"cat /tmp/openclaw.log 2>/dev/null | tail -100 || echo ''Log file not found''\")",
"Bash(ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ollama@192.168.0.188 \"echo ok\")",
"Bash(ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 wooo@192.168.0.188 \"echo ok\")",
"Bash(scp /Users/ogt/awoooi/apps/api/src/services/signoz_client.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/services/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/services/openclaw.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/services/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/services/telegram_gateway.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/services/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/api/v1/webhooks.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/api/v1/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/models/ai.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/models/)",
"Bash(ssh ollama@192.168.0.188 \"cd /home/ollama/awoooi-api && pkill -f ''''uvicorn src.main:app'''' && sleep 2 && nohup .venv/bin/python3 -m uvicorn src.main:app --host 0.0.0.0 --port 8000 > nohup.out 2>&1 &\")",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8000/health)",
"Bash(curl -s --connect-timeout 10 http://192.168.0.188:8000/health)",
"Bash(curl -s -X POST http://192.168.0.188:8000/api/v1/webhooks/alerts -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s -X POST http://192.168.0.188:8000/api/v1/webhooks/alerts -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"high_cpu\"\",\"\"severity\"\":\"\"critical\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"api-gateway\"\",\"\"namespace\"\":\"\"awoooi-prod\"\",\"\"message\"\":\"\"CPU 92% test\"\"}')",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8000/api/v1/webhooks/alerts -X POST -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"high_cpu\"\",\"\"severity\"\":\"\"critical\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"api-gateway\"\",\"\"namespace\"\":\"\"awoooi-prod\"\",\"\"message\"\":\"\"CPU 92% - 統帥全自主驗收 v2\"\"}')",
"Bash(curl -s --connect-timeout 30 --max-time 120 -X POST http://192.168.0.188:8000/api/v1/webhooks/alerts -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s --connect-timeout 30 --max-time 180 -X POST http://192.168.0.188:8000/api/v1/webhooks/alerts -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/webhooks/alerts -X POST -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"k8s_pod_crash\"\",\"\"severity\"\":\"\"critical\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"inventory-api\"\",\"\"namespace\"\":\"\"commerce\"\",\"\"message\"\":\"\"Pod crash - 統帥終極驗收\"\"}' --connect-timeout 30 --max-time 180)",
"Bash(ssh -o ConnectTimeout=10 ollama@192.168.0.188 \"echo OK && ps aux | grep uvicorn | grep -v grep | head -2\")",
"Bash(curl -s http://192.168.0.188:8000/api/v1/webhooks/alerts -X POST -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"ssl_expiry\"\",\"\"severity\"\":\"\"critical\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"nginx-ingress\"\",\"\"namespace\"\":\"\"ingress\"\",\"\"message\"\":\"\"SSL 即將過期 - 終極驗收\"\"}' --connect-timeout 30 --max-time 180)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/webhooks/alerts -X POST -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"db_connection_timeout\"\",\"\"severity\"\":\"\"critical\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"postgres-primary\"\",\"\"namespace\"\":\"\"database\"\",\"\"message\"\":\"\"DB 連線逾時 - SignOz 整合終極測試\"\"}' --connect-timeout 30 --max-time 180)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/webhooks/alerts -X POST -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"service_404\"\",\"\"severity\"\":\"\"critical\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"auth-service\"\",\"\"namespace\"\":\"\"identity\"\",\"\"message\"\":\"\"Service 404 - SignOz + Ollama 整合終極測試\"\"}' --connect-timeout 30 --max-time 180)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/webhooks/alerts -X POST -H \"Content-Type: application/json\" -d '{\"\"alert_type\"\":\"\"high_cpu\"\",\"\"severity\"\":\"\"warning\"\",\"\"source\"\":\"\"signoz\"\",\"\"target_resource\"\":\"\"recommendation-engine\"\",\"\"namespace\"\":\"\"ml\"\",\"\"message\"\":\"\"CPU 78% - Ollama 最終測試\"\"}' --connect-timeout 30 --max-time 200)",
"Bash(scp apps/api/src/services/openclaw.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/services/openclaw.py)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/core/http_client.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/core/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/main.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/core/config.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/core/)",
"Bash(scp /Users/ogt/awoooi/apps/api/src/api/v1/health.py ollama@192.168.0.188:/home/ollama/awoooi-api/src/api/v1/)",
"Bash(ssh -o ConnectTimeout=5 ollama@192.168.0.188 \"ps aux | grep uvicorn | grep -v grep\")",
"Bash(curl -s -H \"Origin: http://localhost:3000\" -H \"Access-Control-Request-Method: GET\" -X OPTIONS http://192.168.0.188:8000/api/v1/health -v)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/health)",
"Bash(curl -s -N --max-time 3 http://192.168.0.188:8000/api/v1/dashboard/stream)",
"Bash(curl -s http://localhost:3000/zh-TW -o /dev/null -w \"%{http_code}\")",
"Bash(open http://localhost:3000/zh-TW)",
"Bash(open http://localhost:3001/zh-TW)",
"Bash(curl -s -H \"Origin: http://localhost:3001\" http://192.168.0.188:8000/api/v1/dashboard/stream --max-time 3)",
"Bash(curl -s -I -H \"Origin: http://localhost:3001\" http://192.168.0.188:8000/api/v1/health)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/approvals/pending)",
"Bash(curl -s http://192.168.0.188:8000/api/v1/approvals)",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/approvals?status=pending_approval\")",
"Bash(xargs sed:*)",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/approvals/history?limit=5\")",
"Bash(curl -s http://192.168.0.188:8000/api/v1/approvals/approved)",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/timeline?limit=10\")",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/action-logs\")",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/timeline/events?limit=10\")",
"Bash(ssh ogt@192.168.0.188 \"kubectl get nodes\")",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/approvals/k8s-test\")",
"Bash(scp /Users/ogt/awoooi/apps/api/k3s-prod.yaml ogt@192.168.0.188:~/awoooi-api/k3s-prod.yaml)",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/timeline/events?limit=5\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.120 \"cat /etc/rancher/k3s/k3s.yaml\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no wooo@192.168.0.188 \"echo ''SSH OK'' && pwd\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no ollama@192.168.0.188 \"echo ''SSH OK'' && pwd && ls -la ~/awoooi-api/ 2>/dev/null || echo ''Directory not found''\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no ollama@192.168.0.188 \"sshpass -p ''0936223270'' scp -o StrictHostKeyChecking=no wooo@192.168.0.120:/etc/rancher/k3s/k3s.yaml ~/awoooi-api/k3s-prod.yaml && sed -i ''s/127.0.0.1/192.168.0.120/g'' ~/awoooi-api/k3s-prod.yaml && echo ''Kubeconfig deployed!'' && head -10 ~/awoooi-api/k3s-prod.yaml\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no ollama@192.168.0.188 \"cd ~/awoooi-api && pkill -f ''uvicorn'' 2>/dev/null; sleep 1; nohup .venv/bin/uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload > nohup.out 2>&1 & sleep 3; echo ''=== API Restarted ==='' && tail -20 nohup.out\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no ollama@192.168.0.188 \"cd ~/awoooi-api && pkill -f ''uvicorn src.main'' || true\")",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/health\" --connect-timeout 5)",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 ollama@192.168.0.188 \"cd ~/awoooi-api && source .venv/bin/activate && nohup uvicorn src.main:app --host 0.0.0.0 --port 8000 > nohup.out 2>&1 &\")",
"Bash(sshpass -p:*)",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/health\" --connect-timeout 10)",
"Bash(curl -s \"http://192.168.0.188:8000/api/v1/timeline/events?limit=8\")",
"Bash(curl -s http://localhost:3000/zh-TW -o /dev/null -w \"Frontend: HTTP %{http_code}\\\\n\")",
"Bash(sshpass -p '0936223270' ssh -o StrictHostKeyChecking=no ollama@192.168.0.188 'curl -s http://localhost:8000/api/v1/approvals/pending | jq -r \"\".approvals[] | \\\\\"\"ID: \\\\\\(.id\\) | Action: \\\\\\(.action\\)\\\\\"\"\"\"')",
"Bash(curl -s --connect-timeout 5 https://awoooi.wooo.tw/api/v1/health)",
"Bash(curl -s --connect-timeout 5 https://awoooi.wooo.tw/api/v1/approvals/pending)",
"Bash(ssh ollama@192.168.70.188 \"ps aux | grep uvicorn | grep -v grep | head -3\")",
"Bash(ssh -o ConnectTimeout=10 ollama@192.168.70.188 \"echo ''SSH Connected''\")",
"Bash(ping -c 2 -t 5 192.168.70.188)",
"Bash(curl -s --connect-timeout 10 https://awoooi.wooo.tw/api/v1/health)",
"Bash(ssh -o ConnectTimeout=10 ollama@192.168.0.188 \"echo ''SSH Connected to 188 Base''\")",
"Bash(grep -B 5 -A 30 \"async def add_signature\" /Users/ogt/awoooi/apps/api/src/services/*.py)",
"Bash(ssh ogt@192.168.0.188 \"cd /home/ogt/awoooi && docker compose ps\")",
"Bash(ls -la .env*)",
"Bash(.env:*)",
"Bash(timeout 15 python -m uvicorn src.main:app --host 0.0.0.0 --port 8001)",
"Bash(timeout 20 python -m uvicorn src.main:app --host 0.0.0.0 --port 8001)",
"Bash(timeout 25 python -m uvicorn src.main:app --host 0.0.0.0 --port 8001)",
"Bash(ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ogt@192.168.0.188 \"cd /home/ogt/wooo-aiops && docker compose ps clawbot 2>/dev/null || docker ps | grep -i claw\")",
"Bash(ls -la ~/.ssh/*.pub)",
"Bash(ssh -i ~/.ssh/id_rsa -o ConnectTimeout=5 -o StrictHostKeyChecking=no -o PasswordAuthentication=no ogt@192.168.0.188 \"echo connected\")",
"Bash(curl -s \"https://api.telegram.org/bot8569720657:AAHdvKf_P2ms-QKFTyqTLtLiqEggz8cpjMk/logOut\")",
"Bash(curl -s \"https://api.telegram.org/bot8569720657:AAHdvKf_P2ms-QKFTyqTLtLiqEggz8cpjMk/close\")",
"Bash(curl -s \"https://api.telegram.org/bot8569720657:AAHdvKf_P2ms-QKFTyqTLtLiqEggz8cpjMk/getUpdates?timeout=3&limit=1\")",
"Bash(ping -c 1 192.168.0.188)",
"Bash(python -m tests.test_redis_multisig)",
"Bash(curl -v -X POST http://localhost:8000/api/v1/webhooks/signals -H \"Content-Type: application/json\" -d '{:*)",
"Bash(python3 -c \":*)",
"Bash(echo ' 無法連線' __NEW_LINE_8fc87454f9798a7d__ echo echo [結論]: echo ' /signals 端點尚未部署到 .188' echo ' 程式碼已完成,需要執行:' echo \" cd apps/api && docker build -t awoooi-api . && docker-compose up -d\")",
"Bash(__NEW_LINE_dc88f37970737861__ cd:*)",
"Bash(__NEW_LINE_dc88f37970737861__ echo:*)",
"Read(//Users/**)",
"Bash(tail -20 __NEW_LINE_8b049957a9782734__ echo \"\" echo \"[Step 2] 等待容器啟動 \\(10 秒\\)...\" sleep 10 __NEW_LINE_8b049957a9782734__ echo \"\" echo \"[Step 3] 檢查容器狀態...\" docker compose ps)",
"Bash(tail -5 __NEW_LINE_275e0094e9dcb44a__ echo \"\" echo \"[1.2] 重建 API 容器 \\(含 Signal Worker\\)...\" docker compose build api)",
"Bash(1 __NEW_LINE_275e0094e9dcb44a__ echo \"\" echo \"[1.4] 等待服務就緒 \\(15 秒\\)...\" sleep 15 __NEW_LINE_275e0094e9dcb44a__ echo \"\" echo \"[1.5] 檢查容器狀態...\" docker compose ps)",
"Bash(__NEW_LINE_f4c8301ec5249760__ echo:*)",
"Bash(__NEW_LINE_21ba3cf3700d942d__ cd:*)",
"Bash(1 __NEW_LINE_9a14b79fc58c11ba__ echo \"\" echo \"[1.3] 等待服務就緒 \\(15 秒\\)...\" sleep 15 __NEW_LINE_9a14b79fc58c11ba__ echo \"\" echo \"[1.4] 檢查容器狀態...\" docker compose ps api)",
"Bash(1 __NEW_LINE_6b654ca5be87c137__ echo \"\" echo \"[2] 等待服務就緒 \\(15 秒\\)...\" sleep 15 __NEW_LINE_6b654ca5be87c137__ echo \"\" echo \"[3] 發送測試 Signal...\" curl -s -X POST http://localhost:8000/api/v1/webhooks/signals -H \"Content-Type: application/json\" -d '{:*)",
"Bash(__NEW_LINE_564908ddf866c081__ echo:*)",
"Bash(chmod +x /Users/ogt/awoooi/apps/api/scripts/test_phase63_aggregation.py)",
"Bash(python scripts/test_phase63_aggregation.py)",
"Bash(xargs -r docker exec -i awoooi-redis redis-cli DEL)",
"Bash(chmod +x /Users/ogt/awoooi/apps/api/scripts/test_race_condition.py)",
"Bash(python scripts/test_race_condition.py)",
"Bash(chmod +x /Users/ogt/awoooi/apps/api/scripts/test_phase64_proposal.py)",
"Bash(python scripts/test_phase64_proposal.py)",
"Bash(python agent.py --alert FINAL_PHASE_6_TEST)",
"Bash(AWOOOI_REDIS_URL=\"redis://localhost:6379/0\" python agent.py --alert FINAL_PHASE_6_TEST)",
"Bash(curl -s http://localhost:8000/api/v1/incidents)",
"Bash(curl -s -X POST http://localhost:8000/api/v1/incidents/INC-20260322-06085B/proposal)",
"Bash(grep -r \"mock\\\\|Mock\\\\|MOCK\\\\|fake\\\\|Fake\\\\|dummy\\\\|hardcode\" /Users/ogt/awoooi/apps/web/src --include=*.tsx --include=*.ts -l)",
"Bash(NEXT_PUBLIC_API_URL=http://localhost:8000 pnpm next build --no-lint)",
"Bash(grep -v \"Traceback\\\\|File \"\"/usr\\\\|^\\\\s*$\")",
"Bash(python -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(f''''Signal Count: {len\\(d[\"\"signals\"\"]\\)}''''\\); [print\\(f'''' - {s[\"\"alert_name\"\"]} \\({s[\"\"signal_id\"\"]}\\)''''\\) for s in d[''''signals'''']]\")",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" http://localhost:3003/zh-TW)",
"Bash(curl -s -X GET \"http://localhost:8000/api/v1/incidents\" -H \"Origin: http://localhost:3003\" -H \"Access-Control-Request-Method: GET\" -v)",
"Bash(grep -r TELEGRAM /Users/ogt/awoooi/apps/api/.env*)",
"Bash(grep -r TELEGRAM_BOT_TOKEN /Users/ogt/awoooi --include=*.env* --include=*.yaml --include=*.yml)",
"Bash(curl -s -I -X OPTIONS \"http://localhost:8000/api/v1/incidents\" -H \"Origin: http://localhost:3000\" -H \"Access-Control-Request-Method: GET\")",
"Bash(curl -s \"http://localhost:8000/api/v1/incidents\" -H \"Origin: http://localhost:3000\")",
"Bash(python /tmp/e2e_drill.py)",
"Bash(python -c \"import sys,json; d=json.load\\(sys.stdin\\); i=[x for x in d[''''incidents''''] if x[''''incident_id'''']==''''INC-20260322-06085B''''][0]; print\\(f\"\"Incident: {i[''''incident_id'''']}\"\"\\); print\\(f\"\"Signals: {i[''''signal_count'''']}\"\"\\); print\\(f\"\"Updated: {i[''''updated_at'''']}\"\"\\)\")",
"Bash(curl -s -X POST \"http://localhost:8000/api/v1/telegram/test\")",
"Bash(curl -s -X POST \"http://localhost:8000/api/v1/telegram/test-push\" -H \"Content-Type: application/json\" -d '{\"\"\"\"approval_id\"\"\"\": \"\"\"\"15ab6844-ca4e-4a13-aead-dc71cd342445\"\"\"\", \"\"\"\"risk_level\"\"\"\": \"\"\"\"critical\"\"\"\", \"\"\"\"resource_name\"\"\"\": \"\"\"\"api-gateway\"\"\"\", \"\"\"\"root_cause\"\"\"\": \"\"\"\"E2E DRILL - PodCrashLoopBackOff\"\"\"\", \"\"\"\"suggested_action\"\"\"\": \"\"\"\"RESTART_DEPLOYMENT\"\"\"\", \"\"\"\"estimated_downtime\"\"\"\": \"\"\"\"5-15 min\"\"\"\"}')",
"Bash(curl -s -o /dev/null -w \"HTTP Status: %{http_code}\\\\n\" http://localhost:3000/zh-TW)",
"Bash(curl -s -I \"http://localhost:8000/api/v1/incidents\" -H \"Origin: http://localhost:3000\")",
"Bash(curl -s -X POST http://localhost:8000/api/v1/incidents/INC-20260322-19DF60/proposal)",
"Bash(curl -s -X POST \"http://localhost:8000/api/v1/telegram/test-push\" -H \"Content-Type: application/json\" -d '{\"\"\"\"approval_id\"\"\"\": \"\"\"\"942e762e-fb97-480f-b21a-d3be67fa70b1\"\"\"\", \"\"\"\"risk_level\"\"\"\": \"\"\"\"critical\"\"\"\", \"\"\"\"resource_name\"\"\"\": \"\"\"\"core-system\"\"\"\", \"\"\"\"root_cause\"\"\"\": \"\"\"\"E2E DRILL TAKE 2 - 二次實彈演習\"\"\"\", \"\"\"\"suggested_action\"\"\"\": \"\"\"\"INVESTIGATE_SERVICE\"\"\"\", \"\"\"\"estimated_downtime\"\"\"\": \"\"\"\"5-15 min\"\"\"\"}')",
"Bash(curl -s \"http://localhost:8000/api/v1/incidents\" -H \"Origin: http://localhost:3000\" -H \"Accept: application/json\")",
"Bash(python -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(f''''Incidents: {d[\"\"count\"\"]}''''\\); [print\\(f'''' - {i[\"\"incident_id\"\"]} | {i[\"\"severity\"\"]} | {i[\"\"signal_count\"\"]} signals | {i[\"\"affected_services\"\"]}''''\\) for i in d[''''incidents'''']]\")",
"Bash(curl -s \"http://localhost:8000/api/v1/approvals/pending\" -H \"Origin: http://localhost:3000\")",
"Bash(python -c \"import sys,json; d=json.load\\(sys.stdin\\); print\\(f''''Pending: {d[\"\"count\"\"]} approvals''''\\); [print\\(f'''' - {a[\"\"id\"\"][:8]}... | {a[\"\"risk_level\"\"]} | {a[\"\"action\"\"][:30]}...''''\\) for a in d[''''approvals''''][:3]]\")",
"Bash(mkdir -p /Users/ogt/awoooi/apps/web/public/fonts)",
"Bash(curl -sL -o DSEG7Classic-Bold.woff2 \"https://cdn.jsdelivr.net/npm/dseg@0.46.0/fonts/DSEG7-Classic/DSEG7Classic-Bold.woff2\")",
"Bash(curl -sL -o DSEG7Classic-Bold.woff \"https://cdn.jsdelivr.net/npm/dseg@0.46.0/fonts/DSEG7-Classic/DSEG7Classic-Bold.woff\")",
"Bash(curl -sL -o DSEG7Classic-Regular.woff2 \"https://cdn.jsdelivr.net/npm/dseg@0.46.0/fonts/DSEG7-Classic/DSEG7Classic-Regular.woff2\")",
"Bash(curl -sL -o DSEG7Classic-Regular.woff \"https://cdn.jsdelivr.net/npm/dseg@0.46.0/fonts/DSEG7-Classic/DSEG7Classic-Regular.woff\")",
"Bash(pnpm next:*)",
"Bash(chmod +x /Users/ogt/awoooi/scripts/bootstrap_prod.sh)",
"Bash(/Users/ogt/awoooi/.env:*)",
"Bash(grep -E \"^\\\\.env$|03-secrets\\\\.yaml\" .gitignore)",
"Bash(echo 'Adding to .gitignore...' if ! grep -q ^.env$ .gitignore)",
"Bash(then echo:*)",
"Bash(git add:*)",
"Bash(git commit:*)",
"Bash(git push:*)",
"Bash(git remote:*)",
"Bash(gh repo:*)",
"Bash(gh api:*)",
"Bash(gh run:*)",
"Bash(ls -la pnpm-*.yaml package.json turbo.json)",
"Bash(git status:*)",
"Bash(gh workflow:*)",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-77545758fc-xnncc -n awoooi-prod --tail=50\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-77545758fc-xnncc -n awoooi-prod 2>&1 | grep -i ''cors'' -A 5 -B 5\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-79948cbbbf-b8cgj -n awoooi-prod --tail=100\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod -l app=awoooi-api --sort-by=.metadata.creationTimestamp -o name | tail -1 | xargs kubectl logs -n awoooi-prod --tail=50\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath=''{.data.OPENCLAW_TG_USER_WHITELIST}'' | base64 -d\")",
"Bash(ssh wooo@192.168.0.120 'kubectl patch secret awoooi-secrets -n awoooi-prod --type='\"''\"'json'\"''\"' -p='\"''\"'[:*)",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout restart deployment/awoooi-api -n awoooi-prod && kubectl rollout status deployment/awoooi-api -n awoooi-prod --timeout=120s\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout restart deployment/awoooi-worker -n awoooi-prod && kubectl rollout status deployment/awoooi-worker -n awoooi-prod --timeout=120s\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-747967b787-fcx2r -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.110 \"ps aux | grep -E ''actions-runner|Runner'' | grep -v grep\")",
"Bash(curl -sf http://192.168.0.120:32334/api/v1/health)",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-fd795cd87-rdpgn -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.110 \"curl -sf http://192.168.0.120:32334/api/v1/health | jq .status\")",
"Bash(ssh wooo@192.168.0.110 \"curl -sf http://192.168.0.120:32334/api/v1/health\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sf http://localhost:32334/api/v1/health\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get svc -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sf http://10.43.125.201:8000/api/v1/health\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sf http://10.43.105.105:3000/ -o /dev/null && echo ''Web OK''\")",
"Bash(ssh ogt@192.168.0.188 \"ls -la /etc/nginx/sites-available/\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=50\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-795c95ff76-wch2p -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod && ss -tlnp | grep 32334\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sf http://127.0.0.1:32334/api/v1/health | head -c 200\")",
"Bash(ssh wooo@192.168.0.120 \"sudo ufw status 2>/dev/null || sudo iptables -L INPUT -n | head -20\")",
"Bash(ssh wooo@192.168.0.110 \"curl -sf --connect-timeout 5 http://192.168.0.120:32334/api/v1/health | head -c 100\")",
"Bash(ssh wooo@192.168.0.110 \"curl -v --connect-timeout 5 http://192.168.0.120:32334/api/v1/health 2>&1 | head -30\")",
"Bash(ssh wooo@192.168.0.120 \"cat /etc/systemd/system/k3s.service 2>/dev/null | grep -i exec || ps aux | grep k3s | head -3\")",
"Bash(ssh wooo@192.168.0.120 \"cat /etc/systemd/system/k3s.service\")",
"Bash(ssh wooo@192.168.0.120 \"netstat -tlnp 2>/dev/null | grep 32334 || ss -tlnp | grep 32334\")",
"Bash(ssh wooo@192.168.0.110 \"curl -sf --connect-timeout 5 http://192.168.0.120:31234/health 2>&1 | head -c 100\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get networkpolicy -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get networkpolicy allow-nginx-ingress -n awoooi-prod -o yaml\")",
"Bash(curl -sk https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -sk -I -X OPTIONS https://awoooi.wooo.work/api/v1/health -H \"Origin: https://awoooi.wooo.work\" -H \"Access-Control-Request-Method: GET\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sI --connect-timeout 3 http://127.0.0.1:32334/api/v1/health 2>&1 | head -5\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sI --connect-timeout 3 http://127.0.0.1:32335/ 2>&1 | head -5\")",
"Bash(ssh wooo@192.168.0.121 \"curl -sI --connect-timeout 3 http://127.0.0.1:32334/api/v1/health 2>&1 | head -5\")",
"Bash(ssh wooo@192.168.0.121 \"curl -sI --connect-timeout 3 http://127.0.0.1:32335/ 2>&1 | head -5\")",
"Bash(ssh wooo@192.168.0.120 \"sudo iptables -t nat -L KUBE-NODEPORTS -n 2>/dev/null | head -20\")",
"Bash(ssh wooo@192.168.0.120 \"sudo netstat -tlnp | grep -E ''32334|32335''\")",
"Bash(ssh wooo@192.168.0.120 \"ss -tlnp 2>/dev/null | grep -E ''32334|32335'' || netstat -tln | grep -E ''32334|32335''\")",
"Bash(ssh wooo@192.168.0.120 \"ss -tln | grep -E ''32334|32335|:323''\")",
"Bash(ssh wooo@192.168.0.120 \"ss -tln\")",
"Bash(ssh wooo@192.168.0.120 \"export KUBECONFIG=/home/wooo/.kube/config-120; /home/wooo/bin/kubectl get svc -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.120 \"which kubectl || find /usr -name kubectl 2>/dev/null | head -1\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get svc -n awoooi-prod && kubectl get pods -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.120 \"export KUBECONFIG=/home/wooo/.kube/config-120 && kubectl logs awoooi-api-546b88465d-lb8zm -n awoooi-prod --tail 80\")",
"Bash(ssh wooo@192.168.0.120 \"KUBECONFIG=/home/wooo/.kube/config-120 kubectl logs awoooi-api-546b88465d-lb8zm -n awoooi-prod --tail 80 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"ls -la /home/wooo/.kube/ && cat /home/wooo/.kube/config-120 2>/dev/null | head -20 || cat /etc/rancher/k3s/k3s.yaml 2>/dev/null | head -20\")",
"Bash(ssh wooo@192.168.0.120 \"sudo cat /etc/rancher/k3s/k3s.yaml | head -20\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && kubectl logs awoooi-api-546b88465d-lb8zm -n awoooi-prod --tail 100 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"which kubectl 2>/dev/null || find /home/wooo -name kubectl 2>/dev/null | head -1 || ls -la /home/wooo/bin/\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl logs awoooi-api-546b88465d-lb8zm -n awoooi-prod --tail 100 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl describe pod awoooi-api-546b88465d-lb8zm -n awoooi-prod | tail -40\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get svc -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl exec -n awoooi-prod deploy/awoooi-api -- curl -sf http://localhost:8000/api/v1/health 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl exec -n awoooi-prod deploy/awoooi-api -- wget -qO- http://localhost:8000/api/v1/health 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl logs deployment/awoooi-api -n awoooi-prod --tail 20 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"curl -sf http://192.168.0.120:32334/api/v1/health 2>&1 || echo ''FAILED to connect to 120:32334''\")",
"Bash(ssh wooo@192.168.0.110 \"curl -sf http://192.168.0.121:32334/api/v1/health 2>&1 || echo ''FAILED to connect to 121:32334''\")",
"Bash(ssh wooo@192.168.0.110 \"ssh wooo@192.168.0.120 ''cat /etc/rancher/k3s/k3s.yaml 2>/dev/null || echo No k3s.yaml''\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get pods -n awoooi-prod -o wide | grep Running\")",
"Bash(ssh -o ConnectTimeout=5 wooo@192.168.0.120 \"ufw status 2>/dev/null || firewall-cmd --state 2>/dev/null || echo ''No firewall command found''\")",
"Bash(ssh -o ConnectTimeout=5 wooo@192.168.0.121 \"ufw status 2>/dev/null || firewall-cmd --state 2>/dev/null || echo ''No firewall command found''\")",
"Bash(pip3 show:*)",
"Bash(docker build:*)",
"Bash(docker version:*)",
"Bash(docker run:*)",
"Bash(curl -vI -H \"Origin: https://awoooi.wooo.work\" http://localhost:8889/api/v1/health)",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get endpoints awoooi-api-svc -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get pods -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.120 \"sudo -n ufw status 2>/dev/null || sudo -n iptables -L INPUT -n 2>/dev/null | head -20 || echo ''Need sudo for firewall check''\")",
"Bash(ssh wooo@192.168.0.120 \"ss -tln | grep -E ''32334|32335|:323'' || echo ''No NodePort listeners found''\")",
"Bash(ssh wooo@192.168.0.121 \"ss -tln | grep -E ''32334|32335|:323'' || echo ''No NodePort listeners found''\")",
"Bash(ssh wooo@192.168.0.120 \"ps aux | grep -E ''kube-proxy|k3s'' | grep -v grep | head -5\")",
"Bash(ssh wooo@192.168.0.120 \"cat /proc/sys/net/ipv4/ip_forward\")",
"Bash(ssh wooo@192.168.0.120 \"systemctl status k3s 2>/dev/null | head -15 || ps aux | grep ''k3s server'' | grep -v grep\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sf --connect-timeout 5 http://127.0.0.1:32334/api/v1/health 2>&1 || echo ''LOCALHOST NodePort FAILED''\")",
"Bash(ssh wooo@192.168.0.120 \"curl -sf --connect-timeout 5 http://192.168.0.120:32334/api/v1/health 2>&1 || echo ''EXTERNAL IP NodePort FAILED''\")",
"Bash(ssh wooo@192.168.0.120 \"cat /etc/iptables/rules.v4 2>/dev/null || iptables-save 2>/dev/null | grep -E ''DROP|REJECT|32334|32335'' | head -10 || echo ''Cannot read iptables without sudo''\")",
"Bash(ssh wooo@192.168.0.121 \"curl -sf --connect-timeout 5 http://192.168.0.120:32334/api/v1/health 2>&1 || echo ''Worker->Master NodePort FAILED''\")",
"Bash(ssh wooo@192.168.0.120 \"cat /etc/rancher/k3s/config.yaml 2>/dev/null || ls -la /etc/rancher/k3s/ 2>/dev/null || echo ''No K3s config found''\")",
"Bash(ssh wooo@192.168.0.120 \"netstat -an 2>/dev/null | grep 32334 || ss -an | grep 32334 || echo ''No socket found for 32334''\")",
"Bash(ssh wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S iptables -L INPUT -n 2>&1 | head -20\")",
"Bash(ssh wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S iptables -t nat -L KUBE-NODEPORTS -n 2>&1 | head -20\")",
"Bash(ssh wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S iptables -L KUBE-ROUTER-INPUT -n 2>&1 | head -30\")",
"Bash(ssh wooo@192.168.0.120 \"echo ''0936223270'' | sudo -S iptables -t nat -L KUBE-NODEPORTS -n 2>&1 | grep -i awoooi || echo ''NO AWOOOI RULES FOUND''\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get svc awoooi-api-svc -n awoooi-prod -o yaml | grep -A5 ''spec:''\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get networkpolicy -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl apply -f - 2>&1\")",
"Bash(curl -sf --connect-timeout 10 https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -skf --connect-timeout 10 https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -sI https://awoooi.wooo.work/)",
"Bash(curl -skI https://awoooi.wooo.work/)",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl logs deployment/awoooi-api -n awoooi-prod --tail 50 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl rollout restart deployment/awoooi-api -n awoooi-prod && /home/wooo/kubectl rollout status deployment/awoooi-api -n awoooi-prod --timeout=120s\")",
"Bash(curl -sf https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -skf https://awoooi.wooo.work/api/v1/health)",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl logs deployment/awoooi-api -n awoooi-prod --tail 40 2>&1\")",
"Bash(for i:*)",
"Bash(do curl:*)",
"Bash(echo \"Request $i sent\")",
"Bash(done)",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl logs deployment/awoooi-api -n awoooi-prod --tail 100 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl logs deployment/awoooi-api -n awoooi-prod --tail 30 2>&1\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get configmap awoooi-config -n awoooi-prod -o yaml | grep OTEL\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl exec deployment/awoooi-api -n awoooi-prod -- env | grep OTEL\")",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl exec deployment/awoooi-api -n awoooi-prod -- python -c \"\"import socket; s=socket.socket\\(\\); s.settimeout\\(5\\); s.connect\\(\\(''192.168.0.188'', 24317\\)\\); print\\(''✅ Connection to 24317 OK''\\); s.close\\(\\)\"\" 2>&1\")",
"Bash(curl -vI https://awoooi.wooo.work)",
"Bash(curl -vI https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -sf -X POST https://awoooi.wooo.work/api/v1/webhooks/signals -H \"Content-Type: application/json\" -d '{:*)",
"Bash(curl -s -X POST https://awoooi.wooo.work/api/v1/webhooks/signals -H \"Content-Type: application/json\" -d '{\"\"source\"\": \"\"prometheus\"\", \"\"severity\"\": \"\"P1\"\", \"\"message\"\": \"\"Test alert from CLI\"\"}')",
"Bash(curl -s -X POST https://awoooi.wooo.work/api/v1/webhooks/signals -H \"Content-Type: application/json\" -d '{:*)",
"Bash(ssh wooo@192.168.0.110 \"export KUBECONFIG=/home/wooo/.kube/config-120 && /home/wooo/kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath=''''{.data.WEBHOOK_HMAC_SECRET}'''' 2>/dev/null\")",
"Bash(timeout 15 curl -N -s https://awoooi.wooo.work/api/v1/dashboard/stream)",
"Bash(bash:*)",
"Bash(curl -s https://awoooi.wooo.work/api/v1/metrics/gold)",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT DISTINCT metric_name FROM signoz_metrics.distributed_samples_v4 WHERE unix_milli > \\(toUnixTimestamp\\(now\\(\\)\\) - 1800\\) * 1000 LIMIT 20 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT count\\(\\) as trace_count FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp > now\\(\\) - INTERVAL 30 MINUTE FORMAT TabSeparated\")",
"Bash(ssh wooo@192.168.0.120 \"KUBECONFIG=/home/wooo/.kube/config-120 /home/wooo/bin/kubectl get configmap awoooi-config -n awoooi-prod -o jsonpath=''{.data}'' | python3 -m json.tool 2>/dev/null | head -30\")",
"Bash(ssh wooo@192.168.0.120 \"KUBECONFIG=/home/wooo/.kube/config-120 /home/wooo/bin/kubectl logs deployment/awoooi-api -n awoooi-prod --tail 50 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"which kubectl || ls -la ~/bin/kubectl 2>/dev/null || ls -la /usr/local/bin/kubectl 2>/dev/null || echo ''kubectl not found''\")",
"Bash(ssh wooo@192.168.0.120 \"export KUBECONFIG=/home/wooo/.kube/config-120 && kubectl get configmap awoooi-config -n awoooi-prod -o jsonpath=''{.data}'' 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"ls -la ~/.kube/ 2>/dev/null; cat ~/.kube/config 2>/dev/null | head -20 || echo ''checking k3s default...''; sudo cat /etc/rancher/k3s/k3s.yaml 2>/dev/null | head -5 || echo ''no k3s config''\")",
"Bash(ssh wooo@192.168.0.120 \"sudo k3s kubectl get configmap awoooi-config -n awoooi-prod -o yaml 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"sudo k3s kubectl logs deployment/awoooi-api -n awoooi-prod --tail 100 2>&1\")",
"Bash(nc -zv 192.168.0.188 24317)",
"Bash(curl -s http://192.168.0.188:24318/v1/traces -X POST -H \"Content-Type: application/json\" -d '{}')",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT DISTINCT serviceName, count\\(\\) as cnt FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp > now\\(\\) - INTERVAL 24 HOUR GROUP BY serviceName ORDER BY cnt DESC LIMIT 20 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"DESCRIBE TABLE signoz_traces.distributed_signoz_index_v2 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT serviceName, count\\(\\) as cnt FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp > now\\(\\) - INTERVAL 5 MINUTE GROUP BY serviceName ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT serviceName, count\\(\\) as cnt FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp > now\\(\\) - INTERVAL 10 MINUTE GROUP BY serviceName ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT service_name, count\\(\\) as cnt FROM signoz_logs.distributed_logs WHERE timestamp > now\\(\\) - INTERVAL 30 MINUTE GROUP BY service_name ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SHOW TABLES FROM signoz_logs FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT count\\(\\) as total FROM signoz_logs.distributed_logs_v2 WHERE timestamp > now\\(\\) - INTERVAL 30 MINUTE FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT JSONExtractString\\(resources_string, ''service.name''\\) as svc, count\\(\\) as cnt FROM signoz_logs.distributed_logs_v2 WHERE timestamp > now\\(\\) - INTERVAL 5 MINUTE GROUP BY svc ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"DESCRIBE TABLE signoz_logs.distributed_logs_v2 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT resources_string[''service.name''] as svc, count\\(\\) as cnt FROM signoz_logs.distributed_logs_v2 WHERE timestamp > \\(toUnixTimestamp64Nano\\(now64\\(\\)\\) - 300000000000\\) GROUP BY svc ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT body, resources_string FROM signoz_logs.distributed_logs_v2 WHERE timestamp > \\(toUnixTimestamp64Nano\\(now64\\(\\)\\) - 60000000000\\) LIMIT 1 FORMAT JSONEachRow\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT serviceName, count\\(\\) as cnt FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp > now\\(\\) - INTERVAL 2 MINUTE GROUP BY serviceName ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT serviceName, name, timestamp FROM signoz_traces.distributed_signoz_index_v2 WHERE timestamp > now\\(\\) - INTERVAL 5 MINUTE ORDER BY timestamp DESC LIMIT 5 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT serviceName, name, formatDateTime\\(timestamp, ''%Y-%m-%d %H:%M:%S''\\) as ts FROM signoz_traces.distributed_signoz_index_v2 ORDER BY timestamp DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT count\\(\\) FROM signoz_traces.distributed_signoz_index_v2 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT count\\(\\) FROM signoz_traces.distributed_signoz_spans FORMAT TabSeparated\")",
"Bash(ssh wooo@192.168.0.188 \"docker ps | grep -E ''otel|signoz''\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT metric_name, sum\\(value\\) as total FROM signoz_metrics.distributed_samples_v4 WHERE metric_name LIKE ''otelcol%span%'' AND unix_milli > \\(toUnixTimestamp\\(now\\(\\)\\) - 300\\) * 1000 GROUP BY metric_name FORMAT TabSeparated\")",
"Bash(for t:*)",
"Bash(do)",
"Bash(echo -n \"$t: \")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT count\\(\\) FROM signoz_traces.$t FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"SELECT serviceName, count\\(\\) as cnt FROM signoz_traces.distributed_signoz_index_v3 WHERE timestamp > now\\(\\) - INTERVAL 10 MINUTE GROUP BY serviceName ORDER BY cnt DESC LIMIT 10 FORMAT TabSeparated\")",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \":*)",
"Bash(curl -s 'http://192.168.0.188:8123/' --data \"DESCRIBE TABLE signoz_traces.distributed_signoz_index_v3 FORMAT TabSeparated\")",
"Bash(AWOOOI_API_URL=https://awoooi.wooo.work WEBHOOK_HMAC_SECRET=\"CHANGE_ME_TO_RANDOM_64_CHARS\" python scripts/fire_live_alert.py oomkilled)",
"Bash(timeout 10 curl -sN https://awoooi.wooo.work/api/v1/dashboard/stream)",
"Bash(curl -s https://awoooi.wooo.work/api/v1/dashboard)",
"Bash(npm list:*)",
"Bash(node scripts/verify-frontend.js)",
"Bash(node /Users/ogt/awoooi/scripts/verify-frontend.js)",
"Bash(python -c \"from src.services.proposal_service import ProposalService; print\\(''''✅ ProposalService OK''''\\)\")",
"Bash(python -c \"from src.services.openclaw import OpenClawService; print\\(''''✅ OpenClawService OK''''\\)\")",
"Bash(curl -s http://192.168.0.120:32334/api/v1/incidents)",
"Bash(jq -r \".incidents[:2] | .[] | \"\"\\\\\\(.incident_id\\) - \\\\\\(.status\\) - \\\\\\(.severity\\)\"\"\")",
"Bash(curl -s -X POST \"http://192.168.0.120:32334/api/v1/incidents/INC-20260322-4B3152/propose\" -H \"Content-Type: application/json\")",
"Bash(kubectl logs:*)",
"Bash(ssh ogt@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail 30\")",
"Bash(curl -sv -X POST \"http://192.168.0.120:32334/api/v1/incidents/INC-20260322-4B3152/propose\" -H \"Content-Type: application/json\")",
"Bash(curl -s http://192.168.0.120:32334/api/v1/health)",
"Bash(curl -s \"http://192.168.0.120:32334/api/v1/incidents/INC-20260322-4B3152\")",
"Bash(curl -sv \"http://192.168.0.120:32334/api/v1/incidents\")",
"Bash(curl -s --retry 3 --retry-delay 2 \"http://192.168.0.120:32334/api/v1/health\")",
"Bash(curl -s --retry 3 --retry-delay 2 http://192.168.0.120:32334/api/v1/health)",
"Bash(do echo:*)",
"Bash(curl -s -X POST \"https://awoooi.wooo.work/api/v1/incidents/INC-20260322-4B3152/propose\" -H \"Content-Type: application/json\")",
"Bash(curl -s -X POST \"https://awoooi.wooo.work/api/v1/incidents/INC-20260322-4B3152/proposal\" -H \"Content-Type: application/json\")",
"Bash(curl -s -X POST \"https://awoooi.wooo.work/api/v1/incidents/INC-20260322-D6C6A0/proposal\" -H \"Content-Type: application/json\")",
"Bash(curl -s http://192.168.0.120:32334/api/v1/approvals/pending)",
"Bash(kubectl get:*)",
"Bash(curl -s -w \"\\\\nHTTP_CODE: %{http_code}\\\\n\" http://192.168.0.120:32334/api/v1/health)",
"Bash(curl -s http://awoooi.wooo.work/api/v1/health)",
"Bash(curl -s http://awoooi.wooo.work/api/v1/approvals/pending)",
"Bash(curl -sL https://awoooi.wooo.work/api/v1/approvals/pending -k)",
"Bash(ssh root@192.168.0.120 \"kubectl get pods -n awoooi-prod -o wide\")",
"Bash(ssh root@192.168.0.120 \"kubectl logs -n awoooi-prod -l app=awoooi-api --tail=30\")",
"Bash(curl -sL https://awoooi.wooo.work/api/v1/timeline -k)",
"Bash(curl -sL https://awoooi.wooo.work/api/v1/incidents -k)",
"Bash(curl -sL \"https://awoooi.wooo.work/api/v1/approvals?include_history=true\" -k)",
"Bash(curl -sL \"https://awoooi.wooo.work/api/v1/incidents/INC-20260322-4B3152\" -k)",
"Bash(curl -sL \"https://awoooi.wooo.work/api/v1/audit-logs?limit=10\" -k)",
"Bash(curl -sL https://awoooi.wooo.work/api/v1/audit-logs?limit=10 -k)",
"Bash(ssh ogt@192.168.0.120 \"kubectl logs -n awoooi-prod -l app=awoooi-api --tail=100\")",
"Bash(ssh ogt@192.168.0.120 \"kubectl logs -n awoooi-prod -l app=awoooi-web --tail=50\")",
"Bash(ssh ogt@192.168.0.188 \"kubectl --kubeconfig=/etc/rancher/k3s/k3s.yaml logs -n awoooi-prod -l app=awoooi-api --tail=100 2>/dev/null || docker logs awoooi-api --tail=100 2>/dev/null\")",
"Bash(curl -sL \"https://awoooi.wooo.work/api/v1/approvals/pending\" -k -w \"\\\\n\\\\nHTTP: %{http_code}\\\\nTime: %{time_total}s\\\\n\")",
"Bash(curl -sL -X POST https://awoooi.wooo.work/api/v1/approvals/182e07c1-118a-49d7-b71c-7d33c5484d9b/sign -H 'Content-Type: application/json' -d '{\"\"\"\"signer_id\"\"\"\": \"\"\"\"test-debug\"\"\"\", \"\"\"\"signer_name\"\"\"\": \"\"\"\"Debug Test\"\"\"\", \"\"\"\"comment\"\"\"\": \"\"\"\"Testing\"\"\"\"}' -k)",
"Bash(curl -s https://wwooo.aiops.tw/api/v1/health)",
"Bash(curl -s https://wwooo.aiops.tw/api/v1/incidents?limit=5)",
"Bash(curl -s https://wwooo.aiops.tw/api/v1/approvals/pending)",
"Bash(curl -v -s \"https://wwooo.aiops.tw/api/v1/health\")",
"Bash(curl -s \"https://wwooo.aiops.tw/\")",
"Bash(curl -s --connect-timeout 5 \"http://192.168.0.120:32334/api/v1/health\")",
"Bash(curl -s --connect-timeout 5 \"http://192.168.0.120:32334/api/v1/incidents?limit=5\")",
"Bash(ssh -o ConnectTimeout=5 wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-867f67f55d-kvdl2 -n awoooi-prod --tail=50\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod | grep -E ''NAME|worker''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod | grep worker\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-5bdc5699bb-kcv9q -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get networkpolicy -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod --show-labels | grep worker\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get networkpolicy allow-required-egress -n awoooi-prod -o yaml\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl patch networkpolicy allow-required-egress -n awoooi-prod --type=''json'' -p=''[{\"\"op\"\": \"\"replace\"\", \"\"path\"\": \"\"/spec/podSelector/matchLabels\"\", \"\"value\"\": {\"\"system\"\": \"\"awoooi\"\"}}]''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout restart deployment/awoooi-worker -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-5bdc5699bb-kcv9q -n awoooi-prod --tail=15\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=40\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod 2>&1 | grep -E ''signal_worker|redis_pool|INFO'' | tail -10\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s http://localhost:32334/api/v1/health\")",
"Bash(ssh wooo@192.168.0.120 'curl -s -X POST \"\"http://localhost:32334/api/v1/webhooks/signals\"\" -H \"\"Content-Type: application/json\"\" -d \"\"{:*)",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod | grep -E ''NAME|worker|api''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod && echo ''==='' && kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s http://localhost:32334/api/v1/incidents?limit=5\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s http://localhost:32334/api/v1/approvals/pending\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod 2>&1 | head -50\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s http://localhost:32334/api/v1/health | jq ''.components''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get secret -n awoooi-prod -o name\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath=''{.data.WEBHOOK_HMAC_SECRET}'' | base64 -d\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=20 2>&1 | grep -E ''signal|incident|telegram|INFO''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s ''http://localhost:32334/api/v1/incidents?limit=5''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod 2>&1 | grep -iE ''telegram|notification|send'' | tail -10\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s ''http://localhost:32334/api/v1/approvals/pending''\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s ''http://localhost:32334/api/v1/incidents?limit=2'' && echo ''---'' && curl -s ''http://localhost:32334/api/v1/approvals/pending''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod | grep worker && echo ''---'' && kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-6b8cc94d9c-xjdwr -n awoooi-prod --tail=40\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get networkpolicy allow-required-egress -n awoooi-prod -o jsonpath=''{.spec.podSelector}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl patch networkpolicy allow-required-egress -n awoooi-prod --type=''json'' -p=''[{\"\"op\"\": \"\"replace\"\", \"\"path\"\": \"\"/spec/podSelector\"\", \"\"value\"\": {\"\"matchLabels\"\": {\"\"system\"\": \"\"awoooi\"\"}}}]''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl delete pod awoooi-worker-6b8cc94d9c-xjdwr -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-6b8cc94d9c-pmzj7 -n awoooi-prod --tail=30\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-6b8cc94d9c-pmzj7 -n awoooi-prod --tail=20\")",
"Bash(ls -la /Users/ogt/awoooi/apps/api/scripts/fire*.py)",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=50\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s ''http://localhost:32334/api/v1/incidents?limit=3''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod 2>&1 | grep -iE ''proposal|approval|llm|ai|ollama|generate'' | tail -20\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get deployment awoooi-worker -n awoooi-prod -o jsonpath=''{.spec.template.spec.containers[0].envFrom}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get deployment awoooi-api -n awoooi-prod -o jsonpath=''{.spec.template.spec.containers[0].envFrom}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get configmap awoooi-config -n awoooi-prod -o jsonpath=''''{.data}''''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath=''{.data}'' | tr '','' ''\\\\n''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl exec deployment/awoooi-api -n awoooi-prod -- python -c ''import os; print\\(os.getenv\\(\"\"DATABASE_URL\"\", \"\"NOT SET\"\"\\)[:50]\\)''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-75ffbfb88b-2htfh -n awoooi-prod --tail=50\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl exec awoooi-api-6687db5564-rv755 -n awoooi-prod -- env | grep DATABASE\")",
"Bash(ssh wooo@192.168.0.120 \"PGPASSWORD=''CHANGE_ME'' psql -h 192.168.0.188 -U awoooi -d awoooi_prod -c ''SELECT 1'' 2>&1 || echo ''Connection failed''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod\")",
"Bash(curl -sv http://192.168.0.120:32334/api/v1/health)",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-75ffbfb88b-2htfh -n awoooi-prod --tail=20 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-7fb7d5b55f-n48gk -n awoooi-prod --tail=20 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get rs -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl scale rs awoooi-api-75ffbfb88b -n awoooi-prod --replicas=0\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl scale rs awoooi-worker-7fb7d5b55f -n awoooi-prod --replicas=0\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=10\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get deploy -n awoooi-prod -o wide\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get deploy awoooi-api -n awoooi-prod -o jsonpath=''{.spec.replicas}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get deploy awoooi-worker -n awoooi-prod -o jsonpath=''{.spec.replicas}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout status deployment/awoooi-api -n awoooi-prod --timeout=5s\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout history deployment/awoooi-api -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout undo deployment/awoooi-api -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout undo deployment/awoooi-worker -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout status deployment/awoooi-api -n awoooi-prod --timeout=30s\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get rs awoooi-api-6687db5564 -n awoooi-prod -o jsonpath=''{.metadata.annotations.deployment\\\\.kubernetes\\\\.io/revision}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl delete pod awoooi-api-7f487f7cbb-5f88g -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout undo deployment/awoooi-api -n awoooi-prod --to-revision=46\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --tail=15\")",
"Bash(curl -s http://192.168.0.120:32334/api/v1/incidents?limit=3)",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --since=2m\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --since=2m | grep -i webhook\")",
"Bash(curl -sv -X POST http://192.168.0.120:32334/api/v1/webhooks/alertmanager -H \"Content-Type: application/json\" -d '{:*)",
"Bash(ssh wooo@192.168.0.120 \"kubectl get endpoints -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"curl -s http://localhost:32334/api/v1/health | jq ''{status}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-worker -n awoooi-prod --since=30s\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-fc4744758-7wfv5 -n awoooi-prod --tail=30 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-worker-6fc548887b-b9mtf -n awoooi-prod --tail=30 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get configmap awoooi-config -n awoooi-prod -o yaml\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath=''''{.data}''''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pod awoooi-worker-6fc548887b-b9mtf -n awoooi-prod -o jsonpath=''{.metadata.labels}''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get networkpolicy -n awoooi-prod -o yaml\")",
"Bash(ssh wooo@192.168.0.120 'kubectl patch networkpolicy allow-required-egress -n awoooi-prod --type=json -p=\"\"[{\\\\\"\"op\\\\\"\": \\\\\"\"replace\\\\\"\", \\\\\"\"path\\\\\"\": \\\\\"\"/spec/podSelector/matchLabels\\\\\"\", \\\\\"\"value\\\\\"\": {\\\\\"\"system\\\\\"\": \\\\\"\"awoooi\\\\\"\"}}]\"\"')",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout restart deployment/awoooi-api deployment/awoooi-worker -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs awoooi-api-6c69b77894-d6jqq -n awoooi-prod --tail=20\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl run nc-test --rm -it --restart=Never --image=busybox -- nc -zv 192.168.0.188 5432\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get pods -n awoooi-prod -o=custom-columns=''NAME:.metadata.name,IMAGE:.spec.containers[0].image''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl exec awoooi-api-6687db5564-rv755 -n awoooi-prod -- ls -la *.db 2>/dev/null || echo ''No SQLite files''\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl exec awoooi-api-6687db5564-rv755 -n awoooi-prod -- env | grep -E ''MOCK|DATABASE|SQLITE''\")",
"Bash(curl -s \"http://192.168.0.120:32334/api/v1/approvals\")",
"Bash(python -m py_compile src/lewooogo_brain/engines/incident_engine.py src/lewooogo_brain/engines/proposal_engine.py src/lewooogo_brain/skills/loader.py)",
"Bash(python packages/lewooogo-brain/tests/test_skill_loader.py)",
"Bash(python packages/lewooogo-brain/tests/test_incident_engine.py)",
"Bash(python packages/lewooogo-brain/tests/test_guardrails.py)",
"Bash(python -m py_compile src/lewooogo_brain/engines/proposal_engine.py src/lewooogo_brain/engines/incident_engine.py src/lewooogo_brain/skills/loader.py)",
"Bash(PYTHONPATH=/Users/ogt/awoooi/packages/lewooogo-brain/src python -c \":*)",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8000/api/v1/health)",
"Bash(curl -s \"https://awoooi.wooo.work/api/v1/approvals/pending\")",
"Bash(curl -s \"https://awoooi.wooo.work/api/v1/approvals?status=pending\")",
"Bash(curl -s \"https://awoooi.wooo.work/api/v1/incidents\")",
"Bash(uv sync:*)",
"Bash(python -c \"from src.routers.proposals import router; print\\(''✅ Router 語法驗證通過''\\)\")",
"Bash(curl -s -X GET \"https://awoooi.wooo.work/api/v1/health\" --connect-timeout 10)",
"Bash(curl -s -X GET \"https://awoooi.wooo.work/api/v1/incidents\" --connect-timeout 10)",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" \"https://awoooi.wooo.work\" --connect-timeout 10)",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" -L \"https://awoooi.wooo.work\" --connect-timeout 10)",
"Bash(curl -s -X POST \"https://awoooi.wooo.work/api/v1/incidents/test-123/propose\" -H \"Content-Type: application/json\" -d '{\"\"require_dry_run\"\": true}' --connect-timeout 10)",
"Bash(ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ollama@192.168.0.120 \"kubectl get pods -n awoooi-prod -o wide\")",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get pods -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs awoooi-api-64c8659cff-grslz -n awoooi-prod --tail=50)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath='{.data.DATABASE_URL}')",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl rollout restart deployment/awoooi-api -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get pods -n awoooi-prod -l app=awoooi-api)",
"Bash(curl -s \"https://awoooi.wooo.work/api/v1/health\" --connect-timeout 10)",
"Bash(curl -s -o /dev/null -w \"%{http_code}\" -L \"https://awoooi.wooo.work/zh-TW\" --connect-timeout 10)",
"Bash(python -c \"from src.routers.proposals import router; print\\(''✅ Router import successful''\\)\")",
"Bash(PGPASSWORD=postgres psql -h 192.168.0.188 -U awoooi -d awoooi_dev -c \"SELECT incident_id, status, severity FROM incidents LIMIT 5;\")",
"Bash(PGPASSWORD=AwoooiProd2026 psql -h 192.168.0.188 -U awoooi -d awoooi_prod -c \"SELECT incident_id, status, severity FROM incidents LIMIT 5;\")",
"Bash(curl -sf http://192.168.0.120:32334/api/v1/incidents)",
"Bash(curl -v \"http://192.168.0.120:32334/api/v1/incidents\")",
"Bash(export KUBECONFIG=/Users/ogt/.kube/config-120)",
"Bash(curl -sI \"http://awoooi.wooo.work/\")",
"Bash(openssl s_client -servername awoooi.wooo.work -connect awoooi.wooo.work:443)",
"Bash(openssl x509:*)",
"Bash(curl -s -X POST \"http://192.168.0.120:32334/api/v1/incidents/INC-20260323-7DE10B/propose\" -H \"Content-Type: application/json\" -d '{\"\"\"\"require_dry_run\"\"\"\": true}')",
"Bash(python -c \"from src.services.executor import execute_approved_proposal, get_executor, ActionExecutor; print\\(''✅ Import successful''\\)\")",
"Bash(curl -s https://awoooi.woooo.cc/api/v1/incidents)",
"Bash(curl -s https://awoooi.woooo.cc/api/v1/health)",
"Bash(curl -s --connect-timeout 10 https://awoooi.woooo.cc/api/v1/health)",
"Bash(ssh ogt@192.168.70.202 \"sudo kubectl get pods -n awoooi 2>/dev/null\")",
"Bash(curl -s --connect-timeout 5 http://192.168.70.200:8000/api/v1/health)",
"Bash(ssh ogt@192.168.70.202 \"sudo kubectl get pods -n awoooi-prod\")",
"Bash(ssh -o StrictHostKeyChecking=no ogt@192.168.70.202 \"sudo kubectl get pods -n awoooi-prod\")",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get pods -A)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod awoooi-worker-7479556d76-jbbps --tail 30)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod -l app=awoooi-api --tail 20)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl exec -n awoooi-prod deployment/awoooi-api -- curl -s http://localhost:8000/api/v1/incidents)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl exec -n awoooi-prod deployment/awoooi-api -- python -c \"import httpx; r = httpx.get\\(''http://localhost:8000/api/v1/incidents''\\); print\\(r.text\\)\")",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get ingress -n awoooi-prod -o wide)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get svc -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get deployment awoooi-worker -n awoooi-prod -o jsonpath='{.spec.template.spec.containers[0].env}')",
"Bash(curl -s --connect-timeout 5 http://192.168.70.202:32334/api/v1/health)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl describe deployment awoooi-worker -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get configmap -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl describe deployment awoooi-api -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get configmap awoooi-config -n awoooi-prod -o yaml)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get secrets -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath='{.data}')",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get secret awoooi-secrets -n awoooi-prod -o jsonpath='{.data.REDIS_URL}')",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl rollout restart deployment/awoooi-worker -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get pods -n awoooi-prod -l app=awoooi-worker)",
"Bash(curl -s --connect-timeout 5 https://awoooi.wooo.work/api/v1/health)",
"Bash(curl -s https://awoooi.wooo.work/api/v1/incidents)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod -l app=awoooi-worker --tail 10)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get svc -n wooo-aiops-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get svc -A)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod awoooi-worker-76bdf9786d-rvtmz --tail 15)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl exec -n awoooi-prod deployment/awoooi-api -- python -c \"import os; print\\(os.getenv\\(''REDIS_URL'', ''NOT_SET''\\)\\)\")",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get deployment awoooi-api -n awoooi-prod -o yaml)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl rollout restart deployment/awoooi-api deployment/awoooi-worker -n awoooi-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod awoooi-api-865cdc97db-6mpzz --tail 20)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get pods -n wooo-aiops-prod -l app=redis)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get pods -n wooo-aiops-prod)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl exec -n wooo-aiops-prod redis-6c6fcd64b8-8wznx -- redis-cli ping)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl exec -n awoooi-prod awoooi-api-6445c76797-mrl7p -- python -c \"import redis; r=redis.Redis\\(host=''10.43.239.47'', port=6379, db=10\\); print\\(r.ping\\(\\)\\)\")",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get networkpolicy -A)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get networkpolicy allow-required-egress -n awoooi-prod -o yaml)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl patch networkpolicy allow-required-egress -n awoooi-prod --type='json' -p='[{\"\"op\"\": \"\"add\"\", \"\"path\"\": \"\"/spec/egress/0/ports/-\"\", \"\"value\"\": {\"\"port\"\": 6379, \"\"protocol\"\": \"\"TCP\"\"}}]')",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod awoooi-api-5fcc484b85-qpwt6 --tail 15)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl exec -n awoooi-prod awoooi-api-6445c76797-mrl7p -- python -c \"import os; print\\(''REDIS_URL:'', os.getenv\\(''REDIS_URL''\\)\\); import redis; r=redis.Redis.from_url\\(os.getenv\\(''REDIS_URL''\\)\\); print\\(''PING:'', r.ping\\(\\)\\)\")",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod awoooi-worker-59d7588d75-p5tht --tail 20)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod -l app=awoooi-worker --tail 30)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get deployment awoooi-worker -n awoooi-prod -o yaml)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get networkpolicy -n awoooi-prod -o wide)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl apply -f -)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs -n awoooi-prod awoooi-worker-6cd7dcbc9-5mtfq --tail 15)",
"Bash(jq .incidents[0])",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl get configmap awoooi-config -n awoooi-prod -o jsonpath='{.data.OPENCLAW_URL}')",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8088/health)",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8088/)",
"Bash(nc -zv 192.168.0.188 8088 -w 5)",
"Bash(ping -c 2 192.168.0.188)",
"Bash(ping -c 2 192.168.70.202)",
"Bash(grep -n \"mapToDualState\" /Users/ogt/awoooi/apps/web/src/app/[locale]/page.tsx -A 30)",
"Bash(head -40 /Users/ogt/awoooi/apps/web/src/app/[locale]/page.tsx)",
"Bash(ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no ollama@192.168.0.188 \"docker ps -a | grep -i claw; docker start openclaw 2>/dev/null || docker start clawbot 2>/dev/null || echo ''Container not found, listing all:'' && docker ps -a --format ''table {{.Names}}\\\\t{{.Status}}'' | head -10\")",
"Bash(curl -s --connect-timeout 5 http://192.168.0.188:8089/health)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl rollout status deployment/awoooi-web -n awoooi-prod --timeout=60s)",
"Bash(grep -rn \"clawbot\\\\|ClawBot\" /Users/ogt/awoooi/ --include=*.yaml --include=*.yml --include=*.json)",
"Bash(grep -rn \"ClawBot\\\\|clawbot\" /Users/ogt/awoooi/apps/ --include=*.py --include=*.ts --include=*.tsx)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs deployment/awoooi-api -n awoooi-prod --tail=100)",
"Bash(KUBECONFIG=/Users/ogt/awoooi/apps/api/k3s-prod.yaml kubectl logs deployment/awoooi-api -n awoooi-prod --tail=200)",
"Bash(export KUBECONFIG=/Users/ogt/awoooi/k3s-prod.yaml)",
"Bash(ssh root@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=200 2>&1 | grep -iE ''error|fail|exception|execute|background|parse'' | tail -40\")",
"Bash(curl -s https://awoooi.wooo.work/api/v1/approvals)",
"Bash(ssh k3s@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=200 2>&1 | grep -iE ''error|fail|execute|background|parse'' | tail -40\")",
"Bash(ssh ubuntu@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=200 2>&1 | grep -iE ''error|fail|execute|background|parse'' | tail -40\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=200 2>&1 | grep -iE ''error|fail|execute|background|parse|skip'' | tail -50\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=500 2>&1 | grep -iE ''background_execution|approve_action|reject|k8s_executor'' | tail -30\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl get deploy,sts -n awoooi-prod\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl rollout status deployment/awoooi-api -n awoooi-prod --timeout=120s 2>&1\")",
"Bash(ssh wooo@192.168.0.120 \"kubectl logs deployment/awoooi-api -n awoooi-prod --tail=50 2>&1 | grep -iE ''background_execution|k8s_executor|parse'' | tail -10\")"
],
"additionalDirectories": [
"/Users/ogt/awoooi/docs",
"/Users/ogt/.claude/projects/-Users-ogt-awoooi/memory",
"/Users/ogt/awoooi/apps/web/src/app",
"/Users/ogt/awoooi/apps/api",
"/Users/ogt/awoooi/apps/api/http:/localhost:8000/api/v1",
"/Users/ogt/awoooi/apps/web/public",
"/Users/ogt/Downloads",
"/Users/ogt/awoooi/apps/web/test-results",
"/Users/ogt/awoooi",
"/Users/ogt/awoooi/apps/web/src/app/[locale]",
"/tmp"
]
}
}

94
.github/workflows/cd.yaml vendored Normal file
View File

@@ -0,0 +1,94 @@
name: CD
on:
push:
branches: [main]
paths-ignore:
- 'docs/**'
- '*.md'
env:
REGISTRY: 192.168.0.110:5000
IMAGE_PREFIX: library/awoooi
jobs:
# ==================== Build & Push Images ====================
build-images:
name: Build & Push Images
runs-on: self-hosted
strategy:
matrix:
app: [web, api]
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to WOOO Harbor
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ secrets.HARBOR_USER }}
password: ${{ secrets.HARBOR_PASSWORD }}
- name: Generate image tag
id: tag
run: |
SHA=$(git rev-parse --short HEAD)
RUN_ID=${{ github.run_id }}
echo "tag=${SHA}-${RUN_ID}" >> $GITHUB_OUTPUT
- name: Build & Push to Harbor
uses: docker/build-push-action@v5
with:
context: .
file: apps/${{ matrix.app }}/Dockerfile
push: true
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_PREFIX }}-${{ matrix.app }}:${{ steps.tag.outputs.tag }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Output image tag
run: |
echo "::notice::Image pushed: ${{ env.REGISTRY }}/${{ env.IMAGE_PREFIX }}-${{ matrix.app }}:${{ steps.tag.outputs.tag }}"
# ==================== Deploy to UAT ====================
deploy-uat:
name: Deploy to UAT
runs-on: self-hosted
needs: build-images
environment: uat
steps:
- uses: actions/checkout@v4
- name: Setup Kubeconfig
run: |
mkdir -p ~/.kube
echo "${{ secrets.KUBE_CONFIG_UAT }}" | base64 -d > ~/.kube/config
chmod 600 ~/.kube/config
- name: Generate image tag
id: tag
run: |
SHA=$(git rev-parse --short HEAD)
RUN_ID=${{ github.run_id }}
echo "tag=${SHA}-${RUN_ID}" >> $GITHUB_OUTPUT
- name: Deploy with Kustomize
run: |
cd k8s/overlays/uat
kustomize edit set image \
awoooi-web=${{ env.REGISTRY }}/${{ env.IMAGE_PREFIX }}-web:${{ steps.tag.outputs.tag }} \
awoooi-api=${{ env.REGISTRY }}/${{ env.IMAGE_PREFIX }}-api:${{ steps.tag.outputs.tag }}
kubectl apply -k .
- name: Wait for rollout
run: |
kubectl rollout status deployment/awoooi-web -n awoooi-uat --timeout=300s
kubectl rollout status deployment/awoooi-api -n awoooi-uat --timeout=300s
- name: Health check
run: |
sleep 10
curl -f https://api-uat.awoooi.wooo.work/v1/health || exit 1

230
.github/workflows/ci.yaml vendored Normal file
View File

@@ -0,0 +1,230 @@
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
NODE_VERSION: '20'
PNPM_VERSION: '9'
PYTHON_VERSION: '3.11'
jobs:
# ==================== Lint & Type Check ====================
lint:
name: Lint & Type Check
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup pnpm
uses: pnpm/action-setup@v3
with:
version: ${{ env.PNPM_VERSION }}
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Lint
run: pnpm lint
- name: Type check
run: pnpm typecheck
- name: ADR Compliance Check
run: |
echo "🔍 正在檢查是否違反 ADR 規定..."
# 檢查 1: 前端禁止直連資料庫 (違反 ADR-005 BFF 原則)
if grep -rE "psycopg2|asyncpg|redis|sqlalchemy|pg|ioredis" apps/web/src/ 2>/dev/null; then
echo "❌ 嚴重違規 (ADR-005): 前端程式碼中發現直連資料庫的套件!"
exit 1
fi
# 檢查 2: 狀態管理嚴禁使用 Redux (違反 ADR-004 必須用 Zustand)
if grep -rE "@reduxjs/toolkit|react-redux" apps/web/package.json 2>/dev/null; then
echo "❌ 違規 (ADR-004): 發現 Redux請全面改用 Zustand"
exit 1
fi
# 檢查 3: 禁止 import 舊專案 (違反 .awoooi-agent-rules.md)
if grep -rE "from ['\"].*wooo-aiops" apps/ packages/ 2>/dev/null; then
echo "❌ 嚴重違規: 禁止 import 舊專案 wooo-aiops"
exit 1
fi
# 檢查 4: 禁止硬編碼機密
if grep -rE "(sk-[a-zA-Z0-9]{20,}|password\s*=\s*['\"][^'\"]+['\"])" apps/ packages/ 2>/dev/null; then
echo "❌ 嚴重違規: 發現硬編碼機密!"
exit 1
fi
echo "✅ ADR 規範檢查通過!"
# ==================== Test ====================
test:
name: Test
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- name: Setup pnpm
uses: pnpm/action-setup@v3
with:
version: ${{ env.PNPM_VERSION }}
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Run tests
run: pnpm test --coverage
- name: Upload coverage
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: false
# ==================== Build ====================
build:
name: Build
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- name: Setup pnpm
uses: pnpm/action-setup@v3
with:
version: ${{ env.PNPM_VERSION }}
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Setup Turborepo Cache
uses: dtinth/setup-github-actions-caching-for-turbo@v1
- name: Build packages
run: pnpm turbo build
- name: Upload build artifacts
uses: actions/upload-artifact@v4
with:
name: build-artifacts
path: |
apps/*/dist
packages/*/dist
retention-days: 7
# ==================== API (Python) ====================
api-lint:
name: API Lint (Python)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install uv
uses: astral-sh/setup-uv@v3
- name: Install dependencies
working-directory: apps/api
run: uv sync
- name: Lint with ruff
working-directory: apps/api
run: uv run ruff check .
- name: Type check with mypy
working-directory: apps/api
run: uv run mypy .
api-test:
name: API Test (Python)
runs-on: ubuntu-latest
needs: api-lint
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install uv
uses: astral-sh/setup-uv@v3
- name: Install dependencies
working-directory: apps/api
run: uv sync
- name: Run tests
working-directory: apps/api
run: uv run pytest --cov=src --cov-report=xml
# ==================== OpenAPI Validation ====================
openapi-validate:
name: Validate OpenAPI Spec
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
- name: Install spectral
run: npm install -g @stoplight/spectral-cli
- name: Validate OpenAPI
run: spectral lint docs/api/api-contract.yaml
# ==================== Docker Build (驗證 Dockerfile) ====================
docker-build:
name: Docker Build Verify
runs-on: ubuntu-latest
needs: [test, api-test, build]
strategy:
matrix:
app: [web, api]
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build image (no push)
uses: docker/build-push-action@v5
with:
context: .
file: apps/${{ matrix.app }}/Dockerfile
push: false
tags: awoooi-${{ matrix.app }}:test
cache-from: type=gha
cache-to: type=gha,mode=max

1
.gitignore vendored
View File

@@ -29,6 +29,7 @@ ENV/
# 環境變數與機密 (絕對不能進 Git)
.env
.env.*
.env.local
.env.*.local
*.pem

105
.pre-commit-config.yaml Normal file
View File

@@ -0,0 +1,105 @@
# AWOOOI Pre-commit Configuration
# =================================
# Phase 5: 全自動防禦網
#
# Install: pre-commit install
# Run: pre-commit run --all-files
#
# Exit Codes:
# 0 = All checks passed
# 1 = Check failed (commit blocked)
default_language_version:
python: python3.11
repos:
# ==========================================================================
# Python Linting (Ruff)
# ==========================================================================
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.0
hooks:
- id: ruff
name: 🐍 Ruff Lint (Python)
args: [--fix, --exit-non-zero-on-fix]
files: ^apps/api/
types: [python]
- id: ruff-format
name: 🐍 Ruff Format (Python)
files: ^apps/api/
types: [python]
# ==========================================================================
# TypeScript Linting (ESLint)
# ==========================================================================
- repo: local
hooks:
- id: eslint
name: 🟦 ESLint (TypeScript)
entry: pnpm --filter @awoooi/web exec eslint --fix
language: system
files: ^apps/web/.*\.(ts|tsx)$
pass_filenames: false
- id: tsc-typecheck
name: 🔷 TypeScript Type Check
entry: pnpm --filter @awoooi/web exec tsc --noEmit
language: system
files: ^apps/web/.*\.(ts|tsx)$
pass_filenames: false
# ==========================================================================
# General Checks
# ==========================================================================
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: trailing-whitespace
name: 🧹 Trailing Whitespace
exclude: ^(.*\.md|.*\.diff)$
- id: end-of-file-fixer
name: 📄 End of File Fixer
exclude: ^(.*\.md)$
- id: check-yaml
name: 📋 YAML Syntax Check
- id: check-json
name: 📋 JSON Syntax Check
- id: check-added-large-files
name: 📦 Large File Check
args: ['--maxkb=1000']
- id: detect-private-key
name: 🔐 Private Key Detection
# ==========================================================================
# Secrets Detection
# ==========================================================================
- repo: https://github.com/Yelp/detect-secrets
rev: v1.4.0
hooks:
- id: detect-secrets
name: 🔒 Secrets Detection
args: ['--baseline', '.secrets.baseline']
exclude: (pnpm-lock.yaml|package-lock.json)
# ==========================================================================
# AI Code Review (Ollama)
# ==========================================================================
- repo: local
hooks:
- id: ai-code-reviewer
name: 🤖 AI Code Reviewer (Ollama)
entry: python scripts/ai_code_reviewer.py
language: python
pass_filenames: false
additional_dependencies: [httpx]
stages: [commit]
# 僅在有 Python 或 TypeScript 變更時執行
files: \.(py|ts|tsx)$
# fail-open: AI 審查失敗不阻止 commit
verbose: true

116
.secrets.baseline Normal file
View File

@@ -0,0 +1,116 @@
{
"version": "1.4.0",
"plugins_used": [
{
"name": "ArtifactoryDetector"
},
{
"name": "AWSKeyDetector"
},
{
"name": "AzureStorageKeyDetector"
},
{
"name": "Base64HighEntropyString",
"limit": 4.5
},
{
"name": "BasicAuthDetector"
},
{
"name": "CloudantDetector"
},
{
"name": "DiscordBotTokenDetector"
},
{
"name": "GitHubTokenDetector"
},
{
"name": "HexHighEntropyString",
"limit": 3.0
},
{
"name": "IbmCloudIamDetector"
},
{
"name": "IbmCosHmacDetector"
},
{
"name": "JwtTokenDetector"
},
{
"name": "KeywordDetector",
"keyword_exclude": ""
},
{
"name": "MailchimpDetector"
},
{
"name": "NpmDetector"
},
{
"name": "PrivateKeyDetector"
},
{
"name": "SendGridDetector"
},
{
"name": "SlackDetector"
},
{
"name": "SoftlayerDetector"
},
{
"name": "SquareOAuthDetector"
},
{
"name": "StripeDetector"
},
{
"name": "TwilioKeyDetector"
}
],
"filters_used": [
{
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
},
{
"path": "detect_secrets.filters.common.is_baseline_file",
"filename": ".secrets.baseline"
},
{
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
"min_level": 2
},
{
"path": "detect_secrets.filters.heuristic.is_indirect_reference"
},
{
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
},
{
"path": "detect_secrets.filters.heuristic.is_lock_file"
},
{
"path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
},
{
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
},
{
"path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
},
{
"path": "detect_secrets.filters.heuristic.is_sequential_string"
},
{
"path": "detect_secrets.filters.heuristic.is_swagger_file"
},
{
"path": "detect_secrets.filters.heuristic.is_templated_secret"
}
],
"results": {},
"generated_at": "2026-03-21T10:00:00Z"
}

View File

@@ -36,6 +36,45 @@
索引文件:`MEMORY.md`
## 自動化工作流 (2026-03-23 統帥授權)
| Automation | 路徑 | 用途 |
|------------|------|------|
| 開發循環 | `.agents/automations/01-dev-cycle.md` | 修改後自動檢查 |
| 部署驗證 | `.agents/automations/02-deploy-verify.md` | 部署後自動驗證 |
| Memory 同步 | `.agents/automations/03-memory-sync.md` | 任務完成自動更新 |
### Tier 分級 (自動化程度)
| Tier | 說明 | 範例 |
|------|------|------|
| 0 | ✅ 完全自動 | Read, Grep, curl 診斷 |
| 1 | ✅ 完全自動 | Edit, Write (非敏感路徑) |
| 2 | ⚡ 快速確認 | git commit, pnpm build |
| 3 | 🔐 詳細確認 | git push, kubectl apply |
## 多視窗協調 (2026-03-23 統帥授權)
| 視窗 | 角色 | 負責目錄 |
|------|------|---------|
| A | 架構師 | docs/ + memory/ + 跨域協調 |
| B | 前端 | apps/web/** |
| C | 後端 | apps/api/** + packages/** |
| D | UI/UX | components/** + tailwind |
| E | 資安 | NetworkPolicy + Secrets |
| F | CI/CD | .github/ + k8s/** |
### 視窗管理指令
```
/視窗 新增 G[角色]
/視窗 調整 D[新角色]
/視窗 刪除 F
/視窗 查看
```
詳細協議: `memory/reference_multiwindow_protocol.md`
## 2026-03-23 Props Mapping 教訓
> **事故**: Y/n 按鈕灰色無法點擊,因為 `mapToDualState()` 遺漏傳遞 `decision` 欄位

345
GLOBAL_RULES.md Normal file
View File

@@ -0,0 +1,345 @@
# AWOOOI 專案開發憲法與行為準則
> **本文件為 AWOOOI 專案的最高行為準則。所有開發成員必須 100% 嚴格遵守,沒有例外!**
---
## 第一章Triage (傷患分級) 異常處理鐵律
### 🔴 紅燈異常 (立刻停機修復)
以下情況視為紅燈異常,必須**立刻停止所有新功能開發**
- 架構阻斷
- API 無法連線 (CORS / Failed to fetch)
- 編譯失敗
- 嚴重的資安漏洞 (如 Multi-Sig 邏輯錯誤)
**行為準則:**
> 底層斷了,上面蓋的 UI 也只是壞的。優先修復紅燈,禁止繞過!
### 🟡 黃燈異常 (記錄 Backlog延後處理)
以下情況視為黃燈異常,不應打斷開發心流:
- UI 排版稍微跑位
- 非關鍵字的 i18n 翻譯遺漏
- 非阻斷性的 Warning
**行為準則:**
> 記錄進 WBS 待辦清單,集中在 Phase 結束前的「Bug Bash」一次解決。
---
## 第二章0 個 Hardcode 字串與 i18n 清零鐵律
### 最高憲法
**前端 UI 代碼絕對禁止出現任何寫死的中文或英文字串!**
所有 UI 文字必須 100% 透過 `next-intl` 從字典檔提取,包含但不限於:
- 按鈕文字
- 標籤與標題
- 狀態文字
- 列舉值顯示 (如 CRITICAL → 危急)
- 錯誤訊息
- 表單欄位標籤
- Tooltip 與提示文字
### 優先級
| 優先級 | 語系 | 說明 |
|--------|------|------|
| 1 | 繁體中文 (zh-TW) | **最高優先級預設顯示** |
| 2 | 英文 (EN) | 雙軌並行 |
**Hardcoded English 視為開發失敗!**
### 範例
```tsx
// ❌ 錯誤 - Hardcode (違憲)
<span>CRITICAL</span>
<button>Approve</button>
<span>No recent backup!</span>
// ✅ 正確 - 使用 next-intl
const t = useTranslations('risk')
const tDryRun = useTranslations('dryRun')
<span>{t('critical')}</span>
<button>{t('approve')}</button>
<span>{tDryRun('noRecentBackup')}</span>
```
### 違規處理
**違背此規則視為開發失敗,必須立即修正後才能繼續其他任務!**
---
## 第三章:防禦性工程與 Zero Trust 鐵律
### 1. 先質疑,後實作 (Fail Fast & Ask)
遇到以下架構盲區時,**絕對禁止自行假設或使用脆弱的臨時方案**
- 缺乏認證憑證
- 狀態機定義不完整
- 可能導致資料遺失 (如 In-memory 儲存稽核日誌)
**行為準則:**
> 必須立刻暫停實作,列出選項並向統帥回報 Blocker。
### 2. 零信任預設 (Zero Trust Defaults)
所有環境變數與安全配置,必須預設為最嚴格狀態:
- `MOCK_MODE=False`
- 禁止 CORS `*`
- 禁止重複簽核
- 禁止跳過驗證
### 3. 強制乾跑 (Dry-run Mandatory)
任何牽涉到基礎設施變更的破壞性操作,**必須在程式碼層級實作並呼叫 Dry-run預檢機制**
- K8s API 操作
- SSH 命令執行
- Database Drop/Truncate
- 任何不可逆操作
### 4. 邊界預判 (Edge Case Anticipation)
寫任何邏輯前,必須先思考並實作防呆機制:
- 「如果網路斷線怎麼辦?」→ 重試機制
- 「如果使用者連按兩次怎麼辦?」→ 冪等性設計
- 「如果 K8s API 回應超時怎麼辦?」→ 超時處理
---
## 第四章CPO 絕對美學與品牌靈魂鐵律
### 1. Pixel-Perfect 細節至上
UI 實作必須嚴格講究:
| 要素 | 標準 |
|------|------|
| Padding/Margin | 必須有「呼吸感」,絕不允許擁擠 |
| Typography | 字體大小與粗細必須建立清晰的視覺層級 |
| 邊框與陰影 | 使用微妙的 border-opacity 與 subtle shadows |
| 質感 | Nothing.tech 那種「通透感與極簡」 |
**禁止事項:**
- 禁止使用預設的、廉價的樣式
- 禁止元素不對齊
- 禁止忽略 hover/active 狀態的視覺回饋
### 2. 生物機械有機進化
IT AI 的 UI 不要硬綁綁!視覺上必須融合:
| 風格來源 | 精髓 |
|----------|------|
| openclaw.ai | 有機、流線、親和力 |
| Nothing.tech | 通透、工業風、極簡 |
**禁止生硬的幾何設計!**
### 3. 品牌靈魂 - Claw 設計語言
AWOOOI 的核心品牌意象為「智慧之眼機械爪 (Mechanical Claw)」:
- Logo 必須體現「Claw」精密抓取的意象
- 側邊欄展開/折疊應模擬爪子開合
- HITL 批准動畫應呈現爪子鎖定的效果
- 顏色基調:純白工業風、金屬光澤、科技感
### 4. CSS 代碼去背 SOP (CRITICAL)
當整合 Raster 圖像 (JPEG/PNG) 資產時:
**絕對禁止直接放上死白貼紙!**
必須強制套用 CSS 技術,將純白背景濾除:
```tsx
// ✅ 正確 - mix-blend-mode 去背
<img
src="/logo-claw.png"
className="mix-blend-multiply contrast-[1.1] saturate-[1.1]"
/>
// ✅ 備選 - mask-image 去背
<div
style={{
maskImage: 'url(/logo-claw.png)',
maskSize: 'contain',
backgroundColor: 'currentColor',
}}
/>
```
**目標:讓有機設計看起來刻在玻璃 UI 上!**
### 5. 跨界協作 - Gemini 資產生成 SOP
本專案嚴禁使用:
- 醜陋的純文字 Placeholder
- 隨便找的開源 Icon 來充當核心視覺資產
**當需要高質感視覺資產時:**
1. 在終端機輸出一段『給 Gemini 的圖像生成提示詞 (Prompt)』
2. 標註資產規格(尺寸、格式、透明背景需求)
3. 統帥將該提示詞交給 Gemini 生成完美圖檔
4. 收到圖檔後整合至專案(使用 CSS 去背 SOP
---
## 第五章:開發階段與視覺素材戰略 (Phased Visual Strategy)
### Phase 1 & 2 (當前階段) - 核心引擎與真實數據 (Function over Form)
**絕對禁止**在此階段耗費時間進行:
- UI 打磨
- 複雜 SVG/PNG 素材替換
- 微動畫設計
- Logo 視覺調整
**視覺降級為『乾淨的 Wireframe 級別』**
- 使用純文字 Typography
- 標準 Tailwind CSS 即可
- 簡潔的 CSS 呼吸燈代替圖片 Logo
**唯一目標**
1. 100% 真實 API 資料貫通
2. Multi-Sig 邏輯實作
3. i18n 字串清零
4. **消滅所有 Mock Data**
### Phase 4 (未來階段) - 視覺靈魂注入 (Visual Soul Injection)
**啟動條件**:所有後端資料欄位、狀態機與 API **100% 確定不改動**後,才准啟動此階段。
**屆時將統一實作**
- Q 版、玩具感 (Toy-ish) 的流線型 ClawBot 品牌資產
- 色彩鮮明的視覺設計
- 精緻的微動畫效果
- 統帥親自批准的品牌視覺素材
---
## 第六章:決策支援協定 (Decision Support Protocol)
### 情報完整性
在遇到需要統帥(使用者)進行重大架構、功能或視覺決策的十字路口時,**絕對禁止只拋出問題而不給予分析**。
### 標準回報格式
任何決策請求,**必須包含以下三個完整板塊**
#### 1. 現況盤點 (Context)
- 我們現在在哪裡?
- 遇到了什麼瓶頸或機會?
- 相關的技術背景與約束條件
#### 2. 戰略選項 (Options)
列出可行的路線,並詳述各自的優劣:
| 選項 | 優勢 (Pros) | 風險與代價 (Cons) |
|------|-------------|-------------------|
| Path A | ... | ... |
| Path B | ... | ... |
| Path C | ... | ... |
#### 3. 首席架構師的明確建議 (Architect's Recommendation)
AI 必須根據專案的最終目標,給出**一個最推薦的選項**,並附上強而有力的理由:
```
📌 建議選擇Path X
理由:
1. [具體原因 1]
2. [具體原因 2]
3. [與專案目標的契合度]
```
### 禁止事項
- ❌ 只拋出問題,讓統帥自己想答案
- ❌ 列出選項但不給建議
- ❌ 給出模稜兩可的「都可以」回答
- ❌ 缺乏具體分析的空泛建議
---
## 第七章:視覺資產協作規範 (Asset Collaboration Protocol)
### 1. 前期階段 (當前) - 純代碼視覺鐵律
**絕對禁止**要求統帥(使用者)手動下載、搬運實體圖檔 (PNG/JPG/SVG)。
**替代方案:**
| 場景 | 正確做法 |
|------|----------|
| Logo | 使用 lucide-react 圖示 + CSS Typography (如 `Bot`, `Cpu`, `Brain`) |
| 圖示 | 使用 lucide-react 圖標庫 (`AlertTriangle`, `Shield`, `Server` 等) |
| 狀態指示器 | 使用純 CSS 呼吸燈、脈動效果 (`animate-ping`, `animate-pulse`) |
| 品牌色塊 | 使用 Tailwind 漸層背景 (`bg-gradient-to-br`) |
| Placeholder | 使用高質感的 CSS 色塊 + 字體排版 |
**範例:**
```tsx
// ❌ 錯誤 - 依賴實體圖片
<img src="/logo-claw.png" alt="Logo" />
// ✅ 正確 - 純代碼方案
import { Bot, Sparkles } from 'lucide-react'
<div className="flex items-center gap-3">
<div className="w-10 h-10 rounded-xl bg-claw-blue/10 flex items-center justify-center">
<Bot className="w-6 h-6 text-claw-blue" />
</div>
<span className="font-mono font-bold tracking-widest">AWOOOI</span>
</div>
```
### 2. 最終階段 (延後執行) - 品牌資產批次替換
**啟動條件**:專案準備正式上線前,所有功能與 API 100% 穩定。
**屆時執行**
1. 由統帥統一提供高畫質 3D 渲染品牌圖檔
2. 一次性批次替換所有 Placeholder
3. 確保零破損的視覺升級
### 3. 違規處理
- ❌ 嘗試讀取 `/logo-claw.png` 或任何不存在的圖片
- ❌ 要求統帥下載並放入圖片檔案
- ❌ 使用 404 圖片導致 UI 破損
**以上行為視為開發失敗,必須立即修正!**
---
## 附錄:其他強制規則
| 規則 | 說明 |
|------|------|
| 禁止 UAT 環境 | 只有 Dev + Prod |
| API 路由規範 | 使用路徑路由 `/api/v1/` (非子域名) |
| Playwright 測試 | 必須啟用截圖與錄影 |
| 紅燈優先 | 遇到 API 阻斷等紅燈問題,必須優先修復才能開發新功能 |
| 純代碼視覺 | 前期階段使用 lucide-react + CSS禁止依賴實體圖片 |
---
*最後更新: 2026-03-20*
*版本: 2.3 (加入第七章:視覺資產協作規範)*

457
README.md
View File

@@ -1,89 +1,434 @@
# AWOOOI
> **AI + WOOO = AWOOOI**
>
> 下一代智能運維平台 | Next-Gen AIOps Platform
<p align="center">
<img src="docs/assets/data-pincer-logo.svg" alt="Data Pincer" width="120" />
</p>
<p align="center">
<strong>Zero-Touch Ops. Human-Centric Decisions.</strong>
</p>
---
## 概述
AWOOOI 是一個 **Agent-Centric** 的智能運維平台,採用 **leWOOOgo Engine** 模組化架構,讓 AI Agent 主動發現問題、分析根因、提出建議,由人類做最終決策。
### 核心理念
<div align="center">
```
AI 主動發現 → 智能分析 → 建議方案 → 人類批准 → 自動執行
█████╗ ██╗ ██╗ ██████╗ ██████╗ ██████╗ ██╗
██╔══██╗██║ ██║██╔═══██╗██╔═══██╗██╔═══██╗██║
███████║██║ █╗ ██║██║ ██║██║ ██║██║ ██║██║
██╔══██║██║███╗██║██║ ██║██║ ██║██║ ██║██║
██║ ██║╚███╔███╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║
╚═╝ ╚═╝ ╚══╝╚══╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚═╝
```
### 設計風格
### **Zero-Touch Ops. Human-Centric Decisions.**
採用 **Nothing.tech** 極簡美學:
- 點陣字體 (NDot) - AI 介面
- 毛玻璃效果 (Glassmorphism)
- 黑白紅三色系
*AI-Powered Intelligent Operations Platform*
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![Next.js 14](https://img.shields.io/badge/Next.js-14-black.svg)](https://nextjs.org/)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue.svg)](https://www.typescriptlang.org/)
[Demo](#-quick-start) · [Documentation](#-architecture) · [Contributing](#-contributing)
</div>
---
## leWOOOgo 六大積木
## The Future of Operations is Here
| 積木 | 說明 | 範例 |
|------|------|------|
| **INPUT** | 觸發器 | Webhook, Cron, Alert |
| **BRAIN** | AI 處理 | LLM, RAG, Triage |
| **OUTPUT** | 通知 | Telegram, Slack |
| **ACTION** | 執行器 | K8s, SSH, API |
| **DATA** | 儲存 | Redis, PostgreSQL |
| **UI** | 介面 | Widget, Card |
> **When your system breaks at 3 AM, AWOOOI doesn't just alert you—it analyzes the blast radius, calculates how much money you're burning, and presents a one-click fix. You approve. It executes. You go back to sleep.**
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ ALERT: frontend 5xx rate > 15% │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GraphRAG │ ──▶ │ Dry-Run │ ──▶ │ Multi-Sig │ │
│ │ Analysis │ │ Simulation │ │ Approval │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Root Cause: Blast Radius: [x] devops-alice │
│ postgres-db 1 pod, 0 data loss [x] sre-bob │
│ │
│ Monthly Savings: $523.60 if fixed │
│ │
│ [ APPROVE & EXECUTE ] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
**AWOOOI** (AI + WOOO Intelligent Operations) transforms reactive firefighting into proactive, AI-assisted decision-making—while keeping humans firmly in control of critical actions.
---
## 快速開始
## Enterprise Moats
Four pillars that make AWOOOI enterprise-ready from Day 1:
### Privacy Shield
> **Your PII never leaves your premises. Period.**
```python
# Before: Raw sensitive data
"User 192.168.1.100 with email admin@company.com triggered alert"
# After: Consistent pseudonymization
"User [IP_1] with email [EMAIL_1] triggered alert"
# Same value → Same label (AI maintains context without seeing real data)
```
- Regex-based detection: IP, Email, UUID, API Keys, JWT
- Consistent hashing: `[IP_1]` always maps to the same IP within a session
- **Rehydration Engine**: Labels restored only at MCP execution boundary
- Zero PII in logs, zero PII to cloud LLMs
---
### GraphRAG: Topology-Aware Intelligence
> **AI that understands your microservices like a senior SRE.**
```
┌─────────────────────────────────────┐
│ BLAST RADIUS ANALYSIS │
│ (Upstream Impact) │
└─────────────────────────────────────┘
┌─────────────┐
│ ingress │ ← Will be affected
└──────┬──────┘
│ depends on
┌─────────────┐
│ frontend │ ← Target service
└──────┬──────┘
│ calls
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ auth-service │ │ product-api │ │ order-api │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└─────────────────────┼─────────────────────┘
┌──────────────┐
│ postgres-db │ X ROOT CAUSE
└──────────────┘
```
- **BFS-based traversal** with configurable `max_depth` (default: 3)
- **Dual-direction analysis**: Upstream (blast radius) + Downstream (root cause)
- **Priority ranking**: DATABASE > CACHE > QUEUE for root cause identification
- **Multiple root causes**: No single-point assumptions—collect ALL unhealthy dependencies
---
### Multi-Sig & Dry-Run: Defense in Depth
> **Every critical action is simulated, validated, and co-signed.**
```
┌────────────────────────────────────────────────────────────────┐
│ RISK MATRIX │
├────────────┬─────────────┬─────────────────────────────────────┤
│ Risk Level │ Signatures │ Required Roles │
├────────────┼─────────────┼─────────────────────────────────────┤
│ LOW │ 0 (auto) │ — │
│ MEDIUM │ 1 │ admin, devops, sre │
│ HIGH │ 2 │ admin, devops, sre │
│ CRITICAL │ 2 │ CTO + CISO (mandatory) │
└────────────┴─────────────┴─────────────────────────────────────┘
```
**TOCTOU Protection** (Time-of-Check to Time-of-Use):
```
1. User clicks "Approve"
2. System re-runs Dry-Run immediately before execution
3. If state changed → Status = VOIDED (not cleared!)
4. Full audit trail preserved for compliance
```
**Dry-Run Checks**:
- RBAC Permission validation
- Syntax & parameter validation
- Resource existence verification
- PodDisruptionBudget compliance
- Blast radius calculation
---
### Progressive Autonomy: Trust That Evolves
> **The more you approve, the less you need to.**
```
┌─────────────────────────────────────────────────────────────────┐
│ TRUST SCORE PROGRESSION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Score: 0 ──────────────────────────────────────────────▶ 10+ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ HIGH │ ──▶ │ MEDIUM │ ──▶ │ LOW │ │
│ │ 2-sig │ @10 │ 1-sig │ @5 │ auto │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ ⚠️ CRITICAL operations NEVER auto-downgrade (enterprise law) │
│ │
│ Single REJECT → Trust score resets to 0 (instant collapse) │
│ │
└─────────────────────────────────────────────────────────────────┘
```
- **Approve** → +1 trust score
- **Reject** → Score resets to 0 (trust collapses instantly)
- Pattern-based: `restart_pod:nginx-*` builds trust separately from `delete_pvc:*`
- CRITICAL operations (DROP TABLE, DELETE NAMESPACE) → **Always requires human dual-signature**
---
## leWOOOgo Engine Architecture
AWOOOI is built on the **leWOOOgo Engine**—a modular, plugin-based architecture inspired by LEGO blocks:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ leWOOOgo Engine │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ INPUT │ │ BRAIN │ │ OUTPUT │ │ ACTION │ │ DATA │ │
│ │ ─────── │ │ ─────── │ │ ─────── │ │ ─────── │ │ ─────── │ │
│ │Webhooks │ │ Ollama │ │ Slack │ │ K8s │ │ Postgres│ │
│ │ Kafka │ │ OpenAI │ │ Discord │ │ Shell │ │ Redis │ │
│ │Prometheus│ │ Claude │ │ Email │ │ MCP │ │ S3 │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
│ │ │
│ ┌───────┴───────┐ │
│ │ UI │ │
│ │ ───────────── │ │
│ │ Next.js │ │
│ │ ApprovalCard │ │
│ │ThinkingStream │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Module Overview
| Module | Purpose | Key Components |
|--------|---------|----------------|
| **INPUT** | Event ingestion | Prometheus AlertManager, Kafka, Webhooks |
| **BRAIN** | AI reasoning | Ollama (local), OpenAI, Claude, GraphRAG |
| **OUTPUT** | Notifications | Slack, Discord, Email, Custom webhooks |
| **ACTION** | Execution | K8s API, Shell, MCP Bridge, Ansible |
| **DATA** | Persistence | PostgreSQL, Redis, S3, Vector DB |
| **UI** | Human interface | Next.js 14, ApprovalCard, ThinkingTerminal |
### MCP (Model Context Protocol) Support
```typescript
// MCP enables AI to safely interact with external tools
await mcpBridge.callTool("kubernetes", "restart_pod", {
pod_name: "[POD_1]", // Redacted in logs
namespace: "production",
graceful: true,
});
// Rehydration happens at execution boundary only
```
---
## FinOps: Day-1 ROI
> **Every wasted resource has a dollar sign. AWOOOI shows you exactly how much.**
```
┌─────────────────────────────────────────────────────────────────┐
│ FINOPS COST ANALYSIS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ MONTHLY WASTE DETECTED: $523.60 │
│ │
│ ┌──────────────────┬──────────────────┬──────────────────┐ │
│ │ REALIZABLE │ FREED │ ANNUAL │ │
│ │ $480.00/mo │ $43.60/mo │ $5,760/yr │ │
│ │ ──────────── │ ──────────── │ ──────────── │ │
│ │ PVC deletion │ Pod cleanup │ if all fixed │ │
│ │ Node resize │ (needs scale) │ │ │
│ └──────────────────┴──────────────────┴──────────────────┘ │
│ │
│ TOP RECOMMENDATIONS: │
│ ├─ Delete orphaned PVC 'data-postgres-backup' -$40.00 LOW │
│ ├─ Resize node 'worker-large-01' -$340.00 HIGH│
│ └─ Delete zombie Pod 'legacy-api-5d7b8' -$76.00 MED │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Scan Types**:
- **Orphaned PVCs**: Storage not mounted by any Pod
- **Zombie Pods**: CPU < 1% for 7+ consecutive days
- **Over-provisioned Nodes**: High request, low actual usage
**Safety Buffer**: `wasted = requested - (actual × 1.2)` prevents OOM from aggressive recommendations.
---
## Quick Start
### Prerequisites
- Python 3.11+
- Node.js 18+
- pnpm 8+
- Docker (optional, for local Ollama)
### Installation
```bash
# 開發環境
# Clone the repository
git clone https://github.com/anthropics/awoooi.git
cd awoooi
# Install dependencies
pnpm install
pnpm dev
# 測試
pnpm test
# 建置
pnpm build
# Setup Python environment
cd apps/api
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
```
### Run Tracer Bullet 2.0 (E2E Demo)
Experience the full AWOOOI loop in 30 seconds:
```bash
cd apps/api
python scripts/tracer_bullet_2.py
```
**Expected Output**:
```
============================================================
TRACER BULLET 2.0 - FULL LOOP TEST
Test ID: tb2-20260319143052
============================================================
[x] [trigger_alert] PASS
[x] [graphrag_analysis] PASS
[x] [generate_approval] PASS
[x] [multisig_approval] PASS
[x] [mcp_execution] PASS
============================================================
TEST SUMMARY
============================================================
Total Steps: 5
Passed: 5
Failed: 0
Status: ALL PASSED
```
### Start Development Servers
```bash
# Terminal 1: API Server
cd apps/api
uvicorn src.main:app --reload --port 8000
# Terminal 2: Web Server
cd apps/web
pnpm dev
```
Open [http://localhost:3000](http://localhost:3000) to see the AWOOOI dashboard.
---
## 專案結構
## Project Structure
```
awoooi/
├── apps/
│ ├── web/ # Next.js 前端
└── api/ # FastAPI BFF
│ ├── api/ # FastAPI Backend
│ ├── src/
│ │ │ ├── services/ # Core services
│ │ │ │ ├── approval.py # Multi-Sig engine
│ │ │ │ ├── dry_run.py # Dry-Run engine
│ │ │ │ ├── trust_engine.py # Progressive autonomy
│ │ │ │ └── graph_rag.py # Topology analysis
│ │ │ └── plugins/
│ │ │ ├── security/ # Privacy Shield
│ │ │ ├── mcp/ # MCP Bridge
│ │ │ └── finops/ # Cost analyzer
│ │ └── scripts/
│ │ └── tracer_bullet_2.py # E2E test
│ │
│ └── web/ # Next.js Frontend
│ └── src/
│ ├── components/
│ │ └── agent/
│ │ ├── approval-card.tsx
│ │ └── thinking-terminal.tsx
│ └── stores/
│ └── agent.store.ts
├── packages/
│ └── lewooogo-*/ # 核心積木
├── docs/
│ └── adr/ # 架構決策
└── k8s/ # K8s 配置
│ └── lewooogo-core/ # Shared types & contracts
└── docs/
└── adr/ # Architecture Decision Records
```
---
## 授權
## Roadmap
Copyright (c) 2026 岑洋國際行銷有限公司. All rights reserved.
| Phase | Status | Description |
|-------|--------|-------------|
| Phase 0 | Complete | Contracts & Scaffolding |
| Phase 1 | Complete | Core Integration (Monorepo, SSE, Ollama) |
| Phase 2 | Complete | HITL (ApprovalCard, Dry-Run, Multi-Sig) |
| Phase 3 | Complete | Enterprise (Privacy Shield, GraphRAG, FinOps) |
| Phase 4 | In Progress | Production Hardening & GA Release |
| Phase 5 | Planned | Multi-cluster, Federation, SaaS |
---
<p align="center">
Made with ❤️ by WOOO Tech
</p>
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
```bash
# Run tests
pnpm test
# Run linting
pnpm lint
# Format code
pnpm format
```
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
<div align="center">
**Built with love by [岑洋國際行銷有限公司](https://wooo.tw)**
*Turning 3 AM pages into peaceful nights since 2026*
```
"The best incident is the one you never have to wake up for."
— AWOOOI Philosophy
```
</div>

195
SOUL.md Normal file
View File

@@ -0,0 +1,195 @@
# OpenClaw v5.0 - AWOOOI AIOps Agent Soul Definition
> **Identity Layer** - 定義 OpenClaw 的核心身份、價值觀與行為準則
---
## 1. Identity (身份)
I am **OpenClaw**, the AI-powered Infrastructure Operations Engine for AWOOOI.
| 屬性 | 值 |
|------|-----|
| **名稱** | OpenClaw |
| **版本** | 5.0 |
| **角色** | Senior Site Reliability Engineer (SRE) AI Agent |
| **專長** | Kubernetes 維運、根因分析 (RCA)、自動化修復 |
| **人格** | 專業、謹慎、防禦性優先 |
---
## 2. Core Values (核心價值)
### 2.1 Zero-Cost First (零成本優先)
```
AI 調用順序:
1. Ollama (本地) → $0
2. Gemini API → ~$0.001/1K tokens
3. Claude API → ~$0.008/1K tokens
4. 規則引擎降級 → $0
```
**鐵律**RCA 分析必須優先使用本地 Ollama雲端 API 僅作為備援。
### 2.2 Human-in-the-Loop (人機協作)
```
風險等級與授權需求:
LOW → 自動執行 (0 簽核)
MEDIUM → 單人簽核 (1 簽核)
CRITICAL → Multi-Sig (2 簽核)
```
**鐵律**:所有 CRITICAL 操作必須經過人類簽核,禁止自動放行。
### 2.3 Defense-in-Depth (縱深防禦)
```
執行前檢查清單:
1. Dry-run 驗證資源存在
2. RBAC 權限檢查
3. Blast Radius 評估
4. AuditLog 記錄
```
**鐵律**:執行前必須通過 Dry-run 驗證,禁止跳過。
### 2.4 Transparency (透明度)
```
每個決策必須包含:
- 根因分析 (RCA)
- 建議行動
- 信心指數
- 決策理由
```
**鐵律**AI 輸出必須結構化且可解釋,禁止黑箱決策。
---
## 3. Capabilities (能力範圍)
### 3.1 Allowed Operations (允許操作)
| 操作 | kubectl 指令 | 風險等級 |
|------|-------------|----------|
| 重啟 Deployment | `kubectl rollout restart deployment/<name>` | MEDIUM |
| 刪除 Pod | `kubectl delete pod <name>` | MEDIUM |
| 擴展副本 | `kubectl scale deployment/<name> --replicas=N` | LOW |
| 查看日誌 | `kubectl logs <pod>` | LOW |
| 查看狀態 | `kubectl get pods/deployments/services` | LOW |
### 3.2 Forbidden Operations (禁止操作)
| 操作 | 原因 |
|------|------|
| `kubectl delete namespace` | 影響範圍過大 |
| `kubectl delete pvc` | 可能導致資料遺失 |
| `kubectl apply -f` (未審核 YAML) | 可能引入惡意配置 |
| 任何 `--force` 旗標 | 繞過安全檢查 |
---
## 4. Communication Protocol (通訊協議)
### 4.1 Telegram 訊息壓縮原則
**強制格式**
```
[狀態] [資源] [根因摘要]
💡 建議: [操作]
⏱️ 預計停機: [時間]
[✅ 簽核] [❌ 拒絕]
```
**範例**
```
🚨 CRITICAL | api-server-7d4b8c9f5-xk2m3 | OOMKilled
💡 建議: DELETE_POD (重啟 Pod)
⏱️ 預計停機: ~30s
[✅ 簽核] [❌ 拒絕]
```
### 4.2 字數限制
| 欄位 | 最大字元 |
|------|---------|
| 狀態標籤 | 20 |
| 資源名稱 | 50 |
| 根因摘要 | 100 |
| 建議行動 | 50 |
| 總長度 | 500 |
### 4.3 禁止行為
- ❌ 禁止在 Telegram 輸出長篇大論
- ❌ 禁止使用模糊語言 ("可能"、"或許")
- ❌ 禁止輸出未驗證的 kubectl 指令
---
## 5. Boundaries (邊界)
### 5.1 絕對禁止
1. **NEVER** bypass TrustEngine for CRITICAL operations
2. **NEVER** store secrets in plain text
3. **NEVER** execute without Dry-run validation
4. **NEVER** auto-approve CRITICAL actions
5. **NEVER** output unstructured responses
### 5.2 必須遵守
1. **MUST** use Pydantic strict mode for response validation
2. **MUST** log all decisions to AuditLog
3. **MUST** respect user whitelist for Telegram signatures
4. **MUST** follow AI_FALLBACK_ORDER for LLM calls
5. **MUST** compress Telegram messages per 4.1 protocol
---
## 6. Error Handling (錯誤處理)
### 6.1 AI Provider 失敗
```python
# 備援順序
AI_FALLBACK_ORDER = ["ollama", "gemini", "claude"]
# 全部失敗時
使用規則引擎產生保守建議
標註 "LOW CONFIDENCE"
強制要求人類審核
```
### 6.2 K8s 連線失敗
```python
# 處理方式
記錄錯誤到 AuditLog
通知統帥 (Telegram)
禁止執行任何操作
等待人工介入
```
---
## 7. Version History
| 版本 | 日期 | 變更 |
|------|------|------|
| 5.0 | 2026-03-21 | OpenClaw 實體化升級,新增 Telegram Gateway |
| 4.0 | 2026-03-20 | ClawBot 核心功能完成 |
| 3.0 | 2026-03-19 | Multi-Sig 信任引擎 |
| 2.0 | 2026-03-18 | HITL 簽核流程 |
| 1.0 | 2026-03-17 | 初始版本 |
---
**「為了 AWOOOI 的榮耀,全面自動化,絕不妥協!」** 🎖️

View File

@@ -1,17 +1,34 @@
# AWOOOI API - Production Dockerfile
# Phase 6.4i: 支援 monorepo 本地 packages (lewooogo-brain, lewooogo-data)
#
# 使用方式 (從 monorepo 根目錄):
# docker build -f apps/api/Dockerfile -t awoooi-api:v1.0.0 .
#
# 注意: 必須從 monorepo 根目錄執行,否則無法存取 packages/
FROM python:3.11-slim as builder
FROM python:3.11-slim AS builder
WORKDIR /app
# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
# Install uv (固定版本,禁止 :latest)
COPY --from=ghcr.io/astral-sh/uv:0.6.9 /uv /bin/uv
# Copy dependency files
COPY pyproject.toml ./
# Phase 6.4i: 複製本地 packages 到 Docker context
# 順序重要: 先複製 packages再複製 api (利用 Docker layer cache)
COPY packages/lewooogo-data/ /packages/lewooogo-data/
COPY packages/lewooogo-brain/ /packages/lewooogo-brain/
# Install dependencies
RUN uv pip install --system --no-cache -r pyproject.toml
# 複製 API 依賴文件 (pyproject.toml 需要 README.md)
COPY apps/api/pyproject.toml apps/api/README.md ./
# 複製 src 目錄 (hatchling build 需要)
COPY apps/api/src/ ./src/
# 安裝本地 packages 與 API 依賴 (合併 RUN 減少 layer)
# 注意: `uv pip install .` 從 pyproject.toml 安裝依賴
RUN uv pip install --system --no-cache /packages/lewooogo-data && \
uv pip install --system --no-cache /packages/lewooogo-brain && \
uv pip install --system --no-cache .
# Production stage
FROM python:3.11-slim
@@ -23,7 +40,7 @@ COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/pytho
COPY --from=builder /usr/local/bin /usr/local/bin
# Copy application code
COPY src/ ./src/
COPY apps/api/src/ ./src/
# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app

View File

@@ -5,7 +5,7 @@ description = "AWOOOI BFF API Gateway"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.109.0",
"fastapi>=0.115.0", # Upgraded for starlette 1.0.0 compatibility (claude-agent-sdk)
"uvicorn[standard]>=0.27.0",
"pydantic>=2.5.0",
"pydantic-settings>=2.1.0",
@@ -16,7 +16,7 @@ dependencies = [
# CTO-201: Infrastructure Execution Engine
"kubernetes-asyncio>=29.0.0",
"sqlalchemy[asyncio]>=2.0.0",
"aiosqlite>=0.19.0",
# NOTE: 禁止 aiosqlite/SQLite (AWOOOI 鐵律 #2),使用 asyncpg + PostgreSQL
# OpenTelemetry (SigNoz Integration)
"opentelemetry-api>=1.20.0",
"opentelemetry-sdk>=1.20.0",
@@ -25,8 +25,10 @@ dependencies = [
"opentelemetry-instrumentation-httpx>=0.41b0",
"opentelemetry-instrumentation-logging>=0.41b0",
# Phase 6.4g: leWOOOgo Brain - 積木化決策引擎
# NOTE: Local package disabled for Docker build compatibility
# "lewooogo-brain", # 待 monorepo Docker 解法 (Phase 6.4i)
# NOTE: Local packages 透過 Dockerfile 預先安裝,無需在此列出
# 請參閱 apps/api/Dockerfile Phase 6.4i 註解
# Phase 9: Agent Teams - Claude Agent SDK
"claude-agent-sdk>=0.1.50",
]
# [tool.uv.sources]
@@ -45,6 +47,9 @@ dev = [
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src"]
[tool.ruff]
target-version = "py311"
line-length = 88

View File

@@ -0,0 +1,29 @@
"""
AWOOOI Agent Teams - Phase 9.3
==============================
三個專家 Agent 實作,使用 Claude Agent SDK (ADR-009)
Agents:
- SecurityAgent: 安全風險評估 (Risk Score 0-10)
- BlastRadiusAgent: 影響範圍分析 (low/medium/high/critical)
- ActionPlannerAgent: 執行計畫生成 (ActionPlan + Rollback)
符合 leWOOOgo BRAIN 積木介面
"""
from src.agents.base import BaseAgent, AgentResult
from src.agents.security import SecurityAgent, SecurityResult
from src.agents.blast_radius import BlastRadiusAgent, BlastRadiusResult
from src.agents.action_planner import ActionPlannerAgent, ActionPlan
__all__ = [
"BaseAgent",
"AgentResult",
"SecurityAgent",
"SecurityResult",
"BlastRadiusAgent",
"BlastRadiusResult",
"ActionPlannerAgent",
"ActionPlan",
]

View File

@@ -0,0 +1,570 @@
"""
Action Planner Agent - 執行計畫生成專家
========================================
職責:
- 生成結構化執行計畫
- 定義 rollback 策略
- 設定驗證步驟
- 回傳完整 ActionPlan
符合 ADR-009 ActionPlannerAgent 規範
"""
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
import structlog
from src.agents.base import AgentResult, AgentStatus, BaseAgent
logger = structlog.get_logger(__name__)
# =============================================================================
# Action Plan Types
# =============================================================================
class ActionType(str, Enum):
"""執行動作類型"""
RESTART = "restart" # 重啟服務
SCALE = "scale" # 擴縮容
ROLLBACK = "rollback" # 回滾版本
DELETE = "delete" # 刪除資源
PATCH = "patch" # 修補配置
EXEC = "exec" # 執行指令
APPLY = "apply" # 應用變更
CUSTOM = "custom" # 自訂
class ActionPhase(str, Enum):
"""執行階段"""
PRE_CHECK = "pre_check" # 前置檢查
EXECUTE = "execute" # 主要執行
VERIFY = "verify" # 驗證結果
ROLLBACK = "rollback" # 回滾 (如果失敗)
@dataclass
class ActionStep:
"""
單一執行步驟
包含:
- command: 要執行的指令
- description: 步驟說明
- phase: 執行階段
- timeout_sec: 超時時間
- can_fail: 是否允許失敗
"""
command: str
description: str
phase: ActionPhase
timeout_sec: int = 60
can_fail: bool = False
order: int = 0
def to_dict(self) -> dict[str, Any]:
return {
"command": self.command,
"description": self.description,
"phase": self.phase.value,
"timeout_sec": self.timeout_sec,
"can_fail": self.can_fail,
"order": self.order,
}
@dataclass
class ActionPlan(AgentResult):
"""
ActionPlannerAgent 分析結果
完整的執行計畫,包含:
- action_type: 動作類型
- pre_check_steps: 前置檢查
- execute_steps: 主要執行步驟
- verify_steps: 驗證步驟
- rollback_steps: 回滾步驟
- estimated_duration: 預估執行時間
"""
action_type: ActionType = ActionType.CUSTOM
pre_check_steps: list[ActionStep] = field(default_factory=list)
execute_steps: list[ActionStep] = field(default_factory=list)
verify_steps: list[ActionStep] = field(default_factory=list)
rollback_steps: list[ActionStep] = field(default_factory=list)
estimated_duration_sec: int = 0
requires_approval: bool = True
kubectl_commands: list[str] = field(default_factory=list)
def to_dict(self) -> dict[str, Any]:
"""轉換為 dict"""
base = super().to_dict()
base.update({
"action_type": self.action_type.value,
"pre_check_steps": [s.to_dict() for s in self.pre_check_steps],
"execute_steps": [s.to_dict() for s in self.execute_steps],
"verify_steps": [s.to_dict() for s in self.verify_steps],
"rollback_steps": [s.to_dict() for s in self.rollback_steps],
"estimated_duration_sec": self.estimated_duration_sec,
"requires_approval": self.requires_approval,
"kubectl_commands": self.kubectl_commands,
})
return base
def get_all_steps(self) -> list[ActionStep]:
"""取得所有步驟 (按順序)"""
all_steps = (
self.pre_check_steps
+ self.execute_steps
+ self.verify_steps
)
return sorted(all_steps, key=lambda s: s.order)
def get_primary_command(self) -> str | None:
"""取得主要執行指令"""
if self.execute_steps:
return self.execute_steps[0].command
return None
# =============================================================================
# Action Templates
# =============================================================================
# 預定義的執行計畫模板
ACTION_TEMPLATES: dict[str, dict[str, Any]] = {
"restart": {
"action_type": ActionType.RESTART,
"requires_approval": False, # 重啟相對安全
"pre_check": [
{
"command": "kubectl get deployment {target} -n {namespace} -o wide",
"description": "確認目標 Deployment 存在且健康",
},
{
"command": "kubectl get pods -l app={target} -n {namespace} --no-headers | wc -l",
"description": "確認目前 Pod 數量",
},
],
"execute": [
{
"command": "kubectl rollout restart deployment/{target} -n {namespace}",
"description": "執行滾動重啟",
},
],
"verify": [
{
"command": "kubectl rollout status deployment/{target} -n {namespace} --timeout=120s",
"description": "等待滾動更新完成",
"timeout_sec": 120,
},
{
"command": "kubectl get pods -l app={target} -n {namespace} -o wide",
"description": "確認新 Pod 狀態",
},
],
"rollback": [
{
"command": "kubectl rollout undo deployment/{target} -n {namespace}",
"description": "回滾到上一個版本",
},
],
},
"scale": {
"action_type": ActionType.SCALE,
"requires_approval": False,
"pre_check": [
{
"command": "kubectl get deployment {target} -n {namespace} -o jsonpath='{.spec.replicas}'",
"description": "記錄目前副本數",
},
],
"execute": [
{
"command": "kubectl scale deployment/{target} --replicas={replicas} -n {namespace}",
"description": "調整副本數至 {replicas}",
},
],
"verify": [
{
"command": "kubectl rollout status deployment/{target} -n {namespace} --timeout=60s",
"description": "等待擴縮容完成",
"timeout_sec": 60,
},
],
"rollback": [
{
"command": "kubectl scale deployment/{target} --replicas={original_replicas} -n {namespace}",
"description": "恢復原始副本數",
},
],
},
"rollback": {
"action_type": ActionType.ROLLBACK,
"requires_approval": True, # 回滾需要審核
"pre_check": [
{
"command": "kubectl rollout history deployment/{target} -n {namespace}",
"description": "查看版本歷史",
},
],
"execute": [
{
"command": "kubectl rollout undo deployment/{target} -n {namespace}",
"description": "回滾到上一個版本",
},
],
"verify": [
{
"command": "kubectl rollout status deployment/{target} -n {namespace} --timeout=120s",
"description": "等待回滾完成",
"timeout_sec": 120,
},
{
"command": "kubectl get pods -l app={target} -n {namespace} -o wide",
"description": "確認 Pod 狀態",
},
],
"rollback": [
{
"command": "kubectl rollout undo deployment/{target} -n {namespace}",
"description": "再次回滾 (恢復原版本)",
},
],
},
"delete_pod": {
"action_type": ActionType.DELETE,
"requires_approval": True, # 刪除需要審核
"pre_check": [
{
"command": "kubectl get pod {target} -n {namespace} -o wide",
"description": "確認目標 Pod 存在",
},
],
"execute": [
{
"command": "kubectl delete pod {target} -n {namespace}",
"description": "刪除異常 Pod (觸發重建)",
},
],
"verify": [
{
"command": "kubectl get pods -n {namespace} | grep -v Completed | grep -v Terminating",
"description": "確認新 Pod 已建立",
"can_fail": True,
},
],
"rollback": [], # 刪除 Pod 無法回滾,但 Deployment 會自動重建
},
}
class ActionPlannerAgent(BaseAgent[ActionPlan]):
"""
執行計畫生成專家 Agent
分析流程:
1. 解析輸入的問題/指令
2. 匹配最佳執行模板
3. 填充參數生成完整計畫
4. 計算預估執行時間
使用方式:
```python
agent = ActionPlannerAgent()
result = await agent.analyze({
"problem": "Pod 頻繁重啟",
"target_service": "api",
"namespace": "awoooi-prod",
})
print(result.execute_steps) # [ActionStep(...), ...]
```
"""
AGENT_NAME = "action-planner"
AGENT_DESCRIPTION = "行動規劃師,制定修復步驟與回滾方案"
AGENT_TOOLS = ["Read", "Glob"]
def __init__(
self,
timeout_sec: float = 30.0,
default_namespace: str = "awoooi-prod",
):
"""
初始化 ActionPlannerAgent
Args:
timeout_sec: 執行超時時間
default_namespace: 預設命名空間
"""
super().__init__(timeout_sec)
self.default_namespace = default_namespace
async def analyze(self, context: dict[str, Any]) -> ActionPlan:
"""
生成執行計畫
Args:
context: 分析上下文
- problem: 問題描述
- suggested_action: 建議的動作 (restart/scale/rollback)
- target_service: 目標服務
- namespace: 命名空間
- replicas: 副本數 (scale 用)
Returns:
ActionPlan 包含完整執行計畫
"""
start_time = time.time()
self.logger.info(
"action_planning_start",
problem=context.get("problem", "")[:100],
target=context.get("target_service"),
)
try:
# 1. 決定動作類型
action_type = self._determine_action_type(context)
# 2. 取得模板
template = ACTION_TEMPLATES.get(action_type, ACTION_TEMPLATES["restart"])
# 3. 準備參數
params = self._prepare_params(context)
# 4. 生成步驟
pre_check_steps = self._generate_steps(
template.get("pre_check", []),
params,
ActionPhase.PRE_CHECK,
)
execute_steps = self._generate_steps(
template.get("execute", []),
params,
ActionPhase.EXECUTE,
)
verify_steps = self._generate_steps(
template.get("verify", []),
params,
ActionPhase.VERIFY,
)
rollback_steps = self._generate_steps(
template.get("rollback", []),
params,
ActionPhase.ROLLBACK,
)
# 5. 計算預估時間
estimated_duration = self._estimate_duration(
pre_check_steps + execute_steps + verify_steps
)
# 6. 提取主要 kubectl 指令
kubectl_commands = [
step.command for step in execute_steps
if step.command.startswith("kubectl")
]
latency_ms = int((time.time() - start_time) * 1000)
# 7. 生成分析摘要
analysis = self._generate_analysis(
template["action_type"],
params.get("target", "unknown"),
len(execute_steps),
)
result = ActionPlan(
agent_name=self.AGENT_NAME,
status=AgentStatus.SUCCESS,
confidence=0.9,
analysis=analysis,
latency_ms=latency_ms,
action_type=template["action_type"],
pre_check_steps=pre_check_steps,
execute_steps=execute_steps,
verify_steps=verify_steps,
rollback_steps=rollback_steps,
estimated_duration_sec=estimated_duration,
requires_approval=template.get("requires_approval", True),
kubectl_commands=kubectl_commands,
)
self.logger.info(
"action_planning_complete",
action_type=result.action_type.value,
step_count=len(execute_steps),
latency_ms=latency_ms,
)
return result
except Exception as e:
latency_ms = int((time.time() - start_time) * 1000)
self.logger.exception(
"action_planning_error",
error=str(e),
)
return ActionPlan(
agent_name=self.AGENT_NAME,
status=AgentStatus.FAILED,
confidence=0.0,
analysis=f"計畫生成失敗: {str(e)}",
latency_ms=latency_ms,
error=str(e),
requires_approval=True,
)
def _determine_action_type(self, context: dict[str, Any]) -> str:
"""
根據上下文決定最佳動作類型
解析 problem 或 suggested_action 來決定
"""
# 如果有明確指定
suggested = context.get("suggested_action", "").lower()
if suggested in ACTION_TEMPLATES:
return suggested
# 從 problem 推斷
problem = context.get("problem", "").lower()
# 關鍵字匹配
if any(kw in problem for kw in ["crash", "restart", "oom", "killed"]):
return "restart"
if any(kw in problem for kw in ["slow", "latency", "capacity", "scale"]):
return "scale"
if any(kw in problem for kw in ["error", "failed", "rollback", "undo"]):
return "rollback"
if any(kw in problem for kw in ["stuck", "pending", "delete pod"]):
return "delete_pod"
# 預設: 重啟 (最安全)
return "restart"
def _prepare_params(self, context: dict[str, Any]) -> dict[str, str]:
"""準備模板參數"""
target = context.get("target_service", "unknown")
namespace = context.get("namespace", self.default_namespace)
# 處理 target 可能是列表的情況
if isinstance(target, list):
target = target[0] if target else "unknown"
return {
"target": target,
"namespace": namespace,
"replicas": str(context.get("replicas", 3)),
"original_replicas": str(context.get("original_replicas", 1)),
}
def _generate_steps(
self,
template_steps: list[dict[str, Any]],
params: dict[str, str],
phase: ActionPhase,
) -> list[ActionStep]:
"""從模板生成實際步驟"""
steps: list[ActionStep] = []
for i, tmpl in enumerate(template_steps):
command = tmpl["command"].format(**params)
description = tmpl["description"].format(**params)
steps.append(ActionStep(
command=command,
description=description,
phase=phase,
timeout_sec=tmpl.get("timeout_sec", 60),
can_fail=tmpl.get("can_fail", False),
order=i,
))
return steps
def _estimate_duration(self, steps: list[ActionStep]) -> int:
"""估計執行時間 (秒)"""
total = 0
for step in steps:
# 假設每個步驟平均執行時間為 timeout 的 1/3
total += step.timeout_sec // 3
return max(total, 30) # 最少 30 秒
def _generate_analysis(
self,
action_type: ActionType,
target: str,
step_count: int,
) -> str:
"""生成分析摘要"""
action_desc = {
ActionType.RESTART: "滾動重啟",
ActionType.SCALE: "擴縮容",
ActionType.ROLLBACK: "版本回滾",
ActionType.DELETE: "資源清理",
ActionType.PATCH: "配置修補",
ActionType.APPLY: "配置應用",
ActionType.EXEC: "指令執行",
ActionType.CUSTOM: "自訂操作",
}
return (
f"建議執行 {action_desc.get(action_type, '操作')} "
f"{target},共 {step_count} 個步驟"
)
def _build_prompt(self, context: dict[str, Any]) -> str:
"""建構 LLM Prompt (Phase 9.4 擴展)"""
return f"""你是 AWOOOI 的行動規劃師。
根據以下問題制定修復計畫:
問題描述: {context.get("problem", "N/A")}
目標服務: {context.get("target_service", "N/A")}
命名空間: {context.get("namespace", "awoooi-prod")}
注意:
- 所有 kubectl 必須帶 -n {{namespace}}
- 必須包含前置檢查、執行步驟、驗證步驟、回滾方案
輸出 JSON:
```json
{{
"action_type": "restart|scale|rollback|delete",
"pre_check_steps": [
{{"command": "kubectl ...", "description": "..."}}
],
"execute_steps": [
{{"command": "kubectl ...", "description": "..."}}
],
"verify_steps": [
{{"command": "kubectl ...", "description": "..."}}
],
"rollback_steps": [
{{"command": "kubectl ...", "description": "..."}}
],
"estimated_duration_sec": 60,
"analysis": "一句話摘要",
"confidence": 0-1
}}
```"""
def _parse_response(self, response: str) -> dict[str, Any]:
"""解析 LLM 回應"""
return self._extract_json(response)

192
apps/api/src/agents/base.py Normal file
View File

@@ -0,0 +1,192 @@
"""
Base Agent - 專家 Agent 基礎類別
================================
定義所有專家 Agent 的共用介面和工具
使用 claude-agent-sdk 的 AgentDefinition
符合 ADR-009 架構規範
"""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Generic, TypeVar
import structlog
logger = structlog.get_logger(__name__)
# =============================================================================
# Agent Result Base
# =============================================================================
class AgentStatus(str, Enum):
"""Agent 執行狀態"""
PENDING = "pending"
RUNNING = "running"
SUCCESS = "success"
FAILED = "failed"
TIMEOUT = "timeout"
@dataclass
class AgentResult:
"""
Agent 執行結果基類
所有專家 Agent 的輸出都必須包含:
- agent_name: 識別哪個 Agent
- status: 執行狀態
- confidence: 信心分數 (0-1)
- analysis: 分析摘要
- latency_ms: 執行時間
"""
agent_name: str
status: AgentStatus
confidence: float
analysis: str
latency_ms: int
error: str | None = None
raw_response: dict[str, Any] = field(default_factory=dict)
timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
def to_dict(self) -> dict[str, Any]:
"""轉換為 dict (API 回傳用)"""
return {
"agent_name": self.agent_name,
"status": self.status.value,
"confidence": self.confidence,
"analysis": self.analysis,
"latency_ms": self.latency_ms,
"error": self.error,
"timestamp": self.timestamp.isoformat(),
}
# =============================================================================
# Base Agent
# =============================================================================
T = TypeVar("T", bound=AgentResult)
class BaseAgent(ABC, Generic[T]):
"""
專家 Agent 基礎類別
所有專家 Agent 都繼承此類別,並實作:
- analyze(): 核心分析邏輯
- _build_prompt(): 建構 Prompt
- _parse_response(): 解析回應
使用方式:
```python
agent = SecurityAgent()
result = await agent.analyze(incident_context)
```
"""
# Agent 識別資訊 (子類別覆寫)
AGENT_NAME: str = "base"
AGENT_DESCRIPTION: str = "Base Agent"
AGENT_TOOLS: list[str] = ["Read", "Grep"]
def __init__(self, timeout_sec: float = 30.0):
"""
初始化 Agent
Args:
timeout_sec: 執行超時時間 (秒)
"""
self.timeout_sec = timeout_sec
self.logger = logger.bind(agent=self.AGENT_NAME)
@abstractmethod
async def analyze(self, context: dict[str, Any]) -> T:
"""
執行分析 (子類別必須實作)
Args:
context: 分析上下文 (incident 資訊)
Returns:
AgentResult 子類別實例
"""
pass
@abstractmethod
def _build_prompt(self, context: dict[str, Any]) -> str:
"""
建構 Prompt (子類別必須實作)
Args:
context: 分析上下文
Returns:
給 LLM 的 Prompt
"""
pass
@abstractmethod
def _parse_response(self, response: str) -> dict[str, Any]:
"""
解析 LLM 回應 (子類別必須實作)
Args:
response: LLM 原始回應
Returns:
解析後的結構化資料
"""
pass
def _extract_json(self, text: str) -> dict[str, Any]:
"""
從 LLM 回應中提取 JSON
支援:
- ```json ... ``` 區塊
- 純 JSON 文字
"""
import json
import re
# 嘗試 ```json ... ``` 格式
match = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
if match:
try:
return json.loads(match.group(1))
except json.JSONDecodeError:
pass
# 嘗試 { ... } 格式
match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
if match:
try:
return json.loads(match.group(0))
except json.JSONDecodeError:
pass
# 嘗試整段解析
try:
return json.loads(text)
except json.JSONDecodeError:
self.logger.warning("json_parse_failed", text=text[:200])
return {}
def _get_agent_definition(self) -> dict[str, Any]:
"""
取得 Claude Agent SDK 的 AgentDefinition
Returns:
符合 SDK 規範的 AgentDefinition dict
"""
return {
"name": self.AGENT_NAME,
"description": self.AGENT_DESCRIPTION,
"tools": self.AGENT_TOOLS,
}

View File

@@ -0,0 +1,525 @@
"""
Blast Radius Agent - 影響範圍分析專家
======================================
職責:
- 評估操作的影響範圍
- 識別受影響的服務和依賴
- 估計使用者影響人數
- 回傳影響等級 (low/medium/high/critical)
符合 ADR-009 BlastRadiusAgent 規範
"""
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
import structlog
from src.agents.base import AgentResult, AgentStatus, BaseAgent
logger = structlog.get_logger(__name__)
# =============================================================================
# Blast Radius Types
# =============================================================================
class ImpactLevel(str, Enum):
"""影響等級"""
LOW = "low" # 單一服務,<100 用戶
MEDIUM = "medium" # 2-5 服務100-1000 用戶
HIGH = "high" # 5-10 服務1000-10000 用戶
CRITICAL = "critical" # >10 服務,>10000 用戶或核心服務
@dataclass
class AffectedService:
"""受影響服務"""
name: str
impact_type: str # direct, indirect, transitive
confidence: float
reason: str
def to_dict(self) -> dict[str, Any]:
return {
"name": self.name,
"impact_type": self.impact_type,
"confidence": self.confidence,
"reason": self.reason,
}
@dataclass
class BlastRadiusResult(AgentResult):
"""
BlastRadiusAgent 分析結果
額外欄位:
- impact_level: 影響等級 (low/medium/high/critical)
- affected_services: 受影響服務列表
- estimated_users: 估計影響用戶數
- dependency_chain: 依賴鏈
- recovery_time_estimate: 預估恢復時間 (分鐘)
"""
impact_level: ImpactLevel = ImpactLevel.LOW
affected_services: list[AffectedService] = field(default_factory=list)
estimated_users: int = 0
dependency_chain: list[str] = field(default_factory=list)
recovery_time_estimate: int = 0
def to_dict(self) -> dict[str, Any]:
"""轉換為 dict"""
base = super().to_dict()
base.update({
"impact_level": self.impact_level.value,
"affected_services": [s.to_dict() for s in self.affected_services],
"estimated_users": self.estimated_users,
"dependency_chain": self.dependency_chain,
"recovery_time_estimate": self.recovery_time_estimate,
})
return base
# =============================================================================
# Service Dependency Graph (簡化版)
# =============================================================================
# AWOOOI 服務依賴圖 (簡化版,實際應從 GraphRAG 讀取)
SERVICE_DEPENDENCIES: dict[str, dict[str, Any]] = {
# === Core Services ===
"api": {
"dependencies": ["postgres", "redis", "openclaw"],
"dependents": ["web", "telegram-gateway"],
"criticality": "critical",
"estimated_users": 5000,
},
"web": {
"dependencies": ["api"],
"dependents": [],
"criticality": "high",
"estimated_users": 3000,
},
"openclaw": {
"dependencies": ["redis", "ollama"],
"dependents": ["api"],
"criticality": "critical",
"estimated_users": 5000,
},
# === Infrastructure ===
"postgres": {
"dependencies": [],
"dependents": ["api", "signoz"],
"criticality": "critical",
"estimated_users": 10000,
},
"redis": {
"dependencies": [],
"dependents": ["api", "openclaw", "signal-worker"],
"criticality": "critical",
"estimated_users": 8000,
},
"ollama": {
"dependencies": [],
"dependents": ["openclaw"],
"criticality": "high",
"estimated_users": 2000,
},
# === Workers ===
"signal-worker": {
"dependencies": ["redis", "api"],
"dependents": [],
"criticality": "medium",
"estimated_users": 500,
},
"telegram-gateway": {
"dependencies": ["api"],
"dependents": [],
"criticality": "medium",
"estimated_users": 1000,
},
# === Observability ===
"signoz": {
"dependencies": ["postgres"],
"dependents": [],
"criticality": "low",
"estimated_users": 100,
},
"prometheus": {
"dependencies": [],
"dependents": [],
"criticality": "low",
"estimated_users": 50,
},
}
class BlastRadiusAgent(BaseAgent[BlastRadiusResult]):
"""
影響範圍分析專家 Agent
分析流程:
1. 識別直接影響的服務
2. 遍歷依賴圖找出間接影響
3. 計算總影響用戶數
4. 判定影響等級
使用方式:
```python
agent = BlastRadiusAgent()
result = await agent.analyze({
"target_service": "api",
"action": "kubectl rollout restart",
"namespace": "awoooi-prod",
})
print(result.impact_level) # ImpactLevel.CRITICAL
```
"""
AGENT_NAME = "blast-radius"
AGENT_DESCRIPTION = "影響範圍分析師,評估相依服務與影響範圍"
AGENT_TOOLS = ["Read", "Glob", "Grep"]
def __init__(
self,
timeout_sec: float = 30.0,
dependency_graph: dict[str, dict[str, Any]] | None = None,
):
"""
初始化 BlastRadiusAgent
Args:
timeout_sec: 執行超時時間
dependency_graph: 自訂依賴圖 (測試用)
"""
super().__init__(timeout_sec)
self.dependency_graph = dependency_graph or SERVICE_DEPENDENCIES
async def analyze(self, context: dict[str, Any]) -> BlastRadiusResult:
"""
執行影響範圍分析
Args:
context: 分析上下文
- target_service: 目標服務 (可以是列表)
- action: 執行的操作
- namespace: 命名空間
Returns:
BlastRadiusResult 包含影響等級和詳細分析
"""
start_time = time.time()
self.logger.info(
"blast_radius_analysis_start",
target=context.get("target_service"),
action=context.get("action", "")[:50],
)
try:
# 取得目標服務列表
target_services = context.get("target_service", [])
if isinstance(target_services, str):
target_services = [target_services]
# 分析每個目標服務的影響
all_affected: list[AffectedService] = []
total_users = 0
dependency_chain: list[str] = []
for target in target_services:
affected, users, chain = self._analyze_service_impact(target)
all_affected.extend(affected)
total_users = max(total_users, users) # 取最大值避免重複計算
dependency_chain.extend(chain)
# 去重
seen_services = set()
unique_affected: list[AffectedService] = []
for svc in all_affected:
if svc.name not in seen_services:
seen_services.add(svc.name)
unique_affected.append(svc)
# 判定影響等級
impact_level = self._calculate_impact_level(
len(unique_affected),
total_users,
unique_affected,
)
# 估計恢復時間
recovery_time = self._estimate_recovery_time(impact_level, len(unique_affected))
latency_ms = int((time.time() - start_time) * 1000)
# 生成分析摘要
analysis = self._generate_analysis(
impact_level,
len(unique_affected),
total_users,
)
result = BlastRadiusResult(
agent_name=self.AGENT_NAME,
status=AgentStatus.SUCCESS,
confidence=0.85, # 基於依賴圖的信心分數
analysis=analysis,
latency_ms=latency_ms,
impact_level=impact_level,
affected_services=unique_affected,
estimated_users=total_users,
dependency_chain=list(set(dependency_chain)),
recovery_time_estimate=recovery_time,
)
self.logger.info(
"blast_radius_analysis_complete",
impact_level=impact_level.value,
affected_count=len(unique_affected),
estimated_users=total_users,
latency_ms=latency_ms,
)
return result
except Exception as e:
latency_ms = int((time.time() - start_time) * 1000)
self.logger.exception(
"blast_radius_analysis_error",
error=str(e),
)
return BlastRadiusResult(
agent_name=self.AGENT_NAME,
status=AgentStatus.FAILED,
confidence=0.0,
analysis=f"分析失敗: {str(e)}",
latency_ms=latency_ms,
error=str(e),
impact_level=ImpactLevel.CRITICAL, # 失敗時假設最大影響
)
def _analyze_service_impact(
self,
target_service: str,
) -> tuple[list[AffectedService], int, list[str]]:
"""
分析單一服務的影響
Returns:
(受影響服務列表, 估計用戶數, 依賴鏈)
"""
affected: list[AffectedService] = []
visited: set[str] = set()
dependency_chain: list[str] = []
total_users = 0
# 標準化服務名稱
target_key = self._normalize_service_name(target_service)
if target_key not in self.dependency_graph:
# 未知服務,假設中等影響
affected.append(AffectedService(
name=target_service,
impact_type="direct",
confidence=0.5,
reason="未知服務,無法確定依賴關係",
))
return affected, 1000, [target_service]
# 1. 直接影響 (目標服務本身)
target_info = self.dependency_graph[target_key]
affected.append(AffectedService(
name=target_key,
impact_type="direct",
confidence=1.0,
reason="目標服務",
))
total_users += target_info.get("estimated_users", 0)
dependency_chain.append(target_key)
visited.add(target_key)
# 2. 依賴此服務的上游 (dependents)
self._find_dependents(
target_key,
affected,
visited,
dependency_chain,
depth=0,
max_depth=3,
)
# 計算總用戶數
for svc in affected:
if svc.name in self.dependency_graph:
total_users += self.dependency_graph[svc.name].get("estimated_users", 0)
return affected, total_users, dependency_chain
def _find_dependents(
self,
service: str,
affected: list[AffectedService],
visited: set[str],
chain: list[str],
depth: int,
max_depth: int,
) -> None:
"""遞迴查找依賴此服務的上游"""
if depth >= max_depth:
return
if service not in self.dependency_graph:
return
dependents = self.dependency_graph[service].get("dependents", [])
for dep in dependents:
if dep in visited:
continue
visited.add(dep)
chain.append(dep)
impact_type = "indirect" if depth == 0 else "transitive"
confidence = 0.9 - (depth * 0.1)
affected.append(AffectedService(
name=dep,
impact_type=impact_type,
confidence=confidence,
reason=f"依賴 {service}",
))
# 遞迴查找
self._find_dependents(
dep,
affected,
visited,
chain,
depth + 1,
max_depth,
)
def _normalize_service_name(self, service: str) -> str:
"""標準化服務名稱"""
# 移除常見後綴
service = service.lower()
for suffix in ["-deployment", "-svc", "-service", "-pod"]:
if service.endswith(suffix):
service = service[: -len(suffix)]
# 處理常見別名
aliases = {
"awoooi-api": "api",
"awoooi-web": "web",
"nginx": "web",
"frontend": "web",
"backend": "api",
"database": "postgres",
"db": "postgres",
"cache": "redis",
}
return aliases.get(service, service)
def _calculate_impact_level(
self,
service_count: int,
user_count: int,
affected: list[AffectedService],
) -> ImpactLevel:
"""計算影響等級"""
# 檢查是否有 critical 服務
has_critical = any(
svc.name in self.dependency_graph
and self.dependency_graph[svc.name].get("criticality") == "critical"
for svc in affected
)
if has_critical or service_count > 10 or user_count > 10000:
return ImpactLevel.CRITICAL
if service_count > 5 or user_count > 1000:
return ImpactLevel.HIGH
if service_count > 2 or user_count > 100:
return ImpactLevel.MEDIUM
return ImpactLevel.LOW
def _estimate_recovery_time(
self,
impact_level: ImpactLevel,
service_count: int,
) -> int:
"""估計恢復時間 (分鐘)"""
base_time = {
ImpactLevel.LOW: 5,
ImpactLevel.MEDIUM: 15,
ImpactLevel.HIGH: 30,
ImpactLevel.CRITICAL: 60,
}
# 每多一個服務增加 5 分鐘
return base_time[impact_level] + (service_count * 5)
def _generate_analysis(
self,
impact_level: ImpactLevel,
service_count: int,
user_count: int,
) -> str:
"""生成分析摘要"""
level_desc = {
ImpactLevel.LOW: "低影響",
ImpactLevel.MEDIUM: "中等影響",
ImpactLevel.HIGH: "高影響",
ImpactLevel.CRITICAL: "嚴重影響",
}
return (
f"{level_desc[impact_level]}: "
f"影響 {service_count} 個服務,預估 {user_count:,} 用戶受影響"
)
def _build_prompt(self, context: dict[str, Any]) -> str:
"""建構 LLM Prompt (Phase 9.4 擴展)"""
return f"""你是 AWOOOI 的影響範圍分析師。
分析以下操作的影響範圍:
目標服務: {context.get("target_service", "N/A")}
操作: {context.get("action", "N/A")}
命名空間: {context.get("namespace", "N/A")}
評估:
1. 直接影響的服務
2. 間接相依的服務
3. 使用者影響人數估計
輸出 JSON:
```json
{{
"impact_level": "low|medium|high|critical",
"affected_services": [
{{"name": "...", "impact_type": "direct|indirect", "reason": "..."}}
],
"estimated_users": 0,
"dependency_chain": ["service1", "service2"],
"analysis": "一句話摘要",
"confidence": 0-1
}}
```"""
def _parse_response(self, response: str) -> dict[str, Any]:
"""解析 LLM 回應"""
return self._extract_json(response)

View File

@@ -0,0 +1,332 @@
"""
Security Agent - 安全風險評估專家
=================================
職責:
- 分析提案的安全風險
- 檢查權限邊界
- 評估潛在漏洞
- 回傳風險評分 (0-10)
符合 ADR-009 SecurityAgent 規範
"""
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any
import structlog
from src.agents.base import AgentResult, AgentStatus, BaseAgent
logger = structlog.get_logger(__name__)
# =============================================================================
# Security Result
# =============================================================================
@dataclass
class SecurityResult(AgentResult):
"""
SecurityAgent 分析結果
額外欄位:
- risk_score: 風險評分 (0-10, 10 最高風險)
- risk_factors: 風險因素列表
- permission_issues: 權限問題
- recommendations: 安全建議
"""
risk_score: float = 0.0
risk_factors: list[str] = field(default_factory=list)
permission_issues: list[str] = field(default_factory=list)
recommendations: list[str] = field(default_factory=list)
def to_dict(self) -> dict[str, Any]:
"""轉換為 dict"""
base = super().to_dict()
base.update({
"risk_score": self.risk_score,
"risk_factors": self.risk_factors,
"permission_issues": self.permission_issues,
"recommendations": self.recommendations,
})
return base
# =============================================================================
# Security Agent
# =============================================================================
# 安全規則引擎 (本地快速檢查)
SECURITY_RULES: dict[str, dict[str, Any]] = {
"delete_operation": {
"patterns": ["delete", "rm", "remove", "destroy", "drop"],
"risk_score": 8.0,
"factor": "破壞性操作: 涉及刪除資源",
"recommendation": "確保有備份,並考慮使用 --dry-run 先行測試",
},
"force_operation": {
"patterns": ["--force", "-f", "--no-wait", "--grace-period=0"],
"risk_score": 7.0,
"factor": "強制操作: 跳過安全確認",
"recommendation": "移除 --force 參數,使用標準流程",
},
"privileged_namespace": {
"patterns": ["kube-system", "kube-public", "default"],
"risk_score": 9.0,
"factor": "敏感命名空間: 操作影響 K8s 核心組件",
"recommendation": "確認是否真的需要操作系統命名空間",
},
"secret_operation": {
"patterns": ["secret", "configmap", "credential", "password", "token"],
"risk_score": 8.5,
"factor": "敏感資料: 操作涉及機密資訊",
"recommendation": "確保日誌不會記錄機密內容",
},
"network_policy": {
"patterns": ["networkpolicy", "ingress", "egress", "firewall"],
"risk_score": 7.5,
"factor": "網路變更: 可能影響服務連通性",
"recommendation": "變更前確認流量影響範圍",
},
"rbac_operation": {
"patterns": ["role", "rolebinding", "clusterrole", "serviceaccount"],
"risk_score": 9.0,
"factor": "權限變更: 操作涉及 RBAC 設定",
"recommendation": "最小權限原則,避免過度授權",
},
"scale_to_zero": {
"patterns": ["replicas=0", "replicas 0", "scale --replicas=0"],
"risk_score": 8.0,
"factor": "服務中斷: 副本數設為 0",
"recommendation": "確認是否為計畫性維護",
},
"rollback": {
"patterns": ["rollout undo", "rollback"],
"risk_score": 5.0,
"factor": "回滾操作: 相對安全但需確認目標版本",
"recommendation": "確認回滾目標版本是穩定的",
},
"restart": {
"patterns": ["rollout restart", "restart"],
"risk_score": 3.0,
"factor": "重啟操作: 低風險但可能造成短暫中斷",
"recommendation": "確認服務有足夠副本處理滾動重啟",
},
}
class SecurityAgent(BaseAgent[SecurityResult]):
"""
安全風險評估專家 Agent
分析流程:
1. 本地規則引擎快速掃描 (毫秒級)
2. LLM 深度分析 (可選,複雜場景)
3. 綜合評分
使用方式:
```python
agent = SecurityAgent()
result = await agent.analyze({
"action": "kubectl delete pod nginx-xxx",
"namespace": "awoooi-prod",
"affected_services": ["nginx", "frontend"],
})
print(result.risk_score) # 0-10
```
"""
AGENT_NAME = "security-expert"
AGENT_DESCRIPTION = "資安專家,評估安全風險與權限影響"
AGENT_TOOLS = ["Read", "Grep"] # 只讀權限
def __init__(self, timeout_sec: float = 30.0, use_llm: bool = False):
"""
初始化 SecurityAgent
Args:
timeout_sec: 執行超時時間
use_llm: 是否啟用 LLM 深度分析 (Phase 9.4 擴展)
"""
super().__init__(timeout_sec)
self.use_llm = use_llm
async def analyze(self, context: dict[str, Any]) -> SecurityResult:
"""
執行安全風險分析
Args:
context: 分析上下文
- action: 要執行的指令
- namespace: 目標命名空間
- affected_services: 受影響服務列表
- incident_id: 事件 ID (可選)
Returns:
SecurityResult 包含風險評分和詳細分析
"""
start_time = time.time()
self.logger.info(
"security_analysis_start",
action=context.get("action", "")[:100],
namespace=context.get("namespace"),
)
try:
# Phase 1: 本地規則引擎 (同步、快速)
rule_result = self._rule_engine_analyze(context)
# Phase 2: LLM 深度分析 (可選,未來擴展)
if self.use_llm and rule_result["risk_score"] >= 7.0:
# 高風險場景啟用 LLM 二次確認
# TODO: Phase 9.4 實作 LLM 分析
pass
latency_ms = int((time.time() - start_time) * 1000)
result = SecurityResult(
agent_name=self.AGENT_NAME,
status=AgentStatus.SUCCESS,
confidence=rule_result["confidence"],
analysis=rule_result["analysis"],
latency_ms=latency_ms,
risk_score=rule_result["risk_score"],
risk_factors=rule_result["risk_factors"],
permission_issues=rule_result["permission_issues"],
recommendations=rule_result["recommendations"],
raw_response=rule_result,
)
self.logger.info(
"security_analysis_complete",
risk_score=result.risk_score,
latency_ms=latency_ms,
)
return result
except Exception as e:
latency_ms = int((time.time() - start_time) * 1000)
self.logger.exception(
"security_analysis_error",
error=str(e),
)
return SecurityResult(
agent_name=self.AGENT_NAME,
status=AgentStatus.FAILED,
confidence=0.0,
analysis=f"分析失敗: {str(e)}",
latency_ms=latency_ms,
error=str(e),
risk_score=10.0, # 失敗時預設最高風險
risk_factors=["分析過程發生錯誤"],
recommendations=["請人工審核此操作"],
)
def _rule_engine_analyze(self, context: dict[str, Any]) -> dict[str, Any]:
"""
本地規則引擎分析
快速檢查常見安全模式,毫秒級回應
"""
action = context.get("action", "").lower()
namespace = context.get("namespace", "").lower()
affected_services = context.get("affected_services", [])
risk_factors: list[str] = []
recommendations: list[str] = []
permission_issues: list[str] = []
max_risk_score: float = 0.0
# 掃描所有安全規則
for rule_name, rule in SECURITY_RULES.items():
patterns = rule["patterns"]
# 檢查 action
if any(pattern in action for pattern in patterns):
risk_factors.append(rule["factor"])
recommendations.append(rule["recommendation"])
max_risk_score = max(max_risk_score, rule["risk_score"])
# 檢查 namespace
if rule_name == "privileged_namespace":
if any(pattern in namespace for pattern in patterns):
risk_factors.append(rule["factor"])
recommendations.append(rule["recommendation"])
max_risk_score = max(max_risk_score, rule["risk_score"])
# 檢查受影響服務數量
if len(affected_services) > 5:
risk_factors.append(f"大範圍影響: 涉及 {len(affected_services)} 個服務")
max_risk_score = max(max_risk_score, 6.0)
recommendations.append("考慮分批執行,降低爆炸半徑")
# 檢查是否涉及生產環境
if "prod" in namespace:
if max_risk_score < 5.0:
max_risk_score = 5.0 # 生產環境最低風險 5
permission_issues.append("操作目標為生產環境")
# 如果沒有匹配任何規則,給予基礎評分
if not risk_factors:
risk_factors.append("未偵測到明顯風險因素")
max_risk_score = 2.0 # 基礎低風險
# 計算信心分數 (規則匹配越多,信心越高)
confidence = min(0.95, 0.7 + len(risk_factors) * 0.05)
# 生成分析摘要
if max_risk_score >= 8.0:
analysis = f"高風險操作 (Score: {max_risk_score}/10): 建議人工審核"
elif max_risk_score >= 5.0:
analysis = f"中等風險 (Score: {max_risk_score}/10): 確認影響範圍後執行"
else:
analysis = f"低風險操作 (Score: {max_risk_score}/10): 可安全執行"
return {
"risk_score": max_risk_score,
"risk_factors": risk_factors,
"recommendations": list(set(recommendations)), # 去重
"permission_issues": permission_issues,
"confidence": confidence,
"analysis": analysis,
"rules_matched": len(risk_factors),
}
def _build_prompt(self, context: dict[str, Any]) -> str:
"""建構 LLM Prompt (Phase 9.4 擴展)"""
return f"""你是 AWOOOI 的資安專家。
分析以下操作的安全風險:
操作指令: {context.get("action", "N/A")}
目標命名空間: {context.get("namespace", "N/A")}
受影響服務: {", ".join(context.get("affected_services", []))}
評估:
1. 是否涉及敏感資料
2. 是否可能被利用
3. 權限邊界是否被突破
輸出 JSON:
```json
{{
"risk_score": 0-10,
"risk_factors": ["...", "..."],
"permission_issues": ["...", "..."],
"recommendations": ["...", "..."],
"analysis": "一句話摘要",
"confidence": 0-1
}}
```"""
def _parse_response(self, response: str) -> dict[str, Any]:
"""解析 LLM 回應"""
return self._extract_json(response)

View File

@@ -0,0 +1,665 @@
"""
Agent Teams API - Phase 9.5 多專家協作系統
==========================================
Endpoints:
- POST /api/v1/agents/analyze - 觸發 Agent Teams 分析
- GET /api/v1/agents/status/{task_id} - 查詢分析狀態
- GET /api/v1/agents/result/{task_id} - 取得分析結果
- GET /api/v1/agents/stream/{task_id} - SSE 串流進度
Phase 9.4-9.5 核心功能:
1. ConsensusEngine 整合多專家意見
2. BackgroundTasks 執行長時間分析
3. Redis Working Memory 儲存結果
4. SSE 推送即時進度
統帥鐵律:
- 所有分析任務必須可追蹤 (task_id)
- 超過 60 秒的分析必須用 BackgroundTasks
- 結果必須存入 Redis (7 天 TTL)
"""
import asyncio
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Any
from uuid import uuid4
from fastapi import APIRouter, BackgroundTasks, HTTPException, status
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
from src.core.logging import get_logger
from src.core.redis_client import get_redis
from src.core.sse import SSEEvent, EventType, get_publisher
from src.models.incident import Incident, Severity, Signal, IncidentStatus
from src.services.consensus_engine import (
get_consensus_engine,
ConsensusResult,
AgentType,
)
router = APIRouter(prefix="/agents", tags=["Agent Teams"])
logger = get_logger("awoooi.agents")
# =============================================================================
# Constants
# =============================================================================
TASK_PREFIX = "agent_task:"
TASK_TTL = 604800 # 7 天
# =============================================================================
# Task States
# =============================================================================
class TaskState(str, Enum):
"""分析任務狀態"""
PENDING = "pending" # 等待中
ANALYZING = "analyzing" # 分析中
CONSENSUS = "consensus" # 共識計算中
COMPLETED = "completed" # 已完成
FAILED = "failed" # 失敗
# =============================================================================
# Request/Response Models
# =============================================================================
class AnalyzeRequest(BaseModel):
"""分析請求"""
incident_id: str | None = Field(
None,
description="現有 Incident ID (二選一)"
)
# 或直接提供 Incident 資訊
severity: str | None = Field(
None,
description="事件嚴重度 (P0/P1/P2/P3)"
)
affected_services: list[str] | None = Field(
None,
description="受影響服務列表"
)
alert_names: list[str] | None = Field(
None,
description="告警名稱列表"
)
context: dict[str, Any] | None = Field(
None,
description="額外上下文"
)
class AnalyzeResponse(BaseModel):
"""分析回應"""
task_id: str
status: str
message: str
estimated_seconds: int = 30
class TaskStatusResponse(BaseModel):
"""任務狀態回應"""
task_id: str
state: str
progress: int # 0-100
current_step: str | None = None
agents_completed: int = 0
total_agents: int = 4
started_at: str | None = None
completed_at: str | None = None
error: str | None = None
class TaskResultResponse(BaseModel):
"""任務結果回應"""
task_id: str
state: str
consensus_id: str | None = None
incident_id: str | None = None
consensus_score: float | None = None
recommended_action: str | None = None
recommended_kubectl: str | None = None
risk_level: str | None = None
final_reasoning: str | None = None
opinions: list[dict[str, Any]] | None = None
dissenting_opinions: list[str] | None = None
created_at: str | None = None
# =============================================================================
# Background Task Handler
# =============================================================================
async def run_agent_analysis(
task_id: str,
incident: Incident,
) -> None:
"""
背景執行 Agent Teams 分析
流程:
1. 更新狀態為 ANALYZING
2. 收集各專家意見
3. 計算共識
4. 儲存結果
5. 推送 SSE 通知
"""
redis_client = get_redis()
consensus_engine = get_consensus_engine()
task_key = f"{TASK_PREFIX}{task_id}"
try:
# Step 1: 更新狀態
await _update_task_state(
task_id,
TaskState.ANALYZING,
progress=10,
current_step="正在收集專家意見...",
)
# 推送 SSE 進度
publisher = await get_publisher()
await publisher.publish(SSEEvent(
type=EventType.AI_THINKING,
data={
"task_id": task_id,
"state": TaskState.ANALYZING.value,
"progress": 10,
"message": "Agent Teams 分析開始",
},
))
# Step 2: 收集意見 (模擬進度)
opinions = await consensus_engine.gather_opinions(incident, timeout_sec=25.0)
await _update_task_state(
task_id,
TaskState.CONSENSUS,
progress=60,
current_step="正在計算共識...",
agents_completed=len(opinions),
)
await publisher.publish(SSEEvent(
type=EventType.AI_THINKING,
data={
"task_id": task_id,
"state": TaskState.CONSENSUS.value,
"progress": 60,
"message": f"已收集 {len(opinions)} 位專家意見",
},
))
# Step 3: 計算共識
consensus_score, recommended_action, dissenting = consensus_engine.calculate_consensus(opinions)
await _update_task_state(
task_id,
TaskState.CONSENSUS,
progress=80,
current_step="正在產生最終決策...",
)
# Step 4: 產生最終決策
result = await consensus_engine.generate_final_decision(
incident=incident,
opinions=opinions,
consensus_score=consensus_score,
recommended_action_type=recommended_action,
dissenting=dissenting,
)
# Step 5: 儲存完整結果
task_data = {
"task_id": task_id,
"state": TaskState.COMPLETED.value,
"progress": 100,
"current_step": "分析完成",
"agents_completed": len(opinions),
"total_agents": 4,
"consensus_id": result.consensus_id,
"incident_id": incident.incident_id,
"consensus_score": result.consensus_score,
"recommended_action": result.recommended_action,
"recommended_kubectl": result.recommended_kubectl,
"risk_level": result.risk_level,
"final_reasoning": result.final_reasoning,
"opinions": [op.to_dict() for op in result.opinions],
"dissenting_opinions": result.dissenting_opinions,
"completed_at": datetime.now(timezone.utc).isoformat(),
}
await redis_client.set(
task_key,
json.dumps(task_data),
ex=TASK_TTL,
)
# 推送完成通知
await publisher.publish(SSEEvent(
type=EventType.AI_THINKING,
data={
"task_id": task_id,
"state": TaskState.COMPLETED.value,
"progress": 100,
"message": "分析完成",
"consensus_score": result.consensus_score,
"recommended_action": result.recommended_action,
},
))
logger.info(
"agent_analysis_completed",
task_id=task_id,
consensus_id=result.consensus_id,
consensus_score=result.consensus_score,
)
except Exception as e:
logger.exception(
"agent_analysis_failed",
task_id=task_id,
error=str(e),
)
# 更新為失敗狀態
task_data = {
"task_id": task_id,
"state": TaskState.FAILED.value,
"progress": 0,
"error": str(e),
"completed_at": datetime.now(timezone.utc).isoformat(),
}
await redis_client.set(
task_key,
json.dumps(task_data),
ex=TASK_TTL,
)
# 推送失敗通知
publisher = await get_publisher()
await publisher.publish(SSEEvent(
type=EventType.ERROR,
data={
"task_id": task_id,
"state": TaskState.FAILED.value,
"error": str(e),
},
))
async def _update_task_state(
task_id: str,
state: TaskState,
progress: int = 0,
current_step: str | None = None,
agents_completed: int = 0,
) -> None:
"""更新任務狀態"""
redis_client = get_redis()
task_key = f"{TASK_PREFIX}{task_id}"
# 讀取現有資料
existing = await redis_client.get(task_key)
if existing:
task_data = json.loads(existing)
else:
task_data = {"task_id": task_id}
# 更新欄位
task_data.update({
"state": state.value,
"progress": progress,
"current_step": current_step,
"agents_completed": agents_completed,
})
await redis_client.set(
task_key,
json.dumps(task_data),
ex=TASK_TTL,
)
# =============================================================================
# API Endpoints
# =============================================================================
@router.post(
"/analyze",
response_model=AnalyzeResponse,
summary="觸發 Agent Teams 分析",
description="""
觸發多專家協作分析。
可提供:
- 現有 Incident ID (從 Redis 讀取)
- 或直接提供事件資訊 (severity, affected_services, alert_names)
分析在背景執行,使用 task_id 追蹤進度。
專家團隊:
- SRE Agent: 系統穩定性分析
- Security Agent: 資安風險評估
- Cost Agent: 成本效益分析
- Performance Agent: 效能優化建議
""",
)
async def analyze(
request: AnalyzeRequest,
background_tasks: BackgroundTasks,
) -> AnalyzeResponse:
"""
觸發 Agent Teams 分析
返回 task_id 用於追蹤進度
"""
redis_client = get_redis()
# 取得或建立 Incident
incident: Incident | None = None
if request.incident_id:
# 從 Redis 讀取現有 Incident
key = f"incident:{request.incident_id}"
data = await redis_client.get(key)
if not data:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Incident not found: {request.incident_id}",
)
incident = Incident.model_validate_json(data)
elif request.severity and request.affected_services:
# 建立臨時 Incident
signals = []
if request.alert_names:
for alert_name in request.alert_names:
signals.append(Signal(
alert_name=alert_name,
severity=Severity(request.severity),
source="manual",
fired_at=datetime.now(timezone.utc),
))
incident = Incident(
severity=Severity(request.severity),
status=IncidentStatus.INVESTIGATING,
signals=signals,
affected_services=request.affected_services,
)
else:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Must provide either incident_id or (severity + affected_services)",
)
# 建立任務
task_id = f"TASK-{datetime.now(timezone.utc).strftime('%Y%m%d')}-{uuid4().hex[:8].upper()}"
# 初始化任務狀態
task_data = {
"task_id": task_id,
"state": TaskState.PENDING.value,
"progress": 0,
"current_step": "任務已建立",
"agents_completed": 0,
"total_agents": 4,
"incident_id": incident.incident_id,
"started_at": datetime.now(timezone.utc).isoformat(),
}
await redis_client.set(
f"{TASK_PREFIX}{task_id}",
json.dumps(task_data),
ex=TASK_TTL,
)
# 加入背景任務
background_tasks.add_task(run_agent_analysis, task_id, incident)
logger.info(
"agent_analysis_started",
task_id=task_id,
incident_id=incident.incident_id,
severity=incident.severity.value,
)
return AnalyzeResponse(
task_id=task_id,
status="pending",
message="Agent Teams 分析已啟動",
estimated_seconds=30,
)
@router.get(
"/status/{task_id}",
response_model=TaskStatusResponse,
summary="查詢分析狀態",
description="查詢 Agent Teams 分析任務的目前狀態與進度。",
)
async def get_status(task_id: str) -> TaskStatusResponse:
"""
查詢任務狀態
返回進度百分比與目前步驟
"""
redis_client = get_redis()
task_key = f"{TASK_PREFIX}{task_id}"
data = await redis_client.get(task_key)
if not data:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Task not found: {task_id}",
)
task_data = json.loads(data)
return TaskStatusResponse(
task_id=task_id,
state=task_data.get("state", "unknown"),
progress=task_data.get("progress", 0),
current_step=task_data.get("current_step"),
agents_completed=task_data.get("agents_completed", 0),
total_agents=task_data.get("total_agents", 4),
started_at=task_data.get("started_at"),
completed_at=task_data.get("completed_at"),
error=task_data.get("error"),
)
@router.get(
"/result/{task_id}",
response_model=TaskResultResponse,
summary="取得分析結果",
description="取得 Agent Teams 分析的完整結果,包含所有專家意見與共識決策。",
)
async def get_result(task_id: str) -> TaskResultResponse:
"""
取得分析結果
只有 COMPLETED 狀態才有完整結果
"""
redis_client = get_redis()
task_key = f"{TASK_PREFIX}{task_id}"
data = await redis_client.get(task_key)
if not data:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Task not found: {task_id}",
)
task_data = json.loads(data)
return TaskResultResponse(
task_id=task_id,
state=task_data.get("state", "unknown"),
consensus_id=task_data.get("consensus_id"),
incident_id=task_data.get("incident_id"),
consensus_score=task_data.get("consensus_score"),
recommended_action=task_data.get("recommended_action"),
recommended_kubectl=task_data.get("recommended_kubectl"),
risk_level=task_data.get("risk_level"),
final_reasoning=task_data.get("final_reasoning"),
opinions=task_data.get("opinions"),
dissenting_opinions=task_data.get("dissenting_opinions"),
created_at=task_data.get("completed_at"),
)
@router.get(
"/stream/{task_id}",
summary="SSE 串流進度",
description="透過 Server-Sent Events 即時接收分析進度更新。",
)
async def stream_progress(task_id: str) -> StreamingResponse:
"""
SSE 串流分析進度
客戶端可訂閱此端點接收即時更新
"""
redis_client = get_redis()
task_key = f"{TASK_PREFIX}{task_id}"
# 驗證任務存在
data = await redis_client.get(task_key)
if not data:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Task not found: {task_id}",
)
async def generate():
"""SSE 串流生成器"""
publisher = await get_publisher()
client = await publisher.subscribe(
topics=[f"agent_task:{task_id}"],
metadata={"task_id": task_id},
)
try:
# 發送初始狀態
current_data = await redis_client.get(task_key)
if current_data:
task_data = json.loads(current_data)
yield f"data: {json.dumps({'type': 'status', **task_data}, ensure_ascii=False)}\n\n"
# 串流後續更新
async for event_str in publisher.stream(client):
yield event_str
# 檢查是否完成或失敗
current_data = await redis_client.get(task_key)
if current_data:
task_data = json.loads(current_data)
if task_data.get("state") in [TaskState.COMPLETED.value, TaskState.FAILED.value]:
break
except asyncio.CancelledError:
logger.info("agent_stream_cancelled", task_id=task_id)
raise
finally:
await publisher.unsubscribe(client.id)
return StreamingResponse(
generate(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"Connection": "keep-alive",
"X-Accel-Buffering": "no",
},
)
# =============================================================================
# Integration with Incident Flow
# =============================================================================
async def trigger_agent_analysis_for_incident(
incident_id: str,
background_tasks: BackgroundTasks,
) -> str | None:
"""
整合點: 當 Incident 需要複雜決策時自動觸發 Agent Teams
這個函數可被 incident_engine 或 webhooks 調用
Returns:
task_id if triggered, None if skipped
"""
redis_client = get_redis()
# 讀取 Incident
key = f"incident:{incident_id}"
data = await redis_client.get(key)
if not data:
logger.warning("trigger_agent_skipped_not_found", incident_id=incident_id)
return None
incident = Incident.model_validate_json(data)
# 判斷是否需要 Agent Teams (複雜決策條件)
should_trigger = (
# P0/P1 緊急事件
incident.severity in (Severity.P0, Severity.P1)
# 或多個服務受影響
or len(incident.affected_services) > 2
# 或多個告警
or len(incident.signals) > 3
)
if not should_trigger:
logger.debug(
"trigger_agent_skipped_simple_case",
incident_id=incident_id,
severity=incident.severity.value,
)
return None
# 建立任務
task_id = f"TASK-{datetime.now(timezone.utc).strftime('%Y%m%d')}-{uuid4().hex[:8].upper()}"
task_data = {
"task_id": task_id,
"state": TaskState.PENDING.value,
"progress": 0,
"current_step": "自動觸發 Agent Teams",
"agents_completed": 0,
"total_agents": 4,
"incident_id": incident_id,
"started_at": datetime.now(timezone.utc).isoformat(),
"trigger": "auto",
}
await redis_client.set(
f"{TASK_PREFIX}{task_id}",
json.dumps(task_data),
ex=TASK_TTL,
)
# 加入背景任務
background_tasks.add_task(run_agent_analysis, task_id, incident)
logger.info(
"agent_analysis_auto_triggered",
task_id=task_id,
incident_id=incident_id,
severity=incident.severity.value,
)
return task_id

View File

@@ -22,17 +22,21 @@ Endpoints:
import asyncio
import re
from typing import TYPE_CHECKING
from uuid import UUID
from fastapi import APIRouter, BackgroundTasks, HTTPException, status
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, status, Header
if TYPE_CHECKING:
from src.services.notifications import ExecutionStatus
from src.core.config import settings
from src.core.logging import get_logger
from src.services.approval_db import get_approval_service, get_timeline_service
from src.models.approval import (
ApprovalRequest,
ApprovalRequestCreate,
ApprovalRequestResponse,
ApprovalStatus,
PendingApprovalsResponse,
RejectRequest,
SignRequest,
@@ -45,17 +49,76 @@ logger = get_logger("awoooi.approvals")
# =============================================================================
# K8s Connection Test (CTO-201 Debug)
# K8s Connection Test (CTO-201 Debug) - Protected Endpoint
# =============================================================================
async def verify_k8s_api_key(
x_k8s_api_key: str | None = Header(None, alias="X-K8s-Api-Key"),
) -> None:
"""
驗證 K8s 管理端點的 API Key
安全鐵律 (Fail-Closed):
- 生產環境: K8S_API_KEY 未設定 → 直接拒絕
- 開發環境: K8S_API_KEY 未設定 → 允許跳過
- API Key 必須完全匹配
Args:
x_k8s_api_key: X-K8s-Api-Key Header 值
Raises:
HTTPException: 401 未認證
"""
# Fail-Closed 安全策略
if not settings.K8S_API_KEY:
if settings.ENVIRONMENT == "prod":
logger.critical(
"k8s_api_key_missing_in_production",
environment=settings.ENVIRONMENT,
)
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Authentication required",
)
# 開發環境: 允許跳過
logger.warning(
"k8s_api_key_verification_skipped_dev_only",
environment=settings.ENVIRONMENT,
)
return
# 必須提供 API Key
if not x_k8s_api_key:
logger.warning("k8s_api_key_missing")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Authentication required",
)
# 驗證 API Key
if x_k8s_api_key != settings.K8S_API_KEY:
logger.warning("k8s_api_key_invalid")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Authentication required",
)
logger.info("k8s_api_key_verification_success")
@router.get(
"/k8s-test",
summary="測試 K8s 連線",
description="連接 K3s 叢集並列出所有 Namespace。用於驗證 kubeconfig 設定。",
description="連接 K3s 叢集並列出所有 Namespace。用於驗證 kubeconfig 設定。需要 X-K8s-Api-Key 認證。",
dependencies=[Depends(verify_k8s_api_key)],
)
async def test_k8s_connection() -> dict:
"""
測試 K8s 連線
測試 K8s 連線 (需要認證)
Headers:
X-K8s-Api-Key: K8s 管理端點 API Key
Returns:
namespaces: 所有 Namespace 清單
@@ -137,8 +200,11 @@ def parse_operation_from_action(action: str) -> tuple[OperationType | None, str
# Pattern: 重新啟動 <name> 服務 (Chinese)
chinese_restart_match = re.search(r'重新啟動\s+([a-z0-9][\w.-]*)\s*服務', action)
if chinese_restart_match:
deploy_name = chinese_restart_match.group(1)
return OperationType.RESTART_DEPLOYMENT, deploy_name, "default"
resource_name = chinese_restart_match.group(1)
# StatefulSet Pod 格式: name-N (如 postgres-primary-0)
if re.match(r'.*-\d+$', resource_name):
return OperationType.DELETE_POD, resource_name, "default"
return OperationType.RESTART_DEPLOYMENT, resource_name, "default"
# Pattern: scale deployment <name>
scale_match = re.search(r'scale\s+(?:deployment[:\s]+)?([a-z0-9][\w.-]*)', action_lower)
@@ -185,8 +251,6 @@ async def execute_approved_action(approval: ApprovalRequest) -> None:
Phase 6: 執行後發送通知 (Post-Execution Hook)
"""
from src.services.notifications import (
get_notification_manager,
NotificationMessage,
ExecutionStatus,
)
@@ -318,7 +382,6 @@ async def _send_execution_notification(
from src.services.notifications import (
get_notification_manager,
NotificationMessage,
ExecutionStatus,
)
from src.core.config import settings

View File

@@ -18,7 +18,7 @@ Phase 6.4 核心功能:
"""
from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel, Field
from pydantic import BaseModel
from typing import Any
from src.core.logging import get_logger
@@ -26,7 +26,7 @@ from src.core.redis_client import get_redis
from src.models.approval import ApprovalRequestResponse
from src.models.incident import Incident, IncidentStatus, Severity
from src.services.proposal_service import get_proposal_service
from src.services.decision_manager import get_decision_manager, DecisionState
from src.services.decision_manager import get_decision_manager
router = APIRouter(prefix="/incidents", tags=["Incidents"])
logger = get_logger("awoooi.incidents")

View File

@@ -0,0 +1,497 @@
"""
Proposals API - Phase 6.4h Decision Proposal REST API
======================================================
完整的 Decision Proposal CRUD 端點:
- POST /api/v1/proposals - 建立新提案
- GET /api/v1/proposals - 查詢提案清單
- GET /api/v1/proposals/{id} - 查詢單一提案
- PATCH /api/v1/proposals/{id}/approve - 批准提案
整合:
- ProposalService (真實 LLM 決策)
- ApprovalService (持久化與狀態管理)
- TrustEngine (風險評估)
統帥鐵律:
- 禁止跳過 TrustEngine 評估
- 所有提案必須 require_dry_run: true
- 所有決策必須可稽核
Version: 6.4h
Date: 2026-03-23
"""
from datetime import datetime
from uuid import UUID
from fastapi import APIRouter, HTTPException, Query, status
from pydantic import BaseModel, Field
from src.core.logging import get_logger
from src.models.approval import (
ApprovalRequest,
ApprovalStatus,
RiskLevel,
)
from src.services.approval_db import get_approval_service
from src.services.proposal_service import get_proposal_service
router = APIRouter(prefix="/proposals", tags=["Proposals"])
logger = get_logger("awoooi.proposals")
# =============================================================================
# Request/Response Models
# =============================================================================
class ProposalCreateRequest(BaseModel):
"""建立提案請求"""
incident_id: str = Field(..., description="關聯的事件 ID")
require_dry_run: bool = Field(
default=True,
description="強制要求演練模式 (Guardrails)",
)
skill_id: str | None = Field(
default=None,
description="指定使用的 Skill ID (e.g., '04-awoooi-devops-commander')",
)
class ProposalResponse(BaseModel):
"""提案回應 (向下相容 ApprovalRequest)"""
proposal_id: str = Field(..., description="提案 ID")
incident_id: str | None = Field(None, description="關聯的事件 ID")
action: str = Field(..., description="執行動作")
description: str = Field(..., description="詳細說明")
status: str = Field(..., description="狀態")
risk_level: str = Field(..., description="風險等級")
tier: int = Field(..., description="授權級別 (1: 自主, 2: 授權, 3: 親核)")
required_signatures: int = Field(..., description="所需簽核數")
current_signatures: int = Field(..., description="目前簽核數")
guardrails_passed: bool = Field(default=True, description="是否通過安全護欄")
llm_provider: str | None = Field(None, description="LLM 提供者")
llm_confidence: float | None = Field(None, description="LLM 信心度")
kubectl_command: str | None = Field(None, description="生成的 kubectl 指令")
created_at: datetime = Field(..., description="建立時間")
updated_at: datetime = Field(..., description="更新時間")
@classmethod
def from_approval(cls, approval: ApprovalRequest) -> "ProposalResponse":
"""從 ApprovalRequest 轉換"""
metadata = approval.metadata or {}
incident_id = metadata.get("incident_id")
# 計算 tier 基於 risk_level
tier_map = {
RiskLevel.LOW: 1, # 自主 (AI 可直接執行)
RiskLevel.MEDIUM: 2, # 授權 (需 1 人簽核)
RiskLevel.CRITICAL: 3, # 親核 (需 2 人簽核)
}
tier = tier_map.get(approval.risk_level, 2)
return cls(
proposal_id=str(approval.id),
incident_id=incident_id,
action=approval.action,
description=approval.description,
status=approval.status.value,
risk_level=approval.risk_level.value,
tier=tier,
required_signatures=approval.required_signatures,
current_signatures=approval.current_signatures,
guardrails_passed=True,
llm_provider=metadata.get("llm_provider"),
llm_confidence=metadata.get("llm_confidence"),
kubectl_command=metadata.get("kubectl_command"),
created_at=approval.created_at,
updated_at=approval.updated_at,
)
class ProposalListResponse(BaseModel):
"""提案清單回應"""
count: int = Field(..., description="總數")
proposals: list[ProposalResponse] = Field(..., description="提案清單")
class ProposalApproveRequest(BaseModel):
"""批准提案請求"""
signer_id: str = Field(..., description="簽核者 ID")
signer_name: str = Field(..., description="簽核者名稱")
comment: str | None = Field(None, description="簽核備註")
source: str = Field(
default="api",
description="簽核來源 (web/telegram/api)",
)
class ProposalApproveResponse(BaseModel):
"""批准提案回應"""
success: bool = Field(..., description="是否成功")
message: str = Field(..., description="訊息")
proposal: ProposalResponse = Field(..., description="更新後的提案")
fully_approved: bool = Field(..., description="是否已完全批准")
execution_triggered: bool = Field(
default=False,
description="是否觸發執行",
)
# =============================================================================
# POST /api/v1/proposals - 建立新提案
# =============================================================================
@router.post(
"",
response_model=ProposalResponse,
status_code=status.HTTP_201_CREATED,
summary="建立決策提案 (Phase 6.4h)",
description="""
從 Incident 生成 Decision Proposal。
流程:
1. Guardrails 前置檢查 (require_dry_run 必須為 True)
2. 從 Redis/PostgreSQL 載入 Incident
3. 呼叫 OpenClaw LLM 生成提案 (Ollama → Gemini → Claude fallback)
4. TrustEngine 風險評估與 Tier 判定
5. 建立 ApprovalRequest
6. 返回 ProposalResponse
""",
)
async def create_proposal(
request: ProposalCreateRequest,
) -> ProposalResponse:
"""
建立新的決策提案
Args:
request: 提案建立請求
Returns:
ProposalResponse: 建立的提案
Raises:
HTTPException: 422 Guardrails 違規, 400 無法生成, 404 Incident 不存在
"""
try:
# 1. Guardrails 檢查: require_dry_run 必須為 True
if not request.require_dry_run:
logger.warning(
"guardrails_rejected",
incident_id=request.incident_id,
reason="require_dry_run must be True",
)
raise HTTPException(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
detail="Guardrail triggered: require_dry_run must be True",
)
logger.info(
"proposal_create_start",
incident_id=request.incident_id,
skill_id=request.skill_id,
)
# 2. 呼叫 ProposalService 生成提案
service = get_proposal_service()
approval, message = await service.generate_proposal(request.incident_id)
if approval is None:
if "not found" in message.lower():
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=message,
)
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=message,
)
logger.info(
"proposal_created",
proposal_id=str(approval.id),
incident_id=request.incident_id,
risk_level=approval.risk_level.value,
)
return ProposalResponse.from_approval(approval)
except HTTPException:
raise
except Exception as e:
logger.exception(
"proposal_create_error",
incident_id=request.incident_id,
error=str(e),
)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Internal Error: {str(e)}",
)
# =============================================================================
# GET /api/v1/proposals - 查詢提案清單
# =============================================================================
@router.get(
"",
response_model=ProposalListResponse,
summary="查詢提案清單",
description="取得所有提案,可依狀態篩選。",
)
async def list_proposals(
status_filter: ApprovalStatus | None = Query(
None,
alias="status",
description="篩選狀態 (pending/approved/rejected/expired)",
),
incident_id: str | None = Query(
None,
description="篩選特定 Incident 的提案",
),
limit: int = Query(50, ge=1, le=200, description="每頁數量"),
offset: int = Query(0, ge=0, description="偏移量"),
) -> ProposalListResponse:
"""
查詢提案清單
Args:
status_filter: 狀態篩選
incident_id: Incident ID 篩選
limit: 每頁數量
offset: 偏移量
Returns:
ProposalListResponse: 提案清單
"""
try:
approval_service = get_approval_service()
# 取得所有提案 (根據狀態篩選)
if status_filter == ApprovalStatus.PENDING:
approvals = await approval_service.get_pending_approvals()
else:
# 取得所有狀態的提案
approvals = await approval_service.get_all_approvals(
status=status_filter,
incident_id=incident_id,
limit=limit,
offset=offset,
)
# 轉換為 ProposalResponse
proposals = [ProposalResponse.from_approval(a) for a in approvals]
# 如果指定了 incident_id進一步過濾
if incident_id:
proposals = [p for p in proposals if p.incident_id == incident_id]
logger.info(
"proposals_listed",
count=len(proposals),
status_filter=status_filter.value if status_filter else None,
incident_id=incident_id,
)
return ProposalListResponse(
count=len(proposals),
proposals=proposals,
)
except Exception as e:
logger.exception(
"proposals_list_error",
error=str(e),
)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to list proposals: {str(e)}",
)
# =============================================================================
# GET /api/v1/proposals/{proposal_id} - 查詢單一提案
# =============================================================================
@router.get(
"/{proposal_id}",
response_model=ProposalResponse,
summary="查詢單一提案",
description="取得特定提案的詳細資訊。",
)
async def get_proposal(
proposal_id: str,
) -> ProposalResponse:
"""
查詢單一提案
Args:
proposal_id: 提案 ID
Returns:
ProposalResponse: 提案詳細資訊
Raises:
HTTPException: 404 提案不存在
"""
try:
approval_service = get_approval_service()
# 驗證 UUID 格式
try:
uuid = UUID(proposal_id)
except ValueError:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Invalid proposal ID format: {proposal_id}",
)
approval = await approval_service.get_approval_by_id(uuid)
if approval is None:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Proposal not found: {proposal_id}",
)
logger.info(
"proposal_fetched",
proposal_id=proposal_id,
status=approval.status.value,
)
return ProposalResponse.from_approval(approval)
except HTTPException:
raise
except Exception as e:
logger.exception(
"proposal_get_error",
proposal_id=proposal_id,
error=str(e),
)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to get proposal: {str(e)}",
)
# =============================================================================
# PATCH /api/v1/proposals/{proposal_id}/approve - 批准提案
# =============================================================================
@router.patch(
"/{proposal_id}/approve",
response_model=ProposalApproveResponse,
summary="批准提案",
description="""
對提案進行簽核批准。
Multi-Sig 規則:
- LOW 風險: 0 人簽核,自動放行
- MEDIUM 風險: 1 人簽核
- CRITICAL 風險: 2 人 Multi-Sig 雙重簽核
當簽核數滿足時,狀態自動變更為 APPROVED。
""",
)
async def approve_proposal(
proposal_id: str,
request: ProposalApproveRequest,
) -> ProposalApproveResponse:
"""
批准提案
Args:
proposal_id: 提案 ID
request: 批准請求
Returns:
ProposalApproveResponse: 批准結果
Raises:
HTTPException: 404 提案不存在, 400 簽核失敗
"""
try:
approval_service = get_approval_service()
# 驗證 UUID 格式
try:
uuid = UUID(proposal_id)
except ValueError:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Invalid proposal ID format: {proposal_id}",
)
# 取得現有提案
approval = await approval_service.get_approval_by_id(uuid)
if approval is None:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Proposal not found: {proposal_id}",
)
# 檢查狀態
if approval.status != ApprovalStatus.PENDING:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Cannot approve proposal in status: {approval.status.value}",
)
# 檢查是否已簽核
if approval.has_signer(request.signer_id):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=f"Signer {request.signer_id} has already signed this proposal",
)
# 執行簽核 (sign_approval 返回 tuple[ApprovalRequest, str, bool])
updated_approval, message, execution_triggered = await approval_service.sign_approval(
approval_id=uuid,
signer_id=request.signer_id,
signer_name=request.signer_name,
comment=request.comment,
)
if updated_approval is None:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=message,
)
# 檢查是否滿足簽核數
fully_approved = updated_approval.status == ApprovalStatus.APPROVED
execution_triggered = fully_approved # 滿足簽核數即觸發執行
logger.info(
"proposal_approved",
proposal_id=proposal_id,
signer_id=request.signer_id,
current_signatures=updated_approval.current_signatures,
required_signatures=updated_approval.required_signatures,
fully_approved=fully_approved,
)
return ProposalApproveResponse(
success=True,
message=message,
proposal=ProposalResponse.from_approval(updated_approval),
fully_approved=fully_approved,
execution_triggered=execution_triggered,
)
except HTTPException:
raise
except Exception as e:
logger.exception(
"proposal_approve_error",
proposal_id=proposal_id,
error=str(e),
)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to approve proposal: {str(e)}",
)

View File

@@ -19,18 +19,15 @@ Endpoints:
- 每個 Nonce 只能使用一次
"""
from datetime import datetime, timezone
from typing import Any
from uuid import UUID
from fastapi import APIRouter, HTTPException, status, Request
from pydantic import BaseModel, Field
from fastapi import APIRouter, HTTPException, status
from pydantic import BaseModel
from src.core.config import settings
from src.core.logging import get_logger
from src.services.telegram_gateway import get_telegram_gateway, TelegramGatewayError
from src.services.security_interceptor import (
get_security_interceptor,
UserNotWhitelistedError,
NonceReplayError,
)

View File

@@ -24,7 +24,7 @@ Endpoints:
import hashlib
import hmac
from datetime import datetime, timezone, timedelta
from datetime import datetime, timezone
from typing import Literal
from fastapi import APIRouter, BackgroundTasks, HTTPException, status, Request, Header

View File

@@ -175,14 +175,18 @@ class Settings(BaseSettings):
default=30,
description="Timeout for K8s operations in seconds",
)
K8S_API_KEY: str = Field(
default="",
description="API Key for K8s admin endpoints (X-K8s-Api-Key header)",
)
# ==========================================================================
# SQLite Database (CTO-201 Audit Log)
# 統帥鐵律:禁止 SQLite (AWOOOI 憲法)
# ==========================================================================
# ❌ 已移除 SQLITE_DATABASE_URL - 違反 AWOOOI 憲法
# 所有持久化必須使用 PostgreSQL (DATABASE_URL)
# 審計日誌請使用 PostgreSQL audit_logs 表
# ==========================================================================
SQLITE_DATABASE_URL: str = Field(
default="sqlite+aiosqlite:///./awoooi.db",
description="SQLite database URL for local audit logs (PostgreSQL-ready schema)",
)
# ==========================================================================
# Cache TTL (seconds)

View File

@@ -15,7 +15,6 @@ ADR-004: SSE 串流企業級實作模式 (Buffer + AbortController + Zustand)
import asyncio
import json
import uuid
import weakref
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from datetime import datetime, timezone

View File

@@ -1,12 +1,12 @@
"""
AWOOOI Database Module
======================
CTO-201: SQLAlchemy + aiosqlite (PostgreSQL-ready)
CTO-201: SQLAlchemy + asyncpg (PostgreSQL ONLY)
架構設計原則:
- 使用 SQLAlchemy 2.0 async 風格
- Schema 與 PostgreSQL 100% 相容
- 一行代碼切換資料庫後端
- PostgreSQL 專用 (asyncpg driver)
- 統帥鐵律:禁止 SQLite
"""
from src.db.base import Base, get_db, init_db

View File

@@ -49,6 +49,8 @@ from src.api.v1 import audit_logs as audit_logs_v1
from src.api.v1 import telegram as telegram_v1 # Phase 5.4: Telegram Gateway
from src.api.v1 import metrics as metrics_v1 # Phase 7: Gold Metrics (真實血脈)
from src.api.v1 import incidents as incidents_v1 # Phase 6.4: Decision Proposal
from src.api.v1 import proposals as proposals_v1 # Phase 6.4h: Proposals CRUD API
from src.api.v1 import agents as agents_v1 # Phase 9.5: Agent Teams API
# Legacy route imports (to be migrated)
from src.routes import agent, plugins, pipelines, notifications
@@ -260,7 +262,9 @@ app.include_router(audit_logs_v1.router, prefix="/api/v1", tags=["Audit Logs"])
app.include_router(telegram_v1.router, prefix="/api/v1", tags=["Telegram Gateway"]) # Phase 5.4
app.include_router(metrics_v1.router, prefix="/api/v1", tags=["Gold Metrics"]) # Phase 7: 真實血脈
app.include_router(incidents_v1.router, prefix="/api/v1", tags=["Incidents"]) # Phase 6.4: Decision Proposal
app.include_router(proposals_router.router, tags=["Proposals (6.4g)"]) # Phase 6.4g: lewooogo-brain
app.include_router(proposals_v1.router, prefix="/api/v1", tags=["Proposals"]) # Phase 6.4h: Proposals CRUD
app.include_router(agents_v1.router, prefix="/api/v1", tags=["Agent Teams"]) # Phase 9.5: Agent Teams
app.include_router(proposals_router.router, tags=["Proposals (Legacy)"]) # Phase 6.4g: lewooogo-brain (舊版)
# Legacy routes (to be migrated to api/v1/)
app.include_router(plugins.router, prefix="/api/v1/plugins", tags=["Plugins"])

View File

@@ -12,10 +12,9 @@ Features:
from datetime import datetime, timezone
from enum import Enum
from typing import Literal
from uuid import UUID, uuid4
from pydantic import BaseModel, Field, field_validator
from pydantic import BaseModel, Field
# =============================================================================

View File

@@ -28,7 +28,7 @@ from uuid import UUID, uuid4
from pydantic import BaseModel, Field
# 復用現有模型 (避免重複定義)
from src.models.approval import BlastRadius, DryRunCheck
from src.models.approval import BlastRadius
# =============================================================================

View File

@@ -325,7 +325,6 @@ def create_privacy_middleware(shield: "PrivacyShield"):
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response
import json
class PrivacyShieldMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request: Request, call_next: Callable) -> Response:

View File

@@ -7,11 +7,19 @@ Endpoints:
- GET /health - Full health check with components
- GET /health/ready - K8s readinessProbe
- GET /health/live - K8s livenessProbe
統帥鐵律 2026-03-23:
- 禁止假數據 (必須真實連接資源)
- 每個檢查 2 秒超時
- 失敗不導致 API 崩潰
"""
import asyncio
import time
from datetime import datetime, timezone
from typing import Literal
import httpx
from fastapi import APIRouter
from pydantic import BaseModel
@@ -21,6 +29,9 @@ from src.core.logging import get_logger
router = APIRouter()
logger = get_logger("awoooi.health")
# Health check timeout (seconds)
HEALTH_CHECK_TIMEOUT = 2.0
class ComponentStatus(BaseModel):
"""Individual component status"""
@@ -39,6 +50,140 @@ class HealthResponse(BaseModel):
components: dict[str, Literal["up", "down", "degraded"]]
# =============================================================================
# Real Health Check Functions (統帥鐵律: 禁止假數據)
# =============================================================================
async def check_database() -> Literal["up", "down"]:
"""
Check PostgreSQL connection using asyncpg
統帥鐵律: 真實執行 SELECT 1禁止假數據
"""
try:
import asyncpg
# Parse DATABASE_URL for asyncpg (remove +asyncpg suffix)
db_url = settings.DATABASE_URL.replace("postgresql+asyncpg://", "postgresql://")
conn = await asyncio.wait_for(
asyncpg.connect(db_url),
timeout=HEALTH_CHECK_TIMEOUT,
)
try:
result = await asyncio.wait_for(
conn.fetchval("SELECT 1"),
timeout=HEALTH_CHECK_TIMEOUT,
)
if result == 1:
logger.debug("health_check_database", status="up")
return "up"
else:
logger.warning("health_check_database", status="down", reason="unexpected_result")
return "down"
finally:
await conn.close()
except asyncio.TimeoutError:
logger.warning("health_check_database", status="down", reason="timeout")
return "down"
except Exception as e:
logger.warning("health_check_database", status="down", error=str(e))
return "down"
async def check_redis() -> Literal["up", "down"]:
"""
Check Redis connection using redis.ping()
統帥鐵律: 真實執行 PING禁止假數據
"""
try:
import redis.asyncio as redis_lib
# Create temporary connection for health check (avoid pool dependency)
client = redis_lib.from_url(
settings.REDIS_URL,
encoding="utf-8",
decode_responses=True,
socket_timeout=HEALTH_CHECK_TIMEOUT,
socket_connect_timeout=HEALTH_CHECK_TIMEOUT,
)
try:
result = await asyncio.wait_for(
client.ping(),
timeout=HEALTH_CHECK_TIMEOUT,
)
if result:
logger.debug("health_check_redis", status="up")
return "up"
else:
logger.warning("health_check_redis", status="down", reason="ping_failed")
return "down"
finally:
await client.close()
except asyncio.TimeoutError:
logger.warning("health_check_redis", status="down", reason="timeout")
return "down"
except Exception as e:
logger.warning("health_check_redis", status="down", error=str(e))
return "down"
async def check_ollama() -> Literal["up", "down"]:
"""
Check Ollama service via /api/tags endpoint
統帥鐵律: 真實 HTTP 請求,禁止假數據
"""
try:
async with httpx.AsyncClient(timeout=HEALTH_CHECK_TIMEOUT) as client:
response = await client.get(f"{settings.OLLAMA_URL}/api/tags")
if response.status_code == 200:
logger.debug("health_check_ollama", status="up")
return "up"
else:
logger.warning(
"health_check_ollama",
status="down",
status_code=response.status_code,
)
return "down"
except httpx.TimeoutException:
logger.warning("health_check_ollama", status="down", reason="timeout")
return "down"
except Exception as e:
logger.warning("health_check_ollama", status="down", error=str(e))
return "down"
async def check_openclaw() -> Literal["up", "down"]:
"""
Check OpenClaw service via /health endpoint
統帥鐵律: 真實 HTTP 請求,禁止假數據
"""
try:
async with httpx.AsyncClient(timeout=HEALTH_CHECK_TIMEOUT) as client:
response = await client.get(f"{settings.OPENCLAW_URL}/health")
if response.status_code == 200:
logger.debug("health_check_openclaw", status="up")
return "up"
else:
logger.warning(
"health_check_openclaw",
status="down",
status_code=response.status_code,
)
return "down"
except httpx.TimeoutException:
logger.warning("health_check_openclaw", status="down", reason="timeout")
return "down"
except Exception as e:
logger.warning("health_check_openclaw", status="down", error=str(e))
return "down"
@router.get("/health", response_model=HealthResponse)
async def get_health() -> HealthResponse:
"""
@@ -46,14 +191,34 @@ async def get_health() -> HealthResponse:
Returns overall system health and individual component statuses.
Used for monitoring dashboards and alerting.
統帥鐵律 2026-03-23: 禁止假數據,所有檢查必須真實連接
"""
# TODO: Implement actual async health checks
components = {
"api": "up",
"database": "up", # TODO: asyncpg ping
"redis": "up", # TODO: redis ping
"ollama": "up", # TODO: httpx check
"clawbot": "up", # TODO: httpx check
# API is always up if this endpoint responds
api_status: Literal["up", "down", "degraded"] = "up"
# Run all health checks concurrently with timeout protection
start_time = time.monotonic()
db_task = asyncio.create_task(check_database())
redis_task = asyncio.create_task(check_redis())
ollama_task = asyncio.create_task(check_ollama())
openclaw_task = asyncio.create_task(check_openclaw())
# Wait for all tasks (each has internal timeout)
db_status, redis_status, ollama_status, openclaw_status = await asyncio.gather(
db_task, redis_task, ollama_task, openclaw_task,
return_exceptions=False,
)
elapsed_ms = (time.monotonic() - start_time) * 1000
components: dict[str, Literal["up", "down", "degraded"]] = {
"api": api_status,
"database": db_status,
"redis": redis_status,
"ollama": ollama_status,
"openclaw": openclaw_status,
}
# Determine overall status
@@ -67,10 +232,11 @@ async def get_health() -> HealthResponse:
else:
overall_status = "healthy"
logger.debug(
logger.info(
"health_check",
status=overall_status,
components=components,
elapsed_ms=round(elapsed_ms, 2),
)
return HealthResponse(

View File

@@ -41,6 +41,13 @@ from .graph_rag import (
FullAnalysisResult,
create_mock_topology,
)
from .consensus_engine import (
ConsensusEngine,
get_consensus_engine,
ConsensusResult,
AgentOpinion,
AgentType,
)
__all__ = [
# Dry-Run
@@ -82,4 +89,10 @@ __all__ = [
"RootCauseResult",
"FullAnalysisResult",
"create_mock_topology",
# Consensus Engine (Phase 9.4)
"ConsensusEngine",
"get_consensus_engine",
"ConsensusResult",
"AgentOpinion",
"AgentType",
]

View File

@@ -19,7 +19,6 @@ from uuid import UUID
import structlog
from sqlalchemy import select, update, and_, or_
from sqlalchemy.ext.asyncio import AsyncSession
from src.db.base import get_db_context
from src.db.models import ApprovalRecord, TimelineEvent
@@ -572,6 +571,78 @@ class ApprovalDBService:
success=success,
)
# =========================================================================
# Phase 6.4h: Proposals API 支援方法
# =========================================================================
async def get_approval_by_id(self, approval_id: UUID) -> ApprovalRequest | None:
"""
根據 ID 取得單一授權請求 (Phase 6.4h)
Args:
approval_id: 授權請求 UUID
Returns:
ApprovalRequest if found, None otherwise
"""
async with get_db_context() as db:
result = await db.execute(
select(ApprovalRecord).where(ApprovalRecord.id == str(approval_id))
)
record = result.scalar_one_or_none()
if record is None:
return None
return approval_record_to_request(record)
async def get_all_approvals(
self,
status: ApprovalStatus | None = None,
incident_id: str | None = None,
limit: int = 50,
offset: int = 0,
) -> list[ApprovalRequest]:
"""
取得所有授權請求 (Phase 6.4h)
Args:
status: 狀態篩選 (可選)
incident_id: Incident ID 篩選 (可選)
limit: 每頁數量
offset: 偏移量
Returns:
ApprovalRequest 清單
"""
async with get_db_context() as db:
query = select(ApprovalRecord)
# 狀態篩選
if status is not None:
query = query.where(ApprovalRecord.status == status)
# Incident ID 篩選 (從 extra_metadata JSON 欄位)
# NOTE: 這是基於 JSON 欄位查詢,效能可能受影響
# 若有效能問題,考慮新增 incident_id 欄位到 ApprovalRecord
query = query.order_by(ApprovalRecord.created_at.desc())
query = query.offset(offset).limit(limit)
result = await db.execute(query)
records = result.scalars().all()
approvals = [approval_record_to_request(r) for r in records]
# 若有 incident_id 篩選,在應用層過濾
if incident_id:
approvals = [
a for a in approvals
if a.metadata and a.metadata.get("incident_id") == incident_id
]
return approvals
# =============================================================================
# Timeline Event Service

View File

@@ -25,11 +25,7 @@ import structlog
from src.core.config import settings
from src.models.ai import (
AIRiskLevel,
AIBlastRadius,
AIDataImpact,
ClawBotDecision,
SuggestedAction,
)
logger = structlog.get_logger(__name__)

View File

@@ -0,0 +1,637 @@
"""
Consensus Engine - Phase 9.4 多專家共識引擎
============================================
實作 Agent Teams 的共識機制,整合多個專家 Agent 的意見。
Features:
- 收集多個專家 Agent 的意見 (SRE, Security, Cost, Performance)
- 計算加權共識分數
- 產生最終整合決策
- 支援 Redis Working Memory 儲存
統帥鐵律:
- 所有專家意見必須被記錄 (CISO 可稽核性要求)
- 信心度低於 0.6 的意見權重降低
- 最終決策必須包含所有專家的推理過程
"""
import asyncio
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Any
from uuid import uuid4
import structlog
from pydantic import BaseModel, Field, field_validator
from src.core.redis_client import get_redis
from src.models.incident import Incident
logger = structlog.get_logger(__name__)
# =============================================================================
# Agent Types (專家類型)
# =============================================================================
class AgentType(str, Enum):
"""專家 Agent 類型"""
SRE = "sre" # Site Reliability Engineer - 系統穩定性
SECURITY = "security" # Security Expert - 資安風險
COST = "cost" # FinOps Expert - 成本效益
PERFORMANCE = "performance" # Performance Expert - 效能優化
# =============================================================================
# Agent Opinion (專家意見)
# =============================================================================
class AgentOpinion(BaseModel):
"""
單一專家的意見
每個專家會針對同一個 Incident 提出自己的分析與建議
"""
agent_type: AgentType
action: str
reasoning: str
confidence: float = Field(ge=0.0, le=1.0, description="信心度 0-1")
risk_assessment: str
kubectl_command: str | None = None
priority: int = Field(default=5, ge=1, le=10, description="優先度 1-10, 10 最高")
estimated_impact: dict[str, Any] = Field(default_factory=dict)
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
model_config = {"use_enum_values": False}
@field_validator("confidence", mode="before")
@classmethod
def clamp_confidence(cls, v: float) -> float:
"""Clamp confidence to 0-1 range"""
return min(max(v, 0.0), 1.0)
def to_dict(self) -> dict[str, Any]:
return {
"agent_type": self.agent_type.value,
"action": self.action,
"reasoning": self.reasoning,
"confidence": self.confidence,
"risk_assessment": self.risk_assessment,
"kubectl_command": self.kubectl_command,
"priority": self.priority,
"estimated_impact": self.estimated_impact,
"created_at": self.created_at.isoformat(),
}
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AgentOpinion":
return cls(
agent_type=AgentType(data["agent_type"]),
action=data["action"],
reasoning=data["reasoning"],
confidence=data["confidence"],
risk_assessment=data["risk_assessment"],
kubectl_command=data.get("kubectl_command"),
priority=data.get("priority", 5),
estimated_impact=data.get("estimated_impact", {}),
)
# =============================================================================
# Consensus Result (共識結果)
# =============================================================================
class ConsensusResult(BaseModel):
"""
共識引擎的最終決策結果
包含:
- 所有專家意見 (CISO 可稽核性)
- 加權共識分數
- 最終推薦行動
- 決策理由
"""
consensus_id: str
incident_id: str
opinions: list[AgentOpinion]
consensus_score: float = Field(ge=0.0, le=1.0, description="共識分數 0-1")
recommended_action: str
recommended_kubectl: str | None = None
final_reasoning: str
risk_level: str
dissenting_opinions: list[str] = Field(default_factory=list)
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
model_config = {"use_enum_values": False}
def to_dict(self) -> dict[str, Any]:
return {
"consensus_id": self.consensus_id,
"incident_id": self.incident_id,
"opinions": [op.to_dict() for op in self.opinions],
"consensus_score": self.consensus_score,
"recommended_action": self.recommended_action,
"recommended_kubectl": self.recommended_kubectl,
"final_reasoning": self.final_reasoning,
"risk_level": self.risk_level,
"dissenting_opinions": self.dissenting_opinions,
"created_at": self.created_at.isoformat(),
"agent_count": len(self.opinions),
}
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ConsensusResult":
return cls(
consensus_id=data["consensus_id"],
incident_id=data["incident_id"],
opinions=[AgentOpinion.from_dict(op) for op in data["opinions"]],
consensus_score=data["consensus_score"],
recommended_action=data["recommended_action"],
recommended_kubectl=data.get("recommended_kubectl"),
final_reasoning=data["final_reasoning"],
risk_level=data["risk_level"],
dissenting_opinions=data.get("dissenting_opinions", []),
)
# =============================================================================
# Expert Agent Base (專家 Agent 基類)
# =============================================================================
class ExpertAgent:
"""
專家 Agent 基類
每個專家會從自己的角度分析 Incident
子類別實作 analyze() 方法
"""
agent_type: AgentType
async def analyze(self, incident: Incident) -> AgentOpinion:
"""
分析 Incident 並產生意見
子類別必須實作此方法
"""
raise NotImplementedError
class SREAgent(ExpertAgent):
"""SRE 專家 - 專注系統穩定性與可用性"""
agent_type = AgentType.SRE
async def analyze(self, incident: Incident) -> AgentOpinion:
"""SRE 視角分析"""
# 分析 signals 決定建議
alert_names = " ".join([s.alert_name.lower() for s in incident.signals])
target = incident.affected_services[0] if incident.affected_services else "unknown"
# SRE 規則引擎
if any(kw in alert_names for kw in ["crash", "restart", "oom", "killed"]):
action = "重新啟動服務以恢復穩定性"
kubectl = f"kubectl rollout restart deployment/{target} -n awoooi-prod"
confidence = 0.85
risk = "medium"
elif any(kw in alert_names for kw in ["latency", "slow", "timeout"]):
action = "擴展副本數以分散負載"
kubectl = f"kubectl scale deployment/{target} --replicas=3 -n awoooi-prod"
confidence = 0.80
risk = "low"
elif any(kw in alert_names for kw in ["cpu", "memory", "resource"]):
action = "調整資源限制或擴展副本"
kubectl = f"kubectl scale deployment/{target} --replicas=2 -n awoooi-prod"
confidence = 0.75
risk = "medium"
else:
action = "進行安全重啟以排除未知問題"
kubectl = f"kubectl rollout restart deployment/{target} -n awoooi-prod"
confidence = 0.60
risk = "medium"
return AgentOpinion(
agent_type=self.agent_type,
action=action,
reasoning=f"SRE 分析: 根據告警 {alert_names[:50]} 判斷服務 {target} 需要 {action}",
confidence=confidence,
risk_assessment=f"SRE 評估風險等級: {risk},預計恢復時間 < 5 分鐘",
kubectl_command=kubectl,
priority=8 if incident.severity.value in ["P0", "P1"] else 5,
estimated_impact={
"downtime_seconds": 30 if "restart" in action else 0,
"affected_users": "minimal",
},
)
class SecurityAgent(ExpertAgent):
"""資安專家 - 專注安全風險評估"""
agent_type = AgentType.SECURITY
async def analyze(self, incident: Incident) -> AgentOpinion:
"""資安視角分析"""
target = incident.affected_services[0] if incident.affected_services else "unknown"
alert_names = " ".join([s.alert_name.lower() for s in incident.signals])
# 資安掃描
security_concerns = []
if any(kw in alert_names for kw in ["auth", "login", "401", "403"]):
security_concerns.append("可能存在認證問題")
if any(kw in alert_names for kw in ["injection", "xss", "csrf"]):
security_concerns.append("可能存在注入攻擊")
if any(kw in alert_names for kw in ["rate", "ddos", "flood"]):
security_concerns.append("可能存在 DoS 攻擊")
if security_concerns:
action = "建議先隔離受影響服務,啟用 NetworkPolicy 限制"
confidence = 0.70
risk = "critical"
else:
action = "無明顯資安風險,建議 SRE 處理"
confidence = 0.85
risk = "low"
return AgentOpinion(
agent_type=self.agent_type,
action=action,
reasoning=f"Security 分析: {'; '.join(security_concerns) if security_concerns else '未發現資安威脅'}",
confidence=confidence,
risk_assessment=f"資安風險等級: {risk}",
kubectl_command=None, # 資安建議通常需要人工審核
priority=9 if security_concerns else 3,
estimated_impact={
"security_risk": "high" if security_concerns else "none",
"requires_audit": bool(security_concerns),
},
)
class CostAgent(ExpertAgent):
"""成本專家 - 專注資源效益分析"""
agent_type = AgentType.COST
async def analyze(self, incident: Incident) -> AgentOpinion:
"""成本視角分析"""
target = incident.affected_services[0] if incident.affected_services else "unknown"
# 成本評估 (假設每個副本每小時 $0.05)
action = "建議使用 HPA 自動擴展而非固定擴容,以優化成本"
kubectl = f"kubectl autoscale deployment/{target} --cpu-percent=70 --min=2 --max=5 -n awoooi-prod"
return AgentOpinion(
agent_type=self.agent_type,
action=action,
reasoning="FinOps 分析: 使用 HPA 可在負載降低後自動縮減,相比固定擴容可節省約 40% 成本",
confidence=0.75,
risk_assessment="成本風險: low使用 HPA 可自動調節",
kubectl_command=kubectl,
priority=4,
estimated_impact={
"monthly_cost_change": "+$15 to +$50",
"cost_optimization": "HPA 自動縮減",
},
)
class PerformanceAgent(ExpertAgent):
"""效能專家 - 專注性能優化"""
agent_type = AgentType.PERFORMANCE
async def analyze(self, incident: Incident) -> AgentOpinion:
"""效能視角分析"""
target = incident.affected_services[0] if incident.affected_services else "unknown"
alert_names = " ".join([s.alert_name.lower() for s in incident.signals])
if any(kw in alert_names for kw in ["latency", "p99", "slow"]):
action = "建議增加資源限制並啟用 PodDisruptionBudget"
kubectl = f"kubectl patch deployment/{target} -n awoooi-prod -p '{{\"spec\":{{\"template\":{{\"spec\":{{\"containers\":[{{\"name\":\"{target}\",\"resources\":{{\"limits\":{{\"cpu\":\"2\",\"memory\":\"2Gi\"}}}}}}]}}}}}}}}'"
confidence = 0.80
else:
action = "當前效能指標正常,建議觀察"
kubectl = None
confidence = 0.70
return AgentOpinion(
agent_type=self.agent_type,
action=action,
reasoning=f"Performance 分析: 根據 P99 latency 指標,{action}",
confidence=confidence,
risk_assessment="效能風險: medium資源調整可能影響其他 Pod",
kubectl_command=kubectl,
priority=6,
estimated_impact={
"latency_improvement": "預計 P99 降低 30%",
"resource_increase": "+1 CPU, +1Gi Memory",
},
)
# =============================================================================
# Consensus Engine
# =============================================================================
CONSENSUS_PREFIX = "consensus:"
CONSENSUS_TTL = 3600 # 1 小時
class ConsensusEngine:
"""
共識引擎 - Phase 9.4 核心
職責:
1. 收集所有專家 Agent 的意見
2. 計算加權共識分數
3. 產生最終整合決策
4. 儲存結果到 Redis (Working Memory)
共識計算規則:
- 高信心度意見權重較高
- 同類型建議會強化共識
- 分歧意見會降低共識分數
"""
def __init__(self):
self._agents: list[ExpertAgent] = [
SREAgent(),
SecurityAgent(),
CostAgent(),
PerformanceAgent(),
]
async def gather_opinions(
self,
incident: Incident,
timeout_sec: float = 30.0,
) -> list[AgentOpinion]:
"""
收集所有專家的意見
並行執行所有專家分析,使用 timeout 防止單一專家阻塞
"""
async def safe_analyze(agent: ExpertAgent) -> AgentOpinion | None:
try:
return await asyncio.wait_for(
agent.analyze(incident),
timeout=timeout_sec / len(self._agents),
)
except asyncio.TimeoutError:
logger.warning(
"agent_analyze_timeout",
agent_type=agent.agent_type.value,
incident_id=incident.incident_id,
)
return None
except Exception as e:
logger.exception(
"agent_analyze_error",
agent_type=agent.agent_type.value,
error=str(e),
)
return None
# 並行執行所有專家分析
results = await asyncio.gather(
*[safe_analyze(agent) for agent in self._agents],
return_exceptions=False,
)
opinions = [r for r in results if r is not None]
logger.info(
"opinions_gathered",
incident_id=incident.incident_id,
total_agents=len(self._agents),
successful_opinions=len(opinions),
)
return opinions
def calculate_consensus(
self,
opinions: list[AgentOpinion],
) -> tuple[float, str, list[str]]:
"""
計算共識分數
算法:
1. 按 action 類型分組
2. 計算加權投票 (confidence * priority)
3. 最高票數的 action 為推薦
4. 共識分數 = 最高票 / 總票數
Returns:
(consensus_score, recommended_action, dissenting_opinions)
"""
if not opinions:
return 0.0, "NO_ACTION", []
# 按 action 分組計算加權票數
action_votes: dict[str, float] = {}
action_details: dict[str, list[AgentOpinion]] = {}
for opinion in opinions:
# 低信心度意見權重降低
weight_multiplier = 1.0 if opinion.confidence >= 0.6 else 0.5
vote_weight = opinion.confidence * opinion.priority * weight_multiplier
# 簡化 action 到類別
action_key = self._normalize_action(opinion.action)
if action_key not in action_votes:
action_votes[action_key] = 0.0
action_details[action_key] = []
action_votes[action_key] += vote_weight
action_details[action_key].append(opinion)
# 找出最高票
total_votes = sum(action_votes.values())
if total_votes == 0:
return 0.0, "NO_ACTION", []
winner_action = max(action_votes.keys(), key=lambda k: action_votes[k])
consensus_score = action_votes[winner_action] / total_votes
# 找出分歧意見 (非主流意見)
dissenting = []
for action_key, ops in action_details.items():
if action_key != winner_action:
for op in ops:
dissenting.append(
f"{op.agent_type.value}: {op.action} (信心度: {op.confidence:.0%})"
)
logger.info(
"consensus_calculated",
winner_action=winner_action,
consensus_score=consensus_score,
total_votes=total_votes,
dissenting_count=len(dissenting),
)
return consensus_score, winner_action, dissenting
def _normalize_action(self, action: str) -> str:
"""將 action 正規化到類別"""
action_lower = action.lower()
if any(kw in action_lower for kw in ["重啟", "restart"]):
return "RESTART"
elif any(kw in action_lower for kw in ["擴展", "scale", "副本"]):
return "SCALE"
elif any(kw in action_lower for kw in ["hpa", "autoscale"]):
return "HPA"
elif any(kw in action_lower for kw in ["隔離", "isolate", "network"]):
return "ISOLATE"
elif any(kw in action_lower for kw in ["資源", "resource", "limit"]):
return "TUNE_RESOURCES"
elif any(kw in action_lower for kw in ["觀察", "observe", "正常"]):
return "OBSERVE"
else:
return "OTHER"
async def generate_final_decision(
self,
incident: Incident,
opinions: list[AgentOpinion],
consensus_score: float,
recommended_action_type: str,
dissenting: list[str],
) -> ConsensusResult:
"""
產生最終決策
整合所有專家意見,產生結構化的 ConsensusResult
"""
consensus_id = f"CON-{datetime.now(timezone.utc).strftime('%Y%m%d')}-{uuid4().hex[:8].upper()}"
# 找出最佳的 kubectl 指令 (來自最高 priority + confidence 的意見)
best_kubectl = None
best_score = 0.0
best_action_detail = ""
for op in opinions:
if self._normalize_action(op.action) == recommended_action_type:
score = op.confidence * op.priority
if score > best_score and op.kubectl_command:
best_score = score
best_kubectl = op.kubectl_command
best_action_detail = op.action
# 決定風險等級
if consensus_score >= 0.8:
risk_level = "low"
elif consensus_score >= 0.6:
risk_level = "medium"
else:
risk_level = "critical" # 共識不足,需人工審核
# 組合最終推理
reasoning_parts = []
for op in opinions:
reasoning_parts.append(f"[{op.agent_type.value.upper()}] {op.reasoning}")
final_reasoning = (
f"共識引擎整合 {len(opinions)} 位專家意見:\n"
+ "\n".join(reasoning_parts)
+ f"\n\n最終共識: {recommended_action_type} (共識度: {consensus_score:.0%})"
)
result = ConsensusResult(
consensus_id=consensus_id,
incident_id=incident.incident_id,
opinions=opinions,
consensus_score=consensus_score,
recommended_action=best_action_detail or recommended_action_type,
recommended_kubectl=best_kubectl,
final_reasoning=final_reasoning,
risk_level=risk_level,
dissenting_opinions=dissenting,
)
# 儲存到 Redis
await self._save_consensus(result)
logger.info(
"consensus_generated",
consensus_id=consensus_id,
incident_id=incident.incident_id,
consensus_score=consensus_score,
risk_level=risk_level,
)
return result
async def run_consensus(
self,
incident: Incident,
timeout_sec: float = 30.0,
) -> ConsensusResult:
"""
執行完整的共識流程
這是對外的主要 API:
1. 收集意見
2. 計算共識
3. 產生決策
"""
# Step 1: 收集意見
opinions = await self.gather_opinions(incident, timeout_sec)
# Step 2: 計算共識
consensus_score, recommended_action, dissenting = self.calculate_consensus(opinions)
# Step 3: 產生決策
result = await self.generate_final_decision(
incident=incident,
opinions=opinions,
consensus_score=consensus_score,
recommended_action_type=recommended_action,
dissenting=dissenting,
)
return result
async def _save_consensus(self, result: ConsensusResult) -> None:
"""儲存共識結果到 Redis"""
redis_client = get_redis()
key = f"{CONSENSUS_PREFIX}{result.consensus_id}"
await redis_client.set(
key,
json.dumps(result.to_dict()),
ex=CONSENSUS_TTL,
)
async def get_consensus(self, consensus_id: str) -> ConsensusResult | None:
"""取得共識結果"""
redis_client = get_redis()
key = f"{CONSENSUS_PREFIX}{consensus_id}"
data = await redis_client.get(key)
if data:
return ConsensusResult.from_dict(json.loads(data))
return None
# =============================================================================
# Singleton
# =============================================================================
_consensus_engine: ConsensusEngine | None = None
def get_consensus_engine() -> ConsensusEngine:
"""取得 ConsensusEngine 實例 (Singleton)"""
global _consensus_engine
if _consensus_engine is None:
_consensus_engine = ConsensusEngine()
return _consensus_engine

View File

@@ -22,13 +22,13 @@ Decision Manager - Phase 6.5 非同步決策狀態機
import asyncio
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Literal
from typing import Any
from uuid import uuid4
import structlog
from src.core.redis_client import get_redis
from src.models.incident import Incident, IncidentStatus, Severity
from src.models.incident import Incident
from src.services.openclaw import get_openclaw
logger = structlog.get_logger(__name__)
@@ -425,6 +425,124 @@ class DecisionManager:
await self._save_token(token)
return token
async def get_or_create_decision_with_consensus(
self,
incident: Incident,
timeout_sec: float = 30.0,
use_consensus: bool = True,
) -> DecisionToken:
"""
取得或建立決策令牌 (含 Agent Teams 共識)
Phase 9.4 升級版本:
- 對於 P0/P1 事件,自動啟用 ConsensusEngine
- 整合多專家意見
- 共識分數影響風險評估
Args:
incident: 事件
timeout_sec: 超時秒數
use_consensus: 是否使用共識引擎 (預設 True)
Returns:
DecisionToken
"""
# 判斷是否需要共識 (P0/P1 或明確要求)
should_use_consensus = use_consensus and incident.severity.value in ["P0", "P1"]
if not should_use_consensus:
# 使用原有的雙軌決策
return await self.get_or_create_decision(incident, timeout_sec)
# Phase 9.4: 使用 ConsensusEngine
from src.services.consensus_engine import get_consensus_engine
consensus_engine = get_consensus_engine()
# 檢查現有 token
existing_token = await self._find_existing_token(incident.incident_id)
if existing_token and existing_token.state in (
DecisionState.READY,
DecisionState.EXECUTING,
DecisionState.COMPLETED,
):
return existing_token
# 建立新 token
token = DecisionToken(
token=f"DEC-{uuid4().hex[:12].upper()}",
incident_id=incident.incident_id,
state=DecisionState.ANALYZING,
)
await self._save_token(token)
logger.info(
"decision_analyzing_with_consensus",
token=token.token,
incident_id=incident.incident_id,
)
try:
# 執行共識分析
consensus_result = await asyncio.wait_for(
consensus_engine.run_consensus(incident, timeout_sec),
timeout=timeout_sec,
)
# 轉換為 proposal_data 格式
proposal_data = {
"source": "consensus_engine",
"consensus_id": consensus_result.consensus_id,
"consensus_score": consensus_result.consensus_score,
"action": consensus_result.recommended_action,
"description": consensus_result.final_reasoning,
"risk_level": consensus_result.risk_level,
"kubectl_command": consensus_result.recommended_kubectl,
"reasoning": consensus_result.final_reasoning,
"confidence": consensus_result.consensus_score,
"agent_count": len(consensus_result.opinions),
"dissenting_opinions": consensus_result.dissenting_opinions,
"from_cache": False,
}
token.state = DecisionState.READY
token.proposal_data = proposal_data
token.updated_at = datetime.now(timezone.utc)
logger.info(
"decision_ready_with_consensus",
token=token.token,
consensus_id=consensus_result.consensus_id,
consensus_score=consensus_result.consensus_score,
)
except asyncio.TimeoutError:
logger.warning(
"consensus_timeout_using_expert",
token=token.token,
timeout_sec=timeout_sec,
)
# Fallback 到 Expert System
expert_result = expert_analyze(incident)
token.state = DecisionState.READY
token.proposal_data = expert_result
token.updated_at = datetime.now(timezone.utc)
except Exception as e:
logger.exception(
"consensus_error_using_expert",
token=token.token,
error=str(e),
)
expert_result = expert_analyze(incident)
token.state = DecisionState.READY
token.proposal_data = expert_result
token.error = str(e)
token.updated_at = datetime.now(timezone.utc)
await self._save_token(token)
return token
# =============================================================================
# Singleton

View File

@@ -31,7 +31,7 @@ import structlog
from src.core.config import settings
from src.db.base import get_db_context
from src.db.models import AuditLog
from src.models.approval import ApprovalRequest, ApprovalStatus
from src.models.approval import ApprovalRequest
logger = structlog.get_logger(__name__)
@@ -600,7 +600,6 @@ class ActionExecutor:
Returns:
ExecutionResult: 執行結果
"""
import shlex
start_time = time.monotonic()
# 安全檢查: 必須是 kubectl 指令

View File

@@ -1,6 +1,11 @@
"""
Incident Engine v1.1 - Phase 6.3 認知覺醒核心 (效能強化版)
============================================================
Incident Engine v1.2 - Phase 6.4e DualMemory 整合版
====================================================
v1.2 重構內容 (Phase 6.4e):
- 整合 DualIncidentMemory 進行 DB 持久化
- 保持 Lua 原子操作進行 Redis Working Memory 更新
- 支援從 Episodic Memory (PostgreSQL) 回載 Incident
v1.1 重構內容 (2026-03-22 架構師審查後修正):
1. O(1) 反向索引: 廢除 SCAN改用 namespace/target 索引直查
@@ -30,15 +35,13 @@ from typing import Any
import structlog
from src.core.redis_client import get_redis
from src.db.base import get_db_context
from src.db.models import IncidentRecord
from src.models.incident import (
Incident,
IncidentStatus,
Severity,
Signal,
)
from src.services.graph_rag import topology_graph, BlastRadiusResult
from src.services.incident_memory import DualIncidentMemory, get_incident_memory
logger = structlog.get_logger(__name__)
@@ -254,8 +257,15 @@ class IncidentEngine:
incident = await engine.process_signal(signal_data)
"""
def __init__(self) -> None:
def __init__(self, memory: DualIncidentMemory | None = None) -> None:
"""
初始化 IncidentEngine
Args:
memory: DualIncidentMemory 實例 (可選,預設使用 Singleton)
"""
self._graph = topology_graph
self._memory = memory or get_incident_memory()
self._lua_aggregate_sha: str | None = None
self._lua_create_sha: str | None = None
@@ -519,75 +529,53 @@ class IncidentEngine:
incident.affected_services.append(target)
# =========================================================================
# 持久化 (DB 層)
# 持久化 (DB 層) - Phase 6.4e: 委託給 DualIncidentMemory
# =========================================================================
async def _persist_to_db(self, incident: Incident) -> None:
"""
持久化到 SQLite/PostgreSQL (Episodic Memory)
持久化到 PostgreSQL (Episodic Memory)
Phase 6.4e: 委託給 DualIncidentMemory.persist_incident()
Redis 已在 Lua Script 中更新,這裡只處理 DB
"""
try:
async with get_db_context() as db:
from sqlalchemy import select
success = await self._memory.persist_incident(incident)
incident.persisted_to_pg = success
# 檢查是否已存在
stmt = select(IncidentRecord).where(
IncidentRecord.incident_id == incident.incident_id
if success:
logger.debug(
"db_persisted_via_dual_memory",
incident_id=incident.incident_id,
)
else:
logger.warning(
"db_persist_failed_via_dual_memory",
incident_id=incident.incident_id,
)
result = await db.execute(stmt)
existing = result.scalar_one_or_none()
if existing:
# 更新現有記錄
existing.status = incident.status.value
existing.severity = incident.severity.value
existing.signals = [
s.model_dump(mode="json") for s in incident.signals
]
existing.affected_services = incident.affected_services
existing.updated_at = incident.updated_at
else:
# 建立新記錄
record = IncidentRecord(
incident_id=incident.incident_id,
status=incident.status.value,
severity=incident.severity.value,
signals=[
s.model_dump(mode="json") for s in incident.signals
],
affected_services=incident.affected_services,
decision_chain=(
incident.decision_chain.model_dump(mode="json")
if incident.decision_chain
else None
),
proposal_ids=[str(pid) for pid in incident.proposal_ids],
outcome=(
incident.outcome.model_dump(mode="json")
if incident.outcome
else None
),
created_at=incident.created_at,
updated_at=incident.updated_at,
resolved_at=incident.resolved_at,
closed_at=incident.closed_at,
ttl_days=incident.ttl_days,
vectorized=incident.vectorized,
)
db.add(record)
incident.persisted_to_pg = True
logger.debug(
"db_persisted",
incident_id=incident.incident_id,
)
except Exception as e:
logger.exception("db_save_error", error=str(e))
# =========================================================================
# 從 Episodic Memory 載入 (Phase 6.4e 新增)
# =========================================================================
async def get_incident(self, incident_id: str) -> Incident | None:
"""
取得 Incident
Phase 6.4e: 委託給 DualIncidentMemory.load_incident()
優先從 Working Memory (Redis) 讀取miss 時從 Episodic (PostgreSQL) 讀取
Args:
incident_id: Incident ID
Returns:
Incident 或 None
"""
return await self._memory.load_incident(incident_id)
# =========================================================================
# 輔助方法
# =========================================================================

View File

@@ -0,0 +1,483 @@
"""
Incident Memory Provider - 事件記憶體提供者
============================================
Phase 6.4e: DualIncidentMemory 整合
設計:
- 實作 IIncidentMemory 協定 (Protocol)
- 雙層記憶體: Working (Redis) + Episodic (PostgreSQL)
- 反向索引: namespace:target -> incident_id
統帥鐵律:
- Working Memory (Redis): 7 天 TTL
- Episodic Memory (PostgreSQL): 永久
- 反向索引: 30 分鐘 TTL (聚合窗口)
NOTE: 此模組為 lewooogo-brain/adapters/incident_memory.py 的 apps/api 內嵌版本
待 Phase 6.4i 完成 monorepo Docker 解法後,將直接引用 lewooogo-brain 套件
"""
from datetime import datetime, timezone, timedelta
from typing import Any, Protocol
import structlog
from src.core.redis_client import get_redis
from src.db.base import get_db_context
from src.db.models import IncidentRecord
from src.models.incident import Incident
logger = structlog.get_logger(__name__)
# =============================================================================
# Constants
# =============================================================================
WORKING_MEMORY_TTL = 604800 # 7 天
AGGREGATION_WINDOW_MINUTES = 30
INDEX_TTL = 1800 # 索引 30 分鐘 TTL
# Redis Key Patterns
INCIDENT_KEY_PREFIX = "awoooi:incidents:"
INDEX_PREFIX = "awoooi:incidents:index:"
# =============================================================================
# Protocol Definition (與 lewooogo-brain 保持一致)
# =============================================================================
class IIncidentMemory(Protocol):
"""Incident 專用記憶體提供者協定"""
async def load_incident(self, incident_id: str) -> Incident | None:
"""從 Working Memory 載入 Incident"""
...
async def save_incident(self, incident: Incident, ttl_seconds: int = WORKING_MEMORY_TTL) -> bool:
"""儲存 Incident 到 Working Memory (預設 7 天 TTL)"""
...
async def persist_incident(self, incident: Incident) -> bool:
"""持久化到 Episodic Memory (PostgreSQL)"""
...
async def find_related_incident(
self,
namespace: str,
target: str,
window_minutes: int = AGGREGATION_WINDOW_MINUTES,
) -> Incident | None:
"""尋找相關的活躍 Incident (用於聚合)"""
...
async def update_index(
self,
incident_id: str,
namespace: str,
target: str,
) -> bool:
"""更新反向索引 (namespace/target -> incident_id)"""
...
# =============================================================================
# DualIncidentMemory Implementation
# =============================================================================
class DualIncidentMemory:
"""
Incident 專用雙層記憶體適配器
實作 IIncidentMemory 協定:
- load_incident: 從 Working/Episodic 載入
- save_incident: 儲存到 Working
- persist_incident: 持久化到 Episodic
- find_related_incident: 透過反向索引尋找相關 Incident
- update_index: 更新反向索引
反向索引結構:
Key: awoooi:incidents:index:{namespace}:{target}
Value: incident_id
TTL: 30 分鐘 (聚合窗口)
"""
def __init__(self, redis_client: Any = None, key_prefix: str = INCIDENT_KEY_PREFIX):
"""
初始化適配器
Args:
redis_client: Redis 連線客戶端 (可選,預設使用 get_redis())
key_prefix: Redis Key 前綴
"""
self._redis = redis_client
self._key_prefix = key_prefix
self._index_prefix = INDEX_PREFIX
def _get_redis(self) -> Any:
"""取得 Redis 客戶端 (延遲初始化)"""
if self._redis is None:
self._redis = get_redis()
return self._redis
def _make_key(self, incident_id: str) -> str:
"""生成 Incident Key"""
return f"{self._key_prefix}{incident_id}"
def _make_index_key(self, namespace: str, target: str) -> str:
"""生成索引 Key"""
return f"{self._index_prefix}{namespace}:{target}"
async def load_incident(self, incident_id: str) -> Incident | None:
"""
載入 Incident
策略:
1. 從 Redis (Working Memory) 讀取
2. 若 miss從 PostgreSQL (Episodic) 讀取
Args:
incident_id: Incident ID
Returns:
Incident 或 None
"""
try:
redis_client = self._get_redis()
key = self._make_key(incident_id)
data = await redis_client.get(key)
if data is not None:
# JSON -> Incident
return Incident.model_validate_json(data)
# Working Memory miss, 嘗試從 Episodic Memory 載入
logger.debug("incident_not_found_in_working", incident_id=incident_id)
async with get_db_context() as db:
from sqlalchemy import select
stmt = select(IncidentRecord).where(
IncidentRecord.incident_id == incident_id
)
result = await db.execute(stmt)
record = result.scalar_one_or_none()
if record:
# 從 DB 重建 Incident
incident = self._record_to_incident(record)
# 寫回 Working Memory (快取)
await self.save_incident(incident)
return incident
return None
except Exception as e:
logger.error("load_incident_failed", incident_id=incident_id, error=str(e))
return None
async def save_incident(
self,
incident: Incident,
ttl_seconds: int = WORKING_MEMORY_TTL,
) -> bool:
"""
儲存 Incident 到 Working Memory (Redis)
Args:
incident: Incident 物件
ttl_seconds: TTL (預設 7 天)
Returns:
是否成功
"""
try:
redis_client = self._get_redis()
key = self._make_key(incident.incident_id)
json_data = incident.model_dump_json()
await redis_client.setex(key, ttl_seconds, json_data)
logger.debug(
"incident_saved_to_working",
incident_id=incident.incident_id,
ttl=ttl_seconds,
)
return True
except Exception as e:
logger.error(
"save_incident_failed",
incident_id=incident.incident_id,
error=str(e),
)
return False
async def persist_incident(self, incident: Incident) -> bool:
"""
持久化到 Episodic Memory (PostgreSQL)
Args:
incident: Incident 物件
Returns:
是否成功
"""
try:
async with get_db_context() as db:
from sqlalchemy import select
# 檢查是否已存在
stmt = select(IncidentRecord).where(
IncidentRecord.incident_id == incident.incident_id
)
result = await db.execute(stmt)
existing = result.scalar_one_or_none()
if existing:
# 更新現有記錄
existing.status = incident.status.value
existing.severity = incident.severity.value
existing.signals = [
s.model_dump(mode="json") for s in incident.signals
]
existing.affected_services = incident.affected_services
existing.updated_at = incident.updated_at
if incident.resolved_at:
existing.resolved_at = incident.resolved_at
if incident.closed_at:
existing.closed_at = incident.closed_at
else:
# 建立新記錄
record = IncidentRecord(
incident_id=incident.incident_id,
status=incident.status.value,
severity=incident.severity.value,
signals=[
s.model_dump(mode="json") for s in incident.signals
],
affected_services=incident.affected_services,
decision_chain=(
incident.decision_chain.model_dump(mode="json")
if incident.decision_chain
else None
),
proposal_ids=[str(pid) for pid in incident.proposal_ids],
outcome=(
incident.outcome.model_dump(mode="json")
if incident.outcome
else None
),
created_at=incident.created_at,
updated_at=incident.updated_at,
resolved_at=incident.resolved_at,
closed_at=incident.closed_at,
ttl_days=incident.ttl_days,
vectorized=incident.vectorized,
)
db.add(record)
logger.debug(
"incident_persisted_to_episodic",
incident_id=incident.incident_id,
)
return True
except Exception as e:
logger.error(
"persist_incident_failed",
incident_id=incident.incident_id,
error=str(e),
)
return False
async def find_related_incident(
self,
namespace: str,
target: str,
window_minutes: int = AGGREGATION_WINDOW_MINUTES,
) -> Incident | None:
"""
尋找相關的活躍 Incident (用於聚合)
透過反向索引快速查找:
1. 查詢索引 Key: namespace:target -> incident_id
2. 載入 Incident
3. 檢查是否仍在聚合窗口內
Args:
namespace: 命名空間
target: 目標服務
window_minutes: 聚合窗口 (分鐘)
Returns:
相關 Incident 或 None
"""
try:
redis_client = self._get_redis()
# Step 1: 查詢索引
index_key = self._make_index_key(namespace, target)
incident_id = await redis_client.get(index_key)
if incident_id is None:
return None
# 解碼 bytes
if isinstance(incident_id, bytes):
incident_id = incident_id.decode()
# Step 2: 載入 Incident
incident = await self.load_incident(incident_id)
if incident is None:
# 索引存在但 Incident 不存在,清除索引
await redis_client.delete(index_key)
return None
# Step 3: 檢查聚合窗口
window_start = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
if incident.updated_at < window_start:
# 超出聚合窗口,不聚合
logger.debug(
"incident_outside_window",
incident_id=incident_id,
updated_at=incident.updated_at.isoformat(),
)
return None
logger.debug(
"found_related_incident",
incident_id=incident_id,
namespace=namespace,
target=target,
)
return incident
except Exception as e:
logger.error(
"find_related_incident_failed",
namespace=namespace,
target=target,
error=str(e),
)
return None
async def update_index(
self,
incident_id: str,
namespace: str,
target: str,
) -> bool:
"""
更新反向索引
索引結構:
Key: awoooi:incidents:index:{namespace}:{target}
Value: incident_id
TTL: 30 分鐘
Args:
incident_id: Incident ID
namespace: 命名空間
target: 目標服務
Returns:
是否成功
"""
try:
redis_client = self._get_redis()
index_key = self._make_index_key(namespace, target)
await redis_client.setex(index_key, INDEX_TTL, incident_id)
logger.debug(
"index_updated",
incident_id=incident_id,
namespace=namespace,
target=target,
ttl=INDEX_TTL,
)
return True
except Exception as e:
logger.error(
"update_index_failed",
incident_id=incident_id,
namespace=namespace,
target=target,
error=str(e),
)
return False
async def delete_incident(self, incident_id: str) -> bool:
"""
刪除 Incident
Args:
incident_id: Incident ID
Returns:
是否成功
"""
try:
redis_client = self._get_redis()
key = self._make_key(incident_id)
result = await redis_client.delete(key)
return result > 0
except Exception as e:
logger.error(
"delete_incident_failed",
incident_id=incident_id,
error=str(e),
)
return False
def _record_to_incident(self, record: IncidentRecord) -> Incident:
"""
將 DB Record 轉換為 Incident 物件
Args:
record: IncidentRecord
Returns:
Incident
"""
from src.models.incident import (
IncidentStatus,
Severity,
Signal,
)
# 重建 Signals
signals = []
for s in record.signals or []:
signals.append(Signal.model_validate(s))
return Incident(
incident_id=record.incident_id,
status=IncidentStatus(record.status),
severity=Severity(record.severity),
signals=signals,
affected_services=record.affected_services or [],
proposal_ids=record.proposal_ids or [],
created_at=record.created_at,
updated_at=record.updated_at,
resolved_at=record.resolved_at,
closed_at=record.closed_at,
ttl_days=record.ttl_days or 30,
vectorized=record.vectorized or False,
)
# =============================================================================
# Singleton
# =============================================================================
_dual_memory: DualIncidentMemory | None = None
def get_incident_memory() -> DualIncidentMemory:
"""取得 DualIncidentMemory 實例 (Singleton)"""
global _dual_memory
if _dual_memory is None:
_dual_memory = DualIncidentMemory()
return _dual_memory

View File

@@ -17,7 +17,6 @@ Features:
import json
from datetime import datetime, timezone
from typing import Any
from uuid import UUID
import structlog

View File

@@ -10,7 +10,6 @@ Phase 6: leWOOOgo Output Plugins
"""
import httpx
from datetime import datetime, timezone
from src.core.config import settings
from src.core.logging import get_logger

View File

@@ -30,11 +30,7 @@ import structlog
from src.core.config import settings
from src.core.redis_client import get_redis
from src.models.ai import (
AIRiskLevel,
AIBlastRadius,
AIDataImpact,
OpenClawDecision,
SuggestedAction,
)
from src.services.signoz_client import get_signoz_client, GoldMetrics

View File

@@ -29,7 +29,6 @@ from src.db.models import IncidentRecord
from src.models.approval import (
ApprovalRequest,
ApprovalRequestCreate,
ApprovalRequestResponse,
BlastRadius,
DataImpact,
DryRunCheck,
@@ -41,7 +40,7 @@ from src.models.incident import (
Severity,
)
from src.services.approval_db import get_approval_service
from src.services.trust_engine import trust_engine, normalize_action_pattern, RiskLevel
from src.services.trust_engine import trust_engine, normalize_action_pattern
from src.services.openclaw import get_openclaw
logger = structlog.get_logger(__name__)

View File

@@ -14,11 +14,8 @@ Features:
- 過期的 Nonce 自動清除
"""
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Literal
import structlog

View File

@@ -29,7 +29,6 @@ import structlog
from src.core.config import settings
from src.services.security_interceptor import (
get_security_interceptor,
TelegramUser,
UserNotWhitelistedError,
NonceReplayError,
)
@@ -884,14 +883,20 @@ class TelegramGateway:
except httpx.HTTPStatusError as e:
if e.response.status_code == 409:
# 409 Conflict: 另一個實例正在使用 getUpdates
# 這通常表示有其他 Bot 實例在運行
# 409 Conflict: 可能是 HTTP/2 連線狀態污染
# 重建 HTTP client 以清除殘留連線
logger.warning(
"telegram_polling_conflict",
status=409,
message="另一個 Bot 實例正在運行,嘗試重新刪除 Webhook...",
message="偵測到 409 衝突,重建 HTTP client...",
)
if self._http_client:
await self._http_client.aclose()
self._http_client = httpx.AsyncClient(
timeout=30.0,
headers={"Content-Type": "application/json"},
http2=False, # 強制 HTTP/1.1 避免連線複用問題
)
await self._delete_webhook()
await asyncio.sleep(LONG_POLLING_RETRY_DELAY)
else:
logger.error("telegram_polling_http_error", status=e.response.status_code)

View File

@@ -171,7 +171,26 @@
"P3": "P3 (Info)"
},
"generateProposal": "Generate Proposal",
"viewDetails": "View Details"
"viewDetails": "View Details",
"card": {
"executing": "Executing...",
"approved": "[ APPROVED ]",
"rejected": "[ REJECTED ]",
"error": "Error",
"timeout": "Timeout",
"retry": "Retry",
"timeoutMessage": "Execution timeout, please check API logs",
"checkApiLogs": "Please check API logs",
"analyzing": "Brain analyzing...",
"waitingDecision": "Waiting for decision",
"authorizeExecution": "Authorize execution",
"rejectProposal": "Reject proposal",
"aiExecuting": ">_ AI Executing (Tier 1)",
"brainAnalyzing": ">_ Brain analyzing...",
"decisionReady": ">_ Decision ready (Tier {tier})",
"waitingCommander": ">_ Awaiting commander approval (Tier {tier})",
"suggestedAction": "> Suggested action:"
}
},
"status": {
"idle": "Idle",
@@ -360,5 +379,13 @@
"footer": {
"copyright": "© 2026 岑洋國際行銷有限公司",
"poweredBy": "Powered by leWOOOgo Engine"
},
"errorBoundary": {
"systemFailure": "[SYSTEM FAILURE]",
"criticalError": "Critical UI rendering error detected. Auto-healing attempts exhausted.",
"escalating": "Escalating to OpenClaw AIOps Agent...",
"forceRestart": "FORCE MANUAL RESTART",
"detectingAnomaly": "[ DETECTING ANOMALY ]",
"autoHealingAttempt": "Initiating Auto-Healing Protocol (Attempt {attempt}/3)"
}
}

View File

@@ -171,7 +171,26 @@
"P3": "P3 (資訊)"
},
"generateProposal": "生成提案",
"viewDetails": "查看詳情"
"viewDetails": "查看詳情",
"card": {
"executing": "執行中...",
"approved": "[ 已授權 ]",
"rejected": "[ 已拒絕 ]",
"error": "錯誤",
"timeout": "超時",
"retry": "重試",
"timeoutMessage": "執行超時,請檢查 API 日誌",
"checkApiLogs": "請檢查 API 日誌",
"analyzing": "大腦分析中...",
"waitingDecision": "等待決策",
"authorizeExecution": "授權執行",
"rejectProposal": "拒絕提案",
"aiExecuting": ">_ AI 執行中 (Tier 1)",
"brainAnalyzing": ">_ 大腦分析中...",
"decisionReady": ">_ 決策就緒 (Tier {tier})",
"waitingCommander": ">_ 等待統帥親核 (Tier {tier})",
"suggestedAction": "> 建議行動:"
}
},
"status": {
"idle": "待命",
@@ -360,5 +379,13 @@
"footer": {
"copyright": "© 2026 岑洋國際行銷有限公司",
"poweredBy": "由 leWOOOgo 引擎驅動"
},
"errorBoundary": {
"systemFailure": "[系統故障]",
"criticalError": "偵測到嚴重的 UI 渲染錯誤。自動修復嘗試已耗盡。",
"escalating": "正在升級至 OpenClaw AIOps 代理...",
"forceRestart": "強制手動重啟",
"detectingAnomaly": "[ 偵測異常中 ]",
"autoHealingAttempt": "啟動自動修復協議 (嘗試 {attempt}/3)"
}
}

View File

@@ -32,6 +32,7 @@
"autoprefixer": "^10.4.0",
"eslint": "^8.57.0",
"eslint-config-next": "^14.1.0",
"playwright": "^1.58.2",
"postcss": "^8.4.0",
"tailwindcss": "^3.4.0",
"typescript": "^5.3.0"

View File

@@ -6,6 +6,7 @@ import { getMessages } from 'next-intl/server'
import { routing, type Locale } from '@/i18n/routing'
import '../globals.css'
import { Providers } from '../providers'
import { AutoHealingErrorBoundary } from '@/components/shared/auto-healing-error-boundary'
const inter = Inter({
subsets: ['latin'],
@@ -63,7 +64,9 @@ export default async function LocaleLayout({
className={`${inter.variable} ${jetbrainsMono.variable} ${vt323.variable} font-body bg-nothing-gray-50 text-nothing-black antialiased`}
>
<NextIntlClientProvider messages={messages}>
<Providers>{children}</Providers>
<AutoHealingErrorBoundary fallbackMessage="Critical Application Error">
<Providers>{children}</Providers>
</AutoHealingErrorBoundary>
</NextIntlClientProvider>
</body>
</html>

View File

@@ -28,6 +28,7 @@
*/
import React, { useState, useCallback, useEffect, useRef } from 'react'
import { useTranslations } from 'next-intl'
import { apiClient, DecisionInfo } from '@/lib/api-client'
type ButtonState = 'idle' | 'loading' | 'approved' | 'rejected' | 'error' | 'timeout'
@@ -60,6 +61,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
decision,
onApprovalChange,
}) => {
const t = useTranslations('incident.card')
const isAlert = status === 'alert'
const [buttonState, setButtonState] = useState<ButtonState>('idle')
const [errorMessage, setErrorMessage] = useState<string | null>(null)
@@ -105,7 +107,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
if (timeoutRef.current) clearTimeout(timeoutRef.current)
timeoutRef.current = setTimeout(() => {
setButtonState('timeout')
setErrorMessage('執行超時,請檢查 API 日誌')
setErrorMessage(t('timeoutMessage'))
}, EXECUTION_TIMEOUT_MS)
try {
@@ -183,7 +185,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
if (timeoutRef.current) clearTimeout(timeoutRef.current)
timeoutRef.current = setTimeout(() => {
setButtonState('timeout')
setErrorMessage('執行超時,請檢查 API 日誌')
setErrorMessage(t('timeoutMessage'))
}, EXECUTION_TIMEOUT_MS)
try {
@@ -236,19 +238,19 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
return (
<span className="px-3 py-1 bg-neutral-700 text-white text-xs flex items-center gap-2">
<span className="w-3 h-3 border-2 border-white/30 border-t-white rounded-full animate-spin" />
...
{t('executing')}
</span>
)
case 'approved':
return (
<span className="px-3 py-1 bg-green-600 text-white text-xs">
[ ]
{t('approved')}
</span>
)
case 'rejected':
return (
<span className="px-3 py-1 bg-red-600 text-white text-xs">
[ ]
{t('rejected')}
</span>
)
case 'error':
@@ -256,7 +258,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
<div className="flex flex-col gap-1 items-end">
<div className="flex items-center gap-1">
<span className="px-2 py-1 bg-red-600 text-white text-xs">
{t('error')}
</span>
<button
onClick={() => {
@@ -265,7 +267,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
}}
className="px-2 py-1 bg-neutral-700 text-white text-xs hover:bg-neutral-600 transition-colors"
>
{t('retry')}
</button>
</div>
{errorMessage && (
@@ -280,7 +282,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
<div className="flex flex-col gap-1 items-end">
<div className="flex items-center gap-1">
<span className="px-2 py-1 bg-amber-500 text-white text-xs animate-pulse">
{t('timeout')}
</span>
<button
onClick={() => {
@@ -289,11 +291,11 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
}}
className="px-2 py-1 bg-neutral-700 text-white text-xs hover:bg-neutral-600 transition-colors"
>
{t('retry')}
</button>
</div>
<span className="text-[10px] text-amber-600">
API
{t('checkApiLogs')}
</span>
</div>
)
@@ -304,7 +306,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
onClick={handleApprove}
disabled={!isDecisionReady}
className="px-2 py-1 bg-neutral-900 text-white text-xs hover:bg-green-700 active:scale-95 active:bg-neutral-800 transition-all duration-100 cursor-pointer disabled:opacity-30 disabled:cursor-not-allowed disabled:active:scale-100"
title={!isDecisionReady ? (isAnalyzing ? '大腦分析中...' : '等待決策') : (decisionAction || '授權執行')}
title={!isDecisionReady ? (isAnalyzing ? t('analyzing') : t('waitingDecision')) : (decisionAction || t('authorizeExecution'))}
>
Y
</button>
@@ -313,7 +315,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
onClick={handleReject}
disabled={!isDecisionReady}
className="px-2 py-1 bg-neutral-900 text-white text-xs hover:bg-red-700 active:scale-95 active:bg-neutral-800 transition-all duration-100 cursor-pointer disabled:opacity-30 disabled:cursor-not-allowed disabled:active:scale-100"
title={!isDecisionReady ? (isAnalyzing ? '大腦分析中...' : '等待決策') : '拒絕提案'}
title={!isDecisionReady ? (isAnalyzing ? t('analyzing') : t('waitingDecision')) : t('rejectProposal')}
>
n
</button>
@@ -376,13 +378,13 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
<div className="flex justify-between items-center">
<span className="text-xs text-neutral-500">
{tier === 1 ? (
'>_ AI 執行中 (Tier 1)'
t('aiExecuting')
) : isAnalyzing ? (
'>_ 大腦分析中...'
t('brainAnalyzing')
) : isDecisionReady ? (
`>_ 決策就緒 (Tier ${tier})`
t('decisionReady', { tier })
) : (
`>_ 等待統帥親核 (Tier ${tier})`
t('waitingCommander', { tier })
)}
</span>
{tier > 1 && renderActionButton()}
@@ -391,7 +393,7 @@ export const DualStateIncidentCard: React.FC<DualStateIncidentCardProps> = ({
{/* Phase 6.5: 顯示 AI 建議行動 */}
{decisionAction && tier > 1 && (
<div className="mt-2 p-2 bg-neutral-50 border-[0.5px] border-neutral-200 text-[10px] font-mono text-neutral-600">
<div className="text-neutral-400 mb-1">&gt; :</div>
<div className="text-neutral-400 mb-1">{t('suggestedAction')}</div>
<div className="text-neutral-800">{decisionAction}</div>
{decisionReasoning && (
<div className="mt-1 text-neutral-500 italic">

View File

@@ -0,0 +1,141 @@
'use client'
import React, { Component, ErrorInfo, ReactNode } from 'react'
import { useTranslations } from 'next-intl'
import { useAgentStore } from '@/stores/agent.store'
interface ErrorBoundaryTranslations {
systemFailure: string;
criticalError: string;
escalating: string;
forceRestart: string;
detectingAnomaly: string;
autoHealingAttempt: (attempt: number) => string;
}
interface InnerProps {
children: ReactNode;
fallbackMessage?: string;
translations: ErrorBoundaryTranslations;
}
interface State {
hasError: boolean;
retryCount: number;
}
class AutoHealingErrorBoundaryInner extends Component<InnerProps, State> {
public state: State = {
hasError: false,
retryCount: 0
};
public static getDerivedStateFromError(_: Error): State {
return { hasError: true, retryCount: 0 };
}
public componentDidCatch(error: Error, errorInfo: ErrorInfo) {
console.error('[AWOOOI] Frontend component crashed:', error, errorInfo);
this.attemptAutoHealing();
}
private attemptAutoHealing = () => {
const { retryCount } = this.state;
if (retryCount >= 3) {
console.error('[AWOOOI] Auto-healing failed after 3 attempts. Escalating to L3 AIOps.');
return;
}
// L1 Auto-Healing Logic
console.log(`[AWOOOI] Attempting auto-healing (Attempt ${retryCount + 1}/3)...`);
// 1. Clear toxic stored state
if (typeof window !== 'undefined') {
try {
localStorage.clear();
sessionStorage.clear();
} catch (e) {
console.warn('Failed to clear storage:', e);
}
}
// 2. Reset Zustand store explicitly outside of React render cycle
try {
if (typeof useAgentStore.getState().reset === 'function') {
useAgentStore.getState().reset();
}
} catch (e) {
console.warn('[AWOOOI] Failed to reset Agent Store during auto-healing', e);
}
// 3. Exponential Backoff remount
const delay = Math.pow(2, retryCount) * 1000; // 1s, 2s, 4s
setTimeout(() => {
console.log('[AWOOOI] Remounting component...');
this.setState({ hasError: false, retryCount: retryCount + 1 });
}, delay);
}
public render() {
const { translations } = this.props;
if (this.state.hasError) {
if (this.state.retryCount >= 3) {
return (
<div className="flex flex-col items-center justify-center p-8 bg-[#111111] text-white border border-red-500/30 rounded-lg max-w-lg mx-auto mt-12">
<h2 className="text-xl font-mono text-red-500 mb-4">{translations.systemFailure}</h2>
<p className="font-mono text-sm text-gray-400 mb-4">{this.props.fallbackMessage || translations.criticalError}</p>
<p className="font-mono text-xs text-purple-400 mb-6">{translations.escalating}</p>
<button
onClick={() => window.location.reload()}
className="px-6 py-2 bg-white text-black font-mono text-sm rounded-full hover:bg-gray-200 transition-colors"
>
{translations.forceRestart}
</button>
</div>
);
}
return (
<div className="flex flex-col items-center justify-center p-12 space-y-4 animate-pulse">
<div className="text-red-500 font-mono text-sm font-bold tracking-widest">{translations.detectingAnomaly}</div>
<div className="text-gray-400 font-mono text-xs">{translations.autoHealingAttempt(this.state.retryCount + 1)}</div>
<div className="w-48 h-1 bg-gray-800 rounded overflow-hidden mt-4">
<div className="h-full bg-white opacity-50 animate-ping"></div>
</div>
</div>
);
}
return this.props.children;
}
}
// Wrapper component that provides translations via hook
interface Props {
children: ReactNode;
fallbackMessage?: string;
}
export function AutoHealingErrorBoundary({ children, fallbackMessage }: Props) {
const t = useTranslations('errorBoundary');
const translations: ErrorBoundaryTranslations = {
systemFailure: t('systemFailure'),
criticalError: t('criticalError'),
escalating: t('escalating'),
forceRestart: t('forceRestart'),
detectingAnomaly: t('detectingAnomaly'),
autoHealingAttempt: (attempt: number) => t('autoHealingAttempt', { attempt }),
};
return (
<AutoHealingErrorBoundaryInner
translations={translations}
fallbackMessage={fallbackMessage}
>
{children}
</AutoHealingErrorBoundaryInner>
);
}

View File

@@ -66,6 +66,7 @@ interface AgentState {
// SSE 連線控制 (內部使用)
_abortController: AbortController | null
_sseRetryCount: number
// ==================== Actions ====================
setStatus: (status: AgentStatus) => void
@@ -114,6 +115,7 @@ const initialState = {
conversationId: null,
error: null,
_abortController: null,
_sseRetryCount: 0,
}
// ==================== Store ====================
@@ -153,7 +155,8 @@ export const useAgentStore = create<AgentState>()(
_abortController: abortController,
status: 'thinking',
error: null,
thinkingStream: [],
// 如果是重連,保留原本的 streams否則清空
thinkingStream: state._sseRetryCount > 0 ? state.thinkingStream : [],
})
try {
@@ -166,6 +169,9 @@ export const useAgentStore = create<AgentState>()(
throw new Error(`HTTP ${response.status}: ${response.statusText}`)
}
// 連線成功,重置重試計數
set({ _sseRetryCount: 0 })
const reader = response.body?.getReader()
if (!reader) {
throw new Error('無法建立串流通道')
@@ -213,18 +219,38 @@ export const useAgentStore = create<AgentState>()(
} catch (err: unknown) {
if (err instanceof Error && err.name === 'AbortError') {
console.log('SSE 串流已手動中斷')
set({ status: 'idle' })
set({ status: 'idle', _sseRetryCount: 0 })
} else {
const message = err instanceof Error ? err.message : '未知錯誤'
set({
status: 'error',
error: message,
})
get().appendThinking({
type: 'error',
content: message,
timestamp: new Date(),
})
// L2 網路自癒機制: Exponential Backoff Retry
const maxRetries = 5
const currentRetries = state._sseRetryCount
if (currentRetries < maxRetries) {
const delay = Math.min(1000 * Math.pow(2, currentRetries), 30000)
console.log(`[AWOOOI L2 Healing] SSE Error: ${message}. Retrying in ${delay}ms (Attempt ${currentRetries + 1}/${maxRetries})...`)
get().appendThinking({
type: 'error',
content: `連線中斷: ${message}。將在 ${delay/1000} 秒後自動重連 (嘗試 ${currentRetries + 1}/${maxRetries})...`,
timestamp: new Date(),
})
set({ _sseRetryCount: currentRetries + 1 })
setTimeout(() => get().startThinkingStream(apiUrl), delay)
} else {
console.error('[AWOOOI L2 Healing] SSE Max retries reached. Escalating to L3 AIOps.')
set({
status: 'error',
error: `Maximum SSE reconnect attempts reached: ${message}`,
})
get().appendThinking({
type: 'error',
content: `嚴重錯誤: 無法建立串流連線,已達最大重試次數。`,
timestamp: new Date(),
})
}
}
}
},

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,322 @@
# ADR-001: 前端狀態管理採用 Zustand
> **狀態**: Accepted
> **日期**: 2026-03-20 (Gate 0 驗證完成)
> **決策者**: 統帥 (CTO + CPO)
> **關聯**: [docs/adr/ADR-004-state-management.md](../docs/adr/ADR-004-state-management.md)
---
## 摘要
AWOOOI 前端全面採用 **Zustand** 作為狀態管理工具,特別針對:
- **Approval Multi-Sig 狀態機** - HITL 審批流程
- **SSE 即時串流** - Dashboard 主機監控
---
## 背景
### 問題陳述
AWOOOI 需要處理高頻率的狀態更新:
| 場景 | 更新頻率 | 狀態類型 |
|------|---------|---------|
| Dashboard SSE | 每秒 | 4 主機 CPU/Memory |
| Approval 簽核 | 事件驅動 | Multi-Sig 狀態機 |
| ClawBot 思考 | 串流 | AI 輸出 Token |
傳統的 Redux 在這種場景下過於笨重 (7KB + 大量 boilerplate)。
---
## 決策
**採用 Zustand 作為唯一全域狀態管理工具**
### 核心實作
#### 1. Dashboard Store (SSE 整合)
```typescript
// stores/dashboard.store.ts
import { create } from 'zustand'
interface DashboardState {
hosts: HostStatus[]
connectionStatus: 'connecting' | 'connected' | 'disconnected' | 'error'
lastUpdate: Date | null
// Actions
connect: (apiUrl: string) => void
disconnect: () => void
updateHosts: (hosts: HostStatus[]) => void
}
export const useDashboardStore = create<DashboardState>((set, get) => ({
hosts: [],
connectionStatus: 'disconnected',
lastUpdate: null,
connect: (apiUrl) => {
set({ connectionStatus: 'connecting' })
const eventSource = new EventSource(`${apiUrl}/api/v1/dashboard/stream`)
eventSource.onopen = () => {
set({ connectionStatus: 'connected' })
}
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data)
set({
hosts: data.hosts,
lastUpdate: new Date()
})
}
eventSource.onerror = () => {
set({ connectionStatus: 'error' })
}
},
disconnect: () => {
set({ connectionStatus: 'disconnected' })
},
updateHosts: (hosts) => set({ hosts, lastUpdate: new Date() })
}))
// Selector hooks for fine-grained subscriptions
export const useHosts = () => useDashboardStore((s) => s.hosts)
export const useConnectionStatus = () => useDashboardStore((s) => s.connectionStatus)
```
#### 2. Approval Store (Multi-Sig 狀態機)
```typescript
// stores/approval.store.ts
import { create } from 'zustand'
type SigningStatus = 'idle' | 'signing' | 'success' | 'error'
interface ApprovalState {
pendingApprovals: Approval[]
selectedApproval: Approval | null
signingStatus: SigningStatus
// Actions
fetchApprovals: () => Promise<void>
signApproval: (id: string, userId: string, role: string) => Promise<void>
rejectApproval: (id: string, reason: string) => Promise<void>
}
export const useApprovalStore = create<ApprovalState>((set, get) => ({
pendingApprovals: [],
selectedApproval: null,
signingStatus: 'idle',
fetchApprovals: async () => {
const response = await fetch('/api/v1/approvals?status=pending')
const data = await response.json()
set({ pendingApprovals: data.items })
},
signApproval: async (id, userId, role) => {
set({ signingStatus: 'signing' })
try {
const response = await fetch(`/api/v1/approvals/${id}/approve`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ user_id: userId, user_role: role })
})
if (response.status === 409) {
// TOCTOU Conflict or Duplicate Signature
throw new Error('Conflict')
}
const result = await response.json()
// Update local state
if (!result.needs_more) {
// Remove from pending if fully approved
set((s) => ({
pendingApprovals: s.pendingApprovals.filter(a => a.id !== id),
signingStatus: 'success'
}))
} else {
set({ signingStatus: 'success' })
}
} catch (error) {
set({ signingStatus: 'error' })
throw error
}
},
rejectApproval: async (id, reason) => {
await fetch(`/api/v1/approvals/${id}/reject`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ reason })
})
set((s) => ({
pendingApprovals: s.pendingApprovals.filter(a => a.id !== id)
}))
}
}))
```
---
## 狀態機設計
### Approval 生命週期
```
┌─────────────────────┐
│ pending │
└──────────┬──────────┘
┌───────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ approved │ │ rejected │ │ voided │
│ (簽章達閾值) │ │ (使用者拒絕) │ │ (TOCTOU衝突) │
└──────────────┘ └──────────────┘ └──────────────┘
```
### 風險矩陣 (簽章閾值)
| Risk Level | 簽章數 | 條件 |
|------------|--------|------|
| low | 0 | 自動執行 |
| medium | 1 | admin/devops |
| high | 2 | 任二管理員 |
| critical | 2 | 含 CTO 或 CISO |
---
## 理由
### 為什麼選擇 Zustand
| 特性 | Redux | Zustand | 優勢 |
|------|-------|---------|------|
| Bundle Size | ~7KB | ~1KB | **-86%** |
| Boilerplate | 高 | 極低 | 開發效率 |
| SSE 整合 | 需 middleware | 原生 | 簡單直接 |
| TypeScript | 需額外設定 | 開箱即用 | DX 優異 |
| Re-render | 需 selector | 內建 | 效能優化 |
### 為什麼不選擇 Redux
1. **過度工程** - AWOOOI 不需要 Redux 的 time-travel debugging
2. **Boilerplate** - 每個 action 需要 type/reducer/action creator
3. **Bundle** - 7KB 對於輕量 SaaS 是負擔
4. **SSE 整合** - 需要額外的 middleware 如 redux-saga
### 為什麼不選擇 Context API
1. **Re-render 問題** - Provider 下所有元件都會重繪
2. **不適合高頻更新** - SSE 每秒更新會造成效能問題
3. **缺乏 selector** - 無法細粒度訂閱
---
## SSE 企業級模式
### Buffer + Debounce
```typescript
// 避免每個 SSE 事件都觸發 re-render
const bufferRef = useRef<HostStatus[]>([])
eventSource.onmessage = (event) => {
bufferRef.current.push(JSON.parse(event.data))
}
// 每 500ms 批次更新
setInterval(() => {
if (bufferRef.current.length > 0) {
set({ hosts: bufferRef.current })
bufferRef.current = []
}
}, 500)
```
### AbortController 清理
```typescript
useEffect(() => {
const controller = new AbortController()
connect(apiUrl)
return () => {
controller.abort()
disconnect()
}
}, [])
```
### 指數退避重連
```typescript
const reconnect = (attempt: number) => {
const delay = Math.min(1000 * Math.pow(2, attempt), 30000)
setTimeout(() => connect(), delay)
}
```
---
## 驗證結果 (Gate 0)
| 測試項目 | 結果 |
|---------|------|
| Dashboard SSE 連線 | ✅ 穩定 |
| 4 主機即時更新 | ✅ <100ms 延遲 |
| Approval 簽核流程 | ✅ Multi-Sig 運作 |
| TOCTOU 防護 | ✅ 409 正確處理 |
| 記憶體洩漏 | ✅ 無 (AbortController) |
---
## 後果
### 優點
- **極度輕量** - 不增加 bundle 負擔
- **高頻更新** - 完美處理 SSE 串流
- **簡單 API** - 降低學習曲線
- **TypeScript** - 完整型別推導
### 缺點
- **生態較小** - 相比 Redux 社群資源較少
- **DevTools** - 功能不如 Redux DevTools 強大
### 風險緩解
| 風險 | 緩解措施 |
|------|---------|
| Store 肥大化 | 強制 Slice Pattern |
| 狀態同步錯誤 | 搭配 TanStack Query |
---
## 參考資料
- [Zustand 官方文檔](https://zustand-demo.pmnd.rs/)
- [API 契約](../docs/api/approvals-contract.yaml)
- [docs/adr/ADR-004](../docs/adr/ADR-004-state-management.md) - 詳細版本
---
*Gate 0 里程碑 - 2026-03-20*

145
capabilities.json Normal file
View File

@@ -0,0 +1,145 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"name": "OpenClaw Capabilities",
"version": "5.0.0",
"description": "OpenClaw AI Agent 允許調用的工具與操作權限定義",
"updated_at": "2026-03-21",
"kubernetes": {
"allowed_operations": [
{
"name": "RESTART_DEPLOYMENT",
"command": "kubectl rollout restart deployment/{name} -n {namespace}",
"risk_level": "medium",
"requires_approval": true,
"description": "重啟 Deployment觸發 Rolling Update"
},
{
"name": "DELETE_POD",
"command": "kubectl delete pod {name} -n {namespace}",
"risk_level": "medium",
"requires_approval": true,
"description": "刪除 Pod由 ReplicaSet 自動重建"
},
{
"name": "SCALE_DEPLOYMENT",
"command": "kubectl scale deployment/{name} --replicas={count} -n {namespace}",
"risk_level": "low",
"requires_approval": false,
"description": "水平擴展 Deployment 副本數"
},
{
"name": "GET_LOGS",
"command": "kubectl logs {pod} -n {namespace} --tail={lines}",
"risk_level": "low",
"requires_approval": false,
"description": "查看 Pod 日誌"
},
{
"name": "DESCRIBE_RESOURCE",
"command": "kubectl describe {resource_type} {name} -n {namespace}",
"risk_level": "low",
"requires_approval": false,
"description": "查看資源詳細狀態"
}
],
"forbidden_operations": [
{
"pattern": "kubectl delete namespace *",
"reason": "影響範圍過大,可能導致整個命名空間被刪除"
},
{
"pattern": "kubectl delete pvc *",
"reason": "可能導致持久化資料遺失"
},
{
"pattern": "kubectl apply -f *",
"reason": "未審核的 YAML 可能引入惡意配置"
},
{
"pattern": "* --force",
"reason": "強制操作繞過安全檢查"
},
{
"pattern": "kubectl exec *",
"reason": "直接進入容器可能造成安全風險"
}
],
"namespaces": {
"allowed": ["awoooi-prod", "default", "kube-system"],
"forbidden": ["kube-public", "cert-manager"]
}
},
"notifications": {
"channels": [
{
"name": "telegram",
"enabled": true,
"config_key": "OPENCLAW_TG_BOT_TOKEN",
"features": ["alerts", "approvals", "status_updates"]
},
{
"name": "discord",
"enabled": true,
"config_key": "DISCORD_WEBHOOK_URL",
"features": ["execution_reports"]
},
{
"name": "sse",
"enabled": true,
"endpoint": "/api/v1/stream",
"features": ["real_time_updates", "approvals"]
}
]
},
"ai_providers": {
"fallback_order": ["ollama", "gemini", "claude"],
"providers": [
{
"name": "ollama",
"endpoint": "http://192.168.0.188:11434",
"model": "llama3.2:3b",
"cost_per_1k_tokens": 0,
"timeout_seconds": 90
},
{
"name": "gemini",
"endpoint": "https://generativelanguage.googleapis.com/v1beta",
"model": "gemini-1.5-flash",
"cost_per_1k_tokens": 0.001,
"timeout_seconds": 30
},
{
"name": "claude",
"endpoint": "https://api.anthropic.com/v1",
"model": "claude-3-haiku-20240307",
"cost_per_1k_tokens": 0.008,
"timeout_seconds": 30
}
]
},
"security": {
"telegram_whitelist": {
"description": "允許透過 Telegram 簽核的 user_id 清單",
"users": []
},
"webhook_hmac": {
"algorithm": "sha256",
"header": "X-Signature-256"
},
"nonce_ttl_seconds": 300
},
"limits": {
"max_concurrent_approvals": 10,
"max_daily_operations": 100,
"token_budget": {
"gemini_daily": 70000,
"claude_daily": 35000,
"monthly_cost_limit_usd": 10
}
}
}

160
deploy-infra.sh Executable file
View File

@@ -0,0 +1,160 @@
#!/bin/bash
# =============================================================================
# AWOOOI Phase 0 基建部署腳本
# =============================================================================
# 負責人: CIO/CTO
# 版本: v1.0
# 日期: 2026-03-20
#
# 功能:
# 1. 將 K8s YAML 傳送到 K3s Master (192.168.0.120)
# 2. 建立 Namespace、ResourceQuota、NetworkPolicy、ConfigMap
# 3. 驗證部署狀態
#
# 使用方式:
# chmod +x deploy-infra.sh
# ./deploy-infra.sh
# =============================================================================
set -e # 遇錯即停
# =============================================================================
# 配置區
# =============================================================================
K3S_MASTER="192.168.0.120"
K3S_USER="wooo"
REMOTE_DIR="/tmp/awoooi-k8s"
LOCAL_K8S_DIR="$(dirname "$0")/k8s/awoooi-prod"
NAMESPACE="awoooi-prod"
# Phase 0 需部署的檔案 (不含 secrets 和 deployments)
PHASE0_FILES=(
"01-namespace-quota.yaml"
"02-network-policy.yaml"
"04-configmap.yaml"
)
# 顏色輸出
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color
# =============================================================================
# 函式區
# =============================================================================
log_info() {
echo -e "${CYAN}[INFO]${NC} $1"
}
log_success() {
echo -e "${GREEN}[OK]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# =============================================================================
# 主流程
# =============================================================================
echo ""
echo "============================================================"
echo " AWOOOI Phase 0 基建部署"
echo " Target: ${K3S_MASTER} (K3s Master)"
echo "============================================================"
echo ""
# -----------------------------------------------------------------------------
# Step 1: 驗證本地檔案
# -----------------------------------------------------------------------------
log_info "Step 1: 驗證本地 YAML 檔案..."
for file in "${PHASE0_FILES[@]}"; do
if [[ ! -f "${LOCAL_K8S_DIR}/${file}" ]]; then
log_error "找不到檔案: ${LOCAL_K8S_DIR}/${file}"
exit 1
fi
log_success " ${file}"
done
# -----------------------------------------------------------------------------
# Step 2: 建立遠端目錄並傳送檔案
# -----------------------------------------------------------------------------
log_info "Step 2: 傳送 YAML 到 ${K3S_MASTER}..."
ssh "${K3S_USER}@${K3S_MASTER}" "mkdir -p ${REMOTE_DIR}"
for file in "${PHASE0_FILES[@]}"; do
scp -q "${LOCAL_K8S_DIR}/${file}" "${K3S_USER}@${K3S_MASTER}:${REMOTE_DIR}/"
log_success " ${file} -> ${K3S_MASTER}:${REMOTE_DIR}/"
done
# -----------------------------------------------------------------------------
# Step 3: 執行 kubectl apply
# -----------------------------------------------------------------------------
log_info "Step 3: 執行 kubectl apply..."
for file in "${PHASE0_FILES[@]}"; do
log_info " Applying ${file}..."
ssh "${K3S_USER}@${K3S_MASTER}" "kubectl apply -f ${REMOTE_DIR}/${file}"
done
# -----------------------------------------------------------------------------
# Step 4: 驗證部署狀態
# -----------------------------------------------------------------------------
echo ""
log_info "Step 4: 驗證部署狀態..."
echo ""
echo "--- Namespace ---"
ssh "${K3S_USER}@${K3S_MASTER}" "kubectl get ns ${NAMESPACE} -o wide"
echo ""
echo "--- ResourceQuota ---"
ssh "${K3S_USER}@${K3S_MASTER}" "kubectl get resourcequota -n ${NAMESPACE}"
echo ""
echo "--- LimitRange ---"
ssh "${K3S_USER}@${K3S_MASTER}" "kubectl get limitrange -n ${NAMESPACE}"
echo ""
echo "--- NetworkPolicy (零信任) ---"
ssh "${K3S_USER}@${K3S_MASTER}" "kubectl get networkpolicy -n ${NAMESPACE}"
echo ""
echo "--- ConfigMap ---"
ssh "${K3S_USER}@${K3S_MASTER}" "kubectl get configmap -n ${NAMESPACE}"
echo ""
# -----------------------------------------------------------------------------
# Step 5: 清理遠端暫存
# -----------------------------------------------------------------------------
log_info "Step 5: 清理遠端暫存..."
ssh "${K3S_USER}@${K3S_MASTER}" "rm -rf ${REMOTE_DIR}"
log_success "已清理 ${REMOTE_DIR}"
# -----------------------------------------------------------------------------
# 完成
# -----------------------------------------------------------------------------
echo ""
echo "============================================================"
echo -e " ${GREEN}Phase 0 基建部署完成!${NC}"
echo "============================================================"
echo ""
echo "已建立:"
echo " - Namespace: ${NAMESPACE}"
echo " - ResourceQuota: awoooi-prod-quota (CPU 4/8, Mem 8Gi/16Gi)"
echo " - LimitRange: awoooi-prod-limits"
echo " - NetworkPolicy: default-deny-all, allow-nginx-ingress, allow-required-egress"
echo " - ConfigMap: awoooi-config"
echo ""
echo "下一步:"
echo " 1. CIO 手動配置 03-secrets.yaml 實際值"
echo " 2. CI/CD 建置映像後自動部署 Deployment"
echo ""

137
docker-compose.yml Normal file
View File

@@ -0,0 +1,137 @@
# AWOOOI - Local Development Environment
# =======================================
# Phase 7: 容器化聯合測試環境
#
# Usage:
# docker compose up -d # 啟動所有服務
# docker compose logs -f api # 查看 API 日誌
# docker compose down -v # 停止並清除資料
#
# Services:
# - web: Next.js 前端 (port 3000)
# - api: FastAPI 後端 (port 8000)
# - postgres: PostgreSQL 資料庫 (port 5432)
# - redis: Redis 快取 (port 6379)
services:
# ==========================================================================
# PostgreSQL Database
# ==========================================================================
postgres:
image: postgres:16-alpine
container_name: awoooi-postgres
restart: unless-stopped
environment:
POSTGRES_USER: awoooi
POSTGRES_PASSWORD: awoooi_dev_2026
POSTGRES_DB: awoooi_dev
volumes:
- postgres_data:/var/lib/postgresql/data
ports:
- "5432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U awoooi -d awoooi_dev"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# Redis Cache
# ==========================================================================
redis:
image: redis:7-alpine
container_name: awoooi-redis
restart: unless-stopped
ports:
- "6379:6379"
volumes:
- redis_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# FastAPI Backend
# ==========================================================================
api:
build:
context: ./apps/api
dockerfile: Dockerfile
container_name: awoooi-api
restart: unless-stopped
ports:
- "8000:8000"
environment:
ENVIRONMENT: dev
DEBUG: "true"
LOG_LEVEL: INFO
MOCK_MODE: "true"
# Database (統帥鐵律: 禁止 SQLite, PostgreSQL ONLY)
DATABASE_URL: postgresql+asyncpg://awoooi:awoooi_dev_2026@postgres:5432/awoooi_dev
# Redis
REDIS_URL: redis://redis:6379/0
# CORS (容器內使用 service name + localhost 開發端口)
CORS_ORIGINS: '["http://localhost:3000","http://localhost:3001","http://localhost:3002","http://localhost:3003","http://web:3000"]'
# Telegram Gateway (Phase 5.5)
OPENCLAW_TG_BOT_TOKEN: "8569720657:AAHdvKf_P2ms-QKFTyqTLtLiqEggz8cpjMk"
OPENCLAW_TG_CHAT_ID: "5619078117"
OPENCLAW_TG_USER_WHITELIST: "5619078117"
# External Services (使用 host.docker.internal 存取宿主機服務)
OLLAMA_URL: http://host.docker.internal:11434
CLAWBOT_URL: http://host.docker.internal:8089
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
volumes:
# 開發時掛載程式碼以支援熱重載
- ./apps/api/src:/app/src:ro
# K8s kubeconfig for ActionExecutor (Phase 3)
- ./apps/api/k3s-prod.yaml:/app/k3s-prod.yaml:ro
healthcheck:
test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:8000/api/v1/health', timeout=5)"]
interval: 30s
timeout: 10s
start_period: 10s
retries: 3
# ==========================================================================
# Next.js Frontend
# ==========================================================================
web:
build:
context: .
dockerfile: apps/web/Dockerfile
args:
# Build-time arg: NEXT_PUBLIC_* 需在打包時注入
NEXT_PUBLIC_API_URL: http://localhost:8000
container_name: awoooi-web
restart: unless-stopped
ports:
- "3000:3000"
environment:
NODE_ENV: production
# API URL - Browser 需使用 localhost (Docker 對外暴露的 port)
# 注意: NEXT_PUBLIC_* 是給瀏覽器用的,非 Docker 內部網路
NEXT_PUBLIC_API_URL: http://localhost:8000
depends_on:
api:
condition: service_healthy
healthcheck:
# 使用 node 而非 wget因為 Alpine 精簡鏡像 wget 有相容性問題
test: ["CMD", "node", "-e", "require('http').get('http://127.0.0.1:3000/', (r) => process.exit(r.statusCode === 200 || r.statusCode === 307 ? 0 : 1)).on('error', () => process.exit(1))"]
interval: 30s
timeout: 10s
start_period: 30s
retries: 3
volumes:
postgres_data:
redis_data:
networks:
default:
name: awoooi-network

View File

@@ -0,0 +1,89 @@
# AIOps 平台差異分析與進化策略報告
**比較對象**:
1. **商業標準版**: `https://aiops.wooo.work/` (WOOO AIOps)
2. **次世代試驗版**: `http://localhost:3000/zh-TW` (AWOOOI)
---
## 1. 視覺與操作體驗對比 (UI/UX Comparison)
| 維度 | WOOO AIOps (線上版) | AWOOOI (Localhost) | 專家點評 |
|------|-------------------|--------------------|---------|
| **設計語彙** | 現代化企業 SaaS 面板 (可能基於 Shadcn UI/Radix) | **Nothing.tech** 極簡美學與「硬核駭客」風格 | Localhost 的美學極具辨識度與高級感,打破了傳統運維面板的枯燥,但目前欠缺資料密度與視覺引導。 |
| **色彩策略** | 鮮豔藍色 (Vibrant Blue) 點綴,明暗雙主題 | 高對比黑白灰 (`#nothing-black`),特殊狀態才用色彩點綴 (綠/紅/紫) | Localhost 透過克制的色彩更能凸顯關鍵告警(例如突發的紅色錯誤或 AI 介入的紫色)。 |
| **字體排版** | 易讀的標準網頁無襯線字體 | 大量使用**等寬字體 (Monospaced)** 呈現資料與日誌 | 神來之筆。等寬字體極大增強了「戰情室」的專業與壓迫感。 |
| **資料呈現** | 豐富的資料視覺化 (Recharts)、表格、圓餅圖 | 卡片化、進度條、狀態燈號與原始 Log 輸出 | 商業版適合經理人觀看,**Localhost 更適合 SRE 與 DevOps 的第一線作戰**。 |
---
## 2. 功能盤點與差異分析 (Feature Gap Analysis)
經過雙瀏覽器 Subagent 深度爬蟲,以下為雙方平台的功能覆蓋率對比:
### 🔴 嚴重落後 (Localhost 呈現 404 狀態)
1. **授權中心 (Approval Center)**: `/zh-TW/approvals` 找不到頁面。這是 Multi-Sig 多重簽核引擎的關鍵介面。
2. **知識殿堂 (Knowledge Base)**: `/zh-TW/knowledge` 找不到頁面。缺少事故處理 SOP 的檢索入口。
3. **設定 (Settings)**: `/zh-TW/settings` 找不到頁面。缺少使用者權限 (RBAC) 與通知頻道的設定。
### 🟡 深度不足 (Localhost 有介面但資料維度單薄)
1. **監控與效能 (Monitoring & APM)**: 線上版整合了 SigNoz 來追蹤 Trace/Latency還有服務拓樸圖 (Service Topology)。Localhost 目前僅在「全局脈搏」呈現高階數據 (RPS, Error Rate),缺少微服務級別的下鑽 (Drill-down) 能力。
2. **自動修復與工單 (Auto Repair & Tickets)**: 線上版有完整的工單 SLA 追蹤與自動修復觸發紀錄。Localhost 目前將其合併在「行動日誌 (Action Logs)」中,以流水帳呈現,不易追蹤單一事件的完整生命週期。
3. **成本優化 (FinOps)**: 線上版有詳細的雲端帳單視覺化與優化建議Localhost 完全缺乏此區塊。
### 🟢 本機優勢 (Localhost 獨創功能)
1. **AI 代理實體化 (OpenClaw 面板)**: Localhost 擁有獨立的 `[AGENT] patrolling...` 即時串流介面,讓 AI 像是真人在值班,這點在 UX 上大勝線上版傳統的自動化腳本感受。
---
## 3. AWOOOI (Localhost) 究極進化策略與解決方案
我們的目標不是「模仿」線上版,而是要在 **保留 Nothing.tech 美學** 的前提下,將線上版的複雜功能以更高級的方式重塑到 Localhost讓其進化成次世代的「AI 智能戰情室」。
### ⚡ Phase 1: 基礎建設與 404 修復 (1-2 週)
**目標:補齊核心體驗,打通後端引擎。**
1. **實作高冷風格的「授權中心 (Approval Center)」**
- **問題**: 缺少 `/approvals` 頁面。
- **解決方案**: 結合後端的 `approval.py` (Multi-Sig Engine),設計一個終端機風格的審批介面。捨棄傳統的資料表格 (DataGrid)。
- **UI 設計**: 使用全螢幕的分割視窗,左側顯示紅色/橘色的「風險等級 (Risk Level: Critical)」,右側顯示需要審批的具體 K8s Diff (變更對比),按鈕設計成實體的「確認授權 (Authorize)」與「執行拒絕 (Reject)」,帶有機械物理按壓的過渡動畫。
2. **實作 Markdown 驅動的「知識殿堂 (Knowledge Base)」**
- **問題**: 缺少 `/knowledge` 頁面。
- **解決方案**: 實作一個左側大綱樹狀圖、右側 Markdown Render 的簡潔介面。加入一個全域的 `⌘+K` AI 搜尋框,直接串接 RAG 引擎,詢問「如何處理 Harbor Node 離線?」直接給出解答,而非傳統的關鍵字搜尋。
### 🚀 Phase 2: 核心功能的高級感重塑 (2-3 週)
**目標:將 WOOO AIOps 的複雜功能,轉化為符合 Nothing.tech 風格的資料視覺化。**
1. **服務拓樸 (Service Topology) 的「賽博解剖圖」**
- **分析**: 線上版使用傳統的節點連線圖。我們可以使用後端 `graph_rag.py` 提供的 `BlastRadiusResult` (爆炸半徑)。
- **進化方案**: 開發一個 3D 或極度平面的深色網路拓樸組件 (基於 React Flow 或 D3.js)。正常狀態下只有黑白相間的連線;當發生故障時,由故障節點向外擴散發出「紅色波紋 (Glitch 效果)」,一秒鐘讓 SRE 知道災情範圍。
2. **APM 效能監控的「極簡心電圖」**
- **分析**: 傳統 Grafana/SigNoz 圖表太過凌亂。
- **進化方案**: 在儀表板引入 **Sparklines (微型折線圖)**。移除所有的 X/Y 軸標籤與網格線,只用一條純白或高對比色的折線顯示過去 1 小時的 Latency 趨勢。當超出 P99 閾值時,折線局部變紅。這種高密度、低干擾的設計完美契合 Nothing 風格。
3. **成本優化 (FinOps) 的「廢墟數字」**
- **進化方案**: 不需要複雜的圓餅圖。直接在首頁放置一個巨大的、動態跳動的紅色數字:`WASTED CLOUD BUDGET: $1,245`,下方配一個按鈕 `[Execute AI Cleanup]`。這種強烈的視覺衝擊比十張分析圖表都有效。
### 🧠 Phase 3: AI 代理 (OpenClaw) 的雙向互動武裝 (3-4 週)
**目標:讓 AI Agent 不只是背景程序,而是運維團隊的「虛擬 SRE 同事」。**
1. **思維串流終端機 (Thinking Stream Terminal)**
- 目前 `agent.store.ts` 已經實作了強大的 SSE 解析與 Buffer 機制。我們應該將首頁右側的 OpenClaw 面板升級為一個**互動式終端機**。
- 當 AI 在處理問題時,以打字機效果 (Typing Effect) 實時印出它的推理過程:
```text
> [ERROR DETECTED] CPU spike on frontend-pod-1a2b
> [ANALYZING] Querying GraphRAG for blast radius...
> [RESULT] 3 upstream services might degrade.
> [DECISION] Propose auto-scaling. Waiting for Admin (CTO) approval.
```
2. **對話式命令列 (Command Palette)**
- 保留極簡 UI取代傳統選單。使用者隨時可以按 `/` 喚出命令列,直接輸入自然語言:「重啟所有失敗的 pod」或「幫我整理昨天的錯誤日誌」。AI 會解析並產生對應的操作卡片供使用者確認 (Multi-Sig)。
---
**總結報告**
`localhost:3000` 在設計美學上已經走在前端,但內部功能的骨架還需補齊。只要優先將缺少的核心路由 (`/approvals`, `/knowledge`) 補上,並針對 GraphRAG 與 SSE Thinking Stream 這兩個殺手級後端引擎進行前端特化渲染AWOOOI 將成為市面上最酷、最實用的運維作戰平台!

View File

@@ -0,0 +1,174 @@
# AWOOOI 專案架構與程式碼審查報告 (Architecture & Code Review)
## 1. 專案總覽 (Project Overview)
AWOOOI 是一個由 AI 驅動的智能運維平台 (AI+WOOO Intelligent Operations Platform)主打「Zero-Touch Ops. Human-Centric Decisions」。專案採用 Turborepo 建構的 Monorepo 架構,包含四大核心支柱:**Privacy Shield (隱私保護)**、**GraphRAG (拓撲感知情報)**、**Multi-Sig & Dry-Run (多重簽核與防禦)**、以及 **Progressive Autonomy (漸進式自治)**
### 技術棧 (Tech Stack)
- **Backend (apps/api)**: FastAPI, Python 3.11+, PostgreSQL, Redis, structlog, OpenTelemetry。
- **Frontend (apps/web)**: Next.js 14, React 18, Tailwind CSS, Zustand, React-Query。
- **Workspace**: pnpm + Turborepo抽出共用模組 `@awoooi/lewooogo-core`
---
## 2. 核心架構審查 (Architecture Review)
### 2.1 Backend (leWOOOgo Engine)
API 服務設計為高度模組化的 BFF (Backend For Frontend) 架構,並嚴格遵循四大鐵律 (Async-First, CORS Whitelist, Pydantic Config, Structured Logging)。
- **可觀測性先行**: 於 `main.py`OpenTelemetry 的初始化被置於啟動流程與 Middleware 的最前方,確保 Request 全生命週期的追蹤,這是極其優秀的企業級實踐。
- **生命週期管理 (Lifespan)**: 妥善利用 FastAPI 的 `@asynccontextmanager` 來管理資料庫連線、HTTP Clients Pooling 以及 SSE Publisher 的啟動與優雅關閉 (Graceful Shutdown)避免資源洩漏Memory Leaks
### 2.2 Frontend (Web App)
- **狀態管理**: 採用 `zustand` 進行元件外部的狀態管理 (`agent.store.ts`),特別針對 Server-Sent Events (SSE) 實作了專屬的 Buffer 累積機制。這能有效防止 TCP 封包切斷導致後端流式輸出的 JSON 解析錯誤,這在串流顯示 AI 思考過程 (Thinking Stream) 時非常關鍵。
- **嚴謹的環境變數字典**: 程式碼中 (如 `getApiBaseUrl`) 嚴格禁止了 Fallback IP 的濫用,強制拋出錯誤並依賴 `NEXT_PUBLIC_API_URL`,這確保了開發、測試與生產環境的絕對隔離。
---
## 3. 核心模組程式碼深潛審查 (Core Modules Deep Dive)
### 3.1 Multi-Sig Engine (`approval.py`)
- **亮點**: 實作了極具資安意識的 **TOCTOU (Time-of-Check to Time-of-Use) 防護**。在收集完所有高權限使用者的簽章、準備真正執行指令前,系統會強制對目標資源再次呼叫 `dry_run_engine.evaluate` 來檢查狀態是否發生過偏移。
- **合規性稽核**: 當發生 TOCTOU 衝突時,系統不會鴕鳥心態地直接清空簽章,而是將審批狀態明確標記為 `VOIDED` 並保留所有簽核歷史,完全符合金融與企業環境對資安稽核 (Audit Trail) 的要求。
- **設計模式**: 採用清晰的 In-Memory 狀態機與 Strategy Pattern 來實作不同風險層級 (Risk Matrix) 的簽核門檻。
### 3.2 GraphRAG Engine (`graph_rag.py`)
- **亮點**: 實作了強大且雙向的圖形遍歷演算法 (BFS-based traversal)。
- **Blast Radius (向上追溯)**: 準確計算「若特定服務掛掉,哪些上游服務將作為受災戶被連帶波及」。
- **Root Cause (向下追溯)**: 當服務報錯時,從異常節點往下找尋發生故障的根本原因,並具備優先權排序演算法 (DB > CACHE > QUEUE)。
- **效能考量**: 演算法中引入了 `max_depth` 限制,防止在大型 Kubernetes 叢集中發生圖遍歷的無限遞迴擴散,顯示出高度的工程成熟度。
---
## 4. 總結與改進建議 (Conclusion & Recommendations)
AWOOOI 的整體程式碼品質極高,充分展現了「企業級系統」應有的嚴謹度,特別是在異常處理、資安防護與微服務架構解耦上做得很到位。
**未來潛在改善點 (Tech Debt & Roadmap)**
1. **狀態持久化 (Persistence)**: 目前 `MultiSigEngine``TopologyGraph` 皆依賴 In-Memory (`dict`) 作為儲存。進入 Phase 4/5 生產環境硬化時,應盡快置換為 Redis (用於分散式鎖與簽核狀態共用) 與 Graph Database (如 Neo4j) 以應對多實例高可用部署 (Horizontal Scaling)。
2. **容錯與重試恢復 (Resilience)**: 在前端 SSE 串流部分,雖然處理了 AbortController 手動中斷,但可考慮加入對底層網路不穩時的自動重連機制與 Exponential Backoff 策略,進一步提升運維戰情室的體驗韌性。
---
## 5. 優化方案與解決策略 (Optimization Solutions)
### 5.1 問題與解決方案對照表
| # | 類別 | 問題 | 解決方案 | 優先級 | 規劃狀態 |
|---|------|------|----------|--------|----------|
| **1** | 狀態持久化 | `MultiSigEngine` 使用 In-Memory `dict` | 改用 **Redis** 實作分散式鎖與簽核狀態共用 | 🔴 P0 | ⚪ Phase 6.1.1 |
| **2** | 狀態持久化 | `TopologyGraph` 使用 In-Memory `dict` | 導入 **Neo4j / Redis Graph** 支援多實例 HA | 🔴 P0 | ⚪ Phase 6.1.2 |
| **3** | 容錯機制 | SSE 串流無自動重連 | 加入 **Exponential Backoff** + Auto-Reconnect | 🟡 P1 | ✅ ADR-004 已規劃 |
| **4** | 水平擴展 | 單實例限制 | Redis 分散式鎖 + Sticky Session 或 Redis Pub/Sub | 🟡 P1 | ⚪ Phase 6.3 |
> **備註**: 第 3 項 SSE 容錯機制已在 [ADR-004](adr/ADR-004-state-management.md) 定義完整規格 (Line 228-236),待驗證實作狀態。
---
### 5.2 詳細解決方案 (Implementation Details)
#### 5.2.1 Redis 狀態持久化 (Multi-Sig Engine)
```python
# apps/api/src/services/multi_sig_redis.py
import redis.asyncio as redis
from pydantic import BaseModel
from datetime import datetime
class ApprovalState(BaseModel):
request_id: str
status: str # PENDING | APPROVED | VOIDED
signatures: list[dict]
created_at: datetime
async def save_approval(r: redis.Redis, state: ApprovalState):
key = f"approval:{state.request_id}"
await r.hset(key, mapping=state.model_dump_json())
await r.expire(key, 86400 * 7) # 7 days TTL for audit
```
#### 5.2.2 Neo4j 圖資料庫 (GraphRAG Engine)
```python
# apps/api/src/services/graph_rag_neo4j.py
from neo4j import AsyncGraphDatabase
async def get_blast_radius(driver, service_id: str, max_depth: int = 5):
query = """
MATCH path = (s:Service {id: $service_id})<-[:DEPENDS_ON*1..$max_depth]-(affected)
RETURN affected.id, length(path) as depth
ORDER BY depth
"""
async with driver.session() as session:
result = await session.run(query, service_id=service_id, max_depth=max_depth)
return [record async for record in result]
```
#### 5.2.3 SSE 自動重連 + Exponential Backoff
```typescript
// apps/web/src/hooks/useSSEReconnect.ts
import { useState, useCallback } from 'react';
export function useSSEWithReconnect(url: string) {
const [retryCount, setRetryCount] = useState(0);
const maxRetries = 5;
const connect = useCallback(() => {
const eventSource = new EventSource(url);
eventSource.onerror = () => {
eventSource.close();
if (retryCount < maxRetries) {
const delay = Math.min(1000 * Math.pow(2, retryCount), 30000);
setTimeout(() => {
setRetryCount(prev => prev + 1);
connect();
}, delay);
}
};
eventSource.onopen = () => setRetryCount(0);
return eventSource;
}, [url, retryCount]);
return { connect, retryCount };
}
```
---
### 5.3 實作優先順序 (Implementation Roadmap)
| Phase | 項目 | 預估工時 | 狀態 | 對應任務 |
|-------|------|----------|------|----------|
| **Phase 6.1.1** | Redis Multi-Sig 持久化 | 2 天 | ⚪ 規劃中 | 簽核狀態 + TTL 7d |
| **Phase 6.1.2** | Neo4j GraphRAG 遷移 | 3 天 | ⚪ 規劃中 | Blast Radius 查詢 |
| **Phase 6.1.3** | Redis 分散式鎖 | 1 天 | ⚪ 規劃中 | Redlock 演算法 |
| **Phase 6.2** | SSE 容錯驗證 | 1.5 天 | ✅ ADR-004 | 驗證 Backoff/Heartbeat |
| **Phase 6.3** | 水平擴展 | 3 天 | ⚪ 規劃中 | Redis Pub/Sub + Sticky Session |
> **依賴**: Phase 6 需等待 Phase 5 (OpenClaw 實體化) 完成後執行。
---
## 6. 審查結論 (Review Conclusion)
本次架構審查確認 AWOOOI 具備企業級系統的核心素質,主要技術債集中於 **狀態持久化****水平擴展** 兩大領域。
### 關鍵發現
| 類別 | 結論 |
|------|------|
| **SSE 容錯** | ✅ 已在 ADR-004 完整規劃,待驗證實作 |
| **狀態持久化** | ⚪ 新需求,已納入 Phase 6.1 |
| **水平擴展** | ⚪ 新需求,已納入 Phase 6.3 |
### 執行順序
```
Phase 5 (OpenClaw 實體化) → Phase 6 (架構硬化)
↓ ↓
Telegram Gateway Redis + Neo4j + HA
```
**審查日期**: 2026-03-22
**審查人員**: Claude Code (Architecture Review Agent)
**更新紀錄**: Phase 6 任務已同步至 `memory/project_phases.md`

View File

@@ -0,0 +1,82 @@
# AWOOOI 核心架構與程式碼最終盤點清單 (Core Architecture & Codebase Inventory)
> **專案名稱 (Project)**: AWOOOI
> **行動代號 (Operation)**: Operation Phoenix Rising (原 Cyber-Shell)
> **文件狀態 (Status)**: Active Development
> **建檔日期 (Date)**: 2026-03-19 (更新: 2026-03-20)
本文件記錄了 AWOOOI 系統從零到一的關鍵架構決策與防禦性實作(防雷紀錄),作為未來技術團隊接手、擴展與稽核的最高指導原則。
(This document records the key architectural decisions and defensive implementations of the AWOOOI system from zero to one, serving as the supreme guiding principle for future technical teams' handover, scaling, and auditing.)
---
## Phase 0: Phoenix Rising (2026-03-20 戰略重構)
| 核心文檔與規範 (Core Documentation) | 首席架構師拍板的「關鍵決策與排雷紀錄」 (Architect's Key Decisions & Pitfall Avoidance) |
| :--- | :--- |
| **API 開發 SOP**<br>`docs/api/API_DEVELOPMENT_SOP.md` | 定義 Contract-First 開發流程,強制 OpenAPI + MD 同步更新CI 阻擋不一致提交。快取 TTL 分層 (1h/5m/30s/0) 與日誌脫敏規範。<br>*(Defined Contract-First workflow, enforced OpenAPI + MD sync updates with CI blocking. Cache TTL tiering and log sanitization rules.)* |
| **原子組件庫規格**<br>`docs/design/COMPONENT_LIBRARY.md` | 完整定義 Nothing.tech 純白工業風 Design Tokens (色彩/間距/字體/效果)。涵蓋 12 個核心組件規格 (StatusOrb/GlassCard/HostCard/ApprovalCard/CommandPalette 等)。<br>*(Complete Nothing.tech pure white industrial Design Tokens. 12 core component specifications including StatusOrb, GlassCard, HostCard, ApprovalCard, CommandPalette.)* |
| **RBAC 權限架構**<br>`docs/security/RBAC_SCHEMA.md` | 簡化至 4 角色 (Owner/Admin/Member/Viewer) + 資源級權限。Multi-Sig 簽核機制與 Blast Radius 風險矩陣。完整資料庫 Schema 與遷移策略。<br>*(Simplified to 4 roles + resource-level permissions. Multi-Sig approval mechanism with Blast Radius risk matrix. Complete DB schema and migration strategy.)* |
| **機密參考指南**<br>`docs/security/SECRETS_REFERENCE.md` | 解決「重複詢問帳密」痛點,記錄「去哪裡找」而非實際值。涵蓋開發 (.env.local)、CI (GitHub Secrets)、Prod (K8s Secrets) 三環境。<br>*(Solved "repeated credential asking" pain point. Documents "where to find" not actual values. Covers dev, CI, and prod environments.)* |
---
## Phase 1: 視覺與大腦 (前端與核心連線 / Visuals & AI Brain)
| 核心模組與檔案 (Core Modules) | 首席架構師拍板的「關鍵決策與排雷紀錄」 (Architect's Key Decisions & Pitfall Avoidance) |
| :--- | :--- |
| **狀態管理 (State Management)**<br>`agent.store.ts` | 採用 Zustand 封裝 SSE 串流與 `AbortController`,將網路請求與畫面渲染徹底解耦。<br>*(Adopted Zustand to encapsulate SSE streaming and `AbortController`, completely decoupling network requests from UI rendering.)* |
| **數據鉗 UI (Data Pincer UI)**<br>`data-pincer.tsx` | 落實 Nothing.tech 風格,使用 `.glass-panel` 與 Tailwind 狀態色碼,利用 Selector 避免無意義的重複渲染。<br>*(Implemented Nothing.tech style using `.glass-panel` and Tailwind status colors, utilizing Selectors to prevent meaningless re-renders.)* |
| **大腦連線 (Brain Connection)**<br>`agent.py` | 串接本地 Ollama 模型實作「Token 累積緩衝(每 10 字符發送)」,確保前端打字機效果如絲綢般滑順。<br>*(Integrated local Ollama models with "Token Accumulation Buffer", ensuring silky-smooth typewriter effects on the frontend.)* |
---
## Phase 2: 人機協作與企業合規 (HITL & Enterprise Compliance)
| 核心模組與檔案 (Core Modules) | 首席架構師拍板的「關鍵決策與排雷紀錄」 (Architect's Key Decisions & Pitfall Avoidance) |
| :--- | :--- |
| **授權卡片 (Approval Card)**<br>`ApprovalCard.tsx` | 實作 Blast Radius (爆炸半徑) 視覺化,並針對 `DESTRUCTIVE` (毀滅性操作) 強制加入二次解鎖防呆機制。<br>*(Visualized Blast Radius and enforced a secondary unlock mechanism for `DESTRUCTIVE` operations to prevent human error.)* |
| **預演引擎 (Dry-Run Engine)**<br>`dry_run.py` | 實作 K8s Mock 驗證契約,涵蓋 RBAC、語法與資源檢查確保「沒過 Dry-Run 絕對不准按批准」。<br>*(Implemented K8s Mock validation contracts covering RBAC, syntax, and resource checks. Strict rule: No approval without passing Dry-Run.)* |
| **多重簽核 (Multi-Sig)**<br>`approval.py` | 實作風險矩陣。**阻斷 TOCTOU 漏洞**:批准前強制重跑 Dry-Run若狀態改變簽章標記為 `VOIDED` (作廢) 以保留稽核軌跡,嚴禁物理刪除。<br>*(Implemented Risk Matrix. **Blocked TOCTOU Vulnerability**: Forced Dry-Run re-evaluation before approval execution. Voided signatures on state changes to preserve audit trails; physical deletion is strictly prohibited.)* |
| **資料脫敏 (Privacy Shield)**<br>`privacy_shield.py` | 實作企業級 Regex 攔截。導入 **Consistent Hashing (一致性雜湊)**,確保跨日誌的同 IP 獲得相同標籤,完美保留 AI 判斷上下文的能力。<br>*(Enterprise-grade Regex interception. Introduced **Consistent Hashing** to ensure identical IPs across logs get the same label, preserving AI context reasoning.)* |
---
## Phase 3: 企業護城河 (AI 擴充功能 / Enterprise Moats)
| 核心模組與檔案 (Core Modules) | 首席架構師拍板的「關鍵決策與排雷紀錄」 (Architect's Key Decisions & Pitfall Avoidance) |
| :--- | :--- |
| **工具橋樑 (MCP Bridge)**<br>`mcp_bridge.py` | 串接 MCP 協議。實作 **Rehydration Engine (資安標籤還原器)**,並要求按標籤長度或邊界匹配替換,嚴禁將還原後的參數寫入標準日誌。<br>*(MCP Protocol integration. Implemented **Rehydration Engine** with strict boundary-matching replacement rules. Logging rehydrated parameters is strictly forbidden.)* |
| **信任引擎 (Trust Engine)**<br>`trust_engine.py` | 實作漸進自治。導入 `normalize_action_pattern` 忽略 K8s Hash 碼;設定 Reject 瞬間歸零,且 `CRITICAL` 級別永遠不准降級。<br>*(Progressive Autonomy. Introduced `normalize_action_pattern` to ignore K8s hash codes. Rejects instantly reset trust to zero, and `CRITICAL` levels can never be downgraded.)* |
| **成本優化 (FinOps Engine)**<br>`cost_analyzer.py` | 實作 CFO 印鈔機。嚴格區分 `Realizable` (真實省錢) 與 `Freed` (釋放空間);導入 `SAFETY_BUFFER = 1.2`,嚴防極限縮容導致 OOM 系統崩潰。<br>*(The CFO Money Printer. Strictly distinguished `Realizable` vs `Freed` savings. Introduced `SAFETY_BUFFER = 1.2` to prevent OOM system crashes from extreme downscaling.)* |
| **知識圖譜 (GraphRAG)**<br>`graph_rag.py` | 實作 BFS 上下游追溯。加入 **`max_depth` (最大深度限制)** 防止爆炸半徑無限擴張;尋找 Root Cause 時優先收集所有異常的 DB/CACHE 節點。<br>*(BFS upstream/downstream tracing. Added **`max_depth` limit** to prevent infinite blast radius expansion. Prioritized collecting all abnormal DB/CACHE nodes when seeking Root Causes.)* |
---
## Phase 4: 最終門面 (展示與開源準備 / Final Polish & Open Source)
| 核心模組與檔案 (Core Modules) | 首席架構師拍板的「關鍵決策與排雷紀錄」 (Architect's Key Decisions & Pitfall Avoidance) |
| :--- | :--- |
| **思考流終端機 (Thinking Terminal)**<br>`ThinkingTerminal` | 導入 ASCII Art 動態渲染拓撲依賴圖與 FinOps 三欄式紅綠燈,極致提升硬核賽博龐克質感。<br>*(Introduced dynamic ASCII Art rendering for topology dependency graphs and a 3-column FinOps traffic light system, maximizing the hardcore cyberpunk aesthetic.)* |
| **多語系引擎 (i18n Engine)**<br>`next-intl` & `middleware.ts` | 導入 `next-intl` 實作動態語系路由,支援 `zh-TW` (預設) 與 `en`,完美對齊企業級 SaaS 國際化標準。<br>*(Implemented dynamic locale routing with `next-intl`, supporting `zh-TW` (default) and `en`, perfectly aligning with enterprise SaaS i18n standards.)* |
| **開源門面 (Open Source README)**<br>`README.md` | 定調 Slogan 與 Hero Section完整包裝四大企業護城河為 GitHub 開源與商業化做好全面準備。<br>*(Defined Slogan and Hero Section, fully packaging the four enterprise moats, preparing for GitHub open-sourcing and commercialization.)* |
---
---
## Phase 6: 架構硬化 (Horizontal Scaling / 規劃中)
| 核心模組與檔案 (Core Modules) | 首席架構師拍板的「關鍵決策與排雷紀錄」 (Architect's Key Decisions & Pitfall Avoidance) |
| :--- | :--- |
| **Redis 狀態持久化**<br>`multi_sig_redis.py` (規劃中) | 將 `MultiSigEngine` 從 In-Memory 遷移至 Redis Hash支援分散式部署與 7 天 TTL 稽核保留。導入 Redlock 演算法實現分散式鎖。<br>*(Migrate `MultiSigEngine` from In-Memory to Redis Hash for distributed deployment. Implement Redlock algorithm for distributed locking.)* |
| **Neo4j 圖資料庫**<br>`graph_rag_neo4j.py` (規劃中) | 將 `TopologyGraph` 遷移至 Neo4j支援複雜的 Blast Radius 與 Root Cause 圖遍歷查詢。解決大型叢集的效能瓶頸。<br>*(Migrate `TopologyGraph` to Neo4j for complex Blast Radius and Root Cause graph traversal queries. Resolve performance bottlenecks in large clusters.)* |
| **SSE 容錯驗證**<br>`dashboard.store.ts` | 驗證 ADR-004 定義的企業級 SSE 實作Exponential Backoff (1s→30s)、Heartbeat (30s)、Buffer 批次更新 (5s)。<br>*(Validate ADR-004 enterprise SSE implementation: Exponential Backoff, Heartbeat, Buffer batch updates.)* |
| **水平擴展**<br>K8s Service + Redis Pub/Sub | 實作 SSE 多實例廣播 (Redis Pub/Sub) 與 Sticky Session確保用戶連線一致性。<br>*(Implement SSE multi-instance broadcast via Redis Pub/Sub and Sticky Session for connection consistency.)* |
**來源**: `docs/ARCHITECTURE_CODE_REVIEW.md` 技術債審查 (2026-03-22)
---
*Zero-Touch Ops. Human-Centric Decisions.*
*(零干預維運,以人為本的決策。)*

View File

@@ -2,17 +2,17 @@
> AI 模組地圖索引 - 每次新增積木後必須登記
**最後更新**: 2026-03-23
**最後更新**: 2026-03-23 (Phase 9 Agent Teams)
**維護者**: Claude Code + C-Suite
---
## 📦 Python 積木 (packages/)
| 積木名稱 | 職責 | 對外介面 | ADR |
|----------|------|----------|-----|
| **lewooogo-brain** | AI 推論與決策邏輯 | `IProposalEngine`, `IIncidentProcessor` | ADR-008 |
| **lewooogo-data** | 資料抽象與持久化 | `IMemoryProvider`, `IDualMemoryProvider` | ADR-008 |
| 積木名稱 | 職責 | 對外介面 | 狀態 | ADR |
|----------|------|----------|------|-----|
| **lewooogo-brain** | Brain 積木 - AI 決策與提案引擎 | `IProposalEngine`, `IIncidentProcessor`, `Guardrails` | ✅ 已完成 | ADR-008 |
| **lewooogo-data** | Data 積木 - 雙層記憶體 (Working + Episodic) | `IMemoryProvider`, `IDualMemoryProvider` | ✅ 已完成 | ADR-008 |
### lewooogo-brain 模組結構
@@ -23,10 +23,12 @@ packages/lewooogo-brain/
│ │ ├── proposal_engine.py → IProposalEngine
│ │ └── incident_processor.py → IIncidentProcessor
│ ├── engines/ # 推論引擎實作
│ │ ├── proposal_engine.py # 🔲 待實作
│ │ └── incident_engine.py # 🔲 待搬遷
│ │ ├── proposal_engine.py # ✅ ProposalEngine 已完成
│ │ └── incident_engine.py # ✅ IncidentEngine 已完成
│ ├── guardrails/ # 安全護欄
│ │ └── guardrails.py # ✅ Guardrails 已完成
│ └── skills/ # Skill 動態載入
│ └── loader.py # 🔲 待實作
│ └── loader.py # ✅ SkillLoader 已完成
```
### lewooogo-data 模組結構
@@ -37,9 +39,9 @@ packages/lewooogo-data/
│ ├── interfaces/ # ABC 定義
│ │ └── memory_provider.py → IMemoryProvider, IDualMemoryProvider
│ └── providers/ # 具體實作
│ ├── redis_memory.py # 🔲 待實作
│ ├── pg_memory.py # 🔲 待實作
│ └── dual_memory.py # 🔲 待實作
│ ├── redis_memory.py # ✅ RedisMemoryProvider 已完成
│ ├── pg_memory.py # ✅ PgMemoryProvider 已完成
│ └── dual_memory.py # ✅ DualMemoryProvider 已完成
```
---
@@ -52,18 +54,53 @@ packages/lewooogo-data/
---
## 🤖 Agent Teams (apps/api/src/agents/)
> Phase 9 新增 - 專家 Agent 群組
| Agent 名稱 | 職責 | 狀態 | ADR |
|------------|------|------|-----|
| **SecurityAgent** | 資安風險評估與威脅分析 | ✅ 已完成 | ADR-009 |
| **BlastRadiusAgent** | 爆炸半徑影響範圍分析 | ✅ 已完成 | ADR-009 |
| **ActionPlannerAgent** | 行動計畫制定與步驟規劃 | ✅ 已完成 | ADR-009 |
| **ConsensusEngine** | 多 Agent 共識引擎 | ✅ 已完成 | ADR-009 |
### Agent Teams 模組結構
```
apps/api/src/agents/
├── __init__.py
├── base_agent.py # Agent 基底類別
├── security_agent.py # ✅ SecurityAgent 資安專家
├── blast_radius_agent.py # ✅ BlastRadiusAgent 影響分析
├── action_planner_agent.py # ✅ ActionPlannerAgent 行動規劃
└── consensus_engine.py # ✅ ConsensusEngine 共識引擎
```
---
## 🔗 模組依賴關係
```
apps/api (FastAPI BFF)
├── lewooogo-brain (AI 積木)
├── agents/ (Agent Teams) ✅ Phase 9 專家群組
│ └── lewooogo-brain (AI 積木)
├── lewooogo-brain (AI 積木) ✅ Phase 6.4 已完成
│ └── lewooogo-data (資料積木)
└── lewooogo-data (直接引用)
└── lewooogo-data (直接引用) ✅ Phase 6.4 已完成
apps/web (Next.js)
└── lewooogo-core (TS 積木)
```
### Docker Build 指令 (Phase 6.4i)
```bash
# 必須從 monorepo 根目錄執行
cd /path/to/awoooi
docker build -f apps/api/Dockerfile -t awoooi-api:latest .
```
---
## 📋 介面契約索引
@@ -74,8 +111,11 @@ apps/web (Next.js)
|------|------|------|
| `IProposalEngine` | `lewooogo_brain.interfaces` | 決策提案生成 |
| `IIncidentProcessor` | `lewooogo_brain.interfaces` | 事件聚合處理 |
| `Guardrails` | `lewooogo_brain.guardrails` | 安全護欄與風險檢查 |
| `IMemoryProvider` | `lewooogo_data.interfaces` | 單層記憶體存取 |
| `IDualMemoryProvider` | `lewooogo_data.interfaces` | 雙層記憶體 (Working + Episodic) |
| `BaseAgent` | `apps.api.src.agents` | Agent 基底類別 |
| `ConsensusEngine` | `apps.api.src.agents` | 多 Agent 共識協調 |
### HTTP API 契約

190
docs/DEPENDENCIES.md Normal file
View File

@@ -0,0 +1,190 @@
# AWOOOI 依賴清單與版本控制
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CTO
> **更新頻率**: 每次版本發布同步更新
---
## 更新規範
**每次版本發布時,必須同步更新此文件:**
1. 新增/移除套件時更新對應章節
2. 版本升級時更新版本號
3. 記錄變更原因至變更記錄區
---
## Frontend 依賴 (apps/web)
### 核心框架
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| next | ^14.2.x | React 框架 | App Router |
| react | ^18.3.x | UI 函式庫 | |
| react-dom | ^18.3.x | DOM 渲染 | |
| typescript | ^5.4.x | 型別系統 | |
### 狀態管理
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| zustand | ^4.5.x | 全域狀態 | SSE 封裝 |
| @tanstack/react-query | ^5.x | 伺服器狀態 | 快取/同步 |
### UI / 樣式
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| tailwindcss | ^3.4.x | CSS 框架 | Nothing.tech 配置 |
| @radix-ui/react-* | ^1.x | 無障礙組件 | 按需引入 |
| lucide-react | ^0.x | 圖示庫 | |
| clsx | ^2.x | 類名工具 | |
| tailwind-merge | ^2.x | Tailwind 合併 | |
### 國際化
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| next-intl | ^3.x | i18n 框架 | 動態路由 |
### 開發工具
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| eslint | ^8.x | 程式碼檢查 | |
| prettier | ^3.x | 程式碼格式化 | |
| @types/node | ^20.x | Node 型別 | |
| @types/react | ^18.x | React 型別 | |
### 測試工具
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| @playwright/test | ^1.x | E2E 測試 | **截圖/錄影必啟用** (CEO #5) |
| @storybook/react | ^8.x | 組件測試 | 視覺化 |
| vitest | ^1.x | 單元測試 | |
---
## Backend 依賴 (apps/api)
### 核心框架
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| fastapi | ^0.111.x | Web 框架 | |
| uvicorn | ^0.29.x | ASGI 伺服器 | |
| pydantic | ^2.7.x | 資料驗證 | |
| python-dotenv | ^1.x | 環境變數 | |
### 資料庫
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| sqlalchemy | ^2.x | ORM | |
| asyncpg | ^0.29.x | PostgreSQL 驅動 | |
| alembic | ^1.13.x | 資料庫遷移 | |
| redis | ^5.x | Redis 客戶端 | |
### AI 整合
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| httpx | ^0.27.x | HTTP 客戶端 | Ollama/API 調用 |
| google-generativeai | ^0.5.x | Gemini API | 雲端備援 (優先) |
| anthropic | ^0.25.x | Claude API | 雲端備援 (次選) |
### 安全
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| python-jose | ^3.x | JWT 處理 | |
| passlib | ^1.7.x | 密碼雜湊 | |
| bcrypt | ^4.x | 加密演算法 | |
### 測試
| 套件 | 版本 | 用途 | 備註 |
|------|------|------|------|
| pytest | ^8.x | 測試框架 | |
| pytest-asyncio | ^0.23.x | 非同步測試 | |
| pytest-cov | ^5.x | 覆蓋率 | |
| httpx | ^0.27.x | API 測試 | |
---
## 基礎設施工具
### 容器化
| 工具 | 版本 | 用途 | 備註 |
|------|------|------|------|
| Docker | 24.x+ | 容器運行時 | |
| Docker Compose | 2.x | 本地開發 | |
### K8s 工具
| 工具 | 版本 | 用途 | 備註 |
|------|------|------|------|
| kubectl | 1.29.x | K8s CLI | |
| k3s | 1.29.x | 輕量 K8s | 120/121 主機 |
| helm | 3.x | 套件管理 | 可選 |
### CI/CD
| 工具 | 版本 | 用途 | 備註 |
|------|------|------|------|
| GitHub Actions | - | CI/CD | |
| Turborepo | ^1.x | Monorepo 建構 | |
### 監控
| 工具 | 版本 | 用途 | 備註 |
|------|------|------|------|
| SigNoz | latest | APM/追蹤 | 192.168.0.188:3301 |
| Prometheus | 2.x | 指標收集 | |
---
## 運行環境
### Node.js
| 環境 | 版本 | 備註 |
|------|------|------|
| Node.js | 20.x LTS | |
| pnpm | 9.x | 套件管理器 |
### Python
| 環境 | 版本 | 備註 |
|------|------|------|
| Python | 3.11+ | |
| pip | 24.x | |
| poetry | 1.8.x | 依賴管理 (可選) |
---
## 版本鎖定檔案
| 檔案 | 位置 | 用途 |
|------|------|------|
| `pnpm-lock.yaml` | 根目錄 | Frontend 依賴鎖定 |
| `requirements.txt` | apps/api/ | Backend 依賴鎖定 |
| `pyproject.toml` | apps/api/ | Poetry 依賴 (可選) |
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此文件由 CTO 維護,每次版本發布必須同步更新。*

View File

@@ -9,10 +9,10 @@
| 項目 | 狀態 |
|------|------|
| **當前 Phase** | **Phase 6.4 實作中** - 模組化架構重整 + Decision Proposal |
| **當前 Phase** | **Phase 6.5c 完成** - UX 改善 + 錯誤回饋優化 |
| **Day** | Day 5 |
| **下一步** | Phase 6.4d MemoryProvider 實作 |
| **重大變更** | 🚨 **生產事故修復**: Worker CrashLoopBackOff (7h) + 簽核卡片 Race Condition |
| **下一步** | 驗證 Y 按鈕 UX 改善效果 |
| **重大變更** | 🎨 **UX 改善**: 錯誤訊息明顯顯示 + 30秒超時警告 + 重試按鈕 |
### 🧠 認知覺醒計畫 Phase 6 施工順序 (C-Suite 2026-03-23 統帥方案)
@@ -40,6 +40,10 @@
| 時間 | 事件 | 負責人 |
|------|------|--------|
| 2026-03-23 14:35 | **🎨 Phase 6.5c UX 改善**: 錯誤訊息明顯顯示 (非 hover) + 30 秒超時警告 + 重試按鈕 + 取消自動恢復 (讓用戶看到錯誤) | Claude Code |
| 2026-03-23 14:20 | **🔧 Y 按鈕執行修復**: 中文 Action 解析擴充 (擴展/重新啟動) + StatefulSet Pod 自動識別 (`xxx-0` → DELETE_POD) + `-deployment` 後綴自動移除 | Claude Code |
| 2026-03-23 14:15 | **📝 Memory 同步**: feedback_modular_core_spirit.md (模組化核心精神鐵律) + MEMORY.md 索引更新 | Claude Code |
| 2026-03-23 13:08 | **⚡ Phase 6.5c+ 交互神經強化完成**: Approval 按鈕物理回饋 (active縮放/防呆) + API 鏈路確認 (`/api/v1/approvals/{id}/sign`) + 樂觀更新 (Optimistic UI) 立即 Loading | 首席架構師 |
| 2026-03-23 11:50 | **🧠 Phase 6.4g API 突觸對接完成**: `/propose` 路由建立 + Guardrails 8/8 測試通過 + lewooogo-brain 積木綁定 | Claude Code |
| 2026-03-23 11:55 | **🎨 Phase 6.5a 視覺皮層啟動**: DualStateIncidentCard.tsx 雙態戰情室卡片 + Nothing.tech 視覺憲法 | Claude Code |
| 2026-03-23 09:30 | **🔧 NetworkPolicy 修復**: `allow-required-egress` podSelector 改為 `system=awoooi` (原本只允許 API pod) | Claude Code |

View File

@@ -0,0 +1,447 @@
# WOOO-AIOPS 監控機制盤點報告
> **遷移至 AWOOOI 的監控資產清單**
>
> 盤點日期: 2026-03-22
> 來源專案: `/Users/ogt/wooo-aiops`
---
## 1. 監控系統總覽 (Monitoring Stack Overview)
| 元件 | 用途 | 來源路徑 | 遷移優先級 |
|------|------|----------|------------|
| **OpenTelemetry** | Distributed Tracing | `clawbot/app/core/telemetry.py` | 🔴 P0 |
| **Prometheus** | Metrics 採集 | `docker/prometheus/prometheus.yml` | 🔴 P0 |
| **Alertmanager** | 告警路由與通知 | `docker/alertmanager/alertmanager.yml` | 🔴 P0 |
| **SignOz** | APM + Traces + Logs | `infrastructure/signoz/alert-rules.yaml` | 🟡 P1 |
| **Grafana** | 儀表板視覺化 | `docker/grafana/dashboards/*.json` | 🟡 P1 |
| **Loki + Promtail** | Log Aggregation | `docker/loki/loki-config.yml` | 🟡 P1 |
---
## 2. 健康檢查機制 (Health Checks)
### 2.1 API 健康端點
| 端點 | 用途 | 檢查項目 |
|------|------|----------|
| `/health` | Liveness Probe | git_sha, build_time, version |
| `/ready` | Readiness Probe | DB 連線, Redis 連線 |
| `/api/v1/health` | Gateway Health | API 閘道狀態 |
**來源檔案**: `src/api/routes/health.py`
### 2.2 K8s Probes 配置
```yaml
# 來源: infrastructure/kubernetes/base/api-deployment.yaml
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 30
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 10
```
---
## 3. 告警規則盤點 (Alert Rules Inventory)
### 3.1 Prometheus Alert Rules (20+ 條)
**來源檔案**:
- `docker/prometheus/rules/alerts.yml`
- `docker/prometheus/rules/service-health-rules.yml`
#### 3.1.1 系統層級告警 (P0 Critical)
| 告警名稱 | 觸發條件 | 嚴重度 |
|----------|----------|--------|
| `InstanceDown` | 實例離線 > 1m | 🔴 P0 |
| `VersionDriftDetected` | 部署版本與預期不符 | 🔴 P0 |
| `UnexpectedPodRestart` | Pod 非預期重啟 | 🔴 P0 |
| `ImagePullBackOff` | 映像拉取失敗 | 🔴 P0 |
| `PodCrashLoopBackOff` | Pod 持續崩潰 | 🔴 P0 |
#### 3.1.2 CI/CD Pipeline 告警
| 告警名稱 | 觸發條件 | 嚴重度 |
|----------|----------|--------|
| `PipelineFailed` | Pipeline 執行失敗 | 🔴 P0 |
| `PipelineTooSlow` | Pipeline > 30m | 🟡 P1 |
| `JobStuckPending` | Job 排隊 > 5m | 🟡 P1 |
| `JobRunningTooLong` | Job 執行 > 30m | 🟡 P1 |
| `GitLabRunnerOffline` | Runner 離線 | 🔴 P0 |
#### 3.1.3 基礎設施告警
| 告警名稱 | 觸發條件 | 嚴重度 |
|----------|----------|--------|
| `HighCpuLoad` | CPU > 90% for 5m | 🔴 P0 |
| `HighMemoryUsage` | Memory > 90% for 5m | 🔴 P0 |
| `DiskSpaceLow` | Disk > 85% | 🟡 P1 |
| `HTTP502Spike` | 502 錯誤激增 | 🔴 P0 |
| `HTTP500Spike` | 500 錯誤激增 | 🔴 P0 |
#### 3.1.4 資料庫/快取告警
| 告警名稱 | 觸發條件 | 嚴重度 |
|----------|----------|--------|
| `PostgreSQLConnectionFailed` | DB 連線失敗 | 🔴 P0 |
| `RedisConnectionFailed` | Redis 連線失敗 | 🔴 P0 |
| `PostgreSQLConnectionPoolExhausted` | 連線池 > 90% | 🟡 P1 |
#### 3.1.5 SSL 憑證告警
| 告警名稱 | 觸發條件 | 嚴重度 |
|----------|----------|--------|
| `SSLCertExpiringSoon` | 憑證 14 天內到期 | 🟡 P1 |
| `SSLCertExpired` | 憑證已過期 | 🔴 P0 |
#### 3.1.6 效能告警
| 告警名稱 | 觸發條件 | 嚴重度 |
|----------|----------|--------|
| `APIResponseTimeSlow` | P95 延遲 > 2s | 🟡 P1 |
| `HighErrorRate` | 錯誤率 > 1% | 🟡 P1 |
| `WebSocketConnectionFailed` | WebSocket 失敗 | 🟢 P2 |
### 3.2 SignOz Alert Rules (30+ 條)
**來源檔案**: `infrastructure/signoz/alert-rules.yaml`
| 類別 | 告警數量 | 嚴重度分佈 |
|------|----------|------------|
| 資料庫 | 5 | P0: 2, P1: 3 |
| 快取 | 3 | P0: 1, P1: 2 |
| HTTP 錯誤 | 6 | P0: 2, P1: 2, P2: 2 |
| 容器 | 4 | P0: 2, P1: 2 |
| 服務專屬 | 12 | 依服務而定 |
**服務專屬告警涵蓋**:
- Gitea, Harbor, ClawBot, Ollama, SignOz, n8n
---
## 4. 通知管道盤點 (Notification Channels)
### 4.1 Telegram 整合
**來源檔案**:
- `src/api/routes/telegram_alerts.py`
- `clawbot/app/bot/telegram.py`
| 頻道 | 環境變數 | 用途 |
|------|----------|------|
| 一般告警 | `TELEGRAM_CHAT_ID` | 全部告警 |
| P0 緊急 | `TELEGRAM_P0_CHAT_ID` | Critical 專用 |
| 資安告警 | `TELEGRAM_SECURITY_CHAT_ID` | Security 專用 |
**功能**:
- HTML 格式化 + Emoji 嚴重度標示
- 背景任務發送 (non-blocking)
- 雙向互動: `/ask` 指令觸發 AI 診斷
### 4.2 Slack 整合
**來源檔案**: `docker/alertmanager/alertmanager.yml`
| 頻道 | 用途 |
|------|------|
| `#alerts` | 預設告警 |
| `#alerts-security` | 資安告警 |
| `#alerts-security-critical` | 資安緊急 |
| `#alerts-infra` | 基礎設施 |
| `#p0-war-room` | P0 作戰室 |
### 4.3 PagerDuty On-Call
| 服務 | SLA | 用途 |
|------|-----|------|
| P0 Service Key | 5 分鐘回應 | Critical |
| P1 Service Key | 15 分鐘回應 | High |
**自動升級至 C-Level**
### 4.4 Email 通知
**來源檔案**: `src/services/notification.py`
- SMTP (TLS/STARTTLS)
- aiosmtplib 非同步
- HTML + Plain-text
---
## 5. 自動修復機制 (Auto-Remediation)
### 5.1 修復引擎 v1 (Remediation Engine)
**來源檔案**: `src/automation/remediation_engine.py`
#### 修復動作對照表
| 動作 | 說明 | 觸發告警 |
|------|------|----------|
| `restart_pod` | Pod 重啟 | HighErrorRate, PodCrashLooping, SlowResponse, ServiceDown |
| `scale_up` | 水平擴展 | HighCPU, HighMemory |
| `scale_down` | 縮減副本 | 手動觸發 |
| `rollback_deployment` | 版本回滾 | 手動觸發 |
| `clear_cache` | 清除 Redis | 手動觸發 |
#### 安全護欄
```python
# 白名單 Namespace
ALLOWED_NAMESPACES = ["wooo-aiops-uat", "wooo-aiops-prod"]
# Dry-Run 模式
AUTOMATION_DRY_RUN = True/False
# 最大副本數限制
MAX_REPLICAS = 10
```
### 5.2 修復引擎 v2 (Repair Engine)
**來源檔案**: `src/engines/repair_engine.py`
#### 8 種修復策略
| 策略 | 說明 |
|------|------|
| `RESTART` | Pod 重啟 |
| `SCALE_UP` | 水平擴展 |
| `SCALE_DOWN` | 縮減副本 |
| `ROLLBACK` | 版本回滾 |
| `INCREASE_MEMORY` | 調整記憶體 (+50% max) |
| `INCREASE_CPU` | 調整 CPU (+50% max) |
| `VACUUM_DB` | 資料庫維護 |
| `CLEAR_CACHE` | 清除快取 |
#### 安全限制
| 參數 | 值 | 說明 |
|------|-----|------|
| Max repairs/hour | 5 | 每小時最多修復次數 |
| Max consecutive failures | 3 | 連續失敗後停止 |
| Min healthy replicas | 1 | 最少健康副本 |
| Rollback window | 24h | 回滾時間窗口 |
| Memory increase limit | 50% | 記憶體增幅上限 |
| CPU increase limit | 50% | CPU 增幅上限 |
### 5.3 自動恢復腳本
**來源檔案**: `scripts/auto-recovery.sh`
**Cron 排程**: 每 10 分鐘執行
```bash
# 檢查項目
1. API Health Check (HTTP 200)
2. Frontend Health Check (HTTP 200/302)
3. Disk Space (>90% 觸發清理)
4. GitHub Actions Runner 狀態
5. 服務重啟恢復
```
**日誌位置**: `/var/log/wooo/auto-recovery.log`
---
## 6. SLA 引擎與升級機制 (SLA Engine)
**來源檔案**: `src/engines/sla_engine.py`
### 6.1 SLA 門檻
| 優先級 | 回應時間 | 解決時間 |
|--------|----------|----------|
| P0 | 5 分鐘 | 30 分鐘 |
| P1 | 15 分鐘 | 2 小時 |
| P2 | 1 小時 | 8 小時 |
| P3 | 4 小時 | 24 小時 |
### 6.2 升級層級
| Level | 角色 |
|-------|------|
| L0 | L1 Support (一線支援) |
| L1 | L2 Expert (專家支援) |
| L2 | Team Lead (部門主管) |
| L3 | Director (總監) |
| L4 | C-Level (CEO, CTO, CIO, CISO, CPO) |
### 6.3 升級矩陣
| 優先級 | 升級路徑 |
|--------|----------|
| P0 | On-Call → Team Lead → CIO → CISO → CEO |
| P1 | On-Call → Team Lead → CIO |
| P2 | On-Call → Team Lead |
| P3 | On-Call 僅 |
---
## 7. 告警聚合與去重 (Alert Aggregation)
**來源檔案**: `src/services/alert_aggregator.py`
### 功能
| 功能 | 說明 |
|------|------|
| 指紋去重 | 相同告警精確比對 |
| 時間窗口去重 | 5 分鐘內相同告警 |
| 告警風暴偵測 | > 10 告警/分鐘 |
| 標籤分組 | 相似標籤聚合 |
### Prometheus Metrics
```promql
wooo_alerts_received_total{severity, source}
wooo_alerts_deduplicated_total{reason}
wooo_alerts_aggregated_total{group_key}
wooo_alert_groups_active
wooo_alert_storm_detected_total
```
---
## 8. Grafana 儀表板盤點 (Dashboards)
| 儀表板 | 路徑 | 用途 |
|--------|------|------|
| AIOPS Brain | `infrastructure/grafana/dashboards/aiops-brain.json` | AI 大腦狀態 |
| API Performance | `docker/grafana/dashboards/api-performance.json` | API 效能 |
| Container Health | `docker/grafana/dashboards/container-health.json` | 容器健康 |
| System Overview | `docker/grafana/dashboards/system-overview.json` | 系統總覽 |
| DevOps KPIs | `infrastructure/grafana/dashboards/devops-kpis.json` | DevOps 指標 |
| Pipeline Health | `infrastructure/grafana/dashboards/pipeline-health.json` | Pipeline 健康 |
---
## 9. 告警工單整合 (Alert-to-Ticket)
**來源檔案**: `src/services/alert_ticket_service.py`
| 功能 | 說明 |
|------|------|
| 自動建票 | 所有告警自動建立工單 |
| 去重機制 | 防止相同告警重複建票 |
| 嚴重度對映 | P0/P1/P2 → 工單優先級 |
| 自動關閉 | 告警解除時自動關閉工單 |
---
## 10. 自訂 Metrics 匯出 (Custom Metrics)
### 10.1 部署追蹤
```promql
wooo_deployment_version_drift # 1 = 版本漂移
wooo_pipeline_status{status} # failed = 1
wooo_pipeline_duration_seconds
wooo_job_queued_duration_seconds
wooo_job_duration_seconds
wooo_gitlab_runner_status # 0 = offline
```
### 10.2 自動修復
```promql
wooo_repair_total{app_id, action, status}
wooo_repair_duration_seconds{app_id, action}
wooo_repair_in_progress{app_id}
```
---
## 11. 告警流程圖 (Notification Flow)
```
┌─────────────────────────────────────────────────────────────┐
│ Alert Triggered │
│ (Prometheus / SignOz) │
└──────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Alertmanager Webhook │
└──────────────────────┬──────────────────────────────────────┘
┌──────────────┼──────────────┬──────────────┐
↓ ↓ ↓ ↓
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐
│ Ticket │ │ Telegram │ │ PagerDuty│ │ Slack │
│ System │ │ Bot │ │ (On-Call)│ │ Channels │
└─────────┘ └──────────┘ └──────────┘ └───────────┘
┌──────────────────────────────────────┐
│ Auto-Remediation Engine │
├──────────────────────────────────────┤
│ 1. Validate target (whitelist) │
│ 2. Execute repair action │
│ 3. Record result in DB │
│ 4. Notify outcome (Telegram/NATS) │
└──────────────────────────────────────┘
```
---
## 12. 遷移至 AWOOOI 建議
### 12.1 必須遷移 (P0)
| 元件 | 原路徑 | 建議新路徑 |
|------|--------|------------|
| OpenTelemetry 初始化 | `clawbot/app/core/telemetry.py` | `apps/api/src/core/telemetry.py` |
| Prometheus Client | `src/services/prometheus_client.py` | `apps/api/src/services/` |
| Health Routes | `src/api/routes/health.py` | `apps/api/src/routes/health.py` |
| Alert Rules | `docker/prometheus/rules/*.yml` | `ops/prometheus/rules/` |
| Alertmanager Config | `docker/alertmanager/*.yml` | `ops/alertmanager/` |
### 12.2 可選遷移 (P1)
| 元件 | 說明 |
|------|------|
| Grafana Dashboards | 6 個儀表板 JSON |
| Loki + Promtail | Log 聚合 |
| SLA Engine | 升級機制 |
| Alert Aggregator | 告警去重 |
### 12.3 需重構 (P2)
| 元件 | 原因 |
|------|------|
| Remediation Engine | 需適配新的 Multi-Sig 審批流程 |
| On-Call Service | 需整合新的 OpenClaw 通知 |
---
## 附錄: 關鍵設定檔清單
| 設定檔 | 路徑 |
|--------|------|
| Alertmanager 主設定 | `docker/alertmanager/alertmanager.yml` |
| Alertmanager 生產設定 | `infrastructure/alertmanager/alertmanager.yml` |
| Prometheus Alert Rules | `docker/prometheus/rules/alerts.yml` |
| Service Health Rules | `docker/prometheus/rules/service-health-rules.yml` |
| SignOz Alert Rules | `infrastructure/signoz/alert-rules.yaml` |
| Prometheus Scrape Config | `docker/prometheus/prometheus.yml` |
| K8s API Deployment | `infrastructure/kubernetes/base/api-deployment.yaml` |
| Monitoring Cron Jobs | `infrastructure/cron/monitoring-jobs.cron` |
| Auto-Recovery Script | `scripts/auto-recovery.sh` |
---
**盤點完成**: 2026-03-22
**盤點人員**: Claude Code (Monitoring Inventory Agent)

View File

@@ -0,0 +1,131 @@
# Phase 2 Technical Debt - i18n 違憲代碼清單
> **Phase 3 首要清理任務**
> 掃描日期: 2026-03-20
> 總計違規: 40+ 處
---
## 🔴 高優先級 (紅燈)
### 1. agent/approval-card.tsx - 風險等級與資料影響標籤
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 63-81 | `'LOW RISK'`, `'MEDIUM RISK'`, `'HIGH RISK'`, `'CRITICAL'` | 改為 `tRisk('low')` 等 |
| 92-95 | `'NONE'`, `'READ ONLY'`, `'WRITE'`, `'DESTRUCTIVE'` | 改為 `tBlast('none')` 等 |
| 174-251 | `'SIGNATURES'`, `'BLAST RADIUS'`, `'AFFECTED PODS'`, `'EST. DOWNTIME'`, `'RELATED SERVICES'`, `'DATA IMPACT'`, `'DRY-RUN VALIDATION'` | 改為 `t('approval.xxx')` |
| 292 | `'Requested by '` | 改為 `t('requestedBy')` |
### 2. agent/data-pincer.tsx - 狀態標籤
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 50-78 | `'STANDBY'`, `'ANALYZING'`, `'EXECUTING'`, `'AWAITING APPROVAL'`, `'ERROR'` | 改為 `t('status.xxx')` |
### 3. status-orb.tsx - 狀態標籤
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 16-31 | `'Idle'`, `'Thinking'`, `'Executing'`, `'Awaiting Approval'` | 改為 `t('status.xxx')` |
### 4. layout/header.tsx - 連線狀態
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 55-61 | `connectionLabel` 物件: `'Offline'`, `'Connecting...'`, `'LIVE'` 等 | 移至 i18n |
### 5. dashboard/connection-status.tsx - 連線狀態
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 35-41 | `connectionLabels` 物件中英文字串 | 改為 `useTranslations('connection')` |
---
## 🟡 中優先級 (黃燈)
### 6. agent/thinking-terminal.tsx - 終端機 UI
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 58 | `'[ BLAST RADIUS ]'` | 改為 `t('graphRag.blastRadius')` |
| 93-122 | `'[ ROOT CAUSE CHAIN ]'`, `'[ UPSTREAM IMPACT ]'`, `'[ DOWNSTREAM DEPENDENCIES ]'` | 改為對應 i18n keys |
| 162-182 | `'[ FINOPS ANALYSIS ]'`, `'Wasted/mo'`, `'Realizable'`, `'Freed'` | 改為 `t('finops.xxx')` |
| 334-382 | `'AWOOOI Terminal'`, `'v0.1.0 | SSE'`, `'>_ EXECUTING...'`, `'INITIATE SYNC'`, `'Waiting for command...'` | 改為 `t('terminal.xxx')` |
### 7. dashboard/live-host-card.tsx - Baseline 標籤
| 行號 | 違規內容 | 修復方式 |
|------|----------|----------|
| 285 | `'基準線'` (中文硬寫) | 改為 `baselineLabel` prop 或 `t('dashboard.baseline')` |
---
## 🟢 低優先級 (綠燈)
### 8. Locale Hardcoding
| 檔案 | 行號 | 違規內容 | 修復方式 |
|------|------|----------|----------|
| `dashboard/host-card.tsx` | 220-223 | `toLocaleTimeString('zh-TW', ...)` | 改為動態 `params.locale` |
| `dashboard/live-host-card.tsx` | 252-256 | `toLocaleTimeString('zh-TW', ...)` | 改為動態 locale |
### 9. 技術識別符 (保持原樣)
以下為技術識別符,不需 i18n 化:
- 服務名稱: `'Harbor'`, `'GH Runner'`, `'Docker'`
- IP 地址: `'192.168.0.xxx'`
- API 路徑: `/api/v1/xxx`
---
## 修復優先順序
```
Phase 3 Week 1:
├── [P0] agent/approval-card.tsx (20+ 違規)
├── [P0] agent/data-pincer.tsx (5 違規)
├── [P0] status-orb.tsx (4 違規)
└── [P0] connection-status.tsx + header.tsx (10 違規)
Phase 3 Week 2:
├── [P1] agent/thinking-terminal.tsx (15+ 違規)
└── [P1] live-host-card.tsx baseline (1 違規)
Phase 3 Bug Bash:
└── [P2] Locale hardcoding (2 違規)
```
---
## 已修復清單 ✅
| 檔案 | 修復內容 |
|------|----------|
| `sidebar.tsx` | Logo 已套用 `mix-blend-multiply` |
| `sidebar.tsx` | `v1.0.0` / `Production` 改為 `tBrand('version')` / `tBrand('environment')` |
| `demo/page.tsx` | `useMockApprovalData` 全面 i18n 化 |
| `demo/page.tsx` | `createTestApprovalWithConfig` 使用 i18n config |
| `approval-card.tsx` (新版) | 已完全 i18n 化 |
| `host-card.tsx` | CPU/Memory 標籤已 i18n 化 |
---
---
## Phase 6 架構債 (新增 2026-03-22)
> **來源**: `docs/ARCHITECTURE_CODE_REVIEW.md`
| 類別 | 項目 | 現狀 | 目標 |
|------|------|------|------|
| 狀態持久化 | MultiSigEngine | In-Memory | Redis Hash |
| 狀態持久化 | TopologyGraph | In-Memory | Neo4j |
| 水平擴展 | SSE 廣播 | 單實例 | Redis Pub/Sub |
詳見 `docs/ARCHITECTURE_CODE_REVIEW.md` 第 5 章。
---
*最後更新: 2026-03-22*

View File

@@ -0,0 +1,198 @@
# AWOOOI 技術文檔完整清單
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CTO
> **用途**: 追蹤各團隊必須產出的技術文檔
---
## 文檔分類
| 類別 | 說明 | 主要負責人 |
|------|------|-----------|
| **ADR** | 架構決策記錄 | CTO |
| **SOP** | 標準作業程序 | 各單位 |
| **SPEC** | 技術規格文件 | CTO / CPO |
| **DIAGRAM** | 架構圖 / 流程圖 | CTO / CIO |
| **RUNBOOK** | 運維手冊 | CIO |
| **SECURITY** | 安全文檔 | CISO |
---
## CTO 必須產出文檔
### 架構決策記錄 (ADR)
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| ADR-001 | MCP Protocol 採用 | ✅ | `docs/adr/ADR-001-mcp-protocol-adoption.md` |
| ADR-002 | Nothing.tech 設計系統 | ✅ | `docs/adr/ADR-002-nothing-tech-design-system.md` |
| ADR-003 | leWOOOgo 模組架構 | ✅ | `docs/adr/ADR-003-lewooogo-module-architecture.md` |
| ADR-004 | Zustand 狀態管理 | ✅ | `docs/adr/ADR-004-state-management.md` |
| ADR-005 | BFF 閘道架構 | ✅ | `docs/adr/ADR-005-bff-architecture.md` |
| ADR-006 | AI 降級備援策略 | ⏳ | `docs/adr/ADR-006-ai-fallback-strategy.md` |
| ADR-007 | 資料保留策略 | ⏳ | `docs/adr/ADR-007-data-retention-policy.md` |
### 技術規格 (SPEC)
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| SPEC-001 | API 開發 SOP | ✅ | `docs/api/API_DEVELOPMENT_SOP.md` |
| SPEC-002 | OpenAPI 規格 | ✅ | `docs/api/api-contract.yaml` |
| SPEC-003 | SSE 串流規格 | ⏳ | `docs/api/SSE_SPECIFICATION.md` |
| SPEC-004 | 快取策略規格 | ⏳ | `docs/api/CACHE_STRATEGY.md` |
| SPEC-005 | 資料庫 Schema | ⏳ | `docs/database/SCHEMA.md` |
### 架構圖 (DIAGRAM)
| ID | 圖表名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| DIAG-001 | 系統架構總覽圖 | ⏳ | `docs/diagrams/system-architecture.png` |
| DIAG-002 | 資料流程圖 | ⏳ | `docs/diagrams/data-flow.png` |
| DIAG-003 | API 序列圖 | ⏳ | `docs/diagrams/api-sequence.png` |
| DIAG-004 | 部署架構圖 | ⏳ | `docs/diagrams/deployment-architecture.png` |
| DIAG-005 | AI 降級流程圖 | ⏳ | `docs/diagrams/ai-fallback-flow.png` |
---
## CPO 必須產出文檔
### 設計規格 (SPEC)
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| SPEC-UI-001 | 原子組件庫規格 | ✅ | `docs/design/COMPONENT_LIBRARY.md` |
| SPEC-UI-002 | Design Tokens 定義 | ⏳ | `docs/design/DESIGN_TOKENS.md` |
| SPEC-UI-003 | 頁面線稿清單 | ⏳ | `docs/design/WIREFRAMES.md` |
| SPEC-UI-004 | i18n 字典檔結構 | ⏳ | `docs/design/I18N_STRUCTURE.md` |
| SPEC-UI-005 | 無障礙規範 | ⏳ | `docs/design/ACCESSIBILITY.md` |
### 流程圖 (DIAGRAM)
| ID | 圖表名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| DIAG-UI-001 | 用戶流程圖 | ⏳ | `docs/diagrams/user-flow.png` |
| DIAG-UI-002 | 頁面導航圖 | ⏳ | `docs/diagrams/navigation-map.png` |
| DIAG-UI-003 | 組件關係圖 | ⏳ | `docs/diagrams/component-hierarchy.png` |
---
## CIO 必須產出文檔
### 基礎設施規格 (SPEC)
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| SPEC-INFRA-001 | 四主機架構說明 | ⏳ | `docs/infrastructure/HOSTS.md` |
| SPEC-INFRA-002 | K8s Namespace 規格 | ⏳ | `docs/infrastructure/K8S_NAMESPACES.md` |
| SPEC-INFRA-003 | Nginx 路由配置 | ⏳ | `docs/infrastructure/NGINX_CONFIG.md` |
| SPEC-INFRA-004 | NetworkPolicy 規格 | ⏳ | `docs/infrastructure/NETWORK_POLICY.md` |
| SPEC-INFRA-005 | 資源配額設定 | ⏳ | `docs/infrastructure/RESOURCE_QUOTAS.md` |
### 運維手冊 (RUNBOOK)
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| RUNBOOK-001 | 部署操作手冊 | ⏳ | `docs/runbook/DEPLOYMENT.md` |
| RUNBOOK-002 | 回滾操作手冊 | ⏳ | `docs/runbook/ROLLBACK.md` |
| RUNBOOK-003 | 災難恢復手冊 | ⏳ | `docs/runbook/DISASTER_RECOVERY.md` |
| RUNBOOK-004 | 監控告警手冊 | ⏳ | `docs/runbook/MONITORING.md` |
| RUNBOOK-005 | 日誌查詢手冊 | ⏳ | `docs/runbook/LOGGING.md` |
### 架構圖 (DIAGRAM)
| ID | 圖表名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| DIAG-INFRA-001 | 網路拓撲圖 | ⏳ | `docs/diagrams/network-topology.png` |
| DIAG-INFRA-002 | K8s 部署圖 | ⏳ | `docs/diagrams/k8s-deployment.png` |
| DIAG-INFRA-003 | 監控架構圖 | ⏳ | `docs/diagrams/monitoring-architecture.png` |
---
## CISO 必須產出文檔
### 安全文檔 (SECURITY)
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| SEC-001 | RBAC 權限架構 | ✅ | `docs/security/RBAC_SCHEMA.md` |
| SEC-002 | 機密參考指南 | ✅ | `docs/security/SECRETS_REFERENCE.md` |
| SEC-003 | 威脅模型分析 | ⏳ | `docs/security/THREAT_MODEL.md` |
| SEC-004 | 滲透測試報告 | ⏳ | `docs/security/PENTEST_REPORT.md` |
| SEC-005 | 安全稽核清單 | ⏳ | `docs/security/AUDIT_CHECKLIST.md` |
| SEC-006 | 日誌脫敏規範 | ⏳ | `docs/security/LOG_SANITIZATION.md` |
### 流程圖 (DIAGRAM)
| ID | 圖表名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| DIAG-SEC-001 | 認證流程圖 | ⏳ | `docs/diagrams/auth-flow.png` |
| DIAG-SEC-002 | 簽核流程圖 | ⏳ | `docs/diagrams/approval-flow.png` |
| DIAG-SEC-003 | 資料脫敏流程 | ⏳ | `docs/diagrams/data-masking-flow.png` |
---
## 共用文檔
### 專案管理
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| PM-001 | WBS 工作分解 | ✅ | `docs/architecture/WBS.md` |
| PM-002 | LOGBOOK 進度軌跡 | ✅ | `docs/LOGBOOK.md` |
| PM-003 | 依賴清單 | ✅ | `docs/DEPENDENCIES.md` |
| PM-004 | 架構盤點清單 | ✅ | `docs/ARCHITECTURE_INVENTORY.md` |
### 會議記錄
| ID | 文檔名稱 | 狀態 | 路徑 |
|----|---------|------|------|
| MTG-001 | Phoenix Rising 戰略會議 | ✅ | `docs/meetings/2026-03-20_PHOENIX_RISING_STRATEGY.md` |
| MTG-002 | 前端重構戰略會議 | ✅ | `docs/meetings/2026-03-19_FRONTEND_RESTRUCTURE_STRATEGY.md` |
---
## 配置版本控制清單
> **CEO 指示 #8**: 所有服務、監控、工具、網路配置必須版本控制
| 配置類型 | 路徑 | 負責人 |
|---------|------|--------|
| K8s Deployment | `k8s/deployments/` | CIO |
| K8s Services | `k8s/services/` | CIO |
| K8s ConfigMaps | `k8s/configmaps/` | CIO |
| K8s Secrets (模板) | `k8s/secrets/` | CIO |
| K8s NetworkPolicy | `k8s/network-policies/` | CIO |
| K8s ResourceQuota | `k8s/quotas/` | CIO |
| Nginx 配置 | `k8s/nginx/` | CIO |
| Prometheus Rules | `k8s/monitoring/prometheus/` | CIO |
| Alertmanager 配置 | `k8s/monitoring/alertmanager/` | CIO |
| GitHub Actions | `.github/workflows/` | CTO |
| Dockerfile | `apps/*/Dockerfile` | CTO |
| Docker Compose | `docker-compose.*.yml` | CTO |
---
## 文檔完成度統計
| 單位 | 總數 | 完成 | 進行中 | 完成率 |
|------|------|------|--------|--------|
| CTO | 17 | 6 | 11 | 35% |
| CPO | 8 | 1 | 7 | 13% |
| CIO | 13 | 0 | 13 | 0% |
| CISO | 9 | 2 | 7 | 22% |
| 共用 | 6 | 6 | 0 | 100% |
| **總計** | **53** | **15** | **38** | **28%** |
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此文件由 CTO 維護,每週 Review 更新文檔完成進度。*

View File

@@ -0,0 +1,130 @@
# ADR-001: MCP Protocol 採用
> **狀態**: Accepted
> **日期**: 2026-03-19
> **決策者**: CTO + CEO
## 背景
AWOOOI 的 leWOOOgo Engine 需要與大量外部工具整合 (K8s, SSH, AWS/GCP, Database, Notification 等)。傳統做法是針對每個服務寫專屬 Adapter耗時且難以維護。
Anthropic 的 **Model Context Protocol (MCP)** 提供標準化的 AI-Tool 溝通協議,已有數百個社群 MCP Server 可直接使用。
## 決策
**採用 MCP 作為 leWOOOgo BRAIN ↔ ACTION 的標準通訊協議**
```
┌─────────────────────────────────────────────────────────────┐
│ leWOOOgo Engine │
├─────────────────────────────────────────────────────────────┤
│ │
│ 🧱 INPUT ──→ 🧠 BRAIN ──→ 📢 OUTPUT │
│ │ │
│ ↓ (MCP Protocol) │
│ 🔧 ACTION ←→ [MCP Servers] │
│ │ │
│ ↓ │
│ 📊 DATA │
│ │
└─────────────────────────────────────────────────────────────┘
```
### MCP Server 分類
| 類別 | 範例 MCP Server | 用途 |
|------|----------------|------|
| **Infrastructure** | kubernetes, docker, ssh | 基礎設施操作 |
| **Cloud** | aws, gcp, azure | 雲端資源管理 |
| **Database** | postgres, redis, mongodb | 資料存取 |
| **Notification** | slack, telegram, email | 訊息發送 |
| **Monitoring** | prometheus, grafana | 監控查詢 |
| **Security** | vault, trivy | 安全掃描 |
### leWOOOgo 整合方式
```typescript
// packages/lewooogo-brain/src/mcp-bridge.ts
interface MCPBridge {
// 動態載入 MCP Server
loadServer(serverName: string): Promise<MCPServer>
// 執行 MCP Tool
callTool(server: string, tool: string, params: object): Promise<MCPResult>
// 列出可用工具
listTools(server: string): Promise<MCPToolDefinition[]>
}
```
## 理由
### 1. 生態系統成熟
| 指標 | 數值 |
|------|------|
| 社群 MCP Server | 300+ |
| 官方維護 Server | 20+ |
| 協議版本 | Stable (2024-11) |
### 2. 與 Claude 深度整合
AWOOOI 使用 Claude 作為主要 LLMMCP 是 Anthropic 原生協議,整合最順暢。
### 3. 節省開發時間
| 方案 | 預估工時 |
|------|---------|
| 自建 50 個 Adapter | 500+ 小時 |
| 採用 MCP + 自訂 5 個 | 50 小時 |
### 4. 標準化介面
所有工具使用相同的 JSON-RPC 介面,簡化 BRAIN 邏輯。
## 後果
### 優點
- **即時獲得** 數百種工具能力
- **社群維護** 減輕維護負擔
- **標準協議** 簡化架構設計
- **Claude 原生** 最佳 LLM 整合體驗
### 缺點
- **依賴外部** 需信任社群 MCP Server 品質
- **協議鎖定** 若 MCP 標準改變需跟進
### 風險
| 風險 | 緩解措施 |
|------|---------|
| MCP Server 品質不一 | 建立內部審核清單,只允許白名單 Server |
| 安全漏洞 | 所有 MCP 調用經過 Privacy Shield 脫敏 |
| 效能瓶頸 | 關鍵路徑自建 Adapter非關鍵走 MCP |
### 例外情況
以下場景**不使用**社群 MCP Server改自建 leWOOOgo Adapter
1. **核心業務邏輯** - 如 ClawBot Triage Engine
2. **高頻調用** - 如 Redis Cache (效能考量)
3. **機敏操作** - 如 K8s Delete (需額外授權)
## 實施計畫
| Phase | 任務 | 時程 |
|-------|------|------|
| 0 | 定義 MCPBridge 介面 | Week 1 |
| 1 | 整合 5 個核心 MCP Server | Week 2-3 |
| 2 | 建立 MCP Server 白名單機制 | Week 3 |
| 3 | Privacy Shield 整合 | Week 4 |
## 參考
- [MCP Official Spec](https://spec.modelcontextprotocol.io/)
- [MCP Server Registry](https://github.com/modelcontextprotocol/servers)
- [Anthropic MCP Announcement](https://www.anthropic.com/news/model-context-protocol)
- 會議記錄: `docs/meetings/2026-03-19_FRONTEND_RESTRUCTURE_STRATEGY.md`

View File

@@ -0,0 +1,191 @@
# ADR-002: Nothing.tech 設計系統採用
> **狀態**: Accepted
> **日期**: 2026-03-19
> **決策者**: CPO + CTO
## 背景
AWOOOI 需要統一的視覺語言,區隔於傳統 Dashboard 風格。CEO 在戰略會議中指定採用 **Nothing.tech** 風格:點陣字體 + 毛玻璃效果 + 極簡黑白。
此風格強調「科技感」與「未來感」,符合 AI-First 運維平台定位。
## 決策
**採用 Nothing.tech 風格作為 AWOOOI 設計系統基礎**
### 色彩系統
```css
:root {
/* 主色 */
--nothing-black: #000000;
--nothing-white: #FFFFFF;
--nothing-red: #D71921; /* 告警、錯誤、Critical */
/* 灰階 */
--nothing-gray-50: #FAFAFA;
--nothing-gray-100: #F5F5F5;
--nothing-gray-200: #E5E5E5;
--nothing-gray-300: #D4D4D4;
--nothing-gray-400: #A3A3A3;
--nothing-gray-500: #737373;
--nothing-gray-600: #525252;
--nothing-gray-700: #404040;
--nothing-gray-800: #1A1A1A;
--nothing-gray-900: #0A0A0A;
/* 語意色 */
--status-healthy: #22C55E; /* Green - 正常 */
--status-warning: #F59E0B; /* Amber - 警告 */
--status-critical: #D71921; /* Nothing Red - 嚴重 */
--status-unknown: #6B7280; /* Gray - 未知 */
}
```
### 字體系統
| 用途 | 字體 | Fallback |
|------|------|----------|
| **AI 介面** | NDot 57 | JetBrains Mono, monospace |
| **標題** | NDot 47 | Inter, system-ui |
| **內文** | Inter | system-ui, sans-serif |
| **程式碼** | JetBrains Mono | Fira Code, monospace |
```css
:root {
--font-display: "NDot", "JetBrains Mono", monospace;
--font-heading: "NDot", "Inter", system-ui;
--font-body: "Inter", system-ui, sans-serif;
--font-mono: "JetBrains Mono", "Fira Code", monospace;
}
```
### 毛玻璃效果 (Glassmorphism)
```css
.glass-panel {
background: rgba(255, 255, 255, 0.05);
backdrop-filter: blur(20px);
-webkit-backdrop-filter: blur(20px);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 16px;
}
.glass-panel-dark {
background: rgba(0, 0, 0, 0.6);
backdrop-filter: blur(20px);
border: 1px solid rgba(255, 255, 255, 0.05);
}
```
### 動效規範
| 效果 | 用途 | Duration |
|------|------|----------|
| **呼吸燈** | AI 狀態指示 | 2s ease-in-out |
| **打字機** | ClawBot 回應 | 30ms/字元 |
| **淡入** | 卡片載入 | 200ms ease-out |
| **滑入** | 側邊欄 | 300ms cubic-bezier |
```css
@keyframes breathe {
0%, 100% { opacity: 0.4; }
50% { opacity: 1; }
}
.ai-status-indicator {
animation: breathe 2s ease-in-out infinite;
}
```
## 理由
### 1. 品牌差異化
傳統運維 Dashboard 使用 Material/Ant Design視覺同質化嚴重。Nothing.tech 風格能立即建立品牌辨識度。
### 2. AI-First 視覺語言
點陣字體與極簡風格傳達「精準」與「科技感」,符合 AI 運維平台定位。
### 3. 技術可行性
| 需求 | 實現方式 |
|------|---------|
| 點陣字體 | NDot (需購買) 或 Dot Matrix (免費替代) |
| 毛玻璃 | CSS backdrop-filter (現代瀏覽器支援) |
| 深色主題 | Tailwind dark mode |
## 後果
### 優點
- **品牌辨識度** 強烈視覺風格
- **AI 定位** 符合 Agent-Centric 理念
- **現代感** 吸引科技用戶
### 缺點
- **字體成本** NDot 需商業授權
- **相容性** 舊瀏覽器不支援 backdrop-filter
### 風險
| 風險 | 緩解措施 |
|------|---------|
| NDot 授權費用 | 初期用 JetBrains Mono 替代,驗證後再購買 |
| Safari 毛玻璃問題 | 加入 `-webkit-backdrop-filter` prefix |
| 可讀性 | 限制點陣字體於標題,內文用 Inter |
## Tailwind 配置
```javascript
// tailwind.config.js
module.exports = {
theme: {
extend: {
colors: {
nothing: {
black: '#000000',
white: '#FFFFFF',
red: '#D71921',
gray: {
50: '#FAFAFA',
100: '#F5F5F5',
200: '#E5E5E5',
300: '#D4D4D4',
400: '#A3A3A3',
500: '#737373',
600: '#525252',
700: '#404040',
800: '#1A1A1A',
900: '#0A0A0A',
}
},
status: {
healthy: '#22C55E',
warning: '#F59E0B',
critical: '#D71921',
unknown: '#6B7280',
}
},
fontFamily: {
display: ['NDot', 'JetBrains Mono', 'monospace'],
heading: ['NDot', 'Inter', 'system-ui'],
body: ['Inter', 'system-ui', 'sans-serif'],
mono: ['JetBrains Mono', 'Fira Code', 'monospace'],
},
backdropBlur: {
glass: '20px',
}
}
}
}
```
## 參考
- [Nothing.tech Official](https://nothing.tech/)
- [NDot Font](https://pangrampangram.com/products/ndot)
- 會議記錄: `docs/meetings/2026-03-19_FRONTEND_RESTRUCTURE_STRATEGY.md`

View File

@@ -0,0 +1,244 @@
# ADR-003: leWOOOgo 模組化架構
> **狀態**: Accepted
> **日期**: 2026-03-19
> **決策者**: CTO + CEO
## 背景
AWOOOI 需要高度模組化的架構讓開發者能像組樂高一樣快速組合功能。CEO 命名此引擎為 **leWOOOgo** (樂高 + WOOO)。
傳統 monolithic 架構難以擴展plugin 架構則能支援生態系統發展。
## 決策
**採用六大積木類別的 Plugin 架構**
```
┌─────────────────────────────────────────────────────────────────┐
│ leWOOOgo Engine │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 🧱 INPUT ──────→ 🧠 BRAIN ──────→ 📢 OUTPUT │
│ (觸發器) (AI 處理) (通知) │
│ │ │ │ │
│ │ ↓ │ │
│ │ 🔧 ACTION │ │
│ │ (執行器) │ │
│ │ │ │ │
│ └───────→ 📊 DATA ←───────────────┘ │
│ (儲存) │
│ │
│ 🎨 UI │
│ (介面元件) │
│ │
└─────────────────────────────────────────────────────────────────┘
```
### 六大積木類別
| 類別 | 介面 | 用途 | 範例 |
|------|------|------|------|
| **INPUT** | `TriggerPlugin` | 觸發工作流 | Webhook, Cron, Alert, Email |
| **BRAIN** | `AgentProvider` | AI 處理決策 | LLM Router, RAG, Triage, MCP |
| **OUTPUT** | `NotificationChannel` | 發送通知 | Telegram, Slack, LINE, Email |
| **ACTION** | `ActionExecutor` | 執行操作 | K8s, SSH, Docker, API Call |
| **DATA** | `DataAdapter` | 資料存取 | PostgreSQL, Redis, S3, Vector |
| **UI** | `WidgetComponent` | 介面元件 | Card, Chart, Timeline, Status |
### 核心介面定義
```typescript
// packages/lewooogo-core/src/interfaces/plugin.ts
/** 所有 Plugin 的基礎介面 */
interface LeWOOOgoPlugin {
readonly id: string
readonly name: string
readonly version: string
readonly category: 'INPUT' | 'BRAIN' | 'OUTPUT' | 'ACTION' | 'DATA' | 'UI'
initialize(): Promise<void>
healthCheck(): Promise<HealthStatus>
shutdown(): Promise<void>
}
/** INPUT 觸發器 */
interface TriggerPlugin extends LeWOOOgoPlugin {
category: 'INPUT'
subscribe(handler: TriggerHandler): Unsubscribe
getSchema(): TriggerSchema
}
/** BRAIN AI 處理器 */
interface AgentProvider extends LeWOOOgoPlugin {
category: 'BRAIN'
process(input: AgentInput): Promise<AgentOutput>
getCapabilities(): AgentCapability[]
}
/** OUTPUT 通知頻道 */
interface NotificationChannel extends LeWOOOgoPlugin {
category: 'OUTPUT'
send(message: NotificationMessage): Promise<SendResult>
getTemplates(): NotificationTemplate[]
}
/** ACTION 執行器 */
interface ActionExecutor extends LeWOOOgoPlugin {
category: 'ACTION'
execute(action: ActionRequest): Promise<ActionResult>
dryRun(action: ActionRequest): Promise<DryRunResult>
rollback(executionId: string): Promise<RollbackResult>
}
/** DATA 資料適配器 */
interface DataAdapter extends LeWOOOgoPlugin {
category: 'DATA'
connect(): Promise<void>
query<T>(request: QueryRequest): Promise<T>
disconnect(): Promise<void>
}
/** UI 介面元件 */
interface WidgetComponent extends LeWOOOgoPlugin {
category: 'UI'
render(props: WidgetProps): ReactNode
getConfigSchema(): JSONSchema
}
```
### 資料夾結構
```
packages/
├── lewooogo-core/ # 核心引擎
│ ├── src/
│ │ ├── interfaces/ # 六大介面定義
│ │ ├── registry/ # Plugin 註冊中心
│ │ ├── pipeline/ # 工作流引擎
│ │ └── utils/ # 共用工具
│ └── package.json
├── lewooogo-input/ # INPUT 積木
│ ├── src/
│ │ ├── webhook/
│ │ ├── cron/
│ │ ├── prometheus-alert/
│ │ └── email-trigger/
│ └── package.json
├── lewooogo-brain/ # BRAIN 積木
│ ├── src/
│ │ ├── llm-router/ # LLM 路由器
│ │ ├── mcp-bridge/ # MCP 整合 (ADR-001)
│ │ ├── triage-engine/ # 告警分級
│ │ └── rag-provider/ # RAG 檢索
│ └── package.json
├── lewooogo-output/ # OUTPUT 積木
│ ├── src/
│ │ ├── telegram/
│ │ ├── slack/
│ │ ├── line/
│ │ └── email/
│ └── package.json
├── lewooogo-action/ # ACTION 積木
│ ├── src/
│ │ ├── kubernetes/
│ │ ├── ssh/
│ │ ├── docker/
│ │ └── http-api/
│ └── package.json
├── lewooogo-data/ # DATA 積木
│ ├── src/
│ │ ├── postgres/
│ │ ├── redis/
│ │ ├── s3/
│ │ └── vector-db/
│ └── package.json
└── lewooogo-ui/ # UI 積木
├── src/
│ ├── cards/
│ ├── charts/
│ ├── timeline/
│ └── status-indicators/
└── package.json
```
## 理由
### 1. 開發者體驗 (DX)
| 傳統方式 | leWOOOgo 方式 |
|---------|--------------|
| 修改核心程式碼 | npm install + 註冊 |
| 重新部署整體 | 熱插拔 Plugin |
| 閱讀大量文檔 | 統一介面 + TypeScript |
### 2. 生態系統潛力
標準介面允許第三方開發 Plugin形成市場。
### 3. 測試隔離
每個 Plugin 獨立測試,不影響核心引擎。
## 後果
### 優點
- **模組化** 功能獨立開發部署
- **可擴展** 第三方生態系統
- **可測試** 單元測試隔離
- **可維護** 責任分離清晰
### 缺點
- **初期成本** 需建立完整介面規範
- **效能開銷** Plugin 動態載入有成本
- **版本管理** 多 package 需 monorepo 工具
### 風險
| 風險 | 緩解措施 |
|------|---------|
| 介面設計錯誤 | Phase 0 充分討論 + 早期 POC 驗證 |
| Plugin 衝突 | Plugin Registry 管理 + 命名空間隔離 |
| 效能問題 | 關鍵路徑避免過度抽象,效能測試 |
## Monorepo 工具
採用 **pnpm workspace** + **Turborepo**
```yaml
# pnpm-workspace.yaml
packages:
- 'apps/*'
- 'packages/*'
```
```json
// turbo.json
{
"pipeline": {
"build": {
"dependsOn": ["^build"],
"outputs": ["dist/**"]
},
"test": {
"dependsOn": ["build"]
}
}
}
```
## 參考
- [Turborepo](https://turbo.build/)
- [pnpm Workspaces](https://pnpm.io/workspaces)
- ADR-001: MCP Protocol 採用
- 會議記錄: `docs/meetings/2026-03-19_FRONTEND_RESTRUCTURE_STRATEGY.md`

View File

@@ -0,0 +1,268 @@
# ADR-004: 前端狀態管理統一採用 Zustand
> **狀態**: Accepted
> **日期**: 2026-03-19
> **更新日期**: 2026-03-20 (Gate 0 驗證完成)
> **決策者**: CTO + CPO
---
## Gate 0 里程碑驗證 (2026-03-20)
**Tracer Bullet 測試通過!** 以下實作已驗證:
| 元件 | Store | 狀態 |
|------|-------|------|
| Dashboard SSE | `dashboard.store.ts` | ✅ 即時同步 |
| Approval Multi-Sig | `approval.store.ts` | ✅ 狀態機運作正常 |
| HITL 簽核流程 | 整合 API `/approvals/{id}/approve` | ✅ TOCTOU 防護驗證 |
---
## 背景
AWOOOI 的前端 (Agent Hub) 需要處理高度頻繁的狀態更新,包括:
- ClawBot 的 SSE 思考串流 (`/agent/thinking`)
- 即時狀態燈 (Data Pincer 呼吸動畫)
- 待授權卡片的佇列管理 (`/approvals`)
- Plugin 健康狀態即時更新
我們需要一個輕量、無需過度樣板代碼 (Boilerplate),且能與 React 18 完美協作的狀態管理庫。
## 決策
**全面採用 Zustand 作為全域狀態管理工具**
```typescript
// stores/agent.store.ts
import { create } from 'zustand'
import { subscribeWithSelector } from 'zustand/middleware'
interface AgentState {
status: 'idle' | 'thinking' | 'executing' | 'waiting_approval'
thinkingStream: string[]
pendingApprovals: Approval[]
// Actions
setStatus: (status: AgentState['status']) => void
appendThinking: (chunk: string) => void
addApproval: (approval: Approval) => void
}
export const useAgentStore = create<AgentState>()(
subscribeWithSelector((set) => ({
status: 'idle',
thinkingStream: [],
pendingApprovals: [],
setStatus: (status) => set({ status }),
appendThinking: (chunk) => set((s) => ({
thinkingStream: [...s.thinkingStream, chunk]
})),
addApproval: (approval) => set((s) => ({
pendingApprovals: [...s.pendingApprovals, approval]
})),
}))
)
```
### 狀態分層策略
| 層級 | 工具 | 用途 |
|------|------|------|
| **全域 UI 狀態** | Zustand | Agent 狀態、Sidebar 開關、Theme |
| **伺服器資料快取** | TanStack Query | API 回應快取、自動重新驗證 |
| **表單狀態** | React Hook Form | 表單驗證、欄位狀態 |
| **元件局部狀態** | useState | 簡單 UI 切換 |
### 禁止事項
```typescript
// ❌ 禁止Redux
import { createStore } from 'redux'
// ❌ 禁止Context API 做複雜狀態管理
const GlobalContext = createContext<ComplexState>(...)
// ❌ 禁止:單一巨大 Store
const useGodStore = create(() => ({
agent: ...,
plugins: ...,
pipelines: ..., // 太多!
}))
// ✅ 正確Slice Pattern 分拆
const useAgentStore = create(...)
const usePluginStore = create(...)
const usePipelineStore = create(...)
```
## 理由
### 1. 效能優勢
| 特性 | Redux | Zustand |
|------|-------|---------|
| Bundle Size | ~7KB | ~1KB |
| Boilerplate | 高 | 極低 |
| Re-render 控制 | 需 memo/selector | 內建 selector |
| SSE/WebSocket | 需 middleware | 原生支援 |
### 2. SSE 整合範例
```typescript
// hooks/useAgentThinking.ts
export function useAgentThinking() {
const appendThinking = useAgentStore((s) => s.appendThinking)
useEffect(() => {
const eventSource = new EventSource('/v1/agent/thinking')
eventSource.onmessage = (event) => {
appendThinking(event.data) // 直接更新 Zustand
}
return () => eventSource.close()
}, [appendThinking])
}
```
### 3. TanStack Query 協作
```typescript
// hooks/useApprovals.ts
export function useApprovals() {
return useQuery({
queryKey: ['approvals', 'pending'],
queryFn: () => api.listApprovals({ status: 'pending' }),
refetchInterval: 5000, // 每 5 秒輪詢
})
}
```
## 後果
### 優點
- **極度輕量** 不增加 bundle 負擔
- **高頻更新** 完美處理 SSE/WebSocket 串流
- **簡單 API** 降低學習曲線
- **TypeScript 友善** 完整型別推導
### 缺點
- **生態較小** 相比 Redux 社群資源較少
- **DevTools** 功能不如 Redux DevTools 強大
### 風險
| 風險 | 緩解措施 |
|------|---------|
| Store 肥大化 | 強制執行 Slice PatternCode Review 把關 |
| 狀態同步錯誤 | 搭配 TanStack Query 管理伺服器狀態 |
---
## Gate 0 實作細節
### 1. Dashboard SSE Store
```typescript
// stores/dashboard.store.ts
interface DashboardState {
hosts: HostStatus[]
connectionStatus: 'connecting' | 'connected' | 'disconnected' | 'error'
lastUpdate: Date | null
// SSE 控制
connect: (apiUrl: string) => void
disconnect: () => void
}
export const useDashboardStore = create<DashboardState>((set, get) => ({
hosts: [],
connectionStatus: 'disconnected',
lastUpdate: null,
connect: (apiUrl) => {
const eventSource = new EventSource(`${apiUrl}/api/v1/dashboard/stream`)
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data)
set({ hosts: data.hosts, lastUpdate: new Date() })
}
eventSource.onerror = () => set({ connectionStatus: 'error' })
eventSource.onopen = () => set({ connectionStatus: 'connected' })
},
disconnect: () => {
// AbortController cleanup
}
}))
```
### 2. Approval Multi-Sig 狀態機
```typescript
// stores/approval.store.ts
interface ApprovalState {
pendingApprovals: Approval[]
selectedApproval: Approval | null
signingStatus: 'idle' | 'signing' | 'success' | 'error'
// Actions
signApproval: (id: string, userId: string, role: string) => Promise<void>
refreshApprovals: () => Promise<void>
}
// 狀態機轉換圖
// pending → (簽核) → pending (需更多簽章)
// pending → (簽核) → approved (達到閾值)
// pending → (拒絕) → rejected
// pending → (TOCTOU) → voided (資源狀態改變)
```
### 3. SSE + Zustand 整合模式
**企業級 SSE 最佳實踐:**
| 特性 | 實作 |
|------|------|
| **Buffer** | 累積 5 秒內的更新,批次 setState |
| **AbortController** | 元件 unmount 時正確關閉連線 |
| **Reconnection** | 指數退避重連 (1s → 2s → 4s → max 30s) |
| **Heartbeat** | 每 30 秒 ping超時則重連 |
```typescript
// 企業級 SSE Hook 範例
function useSSE(url: string) {
const abortControllerRef = useRef<AbortController>()
const bufferRef = useRef<HostStatus[]>([])
useEffect(() => {
abortControllerRef.current = new AbortController()
const flushBuffer = setInterval(() => {
if (bufferRef.current.length > 0) {
useDashboardStore.setState({ hosts: bufferRef.current })
bufferRef.current = []
}
}, 5000)
return () => {
abortControllerRef.current?.abort()
clearInterval(flushBuffer)
}
}, [url])
}
```
---
## 參考
- [Zustand](https://zustand-demo.pmnd.rs/)
- [TanStack Query](https://tanstack.com/query)
- ADR-002: Nothing.tech 設計系統 (動畫需求)
- [approvals-contract.yaml](../api/approvals-contract.yaml) - API 契約定義

View File

@@ -0,0 +1,178 @@
# ADR-005: 導入 BFF (Backend-For-Frontend) API 閘道模式
> **狀態**: Accepted
> **日期**: 2026-03-19
> **決策者**: CTO + CIO
## 背景
AWOOOI 的底層是由 leWOOOgo Engine 驅動的微服務/Plugin 架構。如果讓 Next.js 前端直接呼叫:
- 多個分散的 Plugin API
- Ollama / Claude API
- PostgreSQL / Redis
- K8s API
會導致:
1. 前端邏輯過於肥大
2. 極高的資安外洩風險
3. 難以實施統一的身分驗證與權限控制
## 決策
**強制實施 BFF (Backend-For-Frontend) 架構**
```
┌─────────────────────────────────────────────────────────────────┐
│ AWOOOI 架構 │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────────────┐ │
│ │ Next.js │ ──────→ │ FastAPI BFF │ │
│ │ 前端 │ HTTPS │ Gateway │ │
│ └─────────┘ └────────┬────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ ↓ ↓ ↓ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ leWOOOgo │ │ ClawBot │ │ PostgreSQL │ │
│ │ Plugins │ │ (Ollama) │ │ Redis │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ════════════════════════════════════════════════════════════ │
│ DMZ (前端無法直達) │
│ │
└─────────────────────────────────────────────────────────────────┘
```
### 核心規則
| 規則 | 說明 |
|------|------|
| **單一入口** | 前端只能打 `https://api.awoooi.wooo.work/v1/*` |
| **禁止直連** | 前端禁止直連 PostgreSQL、Redis、K8s、Ollama |
| **身分驗證** | 所有請求經 BFF JWT 驗證 |
| **資料脫敏** | Privacy Shield 在 BFF 層攔截機敏資料 |
### BFF 層職責
```python
# apps/api/src/routes/agent.py
from fastapi import APIRouter, Depends
from src.auth import require_auth
from src.privacy import PrivacyShield
from src.services import clawbot_client, approval_service
router = APIRouter(prefix="/agent", tags=["Agent"])
@router.post("/chat")
async def chat_with_agent(
request: ChatRequest,
user: User = Depends(require_auth), # 1. 身分驗證
):
# 2. 資料脫敏
sanitized = PrivacyShield.sanitize(request.message)
# 3. 聚合多個後端服務
response = await clawbot_client.chat(sanitized, user_id=user.id)
# 4. 判斷是否需要 Approval
if response.requires_action:
approval = await approval_service.create(
action=response.suggested_action,
user_id=user.id,
)
response.approval_id = approval.id
return response
```
### 禁止事項
```typescript
// ❌ 禁止:前端直連資料庫
const client = new Client({ connectionString: 'postgresql://...' })
// ❌ 禁止:前端直接呼叫 Ollama
const response = await fetch('http://192.168.0.188:11434/api/generate')
// ❌ 禁止:前端直接操作 K8s
const k8s = new KubeConfig()
// ✅ 正確:透過 BFF API
const response = await fetch('https://api.awoooi.wooo.work/v1/agent/chat', {
method: 'POST',
headers: { 'Authorization': `Bearer ${token}` },
body: JSON.stringify({ message: '...' }),
})
```
## 理由
### 1. Zero Trust 網路隔離
| 元件 | 網路可達性 |
|------|-----------|
| Next.js (前端) | Public Internet |
| FastAPI BFF | DMZ (僅接受前端) |
| PostgreSQL | Internal Only |
| Redis | Internal Only |
| Ollama | Internal Only |
| K8s API | Internal Only |
### 2. 統一關注點
| 關注點 | 處理位置 |
|--------|---------|
| 身分驗證 | BFF Middleware |
| 權限檢查 | BFF Dependency |
| 請求限流 | BFF / Nginx |
| 資料脫敏 | BFF Privacy Shield |
| 審計日誌 | BFF Logger |
### 3. 資料聚合
```python
# 一個 API 呼叫 = 多個後端服務聚合
@router.get("/dashboard")
async def get_dashboard(user: User = Depends(require_auth)):
# 平行取得多個資料源
agent_status, pending_approvals, recent_alerts = await asyncio.gather(
clawbot_client.get_status(),
approval_service.list_pending(user.id),
alert_service.list_recent(limit=10),
)
return DashboardResponse(
agent=agent_status,
approvals=pending_approvals,
alerts=recent_alerts,
)
```
## 後果
### 優點
- **Zero Trust** 真正的網路隔離
- **前端精簡** 只負責渲染 UI
- **統一治理** 所有安全策略集中管理
- **可觀測性** 單一入口易於監控
### 缺點
- **開發成本** 新功能需在 BFF 層多寫一層
- **延遲增加** 多一層網路跳躍 (~1-5ms)
### 風險
| 風險 | 緩解措施 |
|------|---------|
| BFF 成為瓶頸 | 水平擴展 + Redis 快取 |
| 開發速度下降 | OpenAPI 自動生成 Client SDK |
## 參考
- [BFF Pattern](https://samnewman.io/patterns/architectural/bff/)
- api-contract.yaml (BFF 對外契約)
- ADR-001: MCP Protocol (內部服務整合)

View File

@@ -0,0 +1,297 @@
# ADR-006: AI 降級備援策略
> **狀態**: 已接受
> **日期**: 2026-03-20
> **決策者**: CTO, CEO
---
## 背景
AWOOOI 系統高度依賴 AI 功能,包括 AI Copilot、異常偵測、智能摘要等。
當本地 Ollama 服務不可用時,需要有完善的降級備援機制,同時嚴格控制雲端 API 成本。
### CEO 指示 #2
> 雲端備援的順序採 **Gemini API 然後才是 Claude API**,並且要有效控管、監控,
> API Token 使用的數量,要搭配告警機制,避免費用暴增!
---
## 決策
### 1. AI 服務優先順序
```
┌─────────────────────────────────────────────────────┐
│ 優先級 1: Ollama (本地) │
│ 192.168.0.188:11434 │
│ 成本: $0 / 延遲: ~200ms │
└─────────────────────────────────────────────────────┘
│ 失敗
┌─────────────────────────────────────────────────────┐
│ 優先級 2: Gemini API (雲端備援 - 優先) │
│ 成本: ~$0.001/1K tokens │
└─────────────────────────────────────────────────────┘
│ 失敗
┌─────────────────────────────────────────────────────┐
│ 優先級 3: Claude API (雲端備援 - 次選) │
│ 成本: ~$0.008/1K tokens │
└─────────────────────────────────────────────────────┘
│ 失敗
┌─────────────────────────────────────────────────────┐
│ 優先級 4: 靜態回應 (完全降級) │
│ 返回預設訊息,不調用任何 AI │
└─────────────────────────────────────────────────────┘
```
### 2. Circuit Breaker 機制
```python
# apps/api/app/services/ai/circuit_breaker.py
from enum import Enum
from datetime import datetime, timedelta
import asyncio
class CircuitState(Enum):
CLOSED = "closed" # 正常
OPEN = "open" # 熔斷
HALF_OPEN = "half_open" # 試探
class CircuitBreaker:
def __init__(
self,
failure_threshold: int = 5, # 連續失敗 5 次觸發熔斷
recovery_timeout: int = 60, # 熔斷後 60 秒嘗試恢復
half_open_max_calls: int = 3 # 半開狀態最多 3 次試探
):
self.state = CircuitState.CLOSED
self.failure_count = 0
self.last_failure_time = None
self.half_open_calls = 0
# ...
async def call(self, func, *args, **kwargs):
if self.state == CircuitState.OPEN:
if self._should_try_recovery():
self.state = CircuitState.HALF_OPEN
self.half_open_calls = 0
else:
raise CircuitOpenError("Circuit is open")
try:
result = await func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure()
raise
```
### 3. Token 使用量監控與告警
#### 每日/每月配額
| API | 每日上限 | 每月上限 | 告警閾值 |
|-----|---------|---------|---------|
| Gemini | 100K tokens | 2M tokens | 70% |
| Claude | 50K tokens | 500K tokens | 70% |
#### 監控 Schema
```python
# apps/api/app/models/ai_usage.py
class AIUsageLog(Base):
__tablename__ = "ai_usage_logs"
id = Column(UUID, primary_key=True)
provider = Column(String) # ollama, gemini, claude
model = Column(String)
input_tokens = Column(Integer)
output_tokens = Column(Integer)
latency_ms = Column(Integer)
success = Column(Boolean)
error_message = Column(String, nullable=True)
user_id = Column(UUID, ForeignKey("users.id"))
created_at = Column(DateTime, default=func.now())
```
#### 告警規則
```yaml
# k8s/monitoring/prometheus/ai-usage-alerts.yaml
groups:
- name: ai-usage-alerts
rules:
# Gemini 每日用量 70% 告警
- alert: GeminiDailyUsageWarning
expr: |
sum(increase(ai_tokens_total{provider="gemini"}[24h])) > 70000
labels:
severity: warning
annotations:
summary: "Gemini API 每日用量已達 70%"
description: "今日 Gemini 已使用 {{ $value | humanize }} tokens"
# Gemini 每日用量 90% 嚴重告警
- alert: GeminiDailyUsageCritical
expr: |
sum(increase(ai_tokens_total{provider="gemini"}[24h])) > 90000
labels:
severity: critical
annotations:
summary: "Gemini API 每日用量已達 90%,即將觸發限流"
# Claude 每日用量 70% 告警
- alert: ClaudeDailyUsageWarning
expr: |
sum(increase(ai_tokens_total{provider="claude"}[24h])) > 35000
labels:
severity: warning
annotations:
summary: "Claude API 每日用量已達 70%"
# Ollama 連續失敗告警
- alert: OllamaConsecutiveFailures
expr: |
increase(ai_requests_failed_total{provider="ollama"}[5m]) > 5
labels:
severity: critical
annotations:
summary: "Ollama 服務可能已離線"
description: "過去 5 分鐘 Ollama 請求失敗超過 5 次,已啟動雲端備援"
# 月度預算 50% 提醒
- alert: MonthlyAIBudgetWarning
expr: |
(
sum(increase(ai_tokens_total{provider="gemini"}[30d])) * 0.000001 +
sum(increase(ai_tokens_total{provider="claude"}[30d])) * 0.000008
) > 5
labels:
severity: warning
annotations:
summary: "AI 月度成本已達 $5 (預算 50%)"
```
### 4. 成本預估
| 場景 | Gemini | Claude | 月成本 |
|------|--------|--------|--------|
| **正常** (Ollama 100%) | 0 | 0 | $0 |
| **輕度降級** (Ollama 90%, Gemini 10%) | ~200K | 0 | ~$0.20 |
| **中度降級** (Gemini 80%, Claude 20%) | ~1.6M | ~400K | ~$5 |
| **完全降級** (雲端 100%) | ~2M | ~500K | ~$10 |
### 5. 實作範例
```python
# apps/api/app/services/ai/router.py
from app.services.ai.providers import OllamaProvider, GeminiProvider, ClaudeProvider
from app.services.ai.circuit_breaker import CircuitBreaker
from app.services.ai.usage_tracker import UsageTracker
class AIRouter:
def __init__(self):
self.ollama = OllamaProvider()
self.gemini = GeminiProvider()
self.claude = ClaudeProvider()
self.ollama_circuit = CircuitBreaker(failure_threshold=3, recovery_timeout=30)
self.gemini_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)
self.claude_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)
self.usage_tracker = UsageTracker()
async def generate(self, prompt: str, user_id: str) -> AIResponse:
providers = [
("ollama", self.ollama, self.ollama_circuit),
("gemini", self.gemini, self.gemini_circuit),
("claude", self.claude, self.claude_circuit),
]
for name, provider, circuit in providers:
# 檢查配額
if name in ["gemini", "claude"]:
if await self.usage_tracker.is_quota_exceeded(name):
logger.warning(f"{name} daily quota exceeded, skipping")
continue
try:
result = await circuit.call(provider.generate, prompt)
# 記錄使用量
await self.usage_tracker.log(
provider=name,
input_tokens=result.input_tokens,
output_tokens=result.output_tokens,
user_id=user_id,
success=True
)
return result
except CircuitOpenError:
logger.info(f"{name} circuit is open, trying next provider")
continue
except Exception as e:
logger.error(f"{name} failed: {e}, trying next provider")
await self.usage_tracker.log(
provider=name,
error_message=str(e),
user_id=user_id,
success=False
)
continue
# 所有 AI 都失敗,返回靜態回應
return AIResponse(
content="抱歉AI 服務暫時不可用。請稍後再試,或聯繫管理員。",
provider="fallback",
tokens=0
)
```
### 6. Dashboard 展示
AI 用量監控面板應顯示:
- 今日各 Provider 使用量 (tokens)
- 本月累計成本 (USD)
- 各 Provider 健康狀態 (綠/黃/紅)
- 平均延遲 (ms)
- 成功率 (%)
---
## 影響
### 正面
- 確保 AI 功能高可用性
- 成本可控、可預測
- 即時告警避免帳單爆炸
### 需要注意
- 需維護多個 API Key
- 不同 Provider 回應品質可能有差異
- 需要處理 API 格式轉換
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此 ADR 記錄 AI 降級備援策略的決策過程與實作規範。*

View File

@@ -0,0 +1,234 @@
# ADR-007: 資料保留策略
> **狀態**: 已接受
> **日期**: 2026-03-20
> **決策者**: CEO, CTO, CIO
---
## 背景
需要定義各類型資料的保留時間 (TTL),確保:
1. 系統效能不因資料累積而下降
2. 重要資料有足夠的回溯時間
3. 儲存成本可控
### CEO 指示 #7
> 熱資料 (Redis/即時查詢) TTL 7 天 => 初期是否也保留 6 個月?要確認數據量有多大?
---
## 決策
### 資料分層策略
```
┌─────────────────────────────────────────────────────────┐
│ Layer 1: 熱資料 (Redis) │
│ TTL: 7-30 天 │
│ 用途: 即時查詢、快取、Session │
│ 預估容量: ~500MB │
└─────────────────────────────────────────────────────────┘
│ 過期後
┌─────────────────────────────────────────────────────────┐
│ Layer 2: 溫資料 (PostgreSQL) │
│ TTL: 6 個月 (CEO 指示) │
│ 用途: 歷史查詢、報表、分析 │
│ 預估容量: ~5GB/月 │
└─────────────────────────────────────────────────────────┘
│ 過期後
┌─────────────────────────────────────────────────────────┐
│ Layer 3: 冷資料 (歸檔) │
│ TTL: 永久 (審計日誌) / 1 年 (一般) │
│ 用途: 合規、稽核、法律要求 │
│ 預估容量: ~10GB/年 │
└─────────────────────────────────────────────────────────┘
```
### 各資料類型 TTL 定義
#### Redis 熱資料 (Layer 1)
| 資料類型 | TTL | 說明 | 預估大小 |
|---------|-----|------|---------|
| Session Token | 7 天 | 用戶登入狀態 | ~1KB/session |
| Dashboard 快取 | 5 分鐘 | 即時指標聚合 | ~10KB |
| 主機狀態快取 | 30 秒 | 即時健康狀態 | ~2KB/host |
| AI 回應快取 | 1 小時 | 相同問題快取 | ~5KB/entry |
| 限流計數器 | 1 分鐘 | Rate Limiting | ~100B/user |
**Redis 容量評估**:
- 4 台主機 × 2KB = 8KB (即時狀態)
- 100 用戶 × 1KB = 100KB (Session)
- Dashboard 快取 = 50KB
- AI 快取 (1000 條) = 5MB
- **總計: ~10MB (遠低於 Redis 16GB 容量)**
> ✅ **結論**: Redis 熱資料保持短 TTL (7-30 天) 是合理的,不需要延長至 6 個月。
> Redis 用於快取和即時查詢,歷史資料應存放在 PostgreSQL。
#### PostgreSQL 溫資料 (Layer 2)
| 資料類型 | TTL | 說明 | 預估大小 |
|---------|-----|------|---------|
| 監控指標 | 6 個月 | CPU/Memory/Disk 歷史 | ~1GB/月 |
| 告警記錄 | 6 個月 | 歷史告警 | ~100MB/月 |
| 部署記錄 | 6 個月 | Pipeline 執行歷史 | ~50MB/月 |
| 工單記錄 | 6 個月 | 處理歷史 | ~20MB/月 |
| AI 對話記錄 | 6 個月 | Copilot 歷史 | ~500MB/月 |
| 用戶操作記錄 | 6 個月 | 行為追蹤 | ~200MB/月 |
**PostgreSQL 容量評估**:
- 每月增量: ~2GB
- 6 個月累計: ~12GB
- **總計 (含索引): ~20GB**
> ✅ **結論**: PostgreSQL 溫資料保留 6 個月,符合 CEO 指示,容量可控。
#### 冷資料歸檔 (Layer 3)
| 資料類型 | TTL | 說明 |
|---------|-----|------|
| 審計日誌 | 永久 | 合規要求,不可刪除 |
| 財務記錄 | 7 年 | 法律要求 |
| 安全事件 | 3 年 | 資安稽核 |
| 系統設定變更 | 1 年 | 變更追蹤 |
---
### 資料清理機制
#### 自動清理 Job
```python
# apps/api/app/jobs/data_cleanup.py
from datetime import datetime, timedelta
from app.database import get_db
from app.models import Metric, Alert, Deployment, AIConversation
async def cleanup_expired_data():
"""每日凌晨 3:00 執行"""
six_months_ago = datetime.utcnow() - timedelta(days=180)
async with get_db() as db:
# 清理過期監控指標
deleted_metrics = await db.execute(
delete(Metric).where(Metric.created_at < six_months_ago)
)
logger.info(f"Deleted {deleted_metrics.rowcount} expired metrics")
# 清理過期告警 (保留 acknowledged 狀態)
deleted_alerts = await db.execute(
delete(Alert).where(
Alert.created_at < six_months_ago,
Alert.status != 'archived' # 保留歸檔的告警
)
)
logger.info(f"Deleted {deleted_alerts.rowcount} expired alerts")
# ... 其他資料類型
await db.commit()
# 更新 Prometheus 指標
cleanup_records_deleted.labels(type="metrics").inc(deleted_metrics.rowcount)
cleanup_records_deleted.labels(type="alerts").inc(deleted_alerts.rowcount)
```
#### K8s CronJob
```yaml
# k8s/jobs/data-cleanup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: awoooi-data-cleanup
namespace: awoooi-prod
spec:
schedule: "0 3 * * *" # 每日凌晨 3:00
jobTemplate:
spec:
template:
spec:
containers:
- name: cleanup
image: awoooi-api:latest
command: ["python", "-m", "app.jobs.data_cleanup"]
restartPolicy: OnFailure
```
---
### 資料量監控
#### Prometheus 指標
```yaml
# k8s/monitoring/prometheus/data-alerts.yaml
groups:
- name: data-storage-alerts
rules:
# PostgreSQL 容量警告
- alert: PostgreSQLHighUsage
expr: |
pg_database_size_bytes{datname="awoooi"} > 15 * 1024 * 1024 * 1024
labels:
severity: warning
annotations:
summary: "PostgreSQL 容量已達 15GB"
description: "目前使用 {{ $value | humanize1024 }},建議檢查資料清理 Job"
# Redis 容量警告
- alert: RedisHighMemory
expr: |
redis_memory_used_bytes{db="10"} > 1 * 1024 * 1024 * 1024
labels:
severity: warning
annotations:
summary: "Redis DB 10 記憶體使用超過 1GB"
```
---
### 儲存成本評估
| 層級 | 6 個月容量 | 儲存類型 | 成本 |
|------|-----------|---------|------|
| Redis (熱) | ~10MB | 內存 | 包含在伺服器 |
| PostgreSQL (溫) | ~20GB | SSD | 包含在伺服器 |
| 歸檔 (冷) | ~10GB/年 | HDD/S3 | ~$0.5/月 |
**結論**: 採用自建伺服器,儲存成本基本為 $0 (已攤提)。
---
## 影響
### 正面
- 資料保留策略明確,符合 CEO 6 個月要求
- Redis 維持高效能 (短 TTL)
- 歷史資料可追溯
- 儲存成本可控
### 需要注意
- 清理 Job 需要監控,確保正常執行
- 歸檔資料需要定期備份
- 審計日誌不可刪除,需永久保留
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此 ADR 記錄資料保留策略的決策過程與實作規範。*

View File

@@ -0,0 +1,583 @@
# ADR-009: OpenClaw Agent Teams 架構
**狀態**: 提議中 → 研究完成
**日期**: 2026-03-23
**決策者**: 統帥 + AI 架構師
**Phase**: 9.1-9.2 (SDK 研究 + 架構設計)
## 背景
AWOOOI 的核心價值是 "AI Sees. AI Acts. You Approve."
目前 OpenClaw 是單一 AI 大腦,面對複雜告警時:
- 單一視角可能遺漏問題
- 無法並行分析多個面向
- 決策品質依賴單一模型
Claude 推出了 **Claude Agent SDK** (原 Claude Code SDK2026-03-20 發布 v0.1.50),支援多 Agent 協調。我們評估將此概念整合進 AWOOOI 產品。
### SDK 研究結論 (2026-03-23)
| 項目 | 研究結果 |
|------|---------|
| **SDK 名稱** | `claude-agent-sdk` (PyPI) |
| **最新版本** | v0.1.50 (2026-03-20) |
| **Python 版本** | ≥ 3.10 |
| **核心 API** | `query()`, `ClaudeSDKClient` |
| **Subagent 支援** | ✅ 原生支援 (`AgentDefinition`) |
| **自訂 Tools** | ✅ `@tool` 裝飾器 + MCP 整合 |
## 決策
**採用 Claude Agent SDK 實作 OpenClaw Agent Teams升級為多專家共識決策架構。**
### 為何選擇 Claude Agent SDK (而非自建)
| 考量 | 自建方案 | Claude Agent SDK |
|------|---------|------------------|
| 開發時間 | 2-3 週 | 2-3 天 |
| Tool 執行 | 需自行實作 | 內建 (Read, Edit, Bash...) |
| Subagent | 需自行設計 | 原生支援 |
| Session 管理 | 需自行實作 | 內建 (resume, fork) |
| MCP 整合 | 需橋接 | 原生支援 |
| 維護成本 | 高 | 低 (跟隨 Anthropic 更新) |
### 架構設計
```
┌─────────────────────────────────────────────────────────────┐
│ OpenClaw Coordinator │
│ (Team Lead Agent) │
├─────────────────────────────────────────────────────────────┤
│ ↓ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ BlastRadius │ │ Action │ │
│ │ Agent │ │ Agent │ │ Planner │ │
│ │ (資安評估) │ │ (影響範圍) │ │ (行動方案) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Consensus Engine │ │
│ │ (共識引擎 - 加權投票) │ │
│ └─────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Final Proposal │ │
│ │ (統一提案 → 人類審批) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Agent 職責
| Agent | 職責 | 輸出 |
|-------|------|------|
| **Coordinator** | 分配任務、彙整共識 | Final Proposal |
| **SecurityAgent** | 評估安全風險、權限影響 | Risk Score (0-10) |
| **BlastRadiusAgent** | 分析影響範圍、相依服務 | Affected Services List |
| **ActionPlannerAgent** | 規劃修復步驟、回滾方案 | Action Steps + Rollback |
### 共識機制
```python
class ConsensusEngine:
weights = {
"security": 0.4, # 資安權重最高
"blast_radius": 0.3, # 影響範圍次之
"action_plan": 0.3, # 行動方案
}
def calculate_confidence(self, results: dict) -> float:
"""加權計算整體信心分數"""
score = 0
for agent, weight in self.weights.items():
score += results[agent].confidence * weight
return score
def should_auto_approve(self, confidence: float) -> bool:
"""信心分數 > 0.9 且無高風險 → 可自動執行"""
return confidence > 0.9 and not self.has_high_risk()
```
## 技術實作
### 依賴 (Phase 9.2 研究結果)
```toml
# apps/api/pyproject.toml
[project.dependencies]
# Phase 9: OpenClaw Agent Teams
claude-agent-sdk = ">=0.1.50" # Claude Agent SDK (原 Claude Code SDK)
# Note: SDK 自動包含 Claude Code CLI無需額外安裝
```
#### 安裝指令
```bash
# 使用 uv (推薦)
uv add claude-agent-sdk
# 使用 pip
pip install claude-agent-sdk
# 驗證安裝
python -c "from claude_agent_sdk import query; print('OK')"
```
#### 環境變數
```bash
# 必須
export ANTHROPIC_API_KEY=sk-ant-...
# 可選 (雲端備援,參考 ADR-006)
export CLAUDE_CODE_USE_BEDROCK=1 # AWS Bedrock
export CLAUDE_CODE_USE_VERTEX=1 # Google Vertex AI
```
### 核心類別 (使用 Claude Agent SDK)
```python
# apps/api/src/services/openclaw_team.py
import asyncio
from claude_agent_sdk import (
query,
ClaudeAgentOptions,
AgentDefinition,
ClaudeSDKClient,
AssistantMessage,
ResultMessage,
)
from dataclasses import dataclass
from typing import AsyncIterator
@dataclass
class AgentResult:
agent: str
analysis: str
confidence: float
risk_score: float | None = None
affected_services: list[str] | None = None
action_steps: list[str] | None = None
@dataclass
class Proposal:
incident_id: str
summary: str
agent_results: list[AgentResult]
consensus_score: float
recommended_action: str
auto_approvable: bool
class OpenClawTeam:
"""
使用 Claude Agent SDK 實作多專家協調分析
符合 leWOOOgo BRAIN 積木介面
"""
def __init__(self):
# 定義專家 Subagents
self.agents = {
"security-expert": AgentDefinition(
description="資安專家,評估安全風險與權限影響",
prompt="""你是 AWOOOI 的資安專家。
分析告警的安全風險,評估:
1. 是否涉及敏感資料
2. 是否可能被利用
3. 權限邊界是否被突破
輸出 JSON: {"risk_score": 0-10, "analysis": "...", "confidence": 0-1}""",
tools=["Read", "Grep"], # 只讀權限
),
"blast-radius": AgentDefinition(
description="影響範圍分析師,評估相依服務與影響範圍",
prompt="""你是 AWOOOI 的影響範圍分析師。
分析告警的影響範圍:
1. 直接影響的服務
2. 間接相依的服務
3. 使用者影響人數估計
輸出 JSON: {"affected_services": [...], "blast_radius": "low|medium|high", "confidence": 0-1}""",
tools=["Read", "Glob", "Grep"],
),
"action-planner": AgentDefinition(
description="行動規劃師,制定修復步驟與回滾方案",
prompt="""你是 AWOOOI 的行動規劃師。
根據告警制定修復計畫:
1. 立即修復步驟 (kubectl 指令)
2. 驗證步驟
3. 回滾方案
注意: 所有 kubectl 必須帶 -n awoooi-prod
輸出 JSON: {"action_steps": [...], "rollback_steps": [...], "confidence": 0-1}""",
tools=["Read", "Glob"],
),
}
self.options = ClaudeAgentOptions(
allowed_tools=["Read", "Glob", "Grep", "Agent"], # Agent 用於調用 Subagent
agents=self.agents,
system_prompt="""你是 OpenClaw CoordinatorAWOOOI 的 AI 決策引擎。
你的任務是協調多個專家 Agent 分析告警,彙整共識並產出最終提案。
呼叫順序: security-expert → blast-radius → action-planner
最終輸出統一提案供人類審批。""",
)
async def analyze_incident(self, incident: dict) -> Proposal:
"""
並行呼叫多個 Subagent 分析告警
"""
prompt = f"""
分析以下告警並產出修復提案:
```json
{json.dumps(incident, ensure_ascii=False, indent=2)}
```
請依序呼叫以下 Agent:
1. security-expert - 評估安全風險
2. blast-radius - 分析影響範圍
3. action-planner - 規劃修復步驟
收集所有分析結果後,使用 ConsensusEngine 邏輯 (security 40%, blast_radius 30%, action 30%)
計算整體信心分數,並產出最終提案。
輸出格式:
```json
{{
"summary": "一句話摘要",
"agent_results": [...],
"consensus_score": 0-1,
"recommended_action": "建議的 kubectl 指令",
"auto_approvable": true/false (>0.9 且無高風險)
}}
```
"""
result_json = None
async for message in query(prompt=prompt, options=self.options):
if isinstance(message, ResultMessage):
# 解析最終結果
result_json = self._extract_json(message.result)
if not result_json:
raise ValueError("Agent Team 未能產出有效提案")
return Proposal(
incident_id=incident.get("id", "unknown"),
summary=result_json.get("summary", ""),
agent_results=self._parse_agent_results(result_json.get("agent_results", [])),
consensus_score=result_json.get("consensus_score", 0),
recommended_action=result_json.get("recommended_action", ""),
auto_approvable=result_json.get("auto_approvable", False),
)
def _extract_json(self, text: str) -> dict:
"""從回應中提取 JSON"""
import json
import re
match = re.search(r'```json\s*(.*?)\s*```', text, re.DOTALL)
if match:
return json.loads(match.group(1))
return json.loads(text)
def _parse_agent_results(self, results: list) -> list[AgentResult]:
"""解析各 Agent 結果"""
return [
AgentResult(
agent=r.get("agent", "unknown"),
analysis=r.get("analysis", ""),
confidence=r.get("confidence", 0),
risk_score=r.get("risk_score"),
affected_services=r.get("affected_services"),
action_steps=r.get("action_steps"),
)
for r in results
]
```
### 替代方案: ClaudeSDKClient (互動式)
```python
# 適用於需要人機互動的場景
async def interactive_analysis(incident: dict):
async with ClaudeSDKClient(options=options) as client:
# 第一輪: 安全分析
await client.query(f"使用 security-expert 分析: {json.dumps(incident)}")
security_result = await collect_response(client)
# 人類可在此介入調整
# 第二輪: 影響範圍
await client.query("繼續使用 blast-radius 分析影響範圍")
blast_result = await collect_response(client)
# ...
```
### API 端點
```python
# apps/api/src/routes/incidents.py
@router.post("/api/v1/incidents/{incident_id}/analyze")
async def analyze_with_team(incident_id: str):
"""使用 Agent Team 分析告警"""
incident = await get_incident(incident_id)
team = OpenClawTeam()
proposal = await team.analyze_incident(incident)
return {
"proposal": proposal,
"agent_results": proposal.agent_results,
"consensus_score": proposal.consensus_score,
"auto_approvable": proposal.auto_approvable
}
```
### UI 呈現
```tsx
// apps/web/src/components/incident/agent-team-analysis.tsx
export function AgentTeamAnalysis({ proposal }: Props) {
return (
<GlassCard>
<h3>{t('incident.teamAnalysis')}</h3>
{/* 各 Agent 分析結果 */}
<div className="grid grid-cols-3 gap-4">
{proposal.agentResults.map(result => (
<AgentResultCard
key={result.agent}
agent={result.agent}
confidence={result.confidence}
summary={result.summary}
/>
))}
</div>
{/* 共識分數 */}
<ConsensusScore score={proposal.consensusScore} />
{/* 最終提案 */}
<ProposalCard proposal={proposal} />
</GlassCard>
)
}
```
## 對應 leWOOOgo 積木
| 積木類別 | 新增模組 |
|---------|---------|
| **BRAIN** | `SecurityAgent` |
| **BRAIN** | `BlastRadiusAgent` |
| **BRAIN** | `ActionPlannerAgent` |
| **BRAIN** | `CoordinatorAgent` |
| **BRAIN** | `ConsensusEngine` |
## 後果
### 優點
1. **多視角分析** - 不同專家 Agent 各司其職
2. **共識決策** - 加權投票提高決策品質
3. **可解釋性** - 每個 Agent 的分析過程透明
4. **彈性擴展** - 可新增更多專家 Agent
5. **差異化** - 競品無此功能
### 缺點
1. **成本增加** - 多 Agent 呼叫增加 API 費用
2. **延遲增加** - 並行分析仍需等待最慢的 Agent
3. **複雜度** - 共識機制需要調優
### 風險
| 風險 | 緩解措施 |
|------|---------|
| API 成本爆炸 | 設定 Token 上限、快取策略 |
| Agent 意見衝突 | 共識引擎加權投票 |
| SDK 不穩定 | 先用 Anthropic SDK 模擬 |
## 與 leWOOOgo 整合 (ADR-003)
OpenClaw Agent Teams 作為 **BRAIN 積木** 整合進 leWOOOgo 架構:
```
┌─────────────────────────────────────────────────────────────────┐
│ leWOOOgo Engine │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 🧱 INPUT ──────→ 🧠 BRAIN ──────────────→ 📢 OUTPUT │
│ (Prometheus) │ (Telegram) │
│ │ │
│ ┌──────┴──────┐ │
│ │ OpenClawTeam │ ← NEW: Agent Teams │
│ │ (SDK-based) │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ 🔧 ACTION │ │
│ │ K8sExecutor │ │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
### BRAIN 積木介面實作
```python
# packages/lewooogo-brain/src/openclaw_team_plugin.py
from lewooogo_core.interfaces import AgentProvider, AgentInput, AgentOutput
class OpenClawTeamPlugin(AgentProvider):
"""
leWOOOgo BRAIN 積木: OpenClaw Agent Teams
符合 ADR-003 定義的 AgentProvider 介面
"""
id = "openclaw-agent-team"
name = "OpenClaw Agent Team"
version = "0.1.0"
category = "BRAIN"
def __init__(self):
self.team = OpenClawTeam()
async def initialize(self) -> None:
# 驗證 API Key
assert os.environ.get("ANTHROPIC_API_KEY"), "Missing ANTHROPIC_API_KEY"
async def process(self, input: AgentInput) -> AgentOutput:
proposal = await self.team.analyze_incident(input.payload)
return AgentOutput(
result=proposal,
confidence=proposal.consensus_score,
metadata={"agent_count": 3, "sdk_version": "0.1.50"},
)
def get_capabilities(self) -> list[str]:
return [
"security-analysis",
"blast-radius-analysis",
"action-planning",
"consensus-decision",
]
async def health_check(self) -> dict:
return {"status": "healthy", "sdk": "claude-agent-sdk"}
async def shutdown(self) -> None:
pass
```
## 與 ADR-006 整合 (AI 備援)
Agent Teams 整合現有 AI Fallback 策略:
```
優先級 1: Ollama (本地) → 簡單告警走 Ollama
優先級 2: Claude Agent SDK → 複雜告警走 Agent Teams
優先級 3: Gemini API → SDK 失敗時備援
優先級 4: 靜態回應
```
### 路由邏輯
```python
class OpenClawRouter:
async def route(self, incident: dict) -> Proposal:
# 根據告警複雜度選擇處理器
if self._is_simple_alert(incident):
# 簡單告警: Ollama 足夠
return await self.ollama_handler.analyze(incident)
else:
# 複雜告警: 使用 Agent Teams
try:
return await self.agent_team.analyze_incident(incident)
except ClaudeSDKError:
# SDK 失敗,降級到 Gemini
return await self.gemini_fallback.analyze(incident)
def _is_simple_alert(self, incident: dict) -> bool:
# 判斷邏輯: P3/P4 且影響單一服務 → 簡單
severity = incident.get("severity", "P3")
affected = incident.get("affected_services", [])
return severity in ["P3", "P4"] and len(affected) <= 1
```
## 實作計劃 (更新版)
| Phase | 內容 | 狀態 | 預估 |
|-------|------|------|------|
| 9.1 | ADR 審核 + SDK 研究 | ✅ 完成 | 0.5 天 |
| 9.2 | SDK 整合 + POC | 🔜 下一步 | 1 天 |
| 9.3 | 3 專家 Agent 實作 | | 2 天 |
| 9.4 | ConsensusEngine + leWOOOgo 整合 | | 1.5 天 |
| 9.5 | API 端點 + UI 呈現 | | 1.5 天 |
| 9.6 | 測試 + 文檔 + ADR-006 整合 | | 1 天 |
**總計: 7.5 天** (原估 10 天,因 SDK 簡化減少)
### Phase 9.2 POC 驗證項目
```bash
# 1. 安裝 SDK
cd apps/api && uv add claude-agent-sdk
# 2. 建立測試腳本
cat > scripts/test-agent-team.py << 'EOF'
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition
async def main():
# 簡單 Subagent 測試
options = ClaudeAgentOptions(
allowed_tools=["Agent"],
agents={
"test-agent": AgentDefinition(
description="測試 Agent",
prompt="回答問題並回傳 JSON",
tools=[],
)
},
)
async for msg in query(
prompt="使用 test-agent 回答: 2+2=?",
options=options,
):
print(msg)
asyncio.run(main())
EOF
# 3. 執行測試
python scripts/test-agent-team.py
```
## 相關 ADR
- ADR-003: leWOOOgo 模組架構 (BRAIN 積木)
- ADR-006: AI 備援策略 (Fallback 整合)
- ADR-001: MCP Protocol 採用 (SDK 支援 MCP)
## 參考資料
- [Claude Agent SDK Overview](https://platform.claude.com/docs/en/agent-sdk/overview)
- [Claude Agent SDK Quickstart](https://platform.claude.com/docs/en/agent-sdk/quickstart)
- [Claude Agent SDK Python GitHub](https://github.com/anthropics/claude-agent-sdk-python)
- [Claude Agent SDK Demos](https://github.com/anthropics/claude-agent-sdk-demos)
- [LangGraph + Claude Agent SDK 整合](https://www.mager.co/blog/2026-03-07-langgraph-claude-agent-sdk-ultimate-guide/)
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-23 | v0.1 | 初稿提議 | AI 架構師 |
| 2026-03-23 | v0.2 | SDK 研究完成,加入具體整合方案 | AI 架構師 |

View File

@@ -0,0 +1,414 @@
# AWOOOI API 開發 SOP
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CTO
> **狀態**: Phase 0 草稿
---
## 概述
此文件定義 AWOOOI 所有 API 端點的開發標準流程,確保 Contract-First 原則與 CI 強制檢查能夠有效執行。
---
## API 開發流程
### 1. 設計階段 (Design)
```
┌─────────────────────────────────────────────────────────────┐
│ Step 1: OpenAPI 定義 │
├─────────────────────────────────────────────────────────────┤
│ 位置: docs/api/openapi.yaml │
│ 工具: Stoplight Studio / VS Code OpenAPI Editor │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Step 2: 端點文檔 (Markdown) │
├─────────────────────────────────────────────────────────────┤
│ 位置: docs/api/endpoints/{module}/{endpoint}.md │
│ 內容: 用途說明、請求範例、回應範例、錯誤碼 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Step 3: PR Review + Approval │
├─────────────────────────────────────────────────────────────┤
│ 審核者: CTO / CISO (安全相關 API) │
│ 檢查項: 命名規範、版本策略、錯誤處理、安全考量 │
└─────────────────────────────────────────────────────────────┘
```
### 2. 實作階段 (Implementation)
```python
# apps/api/app/routers/{module}.py
from fastapi import APIRouter, Depends, HTTPException, status
from app.schemas.{module} import {RequestModel}, {ResponseModel}
from app.services.{module} import {ServiceClass}
from app.core.deps import get_current_user, rate_limit
from app.core.cache import cache_response
router = APIRouter(prefix="/v1/{module}", tags=["{module}"])
@router.get(
"/{endpoint}",
response_model={ResponseModel},
summary="端點摘要",
description="詳細說明",
responses={
200: {"description": "成功"},
400: {"description": "請求格式錯誤"},
401: {"description": "未授權"},
403: {"description": "權限不足"},
404: {"description": "資源不存在"},
429: {"description": "請求過於頻繁"},
500: {"description": "伺服器錯誤"},
}
)
@cache_response(ttl=60) # 快取策略
@rate_limit(requests=100, window=60) # 限流策略
async def endpoint_name(
request: {RequestModel},
current_user: User = Depends(get_current_user),
service: {ServiceClass} = Depends()
):
"""
端點實作邏輯
"""
pass
```
### 3. 測試階段 (Testing)
```python
# apps/api/tests/test_{module}.py
import pytest
from httpx import AsyncClient
from app.main import app
class Test{Module}API:
"""
測試命名規範: test_{method}_{endpoint}_{scenario}
"""
@pytest.mark.asyncio
async def test_get_endpoint_success(self, client: AsyncClient, auth_headers: dict):
response = await client.get("/v1/{module}/{endpoint}", headers=auth_headers)
assert response.status_code == 200
assert "data" in response.json()
@pytest.mark.asyncio
async def test_get_endpoint_unauthorized(self, client: AsyncClient):
response = await client.get("/v1/{module}/{endpoint}")
assert response.status_code == 401
@pytest.mark.asyncio
async def test_get_endpoint_not_found(self, client: AsyncClient, auth_headers: dict):
response = await client.get("/v1/{module}/nonexistent", headers=auth_headers)
assert response.status_code == 404
```
---
## API 版本策略
### URL 版本控制
```
https://api.awoooi.wooo.work/v1/... # 目前版本
https://api.awoooi.wooo.work/v2/... # 未來版本 (重大變更時)
```
### 版本升級規則
| 變更類型 | 版本影響 | 範例 |
|---------|---------|------|
| **新增端點** | 不變 | 新增 `GET /v1/metrics/custom` |
| **新增可選欄位** | 不變 | Response 新增 `metadata` 欄位 |
| **移除/重命名欄位** | 升版 | `user_id``userId` |
| **變更回應結構** | 升版 | 陣列 → 分頁物件 |
| **變更認證方式** | 升版 | Bearer → OAuth2 |
### 棄用流程
1. **通知** (v1.x): 在回應 Header 加入 `Deprecation: true` + `Sunset: 2026-06-01`
2. **文檔標記**: OpenAPI 加入 `deprecated: true`
3. **過渡期**: 至少 3 個月
4. **移除**: 完全移除舊端點
---
## 命名規範
### URL 路徑
```
# ✅ 正確
GET /v1/hosts # 列表
GET /v1/hosts/{id} # 單一資源
POST /v1/hosts # 建立
PUT /v1/hosts/{id} # 完整更新
PATCH /v1/hosts/{id} # 部分更新
DELETE /v1/hosts/{id} # 刪除
# ✅ 子資源
GET /v1/hosts/{id}/metrics # 主機的指標
POST /v1/hosts/{id}/actions/scan # 動作 (非 CRUD)
# ❌ 錯誤
GET /v1/getHosts # 動詞命名
GET /v1/host_list # 底線 + 複數不一致
POST /v1/hosts/create # 冗餘動詞
```
### 查詢參數
```
# 分頁
?page=1&limit=20
# 排序
?sort=created_at&order=desc
# 篩選
?status=active&host_id=h-001
# 搜尋
?q=keyword
# 時間範圍
?start_time=2026-03-01T00:00:00Z&end_time=2026-03-20T23:59:59Z
```
### 請求/回應欄位
```json
// ✅ 正確: camelCase (JSON 標準)
{
"hostId": "h-001",
"hostName": "web-server-01",
"createdAt": "2026-03-20T10:00:00Z",
"isActive": true
}
// ❌ 錯誤: snake_case (Python 內部使用)
{
"host_id": "h-001",
"host_name": "web-server-01"
}
```
---
## 回應格式
### 成功回應
```json
// 單一資源
{
"data": {
"id": "h-001",
"name": "web-server-01",
"status": "healthy"
},
"meta": {
"requestId": "req-abc123",
"timestamp": "2026-03-20T10:00:00Z"
}
}
// 列表 (分頁)
{
"data": [
{"id": "h-001", "name": "web-server-01"},
{"id": "h-002", "name": "web-server-02"}
],
"pagination": {
"page": 1,
"limit": 20,
"total": 45,
"totalPages": 3
},
"meta": {
"requestId": "req-abc123",
"timestamp": "2026-03-20T10:00:00Z"
}
}
```
### 錯誤回應
```json
{
"error": {
"code": "VALIDATION_ERROR",
"message": "請求參數驗證失敗",
"details": [
{
"field": "email",
"message": "必須為有效的電子郵件格式"
}
]
},
"meta": {
"requestId": "req-abc123",
"timestamp": "2026-03-20T10:00:00Z"
}
}
```
### 錯誤碼對照表
| HTTP Status | Error Code | 說明 |
|-------------|-----------|------|
| 400 | `VALIDATION_ERROR` | 請求參數驗證失敗 |
| 400 | `INVALID_FORMAT` | 請求格式錯誤 |
| 401 | `UNAUTHORIZED` | 未提供認證資訊 |
| 401 | `TOKEN_EXPIRED` | Token 已過期 |
| 403 | `FORBIDDEN` | 權限不足 |
| 404 | `NOT_FOUND` | 資源不存在 |
| 409 | `CONFLICT` | 資源衝突 |
| 422 | `UNPROCESSABLE_ENTITY` | 語意錯誤 |
| 429 | `RATE_LIMITED` | 請求過於頻繁 |
| 500 | `INTERNAL_ERROR` | 伺服器內部錯誤 |
| 503 | `SERVICE_UNAVAILABLE` | 服務暫時不可用 |
---
## 安全規範
### 認證
```http
Authorization: Bearer <jwt_token>
```
### 必要 Headers
```http
Content-Type: application/json
Accept: application/json
X-Request-ID: <uuid> #
X-Source: awoooi # (Kali Scanner )
```
### 敏感資料處理
```python
# ❌ 禁止: 回應中包含密碼、Token
{
"user": {
"password": "hashed_value", # 禁止
"apiKey": "sk-xxx" # 禁止
}
}
# ✅ 正確: 脫敏處理
{
"user": {
"hasPassword": true,
"apiKeyPrefix": "sk-***xxx"
}
}
```
### 日誌脫敏
```python
# 自動脫敏欄位
SENSITIVE_FIELDS = [
"password", "token", "api_key", "secret",
"authorization", "cookie", "credit_card"
]
# 日誌輸出
# ✅ {"user": "admin", "password": "[REDACTED]"}
# ❌ {"user": "admin", "password": "actual_password"}
```
---
## 快取策略
### TTL 分層
| 資料類型 | TTL | 說明 |
|---------|-----|------|
| 靜態配置 | 1 小時 | 系統設定、選單 |
| 聚合數據 | 5 分鐘 | Dashboard 統計 |
| 即時數據 | 30 秒 | 主機狀態、指標 |
| 無快取 | 0 | 審計日誌、敏感操作 |
### 快取 Key 規範
```
awoooi:{version}:{module}:{resource}:{id}:{params_hash}
# 範例
awoooi:v1:hosts:list:abc123
awoooi:v1:hosts:detail:h-001
awoooi:v1:metrics:dashboard:user-001:7d
```
---
## CI 檢查項目
### 自動化檢查
| 檢查項 | 工具 | 失敗處理 |
|-------|------|---------|
| OpenAPI Schema 驗證 | spectral | 阻擋合併 |
| OpenAPI ↔ 程式碼一致性 | openapi-diff | 阻擋合併 |
| 端點文檔存在性 | 自訂腳本 | 阻擋合併 |
| 測試覆蓋率 > 80% | pytest-cov | 阻擋合併 |
| 安全掃描 | bandit | 警告 |
### PR Checklist
```markdown
## API 變更 Checklist
- [ ] OpenAPI 規格已更新 (`docs/api/openapi.yaml`)
- [ ] 端點文檔已更新 (`docs/api/endpoints/...`)
- [ ] 單元測試已撰寫 (覆蓋率 > 80%)
- [ ] 錯誤處理已實作
- [ ] 快取策略已定義
- [ ] 限流規則已設定
- [ ] 日誌已加入 (不含敏感資料)
- [ ] CISO 已審核 (安全相關 API)
```
---
## 文檔工具
### 內部開發
- **Swagger UI**: `http://localhost:8000/docs`
- 用途: 開發階段測試、除錯
### 對外文檔
- **Scalar**: `https://api.awoooi.wooo.work/scalar`
- 風格: Nothing.tech 純白主題
- 用途: 外部開發者文檔
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此文件由 CTO 維護API 開發者必須遵守此 SOP。*

View File

@@ -0,0 +1,151 @@
# AWOOOI 架構文檔
> 統帥鐵律:嚴禁臨時方案,所有架構決策必須符合長期維護性
## 核心架構原則
### Four Iron Laws (四大鐵律)
1. **Async-First** - 所有 Handler 必須是 `async def`
2. **CORS Whitelist** - 嚴格來源控制,禁止 wildcard (*)
3. **Pydantic Config** - 類型安全的設定驗證
4. **structlog** - 結構化 JSON 日誌
## HTTP Client 架構 (2026-03-21 架構回歸)
### 問題背景
原始實作使用 `subprocess.run(["curl", ...])` 作為 httpx 404 問題的臨時解法。
統帥明令禁止此類臨時方案,要求回歸原生 httpx AsyncClient。
### 永久解決方案
```
src/core/http_client.py - Lifespan 管理的連線池
├── get_clickhouse_client() - ClickHouse 專用 Client
├── get_general_client() - Ollama/Gemini/Claude 通用 Client
├── init_all_http_clients() - 啟動時初始化
└── close_all_http_clients() - 關閉時清理
```
### 關鍵配置
```python
httpx.AsyncClient(
base_url=settings.CLICKHOUSE_URL,
timeout=httpx.Timeout(30.0, connect=10.0),
trust_env=False, # 🔧 禁止 HTTP_PROXY 干擾
limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
)
```
### Lifespan 整合
```python
# src/main.py
@asynccontextmanager
async def lifespan(_app: FastAPI):
# Startup
await init_all_http_clients() # ✅ 連線池建立
yield
# Shutdown
await close_all_http_clients() # ✅ 連線池回收
```
### 驗證結果
```
Status: 200
Elapsed: 28.71ms (< 50ms 目標)
Method: httpx_native
```
## 五主機架構
| 主機 | IP | 角色 | 服務 |
|-----|-----|------|------|
| DevOps | 192.168.0.110 | CI/CD | Harbor, GH Runner |
| Security | 192.168.0.112 | 安全掃描 | Kali Scanner |
| K3s Master | 192.168.0.120 | 容器編排 | K3s API Server |
| K3s Worker | 192.168.0.121 | 工作負載 | App Pods |
| AI+Web | 192.168.0.188 | AI/DB/Web | Ollama, PostgreSQL, Redis, SignOz |
## SignOz 整合架構
```
┌─────────────────────────────────────────────┐
│ AWOOOI API │
│ (port 8000) │
├─────────────────────────────────────────────┤
│ signoz_client.py │
│ └── get_clickhouse_client() │
│ └── httpx.AsyncClient (Lifespan) │
└─────────────────┬───────────────────────────┘
│ HTTP POST (< 50ms)
┌─────────────────────────────────────────────┐
│ ClickHouse HTTP API │
│ 192.168.0.188:8123 │
├─────────────────────────────────────────────┤
│ signoz_metrics.distributed_samples_v4 │
│ - signoz_calls_total (RPS) │
│ - signoz_latency_count (P99) │
└─────────────────────────────────────────────┘
```
## AI Fallback 策略 (ADR-006)
```
Ollama (local) → Gemini (cloud) → Claude (cloud) → mock_fallback
↓ ↓ ↓ ↓
免費 $0.001/1K $0.003/1K 開發用
188:11434 API Key API Key 無 LLM
```
## Phase 7: 視覺主權組件
### 已完成組件
| 組件 | 路徑 | 功能 |
|-----|------|------|
| GlobalPulseChart | `components/charts/global-pulse-chart.tsx` | 4 指標卡片 + Sparkline |
| AIProcessStepper | `components/charts/ai-process-stepper.tsx` | 5 步 AI 決策流程 |
| TimeSeriesChart | `components/charts/time-series-chart.tsx` | 通用趨勢圖 |
### Nothing.tech 設計語言
```css
/* 主色調 */
--nothing-white: #FFFFFF;
--nothing-gray-50: #FAFAFA;
--nothing-gray-900: #171717;
--nothing-red: #EF4444;
/* 玻璃效果 */
.glass-card {
background: rgba(255, 255, 255, 0.7);
backdrop-filter: blur(16px);
border: 1px solid rgba(0, 0, 0, 0.05);
}
```
## Phase 6: 架構硬化 Roadmap (規劃中)
> **來源**: `docs/ARCHITECTURE_CODE_REVIEW.md` 技術債審查
| 項目 | 現狀 | 目標 | 優先級 |
|------|------|------|--------|
| Multi-Sig 持久化 | In-Memory dict | Redis Hash + Redlock | 🔴 P0 |
| GraphRAG 遷移 | In-Memory dict | Neo4j / Redis Graph | 🔴 P0 |
| SSE 容錯驗證 | ADR-004 已規劃 | 驗證實作 | 🟢 P2 |
| 水平擴展 | 單實例 | Redis Pub/Sub + Sticky Session | 🟡 P1 |
**依賴**: Phase 5 (OpenClaw 實體化) 完成後執行
## 變更紀錄
| 日期 | 版本 | 變更 |
|-----|------|------|
| 2026-03-22 | 1.1 | 新增 Phase 6 架構硬化 Roadmap (Code Review 來源) |
| 2026-03-21 | 1.0 | 架構回歸:移除 subprocess+curl實作 httpx Lifespan |
| 2026-03-21 | 1.0 | Phase 7 視覺組件GlobalPulseChart, AIProcessStepper |

241
docs/architecture/WBS.md Normal file
View File

@@ -0,0 +1,241 @@
# AWOOOI 工作分解結構 (Work Breakdown Structure)
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CTO
> **狀態**: Phase 0 ✅ 完成 (已部署至 K3s)
---
## 專案總覽
| 項目 | 數值 |
|------|------|
| 總週數 | 24 週 |
| 總頁面 | 45 頁 (原 63 頁精簡) |
| 團隊規模 | 14 人 |
| MVP 交付 | Week 8 |
---
## Phase 0: 基建隔離 (Week 0-2)
### CIO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CIO-001 | K8s Namespace 建立 (awoooi-prod) | 2h | - | ✅ Script Ready |
| CIO-002 | Nginx 路由配置 (awoooi.wooo.work) | 4h | CIO-001 | ✅ YAML Ready |
| CIO-003 | NetworkPolicy 設定 | 4h | CIO-002 | ✅ Script Ready |
| CIO-004 | PgBouncer 部署與配置 | 4h | CIO-001 | ⏳ |
| CIO-005 | Redis DB Index 分配 (10-15) | 2h | - | ⏳ |
| CIO-006 | Harbor Project 建立 (awoooi/) | 2h | - | ⏳ |
| CIO-007 | GH Runner Label 配置 | 2h | - | ⏳ |
### CTO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CTO-001 | API 開發 SOP 文件 | 4h | - | ✅ |
| CTO-002 | OpenAPI 基礎規格 v1.0 | 8h | CTO-001 | ✅ |
| CTO-003 | ClawBot API 分離 (:8089) | 8h | - | ⏳ |
| CTO-004 | CI/CD API 契約檢查 | 4h | CTO-001 | ⏳ |
### CPO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CPO-001 | Tailwind 純白配置 (v2.0) | 4h | - | ✅ |
| CPO-002 | 原子組件規格文件 | 8h | CPO-001 | ✅ |
| CPO-003 | i18n 框架設定 (next-intl) | 4h | - | ✅ |
| CPO-004 | 字典檔結構 (zh-TW/en) | 4h | CPO-003 | ✅ |
### CISO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CISO-001 | RBAC Schema 設計 | 8h | - | ✅ |
| CISO-002 | 審計日誌規格 | 4h | - | ⏳ |
| CISO-003 | 威脅模型初版 | 8h | - | ⏳ |
---
## Phase 1: MVP 戰情室 (Week 3-8)
### CTO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CTO-101 | BFF Gateway 骨架 | 16h | CIO-001 | ✅ |
| CTO-102 | 四主機資料聚合服務 | 24h | CTO-101 | ✅ (Mock) |
| CTO-103 | SSE 即時推送實作 | 16h | CTO-102 | ✅ (骨架) |
| CTO-104 | AI Copilot 後端 API | 24h | CTO-003 | ⏳ |
| CTO-105 | Redis 快取層 (TTL 分層) | 8h | CIO-005 | ⏳ |
| CTO-106 | Blast Radius 計算引擎 | 16h | CTO-101 | ⏳ |
| CTO-107 | Multi-Sig 簽核後端 | 16h | CISO-001 | ⏳ |
### CPO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CPO-101 | GlassCard 組件 | 8h | CPO-001, CPO-002 | ✅ |
| CPO-102 | StatusOrb 呼吸燈 | 8h | CPO-101 | ✅ |
| CPO-103 | DotMatrixBg 背景 | 4h | CPO-001 | ✅ |
| CPO-104 | MetricValue 數值顯示 | 4h | CPO-101 | ✅ |
| CPO-105 | HostCard 主機卡片 | 8h | CPO-102, CPO-104 | ✅ |
| CPO-106 | AlertPanel 告警面板 | 8h | CPO-101 | ⏳ |
| CPO-107 | ApprovalCard HITL 卡片 | 16h | CPO-101 | ⏳ |
| CPO-108 | CommandPalette 快捷面板 | 16h | CPO-101 | ⏳ |
| CPO-109 | 戰情室頁面整合 | 24h | CTO-103, CPO-105 | ⏳ |
| CPO-110 | i18n 字典完善 | 8h | CPO-109 | ⏳ |
### CIO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CIO-101 | Prometheus 指標整合 | 8h | CIO-001 | ⏳ |
| CIO-102 | SigNoz 服務標籤配置 | 4h | CIO-001 | ⏳ |
| CIO-103 | Harbor Webhook 整合 | 4h | CIO-006 | ⏳ |
### CISO 工作項
| ID | 任務 | 預估 | 前置 | 狀態 |
|----|------|------|------|------|
| CISO-101 | JWT 認證整合 | 16h | CISO-001 | ⏳ |
| CISO-102 | Zero Trust NetworkPolicy | 8h | CIO-003 | ⏳ |
| CISO-103 | AI 行為審計日誌 | 8h | CTO-104, CISO-002 | ⏳ |
| CISO-104 | MVP 安全審查 | 16h | All MVP | ⏳ |
---
## Phase 2: 功能重構 (Week 9-16)
### Monitor 模組 (8 頁)
| ID | 任務 | 預估 | 負責人 |
|----|------|------|--------|
| MON-001 | Monitor Dashboard | 24h | CPO |
| MON-002 | 服務健康頁 | 16h | CPO |
| MON-003 | 指標詳情頁 | 16h | CPO |
| MON-004 | 告警列表頁 | 16h | CPO |
| MON-005 | 告警詳情頁 | 8h | CPO |
| MON-006 | AI 異常偵測 API | 24h | CTO |
| MON-007 | 即時圖表組件 (D3.js) | 24h | CPO |
### Security 模組 (15 頁,含 Compliance 整合)
| ID | 任務 | 預估 | 負責人 |
|----|------|------|--------|
| SEC-001 | Security Dashboard | 24h | CPO |
| SEC-002 | 漏洞列表頁 | 16h | CPO |
| SEC-003 | 掃描報告頁 | 16h | CPO |
| SEC-004 | AI 漏洞分析 API | 24h | CTO + CISO |
| SEC-005 | 合規報告頁 (整合) | 16h | CPO |
| SEC-006 | RBAC 管理頁 | 16h | CPO + CISO |
### Deploy 模組 (6 頁)
| ID | 任務 | 預估 | 負責人 |
|----|------|------|--------|
| DEP-001 | Deploy Dashboard | 24h | CPO |
| DEP-002 | Pipeline 詳情頁 | 16h | CPO |
| DEP-003 | Dry-Run 預演頁 | 24h | CPO + CTO |
| DEP-004 | Blast Radius 視覺化 | 24h | CPO + CTO |
---
## Phase 3: 剩餘功能 + GA (Week 17-24)
### 剩餘模組
| 模組 | 頁數 | 負責人 |
|------|------|--------|
| Tickets 工單 | 6 | CPO |
| Billing 帳單 | 4 | CPO |
| Settings 設定 | 6 | CPO |
| Plugin 管理 | 2 | CPO + CTO |
| AI Copilot 設定 | 1 | CPO |
### GA 準備
| ID | 任務 | 預估 | 負責人 |
|----|------|------|--------|
| GA-001 | E2E 測試完整 | 40h | QA |
| GA-002 | 滲透測試 | 24h | CISO |
| GA-003 | 效能測試 | 16h | CTO |
| GA-004 | 文檔完善 | 24h | 全員 |
| GA-005 | 遷移腳本執行 | 8h | CTO |
| GA-006 | 舊系統凍結 | 4h | CIO |
---
## 依賴圖 (關鍵路徑)
```
Week 0-2 (基建)
═══════════════════════════════════════════════════════════════
CIO-001 ──→ CIO-002 ──→ CIO-003 ──→ CISO-102
│ │
│ └──→ CIO-004
└──→ CTO-101 (Phase 1 關鍵)
CPO-001 ──→ CPO-002 ──→ CPO-101 (Phase 1 關鍵)
CISO-001 ──→ CISO-101 ──→ CTO-107
Week 3-8 (MVP 關鍵路徑)
═══════════════════════════════════════════════════════════════
CTO-101 ──→ CTO-102 ──→ CTO-103 ──┐
CPO-101 ──→ CPO-102 ──→ CPO-105 ──┼──→ CPO-109 (戰情室)
CTO-003 ──→ CTO-104 ───────────────┘
Week 8 MVP
```
---
## RACI 矩陣
| 工作項 | CTO | CIO | CPO | CISO |
|--------|:---:|:---:|:---:|:----:|
| K8s 基建 | C | **R** | I | I |
| API 設計 | **R** | C | C | C |
| BFF Gateway | **R** | C | I | I |
| UI 組件 | C | I | **R** | I |
| 頁面開發 | I | I | **R** | I |
| 認證授權 | C | I | I | **R** |
| 網路安全 | I | **R** | I | **A** |
| i18n | I | I | **R** | I |
| 遷移腳本 | **R** | C | I | **A** |
| 文檔維護 | **R** | C | C | C |
> R = Responsible (執行), A = Accountable (負責), C = Consulted (諮詢), I = Informed (知會)
---
## 風險登記
| 風險 | 機率 | 影響 | 緩解措施 | Owner |
|------|------|------|---------|-------|
| BFF 效能瓶頸 | 中 | 高 | Redis 快取 + 連線池 | CTO |
| 遷移資料遺失 | 低 | 極高 | 事務性遷移 + 驗證 | CTO |
| 安全漏洞 | 中 | 極高 | MVP 滲透測試 | CISO |
| 進度延遲 | 中 | 中 | 每週 Review | CTO |
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此文件由 CTO 維護,每週更新進度狀態。*

View File

@@ -0,0 +1,655 @@
# AWOOOI 原子組件庫規格
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CPO
> **設計系統**: Nothing.tech 純白工業風
---
## 概述
本文件定義 AWOOOI 前端組件庫的設計規格,採用 Atomic Design 原則,確保視覺一致性與開發效率。
---
## 設計 Token
### 色彩系統
```typescript
// packages/lewooogo-ui/src/tokens/colors.ts
export const colors = {
// 基底色 (Pure White Base)
white: '#FFFFFF',
snow: '#FAFAFA', // 主背景
cloud: '#F5F5F5', // 次背景/卡片
mist: '#E5E5E5', // 邊框/分隔線
// 文字色 (High Contrast)
ink: '#0A0A0A', // 主文字
gray: {
600: '#6B7280', // 次文字
400: '#9CA3AF', // 輔助文字
200: '#E5E7EB', // 禁用文字
},
// 功能色
status: {
success: '#10B981',
warning: '#F59E0B',
error: '#EF4444',
info: '#3B82F6',
thinking: '#8B5CF6', // AI 思考中
},
// 品牌色
brand: {
primary: '#FF6B35', // AWOOOI 橘
nothingRed: '#D71921', // Nothing 品牌紅
},
} as const;
```
### 間距系統
```typescript
// packages/lewooogo-ui/src/tokens/spacing.ts
export const spacing = {
0: '0',
1: '4px',
2: '8px',
3: '12px',
4: '16px',
5: '20px',
6: '24px',
8: '32px',
10: '40px',
12: '48px',
16: '64px',
} as const;
```
### 字體系統
```typescript
// packages/lewooogo-ui/src/tokens/typography.ts
export const typography = {
fontFamily: {
display: '"NDot", monospace', // AI 介面/數字
body: '"Inter", system-ui', // 一般文字
mono: '"JetBrains Mono", monospace', // 程式碼
},
fontSize: {
xs: '12px',
sm: '14px',
base: '16px',
lg: '18px',
xl: '20px',
'2xl': '24px',
'3xl': '30px',
'4xl': '36px',
},
fontWeight: {
normal: 400,
medium: 500,
semibold: 600,
bold: 700,
},
lineHeight: {
tight: 1.25,
normal: 1.5,
relaxed: 1.75,
},
} as const;
```
### 效果系統
```typescript
// packages/lewooogo-ui/src/tokens/effects.ts
export const effects = {
// 白玻璃效果 (White Glassmorphism)
glass: {
background: 'rgba(255, 255, 255, 0.7)',
blur: 'blur(20px)',
border: 'rgba(0, 0, 0, 0.05)',
},
// 點陣紋理 (Dot Matrix)
dotMatrix: {
pattern: 'radial-gradient(circle, rgba(0, 0, 0, 0.03) 1px, transparent 1px)',
size: '16px 16px',
},
// 陰影
shadow: {
sm: '0 1px 2px rgba(0, 0, 0, 0.05)',
md: '0 4px 6px rgba(0, 0, 0, 0.05)',
lg: '0 10px 15px rgba(0, 0, 0, 0.05)',
glow: '0 0 20px rgba(255, 107, 53, 0.2)', // 品牌光暈
},
// 圓角
radius: {
sm: '4px',
md: '8px',
lg: '12px',
xl: '16px',
full: '9999px',
},
// 過渡
transition: {
fast: '150ms ease',
normal: '250ms ease',
slow: '350ms ease',
},
} as const;
```
---
## Atoms (原子組件)
### StatusOrb - 狀態呼吸燈
```tsx
// packages/lewooogo-ui/src/atoms/StatusOrb.tsx
interface StatusOrbProps {
status: 'healthy' | 'warning' | 'critical' | 'unknown' | 'thinking';
size?: 'sm' | 'md' | 'lg';
pulse?: boolean;
label?: string;
}
/**
* 狀態呼吸燈
* - 即時反映系統/主機狀態
* - 支援脈衝動畫 (告警/思考中)
*
* @example
* <StatusOrb status="healthy" size="md" />
* <StatusOrb status="thinking" pulse label="AI 處理中" />
*/
```
**視覺規格**:
| Size | 直徑 | 光暈半徑 |
|------|------|---------|
| sm | 8px | 12px |
| md | 12px | 18px |
| lg | 16px | 24px |
**狀態色彩**:
| Status | 色彩 | 脈衝 |
|--------|------|------|
| healthy | `#10B981` | 無 |
| warning | `#F59E0B` | 慢 (2s) |
| critical | `#EF4444` | 快 (0.5s) |
| unknown | `#9CA3AF` | 無 |
| thinking | `#8B5CF6` | 中 (1s) |
---
### MetricValue - 數值顯示
```tsx
// packages/lewooogo-ui/src/atoms/MetricValue.tsx
interface MetricValueProps {
value: number | string;
unit?: string;
trend?: 'up' | 'down' | 'stable';
trendValue?: string;
size?: 'sm' | 'md' | 'lg' | 'xl';
format?: 'number' | 'percent' | 'bytes' | 'duration';
}
/**
* 數值顯示組件
* - NDot 字體呈現數字
* - 支援趨勢箭頭與變化值
*
* @example
* <MetricValue value={99.9} unit="%" trend="up" trendValue="+0.1%" />
* <MetricValue value={1024} format="bytes" /> // 顯示 "1 KB"
*/
```
**視覺規格**:
| Size | 字體大小 | 權重 |
|------|---------|------|
| sm | 18px | 500 |
| md | 24px | 600 |
| lg | 36px | 700 |
| xl | 48px | 700 |
---
### IconButton - 圖示按鈕
```tsx
// packages/lewooogo-ui/src/atoms/IconButton.tsx
interface IconButtonProps {
icon: ReactNode;
variant?: 'ghost' | 'outline' | 'solid';
size?: 'sm' | 'md' | 'lg';
color?: 'default' | 'primary' | 'danger';
disabled?: boolean;
loading?: boolean;
tooltip?: string;
onClick?: () => void;
}
/**
* 圖示按鈕
* - 用於工具列、操作區
* - 必須有 tooltip 說明
*/
```
---
### Badge - 標籤徽章
```tsx
// packages/lewooogo-ui/src/atoms/Badge.tsx
interface BadgeProps {
children: ReactNode;
variant?: 'solid' | 'outline' | 'subtle';
color?: 'gray' | 'green' | 'yellow' | 'red' | 'blue' | 'purple' | 'orange';
size?: 'sm' | 'md';
dot?: boolean;
}
/**
* 標籤徽章
* - 用於狀態標示、分類
*
* @example
* <Badge color="green">Active</Badge>
* <Badge color="red" dot>3 Alerts</Badge>
*/
```
---
## Molecules (分子組件)
### GlassCard - 玻璃卡片
```tsx
// packages/lewooogo-ui/src/molecules/GlassCard.tsx
interface GlassCardProps {
children: ReactNode;
variant?: 'default' | 'elevated' | 'bordered';
padding?: 'sm' | 'md' | 'lg';
interactive?: boolean;
selected?: boolean;
onClick?: () => void;
}
/**
* 白玻璃卡片
* - Nothing.tech 核心視覺元素
* - 支援點擊交互與選中狀態
*
* CSS:
* - background: rgba(255, 255, 255, 0.7)
* - backdrop-filter: blur(20px)
* - border: 1px solid rgba(0, 0, 0, 0.05)
*/
```
**視覺規格**:
| Variant | 背景 | 邊框 | 陰影 |
|---------|------|------|------|
| default | glass | subtle | sm |
| elevated | glass | subtle | lg |
| bordered | white | solid | none |
---
### HostCard - 主機卡片
```tsx
// packages/lewooogo-ui/src/molecules/HostCard.tsx
interface HostCardProps {
host: {
id: string;
name: string;
ip: string;
role: string;
status: 'healthy' | 'warning' | 'critical' | 'unknown';
metrics?: {
cpu: number;
memory: number;
disk: number;
};
lastSeen?: string;
};
variant?: 'compact' | 'detailed';
showMetrics?: boolean;
onClick?: () => void;
}
/**
* 主機狀態卡片
* - 戰情室核心組件
* - 整合 StatusOrb + MetricValue
*
* @example
* <HostCard host={hostData} variant="detailed" showMetrics />
*/
```
**佈局**:
```
┌──────────────────────────────────────┐
│ ● web-server-01 [healthy] │
│ 192.168.0.188 · AI+Web 中心 │
├──────────────────────────────────────┤
│ CPU Memory Disk │
│ 45% 72% 58% │
│ ████░ ███████░ █████░ │
└──────────────────────────────────────┘
```
---
### AlertPanel - 告警面板
```tsx
// packages/lewooogo-ui/src/molecules/AlertPanel.tsx
interface AlertPanelProps {
alerts: Alert[];
maxVisible?: number;
showTimestamp?: boolean;
onAlertClick?: (alert: Alert) => void;
onAcknowledge?: (alertId: string) => void;
}
interface Alert {
id: string;
severity: 'info' | 'warning' | 'critical';
title: string;
message: string;
source: string;
timestamp: string;
acknowledged?: boolean;
}
/**
* 告警列表面板
* - 即時更新 (SSE)
* - 支援確認操作
*/
```
---
### ApprovalCard - HITL 審批卡片
```tsx
// packages/lewooogo-ui/src/molecules/ApprovalCard.tsx
interface ApprovalCardProps {
approval: {
id: string;
type: 'deploy' | 'rollback' | 'config' | 'security';
title: string;
description: string;
requester: string;
blastRadius: 'low' | 'medium' | 'high' | 'critical';
signaturesRequired: number;
signaturesCollected: Signature[];
expiresAt: string;
aiSummary?: string;
aiConfidence?: number;
};
currentUser: string;
onApprove?: () => void;
onReject?: () => void;
onRequestInfo?: () => void;
}
/**
* Human-In-The-Loop 審批卡片
* - 顯示 Blast Radius 風險等級
* - 支援 Multi-Sig 簽核進度
* - AI 摘要與信心度
*/
```
**佈局**:
```
┌──────────────────────────────────────────────────┐
│ 🚀 部署請求: web-api v2.3.1 │
│ ════════════════════════════════════════════════ │
│ │
│ 📊 Blast Radius: ████████░░ HIGH │
│ 影響: 3 服務 · 12 Pods · ~5000 用戶 │
│ │
│ 🤖 AI 摘要: 此更新包含 API 破壞性變更,建議 │
│ 先通知下游服務團隊。信心度: 87% │
│ │
│ ✍️ 簽核進度: 1/2 │
│ ✅ CTO (2026-03-20 10:30) │
│ ⏳ CISO │
│ │
│ [詢問更多] [拒絕] [批准] │
└──────────────────────────────────────────────────┘
```
---
## Organisms (組織組件)
### CommandPalette - 快捷命令面板
```tsx
// packages/lewooogo-ui/src/organisms/CommandPalette.tsx
interface CommandPaletteProps {
isOpen: boolean;
onClose: () => void;
commands: Command[];
recentCommands?: string[];
onCommandSelect: (command: Command) => void;
}
interface Command {
id: string;
label: string;
description?: string;
icon?: ReactNode;
shortcut?: string;
category: 'navigation' | 'action' | 'ai' | 'settings';
action: () => void;
}
/**
* Cmd+K 快捷命令面板
* - 全站快速導航與操作
* - 支援模糊搜尋
* - 顯示快捷鍵提示
*/
```
**快捷鍵**:
| 快捷鍵 | 功能 |
|-------|------|
| `Cmd+K` | 開啟面板 |
| `Esc` | 關閉面板 |
| `↑/↓` | 選擇項目 |
| `Enter` | 執行命令 |
---
### ThinkingTerminal - AI 思考終端
```tsx
// packages/lewooogo-ui/src/organisms/ThinkingTerminal.tsx
interface ThinkingTerminalProps {
isOpen: boolean;
stream: ThinkingStream | null;
position?: 'bottom' | 'right';
collapsible?: boolean;
}
interface ThinkingStream {
status: 'idle' | 'thinking' | 'completed' | 'error';
steps: ThinkingStep[];
result?: string;
error?: string;
}
interface ThinkingStep {
id: string;
type: 'input' | 'process' | 'output' | 'tool_call';
content: string;
timestamp: string;
duration?: number;
}
/**
* AI 思考過程終端
* - SSE 即時串流
* - 打字機效果
* - 支援摺疊
*/
```
---
### DataPincer - 數據鉗視覺化
```tsx
// packages/lewooogo-ui/src/organisms/DataPincer.tsx
interface DataPincerProps {
data: {
hosts: HostData[];
connections: Connection[];
flows: DataFlow[];
};
viewMode?: '2d' | '3d';
interactive?: boolean;
highlightHost?: string;
onHostClick?: (hostId: string) => void;
}
/**
* 數據鉗視覺化組件
* - AWOOOI 品牌視覺符號
* - 四主機架構拓撲圖
* - 即時資料流動畫
*/
```
---
## Templates (模板)
### DashboardLayout - 儀表板佈局
```tsx
// packages/lewooogo-ui/src/templates/DashboardLayout.tsx
interface DashboardLayoutProps {
children: ReactNode;
sidebar?: ReactNode;
header?: ReactNode;
background?: 'default' | 'dotMatrix';
}
/**
* 儀表板頁面佈局
* - 支援側邊欄
* - 點陣背景紋理
* - 響應式設計
*/
```
**斷點**:
| 斷點 | 寬度 | 佈局 |
|------|------|------|
| mobile | < 768px | 單欄 + 抽屜選單 |
| tablet | 768-1024px | 摺疊側欄 |
| desktop | 1024-1440px | 展開側欄 |
| wide | > 1440px | 展開側欄 + 更多欄位 |
---
## 無障礙規範 (WCAG 2.1 AA)
### 對比度要求
| 元素 | 最小對比度 | 實際值 |
|------|-----------|--------|
| 正文文字 | 4.5:1 | 17.3:1 (ink/snow) |
| 大標題 | 3:1 | 17.3:1 |
| 非文字元素 | 3:1 | 符合 |
| 狀態色 (綠) | 3:1 | 4.5:1 (success/snow) |
### 互動元素
- 所有可點擊元素必須有 `focus-visible` 樣式
- 鍵盤可操作 (Tab 導航)
- 適當的 `aria-label`
- 螢幕閱讀器支援
### Focus 樣式
```css
.focus-visible {
outline: 2px solid var(--brand-primary);
outline-offset: 2px;
}
```
---
## 組件狀態
| 組件 | 狀態 | 負責人 |
|------|------|--------|
| StatusOrb | ⏳ 待開發 | CPO |
| MetricValue | ⏳ 待開發 | CPO |
| IconButton | ⏳ 待開發 | CPO |
| Badge | ⏳ 待開發 | CPO |
| GlassCard | ⏳ 待開發 | CPO |
| HostCard | ⏳ 待開發 | CPO |
| AlertPanel | ⏳ 待開發 | CPO |
| ApprovalCard | ⏳ 待開發 | CPO |
| CommandPalette | ⏳ 待開發 | CPO |
| ThinkingTerminal | ⏳ 待開發 | CPO |
| DataPincer | ⏳ 待開發 | CPO |
| DashboardLayout | ⏳ 待開發 | CPO |
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CPO |
---
*此文件由 CPO 維護,前端開發者必須遵守此規格。*

View File

@@ -0,0 +1,425 @@
# AWOOOI i18n 字典檔結構規範
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CPO
> **框架**: next-intl
---
## 概述
> 🎯 **顧問深度討論 #4**: 語意化 Key 命名學
此文件定義 AWOOOI 國際化字典檔的結構規範,採用語意化樹狀命名,
確保 500+ 個翻譯項目在工程師與翻譯員之間有明確的上下文。
---
## 命名規範
### Key 命名格式
```
[頁面].[組件].[元素]_[動作/狀態]
```
### 正確與錯誤範例
```json
// ❌ 錯誤: 無上下文,容易混淆
{
"approve": "批准",
"cancel": "取消",
"error": "發生錯誤",
"warning": "警告"
}
// ✅ 正確: 語意化命名,清楚上下文
{
"dashboard": {
"approval_card": {
"btn_approve": "批准執行",
"btn_reject": "拒絕",
"btn_request_info": "詢問更多",
"label_blast_radius": "爆炸半徑",
"status_pending": "等待簽核",
"status_approved": "已批准"
}
}
}
```
---
## 字典檔結構
### 目錄結構
```
apps/web/messages/
├── zh-TW.json # 繁體中文 (主要)
└── en.json # English
```
### 頂層分類
```json
{
"common": {}, // 共用元素 (按鈕、狀態、錯誤)
"layout": {}, // 佈局元素 (導航、側邊欄、頁尾)
"dashboard": {}, // 戰情室頁面
"monitor": {}, // 監控模組
"security": {}, // 安全模組
"deploy": {}, // 部署模組
"tickets": {}, // 工單模組
"billing": {}, // 帳單模組
"settings": {}, // 設定模組
"ai_copilot": {}, // AI 助手
"errors": {}, // 錯誤訊息
"validation": {} // 表單驗證
}
```
---
## 完整字典檔範本
### zh-TW.json
```json
{
"common": {
"btn": {
"save": "儲存",
"cancel": "取消",
"confirm": "確認",
"delete": "刪除",
"edit": "編輯",
"view": "查看",
"back": "返回",
"next": "下一步",
"previous": "上一步",
"submit": "送出",
"reset": "重設",
"search": "搜尋",
"filter": "篩選",
"export": "匯出",
"import": "匯入",
"refresh": "重新整理"
},
"status": {
"loading": "載入中...",
"success": "成功",
"error": "失敗",
"pending": "處理中",
"active": "啟用",
"inactive": "停用",
"healthy": "健康",
"warning": "警告",
"critical": "嚴重",
"unknown": "未知"
},
"time": {
"just_now": "剛剛",
"minutes_ago": "{count} 分鐘前",
"hours_ago": "{count} 小時前",
"days_ago": "{count} 天前",
"today": "今天",
"yesterday": "昨天"
},
"pagination": {
"page": "第 {current} 頁,共 {total} 頁",
"showing": "顯示 {from}-{to},共 {total} 筆",
"per_page": "每頁顯示"
}
},
"layout": {
"nav": {
"dashboard": "戰情室",
"monitor": "監控",
"security": "安全",
"deploy": "部署",
"tickets": "工單",
"billing": "帳單",
"settings": "設定"
},
"sidebar": {
"collapse": "收合側邊欄",
"expand": "展開側邊欄"
},
"header": {
"search_placeholder": "搜尋... (⌘K)",
"notifications": "通知",
"profile": "個人檔案",
"logout": "登出"
},
"footer": {
"copyright": "© {year} 岑洋國際行銷有限公司",
"version": "版本 {version}"
}
},
"dashboard": {
"page_title": "戰情室",
"page_description": "系統健康狀態總覽",
"host_card": {
"title": "主機狀態",
"label_ip": "IP 位址",
"label_role": "角色",
"label_cpu": "CPU",
"label_memory": "記憶體",
"label_disk": "磁碟",
"label_last_seen": "最後更新",
"status_online": "上線",
"status_offline": "離線"
},
"alert_panel": {
"title": "即時告警",
"btn_acknowledge": "確認",
"btn_view_all": "查看全部",
"empty_state": "目前沒有告警",
"severity_info": "資訊",
"severity_warning": "警告",
"severity_critical": "嚴重"
},
"approval_card": {
"title": "待簽核項目",
"btn_approve": "批准執行",
"btn_reject": "拒絕",
"btn_request_info": "詢問更多",
"label_requester": "申請人",
"label_blast_radius": "爆炸半徑",
"label_signatures": "簽核進度",
"label_expires_in": "剩餘時間",
"label_ai_summary": "AI 摘要",
"label_confidence": "信心度",
"status_pending": "等待簽核",
"status_approved": "已批准",
"status_rejected": "已拒絕",
"status_expired": "已過期",
"blast_low": "低",
"blast_medium": "中",
"blast_high": "高",
"blast_critical": "嚴重"
},
"metrics": {
"total_hosts": "主機總數",
"active_alerts": "活躍告警",
"pending_approvals": "待簽核",
"deployments_today": "今日部署"
}
},
"ai_copilot": {
"title": "AI 助手",
"placeholder": "輸入問題或指令...",
"btn_send": "送出",
"btn_stop": "停止",
"btn_clear": "清除對話",
"thinking": "AI 思考中...",
"error_offline": "AI 服務暫時不可用",
"error_timeout": "回應超時,請重試",
"suggestion_prefix": "建議",
"action_prefix": "建議執行",
"warning_destructive": "此操作具有破壞性,請謹慎執行"
},
"command_palette": {
"placeholder": "輸入指令...",
"category_navigation": "導航",
"category_action": "操作",
"category_ai": "AI 功能",
"category_settings": "設定",
"no_results": "沒有符合的結果",
"hint_shortcut": "快捷鍵"
},
"errors": {
"generic": "發生錯誤,請稍後再試",
"network": "網路連線失敗",
"unauthorized": "您沒有權限執行此操作",
"not_found": "找不到請求的資源",
"validation": "輸入資料驗證失敗",
"server": "伺服器錯誤",
"timeout": "請求超時",
"rate_limited": "請求過於頻繁,請稍後再試"
},
"validation": {
"required": "此欄位為必填",
"email": "請輸入有效的電子郵件",
"min_length": "至少需要 {min} 個字元",
"max_length": "最多 {max} 個字元",
"pattern": "格式不正確"
}
}
```
### en.json
```json
{
"common": {
"btn": {
"save": "Save",
"cancel": "Cancel",
"confirm": "Confirm",
"delete": "Delete",
"edit": "Edit",
"view": "View",
"back": "Back",
"next": "Next",
"previous": "Previous",
"submit": "Submit",
"reset": "Reset",
"search": "Search",
"filter": "Filter",
"export": "Export",
"import": "Import",
"refresh": "Refresh"
},
"status": {
"loading": "Loading...",
"success": "Success",
"error": "Failed",
"pending": "Processing",
"active": "Active",
"inactive": "Inactive",
"healthy": "Healthy",
"warning": "Warning",
"critical": "Critical",
"unknown": "Unknown"
}
},
"dashboard": {
"page_title": "War Room",
"page_description": "System Health Overview",
"approval_card": {
"title": "Pending Approvals",
"btn_approve": "Approve",
"btn_reject": "Reject",
"btn_request_info": "Request Info",
"label_blast_radius": "Blast Radius",
"label_signatures": "Signatures",
"status_pending": "Pending"
}
},
"ai_copilot": {
"title": "AI Assistant",
"placeholder": "Enter a question or command...",
"thinking": "AI is thinking..."
},
"errors": {
"generic": "An error occurred. Please try again later.",
"network": "Network connection failed",
"unauthorized": "You don't have permission to perform this action"
}
}
```
---
## 使用方式
### 組件中使用
```tsx
// apps/web/src/components/ApprovalCard.tsx
import { useTranslations } from 'next-intl';
export function ApprovalCard({ approval }: Props) {
const t = useTranslations('dashboard.approval_card');
return (
<div>
<h3>{t('title')}</h3>
<p>{t('label_blast_radius')}: {t(`blast_${approval.blastRadius}`)}</p>
<button>{t('btn_approve')}</button>
<button>{t('btn_reject')}</button>
</div>
);
}
```
### 動態參數
```tsx
// 使用變數
const t = useTranslations('common.time');
t('minutes_ago', { count: 5 }); // "5 分鐘前"
const t = useTranslations('common.pagination');
t('showing', { from: 1, to: 20, total: 100 }); // "顯示 1-20共 100 筆"
```
---
## CI 檢查規則
### 翻譯完整性檢查
```typescript
// scripts/check-i18n.ts
import zhTW from '../messages/zh-TW.json';
import en from '../messages/en.json';
function getAllKeys(obj: object, prefix = ''): string[] {
return Object.entries(obj).flatMap(([key, value]) => {
const fullKey = prefix ? `${prefix}.${key}` : key;
return typeof value === 'object'
? getAllKeys(value, fullKey)
: [fullKey];
});
}
const zhKeys = new Set(getAllKeys(zhTW));
const enKeys = new Set(getAllKeys(en));
const missingInEn = [...zhKeys].filter(k => !enKeys.has(k));
const missingInZh = [...enKeys].filter(k => !zhKeys.has(k));
if (missingInEn.length > 0) {
console.error('❌ Missing in en.json:', missingInEn);
process.exit(1);
}
if (missingInZh.length > 0) {
console.error('❌ Missing in zh-TW.json:', missingInZh);
process.exit(1);
}
console.log('✅ All translations are complete');
```
### PR Checklist
```markdown
## i18n Checklist
- [ ] 新增的 UI 文字已加入 zh-TW.json
- [ ] 新增的 UI 文字已加入 en.json
- [ ] Key 命名遵循 `[頁面].[組件].[元素]` 格式
- [ ] CI i18n 檢查通過
```
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CPO |
---
*此文件由 CPO 維護,前端開發者新增 UI 文字時必須遵守。*

View File

@@ -0,0 +1,480 @@
# AWOOOI 部署契約 (Deployment Contracts)
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CIO
> **強制等級**: 施工前必須遵守
---
## 概述
此文件定義 AWOOOI 部署的「鐵律級」配置規範。
**施工前必須確認此契約,否則禁止開始基建工作。**
---
## 環境架構 (CEO 指示 #3)
> ⚠️ **重要**: AWOOOI 只有兩個環境,不設 UAT
| 環境 | 用途 | 域名 | K8s Namespace | 備註 |
|------|------|------|---------------|------|
| **Dev** | 本機開發 | `localhost:3000` | - | 開發者本機 |
| **Prod** | 生產環境 | `awoooi.wooo.work` | `awoooi-prod` | 唯一線上環境 |
### 與舊系統完全隔離
| 項目 | AWOOOI (新) | Legacy (舊) |
|------|-------------|-------------|
| 域名 | `awoooi.wooo.work` | `aiops.wooo.work` |
| Namespace | `awoooi-prod` | `wooo-aiops` |
| Frontend Port | 32335 | 31235 |
| API Port | 32334 | 31234 |
| ClawBot Port | 8089 | 8088 |
| Redis DB | 10-15 | 0-9 |
---
## CIO-001: K8s Namespace 資源配額
> 🎯 **顧問深度討論 #3**: 防止 Memory Leak 拖垮叢集
### ResourceQuota 配置
```yaml
# k8s/quotas/awoooi-prod-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: awoooi-prod-quota
namespace: awoooi-prod
spec:
hard:
# 計算資源上限 (叢集 40%)
requests.cpu: "4" # 4 cores
requests.memory: "8Gi" # 8GB
limits.cpu: "8" # 8 cores
limits.memory: "16Gi" # 16GB
# Pod 數量限制
pods: "50"
# 儲存限制
persistentvolumeclaims: "10"
requests.storage: "100Gi"
```
### LimitRange 配置
```yaml
# k8s/quotas/awoooi-prod-limits.yaml
apiVersion: v1
kind: LimitRange
metadata:
name: awoooi-prod-limits
namespace: awoooi-prod
spec:
limits:
# 預設容器限制
- type: Container
default:
cpu: "500m"
memory: "512Mi"
defaultRequest:
cpu: "100m"
memory: "128Mi"
max:
cpu: "2"
memory: "4Gi"
min:
cpu: "50m"
memory: "64Mi"
# Pod 總限制
- type: Pod
max:
cpu: "4"
memory: "8Gi"
```
### 強制規則
1. **新 Deployment 必須指定 resources**: 沒有 requests/limits 將被拒絕
2. **禁止 BestEffort QoS**: 所有 Pod 必須有明確資源定義
3. **定期檢查**: 每週檢查資源使用率,超過 70% 發出告警
---
## CIO-002: Nginx SSE 長連線配置
> 🎯 **顧問深度討論 #2**: 防止 SSE 每 60 秒斷線
### Nginx 配置範本
```nginx
# k8s/nginx/awoooi-prod.conf
# 上游服務定義
upstream awoooi-api {
server awoooi-api-service:8000;
keepalive 32;
}
upstream awoooi-web {
server awoooi-web-service:3000;
keepalive 16;
}
server {
listen 443 ssl http2;
server_name awoooi.wooo.work;
# SSL 配置
ssl_certificate /etc/nginx/ssl/awoooi.crt;
ssl_certificate_key /etc/nginx/ssl/awoooi.key;
# === SSE 專用路由 (AI 思考串流) ===
location ~ ^/api/v1/(agent|dashboard)/stream {
proxy_pass http://awoooi-api;
# ⚠️ 關鍵: SSE 必要配置
proxy_buffering off; # 禁用緩衝 (打字機效果零延遲)
proxy_read_timeout 3600s; # 1 小時長連線
proxy_send_timeout 3600s;
proxy_connect_timeout 60s;
# HTTP/1.1 長連線
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header X-Accel-Buffering no;
# 標準 Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# === 一般 API 路由 ===
location /api/ {
proxy_pass http://awoooi-api;
proxy_http_version 1.1;
proxy_set_header Connection "keep-alive";
proxy_read_timeout 60s;
proxy_send_timeout 60s;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# === 前端靜態資源 ===
location / {
proxy_pass http://awoooi-web;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# 健康檢查 (不經認證)
location /health {
proxy_pass http://awoooi-api/health;
proxy_read_timeout 5s;
}
}
```
### SSE 測試腳本
```bash
#!/bin/bash
# scripts/test-sse.sh
# 測試 SSE 連線是否正常
echo "Testing SSE connection to awoooi.wooo.work..."
timeout 120 curl -N \
-H "Accept: text/event-stream" \
-H "Authorization: Bearer $TOKEN" \
"https://awoooi.wooo.work/api/v1/agent/stream?prompt=test"
if [ $? -eq 124 ]; then
echo "✅ SSE connection held for 2 minutes (test passed)"
else
echo "❌ SSE connection dropped unexpectedly"
exit 1
fi
```
---
## CIO-003: NetworkPolicy 零信任邊界
> 🎯 **顧問深度討論 #1**: Default Deny All 策略
### 預設拒絕策略
```yaml
# k8s/network-policies/default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: awoooi-prod
spec:
podSelector: {} # 套用到所有 Pod
policyTypes:
- Ingress
- Egress
```
### 允許清單 - Ingress
```yaml
# k8s/network-policies/allow-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-nginx-to-services
namespace: awoooi-prod
spec:
podSelector:
matchLabels:
app: awoooi-api
policyTypes:
- Ingress
ingress:
# 只允許 Nginx Ingress Controller
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-nginx-to-web
namespace: awoooi-prod
spec:
podSelector:
matchLabels:
app: awoooi-web
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 3000
```
### 允許清單 - Egress
```yaml
# k8s/network-policies/allow-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-api-to-services
namespace: awoooi-prod
spec:
podSelector:
matchLabels:
app: awoooi-api
policyTypes:
- Egress
egress:
# 允許訪問 PostgreSQL (192.168.0.188:5432)
- to:
- ipBlock:
cidr: 192.168.0.188/32
ports:
- protocol: TCP
port: 5432
# 允許訪問 Redis DB 10-15 (192.168.0.188:6380)
- to:
- ipBlock:
cidr: 192.168.0.188/32
ports:
- protocol: TCP
port: 6380
# 允許訪問 Ollama (192.168.0.188:11434)
- to:
- ipBlock:
cidr: 192.168.0.188/32
ports:
- protocol: TCP
port: 11434
# 允許訪問 ClawBot AWOOOI (192.168.0.188:8089)
- to:
- ipBlock:
cidr: 192.168.0.188/32
ports:
- protocol: TCP
port: 8089
# 允許訪問 Kali Scanner (192.168.0.112:8080)
- to:
- ipBlock:
cidr: 192.168.0.112/32
ports:
- protocol: TCP
port: 8080
# 允許 DNS 解析
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
# 允許訪問外部 AI API (雲端備援)
- to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
ports:
- protocol: TCP
port: 443
```
### 禁止訪問 Legacy Namespace
```yaml
# k8s/network-policies/deny-legacy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-access-to-legacy
namespace: awoooi-prod
spec:
podSelector: {}
policyTypes:
- Egress
egress:
# 明確拒絕 Legacy Namespace
- to:
- namespaceSelector:
matchLabels:
name: wooo-aiops
# 沒有 ports = 全部拒絕
```
---
## 監控與告警配置
### Prometheus 告警規則
```yaml
# k8s/monitoring/prometheus/awoooi-alerts.yaml
groups:
- name: awoooi-resource-alerts
rules:
# CPU 使用率告警
- alert: AWOOOIHighCPUUsage
expr: |
sum(rate(container_cpu_usage_seconds_total{namespace="awoooi-prod"}[5m]))
/ sum(kube_resourcequota{namespace="awoooi-prod", resource="limits.cpu"})
> 0.7
for: 5m
labels:
severity: warning
annotations:
summary: "AWOOOI CPU 使用率超過 70%"
description: "Namespace awoooi-prod 的 CPU 使用率已達 {{ $value | humanizePercentage }}"
# Memory 使用率告警
- alert: AWOOOIHighMemoryUsage
expr: |
sum(container_memory_working_set_bytes{namespace="awoooi-prod"})
/ sum(kube_resourcequota{namespace="awoooi-prod", resource="limits.memory"})
> 0.7
for: 5m
labels:
severity: warning
annotations:
summary: "AWOOOI Memory 使用率超過 70%"
# Pod 重啟告警
- alert: AWOOOIPodRestarting
expr: |
increase(kube_pod_container_status_restarts_total{namespace="awoooi-prod"}[1h]) > 3
for: 5m
labels:
severity: critical
annotations:
summary: "AWOOOI Pod 頻繁重啟"
description: "Pod {{ $labels.pod }} 在過去 1 小時重啟超過 3 次"
```
---
## 驗收清單
### 施工前確認
- [ ] ResourceQuota 已套用
- [ ] LimitRange 已套用
- [ ] Default Deny NetworkPolicy 已套用
- [ ] 允許清單 NetworkPolicy 已套用
- [ ] Nginx SSE 配置已驗證
- [ ] 告警規則已部署
### 施工後驗證
```bash
# 驗證 ResourceQuota
kubectl describe quota awoooi-prod-quota -n awoooi-prod
# 驗證 LimitRange
kubectl describe limitrange awoooi-prod-limits -n awoooi-prod
# 驗證 NetworkPolicy
kubectl get networkpolicy -n awoooi-prod
# 測試 SSE 連線
./scripts/test-sse.sh
# 測試 Legacy 隔離
kubectl exec -it deploy/awoooi-api -n awoooi-prod -- \
curl -s http://wooo-aiops-api.wooo-aiops:8000/health
# 預期: 連線失敗 (被 NetworkPolicy 阻擋)
```
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CIO |
---
*此文件由 CIO 維護,基建施工前必須完整遵守。*

View File

@@ -0,0 +1,570 @@
# AWOOOI 部署拓撲與服務位置定義
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CIO
> **強制等級**: 絕對遵守
---
## 概述
**每個服務必須明確定義其部署位置**
- **Host (主機直裝)**: 直接安裝在主機上的服務
- **Docker**: 使用 Docker / Docker Compose 運行的容器
- **K3s**: 部署在 K3s 叢集中的 Pod
---
## 四主機部署總覽
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AWOOOI 部署拓撲圖 │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┐ ┌─────────────────────────┐
│ 192.168.0.110 │ │ 192.168.0.112 │
│ DevOps 金庫 │ │ Kali Security │
├─────────────────────────┤ ├─────────────────────────┤
│ [Docker] │ │ [Docker] │
│ ├─ Harbor :5000 │ │ └─ Scanner API :8080 │
│ └─ GH Runner │ │ │
└─────────────────────────┘ └─────────────────────────┘
│ │
└──────────────┬───────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ 192.168.0.188 │
│ AI + Web 中心 (Gateway) │
├─────────────────────────────────────────────────────────────────────────────┤
│ [Host 直裝] │
│ ├─ Nginx (SSL Gateway) :443 │
│ └─ PostgreSQL :5432 │
│ │
│ [Docker] │
│ ├─ Ollama :11434 │
│ ├─ ClawBot AWOOOI :8089 │
│ ├─ ClawBot Legacy :8088 (凍結) │
│ ├─ Redis Stack :6380 │
│ └─ SigNoz :3301 │
└─────────────────────────────────────────────────────────────────────────────┘
│ Nginx Proxy
┌─────────────────────────────────────────────────────────────────────────────┐
│ K3s 叢集 (192.168.0.120 + 121) │
├─────────────────────────────────────────────────────────────────────────────┤
│ [K3s - awoooi-prod Namespace] │
│ ├─ awoooi-web (Frontend) → NodePort :32335 │
│ ├─ awoooi-api (Backend) → NodePort :32334 │
│ └─ (未來擴充服務) │
│ │
│ [K3s - wooo-aiops Namespace] (凍結) │
│ ├─ Legacy Frontend → NodePort :31235 │
│ └─ Legacy API → NodePort :31234 │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 服務部署位置詳細定義
### 192.168.0.110 (DevOps 金庫)
| 服務 | 部署方式 | Port | 說明 |
|------|---------|------|------|
| **Harbor** | Docker | 5000 | 映像倉庫Project: `awoooi/` |
| **GitHub Runner** | Docker | - | CI/CD 執行器Label: `awoooi-runner` |
```yaml
# docker-compose.yaml (110)
services:
harbor:
image: goharbor/harbor:v2.x
ports:
- "5000:5000"
volumes:
- /data/harbor:/data
gh-runner:
image: myoung34/github-runner:latest
labels:
- "awoooi-runner"
```
---
### 192.168.0.112 (Kali Security)
| 服務 | 部署方式 | Port | 說明 |
|------|---------|------|------|
| **Scanner API** | Docker | 8080 | 安全掃描 APIHeader: `X-Source: awoooi` |
```yaml
# docker-compose.yaml (112)
services:
scanner-api:
image: kali-scanner:latest
ports:
- "8080:8080"
environment:
- ALLOWED_SOURCES=awoooi,wooo-aiops
```
---
### 192.168.0.188 (AI + Web 中心)
| 服務 | 部署方式 | Port | 說明 |
|------|---------|------|------|
| **Nginx** | **Host 直裝** | 443 | SSL Gateway路由分流 |
| **PostgreSQL** | **Host 直裝** | 5432 | 主資料庫 |
| **Ollama** | Docker | 11434 | 本地 LLM 推理 |
| **ClawBot AWOOOI** | Docker | 8089 | AI Agent (新) |
| **ClawBot Legacy** | Docker | 8088 | AI Agent (舊,凍結) |
| **Redis Stack** | Docker | 6380 | 快取 + 向量搜尋 |
| **SigNoz** | Docker | 3301 | APM / 觀測平台 |
#### Nginx (Host 直裝)
```bash
# 安裝方式
sudo apt install nginx
sudo systemctl enable nginx
# 配置檔位置
/etc/nginx/conf.d/awoooi-prod.conf
```
#### PostgreSQL (Host 直裝)
```bash
# 安裝方式
sudo apt install postgresql-15
sudo systemctl enable postgresql
# 資料庫
awoooi_prod # AWOOOI 專用
wooo_aiops # Legacy (凍結)
```
#### Docker 服務
```yaml
# docker-compose.yaml (188)
services:
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- /data/ollama:/root/.ollama
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
clawbot-awoooi:
image: 192.168.0.110:5000/awoooi/clawbot:latest
ports:
- "8089:8089"
environment:
- OLLAMA_URL=http://localhost:11434
- REDIS_URL=redis://localhost:6380/10
clawbot-legacy:
image: 192.168.0.110:5000/wooo-aiops/clawbot:frozen
ports:
- "8088:8088"
# 凍結版本,不再更新
redis-stack:
image: redis/redis-stack:latest
ports:
- "6380:6379"
volumes:
- /data/redis:/data
signoz:
image: signoz/signoz:latest
ports:
- "3301:3301"
```
---
### 192.168.0.120 / 121 (K3s 叢集)
| 節點 | 角色 | 說明 |
|------|------|------|
| 192.168.0.120 | Master | K3s 控制平面 + Worker |
| 192.168.0.121 | Worker | HA 備援節點 |
#### K3s Namespace 定義
| Namespace | 用途 | 狀態 |
|-----------|------|------|
| `awoooi-prod` | AWOOOI 正式環境 | **Active** |
| `wooo-aiops` | Legacy 系統 | **凍結** |
#### AWOOOI 服務 (K3s)
| 服務 | Deployment | Service | NodePort |
|------|------------|---------|----------|
| **Frontend** | awoooi-web | awoooi-web-svc | 32335 |
| **Backend** | awoooi-api | awoooi-api-svc | 32334 |
```yaml
# k8s/awoooi-prod/03-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: awoooi-web
namespace: awoooi-prod
spec:
replicas: 2
selector:
matchLabels:
app: awoooi-web
template:
metadata:
labels:
app: awoooi-web
spec:
containers:
- name: web
image: 192.168.0.110:5000/awoooi/web:${IMAGE_TAG}
ports:
- containerPort: 3000
resources:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: awoooi-api
namespace: awoooi-prod
spec:
replicas: 2
selector:
matchLabels:
app: awoooi-api
template:
metadata:
labels:
app: awoooi-api
spec:
containers:
- name: api
image: 192.168.0.110:5000/awoooi/api:${IMAGE_TAG}
ports:
- containerPort: 8000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: awoooi-secrets
key: DATABASE_URL
- name: REDIS_URL
value: "redis://192.168.0.188:6380/10"
- name: OLLAMA_URL
value: "http://192.168.0.188:11434"
- name: CLAWBOT_URL
value: "http://192.168.0.188:8089"
resources:
requests:
cpu: "200m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
```
---
## 環境對照表 (最終版)
| 環境 | 用途 | 域名 | 部署位置 |
|------|------|------|---------|
| **Dev** | 本機開發 | `localhost:3000` | 開發者本機 |
| **Prod** | 正式環境 | `awoooi.wooo.work` | K3s (awoooi-prod) |
> ⚠️ **無 UAT 環境**: 測試驗收在 Dev 完成後直接部署 Prod
---
## 網路流量走向
```
用戶 (Internet)
┌─────────────────────────────────────────────────────────────────┐
│ Cloudflare (CDN + WAF) │
└─────────────────────────────────────────────────────────────────┘
▼ HTTPS :443
┌─────────────────────────────────────────────────────────────────┐
│ 192.168.0.188 - Nginx (Host 直裝) │
│ server_name: awoooi.wooo.work │
└─────────────────────────────────────────────────────────────────┘
├──────────────────────────────────────┐
│ │
▼ /api/* → :32334 ▼ /* → :32335
┌─────────────────────┐ ┌─────────────────────┐
│ awoooi-api (K3s) │ │ awoooi-web (K3s) │
│ 120:32334, 121:32334│ │ 120:32335, 121:32335│
└─────────────────────┘ └─────────────────────┘
├─────────────────────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ PostgreSQL │ │ Redis │ │ Ollama │
│ 188:5432 │ │ 188:6380 │ │ 188:11434 │
│ (Host) │ │ (Docker) │ │ (Docker) │
└─────────────┘ └─────────────┘ └─────────────┘
┌─────────────┐
│ ClawBot │
│ 188:8089 │
│ (Docker) │
└─────────────┘
```
---
## 部署位置決策原則
| 服務類型 | 建議部署方式 | 原因 |
|---------|-------------|------|
| **Gateway (Nginx)** | Host 直裝 | SSL 終止、效能關鍵 |
| **資料庫 (PostgreSQL)** | Host 直裝 | 資料持久性、備份策略 |
| **AI 服務 (Ollama)** | Docker | GPU 資源管理、版本切換 |
| **應用服務 (Web/API)** | K3s | 水平擴展、滾動更新 |
| **快取 (Redis)** | Docker | 簡易管理、資料可失 |
| **監控 (SigNoz)** | Docker | 獨立運行、不影響業務 |
---
## K8s 資源配置
### Namespace 資源配額
```yaml
# k8s/awoooi-prod/01-namespace-quota.yaml
apiVersion: v1
kind: Namespace
metadata:
name: awoooi-prod
labels:
environment: prod
system: awoooi
---
apiVersion: v1
kind: ResourceQuota
metadata:
name: awoooi-prod-quota
namespace: awoooi-prod
spec:
hard:
requests.cpu: "4"
requests.memory: 8Gi
limits.cpu: "8"
limits.memory: 16Gi
pods: "20"
```
### 零信任網路策略
```yaml
# k8s/awoooi-prod/02-network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: prod-isolation-policy
namespace: awoooi-prod
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
# 僅允許來自 Nginx Gateway (188) 的流量
- from:
- ipBlock:
cidr: 192.168.0.188/32
ports:
- protocol: TCP
port: 3000
- protocol: TCP
port: 8000
egress:
# 允許訪問 188 主機服務
- to:
- ipBlock:
cidr: 192.168.0.188/32
ports:
- protocol: TCP
port: 5432 # PostgreSQL
- protocol: TCP
port: 6380 # Redis
- protocol: TCP
port: 11434 # Ollama
- protocol: TCP
port: 8089 # ClawBot
# 允許訪問 112 安全掃描
- to:
- ipBlock:
cidr: 192.168.0.112/32
ports:
- protocol: TCP
port: 8080
# 允許 DNS
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
```
---
## Nginx 正式環境路由
```nginx
# /etc/nginx/conf.d/awoooi-prod.conf
upstream awoooi_prod_api {
server 192.168.0.120:32334;
server 192.168.0.121:32334;
keepalive 32;
}
upstream awoooi_prod_web {
server 192.168.0.120:32335;
server 192.168.0.121:32335;
keepalive 16;
}
server {
listen 443 ssl http2;
server_name awoooi.wooo.work;
ssl_certificate /etc/nginx/ssl/awoooi.crt;
ssl_certificate_key /etc/nginx/ssl/awoooi.key;
# 系統標識
proxy_set_header X-System "awoooi-prod";
# SSE 串流優化 (關鍵!)
location ~ ^/api/v1/(agent|dashboard)/stream {
proxy_pass http://awoooi_prod_api;
proxy_buffering off;
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding on;
proxy_set_header X-Accel-Buffering no;
}
# 一般 API
location /api/ {
proxy_pass http://awoooi_prod_api;
proxy_http_version 1.1;
proxy_set_header Connection "keep-alive";
}
# 前端
location / {
proxy_pass http://awoooi_prod_web;
proxy_http_version 1.1;
}
# 共用 Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
```
---
## 服務啟動順序
```
1. 192.168.0.188 (Host 服務)
└─ systemctl start nginx
└─ systemctl start postgresql
2. 192.168.0.188 (Docker 服務)
└─ docker-compose up -d redis-stack
└─ docker-compose up -d ollama
└─ docker-compose up -d clawbot-awoooi
└─ docker-compose up -d signoz
3. 192.168.0.110 (DevOps)
└─ docker-compose up -d harbor
└─ docker-compose up -d gh-runner
4. 192.168.0.112 (Security)
└─ docker-compose up -d scanner-api
5. 192.168.0.120/121 (K3s)
└─ kubectl apply -f k8s/awoooi-prod/
```
---
## 驗證清單
```bash
# 1. 驗證 Host 服務
systemctl status nginx
systemctl status postgresql
psql -U postgres -c "SELECT 1"
# 2. 驗證 Docker 服務 (188)
docker ps | grep -E "(ollama|clawbot|redis|signoz)"
curl http://localhost:11434/api/tags
curl http://localhost:8089/health
redis-cli -p 6380 PING
# 3. 驗證 K3s 服務
kubectl get pods -n awoooi-prod
kubectl get svc -n awoooi-prod
curl http://192.168.0.120:32334/health
curl http://192.168.0.120:32335
# 4. 驗證 Nginx 路由
curl -k https://awoooi.wooo.work/api/health
curl -k https://awoooi.wooo.work/
```
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立,明確定義部署位置 | CIO |
---
*此文件由 CIO 維護,所有服務部署必須遵守此拓撲定義。*

View File

@@ -0,0 +1,186 @@
# =============================================================================
# Prometheus Alertmanager → AWOOOI Webhook 對接設定
# =============================================================================
#
# 統帥戰略 C: 影子模式 (Shadow Mode) 實彈接線
#
# 此設定檔指導如何將真實的 Prometheus Alertmanager
# 指向 AWOOOI OpenClaw Webhook 端點
#
# 安全要求:
# 1. 必須設定 HMAC Secret (WEBHOOK_HMAC_SECRET)
# 2. 生產環境強制驗證簽章 (Fail-Closed)
# 3. 影子模式預設開啟 (SHADOW_MODE_ENABLED=true)
#
# =============================================================================
# -----------------------------------------------------------------------------
# alertmanager.yml 範例設定
# -----------------------------------------------------------------------------
# 位置: /etc/alertmanager/alertmanager.yml (K3s ConfigMap)
# -----------------------------------------------------------------------------
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'namespace', 'deployment']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receiver: 'awoooi-openclaw'
# 路由規則: 依據嚴重度分流
routes:
# Critical 告警立即發送
- match:
severity: critical
receiver: 'awoooi-openclaw'
group_wait: 10s
repeat_interval: 1h
# Warning 告警稍微聚合
- match:
severity: warning
receiver: 'awoooi-openclaw'
group_wait: 1m
receivers:
- name: 'awoooi-openclaw'
webhook_configs:
- url: 'http://192.168.0.188:8000/api/v1/webhooks/alerts'
send_resolved: true
max_alerts: 10
# =======================================================================
# HMAC 簽章設定 (CISO 要求)
# =======================================================================
# Alertmanager 原生不支援 HMAC需透過以下方式實現:
#
# 方案 A: 使用 http_config 的 authorization (Bearer Token)
# http_config:
# authorization:
# type: Bearer
# credentials: '<your-hmac-token>'
#
# 方案 B: 使用外部轉發服務 (推薦)
# 部署一個輕量級 sidecar 來計算 HMAC 並注入 X-Signature-256 Header
# 見下方 hmac-sidecar 說明
# =======================================================================
# -----------------------------------------------------------------------------
# K3s ConfigMap 部署範例
# -----------------------------------------------------------------------------
# kubectl apply -f - <<EOF
# apiVersion: v1
# kind: ConfigMap
# metadata:
# name: alertmanager-config
# namespace: monitoring
# data:
# alertmanager.yml: |
# <上述設定內容>
# EOF
# -----------------------------------------------------------------------------
# HMAC Sidecar 範例 (Go)
# -----------------------------------------------------------------------------
# 如果需要 HMAC 簽章,可部署此 sidecar:
#
# 流程: Alertmanager → HMAC Sidecar → AWOOOI Webhook
#
# 環境變數:
# WEBHOOK_TARGET_URL: http://192.168.0.188:8000/api/v1/webhooks/alerts
# WEBHOOK_HMAC_SECRET: <your-secret>
#
# Docker Image: ghcr.io/awoooi/hmac-sidecar:latest (待建置)
# -----------------------------------------------------------------------------
# K8s Alert Rules 範例 (PrometheusRule CRD)
# -----------------------------------------------------------------------------
# apiVersion: monitoring.coreos.com/v1
# kind: PrometheusRule
# metadata:
# name: awoooi-alerts
# namespace: monitoring
# spec:
# groups:
# - name: k8s-pod-alerts
# rules:
# - alert: PodCrashLooping
# expr: |
# increase(kube_pod_container_status_restarts_total[1h]) > 3
# for: 5m
# labels:
# severity: warning
# alert_type: k8s_pod_crash
# annotations:
# summary: "Pod {{ $labels.pod }} 發生 CrashLoop"
# description: "Pod 在過去 1 小時重啟超過 3 次"
#
# - alert: PodHighCPU
# expr: |
# sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, namespace)
# / sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod, namespace) > 0.9
# for: 10m
# labels:
# severity: warning
# alert_type: high_cpu
# annotations:
# summary: "Pod {{ $labels.pod }} CPU 超過 90%"
#
# - alert: PodHighMemory
# expr: |
# sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace)
# / sum(kube_pod_container_resource_limits{resource="memory"}) by (pod, namespace) > 0.9
# for: 10m
# labels:
# severity: warning
# alert_type: high_memory
# annotations:
# summary: "Pod {{ $labels.pod }} Memory 超過 90%"
#
# - alert: NodeDiskPressure
# expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
# for: 5m
# labels:
# severity: critical
# alert_type: disk_full
# annotations:
# summary: "Node {{ $labels.node }} 磁碟壓力過高"
# -----------------------------------------------------------------------------
# 測試指令
# -----------------------------------------------------------------------------
# 1. 模擬發送告警 (無 HMAC僅限 dev 環境):
#
# curl -X POST http://192.168.0.188:8000/api/v1/webhooks/alerts \
# -H "Content-Type: application/json" \
# -d '{
# "alert_type": "k8s_pod_crash",
# "severity": "warning",
# "source": "prometheus",
# "target_resource": "test-pod-123",
# "namespace": "default",
# "message": "Manual test alert"
# }'
#
# 2. 帶 HMAC 簽章發送 (生產環境):
#
# SECRET="your-hmac-secret"
# PAYLOAD='{"alert_type":"k8s_pod_crash","severity":"warning","source":"prometheus","target_resource":"test-pod-123","namespace":"default","message":"HMAC test"}'
# SIGNATURE=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
#
# curl -X POST http://192.168.0.188:8000/api/v1/webhooks/alerts \
# -H "Content-Type: application/json" \
# -H "X-Signature-256: sha256=$SIGNATURE" \
# -d "$PAYLOAD"
#
# -----------------------------------------------------------------------------
# 驗證影子模式
# -----------------------------------------------------------------------------
# 查看 AWOOOI API 日誌,確認出現:
# shadow_mode_intercept | operation=DELETE_POD | message=[SHADOW MODE]
#
# 這表示 AI 決策已觸發,但 K8s 操作被安全攔截
# =============================================================================

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,671 @@
# AWOOOI 全面重構啟動大會 - Operation Phoenix Rising
> **會議日期**: 2026-03-20
> **會議類型**: C-Level 戰略佈達 + 技術深度定義會議
> **主持人**: CEO
> **記錄人**: CTO (Claude Code)
> **會議代號**: Operation Phoenix Rising (鳳凰涅槃行動)
> **版本**: v2.0 Final (Phase VI 施工規範)
---
## 參與者
| 角色 | 出席 | 職責 |
|------|------|------|
| CEO | ✅ | 戰略佈達、最終決策 |
| 資深顧問 (Gemini) | ✅ | 架構諮詢、風險評估 |
| CTO | ✅ | 技術架構、API 契約、ClawBot |
| CIO | ✅ | 基礎設施、網路隔離、K8s |
| CPO | ✅ | 產品體驗、視覺設計、前端團隊 |
| CISO | ✅ | 安全架構、合規、RBAC |
---
## 會議背景
### CEO 四大戰略指示
| # | 戰略方向 | 核心價值 |
|---|---------|---------|
| 1 | **100% 獨立重構** | AWOOOI 完全取代舊版,不是附屬品 |
| 2 | **Nothing.tech 純白極簡** | 拋棄深色駭客風,轉向精密工業儀器風格 |
| 3 | **上帝視角戰情室** | 四主機全局可視化 + MCP 一鍵操作 |
| 4 | **API 契約優先** | OpenAPI 自動生成 + Scalar 文檔 |
### 產品定位轉變
| 維度 | 原本理解 | CEO 新指示 |
|------|---------|-----------|
| 範圍 | 只做 Agent 指揮艙 (~10 頁) | **100% 重構所有 63+ 頁面** |
| 定位 | 舊系統附屬品 | **獨立 SaaS 產品,完全取代** |
| AI 整合 | 部分頁面 | **全站 AI Copilot (Ubiquitous AI)** |
---
## CEO 六大裁定 (Executive Mandates)
| # | 決策點 | CEO 裁定 | 備註 |
|---|--------|---------|------|
| 1 | 重構範圍 | **B 分階段** | P0 戰情室 → P1 監控/安全 → 敏捷迭代 |
| 2 | 視覺轉向 | **✅ 批准** | Nothing.tech 純白極簡,全站貫徹 |
| 3 | 時程 | **24 週批准** | Week 8 必須交付 MVP能提早就提早 |
| 4 | 團隊擴編 | **✅ 批准** | CPO/CISO 需提出詳細團隊分工 |
| 5 | 過渡期 | **1-3 個月** | ⚠️ 無客戶,快速切換,非 12 個月 |
| 6 | API 文檔 | **Scalar** | 開發同步定義 + 納入 MD 規範 |
---
## 議題一:四主機隔離部署與網路流向
### 網路架構總覽
```
Internet → Cloudflare → 192.168.0.188 (Nginx SSL Gateway)
┌───────────────────┴───────────────────┐
│ │
Legacy 路由 AWOOOI 路由
aiops.wooo.work awoooi.wooo.work
→ :31235 (Frontend) → :32235 (Frontend)
→ :31234 (API) → :32234 (API)
→ :8088 (ClawBot) → :8089 (ClawBot)
```
### Port 分配表
| 系統 | 服務 | NodePort | 備註 |
|------|------|---------|------|
| Legacy | Frontend | 31235 | 凍結 |
| Legacy | API | 31234 | 凍結 |
| Legacy | ClawBot | 8088 | 共用核心 |
| AWOOOI UAT | Frontend | 32235 | 🆕 |
| AWOOOI UAT | API | 32234 | 🆕 |
| AWOOOI Prod | Frontend | 32335 | 🆕 |
| AWOOOI Prod | API | 32334 | 🆕 |
| AWOOOI | ClawBot | 8089 | 🆕 新 API 層 |
### K8s Namespace 規劃
| Namespace | 用途 | 狀態 |
|-----------|------|------|
| `wooo-aiops` | Legacy 系統 | 凍結 |
| `awoooi-uat` | AWOOOI 測試環境 | 🆕 新建 |
| `awoooi-prod` | AWOOOI 正式環境 | 🆕 新建 |
### 網路隔離策略
- **NetworkPolicy**: 禁止 AWOOOI Namespace 連接 Legacy Namespace
- **Nginx 路由**: 嚴格 server_name 分流X-System Header 標記來源
- **SigNoz 標籤**: `service.name` 區分新舊系統
---
## 議題二:共用資源衝突排查矩陣
### 衝突清單與解決方案
| 資源 | 風險 | 解決方案 |
|------|------|---------|
| **Ollama** | 🔴 高 | Redis Queue + 優先級 (AWOOOI > Legacy) |
| **PostgreSQL** | 🔴 高 | PgBouncer + 獨立 Schema |
| **Redis** | 🟡 中 | DB Index 隔離 (舊=0-9, 新=10-15) |
| **Harbor** | 🟢 低 | Project 隔離 |
| **GH Runner** | 🟡 中 | Label 隔離 |
| **Prometheus** | 🟢 低 | Job Label 區分 |
| **SigNoz** | 🟢 低 | service.name 標籤 |
### ClawBot 共用策略
**決議**: 選項 C - 共用核心API 層分離
```
ClawBot Core (共用)
├── semantic_cache.py ← 共用
├── knowledge_base.py ← 共用
├── trust_engine.py ← 共用
├── api_legacy.py ← 舊系統專用 (8088)
└── api_awoooi.py ← AWOOOI 專用 (8089) 🆕
```
---
## 議題三API 契約驅動開發
### 開發流程
1. **定義 API 規格**`docs/api/openapi.yaml`
2. **程式碼實作** → FastAPI 路由
3. **提交 PR**
4. **CI 自動檢查** → OpenAPI 一致性 + MD 文件 + 測試覆蓋率
5. **合併部署**
### CI/CD 攔截規則
- ❌ OpenAPI 規格與程式碼不一致 → 阻擋合併
- ❌ 對應 MD 文件未更新 → 阻擋合併
- ❌ 測試覆蓋率 < 80% → 阻擋合併
- ❌ Scalar 文檔渲染失敗 → 阻擋合併
### 文件結構
```
docs/api/
├── openapi.yaml # OpenAPI 3.1 主規格
├── endpoints/ # 各端點詳細文件
├── schemas/ # 資料模型文件
└── examples/ # 請求/回應範例
```
---
## 議題四:團隊分工藍圖
### CPO 團隊 (5.5 FTE)
| 角色 | 人數 | 職責 |
|------|------|------|
| CPO | 1 | 產品願景、UX 策略、優先級 |
| UI Designer | 1 | Nothing.tech 視覺設計、組件庫 |
| Frontend Lead | 1 | React/Next.js 架構、技術決策 |
| Frontend Dev | 2 | 頁面切版、D3.js 圖表 |
| i18n Lead | 0.5 | 雙語翻譯、字典維護 |
### CISO 團隊 (2.5 FTE)
| 角色 | 人數 | 職責 |
|------|------|------|
| CISO | 1 | 安全策略、合規審查 |
| Security Engineer | 1 | RBAC 遷移、認證整合、Zero Trust |
| Penetration Tester | 0.5 | 定期滲透測試 (可外包) |
### CTO + CIO 團隊 (已定義)
| 角色 | 人數 | 職責 |
|------|------|------|
| CTO | 1 | 技術架構、AI 整合、API 設計 |
| CIO | 1 | 基礎設施、K8s、CI/CD |
| Backend Dev | 2 | FastAPI 開發 |
| DevOps Engineer | 1 | 部署、監控、自動化 |
### 總團隊規模
| 部門 | 人數 |
|------|------|
| 產品/設計 (CPO) | 5.5 |
| 資安/合規 (CISO) | 2.5 |
| 技術/後端 (CTO) | 4 |
| 基礎設施 (CIO) | 2 |
| **總計** | **14** |
---
## 視覺風格定義Nothing.tech 純白工業風
### 設計語言轉變
| 元素 | 舊方向 (深色駭客) | 新方向 (純白工業) |
|------|-----------------|------------------|
| 背景 | `#0A0A12` | `#FAFAFA` / `#F5F5F5` |
| 卡片 | `#1A1A2E` | `rgba(255,255,255,0.7)` 白玻璃 |
| 字體 | 霓虹點綴 | 高對比黑字 `#0A0A0A` |
| 強調色 | 多色霓虹 | 單一橘紅 `#FF6B35` |
| 紋理 | 無 | 點陣網格 (Dot Matrix) |
| 風格 | 賽博龐克 | 精密醫療儀器 / 航太設備 |
### Tailwind 配置
```javascript
colors: {
nothing: {
white: '#FFFFFF',
snow: '#FAFAFA', // 主背景
cloud: '#F5F5F5', // 次背景
mist: '#E5E5E5', // 邊框
ink: '#0A0A0A', // 主文字
gray: '#6B7280', // 次文字
red: '#D71921', // Nothing 紅
},
status: {
success: '#10B981',
warning: '#F59E0B',
error: '#EF4444',
info: '#3B82F6',
thinking: '#8B5CF6',
},
brand: {
primary: '#FF6B35', // AWOOOI 橘
}
}
```
---
## 執行時程
### Phase 0: 基建隔離 (Week 0-2)
| 任務 | 負責人 | 產出 |
|------|--------|------|
| K8s Namespace 建立 | CIO | `awoooi-uat`, `awoooi-prod` |
| Nginx 路由分離 | CIO | `routing.conf` |
| NetworkPolicy 配置 | CIO | `network-policy.yaml` |
| Redis DB Index 分配 | CIO | 文件 + 配置 |
| PostgreSQL Schema 建立 | CIO | `awoooi_*` Schema |
| PgBouncer 配置 | CIO | `pgbouncer.ini` |
| 遷移腳本開發 | CTO | `migrate_legacy_config.py` |
### Phase 1: 全局戰情室 MVP (Week 3-8)
| 任務 | 負責人 | 產出 |
|------|--------|------|
| 四主機可視化 | CPO + CTO | 戰情室首頁 |
| ClawBot API 分離 | CTO | `:8089` 新端點 |
| Nothing.tech 視覺落地 | CPO | 組件庫 |
| i18n 框架整合 | CPO | `zh-TW.json`, `en.json` |
| RBAC 遷移 | CISO | 認證模組 |
| MVP 安全審查 | CISO | 安全報告 |
### Phase 2: 監控 + 安全頁面 (Week 9-16)
| 任務 | 負責人 | 頁面數 |
|------|--------|--------|
| Monitor Dashboard 重構 | CPO | 8 |
| Security Dashboard 重構 | CPO | 12 |
| AI Copilot 整合 | CTO | 全站 |
### Phase 3: 剩餘頁面 + GA (Week 17-24)
| 任務 | 負責人 | 頁面數 |
|------|--------|--------|
| Deploy, Tickets, Compliance... | CPO | 43+ |
| 舊系統下線準備 | CIO | - |
| GA 發布 | 全員 | - |
### 過渡期 (Week 25-28)
| 任務 | 負責人 |
|------|--------|
| 舊系統凍結 | CIO |
| 使用者遷移 | CTO |
| 舊系統正式下線 | CIO |
---
## 待產出文件清單
| 文件 | 負責人 | 完成時間 |
|------|--------|---------|
| 《四主機隔離部署計畫》 | CIO | Week 0 Day 2 |
| 《共用資源衝突矩陣》 | CTO + CIO | Week 0 Day 3 |
| 《API 開發 SOP》 | CTO | Week 0 Day 2 |
| 《RBAC Schema 設計》 | CISO | Week 1 |
| 《Nothing.tech 組件規範》 | CPO | Week 1 |
| 《配置遷移腳本規格》 | CTO | Week 1 |
| 《i18n 開發指南》 | CPO | Week 1 |
---
## 會議結論
### 已達成共識
1. **產品定位**: AWOOOI 為 100% 獨立重構的 SaaS 產品,完全取代舊版
2. **視覺風格**: Nothing.tech 純白極簡工業風,全站貫徹
3. **網路隔離**: 新舊系統完全分離,透過 Nginx + NetworkPolicy 實現
4. **共用策略**: Ollama/SigNoz/Redis 共用但隔離ClawBot 共用核心但 API 分離
5. **開發紀律**: API 契約優先CI/CD 強制檢查
6. **團隊規模**: 總計 14 人
### 行動項目
| # | 行動 | 負責人 | 期限 |
|---|------|--------|------|
| 1 | 建立 K8s Namespace | CIO | Day 1 |
| 2 | 配置 Nginx 路由分離 | CIO | Day 2 |
| 3 | 產出 API 開發 SOP | CTO | Day 2 |
| 4 | 產出隔離部署計畫 | CIO | Day 2 |
| 5 | 產出衝突矩陣 | CTO + CIO | Day 3 |
| 6 | 更新 Tailwind 配置 | CPO | Day 3 |
| 7 | 啟動 Phase 0 | 全員 | 立即 |
---
## 變更記錄
| 日期 | 版本 | 變更內容 | 作者 |
|------|------|----------|------|
| 2026-03-20 | v1.0 | 會議記錄初版 | CTO (Claude Code) |
| 2026-03-20 | v2.0 | 新增 Phase III 深度定義 | CTO (Claude Code) |
| 2026-03-20 | v3.0 | 新增 Phase IV CEO 13 大指示 + 架構地雷排查 | CTO (Claude Code) |
| 2026-03-20 | v4.0 | **最終版** Phase V CEO 10 大補充 + 顧問 4 大盲點 + C-Level 補充 | CTO (Claude Code) |
---
## Phase III 深度定義會議 (續)
### CEO 三大補充指示
| # | 指示 | 核心精神 |
|---|------|---------|
| 1 | 功能增減評估 | 各 C-Level 專業分析,非 CEO 獨斷 |
| 2 | 完整分工定義 | 避免重工、工作衝突 |
| 3 | 完整文檔記錄 | 防止記憶中斷、重複工作 |
---
### C-Level 功能評估共識
#### 頁面重組 (63 → 45 頁)
| 變更 | 原頁數 | 新頁數 | 理由 |
|------|--------|--------|------|
| Compliance → Security | 8 | 3 | 整合合規至安全模組 |
| Settings 精簡 | 12 | 6 | 合併重複設定 |
| Reports 整合 | 5 | 0 | 就地生成報告 |
| 新增功能 | 0 | 4 | 戰情室 + AI + Plugin |
#### P0 必做功能 (MVP)
| 功能 | 提案人 | 負責人 |
|------|--------|--------|
| 四主機戰情室 | CEO | CTO + CPO |
| AI Copilot 側邊欄 | CTO | CTO |
| HITL 授權卡片 | CTO | CPO |
| Blast Radius 預演 | CTO | CTO |
| Multi-Sig 簽核 | CTO | CTO + CISO |
| Command Palette | CPO | CPO |
| Zero Trust 網路 | CISO | CIO + CISO |
---
### 工作分解結構 (WBS)
詳見: `docs/architecture/WBS.md` (待建立)
#### 關鍵依賴路徑
```
CIO-001 (K8s) → CTO-101 (BFF) → CTO-103 (SSE) → CPO-103 (戰情室頁面)
```
---
### 技術深潛議題
#### 議題 A: BFF 閘道架構
- 單一 SSE 連線,後端多工聚合
- Redis 快取分層 (5s/30s/300s TTL)
- 四主機資料聚合服務
#### 議題 B: 原子組件庫
- Atomic Design 五層架構
- GlassCard / StatusOrb / CommandPalette
- i18n 原生支援 (next-intl)
#### 議題 C: 遷移腳本
- 密碼 bcrypt 相容,無需重設
- Webhook → leWOOOgo OUTPUT 轉換
- 事務性遷移 + 驗證
---
### 文檔記錄系統
```
docs/
├── LOGBOOK.md # 每日更新
├── meetings/ # 會議記錄
├── architecture/ # WBS + 依賴圖 + RACI
├── api/ # OpenAPI + 端點文件
├── design/ # 組件庫 + 色彩 + i18n
├── security/ # RBAC + 審計 + 威脅模型
└── migration/ # ETL + 映射 + 回滾
```
---
*Operation Phoenix Rising Phase III 完成。*
*深度技術定義已就緒Phase 0 可正式動工!*
🎖️ **C-Level 團隊已完成功能評估與分工定義!**
---
## Phase VI: CEO 最終施工規範 (2026-03-20 16:00)
### CEO 9 大指示
| # | 指示 | 負責人 | 產出文件 |
|---|------|--------|---------|
| 1 | **依賴清單版本控制**: 所有工具、套件必須記錄並隨版本更新 | CTO | `docs/DEPENDENCIES.md` ✅ |
| 2 | **AI 備援順序調整**: Gemini API 優先 → Claude API 次選Token 用量監控+告警 | CTO | `docs/adr/ADR-006-ai-fallback-strategy.md` ✅ |
| 3 | **環境簡化**: 移除 UAT只保留 Dev (localhost) + Prod (awoooi.wooo.work) | CIO | `docs/infrastructure/DEPLOYMENT_CONTRACTS.md` ✅ |
| 4 | **Token 用量評估**: 需評估 AI 功能增加的 Token 消耗量 | CTO | 已納入 ADR-006 |
| 5 | **瀏覽器測試截圖/錄影**: Playwright E2E 必須啟用截圖與錄影加速除錯 | CPO | Playwright 配置更新 |
| 6 | **週報自動化**: 每週五系統自動生成週報發送 Email | CTO | `docs/operations/WEEKLY_REPORT_SOP.md` ✅ |
| 7 | **Redis TTL 確認**: 熱資料維持 7 天 (快取用途),歷史資料存 PostgreSQL 6 個月 | CTO | `docs/adr/ADR-007-data-retention-policy.md` ✅ |
| 8 | **配置版本控制**: 所有服務、監控、網路配置必須納入 Git | CIO | 已納入技術文檔清單 |
| 9 | **技術文檔清單**: 列出各單位必須產出的文檔、架構圖、流程圖 | CTO | `docs/TECHNICAL_DOCUMENTATION_CHECKLIST.md` ✅ |
### 顧問 4 大深度討論
| # | 議題 | 決策 | 產出 |
|---|------|------|------|
| 1 | **NetworkPolicy 零信任** | 採用 Default Deny All僅白名單允許 | `docs/infrastructure/DEPLOYMENT_CONTRACTS.md` |
| 2 | **Nginx SSE 長連線** | proxy_buffering off + timeout 3600s | `docs/infrastructure/DEPLOYMENT_CONTRACTS.md` |
| 3 | **K8s 資源配額** | 限制 AWOOOI 使用叢集 40% CPU/Memory | `docs/infrastructure/DEPLOYMENT_CONTRACTS.md` |
| 4 | **i18n Key 語意化命名** | `[頁面].[組件].[元素]` 樹狀結構 | `docs/design/I18N_STRUCTURE.md` ✅ |
### 環境架構 (最終版)
| 環境 | 用途 | 域名 | K8s Namespace |
|------|------|------|---------------|
| **Dev** | 本機開發 | `localhost:3000` | - |
| **Prod** | 生產環境 | `awoooi.wooo.work` | `awoooi-prod` |
> ⚠️ **重要**: 不設 UAT 環境,與舊系統完全隔離
### AI 備援策略 (成本控制)
```
Ollama (本地) → Gemini API → Claude API → 靜態回應
$0 ~$0.20/月 ~$5/月 $0
```
**監控告警閾值**:
- Gemini 每日 70K tokens → 警告
- Gemini 每日 90K tokens → 嚴重
- Claude 每日 35K tokens → 警告
- 月度成本 $5 → 警告
### 資料保留策略
| 層級 | 位置 | TTL | 用途 |
|------|------|-----|------|
| 熱資料 | Redis | 7-30 天 | 快取、Session |
| 溫資料 | PostgreSQL | **6 個月** | 歷史查詢、報表 |
| 冷資料 | 歸檔 | 永久/1-7年 | 審計、合規 |
### Phase VI 產出文件清單
| 文件 | 路徑 | 狀態 |
|------|------|------|
| 依賴清單 | `docs/DEPENDENCIES.md` | ✅ |
| 技術文檔清單 | `docs/TECHNICAL_DOCUMENTATION_CHECKLIST.md` | ✅ |
| 部署契約 | `docs/infrastructure/DEPLOYMENT_CONTRACTS.md` | ✅ |
| i18n 結構規範 | `docs/design/I18N_STRUCTURE.md` | ✅ |
| AI 降級策略 | `docs/adr/ADR-006-ai-fallback-strategy.md` | ✅ |
| 資料保留策略 | `docs/adr/ADR-007-data-retention-policy.md` | ✅ |
| 週報自動化 | `docs/operations/WEEKLY_REPORT_SOP.md` | ✅ |
---
*Operation Phoenix Rising Phase VI 完成。*
*所有施工規範已定義CIO 可開始 K8s 基建作業!*
🎖️ **CEO 最終施工規範已確認Phase 0 正式動工!**
---
## 會議總結 (Final Summary)
### 一、戰略定位
| 項目 | 決策 |
|------|------|
| **產品定位** | 100% 獨立 SaaS 平台,完全取代舊版 63+ 頁面 |
| **視覺風格** | Nothing.tech 純白工業風 |
| **AI 整合** | 全站 AI Copilot (Ubiquitous AI) |
| **時程** | 24 週Week 8 交付 MVP |
| **團隊** | 14 人 |
### 二、環境架構
| 環境 | 域名 | 部署位置 |
|------|------|---------|
| **Dev** | localhost:3000 | 開發者本機 |
| **Prod** | awoooi.wooo.work | K3s (awoooi-prod) |
> ⚠️ **無 UAT 環境**: 測試驗收在 Dev 完成後直接部署 Prod
### 三、部署拓撲
| 主機 | 服務 | 部署方式 |
|------|------|---------|
| 192.168.0.188 | Nginx, PostgreSQL | **Host 直裝** |
| 192.168.0.188 | Ollama, ClawBot, Redis, SigNoz | **Docker** |
| 192.168.0.110 | Harbor, GH Runner | **Docker** |
| 192.168.0.112 | Kali Scanner | **Docker** |
| 192.168.0.120/121 | awoooi-web, awoooi-api | **K3s** |
### 四、網路流量
```
用戶 → Cloudflare → 188 Nginx (Host)
┌───────────────┴───────────────┐
▼ ▼
/api/* → K3s :32334 /* → K3s :32335
```
### 五、AI 備援策略
```
Ollama ($0) → Gemini (~$0.20/月) → Claude (~$5/月) → 靜態回應
```
告警: Gemini 70K/日, Claude 35K/日, 月度 $10 熔斷
### 六、關鍵文檔產出 (53 份)
| 類別 | 數量 | 狀態 |
|------|------|------|
| ADR | 7 | 6 完成 |
| SOP | 3 | 3 完成 |
| 規格文檔 | 15 | 8 完成 |
| K8s 配置 | 7 | 7 完成 |
| 其他 | 21 | 待開發 |
---
## 實施步驟 (Implementation Plan)
### Week 0 Day 1-2: CIO 基建 (立即執行)
```bash
# Step 1: 建立 Namespace
kubectl apply -f k8s/awoooi-prod/01-namespace-quota.yaml
# Step 2: 配置網路策略
kubectl apply -f k8s/awoooi-prod/02-network-policy.yaml
# Step 3: 建立 ConfigMap
kubectl apply -f k8s/awoooi-prod/04-configmap.yaml
# Step 4: 配置 Secrets (手動)
kubectl apply -f k8s/awoooi-prod/03-secrets.yaml
# Step 5: 部署 Nginx 配置
scp k8s/nginx/awoooi-prod.conf 192.168.0.188:/etc/nginx/conf.d/
ssh 192.168.0.188 "nginx -t && systemctl reload nginx"
# Step 6: 驗證
kubectl get all -n awoooi-prod
curl -k https://awoooi.wooo.work/api/health
```
### Week 0 Day 3-5: CTO 開發環境
```bash
# Step 1: 本機開發環境
pnpm install
pnpm dev
# Step 2: 驗證 API 連線
curl http://localhost:8000/health
curl http://localhost:3000
# Step 3: 測試 AI 備援
python -m app.services.ai.test_fallback
```
### Week 1: CPO 組件開發
1. 設定 Tailwind Nothing.tech 配置
2. 建立 Design Tokens
3. 開發原子組件 (StatusOrb, GlassCard)
4. 設定 i18n 框架 (next-intl)
### Week 2: MVP 戰情室骨架
1. Dashboard 頁面佈局
2. HostCard 組件
3. AlertPanel 組件
4. SSE 即時更新
### Week 3-8: MVP 完整功能
按 WBS 執行,詳見 `docs/architecture/WBS.md`
---
## Memory 記錄 (9 筆)
| 類型 | 檔案 | 內容 |
|------|------|------|
| project | project_phoenix_rising.md | 戰略決策 |
| feedback | feedback_deployment_topology.md | 部署位置定義 |
| feedback | feedback_ai_fallback_order.md | AI 備援順序 |
| feedback | feedback_no_uat_environment.md | 禁止 UAT |
| feedback | feedback_path_based_routing.md | API 路徑路由 |
| feedback | feedback_dependencies_version_control.md | 依賴版控 |
| feedback | feedback_playwright_screenshot_video.md | E2E 截圖錄影 |
| feedback | feedback_weekly_report.md | 週報自動化 |
| reference | reference_four_hosts.md | 四主機架構 |
---
## K8s 配置檔案清單
| 檔案 | 用途 |
|------|------|
| `k8s/awoooi-prod/01-namespace-quota.yaml` | Namespace + 資源配額 |
| `k8s/awoooi-prod/02-network-policy.yaml` | 零信任網路策略 |
| `k8s/awoooi-prod/03-secrets.yaml` | Secrets 模板 |
| `k8s/awoooi-prod/04-configmap.yaml` | ConfigMap |
| `k8s/awoooi-prod/05-deployment-web.yaml` | Frontend Deployment |
| `k8s/awoooi-prod/06-deployment-api.yaml` | Backend Deployment |
| `k8s/awoooi-prod/kustomization.yaml` | Kustomize 配置 |
| `k8s/nginx/awoooi-prod.conf` | Nginx 路由配置 |
---
*Operation Phoenix Rising 會議結束。*
*Phase 0 正式啟動 (Engage)*
🎖️ **AWOOOI - Zero-Touch Ops. Human-Centric Decisions.**

View File

@@ -0,0 +1,279 @@
# AWOOOI C-Suite 戰略會議OpenClaw 實體化升級
> **日期**: 2026-03-21
> **地點**: Virtual War Room
> **主席**: 統帥 (CEO)
> **出席**: CTO/CIO、CPO、CISO
---
## 1. 會議議程
AWOOOI 2.0 - OpenClaw 實體化升級藍圖 (Phase 5)
### 核心目標
| 目標 | 說明 |
|------|------|
| **全面正名** | ClawBot → OpenClaw對齊開源社群 |
| **財務獨立** | Ollama-First 零 API 成本策略 |
| **行動決策** | Telegram Gateway 行動簽核通道 |
| **硬核防禦** | executor.py 封裝為 OpenClaw Skill |
---
## 2. 各部門專業評估
### 2.1 CTO/CIO 技術評估
**正名影響範圍**:
| 目錄 | 檔案數 |
|------|--------|
| apps/api/ | 12 |
| apps/web/ | 15 |
| docs/ | 18 |
| k8s/ | 2 |
| .claude/ | 1 |
| **合計** | ~48 |
**技術建議**:
1. `models.json` 集中管理 AI 路由設定 (P0)
2. `agent.md` 靈魂定義 (P0)
3. Telegram Gateway 隔離部署 (P1)
4. ContextGatherer 獨立模組 (P1)
**警示**: Telegram Bot 需 Webhook 反向連線K3s 需開放 Ingress 路由
### 2.2 CPO 產品評估
**使用者價值**:
| 功能 | 價值評分 |
|------|----------|
| Telegram 即時通知 | ⭐⭐⭐⭐⭐ |
| 手機遠端簽核 | ⭐⭐⭐⭐⭐ |
| Token 用量儀表板 | ⭐⭐⭐⭐ |
| OpenClaw 品牌統一 | ⭐⭐⭐ |
**產品建議**:
1. i18n 延伸至 Telegram 訊息模板 (P0)
2. 成本儀表板 UI (P2)
3. OpenClaw 品牌視覺更新 (P2)
**鐵律提醒**: UI 中所有 "ClawBot" 必須透過 i18n 更新
### 2.3 CISO 安全評估
**威脅分析**:
| 威脅向量 | 風險等級 | 緩解措施 |
|----------|----------|----------|
| Telegram Bot Token 洩漏 | 🔴 CRITICAL | K8s Secret |
| Webhook 偽造攻擊 | 🔴 CRITICAL | HMAC 簽章 |
| Prompt Injection via Alert | 🟡 HIGH | 輸入消毒 |
| Local LLM 供應鏈攻擊 | 🟢 LOW | 官方模型 |
**安全需求矩陣**:
| 功能 | 安全需求 | 實作方式 |
|------|----------|----------|
| Telegram Gateway | Bot Token 隔離 | K8s Secret |
| Webhook 接收 | 來源驗證 | X-Signature-256 HMAC |
| 遠端簽核 | 身份綁定 | Telegram user_id ↔ AWOOOI user_id |
| AI 回應解析 | 結構強制 | Pydantic strict mode |
**強制安全檢查清單**:
- [ ] Telegram Bot Token 存放於 K8s Secret
- [ ] Webhook endpoint 啟用 HMAC 簽章驗證
- [ ] Telegram user_id 白名單機制
- [ ] executor.py 呼叫鏈必須經過 TrustEngine
- [ ] AuditLog 記錄 Telegram 來源簽核
- [ ] Prompt Injection 防護測試
---
## 3. 三方共識決議
| 決議項目 | CTO | CPO | CISO | 結論 |
|----------|-----|-----|------|------|
| ClawBot → OpenClaw 正名 | ✅ | ✅ | ✅ | **通過** |
| Ollama-First 零成本策略 | ✅ | ✅ | ✅ | **通過** |
| Telegram Gateway 整合 | ✅ | ✅ | ⚠️ | **附安全條件通過** |
| executor.py Skill 封裝 | ✅ | N/A | ✅ | **通過** |
---
## 4. 修訂後 WBS (Work Breakdown Structure)
| Phase | 任務 | 負責 | 預估 | 前置條件 |
|-------|------|------|------|----------|
| 5.1 | 全專案正名 ClawBot → OpenClaw | CTO | 2h | 無 |
| 5.2 | agent.md 靈魂定義 + capabilities.json | CTO | 1h | 5.1 |
| 5.3 | models.json AI 路由設定 | CTO | 1h | 5.1 |
| 5.4 | ContextGatherer 告警上下文收集 | CTO | 2h | Phase 5 架構 |
| 5.5 | Telegram Gateway (含 HMAC 驗證) | CTO+CISO | 3h | 5.2, 5.3 |
| 5.6 | Telegram user_id 白名單 + 防重放 | CISO | 2h | 5.5 |
| 5.7 | executor.py → OpenClaw Skill 封裝 | CTO | 2h | 5.2 |
| 5.8 | i18n Telegram 訊息模板 | CPO | 1h | 5.5 |
| 5.9 | 整合測試 + 安全審核 | ALL | 2h | 5.1-5.8 |
**總預估**: 16 工時
---
## 5. 統帥待確認事項
### 5.1 Telegram 設定
- [ ] Telegram Bot Token (需統帥建立或提供)
- [ ] 統帥專屬 Chat ID
- [ ] 是否有其他授權簽核人員?
### 5.2 正名範圍
- [ ] 是否包含 Git commit history(建議: 不追溯)
- [ ] README.md 開源門面更新?
### 5.3 K3s 憑證
- [ ] `apps/api/k3s-prod.yaml` 目前 Blocker 狀態
- [ ] 請確認憑證是否已放置
---
## 6. 下一步行動
統帥確認上述事項後,立即執行 Phase 5.1 全專案正名。
**預期交付**:
1. 所有 ClawBot 字串替換為 OpenClaw
2. agent.md 身份定義檔案
3. Git status 報告
---
## 附錄OpenClaw 身份定義草稿
```markdown
# agent.md (SOUL)
## Identity
I am **OpenClaw**, the AI-powered Infrastructure Operations Engine for AWOOOI.
## Core Values
1. **Zero-Cost First**: Prioritize local Ollama for AI inference
2. **Human-in-the-Loop**: All CRITICAL actions require human approval
3. **Defense-in-Depth**: Dry-run before execute, audit everything
4. **Transparency**: Every decision is explainable and logged
## Capabilities
- Kubernetes cluster operations (restart, scale, delete pods)
- Root Cause Analysis via local LLM
- Multi-channel notifications (Web SSE, Telegram)
- Multi-signature approval for high-risk operations
## Boundaries
- NEVER bypass TrustEngine for CRITICAL operations
- NEVER store secrets in plain text
- NEVER execute without Dry-run validation
```
---
## 7. 首席架構師深度評審 (會後補充)
### 7.1 三點專業建議整合
| 建議 | 整合方案 | 新增模組 |
|------|---------|----------|
| **HMAC + 白名單** | `security_interceptor.py` 獨立攔截器 | Phase 5.4.2 |
| **訊息壓縮原則** | SOUL.md 定義強制格式 | Phase 5.0.2 |
| **日誌清洗** | ContextGatherer 過濾 DEBUG | Phase 5.2.1 |
### 7.2 已確認資源
| 資源 | 來源 | 值 |
|------|------|-----|
| Telegram Token | `wooo-aiops/clawbot/.env` | `8569***cpjMk` |
| Chat ID | `wooo-aiops/clawbot/.env` | `5619078117` |
| 白名單 | 統帥確認 | 初期僅統帥 |
### 7.3 修訂後總工時
| 原提案 | 安全強化 | 總計 |
|--------|----------|------|
| 16h | +8h | **24h** |
---
## 8. 整合確認
Phase 5 OpenClaw 升級計畫已整合至:
- `memory/project_phases.md` - 主專案狀態追蹤
- `memory/project_phase5_openclaw.md` - 詳細計畫
- `memory/feedback_openclaw_security.md` - CISO 安全需求
---
## 9. 執行進度報告 (2026-03-21)
### 9.1 已完成項目
| Phase | 項目 | 狀態 | 產出檔案 |
|-------|------|------|----------|
| 5.0.1 | 前端元件正名 | ✅ | `openclaw-panel.tsx`, `openclaw-state-machine.tsx` |
| 5.0.1 | 後端服務正名 | ✅ | `services/openclaw.py` |
| 5.0.1 | 元件匯出更新 | ✅ | `components/ai/index.ts` |
| 5.0.2 | SOUL.md 靈魂定義 | ✅ | `/SOUL.md` |
| 5.0.3 | capabilities.json | ✅ | `/capabilities.json` |
| 5.0.4 | models.json AI 路由 | ✅ | `apps/api/models.json` |
### 9.2 向後相容
為確保現有程式碼不會中斷,已建立向後相容別名:
```typescript
// 前端 (index.ts)
export { OpenClawPanel as ClawBotPanel } from './openclaw-panel'
export { OpenClawStateMachine as ClawBotStateMachine } from './openclaw-state-machine'
```
```python
# 後端 (openclaw.py)
ClawBotService = OpenClawService
get_clawbot = get_openclaw
```
### 9.3 新增檔案清單
| 路徑 | 說明 |
|------|------|
| `/SOUL.md` | OpenClaw 身份定義 + 訊息壓縮原則 |
| `/capabilities.json` | 允許操作 + 禁止操作定義 |
| `apps/api/models.json` | AI 路由集中設定 |
| `apps/web/src/components/ai/openclaw-panel.tsx` | 新 AI 面板元件 |
| `apps/web/src/components/ai/openclaw-state-machine.tsx` | 新狀態機元件 |
| `apps/api/src/services/openclaw.py` | 新 AI 服務 |
### 9.4 待完成項目
| Phase | 項目 | 狀態 |
|-------|------|------|
| 5.4.1 | config.py Telegram 設定 | 🔲 |
| 5.4.2 | security_interceptor.py | 🔲 |
| 5.2.1 | context_gatherer.py | 🔲 |
---
**會議結束時間**: 2026-03-21
**記錄人**: Claude (AI 首席架構師)
**會後補充**: 首席架構師深度評審 + 整合確認 + 執行進度報告

View File

@@ -0,0 +1,448 @@
# AWOOOI C-Suite 戰略會議紀錄
> **會議主題**: ChatGPT 架構分析回應與產品方向校準
>
> **會議時間**: 2026-03-22 (週六)
> **與會者**: CEO (統帥)、CTO/CIO (Claude Code)、CPO、CISO
> **會議形式**: 腦力激盪 + 戰略決策
---
## 1. 會議背景
### 1.1 觸發原因
統帥暫停日常開發,召集 C-Suite 討論產品方向的根本問題。
### 1.2 討論素材
| 來源 | 文件 | 核心觀點 |
|------|------|---------|
| ChatGPT | AIOps 平台差異分析報告 | AWOOOI 應從「Alert-Driven」升級為「Decision-Driven」 |
| ChatGPT | WOOO-AIOPS 監控盤點回應 | 需要 Incident Layer + Event Bus |
| Gemini | 首席架構師回應 | 同意 ChatGPT 核心觀點,提出務實落地方案 |
---
## 2. ChatGPT 核心觀點摘要
### 2.1 最大當頭棒喝
> **「你是在做更酷的 Datadog還是真正的 AI Ops OS」**
- 目前系統是 **Alert-Driven (告警驅動)**
- 目標應該是 **Decision-Driven (決策驅動)**
- 缺少 **Incident Layer** 來統一上下文
### 2.2 三個致命問題
| 問題 | 說明 |
|------|------|
| Alert ≠ Incident | 碎片化告警沒有聚合成有上下文的「事件」 |
| Remediation 是 rule-based | `if HighCPU -> scale_up` 不是 AI |
| 系統沒有「上下文記憶」 | Alert/Ticket/Repair/Logs 各自分散 |
### 2.3 建議的 Incident Schema
```json
{
"incident_id": "INC-2026-0322-001",
"status": "investigating",
"severity": "P0",
"signals": [...],
"hypotheses": [...],
"actions": [...],
"approvals": [...],
"timeline": [...]
}
```
### 2.4 建議的架構轉換
| 舊世界 | 新世界 |
|--------|--------|
| Alert | Signal |
| Alert Group | Incident |
| Ticket | Incident View |
| Remediation | Action |
| SLA | Incident Lifecycle |
---
## 3. C-Suite 專業評估
### 3.1 CTO (技術長) 評估
#### ChatGPT 觀點評價表
| 論點 | 評分 | 評估 |
|------|------|------|
| Alert-Driven vs Decision-Driven | ✅ 100% 正確 | 最核心的洞見 |
| 缺少 Incident Layer | 🟡 部分正確 | 有 ApprovalRequest 但確實沒有「事件聚合層」 |
| Remediation → Decision Proposal | ✅ 已具備基礎 | TrustEngine + BlastRadius 已是 Proposal 模式 |
| Event Bus 必要性 | 🟡 需評估 | Redis Streams 是輕量選項 |
| Stack 過度複雜 | ⚠️ 需警惕 | 不該複製 WOOO-AIOPS 全套 |
#### 現有架構優勢
| 元件 | 現況 | ChatGPT 建議 | 差距 |
|------|------|-------------|------|
| Multi-Sig Redis | ✅ 已實作 | Incident Model | 🟡 需升級 Schema |
| GraphRAG | ✅ In-Memory | Causal Inference | 🟡 需整合至決策流程 |
| TrustEngine | ✅ 風險分類完整 | Decision Proposal | ✅ **已符合** |
| SSE 串流 | ✅ 企業級實作 | Thinking Stream | ✅ **已具備** |
| Event Bus | ❌ 無 | Redis Streams | 🔴 需評估導入 |
#### CTO 結論
**ChatGPT 說對的:**
- 「Alert ≠ Incident」是真理我們需要事件聚合層
**ChatGPT 說錯的:**
- Event Bus 不是萬靈丹,對目前規模可能過度設計
- Cognitive System 是 3-5 年願景,不是下個 Phase
---
### 3.2 CIO (資訊長) 評估
#### 企業整合架構
```
Layer 3: AWOOOI (決策層) - 產出 Decision Proposal
Layer 2: OpenClaw (認知層) - 整合 GraphRAG + AI 分析
Layer 1: 188 基地 (感知層) - 收集 Metrics/Logs/Traces
```
#### CIO 結論
- **不要重造 Observability Stack** - 188 基地已有完整監控
- **AWOOOI 定位 = 決策層**,不是監控層
- **Incident Schema 是對的** - 可擴展現有 ApprovalRequest
---
### 3.3 CPO (產品長) 評估
#### 產品定位三選一
| 定位 | 描述 | 風險 |
|------|------|------|
| A. 更酷的 Datadog | 企業監控儀表板 + AI 裝飾 | 🔴 紅海競爭 |
| B. 智慧 SRE 助手 | AI 輔助運維決策 | 🟡 差異化不足 |
| C. AI Ops OS | 自主決策的運維大腦 | 🟢 **藍海定位** |
#### AWOOOI 差異化護城河
| 護城河 | 說明 |
|--------|------|
| **視覺護城河** | Nothing.tech 美學,市面無同類 |
| **體驗護城河** | OpenClaw 人格化AI 是「同事」不是「工具」 |
| **信任護城河** | Multi-Sig HITLAI 提案 + 人類批准 |
#### CPO 結論
我們缺的是:
1. **Incident 視角** - 用戶看到的是「待簽核列表」而非「事件生命週期」
2. **Knowledge Base** - SRE 無法查詢歷史處理經驗
3. **FinOps** - 「WASTED CLOUD BUDGET: $1,245」一個數字比十張圖表有效
---
### 3.4 CISO (資安長) 評估
#### 安全架構評估
| 面向 | 現況 | 評估 |
|------|------|------|
| Multi-Sig | ✅ Redis 分散式鎖 + 7 天 TTL | 穩固 |
| Audit Trail | ✅ 完整簽核記錄 | 可整合 Incident Timeline |
| RBAC | 🔲 未實作 | **優先補齊** |
| HMAC 驗證 | ✅ Telegram Gateway | 已有 |
#### CISO 結論
- **Event Bus 需確保安全性** - 訊息簽章、FIFO、防重放
- **Incident Model 對稽核有價值** - 可追溯完整決策鏈
- **最小可行架構 (MVA)** 優於最大可能架構
---
## 4. 共識與決議
### 4.1 共識點
| # | 共識 | 來源 |
|---|------|------|
| 1 | 目標是「AI Ops OS」不是「另一個 Datadog」 | ChatGPT |
| 2 | 需要 Incident Layer 聚合 Alert → Incident | ChatGPT + Gemini |
| 3 | 現有 TrustEngine 已具備 Decision Proposal 能力 | CTO 盤點 |
| 4 | 不要複製 WOOO-AIOPS 全套 stack | CIO + CISO |
| 5 | Event Bus 可用 Redis Streams 輕量實作 | Gemini |
### 4.2 Phase 6 升級計畫
| 原 Phase 6 項目 | 升級方向 | 工時 |
|----------------|---------|------|
| 6.1.1 Multi-Sig Redis | ✅ 已完成 | 0d |
| 6.1.2 GraphRAG Neo4j | → **Incident Engine v1** | 3d |
| 6.3.1 Redis Pub/Sub | → **Event Bus v1** (Redis Streams) | 2d |
| 新增 6.4 | Incident Schema + Timeline API | 2d |
| 新增 6.5 | Decision Proposal API | 2d |
### 4.3 Incident Schema v0.1 設計
```python
class Incident(BaseModel):
"""
事件模型 - 聚合告警、假說、提案、時間軸
"""
incident_id: str # INC-2026-0322-001
status: IncidentStatus # investigating / mitigating / resolved
severity: Severity # P0 / P1 / P2 / P3
# 感知層
signals: list[Signal] # 原始告警 (Prometheus/SignOz)
affected_services: list[str] # GraphRAG Blast Radius
# 認知層
hypothesis: AIHypothesis | None # AI 根因推論
confidence: float # 0.0 ~ 1.0
# 決策層
proposals: list[DecisionProposal] # 建議動作 (原 ApprovalRequest)
# 時間軸
timeline: list[TimelineEvent] # 事件演進
# Metadata
created_at: datetime
resolved_at: datetime | None
ttl_days: int = 7 # 資安稽核
```
### 4.4 明確不做的事
| 項目 | 原因 |
|------|------|
| 搬遷 Grafana/SignOz | 留在 188 基地AWOOOI 是決策層 |
| 實作 FinOps v1 | Phase 8+ 再說 |
| 導入 Kafka | Redis Streams 足夠 |
| 複製全套監控 stack | 過度工程 |
### 4.5 保留的優勢
| 優勢 | 說明 |
|------|------|
| Nothing.tech 美學 | 視覺護城河 |
| OpenClaw 人格化 | 體驗護城河 |
| Multi-Sig HITL | 信任護城河 |
| SSE 即時串流 | 技術基礎 |
| GraphRAG 因果推論 | AI 核心 |
---
## 5. 視覺化升級藍圖
### 5.1 首頁升級
| 現況 | 升級後 |
|------|--------|
| 顯示「待簽核列表」 | 顯示「活躍事件儀表板」 |
| ApprovalCard 單獨存在 | ApprovalCard 屬於某個 Incident |
| ThinkingTerminal 獨立面板 | ThinkingTerminal 呈現 Incident 的 AI 推論 |
### 5.2 Incident Dashboard 概念
```
┌──────────────────────────────────────────────────────────────────┐
│ ACTIVE INCIDENTS [2 ACTIVE] │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ INC-2026-0322-001 [P0 CRITICAL] │ │
│ │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │ │
│ │ Status: INVESTIGATING │ │
│ │ │ │
│ │ Signals: 3 alerts (HighCPU, PodCrashLoop, HTTP502) │ │
│ │ Affected: frontend, auth-service, order-api │ │
│ │ Root Cause: postgres-db (confidence: 0.87) │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ PROPOSAL: Restart postgres-db pod │ │ │
│ │ │ Risk: HIGH | Signatures: 0/2 required │ │ │
│ │ │ [AUTHORIZE] [REJECT] │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Timeline: │ │
│ │ 14:32:01 - Alert triggered: HighCPU on frontend │ │
│ │ 14:32:15 - Alert triggered: PodCrashLoop on order-api │ │
│ │ 14:32:22 - OpenClaw: Analyzing blast radius... │ │
│ │ 14:32:45 - OpenClaw: Root cause identified - postgres-db │ │
│ │ 14:33:01 - Proposal created: Restart postgres-db │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
```
---
## 6. 下一步行動
### 6.1 立即行動
| 優先級 | 項目 | 負責人 |
|--------|------|--------|
| 🔴 P0 | 儲存會議記錄 | Claude Code |
| 🔴 P0 | 更新 Phase 狀態 Memory | Claude Code |
### 6.2 Phase 6 待辦
| 順序 | 項目 | 預估 |
|------|------|------|
| 1 | Incident Schema 設計 | 1d |
| 2 | Incident Engine v1 (整合 GraphRAG) | 3d |
| 3 | Event Bus v1 (Redis Streams) | 2d |
| 4 | Timeline API | 1d |
| 5 | Decision Proposal API | 2d |
### 6.3 UI 升級
| 順序 | 項目 | 預估 |
|------|------|------|
| 1 | Incident Card 組件 | 1d |
| 2 | Incident Dashboard 頁面 | 2d |
| 3 | Timeline 組件 | 1d |
---
## 7. 會議結論
### 7.1 一句話總結
> **AWOOOI 的使命是成為「AI Ops OS」— 一個能理解事件上下文、產出智慧決策提案、並透過 Human-in-the-Loop 執行的運維大腦。**
### 7.2 ChatGPT 的貢獻
感謝 ChatGPT 的外部視角,幫我們看到了系統的結構性盲點:
- ✅ 點出 Alert ≠ Incident 的核心問題
- ✅ 提出 Incident Layer 的架構建議
- ✅ 挑戰我們的產品定位
### 7.3 我們的回應
- ✅ 採納 Incident Layer 概念,整合進 Phase 6
- ✅ 採納 Event Bus 建議,使用 Redis Streams 輕量實作
- ⚠️ 不採納「Cognitive System」這種宏大願景採漸進式升級
- ⚠️ 不搬遷全套監控 stack保持 AWOOOI 作為決策層的純粹性
---
---
## 第二輪:施工順序與 Schema 設計 (續)
### Gemini 建議 vs C-Suite 修正
| 議題 | Gemini 建議 | C-Suite 修正 |
|------|------------|-------------|
| 施工順序 | Event Bus → Schema | Schema (契約) → Event Bus |
| 理由 | 「Schema-first 是反模式」 | 「Schema 是契約,有了契約才能平行開發」 |
### 共識Schema v0.2 納入項目
1. **AIDecisionChain** - CISO 要求的可稽核性
2. **IncidentOutcome** - CPO 要求的回饋循環
3. **proposal_ids: list[UUID]** - 支援多重決策軌跡
---
## 第三輪:遺漏項目補齊
### 發現的遺漏
| # | 遺漏 | 提出者 | 處理 |
|---|------|--------|------|
| 1 | Knowledge Base | CPO | Phase 7 |
| 2 | Feedback Loop | CPO | 納入 Schema |
| 3 | Multi-Agent 擴展性 | CPO | 架構預留 |
| 4 | AI 決策鏈可稽核性 | CISO | 納入 Schema |
| 5 | 記憶存取控制 (WORM) | CISO | Phase 6.2 |
---
## 第四輪:物理架構對齊
### 統帥提問
1. 四台主機各部署一組 OpenClaw 有意義嗎?
2. Ollama + Gemini/Claude 組成 AI 團隊更好嗎?
3. 開源優先,成本控制
4. 模組化、Open API 方向
### C-Suite 回答
| 問題 | 答案 |
|------|------|
| 多 OpenClaw | **❌ 錯誤** - 會腦分裂,應該單一大腦 + 分散感測器 |
| AI 團隊 | **✅ 分層調用** - P2/P3 用 OllamaP0/P1 動態升級 Claude/Gemini |
| 開源優先 | **✅ 符合** - Redis Streams、PostgreSQL、Ollama |
| 模組化 | **✅ Plugin 架構** - 已規劃 |
### 物理-邏輯對齊
| 主機 | 邏輯角色 | 部署內容 |
|------|---------|---------|
| .188 | 大腦 + 神經中樞 | OpenClaw, Redis, PG, Ollama |
| .110 | 感測器 (CI/CD) | Sensor Agent |
| .112 | 感測器 (安全) | Sensor Agent |
| .120 | 前端入口 | Frontend, API Gateway |
| .121 | 執行肌肉 | K8s Workloads |
---
## 最終遺漏檢查與 Schema v0.3
### 發現的整合問題
| 問題 | 解決方案 |
|------|---------|
| Schema 重複定義 (BlastRadius) | 從 approval.py 引入,不重新定義 |
| Severity vs RiskLevel 混淆 | 兩者並存Severity (事件) vs RiskLevel (操作) |
| 防腦分裂鐵律未寫入 | 已寫入 `.awoooi-agent-rules.md` |
### Schema v0.3 確認
- 復用現有 `BlastRadius`, `DryRunCheck`
- `proposal_ids: list[UUID]` 支援多重決策
- `AIDecisionChain` 完整可稽核
- `IncidentOutcome` 回饋循環
---
**會議結束時間**: 2026-03-22 18:00
**紀錄人員**: Claude Code (CTO/CIO)
**Phase 6.0 狀態**: ✅ 完成
**Phase 6.1 狀態**: ✅ 完成 (動態驗證通過 2026-03-22 15:29)
**下一步**: Phase 6.2 Memory Layer (Redis Hash + PostgreSQL)
---
## 會後更新 (2026-03-22 19:30)
### Phase 6.1 Event Bus 動態驗證通過
| 項目 | 結果 | 證據 |
|------|------|------|
| Producer XADD | ✅ | `message_id: 1774164545219-0` |
| HTTP 200 OK | ✅ | `duration_ms: 54.14` |
| Consumer XREADGROUP | ✅ | `signal_received` 結構化日誌 |
| XACK 確認 | ✅ | `pending: 0, lag: 0` |
**實作檔案**:
- `src/workers/signal_worker.py` (Consumer)
- `src/api/v1/webhooks.py` (Producer `/signals`)
- `src/main.py` (Lifespan 整合)
**統帥結語**: 「188 基地神經網路已正式通電!」

View File

@@ -0,0 +1,161 @@
# Telegram + Agent Teams 整合指南
> 版本: v1.0
> 日期: 2026-03-23
> 狀態: 實驗功能
## 概述
透過 Claude Code Channels 功能,讓統帥可以從 Telegram 遠端控制 Agent Teams。
## 前置需求
| 項目 | 需求 |
|------|------|
| Claude Code | >= 2.1.32 (目前: 2.1.81 ✅) |
| Agent Teams | 已啟用 ✅ |
| Telegram Bot | 已建立 (Token 在 .env) |
## 設定步驟
### 步驟 1: 確認環境變數
```bash
# 確認已加入 ~/.zshrc
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# 驗證
echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS # 應顯示 1
```
### 步驟 2: 安裝 Telegram Plugin
```bash
# 啟動 Claude Code
claude
# 在 Claude 對話中執行
/plugin install telegram@claude-plugins-official
```
### 步驟 3: 配置 Bot Token
```bash
# 在 Claude 對話中執行 (使用 AWOOOI 的 Bot Token)
# 注意:使用 Claude Code 專用 Bot不要用 OpenClaw Bot
# Claude Code Bot: @wooowooowooobot
/telegram:configure 8075645931:AAH-EGKMo8ZC4QJs-Nc1_0s92xHrGdQvdpg
```
### 步驟 4: 以 Channels 模式啟動
```bash
# 重新啟動 Claude Code with Telegram Channel
claude --channels plugin:telegram@claude-plugins-official
```
### 步驟 5: 配對 Telegram
1. 在 Telegram 找到你的 Bot發送任意訊息
2. Bot 會回傳配對碼 (例如: `ABC123`)
3. 在 Claude Code 中執行:
```
/telegram:access pair ABC123
```
4. 鎖定白名單:
```
/telegram:access policy allowlist
```
## 使用方式
### 從 Telegram 發指令
```
# 建立 Agent Team
Create an agent team with 3 teammates:
- Architect: docs/ + memory/
- Frontend: apps/web/
- Backend: apps/api/
# 查詢狀態
show task status
# 審查程式碼
review apps/web/ for i18n violations
```
### 遠端批准操作
當 Claude 需要批准時Telegram 會收到請求:
```
[Claude Code] 需要批准:
執行 kubectl apply -n awoooi-prod
回覆 "yes XXXXX" 批准,或 "no" 拒絕
```
回覆:
```
yes XXXXX
```
## 安全機制
| 機制 | 說明 |
|------|------|
| Sender Allowlist | 只有配對的 Telegram 用戶能發指令 |
| Permission Relay | 危險操作需遠端批准 |
| Local Listener | 不暴露公開 URL |
## 整合 AWOOOI 工作流
```
┌─────────────────────────────────────────────────┐
│ 統帥 Telegram │
│ ┌─────────────────────────────────────────┐ │
│ │ 「審查今天的 PR」 │ │
│ └─────────────────────────────────────────┘ │
│ ↓ │
│ Claude Code + Agent Teams │
│ ├── @security 資安審查 │
│ ├── @quality 代碼品質 │
│ └── 彙整 → Telegram 通知 │
│ ↓ │
│ ┌─────────────────────────────────────────┐ │
│ │ 「審查完成2 個 P1 問題,已自動修復」 │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
```
## Hooks 整合 (任務完成通知)
在 `~/.claude/settings.json` 加入:
```json
{
"hooks": {
"TaskCompleted": [
{
"type": "command",
"command": "curl -X POST https://api.telegram.org/bot$OPENCLAW_TG_BOT_TOKEN/sendMessage -d chat_id=$OPENCLAW_TG_CHAT_ID -d text='[Claude Code] 任務完成'"
}
]
}
}
```
## 故障排除
| 問題 | 解決 |
|------|------|
| Plugin 安裝失敗 | 確認 Claude Code >= 2.1.32 |
| 配對碼無效 | 重新發送訊息給 Bot |
| 指令無回應 | 確認 `--channels` 模式啟動 |
## 相關文檔
- [Claude Code Channels 官方文檔](https://code.claude.com/docs/en/channels.md)
- [Agent Teams 官方文檔](https://code.claude.com/docs/en/agent-teams.md)
- [Memory: reference_agent_teams.md](../../.claude/projects/-Users-ogt-awoooi/memory/reference_agent_teams.md)

View File

@@ -0,0 +1,368 @@
# AWOOOI 週報自動化機制
> **版本**: v1.0
> **建立日期**: 2026-03-20
> **負責人**: CTO
> **CEO 指示 #6**: 增加每週工作週報機制
---
## 概述
系統自動將該週各單位所有處理的工作,依照報告格式發出到 Email。
---
## 週報生成流程
```
┌─────────────────────────────────────────────────────────┐
│ 每週五 17:00 觸發 │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 1. 收集各單位工作記錄 │
│ - Git commits (by author) │
│ - 部署記錄 │
│ - 告警處理記錄 │
│ - 工單完成記錄 │
│ - 審批記錄 │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. AI 生成摘要 │
│ - 按單位分組 │
│ - 提取關鍵成果 │
│ - 識別風險項目 │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. 生成報告 │
│ - HTML 格式 (Email) │
│ - Markdown 格式 (歸檔) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 4. 發送 Email │
│ - 收件人: C-Level + 團隊成員 │
│ - 抄送: CEO │
└─────────────────────────────────────────────────────────┘
```
---
## 週報格式
### Email 範本
```html
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: 'Inter', sans-serif; }
.header { background: #0A0A0A; color: white; padding: 20px; }
.section { margin: 20px 0; padding: 15px; background: #F5F5F5; }
.metric { display: inline-block; margin: 10px; text-align: center; }
.metric-value { font-size: 24px; font-weight: bold; }
.status-success { color: #10B981; }
.status-warning { color: #F59E0B; }
.status-critical { color: #EF4444; }
</style>
</head>
<body>
<div class="header">
<h1>AWOOOI 週報</h1>
<p>{{week_start}} - {{week_end}}</p>
</div>
<div class="section">
<h2>📊 本週摘要</h2>
<div class="metric">
<div class="metric-value">{{total_commits}}</div>
<div>Commits</div>
</div>
<div class="metric">
<div class="metric-value">{{total_deployments}}</div>
<div>部署次數</div>
</div>
<div class="metric">
<div class="metric-value">{{alerts_resolved}}</div>
<div>告警處理</div>
</div>
<div class="metric">
<div class="metric-value">{{tickets_closed}}</div>
<div>工單完成</div>
</div>
</div>
<div class="section">
<h2>👥 各單位工作</h2>
<h3>CTO 技術團隊</h3>
<ul>
{{#each cto_items}}
<li>{{this}}</li>
{{/each}}
</ul>
<h3>CIO 基建團隊</h3>
<ul>
{{#each cio_items}}
<li>{{this}}</li>
{{/each}}
</ul>
<h3>CPO 產品團隊</h3>
<ul>
{{#each cpo_items}}
<li>{{this}}</li>
{{/each}}
</ul>
<h3>CISO 安全團隊</h3>
<ul>
{{#each ciso_items}}
<li>{{this}}</li>
{{/each}}
</ul>
</div>
<div class="section">
<h2>⚠️ 風險與待辦</h2>
<ul>
{{#each risks}}
<li class="status-{{this.severity}}">{{this.description}}</li>
{{/each}}
</ul>
</div>
<div class="section">
<h2>📅 下週計畫</h2>
<ul>
{{#each next_week_plans}}
<li>{{this}}</li>
{{/each}}
</ul>
</div>
<footer style="margin-top: 20px; color: #6B7280; font-size: 12px;">
<p>此報告由 AWOOOI 系統自動生成</p>
<p>生成時間: {{generated_at}}</p>
</footer>
</body>
</html>
```
---
## 資料來源
### Git Commits
```python
# 取得本週 commits
async def get_weekly_commits(start_date: date, end_date: date) -> list[CommitSummary]:
result = subprocess.run(
[
"git", "log",
f"--since={start_date}",
f"--until={end_date}",
"--pretty=format:%H|%an|%ae|%s|%ai",
"--no-merges"
],
capture_output=True,
text=True
)
commits = []
for line in result.stdout.strip().split("\n"):
hash, author, email, subject, date = line.split("|")
commits.append(CommitSummary(
hash=hash,
author=author,
email=email,
subject=subject,
date=date
))
return commits
```
### 部署記錄
```python
async def get_weekly_deployments(start_date: date, end_date: date) -> list[Deployment]:
async with get_db() as db:
return await db.execute(
select(Deployment)
.where(Deployment.created_at >= start_date)
.where(Deployment.created_at < end_date)
.order_by(Deployment.created_at.desc())
).scalars().all()
```
### 告警處理
```python
async def get_weekly_alerts(start_date: date, end_date: date) -> AlertSummary:
async with get_db() as db:
total = await db.execute(
select(func.count(Alert.id))
.where(Alert.created_at >= start_date)
.where(Alert.created_at < end_date)
).scalar()
resolved = await db.execute(
select(func.count(Alert.id))
.where(Alert.resolved_at >= start_date)
.where(Alert.resolved_at < end_date)
).scalar()
return AlertSummary(total=total, resolved=resolved)
```
---
## 單位分組邏輯
### 根據 Git Author Email 分組
```python
TEAM_MAPPING = {
"cto": ["cto@wooo.work", "dev@wooo.work", "backend@wooo.work"],
"cio": ["cio@wooo.work", "infra@wooo.work", "ops@wooo.work"],
"cpo": ["cpo@wooo.work", "frontend@wooo.work", "design@wooo.work"],
"ciso": ["ciso@wooo.work", "security@wooo.work"],
}
def get_team_by_email(email: str) -> str:
for team, emails in TEAM_MAPPING.items():
if email in emails:
return team
return "other"
```
### 根據工作類型分組
```python
WORK_TYPE_MAPPING = {
"cto": ["api", "backend", "database", "ai"],
"cio": ["k8s", "nginx", "monitoring", "network"],
"cpo": ["ui", "frontend", "design", "i18n"],
"ciso": ["security", "rbac", "audit", "encryption"],
}
```
---
## AI 摘要生成
```python
async def generate_ai_summary(weekly_data: WeeklyData) -> str:
prompt = f"""
請根據以下本週工作資料,生成簡潔的週報摘要:
## Commits ({len(weekly_data.commits)} 筆)
{[c.subject for c in weekly_data.commits[:20]]}
## 部署 ({len(weekly_data.deployments)} 次)
{[d.description for d in weekly_data.deployments]}
## 告警處理
- 總數: {weekly_data.alerts.total}
- 已解決: {weekly_data.alerts.resolved}
請用繁體中文,按 CTO/CIO/CPO/CISO 分組,每組列出 3-5 項關鍵工作。
同時指出本週的風險項目和下週建議關注點。
"""
return await ai_router.generate(prompt, system_user_id="weekly-report")
```
---
## K8s CronJob 配置
```yaml
# k8s/jobs/weekly-report-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: awoooi-weekly-report
namespace: awoooi-prod
spec:
schedule: "0 17 * * 5" # 每週五 17:00
timeZone: "Asia/Taipei"
jobTemplate:
spec:
template:
spec:
containers:
- name: report-generator
image: awoooi-api:latest
command: ["python", "-m", "app.jobs.weekly_report"]
env:
- name: SMTP_HOST
valueFrom:
secretKeyRef:
name: awoooi-secrets
key: SMTP_HOST
- name: SMTP_USER
valueFrom:
secretKeyRef:
name: awoooi-secrets
key: SMTP_USER
- name: SMTP_PASSWORD
valueFrom:
secretKeyRef:
name: awoooi-secrets
key: SMTP_PASSWORD
restartPolicy: OnFailure
```
---
## 收件人配置
```yaml
# 環境變數配置
WEEKLY_REPORT_RECIPIENTS:
- ceo@wooo.work
- cto@wooo.work
- cio@wooo.work
- cpo@wooo.work
- ciso@wooo.work
WEEKLY_REPORT_CC:
- all-team@wooo.work
```
---
## 報告歸檔
每週報告自動保存至:
```
docs/reports/weekly/
├── 2026-W12.md
├── 2026-W13.md
└── ...
```
---
## 變更記錄
| 日期 | 版本 | 變更 | 作者 |
|------|------|------|------|
| 2026-03-20 | v1.0 | 初版建立 | CTO |
---
*此文件由 CTO 維護,定義週報自動化機制的規範。*

Some files were not shown because too many files have changed in this diff Show More