diff --git a/.agents/skills/04-awoooi-devops-commander.md b/.agents/skills/04-awoooi-devops-commander.md
index 1e9c347f..99920bdc 100644
--- a/.agents/skills/04-awoooi-devops-commander.md
+++ b/.agents/skills/04-awoooi-devops-commander.md
@@ -10,7 +10,7 @@
 
 | Field | Value |
 |------|-----|
-| **Version** | v1.5 |
+| **Version** | v1.6 |
 | **Created** | 2026-03-20 (Taipei) |
 | **Created by** | Claude Code |
 | **Last modified** | 2026-03-26 03:30 (Taipei) |
@@ -26,6 +26,7 @@
 | v1.3 | 2026-03-25 | Claude Code | Added document information block |
 | v1.4 | 2026-03-26 | Claude Code | Added iron rule for deployment-tier decisions |
 | v1.5 | 2026-03-26 | Claude Code | **Phase 15 three-layer observability architecture (Deep Linking)** |
+| v1.6 | 2026-03-26 | Claude Code | **Runner zombie-process fix + CI/CD cancel-in-progress: false** |
 
 ---
 
@@ -234,6 +235,83 @@ runs-on: [self-hosted, harbor, k8s]
 # ❌ --no-gpg-sign
 ```
 
+### Concurrency strategy (lesson learned 2026-03-26)
+
+```yaml
+concurrency:
+  group: cd-${{ github.workflow }}-${{ github.ref }}
+  # 🔴 Wait instead of cancelling, to avoid Runner _diag/pages file conflicts
+  cancel-in-progress: false
+```
+
+**Why**: when Runner cleanup is incomplete, `cancel-in-progress: true` causes:
+- `_diag/pages/*.log` file conflicts
+- Session Conflict errors
+- Missing set_output files
+
+---
+
+## 🚨 Runner zombie-process fix (lesson learned 2026-03-26)
+
+> **Problem**: CI/CD workflows fail repeatedly (set_output file missing / file already exists / Session Conflict)
+> **Memory**: `feedback_runner_zombie_process.md`
+
+### Symptoms
+
+| Error message | Cause |
+|---------|------|
+| `Missing file at path: _runner_file_commands/set_output_*` | Runner directory permission problem |
+| `File already exists: _diag/pages/*.log` | Zombie processes not cleaned up |
+| `TaskAgentSessionConflictException` | Multiple Runner.Listener processes running at once |
+| `could not read Username for 'https://github.com'` | Git auth token could not be read |
+
+### Fix procedure (Tier 2, requires Commander confirmation)
+
+```bash
+# Step 1: Stop the services
+sudo systemctl stop actions.runner.owenhytsai-awoooi.awoooi-110.service
+sudo systemctl stop actions.runner.owenhytsai-awoooi.awoooi-110-2.service
+
+# Step 2: Fix ownership (undo the root ownership caused by sudo)
+sudo chown -R wooo:wooo /home/wooo/actions-runner-awoooi
+sudo chown -R wooo:wooo /home/wooo/actions-runner-awoooi-2
+
+# Step 3: Kill zombie processes
+pkill -9 -u wooo -f 'Runner'
+
+# Step 4: Safe cleanup (without sudo)
+rm -rf /home/wooo/actions-runner-awoooi/_work/*
+rm -rf /home/wooo/actions-runner-awoooi-2/_work/*
+rm -rf /home/wooo/actions-runner-awoooi*/_diag/pages/*
+
+# Step 5: Restart the services
+sudo systemctl start actions.runner.owenhytsai-awoooi.awoooi-110.service
+sudo systemctl start actions.runner.owenhytsai-awoooi.awoooi-110-2.service
+```
+
+### Diagnostic commands
+
+```bash
+# Check for zombie processes
+ps aux | grep -E 'Runner.Listener|Runner.Worker' | grep -v grep
+
+# Check the logs for session conflicts
+tail -50 ~/actions-runner-awoooi-2/_diag/Runner_*.log | grep -i conflict
+
+# Verify permissions
+ls -la ~/actions-runner-awoooi*/_work/_temp/
+```
+
+### Preventive measures in the workflow
+
+```yaml
+# Clean the temp directories at the start of every job
+- name: "Clean Runner temp"
+  run: |
+    RUNNER_ROOT=$(dirname "$(dirname "$RUNNER_TEMP")")
+    rm -rf "$RUNNER_TEMP"/* "$RUNNER_ROOT/_diag/pages"/* .claude/worktrees 2>/dev/null || true
+```
+
 ### Telegram notification (closed loop)
 
 ```bash
diff --git a/.agents/skills/05-awoooi-sre-qa.md b/.agents/skills/05-awoooi-sre-qa.md
index 2d1771c5..dd420b14 100644
--- a/.agents/skills/05-awoooi-sre-qa.md
+++ b/.agents/skills/05-awoooi-sre-qa.md
@@ -10,7 +10,7 @@
 
 | Field | Value |
 |------|-----|
-| **Version** | v1.3 |
+| **Version** | v1.4 |
 | **Created** | 2026-03-20 (Taipei) |
 | **Created by** | Claude Code |
 | **Last modified** | 2026-03-26 03:30 (Taipei) |
@@ -24,6 +24,7 @@
 | v1.1 | 2026-03-24 | Claude Code | Iron rule: no mock tests |
 | v1.2 | 2026-03-25 | Claude Code | Added document information block |
 | v1.3 | 2026-03-26 | Claude Code | **Phase 15 observability tests** |
+| v1.4 | 2026-03-26 | Claude Code | **Runner zombie-process diagnostic procedure** |
 
 ---
 
@@ -482,6 +483,55 @@ with restore_trace_context({"trace_id": "", "span_id": ""}) as span:
 
 ---
 
+---
+
+## 🚨 Runner zombie-process diagnosis (added 2026-03-26)
+
+> **Problem**: CI/CD workflows fail repeatedly, with error messages that keep changing
+> **Memory**: `feedback_runner_zombie_process.md`
+
+### Diagnostic procedure
+
+#### Step 1: Identify the symptoms
+
+| Error message | Likely cause |
+|---------|---------|
+| `Missing file at path: _runner_file_commands/set_output_*` | Permission problem (caused by sudo) |
+| `File already exists: _diag/pages/*.log` | Zombie processes not cleaned up |
+| `TaskAgentSessionConflictException` | Multiple Runner.Listener processes |
+| `terminal prompts disabled` | Git auth token could not be read |
+
+#### Step 2: Diagnostic commands
+
+```bash
+# Check for zombie processes
+ps aux | grep -E 'Runner.Listener|Runner.Worker' | grep -v grep
+
+# Check for session conflicts
+tail -50 ~/actions-runner-awoooi-2/_diag/Runner_*.log | grep -i conflict
+
+# Check directory ownership (should be wooo:wooo)
+ls -la ~/actions-runner-awoooi*/_work/_temp/
+```
+
+#### Step 3: Decide the handling tier
+
+| Situation | Tier | Action |
+|------|------|------|
+| Simple temp-file conflict | Tier 1 | Wait for the workflow to retry automatically |
+| Permission problem | Tier 2 | Notify the Commander, then run the chown fix |
+| Zombie processes | Tier 2 | Notify the Commander, then run the pkill cleanup |
+| Service completely hung | Tier 3 | The Commander restarts the service personally |
+
+### Full fix SOP (Tier 2)
+
+```bash
+# See Skill 04, "Runner zombie-process fix"
+# or see Memory: feedback_runner_zombie_process.md
+```
+
+---
+
 ## References
 
 - `apps/web/playwright.config.ts`: Playwright configuration
@@ -491,3 +541,4 @@ with restore_trace_context({"trace_id": "", "span_id": ""}) as span:
 - `src/core/deep_linking.py`: **👁️ Deep Linking URL generator**
 - `src/core/telemetry.py`: **Phase 15.2 Trace Context**
 - `memory/project_phase15_langfuse.md`: **📊 Phase 15 full record**
+- `memory/feedback_runner_zombie_process.md`: **🚨 Runner zombie-process fix**
diff --git a/.agents/skills/06-awoooi-monorepo-master.md b/.agents/skills/06-awoooi-monorepo-master.md
index e36eb87a..d4f566a7 100644
--- a/.agents/skills/06-awoooi-monorepo-master.md
+++ b/.agents/skills/06-awoooi-monorepo-master.md
@@ -10,7 +10,7 @@
 
 | Field | Value |
 |------|-----|
-| **Version** | v1.4 |
+| **Version** | v1.5 |
 | **Created** | 2026-03-20 (Taipei) |
 | **Created by** | Claude Code |
 | **Last modified** | 2026-03-26 15:40 (Taipei) |
@@ -25,6 +25,7 @@
 | v1.2 | 2026-03-26 | Claude Code | Added red-zone governance + Git Hooks sections |
 | v1.3 | 2026-03-26 | Claude Code | Chief architect review process + review cadence changed (weekly) |
 | v1.4 | 2026-03-26 | Claude Code | 🔴 Added the "archive rather than delete" policy (Commander's ruling) |
+| v1.5 | 2026-03-26 | Claude Code | **dependency-cruiser dependency governance integration (Phase 14.2)** |
 
 ---
 
@@ -323,6 +324,71 @@ scripts/hooks/pre-commit # source file (tracked)
 
 ---
 
+---
+
+## 🔗 dependency-cruiser dependency governance (Phase 14.2)
+
+> **ADR-014**: Layered governance of frontend dependencies
+> **Config file**: `.dependency-cruiser.cjs`
+
+### Layer Model
+
+```
+Layer 0: app/ (Pages - may import everything)
+    ↓
+Layer 1: components/ (Features - must not import each other)
+    │   ├── agent/
+    │   ├── approval/
+    │   ├── incident/
+    │   └── dashboard/
+    ↓
+Layer 2: shared/layout (must not import from Layer 1)
+    ↓
+Layer 3: ui/lib/stores/hooks (pure utility layer - must not import components)
+```
+
+### Check command
+
+```bash
+# Scan for frontend dependency violations
+pnpm dep-check
+
+# Output format: severity | rule | from → to
+```
+
+### Rules
+
+| Rule | Severity | Description |
+|------|--------|------|
+| `feature-isolation-*` | error | Features must not import each other |
+| `shared-no-feature-import` | error | Shared must not import Features |
+| `ui-no-feature-import` | error | UI must not import Features/Shared |
+| `components-no-app-import` | error | Components must not import app |
+| `no-circular` | error | Circular dependencies are forbidden |
+| `hooks-no-component-import` | warn | Hooks must not import Components |
+| `stores-no-component-import` | warn | Stores must not import Components |
+
+### Violation example
+
+```typescript
+// ❌ Violates feature-isolation-agent
+// apps/web/src/components/agent/AgentChat.tsx
+import { ApprovalCard } from '../approval/ApprovalCard' // error!
+
+// ✅ Correct: import from the shared ui layer
+import { Card } from '../ui/card'
+```
+
+### CI integration
+
+```yaml
+# .github/workflows/ci.yaml
+- name: Check dependencies
+  run: pnpm dep-check
+```
+
+---
+
 ## References
 
 - `turbo.json`: Turborepo configuration
@@ -331,3 +397,5 @@ scripts/hooks/pre-commit # source file (tracked)
 - `docs/LOGBOOK.md`: Progress tracking
 - `docs/RED_ZONES.md`: Red-zone governance handbook
 - `scripts/hooks/pre-commit`: Red-zone hook script
+- `.dependency-cruiser.cjs`: **Phase 14.2 dependency governance rules**
+- `docs/adr/ADR-014-dependency-governance.md`: **ADR-014 decision record**
diff --git a/.agents/workflows/awoooi-devops-commander.md b/.agents/workflows/awoooi-devops-commander.md
index fe921ca9..5143cf89 100644
--- a/.agents/workflows/awoooi-devops-commander.md
+++ b/.agents/workflows/awoooi-devops-commander.md
@@ -8,7 +8,7 @@ Docker, K3s, Nginx, Host Networking
 ## Core constraints (AWOOOI constitution)
 
 1. **Split Brain Prevention**:
-   - Remember the four-host architecture: `.110` (vault), `.112` (security), `.120/.121` (K3s resources), `.188` (the sole brain, hosting Nginx/Ollama/ClawBot/SigNoz).
+   - Remember the four-host architecture: `.110` (vault), `.112` (security), `.120/.121` (K3s resources), `.188` (the sole brain, hosting Nginx/Ollama/OpenClaw/SigNoz).
    - Never deploy decision-making AI models on any host other than `.188`.
 
 2. **Authorization Tiers**: