docs(skills): Phase 14.2 CI/CD architecture review + dependency-cruiser integration

- Skill 04: Runner zombie process fix + cancel-in-progress: false
- Skill 05: Add SRE QA content
- Skill 06: dependency-cruiser dependency governance (Layer Model + ADR-014)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@@ -10,7 +10,7 @@

| Field | Value |
|------|-----|
| **Version** | v1.5 |
| **Version** | v1.6 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-03-26 03:30 (Taipei) |
@@ -26,6 +26,7 @@
| v1.3 | 2026-03-25 | Claude Code | Added document info block |
| v1.4 | 2026-03-26 | Claude Code | Added iron rule for deployment-tier decisions |
| v1.5 | 2026-03-26 | Claude Code | **Phase 15 three-layer observability architecture (Deep Linking)** |
| v1.6 | 2026-03-26 | Claude Code | **Runner zombie process fix + CI/CD cancel-in-progress: false** |

---

@@ -234,6 +235,83 @@ runs-on: [self-hosted, harbor, k8s]
# ❌ --no-gpg-sign
```

### Concurrency strategy (lesson learned 2026-03-26)

```yaml
concurrency:
  group: cd-${{ github.workflow }}-${{ github.ref }}
  # 🔴 Queue instead of cancelling, to avoid Runner _diag/pages file conflicts
  cancel-in-progress: false
```

**Reason**: when Runner cleanup is incomplete, `cancel-in-progress: true` causes:
- `_diag/pages/*.log` file conflicts
- Session Conflict errors
- Missing set_output files

---

## 🚨 Runner zombie process fix (lesson learned 2026-03-26)

> **Problem**: CI/CD workflows fail repeatedly (set_output file missing / file already exists / Session Conflict)
> **Memory**: `feedback_runner_zombie_process.md`

### Symptoms

| Error message | Cause |
|---------|------|
| `Missing file at path: _runner_file_commands/set_output_*` | Runner directory permission problem |
| `File already exists: _diag/pages/*.log` | Zombie processes not cleaned up |
| `TaskAgentSessionConflictException` | Multiple Runner.Listener processes running at once |
| `could not read Username for 'https://github.com'` | Failed to read the Git auth token |

### Fix procedure (Tier 2, requires Commander confirmation)

```bash
# Step 1: Stop the services
sudo systemctl stop actions.runner.owenhytsai-awoooi.awoooi-110.service
sudo systemctl stop actions.runner.owenhytsai-awoooi.awoooi-110-2.service

# Step 2: Fix ownership (undo root ownership caused by running with sudo)
sudo chown -R wooo:wooo /home/wooo/actions-runner-awoooi
sudo chown -R wooo:wooo /home/wooo/actions-runner-awoooi-2

# Step 3: Kill zombie processes
pkill -9 -u wooo -f 'Runner'

# Step 4: Safe cleanup (no sudo)
rm -rf /home/wooo/actions-runner-awoooi/_work/*
rm -rf /home/wooo/actions-runner-awoooi-2/_work/*
rm -rf /home/wooo/actions-runner-awoooi*/_diag/pages/*

# Step 5: Restart the services
sudo systemctl start actions.runner.owenhytsai-awoooi.awoooi-110.service
sudo systemctl start actions.runner.owenhytsai-awoooi.awoooi-110-2.service
```

### Diagnostic commands

```bash
# Check for zombie processes
ps aux | grep -E 'Runner.Listener|Runner.Worker' | grep -v grep

# Check the log for session conflicts
tail -50 ~/actions-runner-awoooi-2/_diag/Runner_*.log | grep -i conflict

# Verify ownership
ls -la ~/actions-runner-awoooi*/_work/_temp/
```

### Workflow preventive measure

```yaml
# Clean the temp directories at the start of every job
- name: "Clean Runner temp"
  run: |
    RUNNER_ROOT=$(dirname "$(dirname "$RUNNER_TEMP")")
    rm -rf "$RUNNER_TEMP"/* "$RUNNER_ROOT/_diag/pages"/* .claude/worktrees 2>/dev/null || true
```

### Telegram notification (closed loop)

```bash

@@ -10,7 +10,7 @@

| Field | Value |
|------|-----|
| **Version** | v1.3 |
| **Version** | v1.4 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-03-26 03:30 (Taipei) |
@@ -24,6 +24,7 @@
| v1.1 | 2026-03-24 | Claude Code | Iron rule: no mock tests |
| v1.2 | 2026-03-25 | Claude Code | Added document info block |
| v1.3 | 2026-03-26 | Claude Code | **Phase 15 observability testing** |
| v1.4 | 2026-03-26 | Claude Code | **Runner zombie process diagnostic procedure** |

---

@@ -482,6 +483,55 @@ with restore_trace_context({"trace_id": "", "span_id": ""}) as span:

---

---

## 🚨 Runner zombie process diagnosis (added 2026-03-26)

> **Problem**: CI/CD workflows fail repeatedly, with error messages that vary from run to run
> **Memory**: `feedback_runner_zombie_process.md`

### Diagnostic procedure

#### Step 1: Identify the symptom

| Error message | Likely cause |
|---------|---------|
| `Missing file at path: _runner_file_commands/set_output_*` | Permission problem (caused by sudo) |
| `File already exists: _diag/pages/*.log` | Zombie processes not cleaned up |
| `TaskAgentSessionConflictException` | Multiple Runner.Listener processes |
| `terminal prompts disabled` | Failed to read the Git auth token |

#### Step 2: Diagnostic commands

```bash
# Check for zombie processes
ps aux | grep -E 'Runner.Listener|Runner.Worker' | grep -v grep

# Check for session conflicts
tail -50 ~/actions-runner-awoooi-2/_diag/Runner_*.log | grep -i conflict

# Check directory ownership (should be wooo:wooo)
ls -la ~/actions-runner-awoooi*/_work/_temp/
```

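The zombie-process check above can be turned into a pass/fail test. The following is a minimal sketch, not part of the committed skill: the helper names `count_listeners` and `check_listeners` are hypothetical, and it assumes one `Runner.Listener` process per configured runner (two on this host).

```bash
#!/usr/bin/env bash
# Hypothetical helper: count Runner.Listener processes in ps-style text read from stdin.
count_listeners() {
  # grep -c prints the match count; it exits 1 when the count is 0, so mask that status.
  grep -c 'Runner\.Listener' || true
}

# Hypothetical helper: compare the observed count with the expected number of runners.
# $1 = expected number of Runner.Listener processes (assumption: one per runner).
check_listeners() {
  local expected="$1" actual
  actual="$(count_listeners)"
  if [ "$actual" -gt "$expected" ]; then
    echo "CONFLICT_RISK: $actual Runner.Listener processes, expected $expected"
    return 1
  fi
  echo "OK: $actual Runner.Listener processes"
}

# Usage on the runner host:
#   ps -u wooo -o pid,args | check_listeners 2
```

A non-zero exit status means there are more listeners than runners, which is the condition the Step 1 table associates with `TaskAgentSessionConflictException`.
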
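#### Step 3: Decide the handling tier

| Situation | Tier | Action |
|------|------|------|
| Plain temp-file conflict | Tier 1 | Wait for the workflow to retry automatically |
| Permission problem | Tier 2 | Notify the Commander, then fix with chown |
| Zombie processes | Tier 2 | Notify the Commander, then clean up with pkill |
| Service completely hung | Tier 3 | The Commander restarts the service personally |
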
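Steps 1 and 3 can be combined into a single lookup from a log line to a cause and a tier. This is a rough sketch only: `classify_runner_error` is a hypothetical name, and mapping multiple listeners to Tier 2 is an assumption (it treats them as the zombie-process case). The tables assign no tier to the Git auth failure, so the sketch reports it as `unspecified`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map one log line to "cause|tier" using the Step 1 and Step 3 tables.
classify_runner_error() {
  case "$1" in
    *"Missing file at path:"*"_runner_file_commands/set_output_"*)
      echo "permission problem (caused by sudo)|Tier 2" ;;
    *"File already exists:"*"_diag/pages/"*)
      echo "zombie processes not cleaned up|Tier 2" ;;
    *"TaskAgentSessionConflictException"*)
      # Assumption: multiple listeners are handled like the zombie-process row.
      echo "multiple Runner.Listener processes|Tier 2" ;;
    *"terminal prompts disabled"*|*"could not read Username"*)
      # The tables give a cause but no tier for this symptom.
      echo "Git auth token read failure|unspecified" ;;
    *)
      echo "unknown|unspecified" ;;
  esac
}

# Usage:
#   classify_runner_error "$(tail -1 job.log)"
```

Anything that falls through to `unknown` still needs a human to read the log. The sketch only covers the four symptoms listed in Step 1.
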
### Full fix SOP (Tier 2)

```bash
# See Skill 04 - Runner zombie process fix
# Or see Memory: feedback_runner_zombie_process.md
```

---

## References

- `apps/web/playwright.config.ts`: Playwright configuration
@@ -491,3 +541,4 @@ with restore_trace_context({"trace_id": "", "span_id": ""}) as span:
- `src/core/deep_linking.py`: **👁️ Deep Linking URL generator**
- `src/core/telemetry.py`: **Phase 15.2 Trace Context**
- `memory/project_phase15_langfuse.md`: **📊 Phase 15 full record**
- `memory/feedback_runner_zombie_process.md`: **🚨 Runner zombie process fix**

@@ -10,7 +10,7 @@

| Field | Value |
|------|-----|
| **Version** | v1.4 |
| **Version** | v1.5 |
| **Created** | 2026-03-20 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-03-26 15:40 (Taipei) |
@@ -25,6 +25,7 @@
| v1.2 | 2026-03-26 | Claude Code | Added red-zone governance + Git Hooks sections |
| v1.3 | 2026-03-26 | Claude Code | Chief architect review process + review cadence change (weekly) |
| v1.4 | 2026-03-26 | Claude Code | 🔴 Added "archive rather than delete" policy (Commander's ruling) |
| v1.5 | 2026-03-26 | Claude Code | **dependency-cruiser dependency governance integration (Phase 14.2)** |

---

@@ -323,6 +324,71 @@ scripts/hooks/pre-commit # source file (tracked)

---

---

## 🔗 dependency-cruiser dependency governance (Phase 14.2)

> **ADR-014**: Layered governance of frontend dependencies
> **Config file**: `.dependency-cruiser.cjs`

### Layer Model

```
Layer 0: app/ (Pages - may import everything)
    ↓
Layer 1: components/ (Features - must not import each other)
    │ ├── agent/
    │ ├── approval/
    │ ├── incident/
    │ └── dashboard/
    ↓
Layer 2: shared/layout (must not import from Layer 1)
    ↓
Layer 3: ui/lib/stores/hooks (pure utility layer - must not import components)
```

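The layer model reduces to one rule plus one exception: an import may only point to the same layer or a lower one (a higher layer number), and Layer 1 features may not import each other. The sketch below illustrates that logic in plain shell. It is not how dependency-cruiser evaluates `.dependency-cruiser.cjs`. The helper names are hypothetical, and the directory layout is an assumption (`shared/`, `layout/` and `ui/` under `components/`; `lib/`, `stores/` and `hooks/` directly under `src/`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a source path to its layer number (-1 = outside the model).
layer_of() {
  case "$1" in
    */app/*)                                        echo 0 ;;
    */components/shared/*|*/components/layout/*)    echo 2 ;;
    */components/ui/*|*/lib/*|*/stores/*|*/hooks/*) echo 3 ;;
    */components/*)                                 echo 1 ;;
    *)                                              echo -1 ;;
  esac
}

# Hypothetical helper: name of the directory directly under components/ (e.g. "agent").
feature_of() {
  local rest="${1#*/components/}"
  echo "${rest%%/*}"
}

# Returns 0 if a file at path $1 may import the file at path $2, else 1.
import_allowed() {
  local from to
  from="$(layer_of "$1")"
  to="$(layer_of "$2")"
  # Paths outside the model are not judged by this sketch.
  if [ "$from" -lt 0 ] || [ "$to" -lt 0 ]; then return 0; fi
  # Imports must not point to a higher layer (a smaller layer number).
  if [ "$to" -lt "$from" ]; then return 1; fi
  # Layer 1 features must not import each other.
  if [ "$from" -eq 1 ] && [ "$to" -eq 1 ] \
     && [ "$(feature_of "$1")" != "$(feature_of "$2")" ]; then
    return 1
  fi
  return 0
}

# Usage:
#   import_allowed apps/web/src/components/agent/AgentChat.tsx \
#                  apps/web/src/components/approval/ApprovalCard.tsx \
#     && echo allowed || echo violation
```
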
### Check command

```bash
# Scan the frontend for dependency violations
pnpm dep-check

# Output format: severity | rule | from → to
```

### Rule list

| Rule | Severity | Description |
|------|--------|------|
| `feature-isolation-*` | error | Features must not import each other |
| `shared-no-feature-import` | error | Shared must not import Feature |
| `ui-no-feature-import` | error | UI must not import Feature/Shared |
| `components-no-app-import` | error | Components must not import app |
| `no-circular` | error | No circular dependencies |
| `hooks-no-component-import` | warn | Hooks must not import Components |
| `stores-no-component-import` | warn | Stores must not import Components |

### Violation example

```typescript
// ❌ Violates feature-isolation-agent
// apps/web/src/components/agent/AgentChat.tsx
import { ApprovalCard } from '../approval/ApprovalCard' // error!

// ✅ Correct: import from a lower layer (here, ui)
import { Card } from '../ui/card'
```

### CI integration

```yaml
# .github/workflows/ci.yaml
- name: Check dependencies
  run: pnpm dep-check
```

---

## References

- `turbo.json`: Turborepo configuration
@@ -331,3 +397,5 @@ scripts/hooks/pre-commit # source file (tracked)
- `docs/LOGBOOK.md`: Progress tracking
- `docs/RED_ZONES.md`: Red-zone governance handbook
- `scripts/hooks/pre-commit`: Red-zone hook script
- `.dependency-cruiser.cjs`: **Phase 14.2 dependency governance rules**
- `docs/adr/ADR-014-dependency-governance.md`: **ADR-014 decision record**

@@ -8,7 +8,7 @@ Docker, K3s, Nginx, Host Networking

## Core constraints (AWOOOI constitution)
1. **Split Brain Prevention**:
   - Remember the four-host architecture: `.110` (vault), `.112` (security), `.120/.121` (K3s resources), `.188` (the only brain, hosting Nginx/Ollama/ClawBot/SigNoz).
   - Remember the four-host architecture: `.110` (vault), `.112` (security), `.120/.121` (K3s resources), `.188` (the only brain, hosting Nginx/Ollama/OpenClaw/SigNoz).
   - Never deploy a decision-making AI model on any host other than `.188`.

2. **Authorization Tiers**:
