Commit Graph

36 Commits

Author SHA1 Message Date
OG T
fe77e6d297 fix(ai): extend SuggestedAction enum + add Pydantic fallback guard
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 10m48s
Type Sync Check / check-type-sync (push) Failing after 2m52s
Root cause: NemoTron output "investigate" → Pydantic accepts only 4 values → validation fails
→ openclaw_analysis_parse_failed → analysis_result=None → every fallback card shows "Pending analysis"

Fix:
1. Add INVESTIGATE/OBSERVE/APPLY_HPA/TUNE_RESOURCES to the SuggestedAction enum
   (prompt.py lists 6 values but the enum had only 4; the prompt/model mismatch is the root cause)
2. normalize_suggested_action validator: uppercase + alias mapping + fall back to NO_ACTION for unknown values,
   so that no LLM output can make Pydantic raise and leave analysis_result = None
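
A minimal sketch of the normalization step described above, using only the standard library rather than a Pydantic validator. The alias table and the `RESTART` member are hypothetical; the commit names only `NO_ACTION` and the four new members.

```python
from enum import Enum


class SuggestedAction(str, Enum):
    NO_ACTION = "NO_ACTION"
    RESTART = "RESTART"  # hypothetical pre-existing member, not named in the commit
    INVESTIGATE = "INVESTIGATE"
    OBSERVE = "OBSERVE"
    APPLY_HPA = "APPLY_HPA"
    TUNE_RESOURCES = "TUNE_RESOURCES"


# Hypothetical alias table; the real mapping is not shown in the commit.
ALIASES = {"MONITOR": "OBSERVE", "SCALE": "APPLY_HPA"}


def normalize_suggested_action(raw: object) -> SuggestedAction:
    """Uppercase, map aliases, and fall back to NO_ACTION for unknown values."""
    if not isinstance(raw, str):
        return SuggestedAction.NO_ACTION
    key = raw.strip().upper().replace("-", "_").replace(" ", "_")
    key = ALIASES.get(key, key)
    try:
        return SuggestedAction(key)
    except ValueError:
        # Unknown LLM output must never propagate as a validation error.
        return SuggestedAction.NO_ACTION
```

In a Pydantic model, the same function could plausibly be called from a validator that runs before enum coercion, so the field always receives a valid member.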

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:36:36 +08:00
OG T
800ab1685f fix(playbook+flywheel): fix PlaybookSource enum + repair_steps compatibility + KM stats raw SQL
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 14m58s
Type Sync Check / check-type-sync (push) Failing after 1m17s
Fixes three chained bugs so that the Playbook seed can run normally:

1. Add YAML_RULE to the PlaybookSource enum (used only for alert_rules.yaml imports)
2. playbook_seed_service: source=YAML_RULE; dedup now uses raw SQL by name
   and no longer calls list_playbooks (legacy-format repair_steps raised a validation error)
3. playbook_repository._orm_to_pydantic: fill in the required step_number/action_type
   fields for legacy-format repair_steps (backward compatible)
4. flywheel_stats_service: embedding IS NULL now uses raw SQL,
   fixing the AttributeError caused by the KnowledgeEntryRecord ORM having no embedding attribute

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:32:04 +08:00
OG T
7da64eaad2 feat(Phase 3): rebuild learning loop, three root-cause fixes + 2x EWMA + Evolver Agent
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 19m7s
Type Sync Check / check-type-sync (push) Failing after 1m18s
ADR-083 Phase 3 learning loop rebuild:

**Three root-cause fixes**
- approval_execution.py: fire-and-forget create_task → await asyncio.wait_for(timeout=30) × 2
  (success path L265 + failure path L353; a timeout records the learning_trigger_timeout metric and does not crash the main flow)
- models/approval.py: add matched_playbook_id field to ApprovalRequestBase
- decision_manager.py: _auto_execute populates matched_playbook_id when creating the ApprovalRequest
- learning_service.py: dual-path lookup for _matched_pb_id (matched_playbook_id + metadata fallback)
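
The switch from fire-and-forget to a bounded await could look roughly like this. `trigger_learning`, `run_learning_with_timeout`, and the in-memory `TIMEOUT_METRICS` list are hypothetical stand-ins; only `asyncio.wait_for`, the 30-second timeout, and the metric name come from the commit.

```python
import asyncio
from typing import List, Optional

TIMEOUT_METRICS: List[str] = []  # stand-in for a real metrics client


async def trigger_learning(delay: float) -> str:
    """Hypothetical learning trigger; sleeps to simulate work."""
    await asyncio.sleep(delay)
    return "learned"


async def run_learning_with_timeout(delay: float, timeout: float = 30.0) -> Optional[str]:
    """Await the trigger with a deadline; record a metric instead of raising."""
    try:
        return await asyncio.wait_for(trigger_learning(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # The main approval flow keeps going; only the metric is recorded.
        TIMEOUT_METRICS.append("learning_trigger_timeout")
        return None
```

Compared with `asyncio.create_task`, the caller now waits for the learning step, so a result cannot be silently lost if the task is garbage-collected or the request ends first.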

**2x EWMA negative reinforcement**
- models/playbook.py: add trust_score: float = 0.3 (EWMA dynamic trust field)
- repositories/playbook_repository.py: update_stats applies EWMA
  success: trust = 0.9 × old + 0.1 × 1.0
  failure: trust = 0.8 × old + 0.2 × 0.0 (decays 2x as fast)
  trust < 0.1 → log warning, wait for Evolver to archive
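
The two update rules above are the same EWMA with a different smoothing factor per outcome. A small sketch (function names are illustrative, not taken from the repository):

```python
def update_trust(old: float, success: bool) -> float:
    """EWMA update: alpha = 0.1 on success, 0.2 on failure (2x decay)."""
    alpha = 0.1 if success else 0.2
    outcome = 1.0 if success else 0.0
    return (1.0 - alpha) * old + alpha * outcome


def failures_until_below(start: float, threshold: float) -> int:
    """Count consecutive failures needed to push trust below the threshold."""
    trust, count = start, 0
    while trust >= threshold:
        trust = update_trust(trust, success=False)
        count += 1
    return count
```

Starting from the default trust of 0.3, one success gives 0.37 and one failure gives 0.24; five consecutive failures are enough to fall below the 0.1 warning threshold.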

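**Evolver Agent (new)**
- services/playbook_evolver.py: three functions, all static rules
  1. Low-trust archiving: trust < 0.1 → DEPRECATED
  2. Dormancy archiving: unused for 30d AND trust < 0.5 → DEPRECATED
  3. Similarity merge: symptom Jaccard > 0.9 → keep the higher-trust one, archive the lower-trust one
  AIOPS_P3_EVOLVER_ENABLED=False, off by default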
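
The three static rules can be sketched as plain predicates. Treating symptoms as sets of strings, the inclusive 30-day boundary, and the handling of two empty sets are all assumptions; the commit states only the thresholds.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two symptom sets."""
    if not a and not b:
        return 0.0  # assumption: two empty sets are not treated as similar
    return len(a & b) / len(a | b)


def should_deprecate(trust: float, days_unused: int) -> bool:
    """Rule 1 (low trust) or rule 2 (dormant and below 0.5 trust)."""
    return trust < 0.1 or (days_unused >= 30 and trust < 0.5)


def should_merge(a: Set[str], b: Set[str]) -> bool:
    """Rule 3: merge candidates when symptom similarity exceeds 0.9."""
    return jaccard(a, b) > 0.9
```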

**Docs**
- ADR-083 learning loop rebuild
- MASTER §8 Phase 3 completion record

AIOPS_P3_ENABLED=False (default); the skeleton is in place, awaiting the commander's approval to enable

Co-Authored-By: Claude Sonnet 4.6 (Asia-Pacific) <noreply@anthropic.com>
2026-04-15 14:01:37 +08:00
OG T
914c7e7a90 fix: NoneAttr bug introduced by 9b9ff5b, move incident_id up to Base
Some checks failed
CD Pipeline / build-and-deploy (push) Has started running
Type Sync Check / check-type-sync (push) Failing after 1m17s
bug: 'ApprovalRequestCreate' object has no attribute 'incident_id'
Live-fire #6: the entire webhook failed with a 500.

Root cause: 9b9ff5b writes request.incident_id in approval_db,
but ApprovalRequestCreate inherits from Base, which lacks this field (only ApprovalRequest has it).

Fix: move incident_id up to ApprovalRequestBase
- ApprovalRequestCreate inherits it automatically → the webhook can create a request carrying incident_id
- ApprovalRequest no longer redefines it
- All 786/786 regression tests pass

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 20:01:47 +08:00
OG T
325b3851b5 feat(adr-071): alert notification four types, first batch B/C/E/F/G/H fully implemented
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 1m7s
ADR-071-B: classify_notification(): five-type classifier (TYPE-1/2/3/4/4D)
ADR-071-C: send_info_notification(): TYPE-1 info-only card with no buttons
ADR-071-E: _build_inline_keyboard(): builds TYPE-3 buttons dynamically from alert_category
ADR-071-F: send_drift_card(): TYPE-4D Config Drift card + diff truncation
ADR-071-G: km_conversion_service.py: auto-convert an Incident to KM on RESOLVED
ADR-071-H: handle_manual_fix_done(): TYPE-4 manual-fix bot conversation loop

Completed in the previous batch: ADR-071-A (DB Migration) + ADR-071-D (state machine guard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 02:24:20 +08:00
OG T
88696dba9b feat(sprint5.1): Data Safety Guardrails end-to-end integration (L1-L5)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m33s
Type Sync Check / check-type-sync (push) Failing after 58s
Layer 0 - K8s RBAC:
  - k8s/rbac/api-velero-reader.yaml: awoooi-executor SA Velero backup reader

Layer 1 - DB Migration (already run on 188):
  - M-002: add approval_level/votes/required_votes to approval_records
  - M-003: add 8 values to the alert_event_type ENUM

Layer 2 - IaC:
  - ops/config/service-registry.yaml: Stateful tier list for all services (BLOCK/CRITICAL_HITL/STANDARD_HITL/AUTO)

Layer 3 - Python Services:
  - service_registry.py: reads the YAML, provides is_blocked/requires_multisig/get_required_votes
  - velero_client.py: queries Velero backup age via kubectl, falls back to 999h on failure
  - preflight_service.py: Pre-flight safety checks (Q2/Q4 decisions)

Layer 1-M001 - Playbook model:
  - playbook.py: add requires_approval_level/stateful_targets/requires_pre_backup

Layer 4 - Business logic:
  - alert_operation_log_repository.py: add 8 event_type values (Guardrail/Pre-flight/MultiSig/backup)
  - auto_repair_service.py: inject the Service Registry Guardrail check (BLOCK → reject outright)
  - webhooks.py: ALERT_RECEIVED provenance record + auto_repair flag Q9 + Langfuse trace_id Q10
  - db/models.py: ApprovalRecord syncs the approval_level/votes/required_votes fields
  - docker-health-monitor.sh: reworked into a pure sensing layer (all docker restart logic removed)

Layer 5 - Telegram notifications:
  - telegram_gateway.py: six new notification methods T1-T6 (Guardrail/Pre-flight/Backup/MultiSig/ChangeApplied)

Reference: ADR-062 Data Safety Guardrails, ADR-063 Service Registry IaC

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 16:24:09 +08:00
OG T
9253281d46 feat(api): Sprint 4 Phase A+B, alert disposition statistics data layer + write layer
Phase A: data layer
- A1: add 4 fields to IncidentFrequencyStats (human_approved/manual_resolved/cold_start_trust/total_resolution)
- A2: AnomalyCounter.record_disposition(): atomic increment via Redis HINCRBY
- A3: get_disposition_stats(): HGETALL returns the disposition distribution
- AnomalyFrequency dataclass extended + to_dict() kept in sync
- _record_anomaly_impl() integrates disposition stats

Phase B: wiring the write-layer trigger points
- B1: auto-repair succeeded → record_disposition("auto_repair")
- B2: cold-start trust succeeded → record_disposition("cold_start_trust")
  - AutoRepairDecision gains an is_cold_start flag
  - execute_auto_repair() receives it and distinguishes the disposition type
- B3: human-approved execution succeeded → record_disposition("human_approved")
  - add _get_anomaly_key_from_approval() helper
- B4: manual handling inferred → resolve_incident() decides by elimination
  - if resolved and no auto/human/cold_start record exists → manual_resolved

Safety design: every disposition record is wrapped in try/except, so a failure never blocks the main flow

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-07 11:54:46 +08:00
OG T
cd37befbe6 fix(models): replace datetime.UTC with timezone.utc everywhere for Python 3.10 compatibility
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Successful in 59s
terminal.py, incident.py, and utils/timezone.py had the same problem.
The CI runner's Python 3.10 has no UTC constant, so every model silently failed to import.

# 2026-04-06 ogt: complete fix, no stragglers left

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 12:40:27 +08:00
OG T
59c3dfb910 fix(models): approval.py switches to timezone.utc for Python 3.10 compatibility
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 12m12s
Type Sync Check / check-type-sync (push) Failing after 52s
The CI runner uses Python 3.10; datetime.UTC was only added in 3.11.
Switch to datetime.timezone.utc, which works on all versions, fixing the across-the-board CI type-sync failure.

# 2026-04-06 ogt: root cause, CI Python 3.10 cannot import UTC
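
For illustration, the portable form looks like this (`utc_now` is a hypothetical helper name). `from datetime import UTC` raises ImportError on Python 3.10, whereas `timezone.utc` is available on every Python 3 version:

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware current time that works on Python 3.10 and later."""
    return datetime.now(timezone.utc)
```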

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 12:19:23 +08:00
OG T
658337ec18 fix(phase26): connect the full Incident→DB→KM chain + namespace fix
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m29s
Type Sync Check / check-type-sync (push) Failing after 52s
Root causes:
1. create_incident_for_approval stored only to Redis, not PostgreSQL
   → gone after the 7-day TTL, so Playbook extraction could never find the Incident
2. ApprovalRecord had no incident_id field
   → _trigger_playbook_extraction relied on a regex scanning Chinese text for INC- and always failed
3. operation_parser's namespace fallback was "default"
   → all deployments live in awoooi-prod, so all 203 executions failed

Fix:
- Incident is written to both Redis and PostgreSQL (save_to_episodic_memory)
- Add incident_id field to ApprovalRecord (model + ORM + migration)
- alertmanager_webhook writes incident_id back after creating the Approval
- _trigger_playbook_extraction uses approval.incident_id directly
- operation_parser DEFAULT_NAMESPACE = "awoooi-prod"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 11:46:05 +08:00
OG T
5499169996 feat(auto-repair): close the auto-repair loop (ADR-058)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 53s
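Problem: the alert chain never called auto_repair_service, so the mechanism was a complete dead end
Fix:
1. webhooks.py: alertmanager_webhook triggers _try_auto_repair_background after creating the Incident
2. playbook.py: lower the is_high_quality thresholds (cold-start period)
   - success_count: 10 → 3
   - success_rate: 95% → 80%
3. tests: test_evaluate_not_high_quality updated to the new thresholds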
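
A sketch of the lowered quality gate. The signature and the inclusive comparisons are assumptions; the commit gives only the two threshold values.

```python
def is_high_quality(success_count: int, failure_count: int) -> bool:
    """Cold-start gate: at least 3 successes and a success rate of at least 80%."""
    total = success_count + failure_count
    if total == 0:
        return False  # no history means no basis for trust
    return success_count >= 3 and success_count / total >= 0.80
```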

Flow: Alertmanager → API → Incident → evaluate → P2 or lower + high-quality Playbook → auto-execute → Telegram notification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:08:08 +08:00
OG T
bf4f81412c feat(api): ActionType.SSH_COMMAND + auto_repair_service SSH branch (Task 12)
- playbook.py: add SSH_COMMAND ActionType
- auto_repair_service._execute_step: SSH_COMMAND branch, format layer/component

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:47:00 +08:00
OG T
3455044457 feat(phase25): Nemotron proactive defense, three tracks P0+P1+P2 fully implemented
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE intent override changed to NEMOTRON (was OLLAMA)
- DIAGNOSE fallback uses the local-only chain and never touches the cloud
- On total failure: REJECT + Telegram notification
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: choose the timeout based on context[task_type]

P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN is published directly, no review needed)
- models/playbook.py: SymptomPattern.compute_hash() (16-character deterministic hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
  - generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram review card
  - generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram notification
  - uses nvidia.chat() (the correct interface); Minimal fallback when Nemotron times out
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() supports symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate in decide() + runbook hook in execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint
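
One plausible way to get a 16-character deterministic hash that fits the VARCHAR(16) column is to truncate a SHA-256 digest of a canonicalized symptom list. The canonicalization (lowercase, de-duplicate, sort) and the choice of SHA-256 are assumptions; the commit does not show how SymptomPattern.compute_hash() is implemented.

```python
import hashlib
from typing import Iterable


def compute_symptoms_hash(symptoms: Iterable[str]) -> str:
    """Return a 16-hex-character hash that ignores order, case, and duplicates."""
    canonical = "\n".join(sorted({s.strip().lower() for s in symptoms}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Because the input is sorted before hashing, the same set of symptoms always maps to the same key, which is what a lookup such as check_anti_pattern(symptoms_hash, days=7) would need.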

P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: whitelist filtering + DriftLevel grading
- services/drift_interpreter.py: NemotronDriftInterpreter (intent analysis only; does not generate fix commands)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports table
- k8s/drift-cronjob.yaml: hourly automatic scan CronJob

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:35:05 +08:00
OG T
d8be78b135 feat(api): Knowledge Base Phase 1 backend four-layer architecture
Some checks failed
CD Pipeline / build-and-deploy (push) Successful in 7m0s
E2E Health Check / e2e-health (push) Successful in 17s
Type Sync Check / check-type-sync (push) Failing after 30s
- models/knowledge.py: Pydantic Schema (EntryType/Source/Status/CRUD)
- db/models.py: KnowledgeEntryRecord ORM (PostgreSQL)
- repositories/interfaces.py: IKnowledgeRepository Protocol
- repositories/knowledge_repository.py: PostgreSQL CRUD implementation
- services/knowledge_service.py: business logic (get_db_context manages the session internally)
- api/v1/knowledge.py: REST Router (get_knowledge_service, no direct DB access)
- main.py: mount the Knowledge Base Router
- k8s/jobs/migrate-knowledge-entries.yaml: DB Migration Job

API endpoints: GET/POST / | GET/PATCH/DELETE /{id} | POST /{id}/approve
               GET /search | GET /categories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 00:55:56 +08:00
OG T
43a370fc11 fix(model): IncidentOutcome compatibility with the legacy Redis string format
Some checks failed
CD Pipeline (Dev) / build-and-deploy-dev (push) Successful in 2m38s
CD Pipeline / build-and-deploy (push) Has been cancelled
E2E Health Check / e2e-health (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 22s
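Legacy events stored outcome as the string "resolved", which Pydantic v2 cannot parse
→ INTERNAL_ERROR on /auto-repair/evaluate/{incident_id}

A field_validator with mode='before' converts the string to None (safely discarded),
ensuring legacy data does not raise incident_parse_error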

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 18:03:21 +08:00
OG T
c9c60c3a61 feat(mcp-integrations): Phase S architecture fixes + MCP integration groundwork
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 22s
Phase S tech-debt fixes (chief architect review 82→complete):
- S-01: generate_alert_fingerprint moved to AlertAnalyzer.generate_fingerprint() staticmethod
- S-04: remove Pydantic v2 deprecated json_encoders (use native datetime serialization directly)

Sentry MCP integration (Phase 23):
- ADR-048: Sentry→OpenClaw AI Triage architecture decision
- sentry_webhook_service.py: parse/analyze/create_incident/build_message Service layer
- config.py: SENTRY_WEBHOOK_SECRET (Fail-Closed HMAC-SHA256)
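
"Fail-closed" here means a missing secret or signature rejects the request instead of skipping verification. A minimal sketch with the standard library; the function name and the hex encoding of the signature are assumptions, and no particular webhook header name is implied.

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Fail-closed HMAC-SHA256 check: any missing input means rejection."""
    if not secret or not signature_hex:
        return False  # never fall through to "accept" when unconfigured
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```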

Playwright MCP integration (short term):
- smoke.spec.ts: 5-page E2E smoke test (home/dashboard/incidents/approvals/terminal)
- cd.yaml: E2E Smoke Test step + Telegram 🎭 Smoke status notification

Long-term planning ADRs:
- ADR-049: Figma Code Connect design system sync
- ADR-050: Telegram interactive Incident 2.0 (6-button Inline Keyboard)
- ADR-051: Context7 dependency upgrade advisor (Next.js 14→15, FastAPI 0.115→0.128)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 16:20:57 +08:00
OG T
22de22c989 refactor(phase-s): Phase S tech-debt cleanup, five architecture improvements
S-01: generate_alert_fingerprint() moved to alert_analyzer_service (Router→Service)
S-02: remove the deprecated USE_NEW_ENGINE config (it has served its purpose now that Phase R is done)
S-03: github_webhook.py linter cleanup (Field unused + delivery_id noqa)
S-04: Pydantic v2 migration - approval/incident models (class Config → ConfigDict)
S-05: Skill 09 v1.1 update (USE_NEW_ENGINE deprecation note)

Tests: 393 passed, zero failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 13:12:02 +08:00
OG T
411880842f refactor(router): R4 #129 migrate AlertAnalyzer to the services layer
ADR-024 Router slimming R4: move business logic out of the Router into the correct layer.

Changes:
- Add src/models/webhook.py: AlertPayload + AlertResponse moved to the models layer
- Add src/services/alert_analyzer_service.py: AlertAnalyzer (141 lines) moved to the services layer
  - RISK_MAPPING / ACTION_MAPPING / BLAST_RADIUS_MAPPING lookup tables
  - analyze() method includes K8s resource name normalization (ADR-016)
- webhooks.py: remove duplicate definitions, import instead, -243 lines

The Router-layer webhooks.py now complies with the ADR-024 prohibited list:
AlertAnalyzer no longer exists in the Router layer.

R4 status: #127 #128 #129 #130 (all done)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 09:27:23 +08:00
OG T
f5b19cf108 feat(learning): implement Playbook confidence adjustment mechanism (ADR-030)
- Add _promote_playbook: a high rating raises confidence by 0.1
- Add _demote_playbook: a low rating lowers confidence by 0.15
- Add find_by_source_incident: look up a Playbook by incident_id
- Add adjust_confidence: confidence adjustment + automatic status transition
- Add Playbook.failure_rate property

Automatic status transitions:
- ai_confidence >= 0.9 + DRAFT → auto APPROVED
- ai_confidence < 0.3 + failure_rate > 50% → auto DEPRECATED
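
The adjustment and the two transition rules can be sketched as a single pure function. The signature, the string statuses, and clamping to [0, 1] are assumptions; only the deltas and thresholds come from the commit.

```python
from typing import Tuple

PROMOTE_DELTA = 0.1
DEMOTE_DELTA = -0.15


def adjust_confidence(
    confidence: float, delta: float, status: str, failure_rate: float
) -> Tuple[float, str]:
    """Apply a delta, clamp to [0, 1], then apply the automatic transitions."""
    new_conf = min(1.0, max(0.0, confidence + delta))
    if new_conf >= 0.9 and status == "DRAFT":
        status = "APPROVED"
    elif new_conf < 0.3 and failure_rate > 0.5:
        status = "DEPRECATED"
    return new_conf, status
```

A demotion is larger than a promotion, so a playbook needs more good ratings than bad ones to hold its confidence level.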

Tests: all 13 cases pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 22:10:49 +08:00
OG T
d89f0520f9 fix(api): fix 34 Ruff lint errors
- Auto-fixed import sorting and unused imports
- Manually fixed raise from, isinstance union, and an unused variable
- scripts/ left as-is for now (not CI-blocking)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 15:27:49 +08:00
OG T
4f7282a97a fix(ai): Phase 20 P2 fixes - Protocol + boundary tests + model_registry
P2-1: define the INvidiaProvider Protocol (@runtime_checkable)
P2-2: extend boundary tests, 15 → 25 cases
P2-3: model_registry adds NVIDIA + tool_calling_fallback_order

Chief architect score: 82 → 86 → 90/100

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 01:24:17 +08:00
OG T
6de1c0ff3b fix(ai): fix Pydantic validation error + tuple unpacking
1. kubectl_command allows None (the LLM may return null)
2. Add a field_validator that converts null to an empty string
3. generate_incident_proposal unpacks all 6 values (including ai_tokens/ai_cost)

2026-03-29 ogt: Gemini API validation fix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 00:46:02 +08:00
OG T
b77e151387 feat(ai): ADR-036 NVIDIA Nemotron Tool Calling integration
Phase 20 - raise Tool Calling accuracy from 50% to 83.3%

Added:
- src/models/nvidia.py: Pydantic Schema
- src/services/nvidia_provider.py: NvidiaProvider class
- tests/test_nvidia_provider.py: 15 unit tests (all passing)

Modified:
- ai_router.py: AIProvider.NVIDIA + route_tool_calling()
- ai_rate_limiter.py: NVIDIA limits (5 RPM, 100/day)
- models.json: NVIDIA config
- cd.yaml: inject NVIDIA_API_KEY via Secrets

Routing strategy:
- Tool Calling: Nemotron → Gemini → Claude
- General chat: Ollama → Gemini → Claude (unchanged)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 00:00:08 +08:00
OG T
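d469a239af fix(ai): remove the confidence default, force the LLM to compute a real value
Changes:
1. models/ai.py: confidence is now REQUIRED (default=0.8 removed)
2. openclaw.py: if the LLM outputs no confidence, set it to 0.5 + COLLAB

Root cause:
- The Pydantic default=0.8 meant the confidence score was always 80%
- The LLM is now forced to compute a real confidence score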

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 22:21:29 +08:00
OG T
59c9eff83a fix(api): fix 10 lint errors (import sorting + unused imports + set comprehension)
- F401: remove unused imports (TerminalSessionStatus, AutoApproveDecision, TerminalSession)
- I001: fix import block sorting
- C401: set(generator) → {set comprehension}

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:51:52 +08:00
OG T
d206460751 feat(security): Phase 20 CSRF protection implementation
The Phase 19 chief architect review noted that the nuclear-key UX lacked CSRF protection

Backend:
- Add src/core/csrf.py (Double Submit Cookie pattern)
- Add src/api/v1/csrf.py (GET /api/v1/csrf/token)
- Add src/models/csrf.py (CSRFTokenResponse)
- Modify the approvals.py sign/reject/bulk endpoints to add CSRFToken validation

Frontend:
- Add hooks/useCSRF.ts (React Hook)
- Modify approval.store.ts to integrate the CSRF Token parameter

Security properties:
- 256-bit Token (secrets.token_hex)
- Timing-safe comparison (secrets.compare_digest)
- SameSite=Strict Cookie
- 1-hour token lifetime
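
The token generation and comparison steps listed above map directly onto the standard-library `secrets` module. A minimal sketch of the double-submit check; the function names are illustrative, and cookie handling and expiry are left out.

```python
import secrets


def issue_csrf_token() -> str:
    """Generate a 256-bit token: 32 random bytes rendered as 64 hex characters."""
    return secrets.token_hex(32)


def csrf_tokens_match(cookie_token: str, header_token: str) -> bool:
    """Double-submit check: cookie and header values must be present and equal."""
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison, so timing does not reveal matching prefixes.
    return secrets.compare_digest(
        cookie_token.encode("utf-8"), header_token.encode("utf-8")
    )
```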

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 18:31:58 +08:00
OG T
e5ded3b3f2 feat(phase19): OmniTerminal + GenUI + Hybrid SSE architecture implementation (Wave 0-2)
Phase 19 OmniTerminal MVP complete:
- Wave 0: Backend (Hybrid SSE POST→GET architecture)
- Wave 1: Frontend (OmniTerminal state machine + GenUI Registry)
- Wave 2: UI components (8 GenUI dynamic cards)

ADR docs:
- ADR-031: OmniTerminal SSE architecture
- ADR-032: GenUI dynamic rendering framework
- ADR-033: K3s HA architecture design

GenUI components:
- GenUIRenderer, K8sPodStatusCard, SentryErrorCard
- MetricsSummaryCard, IncidentTimelineCard
- TraceWaterfallCard, ApprovalCard, NuclearKeyButton

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-28 00:17:26 +08:00
OG T
30153496d1 fix(api): fix all lint errors (ruff --fix)
- Import sorting (I001)
- Unused imports (F401)
- f-string without placeholders (F541)
- Loop variable unused (B007)
- zip() strict parameter (B905)
- Exception chaining (B904)
- collections.abc imports (UP035)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 16:06:20 +08:00
OG T
698687f092 feat(api): #7 Playbook extraction feature (Phase 7.1-7.4)
Implementation:
- models/playbook.py: Playbook data model + Request/Response
- repositories/playbook_repository.py: Redis two-tier storage
- repositories/interfaces.py: IPlaybookRepository Protocol
- services/playbook_service.py: business logic (extract/recommend/approve)
- api/v1/playbooks.py: REST API endpoints

API endpoints:
- POST /playbooks/extract/{incident_id} - extract from a successful case
- POST /playbooks/recommend - symptom-match recommendation
- POST /playbooks/{id}/approve - human approval
- GET/PATCH/DELETE /playbooks/{id} - CRUD

Follows the leWOOOgo building-block layering: Router → Service → Repository

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 10:54:13 +08:00
OG T
e0584bc181 refactor(api): Phase 16 R2 archive dead code + unify RiskLevel
Archived (866 lines):
- routes/approvals.py → _archived/routes/ (477 lines, unregistered dead code)
- services/approval.py → _archived/services/ (389 lines, used only by dead code)

Merge RiskLevel:
- models/approval.py adds HIGH (merged from trust_engine.py)
- trust_engine.py now imports from models/approval.py
- Old definition kept as a comment for rollback

Update services/__init__.py:
- Remove imports of archived modules (comments preserve the rollback path)

Verification:
- RiskLevel unified: models and trust_engine use the same class
- 24 action_parsing tests pass

Rollback instructions are in _archived/README.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 23:14:24 +08:00
OG T
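b13b063282 feat(web): Phase 11 conversational AI UI/UX (#47-59)
Phase 11.1 conversational container:
- ConversationalView two-column layout (list on the left + details on the right)
- ApprovalThreadItem risk level + relative time display
- SSE real-time update integration

Phase 11.2 batch processing:
- BatchModeSelector component (accept all / review one by one / CRITICAL Only)
- POST /api/v1/approvals/bulk-approve API endpoint
- CRITICAL + DESTRUCTIVE safety filter (batch approval forbidden)

Phase 11.4 keyboard shortcuts:
- useKeyboardShortcuts hook (Y/N/arrow keys/Esc)
- Hold Y for 2 seconds to approve + progress indicator at the top
- Shortcut help Modal (Y/N highlighted)

i18n: 100% next-intl coverage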

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 10:31:35 +08:00
OG T
8159d22db9 refactor: ClawBot → OpenClaw global rename
- Delete the old clawbot.py (the new openclaw.py already exists)
- Update models/ai.py type definitions (ClawBotAnalysisRequest/Response)
- Update api/v1/ai.py imports and comments
- Update Discord username
- Update all comments and docs

Basis: feedback_openclaw_naming.md (the commander's formal naming decision, 2026-03-20)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 12:57:36 +08:00
OG T
6f049877fc fix(lint): ruff auto-fix + add lewooogo-core src to git
- Python: ruff --fix fixed 280 lint errors
- lewooogo-core: the src/ directory was untracked, causing CI eslint to fail

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:51:37 +08:00
OG T
65fa1168b8 feat(api): add metadata field to ApprovalRequestResponse
Exposes incident_id to the frontend/API for debugging and correlation tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 21:51:05 +08:00
OG T
7478dc0254 feat(phase6-9): Complete modular architecture and Agent Teams
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context

Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture

DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies

Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback

Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
196d269b92 feat: add all application source code
- apps/api: FastAPI backend with Dockerfile
- apps/web: Next.js frontend with Dockerfile
- apps/sensor: Signal collection agent
- packages: shared packages

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 18:57:44 +08:00