Commit Graph

95 Commits

Author SHA1 Message Date
OG T
aae7c12645 feat(adr-076): Task 3.3: KM extraction for SSH repairs (completing both hands of the flywheel)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Motivation: after an SSH MCP repair (docker restart/systemctl) succeeded, KM could not learn from it,
because _extract_repair_steps handled only kubectl and the SSH path was dropped entirely.

approval_execution.py:
  - _trigger_playbook_extraction: after a successful execution, writes approval.action to
    incident.outcome.learning_notes for the Playbook extractor to read

playbook_service.py:
  - _parse_ssh_command(): new module-level function that parses the ssh [user@]host 'cmd' format
  - _extract_repair_steps(): step 2 extended with an SSH branch
      ssh ... → ActionType.SSH_COMMAND + host recorded
      kubectl ... → ActionType.KUBECTL (existing logic kept)
  - _generate_name(): SSH repairs automatically get the [SSH] prefix
  - _extract_tags(): SSH repairs automatically get the ssh + host_layer tags

test_playbook_ssh_extraction.py: 18 tests (100% passing)

Flywheel two-hand alignment:
  kubectl path: decision_chain.reasoning_steps → KM  (existing)
  SSH path: approval.action → learning_notes → KM  (added in Task 3.3)

Tests: 794 passed, 26 skipped, 0 failed
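A minimal sketch of how the ssh [user@]host 'cmd' format described above could be parsed. The real _parse_ssh_command() is not shown in this log, so the SshCommand type, the function name, and the decision to reject ssh options are assumptions made for illustration.

```python
import shlex
from typing import NamedTuple, Optional


class SshCommand(NamedTuple):
    """Hypothetical result type; not taken from playbook_service.py."""
    user: Optional[str]
    host: str
    command: str


def parse_ssh_command(action: str) -> Optional[SshCommand]:
    """Parse "ssh [user@]host 'cmd'"; return None for anything else."""
    try:
        tokens = shlex.split(action)
    except ValueError:  # unbalanced quotes
        return None
    if len(tokens) < 3 or tokens[0] != "ssh":
        return None
    target = tokens[1]
    if target.startswith("-"):  # ssh options are out of scope for this sketch
        return None
    # rpartition yields ("", "", target) when there is no "@".
    user, _, host = target.rpartition("@")
    if not host:
        return None
    return SshCommand(user or None, host, " ".join(tokens[2:]))
```

A kubectl action returns None here, which would leave it to the existing kubectl branch.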

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 15:19:54 +08:00
OG T
cc42aa0bdb feat(adr-076): Task 2.2 + 2.3: rule expansion + kubectl injection protection
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Task 2.2: alert_rules.yaml adds 3 rule categories (priority 125-127)
  - gitea_down: Gitea CI/CD down → NO_ACTION (priority 125, critical)
  - ssl_cert_expiring: SSL certificate expiring → NO_ACTION (priority 126, medium)
  - external_site_down: MoWoooWork/Dev/Blackbox probe → NO_ACTION (priority 127, medium)
  Total rules: 21 → 24

Task 2.3: alert_rule_engine.py kubectl injection protection
  - _RULE_ENGINE_DESTRUCTIVE_RE: blocks delete pvc/namespace/statefulset/deployment,
    drain/cordon, --replicas=0, rm -rf, DROP TABLE, $() and backticks
  - validate_kubectl_command(): public API; SSH commands and empty strings pass straight through
  - match_rule() integration: validates after variable substitution; on a block, clears the command and logs a warning
  - test_alert_rule_engine_validation.py: 34 tests (100% passing)

Tests: 776 passed, 26 skipped, 0 failed
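The blocklist in Task 2.3 can be sketched as a single compiled pattern. The actual _RULE_ENGINE_DESTRUCTIVE_RE is not reproduced in this log; the pattern below is a hypothetical reconstruction from the listed items, and returning a bool is an assumption about the API shape.

```python
import re

# Hypothetical pattern built from the items listed in the commit message.
DESTRUCTIVE_RE = re.compile(
    r"""
    \bdelete\s+(?:pvc|namespace|statefulset|deployment)\b
    | \b(?:drain|cordon)\b
    | --replicas=0\b
    | \brm\s+-rf\b
    | \bDROP\s+TABLE\b
    | \$\(            # command substitution
    | `               # backticks
    """,
    re.IGNORECASE | re.VERBOSE,
)


def validate_kubectl_command(command: str) -> bool:
    """Return True when the command may run, False when it is blocked."""
    stripped = command.strip()
    # Per the commit notes, empty strings and SSH commands pass through.
    if not stripped or stripped.startswith("ssh "):
        return True
    return DESTRUCTIVE_RE.search(stripped) is None
```

The word boundaries keep kubectl uncordon from matching the cordon entry, which is the kind of false positive a regression test would want to pin down.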

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 15:10:10 +08:00
OG T
684d6cfb43 feat(adr-076): all four Tactic B tasks complete: alert aggregation + retry + automatic reports
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 17m34s
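Task 2: AlertGroupingService: Redis 5-minute sliding window to guard against alert storms
- apps/api/src/services/alert_grouping_service.py (new)
- webhooks.py integration: short-circuits child alerts after fingerprint generation and before the LLM
- Threshold=3, graceful degradation, 16 tests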
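To illustrate the sliding-window idea in Task 2, here is a sketch that uses an in-memory deque in place of Redis. The class name, the method name, and the reading of Threshold=3 as "the first three alerts pass, later ones count as children" are assumptions; the real service is Redis-backed and may count differently.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

WINDOW_SECONDS = 300  # 5-minute window, from the commit notes
THRESHOLD = 3         # from the commit notes


class AlertGrouper:
    """In-memory stand-in for the Redis-backed service (an assumption)."""

    def __init__(self) -> None:
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def should_suppress(self, group_key: str, now: Optional[float] = None) -> bool:
        """Record one alert; return True once the window holds more than THRESHOLD."""
        now = time.time() if now is None else now
        window = self._seen[group_key]
        # Drop timestamps that have slid out of the window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        window.append(now)
        return len(window) > THRESHOLD
```

With Redis, a sorted set per group key scored by timestamp would give the same trimming behaviour across Pods.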

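Task 3: approval_execution.py retry on execution failure
- MAX_RETRY=2, RETRY_DELAY_SECONDS=30
- _is_transient_error() classifies errors as transient or permanent; permanent errors are not retried
- Timeline records retry progress and notes the retry count on success, 29 tests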
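The retry policy in Task 3 might look roughly like the following. MAX_RETRY and RETRY_DELAY_SECONDS come from the commit message; the marker strings, the function names, and classifying by error text are assumptions, since the real _is_transient_error() rules are not shown.

```python
import asyncio
from typing import Awaitable, Callable

MAX_RETRY = 2             # from the commit notes
RETRY_DELAY_SECONDS = 30  # from the commit notes

# Hypothetical markers; the real classification rules are not shown.
_TRANSIENT_MARKERS = ("timeout", "connection reset", "temporarily unavailable")


def is_transient_error(message: str) -> bool:
    lowered = message.lower()
    return any(marker in lowered for marker in _TRANSIENT_MARKERS)


async def execute_with_retry(
    run: Callable[[], Awaitable[None]],
    delay: float = RETRY_DELAY_SECONDS,
) -> int:
    """Run the action; return how many retries were needed, or re-raise."""
    for attempt in range(MAX_RETRY + 1):
        try:
            await run()
            return attempt
        except Exception as exc:
            # Permanent errors and the final attempt propagate immediately.
            if attempt == MAX_RETRY or not is_transient_error(str(exc)):
                raise
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

The returned retry count is what a timeline entry could record once the action succeeds.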

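Task 4: report_generation_service.py automatic reports
- Daily inspection report: every day at 08:00 Taipei time, pushed to the Telegram SRE group
- Postmortem: triggered automatically when an Incident is resolved and its duration > 10 minutes
- main.py lifespan mounts run_daily_report_loop(), 30 tests

Tests: 600 → 675 passing (+75), 0 failed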

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-14 14:39:14 +08:00
OG T
1a4b52ed28 fix(alert): add alertname to the fingerprint to prevent cross-alert collisions + add missing heartbeat classifications
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Root causes:
1. generate_fingerprint used alert_type (many alertnames fall into "custom")
   → different alert names on the same target shared one fingerprint → the 30-minute debounce blocked them against each other
2. classify_alert_early missed DeadMansSwitch / NoAlertsReceived /
   PrometheusNotConnectedToAlertmanager → they fell into TYPE-3 general alerts

Fix:
- alert_analyzer_service.py: fingerprint changed to namespace:deployment:alertname:target_resource
  alertname is taken from labels (Alertmanager), falling back to alert_type (other sources)
- incident_service.py: DeadMansSwitch → backup/TYPE-1;
  NoAlertsReceived + PrometheusNotConnectedToAlertmanager → alertchain_health/TYPE-8M
- Added 2 tests; full suite 627 passed
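The fingerprint change can be sketched as below. The function signature is hypothetical, and whether the real fingerprint is hashed afterwards is not stated in the commit, so this version returns the plain joined string.

```python
from typing import Mapping


def generate_fingerprint(
    namespace: str,
    deployment: str,
    target_resource: str,
    alert_type: str,
    labels: Mapping[str, str],
) -> str:
    """Build namespace:deployment:alertname:target_resource."""
    # Alertmanager payloads carry alertname in labels;
    # other sources fall back to alert_type.
    alertname = labels.get("alertname") or alert_type
    return ":".join((namespace, deployment, alertname, target_resource))
```

Two alerts that both map to alert_type "custom" now get distinct fingerprints as long as their alertname labels differ, so the debounce no longer treats them as the same alert.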

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:50:20 +08:00
OG T
db4d4280f5 test(ai-router): update DIAGNOSE routing tests to reflect NEMOTRON being paused
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 14m28s
NEMOTRON is paused because of the confidence=0.0 issue; DIAGNOSE falls back to complexity routing (None)
To be restored once _parse_confidence() is fixed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:22:52 +08:00
OG T
b3d4b9c8a9 test(telegram): fix test_telegram_message_templates assertions
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 14m24s
CRITICAL → 嚴重 (ADR-075 Chinese risk levels)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:20:16 +08:00
OG T
01e6d75ee7 test(telegram): fix test assertions to match the ADR-075 Chinese risk levels
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m55s
HIGH→高風險, MEDIUM→中風險 (test_sentry / test_github webhook)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:08:48 +08:00
OG T
1cb654cf59 fix(adr-075): CR P0/P1 fixes: TYPE_8M enum + dead code cleanup + docstring update
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
P0-2: NotificationType adds TYPE_8M = "TYPE-8M"
      classify_notification returns TYPE-8M early
      decision_manager now compares against the NotificationType.TYPE_8M enum (string literal removed)
P1-1: removed the unreachable alertchain_health/flywheel_health entries from _CATEGORY_BUTTONS
P1-4: test_classify_alert_early.py docstring updated to 13 rules/10 categories

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:44:12 +08:00
OG T
2cef2098d3 feat(adr-075): fix 4 breakpoints in the Telegram dynamic buttons + add 7 alert categories
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Breakpoint A: decision_manager extracts alert_category/notification_type and passes them to send_approval_card
Breakpoint B: send_approval_card gains the new parameters and passes them to _build_inline_keyboard
Breakpoint C: interactive notifications (TYPE-3/4/4D/8M) must not be sent to the SRE group, to prevent nonce leakage
Breakpoint D: _CATEGORY_BUTTONS k8s_workload → kubernetes + 6 new button groups

classify_alert_early adds: alertchain_health, flywheel_health, storage,
devops_tool, external_site, ssl_cert, host_resource (split out of infrastructure)
Test: 52 classify + 664 total passed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:35:56 +08:00
OG T
1074936e54 fix(classify): restore the alert card format for backup/heartbeat alerts with severity=warning/critical
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 2m38s
Root cause: the classify_alert_early() backup rule had no severity condition, so
VeleroBackupFailed / HostBackupFailed (warning/critical) were classified as TYPE-1
(informational only, no buttons) and lost the alert card format.

Fix:
- backup/heartbeat keywords match TYPE-1 only when severity=info/none
- backup alerts with severity=warning/critical go through the correct prefix rules
  (Velero→kubernetes TYPE-3, HostBackup→infrastructure TYPE-3)
- Watchdog (severity=none) is matched first by the severity rule and stays TYPE-1
- Strengthened tests: 25 cases, including VeleroBackupFailed critical → kubernetes TYPE-3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:24:00 +08:00
OG T
0d239838b4 fix(cr): Code Review P2: test coverage + CronJob script refactor
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
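P2-1: CronJob inline Python extracted into scripts/cron_km_vectorize.py
      Dockerfile adds COPY scripts/; the CronJob YAML now uses the script path
P2-2: added test_classify_alert_early.py: 23 tests covering 7 classification rules
      including edge cases: VeleroBackupFailed (backup takes priority over k8s), priority-order verification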

595 unit tests passed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:14:44 +08:00
OG T
a67a27f780 fix(test): add @pytest.mark.integration to test_model_regression (requires the Ollama service)
Consistent with global_repair_cooldown / anomaly_counter:
Ollama tests are excluded by default; run pytest -m integration when a real service is available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:32:42 +08:00
OG T
8be87b0f32 fix(review): Chief Architect Code Review: c439277 Tier 3 red-zone fixes
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 8m39s
Critical:
- C1: fixed a Python ternary precedence bug in the container variable of decision_manager _collect_mcp_context
  Before: `A or B or C[0] if list else ""` (the ternary governs the whole expression)
  After: `A or B or (C[0] if list else "")` (explicit parentheses)
- C2: all MCP calls wrapped in asyncio.wait_for with timeout=5s so they cannot block the main decision path
  also adds an unknown-host warning log (C4)
- C3+M1: _DESTRUCTIVE_PATTERNS completed and moved to a module-level constant
  Added: delete pods (plural)/kubectl drain/kubectl cordon/kubectl rollout undo/
        docker rm/docker stop/docker kill/rm -rf/"replicas": 0 (JSON patch)
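The C1 precedence issue is easy to reproduce in isolation. The function and parameter names below are hypothetical stand-ins for A, B, and C in the commit message; only the expression shapes come from the log.

```python
def pick_container_buggy(a: str, b: str, containers: list) -> str:
    # Parsed as: (a or b or containers[0]) if containers else ""
    # so an empty list discards a and b as well.
    return a or b or containers[0] if containers else ""


def pick_container_fixed(a: str, b: str, containers: list) -> str:
    # The conditional now guards only the list lookup.
    return a or b or (containers[0] if containers else "")
```

With a="web", b="", and an empty list, the buggy form returns "" while the fixed form returns "web", because a conditional expression binds more loosely than `or`.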

Important:
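- I1: webhooks.py IP exclusion now uses is_internal_ip(), supporting all of RFC 1918 (10.x/172.16-31.x/192.168.x)
- I4: added test_destructive_patterns.py: 25 tests, all passing
  Covers: constant exists, blocking, false-positive regression, critical always blocked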
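One way to cover all three RFC 1918 ranges from I1 is the standard ipaddress module. This is a sketch of the idea only; the actual is_internal_ip() in webhooks.py is not shown, and treating unparsable input as not internal is an assumption.

```python
import ipaddress

_RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_internal_ip(value: str) -> bool:
    """Return True for addresses inside any RFC 1918 private range."""
    try:
        address = ipaddress.ip_address(value)
    except ValueError:
        # Assumption: malformed input is treated as not internal.
        return False
    return any(address in network for network in _RFC1918_NETWORKS)
```

The /12 network handles the 172.16-31.x boundary, so 172.32.0.1 is correctly treated as external.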

🔴 Tier 3 red zone: pushed after passing the Chief Architect Code Review

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 22:05:52 +08:00
OG T
d77b2add73 fix(review): Chief Architect Code Review fixes: I1 get_incident_type logic fix + added tests
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 8m13s
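Code Review found 2 Critical + 2 Important issues:

Critical:
- rule.id semantically means "rule identifier" and lives in a different namespace from incident_type; the two must not be mixed
  Removed the rule_id fallback path; when a YAML match has no incident_type, it falls through to the static dict
- The get_incident_type() critical path had no test coverage
  Added test_get_incident_type.py: 11 tests across 4 categories (static/YAML priority/YAML error/custom), all passing

Important:
- ALERTNAME_TO_TYPE deferred import moved to the module top level (no circular-import risk)
- alert_types.py TODO was stale → updated to the correct description after the I1 integration

Tech debt noted: NetworkPolicy ArgoCD egress ClusterIP 10.43.16.201/32 must be updated after ArgoCD is reinstalled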

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 21:33:19 +08:00
OG T
485b8cb003 fix(ci): add ssl=disable to the B5 integration tests: asyncpg attempts SSL by default and the container rejects it
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m55s
Error: ConnectionRefusedError Connect call failed ('127.0.0.1', 15432)
Root cause: asyncpg goes through _create_ssl_connection, and the temporary postgres container has no SSL
Fix: append ?ssl=disable to both TEST_DATABASE_URL and the conftest default URL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:40:40 +08:00
OG T
49bfbd573c feat(test): B5 integration test framework: real DB, 5/5 passing
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 2m34s
Added:
- docker-compose.test.yml: temporary pgvector PostgreSQL for CI (port 15432)
- tests/factories.py: Incident/Approval/Knowledge/RAG test data factories
- tests/integration/test_b5_core_flows.py: 5 E2E integration tests (5/5 PASSED 1.03s)
- tests/integration/setup_test_schema.sql: CI schema initialization SQL
- cd.yaml: added the Integration Tests B5 step
- scripts/sync_dev_db.py: dev DB sync tool

Fixed:
- .env.test: DATABASE_URL points to awoooi_dev (local setting, gitignored and not committed)

No-mock iron rule: all DB tests use a real PostgreSQL, no SQLite/MagicMock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:22:57 +08:00
OG T
e672635edf fix(test): update TestHistoryMessageFormat for the Phase 27 two-tier strategy
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 01:12:00 +08:00
OG T
2bc2a2f174 test(integration): drift API + DB persistence integration tests
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Covers GET /drift/reports and POST /drift/internal/scan
Verifies that the DB has new data after a scan (extends the B5 integration test framework)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 23:36:17 +08:00
OG T
1e1f24c561 fix(test): update ComplexityScorer model name llama3.2:3b → gemma3:4b
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-09 09:01:59 +08:00
OG T
65e1edb0ad feat(web): OpenClaw-style lobster SVG + three-color status lights + test fix
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m39s
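Frontend:
- OpenClawLobster: brand-new SVG (based on dashboardicons.com/icons/openclaw)
  rounded body + big eyes + claws + antennae + smile + small legs
- Three color variants: red (abnormal/default) / green (healthy) / yellow (warning)
- LobsterLoading switched to the new SVG

Test fix:
- test_nemotron_failure_still_returns_proposal: func_body slice 5000→10000
  Reason: the function exceeds 5000 characters, so rfind could not find the final return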

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:55:21 +08:00
OG T
b380b6a34c fix(ci): fix nemotron test function-body truncation 5000→10000 characters
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 21:09:19 +08:00
OG T
170ce2f11d fix(ci): fix tests and the Sprint 5.2 deployment scripts
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 1m38s
tests/test_auto_repair_service.py:
  - updated 3 tests to match the 2026-04-07 commander directive that removed the thresholds
  - APPROVED Playbooks pass directly (low similarity/low quality/high risk all pass)

tests/test_phase22_nemotron_collab.py:
  - updated log key: nemotron_collaboration_failed → exhausted

ops/monitoring/docker-compose.exporters.yaml:
  - fixed postgres DSN: awoooi:awoooi_prod_2026@localhost:5432/awoooi_prod

Sprint 5.2 new scripts:
  - scripts/sprint51_e2e_validation.py: L7 E2E acceptance script (T1-T5)
  - scripts/ops/deploy-docker-health-monitor.sh: Plan A one-click deployment script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:17:48 +08:00
OG T
b20a619a3d fix(ci): CD fix: shared-types type sync + test cold-start conflict
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Successful in 1m2s
1. pnpm shared-types generate: syncs the Pydantic models added in Sprint 4
2. test_evaluate_not_high_quality fix: adds a MEDIUM risk step to avoid
   accidentally taking the cold-start path (Redis not initialized → COLD_START_DAILY_LIMIT)

11/11 auto_repair tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:09:17 +08:00
OG T
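2fe8062fb8 refactor(api): Re-Review S1/S2/S3 improvements: deduplication + defensive validation + test isolation
S1: extracted the shared _execute_and_observe() method
  - removes 3 duplicated execute+audit+langfuse blocks in repair_by_uri
  - unifies the AuditLog + Langfuse trace write path

S2: defensive SSH username validation
  - added validate_ssh_user() + the _SSH_USER_RE regex
  - validates the user parameter at the entry of _ssh_execute()
  - prevents unexpected behavior from user@host concatenation
  - added 8 username validation tests

S3: Singleton test reset
  - added the _reset_for_test() classmethod
  - avoids cross-test state pollution
  - added 2 singleton reset tests

Tests: 55/55 all passing (45 existing + 10 new)
Chief Architect Re-Review: 91/100, passed; all 3 Suggestions implemented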

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-07 11:17:40 +08:00
OG T
f8d4772abf fix(api): Sprint 3 P0-1/P0-2/P0-3/P0-4 Critical Security Fixes
P0-1: Complete shell metacharacter regex detection
  - Enhanced _SHELL_METACHAR_RE to detect: >, <, \n, ${}, $()
  - Prevents all shell injection vectors (redirects, variable expansion, newlines)
  - Added 5 new validation tests

P0-2: Add shlex.quote() protection for ansible playbook path
  - Wraps playbook_path in shlex.quote() before SSH command construction
  - Prevents shell injection if path contains special characters
  - Applied in _execute_ansible() method

P0-3: Add SSH target host whitelist validation
  - Introduces validate_ssh_target_host() function
  - Only allows SSH to: 192.168.0.110, 192.168.0.188
  - Prevents unauthorized SSH target exploitation
  - Added 5 new whitelist validation tests

P0-4: Convert HostRepairAgent to singleton pattern
  - Implements __new__() singleton with shared _in_process_locks dict
  - Ensures in-process locks persist across multiple auto_repair_service calls
  - Previously created new instance per call, making locks ineffective
  - Added singleton persistence test

Test Results: 45/45 passing (34 existing + 11 new P0 tests)
All security validations verified via comprehensive unit test coverage.
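P0-1 and P0-2 combine a reject-list regex with quoting. The sketch below illustrates both; the character class is a hypothetical reconstruction (the commit names >, <, newline, ${} and $() as detected constructs, the other characters are common additions), and build_ansible_command is an illustrative helper, not the project's _execute_ansible().

```python
import re
import shlex

# Hypothetical pattern; the real _SHELL_METACHAR_RE is not shown in the log.
SHELL_METACHAR_RE = re.compile(r"[;&|<>`\n\r]|\$\(|\$\{")


def has_shell_metachar(value: str) -> bool:
    """Return True when the value contains a shell metacharacter."""
    return SHELL_METACHAR_RE.search(value) is not None


def build_ansible_command(playbook_path: str) -> str:
    """Quote the path so the remote shell sees it as a single word."""
    return f"ansible-playbook {shlex.quote(playbook_path)}"
```

shlex.quote() leaves a path made only of safe characters unchanged and wraps anything else in single quotes, so a path containing a space or a semicolon cannot split into extra shell words.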

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-07 11:09:45 +08:00
OG T
a4e11bfa92 feat(api): AuditLog + Langfuse Trace for SSH_COMMAND (Sprint 3 T5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 14:38:59 +08:00
OG T
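4561f141bb feat(api): Redis idempotency lock to prevent duplicate repairs (Sprint 3 T4)
Two-layer lock design: in-process asyncio.Lock (always effective) + Redis distributed lock (cross-Pod, best-effort)
A second repair call for the same URI immediately returns an "already running" error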
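A rough sketch of the two-layer design, under several assumptions: the class and method names are hypothetical, the Redis layer is represented by an injected async callable (for example a wrapper around SET with NX and EX), and releasing the remote lock is left to key expiry.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical hook: returns True when the distributed lock was obtained.
RemoteAcquire = Callable[[str], Awaitable[bool]]


class RepairLocks:
    def __init__(self, acquire_remote: Optional[RemoteAcquire] = None) -> None:
        self._locks: Dict[str, asyncio.Lock] = {}
        self._acquire_remote = acquire_remote

    async def run_once(self, uri: str, repair: Callable[[], Awaitable[str]]) -> str:
        lock = self._locks.setdefault(uri, asyncio.Lock())
        # Layer 1: in-process lock, always enforced.
        if lock.locked():
            return "already running"
        async with lock:
            # Layer 2: distributed lock, best-effort across Pods.
            if self._acquire_remote is not None:
                try:
                    if not await self._acquire_remote(uri):
                        return "already running"
                except Exception:
                    pass  # a Redis outage must not block repairs
            return await repair()
```

The lock dictionary only helps if the same instance is reused between calls, which is why a shared (singleton) agent matters for this design.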

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 14:26:53 +08:00
OG T
1a654aa37d feat(api): HostRepairAgent three execution paths + known_hosts + Ansible whitelist (Sprint 3 T3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 14:22:54 +08:00
OG T
5e8b2a6894 feat(api): URI scheme parser + shell injection protection (Sprint 3 T1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 14:18:21 +08:00
OG T
8d496e84e2 fix(test): update action_parsing tests: the default namespace without -n is now awoooi-prod
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
action_planner.py default_namespace is already awoooi-prod; the test expectations are updated to match.
kubectl commands that explicitly specify -n default are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 11:49:24 +08:00
OG T
5499169996 feat(auto-repair): close the auto-repair loop (ADR-058)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 53s
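Problem: the alert pipeline never called auto_repair_service, so the mechanism was a complete dead end
Fix:
1. webhooks.py: alertmanager_webhook triggers _try_auto_repair_background after creating an Incident
2. playbook.py: is_high_quality thresholds lowered (cold-start period)
   - success_count: 10 → 3
   - success_rate: 95% → 80%
3. tests: test_evaluate_not_high_quality updated for the new thresholds

Flow: Alertmanager → API → Incident → evaluate → P2 or lower + high-quality Playbook → automatic execution → Telegram notification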

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:08:08 +08:00
OG T
9629367bc2 fix(webhook): Gitea signature format fix: plain hex, no sha256= prefix
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 13m12s
Gitea's X-Gitea-Signature sends plain hex (unlike GitHub's X-Hub-Signature-256)
- router: accepts both formats (backward compatible)
- tests: generate_signature changed to plain hex (matching Gitea's actual behavior)
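Accepting both header formats could look like this. It is a sketch assuming HMAC-SHA256 over the raw request body; verify_signature is a hypothetical name, not the router's actual function.

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Accept plain hex (Gitea) or "sha256=<hex>" (GitHub style)."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    received = header_value.strip()
    if received.startswith("sha256="):
        received = received[len("sha256="):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.lower().encode())
```

Stripping the optional prefix before comparing keeps a single code path for both senders.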

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 15:40:40 +08:00
OG T
d9af8e1c7a docs(logbook): record completion of the ADR-059 Gitea Webhook migration
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:45:02 +08:00
OG T
23364423fa feat(webhook): ADR-059 GitHub → Gitea Webhook migration complete
- gitea_webhook.py: all headers changed to X-Gitea-*; workflow_run handler removed
- gitea_webhook_service.py: _fetch_pr_diff now uses httpx directly and no longer depends on github_api_service
- removed all leftover github_ log keys from both files; review_id prefix changed to gitea-
- test_gitea_webhook.py: 10/10 passing, docstring fixed
- 03-secrets.yaml: added a GITEA_WEBHOOK_SECRET placeholder
- cd.yaml: added the GITEA_WEBHOOK_SECRET injection step
- ADR-059: created the architecture decision record

Pending commander action: Gitea Actions secret + Gitea UI Webhook configuration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:44:32 +08:00
OG T
b5905ae283 fix(test): root-cause fix for the test_github_webhook.py segfault: switch to a minimal app
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Root cause:
  from src.main import app
  → imports every route of the whole FastAPI application
  → src.api.v1.knowledge → knowledge_service → knowledge_repository
  → sqlalchemy.ext.asyncio (C extension) → asyncpg.protocol.protocol
  → CI runner (catthehacker/ubuntu:act-22.04) segfault (exit 139)

Fix:
  Use a minimal FastAPI app that mounts only the github_webhook router
  github_webhook's import chain: config → redis_client → structlog
  It never touches DB / sqlalchemy / asyncpg, so there is no C extension segfault risk

Result:
  - test_github_webhook.py is back in the CI test run
  - removed --ignore=tests/test_github_webhook.py from cd.yaml
  - 8 tests covering HMAC signatures, whitelist, event types, and more

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:36:24 +08:00
OG T
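4b24ecd67f fix(sprint3): Chief Architect Review C1/C2/C3/M3/m1 fixes
C1: _ssh_execute receives the key_path parameter directly instead of looking it up in LAYER_SSH_CONFIG
C2: PlaybookService.create() proxy; the Router no longer reaches through to _repository
C3: CD Step 1b replaces IMAGE_TAG_PLACEHOLDER with sed, removing the risk of an interruption on failure
M3: repair-bot 110/188 regex unified to [a-z0-9][a-z0-9-]{0,30}; underscores are forbidden
m1: defaultMode 0400 gets a comment explaining the octal value
m2: _ssh_execute uses a deadline to compute the remaining timeout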

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:07:59 +08:00
OG T
e7d8da85f6 feat(api): HostRepairAgent: SSH host-layer repair (Task 11)
- host_repair_agent.py: layer routing, command injection protection, asyncio SSH execution
- Tests: 12 cases all passing (routing/sanitize/success/fail/timeout/denied)
- SSH key: /etc/repair-ssh/id_ed25519 (K8s secret mount)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:22:00 +08:00
OG T
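5ad403b287 fix(p0): v4.3: measurements confirm CPU-only Ollama is unusable; DIAGNOSE goes through NIM throughout
Measured evidence (2026-04-05):
- Ollama llama3.2:3b CPU-only: 238s to return {"ok":true}, unusable in production
- Nemotron NIM: 2.2s~27.3s, avg 10.6s, has been the workhorse all along (since Phase 22)
- NIM has never had a privacy issue; Incident data has always been sent to cloud GPUs

Changes:
- ai_router.py: _local_fallback_chain deprecated (empty list)
- ai_router.py: DIAGNOSE route/route_sync reverted to _full_fallback_chain
- config.py: timeout notes updated to reflect the measurements
- test_p0_diagnose_routing.py: docstring updated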

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 01:49:06 +08:00
OG T
4bc4757fdc test(phase25): Phase 25 P1/P2 source code inspection tests (36 tests)
- test_phase25_auto_harvesting.py: 18 tests for NemotronRunbookGenerator,
  AntiPattern gate, fire-and-forget pattern, symptoms_hash
- test_phase25_drift_detection.py: 18 tests for DriftDetector, NemotronDriftInterpreter
  (read-only), DriftRemediator, local fallback chain for DIAGNOSE

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 00:14:50 +08:00
OG T
688146ef9c test(ai-router): test_fallback_list >= 2 changed to >= 1
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
After the DIAGNOSE local chain picks Nemotron, only Ollama remains as a fallback;
the >= 2 assertion was too strict (same fix as test_query_routes_to_ollama)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:05:25 +08:00
OG T
428ed5f8cd test(ai-router): fix the test_query_routes_to_ollama assertion
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 41s
After Phase 25 P0, DIAGNOSE uses _local_fallback_chain [NEMOTRON, OLLAMA]:
NEMOTRON is picked as primary, leaving only OLLAMA as a fallback,
so the >= 2 assertion was too strict and is now >= 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:02:43 +08:00
OG T
8056be5847 feat(ai-router): DIAGNOSE intent override upgraded to Nemotron (P0)
2026-04-04 17:41:45 +08:00
OG T
671974dedb test(ai-router): TestLocalFallbackChain: require_local privacy-boundary verification (P0)
Some checks failed
CD Pipeline / build-and-deploy (push) Failing after 43s
Adds two tests: cloud providers are skipped + total failure returns local_providers_unavailable.
The implementation already exists in AIRouterExecutor.execute() (2026-04-04 ogt Phase 25 P0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 17:32:32 +08:00
OG T
ffd679f5d3 feat(nemotron): per-task timeout; DIAGNOSE uses its own timeout setting (P0)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:58:23 +08:00
OG T
b6e12f74f4 test(phase22): Phase 22.4 Nemotron collaboration tests 18/18 PASSED
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m12s
- fixed file path: apps/api/src/ → src/ (tests run from the apps/api/ directory)
- enlarged snippet size: 800→1500 chars (an overlong docstring pushed the flag check out of range)
- enlarged _call_nemotron_tools snippet: 2000→5000 chars (the timeout is in the later part of the function)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 12:16:28 +08:00
OG T
f6567751a9 test(knowledge): pgvector semantic search integration tests (5 tests)
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
- test_save_embedding: CAST AS vector syntax check
- test_semantic_search_returns_results: cosine similarity query
- test_semantic_search_threshold_filters: orthogonal vectors are filtered out by the threshold
- test_semantic_search_archived_excluded: archived entries do not appear
- test_list_unembedded_entries: lists entries that have no embedding yet

All 5/5 PASSED (awoooi_dev PostgreSQL + pgvector)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 11:55:09 +08:00
OG T
5e836bde24 test(integration): add real-DB integration tests: knowledge_repository + API E2E (2026-04-04 ogt)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m18s
- tests/integration/conftest.py: connects to awoooi_dev PostgreSQL, rollback after each test
- tests/integration/test_knowledge_repository.py: 23 real-DB tests
  - create/get_by_id/list/update/delete (soft delete)/search/categories/view_count
- tests/integration/test_incident_api.py: 7 HTTPS endpoint tests
  - health check + knowledge API smoke test
- follows the no-mock iron rule (feedback_no_mock_testing.md)
- local verification: 30/30 PASSED

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 02:35:38 +08:00
OG T
2e9845074e fix(test): nvidia → openclaw_nemo to align the RATE_LIMITS/COST_LIMITS keys (I3)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 6m57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 14:00:21 +08:00
OG T
6266a4fc01 fix(test): update AIProviderEnum tests: NVIDIA → NEMOTRON (Phase 24 B3)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- test_nvidia_provider_in_router: now verifies the NEMOTRON enum
- test_tool_calling_route: now expects the NEMOTRON provider
- test_existing_routing_not_affected: excludes NEMOTRON (not part of general routing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:39:46 +08:00
OG T
5a7919f55c fix(test): AIProvider → AIProviderEnum (Phase 24 C1 rename fix)
All checks were successful
CD Pipeline / build-and-deploy (push) Successful in 7m11s
E2E Health Check / e2e-health (push) Successful in 16s
The C1 fix (3ad7b60) renamed the AIProvider Enum to AIProviderEnum;
test_nvidia_provider.py was not updated to match, which broke the CD tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 19:38:04 +08:00