Your Name
9908fdf50d
feat(p3.1-t2-patha): DiagnosisAggregator 路徑 A + Solver F4 critical reject + 對齊測試
...
CD Pipeline / build-and-deploy (push) Failing after 1m59s
Wave 8 P3.1-T2 PathA 啟用 + Solver F4 安全強化 + test 對齊:
PathA — DiagnosisAggregator 信號分類層補 PDI:
- ENABLE_DIAGNOSIS_AGGREGATOR default=False → True
· PathA 純信號分類層(OOMKilled/CrashLoop 等業務邏輯)
· 不重複呼叫 K8s/SignOz API(只取 PDI 已收集的 raw 資料)
· 安全 default on — 純邏輯處理,無外部依賴重疊
- diagnosis_aggregator.py +155 行(PathA 實作)
- pre_decision_investigator.py 已接 (commit 3a2cd151 )
F4 — Solver critical risk reject:
- solver_agent.py: _validate_recommended_action 拒絕 risk=critical
· 鐵律:critical 動作必須走人工審批,不可變 Telegram 按鈕
· log warning + return None(被 _extract 過濾掉)
- _extract_recommended_actions 改返回 (list, status_str) tuple
· status="ok"/"empty"/"all_invalid" 供呼叫端決策
- protocol.py +16 / metrics.py +9 / ai_router.py +18 — 配套 metric + protocol field
測試對齊:
- test_solver_recommended_actions.py 拆 test_all_valid → low/medium/high accepted +
test_critical_rejected
- result tuple unpack: result, _ = _extract_recommended_actions(...)
- test_diagnosis_aggregator_stub.py: feature flag default 改 True 對齊 PathA
Tests: 51 passed (solver 28 + aggregator 16 + router fallback 8)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
Co-Authored-By: Multiple Engineers (Wave 8 P3.1-T2 PathA + F4) <noreply@anthropic.com >
2026-04-27 14:42:29 +08:00
Your Name
3a2cd15144
feat(p3.1-t2): Tier-2 三服務感知強化 — Sentry 簽章 + DiagnosisAggregator + Solver actions test
...
CD Pipeline / build-and-deploy (push) Has been cancelled
Wave 8 P3.1-T2 三項感知強化(多 engineer 補完):
Sentry Webhook 簽章驗證:
- sentry_webhook.py: 接入 SentryWebhookService.verify_sentry_signature()
- 拒絕無效 sentry-hook-signature → 401 → 防偽造攻擊
DiagnosisAggregator Pod 深診斷整合:
- pre_decision_investigator.py: 新增 _collect_diagnosis_aggregator()
- ENABLE_DIAGNOSIS_AGGREGATOR feature flag 守衛(default=False)
- evidence_snapshot.py: extra_diagnosis 欄位 + build_summary 顯示
- timeout=3.0s + try/except 隔離(fail-soft)
- Conservative 策略:待重疊分析確認 vs PreDecisionInvestigator 不重複
config.py:
- 新增 ENABLE_DIAGNOSIS_AGGREGATOR Field(default=False,K8s ConfigMap 動態啟用)
Solver B1 補測(commit 7c726ebc 對應):
- test_solver_recommended_actions.py — 20 tests + 3 skipped
- 驗證結構化 recommended_actions(北極星 §1.1 修復多樣性 ≥ 40%)
- LLM 失敗 graceful degraded(candidates=[], degraded=True)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
Co-Authored-By: Multiple Engineers (Wave 8 P3.1-T2) <noreply@anthropic.com >
2026-04-27 08:24:15 +08:00
Your Name
cc547736ab
feat(wave6-8): P2.1 fusion + P2.2 governance + P2.4 consensus + Wave 7/8 BLOCKER 修復
...
承接 Wave 6/7/8 多 engineer 在 agent 限額前完成的代碼,補 commit 解 production
HEAD 隱性 import error(decision_fusion 已被 decision_manager 引用但檔案 untracked)。
新增(後端核心):
- decision_fusion.py (562 行) — P2.1 方法 III(OpenClaw + Hermes + Elephant 三 LLM 融合)
- aiops_timeline.py + aiops_timeline_service.py — critic B4 修復
/api/v1/aiops/timeline endpoint,DB 存取抽到 service 層遵守 leWOOOgo 積木化
- migrations/p2_decision_fusion_columns.sql + rollback — approval_records fusion 欄位
修改(後端整合):
- decision_manager.py — fusion 三斷鏈修補(critic B1+B2+B3):
· B1: 寫 _evidence_snapshot_ref 到 token.proposal_data
· B2: fusion 前計算 complexity_score 並寫 token
· B3: fusion composite 寫 token.proposal_data["decision_fusion"]
- auto_approve.py — fusion + consensus 認識(critic B3+B5):
· composite > 0.7 → auto_execute_eligible bypass min_confidence
· source=consensus_engine + score>=0.6 → 規則可信路徑
- consensus_engine.py — db-fix _save_consensus 重用 agent_sessions
- governance_agent.py — db-fix _alert PG 寫入 ai_governance_events
- approval_db.py — fusion 3 欄位 + 2 partial index + CheckConstraint
- db/models.py — schema 對齊 migration
- core/config.py — vuln #1 修復:OLLAMA_URL/_FALLBACK_URL field_validator
拒絕公網 IP + 外部域名,僅允許私網/loopback/K8s SVC 白名單
- core/feature_flags.py — P2 fusion + consensus flags
- main.py — governance_agent lifespan 啟動
- failover_alerter.py — Wave8-X2: in-memory dedup fallback(Redis 拒絕後不 fail-open)
- ollama_*.py — metrics 整合 + recovery 改善
- auto_repair_service.py — verifier 接線
新增(測試 2438 行):
- test_decision_fusion.py / test_governance_agent.py / test_consensus_integration.py
- test_p2_db_fixes.py / test_wave8_fusion_fixes.py
- test_config_url_validation.py(vuln #1 12 tests)
- test_failover_alerter.py +Wave8-X2 in-memory dedup 補測
驗收: 116 tests pass (decision_fusion + wave8_fusion + config_url + consensus +
governance + p2_db_fixes + failover_alerter)
Conflict resolution:
- 3 檔(config.py + auto_approve.py + decision_manager.py)git stash pop 衝突
保留 stashed (engineer 最終版),補回 ValueError 「公網 IP」字樣對齊 test
Note: 此 commit 解 production HEAD 隱性 import error
仍未修: vuln #4 prompt injection / debugger B14 quota fail-closed
/ B25-B26 drain_pending_tasks / B8 governance fail alert
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
Co-Authored-By: Multiple Engineers (Wave 6/7/8) <noreply@anthropic.com >
2026-04-27 08:11:40 +08:00
Your Name
55c6b4e2d9
feat(p1): Ollama 多層容災系統 — P1.1 健康檢測 + P1.2 ai_router 整合 + P1.5 容災告警
...
ADR-092 P1 飛輪閉環的 Ollama 失敗轉移子系統,全部 Engineer-A2/C/C2 補上。
新服務 (1581 行):
- ollama_health_monitor.py (356):3 層健康檢測(TCP/HTTP/推理)
- ollama_failover_manager.py (571):111→188 自動切換 + Redis 持久化 + recovery callback
- ollama_auto_recovery.py (436):30s 背景監控 + 連續 3 次 HEALTHY → 切回 + clear_cache
- failover_alerter.py (218):P1.5 Telegram 容災告警
服務整合:
- ai_router.py: AIProviderEnum.OLLAMA_188 + 120s budget + failover fallback chain
- main.py lifespan: 啟動時 wire callback + start recovery,關閉時優雅 stop
- config.py: OLLAMA_FALLBACK_URL / OLLAMA_HEALTH_CHECK_MODEL / GEMINI_DAILY_QUOTA(帳單熔斷)
K8s 配置:
- 04-configmap.yaml.patch-188-fallback:注入 OLLAMA_FALLBACK_URL=http://192.168.0.188:11434
測試 (2082 行):
- test_ollama_health_monitor.py (402)
- test_ollama_failover_manager.py (707)
- test_ollama_auto_recovery.py (580)
- test_ai_router_failover_integration.py (257)
- test_lifespan_failover_wiring.py (136)
依賴鏈:service 三件套 + ai_router + main.py 一起 commit,缺一就 ImportError。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-26 20:18:33 +08:00
Your Name
86ee013cdf
feat(hermes-complete): Hermes NL 三項補強 + ConsensusEngine + ADR 收尾
...
CD Pipeline / build-and-deploy (push) Successful in 9m32s
## Hermes NL 補強(nl_gateway.py)
- T1 hermes_dispatch_log DB 寫入(asyncio.create_task 非阻擋)
- T2 Redis 速率限制:per-chat_id 20 req/min,fail-open
- T3 Multi-turn session:hermes:session:{chat_id}:{user_id} TTL=300s,最近 3 輪
## ConsensusEngine(ADR-095 宣告式設計)
- consensus_engine.py: CONSENSUS_WEIGHTS class 屬性
security=0.4 鎖定,9 個 Claude Code agent 分配 0.6
- config.py: ENABLE_12AGENT_CONSENSUS=False feature flag
## ADR 狀態
- ADR-093/094/095: Proposed → 🟡 批准實作中
- 各 ADR 加 v1.1 變更紀錄
## K8s ConfigMap
- prod 04-configmap.yaml: 加 3 個 feature flags(均 false)
- dev 02-configmap.yaml: 同步加入
## LOGBOOK
- 記錄 WS0–WS6 + 補強完成,feature flags 啟用指引
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-25 02:22:40 +08:00
Your Name
2572ec46d2
feat(ws4): Hermes NL 自然語言介面 — 12-Agent Claude SDK 接入(ADR-094/095)
...
## hermes/ 套件(5 個新模組)
### display_names.py
- 12 agent 視覺識別表(emoji + hashtag + handle + short_name)
- format_response_header() 產生 Telegram 前綴
### agent_loader.py
- 解析 .claude/agents/*.md frontmatter → system prompt
- lru_cache 避免重複讀檔
### safety_hooks.py
- 移植 awoooi-guard.js 20 條 HARD BLOCK 規則(DENY_PATTERNS)
- 5 條 MUTATE_PATTERNS → 須走審批流
### nl_gateway.py
- Layer 1: 關鍵字正則路由(12 條規則,<10ms)
- Layer 3: DEFAULT_AGENT = "debugger"
- Claude Agent SDK query() 非同步串流,取 ResultMessage.result
- 安全降級:SDK error → 友好錯誤訊息
### telegram_webhook.py
- WS4 Hermes NL 接入(@tsenyangbot mention 或私訊觸發)
- HERMES_NL_ENABLED=False(feature flag 保護,預設關閉)
## telegram_gateway.py
- send_hermes_reply(text, chat_id, reply_to_message_id)
無 500 字截斷,支援 Agent 長回覆
## config.py
- HERMES_NL_ENABLED: bool = False
- TELEGRAM_BOT_USERNAME: str = "tsenyangbot"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-25 02:10:06 +08:00
Your Name
294e0e3387
feat(ws3): ADR-093 Callback User-ID Binding + ADR-094 Webhook 入口
...
## T3.1/T3.2 Bound User Check(security_interceptor.py)
- verify_callback() Step 0: 檢查 Redis cb_bind:{nonce}
→ 若有 binding 且 caller != bound_user_id → UserNotWhitelistedError
→ 若 key 不存在(舊格式)→ 降級走 whitelist(向後相容)
→ 若 Redis unavailable → 降級繼續(安全降級)
- bind_callback_user(nonce, user_id): async 方法,TTL=48h
## T3.3 Telegram Webhook 入口(ADR-094)
- apps/api/src/api/v1/telegram_webhook.py(新建)
POST /api/v1/telegram/webhook
- X-Telegram-Bot-Api-Secret-Token header 驗證
- TELEGRAM_WEBHOOK_SECRET="" → dev 跳過(不 break 現有測試)
- WS4 Hermes NL 接入預留佔位
## T3.4 config.py
- 新增 TELEGRAM_WEBHOOK_SECRET field(預設空字串)
## main.py
- 掛載 telegram_webhook_v1.router 到 /api/v1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-25 02:10:06 +08:00
Your Name
6d5fd3c124
feat(ws2): ADR-093 路由統一 — BIGINT + NotificationMatrix + feature flag
...
## 修復
### T2.1 BigInteger overflow 修復
- `db/models.py`: telegram_chat_id Integer → BigInteger
(原 int32 無法容納群組 ID -1003711974679)
### T2.2 移除 CAST workaround
- `approval_db.py:739`: 移除 CAST(:telegram_chat_id AS BIGINT)
ORM 已正確使用 BigInteger,workaround 可退役
### T2.3 Redis key 一致性修復
- `heartbeat_report_service.py:575`: telegram:polling_leader → telegram:polling:leader
(telegram_gateway.py 使用冒號分隔,heartbeat 用底線是 bug)
## 新增
### T2.4 notification_matrix.py
- `services/notification_matrix.py`: ADR-093 路由矩陣
- Destination(DM/GROUP/BOTH) + RoutingRule dataclass
- NOTIFICATION_ROUTING dict(TYPE-1 ~ TYPE-8M 完整映射)
- resolve_chat_ids(type, dm, group, *, tg_group_cutover=False) 灰階切流 API
### T2.5 telegram_gateway.py feature flag 保護
- line 43: 加 notification_matrix import
- line 1827-1834: TG_GROUP_CUTOVER=False 時維持舊行為
TG_GROUP_CUTOVER=True 時解除 _interactive_types 黑名單,由矩陣控制
### T2.6 Migration SQL
- `migrations/adr093_notification_routing.sql`:
- CREATE TABLE approval_records (telegram_chat_id BIGINT)
- CREATE ROLE awoooi_migrator (IF NOT EXISTS)
- 含舊環境 ALTER COLUMN int→bigint 保護
## 測試同步
- `tests/integration/setup_test_schema.sql`: telegram_chat_id BIGINT
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-25 02:10:06 +08:00
Your Name
d0591c54b0
fix(security): 體健修復 — 7項 Critical/Major 安全問題全修
...
CD Pipeline / build-and-deploy (push) Failing after 35s
## Critical 修復 (C1-C5)
- C1: git rm --cached 03-secrets.yaml(CHANGE_ME 模板不再追蹤)
- C2: git rm --cached awoooi.db + .gitignore 加 *.db(SQLite HARD_RULES 違規)
- C3: sentry-tunnel SENTRY_HOST 改為 process.env fallback
- C4: config.py DATABASE_URL 移除 changeme default,改為必填
- C5: run_migration.py 改為 os.environ["DATABASE_URL"]
## Major 修復 (M1-M4)
- M1: auto_repair /execute 加 CSRF 保護 + AutoRepairPanel.tsx 同步
- M2: drift /rollback /adopt 加 CSRF 保護(/internal/scan 保持無 CSRF)
- M3: terminal /intent 加 CSRF 保護 + terminal.store.ts 同步
- M4: live-dashboard HOST_IPS + host-grid VIP 改為 env var
## 其他
- 新增 apps/web/.env.example(6 個 env var 說明)
- K8s deployment-web 補入 3 個新 env var
- 整合測試:新增 aider_event_repository + ai_router_feedback 真實 DB 測試
- test_terminal.py CSRF dependency override 修復
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-22 01:27:39 +08:00
Your Name
e1539a813e
feat(config+main): aider-watch v2 settings + router + lifespan register
...
- Add 4 settings to config.py: AIDER_WEBHOOK_SECRET, AIDER_EVENTS_STREAM_KEY, AIDER_PATTERN_EXTRACT_INTERVAL_HOURS, USE_AIDER_FEEDBACK (ADR-091)
- Import aider_events_v1 router in main.py imports (alphabetical after ai_slo_v1)
- Register aider_events_v1.router in include_router block (after alert_operation_logs_v1)
- Register run_aider_event_processor_loop() in lifespan (after compliance_scanner_loop)
- All 65 tests pass (24 action_parsing + 41 aider-watch tests)
Co-Authored-By: Claude Haiku 4.5 (1M context) <noreply@anthropic.com >
2026-04-20 19:40:02 +08:00
OG T
8b7e9cbfb8
fix(BLOCKER): LLM 連續失敗 — 4 個違反設計處全部修復
...
CD Pipeline / build-and-deploy (push) Successful in 14m21s
統帥盤點發現飛輪沉默真因:4 個違反既定架構設計的 bug 同時撞車。
P0a — Ollama timeout 違反 GAP-B4 設計
config.py:OPENCLAW_TIMEOUT 從 120s 改 30s
原 120s 違反 ADR-052 GAP-B4 (LLM 25s hard timeout) 設計
致 Ollama 過載時 thread 飢餓 120s 才降級
P0b — AI Router silent skip 觀測性修復
ai_router.py: not_registered/circuit_open/rate_limit/privacy_skip
全部累積到 errors 陣列,log all_providers_failed 時可知為何 skip
原本 errors=["ollama: Timeout"] 但 tried=4 個,無法診斷
P1a — send_text 方法不存在 bug
ai_router.py:1005 tg.send_text() → tg.send_notification(parse_mode=HTML)
TelegramGateway 只有 send_notification 沒 send_text
致 fallback 失敗通知本身失敗(雙重靜默)
P1b — resend_stale_ready_tokens 並發爆炸
decision_manager.py: 加 asyncio.Semaphore(5) + 200ms throttle
原本 fire_and_forget N 個 task 同時跑,N=108 時 Ollama embedding
全部 timeout,包括我打的 live-fire 也被擠爆
改:max 5 並發 + 每完成喘 200ms
CD 流程審查 (Blocker 1): 完全符合 ADR-039 設計,10-15 min 是預期
不需修,是設計就需要這時間。
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2026-04-14 19:37:03 +08:00
OG T
00a31abb85
feat(heartbeat): ADR-073 P2 心跳整合重構 — HeartbeatReportService + RedisLock
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- 新增 HeartbeatReportService:11 個並行探針(Ollama/Nemotron/Gemini/Claude/MCP×4/ArgoCD/Velero)
- 重寫 send_heartbeat():RedisLock 防重發 + 統一發送 SRE_GROUP_CHAT_ID
- 簡化 _heartbeat_loop():移除散落的 silence 多次發送
- config.py:新增 OLLAMA_REQUIRED_MODELS 欄位
- 03-secrets.example.yaml:補 SRE_GROUP_CHAT_ID 確保 CD Inject 不遺漏
2026-04-12 ogt (ADR-073 Phase 2-3/4)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-12 15:35:13 +08:00
OG T
184b37a8b1
refactor(decision_manager): I2 DI 化 MCP Providers + fix config list type bug
...
- DecisionManager.__init__ 注入 SSHProvider/K8sProvider,移除函數內 import+實例化
- config.get_tg_user_whitelist() 支援 list 輸入(monkeypatch/直接傳入),修復 AttributeError
- LOGBOOK 更新(test fix 6e0ee8b)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-12 13:04:46 +08:00
OG T
5d78c5492b
feat(argocd-mcp): 啟用 ArgoCD MCP Provider + token 注入流程
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- config.py: ARGOCD_URL → https://192.168.0.125:30443(實際 HTTPS NodePort)
- config.py: ARGOCD_MCP_ENABLED=True + SENTRY_MCP_ENABLED=True(預設啟用)
- cd.yaml: 新增 ARGOCD_API_TOKEN Gitea Secret → K8s Secret 注入步驟
- K8s: ARGOCD_API_TOKEN 已手動注入 awoooi-secrets + API pods 已 rollout restart
- ArgoCD: 已開啟 admin account apiKey capability
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 09:32:28 +08:00
OG T
a2cc985f60
feat(mcp-phase3): ArgoCD MCP + Sentry MCP + 完整 Provider 註冊
...
CD Pipeline / build-and-deploy (push) Has been cancelled
ArgoCDProvider (3 工具):
- argocd_list_apps: 列出所有 App + sync/health 狀態
- argocd_get_app_status: 詳細狀態 + 問題資源清單
- argocd_get_sync_history: 最近 N 筆部署記錄
- 輸入驗證: app_name 白名單 regex
- 需 ARGOCD_API_TOKEN + ARGOCD_MCP_ENABLED=true
SentryProvider (3 工具):
- sentry_list_issues: 列出最近 Issues(狀態過濾)
- sentry_get_issue: 詳情 + stacktrace 最後 5 frames
- sentry_search_issues: PromQL 風格搜尋
- issue_id 白名單驗證(只允許純數字)
- 需 SENTRY_AUTH_TOKEN + SENTRY_MCP_ENABLED=true
providers/__init__.py: 補上 Prometheus + SSH + ArgoCD + Sentry 全部 10 個 providers
config.py: 新增 ARGOCD_URL / ARGOCD_API_TOKEN / ARGOCD_MCP_ENABLED / SENTRY_MCP_ENABLED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 09:11:53 +08:00
OG T
6351e9a0e9
feat(mcp-phase2): MCP Phase 2 — Prometheus MCP + SSH MCP + alert labels
...
CD Pipeline / build-and-deploy (push) Successful in 13m37s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 35s
MCP-2b: prometheus_provider.py
- prometheus_query (PromQL 即時查詢)
- prometheus_query_range (歷史趨勢,預設 15 分鐘)
- prometheus_get_alert_history (告警觸發歷史)
- config: PROMETHEUS_URL + PROMETHEUS_MCP_ENABLED
MCP-2a: ssh_provider.py
- 群組A 9 個只讀診斷工具 (top/disk/memory/logs/status/port/nginx/swap)
- 群組B 6 個安全操作工具 (restart/compose/systemctl/clear-log/ssl/nginx-reload)
- 四層安全守衛 (白名單/allowed_hosts/forbidden_patterns/trust_score)
- config: SSH_MCP_ENABLED + SSH_MCP_ALLOWED_HOSTS
K8s: 04-ssh-mcp-secret.example.yaml (ssh-mcp-key Secret 範本 + 建立步驟)
Alert labels: alerts-unified.yml 補充 mcp_provider/host_type/alert_category
覆蓋: HostHighCpuLoad/HostOutOfMemory/HostOutOfDiskSpace/DockerContainer*
SignOzDown/SentryDown/HarborDown/GiteaDown
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-11 02:35:35 +08:00
OG T
8c2983b70a
fix(api+web): CORS 補 K3s NodePort origins + sign 補 signer_id/name
...
CD Pipeline / build-and-deploy (push) Has been cancelled
CORS (config.py):
- 補 http://192.168.0.125:32335 (K3s VIP NodePort)
- 補 http://192.168.0.120:32335 + 121:32335 (K3s nodes)
- 修前: 內網瀏覽器開 :32335 打 API 全 CORS blocked
(incidents Failed to fetch / monitoring 無法連線根因)
sign body (pending-approvals-card.tsx):
- signer: 'web-ui' → signer_id: CURRENT_USER.id + signer_name: CURRENT_USER.name
- 修前: POST /approvals/{id}/sign 回 403 (缺必填欄位 422 誤報為 403)
— 實際是 422 Field required signer_id + signer_name
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-09 19:50:48 +08:00
OG T
7857c25677
feat: Ollama 本機 Tool Calling 取代 NVIDIA 雲端 (44s→~5s)
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- nvidia_provider.py: 新增 OllamaToolProvider
- 實作 INvidiaProvider protocol,打 Ollama /v1/chat/completions
- 模型: llama3.1:8b (tool calling 最穩定的 8B)
- 延遲: 44s → ~5s(本機 M1 Pro 192.168.0.111)
- get_nvidia_provider() 根據 USE_OLLAMA_TOOL_CALLING 切換
- config.py: USE_OLLAMA_TOOL_CALLING=True (預設開啟), OLLAMA_TOOL_MODEL=llama3.1:8b
- 回退: USE_OLLAMA_TOOL_CALLING=False → 恢復 NvidiaProvider 雲端
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-09 14:55:04 +08:00
OG T
8b5db2f58e
feat(infra): 切換 Ollama 到 M1 Pro 192.168.0.111 + NetworkPolicy 更新
...
CD Pipeline / build-and-deploy (push) Has been cancelled
- OLLAMA_URL: 188 → 111 (M1 Pro, 40+ tok/s vs 0.45 tok/s)
- OPENCLAW_DEFAULT_MODEL: qwen2.5:7b-instruct → deepseek-r1:14b (SRE最強推理)
- OPENCLAW_TIMEOUT: 90s → 120s (deepseek-r1:14b 實測最慢 54s)
- NetworkPolicy v1.3: 新增 192.168.0.111:11434 egress,移除 188 的 Ollama port
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-08 22:05:14 +08:00
OG T
84f1f9f021
refactor(config): GITHUB_WEBHOOK_SECRET → GITEA_WEBHOOK_SECRET (ADR-059)
2026-04-05 14:25:47 +08:00
OG T
5ad403b287
fix(p0): v4.3 — 實測確認 Ollama CPU-only 不可用,DIAGNOSE 統一走 NIM
...
實測依據 (2026-04-05):
- Ollama llama3.2:3b CPU-only: 238s 回 {"ok":true},生產不可用
- Nemotron NIM: 2.2s~27.3s,avg 10.6s,一直是主力(Phase 22 起)
- NIM 從未有隱私問題,Incident 資料一直送雲端 GPU
變更:
- ai_router.py: _local_fallback_chain 廢棄(空 list)
- ai_router.py: DIAGNOSE route/route_sync 改回 _full_fallback_chain
- config.py: 更新 timeout 說明反映實測結果
- test_p0_diagnose_routing.py: 更新 docstring
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-05 01:49:06 +08:00
OG T
a81bf50537
feat(drift): ADR-057 adopt() Gitea PR API 實作
...
- DriftAdoptService: 透過 Gitea REST API 建立 branch + commit + PR
不在 API Pod 內執行 git(修復 C2 安全漏洞)
- adopt() 端點: 501 → 真實實作(呼叫 DriftAdoptService)
- config.py: 新增 GITEA_API_URL / GITEA_API_TOKEN / GITEA_REPO_OWNER / GITEA_REPO_NAME
- K8s secret awoooi-secrets 已注入 GITEA_API_TOKEN
- drift.py: 移除 trigger_drift_scan 中未使用的 interpreter 變數
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-05 00:39:29 +08:00
OG T
96d5e18924
fix(p0): 實測修正 — timeout 依 benchmark 調整,_local_fallback_chain 移除雲端 Nemotron
...
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=60s (NIM 實測 11-45s + 15s buffer)
- config.py: OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=200s (Ollama 實測 ~173s + 27s buffer)
- ollama.py: 新增 per-task timeout (diagnose/force_local 用 200s)
- ai_router.py: _local_fallback_chain 移除 Nemotron (NIM=雲端,不可進 local chain)
- ai_router.py: v4.2 — Option C 分情境路由正式確立
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-05 00:29:09 +08:00
OG T
3455044457
feat(phase25): Nemotron 主動防禦三方向 P0+P1+P2 完整實作
...
CD Pipeline / build-and-deploy (push) Failing after 38s
Type Sync Check / check-type-sync (push) Failing after 35s
P0 - DIAGNOSE Privacy-First Routing:
- ai_router.py: _local_fallback_chain [NEMOTRON→OLLAMA→REJECT]
- DIAGNOSE 意圖 override 改為 NEMOTRON (原 OLLAMA)
- DIAGNOSE fallback 使用 local-only 鏈,不觸碰雲端
- 全部失敗時 REJECT + Telegram 通知
- config.py: NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS=30, OLLAMA_DIAGNOSE_TIMEOUT_SECONDS=60
- nemotron.py: 根據 context[task_type] 選擇 timeout
P1 - Knowledge Auto-Harvesting:
- models/knowledge.py: EntryType.AUTO_RUNBOOK + ANTI_PATTERN + symptoms_hash
- EntryStatus.PUBLISHED (ANTI_PATTERN 直接發布,無需審核)
- models/playbook.py: SymptomPattern.compute_hash() (16字元確定性 hash)
- services/runbook_generator.py: NemotronRunbookGenerator (v1.1)
- generate_runbook() → AUTO_RUNBOOK (DRAFT) + Telegram 審核 card
- generate_anti_pattern() → ANTI_PATTERN (PUBLISHED) + Telegram 通知
- 使用 nvidia.chat() (正確介面),Nemotron 超時時 Minimal fallback
- knowledge_service.py: check_anti_pattern(symptoms_hash, days=7)
- db/models.py: symptoms_hash VARCHAR(16) + ix_knowledge_symptoms_hash
- repositories/knowledge_repository.py: create() 支援 symptoms_hash + status
- auto_repair_service.py: anti_pattern_gate 在 decide() + runbook hook 在 execute()
- migrations/phase8_symptoms_hash.sql: ALTER TABLE + partial index + PUBLISHED constraint
P2 - Config Drift Detection:
- models/drift.py: DriftItem/DriftReport/DriftLevel/DriftIntent/DriftStatus
- services/drift_detector.py: GitStateReader + K8sStateReader + DriftDetector
- services/drift_analyzer.py: 白名單過濾 + DriftLevel 分級
- services/drift_interpreter.py: NemotronDriftInterpreter(意圖分析,不生成修復指令)
- services/drift_remediator.py: rollback(kubectl apply) + adopt(git push gitea)
- api/v1/drift.py: POST /scan, GET /reports, POST /rollback, POST /adopt
- migrations/phase9_drift_reports.sql: drift_reports 表
- k8s/drift-cronjob.yaml: 每小時自動掃描 CronJob
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-04 12:35:05 +08:00
OG T
c65ed5b1c9
feat(telegram): SRE 戰情室群組三頭政治 Triumvirate (ADR-053)
...
CD Pipeline / build-and-deploy (push) Successful in 7m6s
- config.py: 新增 OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN / SRE_GROUP_CHAT_ID
- telegram_gateway.py: send_to_group / send_as_openclaw / send_as_nemotron / trigger_group_ai_discussion / _send_approval_card_to_group
- send_approval_card 告警發送後非同步觸發群組 AI 雙向討論
- configmap: SRE_GROUP_CHAT_ID=-1003711974679
- secrets: OPENCLAW_BOT_TOKEN / NEMOTRON_BOT_TOKEN CHANGE_ME 佔位
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-03 17:16:05 +08:00
OG T
73e8f8ab77
feat(ai): Phase 24-A+B1 — AI Provider Registry + 絞殺者包裝 (ADR-052)
...
E2E Health Check / e2e-health (push) Successful in 16s
CD Pipeline / build-and-deploy (push) Has been cancelled
Brain Layer 雙軌 Registry 架構:
- 新建 src/services/ai_providers/ 目錄 (interfaces + 4 providers)
- OllamaProvider (local, rca/chat/code_review)
- GeminiProvider (cloud, rca/chat)
- ClaudeProvider (cloud, rca/chat/code_review)
- OpenClawNemoProvider (cloud, rca — 委派 188→NIM)
- 擴展 ai_router.py 加入:
- AIProviderRegistry (動態註冊/啟停)
- AIRouterExecutor (Cache + 閘門 CB/RL/Sem + 執行)
- openclaw.py 絞殺者包裝: USE_AI_ROUTER=true 走新路徑
- config.py + ConfigMap 加入 USE_AI_ROUTER=false (安全預設)
- ADR-052 正式文件 (14 項決策 D1-D14)
- HARD_RULES v1.7 加入 AI Router 規範
安全: USE_AI_ROUTER=false 預設不啟用,需手動開啟觀察
回滾: kubectl set env deployment/awoooi-api USE_AI_ROUTER=false
2026-04-02 ogt: Phase 24 首批實作
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-04-02 13:16:09 +08:00
OG T
c9c60c3a61
feat(mcp-integrations): Phase S 架構修復 + MCP 整合基礎建設
...
E2E Health Check / e2e-health (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Failing after 22s
Phase S 技術債修復 (首席架構師審查 82→完整):
- S-01: generate_alert_fingerprint 移至 AlertAnalyzer.generate_fingerprint() staticmethod
- S-04: 移除 Pydantic v2 deprecated json_encoders (直接用原生 datetime 序列化)
Sentry MCP 整合 (Phase 23):
- ADR-048: Sentry→OpenClaw AI Triage 架構決策
- sentry_webhook_service.py: parse/analyze/create_incident/build_message Service 層
- config.py: SENTRY_WEBHOOK_SECRET (Fail-Closed HMAC-SHA256)
Playwright MCP 整合 (短期):
- smoke.spec.ts: 5 頁面 E2E smoke test (home/dashboard/incidents/approvals/terminal)
- cd.yaml: E2E Smoke Test 步驟 + Telegram 🎭 Smoke 狀態通知
長期規劃 ADR:
- ADR-049: Figma Code Connect 設計系統同步
- ADR-050: Telegram 互動式 Incident 2.0 (6鍵 Inline Keyboard)
- ADR-051: Context7 依賴升級顧問 (Next.js 14→15, FastAPI 0.115→0.128)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-01 16:20:57 +08:00
OG T
22de22c989
refactor(phase-s): Phase S 技術債清理 - 五項架構改善
...
S-01: generate_alert_fingerprint() 移至 alert_analyzer_service (Router→Service)
S-02: 移除廢棄 USE_NEW_ENGINE config (Phase R 已完成歷史使命)
S-03: github_webhook.py linter 清理 (Field unused + delivery_id noqa)
S-04: Pydantic v2 遷移 - approval/incident models (class Config → ConfigDict)
S-05: Skill 09 v1.1 更新 (USE_NEW_ENGINE 廢棄說明)
測試: 393 passed, 零失敗
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-04-01 13:12:02 +08:00
OG T
2ba61acf72
fix(api): Phase R-R2.2 首席架構師 72/100 P2 修復
...
P2-01 signal_worker.py: persisted_to_pg 改用 getattr 防 BrainIncident AttributeError
P2-02 IIncidentEngine Protocol: update_incident_status → update_status 對齊 brain 實作
P2-03 config.py USE_NEW_ENGINE: 標記失效 + 回滾路徑更正 (git revert 而非 kubectl)
ADR-046: Option B (IncidentConverter) 決策完成,待實作清單更新
ADR-024: 審查結論 + 正式回滾指令更新
Skill 02: v2.5 版本記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-03-31 22:33:08 +08:00
OG T
dd526684ab
feat(ai): Phase 22 OpenClaw + Nemotron 協作架構 (ADR-044)
...
E2E Health Check / e2e-health (push) Successful in 17s
統帥批准實作「仲裁-執行分工」架構:
- OpenClaw = 仲裁者 (Why + Risk Level)
- Nemotron = 執行者 (How + kubectl Command)
新增功能:
- config.py: ENABLE_NEMOTRON_COLLABORATION Feature Flag
- openclaw.py: generate_incident_proposal_with_tools()
- openclaw.py: _call_nemotron_tools() Nemotron 呼叫
- telegram_gateway.py: TelegramMessage Nemotron 欄位
- telegram_gateway.py: format_with_nemotron() 雙區塊格式
- decision_manager.py: 整合協作方法
- proposal_service.py: 整合協作方法
觸發條件:
- LOW 風險 → 僅 OpenClaw
- MEDIUM/HIGH/CRITICAL → OpenClaw + Nemotron 雙軌
首席架構師審查: 83/100 條件通過
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-31 18:52:53 +08:00
OG T
b77e151387
feat(ai): ADR-036 NVIDIA Nemotron Tool Calling 整合
...
Phase 20 - 提升 Tool Calling 精準度 50% → 83.3%
新增:
- src/models/nvidia.py: Pydantic Schema
- src/services/nvidia_provider.py: NvidiaProvider 類別
- tests/test_nvidia_provider.py: 15 項單元測試 (全部通過)
修改:
- ai_router.py: AIProvider.NVIDIA + route_tool_calling()
- ai_rate_limiter.py: NVIDIA 限制 (5 RPM, 100/day)
- models.json: NVIDIA 配置
- cd.yaml: Secrets 注入 NVIDIA_API_KEY
路由策略:
- Tool Calling: Nemotron → Gemini → Claude
- 一般對話: Ollama → Gemini → Claude (不變)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-29 00:00:08 +08:00
OG T
801b08a4b7
fix(api): AI_FALLBACK_ORDER 無法正確解析 JSON 格式
...
根因: ConfigMap 用 JSON '["gemini","ollama","claude"]'
但 validator 用 split(",") 解析,導致無法匹配任何 provider
結果永遠用 default ["ollama","gemini","claude"]
影響: /api/v1/incidents 超時 (Ollama CPU 推理慢)
修復: 新增 JSON 格式支援,優先嘗試 json.loads()
這是根因修復,不是重啟!
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 20:10:56 +08:00
OG T
e58da5c534
feat(api): Phase 13.2 #83 Grafana MCP Tool
...
New MCP provider for Grafana dashboard integration:
- list_dashboards: List available dashboards with filtering
- get_dashboard: Get dashboard details by UID
- get_panel_data: Query panel data via Grafana Query API
- generate_dashboard_url: Generate shareable dashboard URLs
Security:
- API key authentication (Bearer token)
- Dashboard UID validation (alphanumeric + dash/underscore)
- Read-only operations only
- 30s request timeout
Config:
- GRAFANA_URL (default: http://192.168.0.188:3000 )
- GRAFANA_API_KEY
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 15:36:17 +08:00
OG T
1ac8965a7a
feat(api): Phase 15.1 Langfuse LLMOps 整合 + 模型升級
...
## 新功能
- Langfuse 自建部署 (192.168.0.110:3100)
- langfuse_client.py - LLM 呼叫追蹤包裝
- OpenClaw 整合 Langfuse trace
## 模型升級 (統帥批准)
- 生產預設: llama3.2:3b → qwen2.5:7b-instruct
- 摘要任務: llama3.2:3b (速度優先)
## 配置更新
- requirements.txt: +langfuse>=2.0.0
- config.py: +LANGFUSE_* 設定
- models.json: 更新 Ollama 模型配置
- K8s: Secret + ConfigMap 更新
## 審查通過
- 模組化檢查 ✅
- 核心測試 31/31 ✅
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-26 00:32:19 +08:00
OG T
a202a2693a
feat(api): Phase 16 R1.2 絞殺者模式 (Strangler Fig Pattern)
...
- 新增 USE_NEW_ENGINE 設定開關 (預設 False)
- incident_memory.py 雙軌切換: 內嵌版本 ↔ lewooogo-brain
- 自動降級: lewooogo-brain 不可用時回退內嵌版本
- 回滾指令: kubectl set env deployment/awoooi-api USE_NEW_ENGINE=false
統帥批准 2026-03-26 立即執行
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-25 15:23:03 +08:00
OG T
22cada563b
fix(config): Share Redis DB 0 with OpenClaw
...
- Change REDIS_URL from DB 10 to DB 0
- AWOOOI and OpenClaw now share the same Redis database
- Incidents created by OpenClaw visible in AWOOOI UI
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-24 18:44:34 +08:00
OG T
8159d22db9
refactor: ClawBot → OpenClaw 全域更名
...
- 刪除舊版 clawbot.py (已有新版 openclaw.py)
- 更新 models/ai.py 類型定義 (ClawBotAnalysisRequest/Response)
- 更新 api/v1/ai.py import 與註解
- 更新 Discord username
- 更新所有註解與文檔
依據: feedback_openclaw_naming.md (統帥 2026-03-20 正式命名決議)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-24 12:57:36 +08:00
OG T
e23493741a
fix(telegram): respect C-Suite decision - OpenClaw is sole brain
...
架構修正 2026-03-23 (遵循 C-Suite 決議):
- 鐵律: .188 為唯一大腦,禁止腦分裂
- OpenClaw (192.168.0.188) = 唯一 Telegram Gateway
- AWOOOI API (K8s) = Web API + Sensor,不做 Polling
- TELEGRAM_ENABLE_POLLING 預設 False
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 19:25:08 +08:00
OG T
7478dc0254
feat(phase6-9): Complete modular architecture and Agent Teams
...
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context
Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture
DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies
Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback
Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 18:40:36 +08:00
OG T
962b1e75a5
refactor: Rename ClawBot → OpenClaw across documentation
...
- Update .awoooi-agent-rules.md (4 occurrences)
- Update docs/api/openapi.yaml (all schema references)
- Update apps/web/tailwind.config.ts (comment)
- Update apps/api/src/core/config.py (comment)
Legacy CLAWBOT_URL field kept for backward compatibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 14:05:53 +08:00
OG T
b00f318450
fix(api): correct OTEL gRPC endpoint format and SignOz query table
...
Root cause analysis:
1. OTEL gRPC endpoint had http:// prefix which is invalid for gRPC
2. SignOz query was targeting wrong table (signoz_metrics.distributed_samples_v4)
3. Should query signoz_traces.distributed_signoz_index_v2 for trace data
Fixes:
- Remove http:// prefix from OTEL_EXPORTER_OTLP_ENDPOINT (gRPC needs host:port)
- Update SignOz client to query traces table instead of metrics table
- Fix timestamp format (nanoseconds for DateTime64(9))
- statusCode: 0=Unset, 1=Ok, 2=Error
This should enable OTEL traces to reach SigNoz and GlobalPulse to show real metrics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 00:41:51 +08:00
OG T
21ce7056fa
fix(otel): correct OTEL endpoint to port 24317 and fix NetworkPolicy
...
- SigNoz OTEL Collector maps container:4317 to host:24317
- Updated NetworkPolicy egress to allow 24317/24318
- Updated ConfigMap with correct OTEL_EXPORTER_OTLP_ENDPOINT
- Fixed OpenClaw port from 8089 to 8088
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-23 00:06:07 +08:00
OG T
551a305fcf
fix(config): rename _OPENCLAW_TG_USER_WHITELIST_RAW to comply with pydantic v2
...
Pydantic v2 does not allow field names with leading underscores.
Changed from @property pattern to method pattern.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-22 23:40:09 +08:00
OG T
c2b33a99a3
fix(config): 避免 pydantic-settings 自動 JSON 解析 WHITELIST
...
使用 str + property 取代 list[int] + validator
解決 K8s Secret 注入時的解析錯誤
Co-Authored-By: Claude <noreply@anthropic.com >
2026-03-22 23:18:50 +08:00
OG T
196d269b92
feat: add all application source code
...
- apps/api: FastAPI backend with Dockerfile
- apps/web: Next.js frontend with Dockerfile
- apps/sensor: Signal collection agent
- packages: shared packages
Co-Authored-By: Claude <noreply@anthropic.com >
2026-03-22 18:57:44 +08:00