AWOOOI CD
|
dfbf3f8f20
|
chore(cd): deploy a184b82 [skip ci]
|
2026-04-27 08:08:52 +00:00 |
|
Your Name
|
c3fa03fc19
|
fix(solver): add AGENT_SOLVER_TIMEOUT_SEC=80 + prompt forbids reflexive restarts
CD Pipeline / build-and-deploy (push) Has been cancelled
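Problem 1: AGENT_SOLVER_TIMEOUT_SEC defaults to 20s and was not set in K8s → deepseek-r1:14b always
times out → candidates=[] → action="" → Telegram shows 「待分析」 ("pending analysis") + 「規則分析」 ("rule-based analysis")
Problem 2: the Solver prompt's JSON example contains only restart + kubectl top, and the LLM imitates the example
→ every alert gets a restart recommendation; HostDisk/CPU alerts should prioritize diagnosis + cleanup
Fix:
- K8s: add AGENT_SOLVER_TIMEOUT_SEC=80 (< OPENCLAW_TIMEOUT=120, leaving a buffer)
- Solver prompt: add root-cause-to-remediation rules: HostDisk→df/du/journalctl, CPU→top/ps,
OOM→kubectl logs; "restart first" is forbidden
- JSON example changed to a HostDisk SSH diagnosis scenario, no longer K8s commands only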
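The timeout relationship in the fix (a per-step timeout kept below the outer OPENCLAW_TIMEOUT) can be checked mechanically. The following is a minimal sketch, not code from this repository: the helper name `timeout_buffers` is hypothetical, and it assumes OPENCLAW_TIMEOUT bounds the whole call so that any step timeout must sit strictly below it.

```python
def timeout_buffers(step_timeouts: dict[str, int], outer_timeout: int) -> dict[str, int]:
    """Return the remaining buffer (outer_timeout - step timeout) for each step.

    Raises ValueError if a step timeout is not strictly below the outer
    timeout, since the outer limit would then fire first (assumption:
    the outer timeout wraps the whole step).
    """
    buffers = {}
    for name, seconds in step_timeouts.items():
        if seconds >= outer_timeout:
            raise ValueError(
                f"{name}={seconds}s must be below the outer timeout of {outer_timeout}s"
            )
        buffers[name] = outer_timeout - seconds
    return buffers


# Values taken from the commit message; the helper itself is illustrative only.
OPENCLAW_TIMEOUT = 120
buffers = timeout_buffers({"AGENT_SOLVER_TIMEOUT_SEC": 80}, OPENCLAW_TIMEOUT)
print(buffers)
```

A check like this could run at startup or in CI so that an unset or oversized step timeout is caught before it shows up as an empty candidate list.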
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-27 15:51:42 +08:00 |
|
Your Name
|
1b6a4dc14c
|
fix(k8s): add AGENT_DIAGNOSTICIAN_TIMEOUT_SEC=100 as an emergency fix for step_timeout
CD Pipeline / build-and-deploy (push) Has been cancelled
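Root cause: deepseek-r1:14b measured 28s for a single inference; the SRE prompt is longer, so it inevitably exceeds 30s
AGENT_DIAGNOSTICIAN_TIMEOUT_SEC defaults to 30s, and K8s did not override it
As a result the diagnostician always hits step_timeout → downgraded to 20% confidence
Fix: K8s adds AGENT_DIAGNOSTICIAN_TIMEOUT_SEC=100 (below OPENCLAW_TIMEOUT=120, leaving a 20s buffer)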
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-27 15:40:46 +08:00 |
|
AWOOOI CD
|
e0ca1c1f78
|
chore(cd): deploy ea23972 [skip ci]
|
2026-04-27 07:30:40 +00:00 |
|
AWOOOI CD
|
92a5d94382
|
chore(cd): deploy f4998b3 [skip ci]
|
2026-04-27 07:15:37 +00:00 |
|
Your Name
|
1ab6786ce3
|
feat(ops): Ollama failover Runbook + Grafana dashboard + Consensus K8s ConfigMap patch
run-migration / migrate (push) Failing after 13s
CD Pipeline / build-and-deploy (push) Failing after 2m1s
Wave 6 P2.3 ops support + tool-expert deployment docs:
Added:
- docs/runbooks/RUNBOOK-OLLAMA-FAILOVER.md (240 lines)
· Verification steps for the three iron rules (auto-switch to Gemini / auto-switch back / quota circuit breaker)
· Complete failover/recovery SOP
· Troubleshooting checklist (Ollama 111/188 unreachable, Gemini quota overrun, etc.)
- ops/monitoring/grafana/dashboards/ollama_failover.json (295 lines)
· 4 panels: current primary / failover events / quota usage / health status
· Corresponds to P2.3 metrics: OLLAMA_FAILOVER_TRIGGERED_TOTAL / GEMINI_DAILY_CALL_COUNT
- k8s/awoooi-prod/04-configmap.yaml.patch-consensus
· ENABLE_12AGENT_CONSENSUS / ENABLE_AIOPS_P2_FUSION feature flag patch
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: tool-expert agent (Wave 6) <noreply@anthropic.com>
|
2026-04-27 08:11:40 +08:00 |
|
AWOOOI CD
|
b0bf3783e4
|
chore(cd): deploy 2c57b71 [skip ci]
|
2026-04-26 13:04:37 +00:00 |
|
Your Name
|
55c6b4e2d9
|
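feat(p1): Ollama multi-layer failover system: P1.1 health checks + P1.2 ai_router integration + P1.5 failover alerts
The Ollama failover subsystem of the ADR-092 P1 flywheel closed loop, all filled in by Engineer-A2/C/C2.
New services (1581 lines):
- ollama_health_monitor.py (356): 3-layer health check (TCP/HTTP/inference)
- ollama_failover_manager.py (571): automatic 111→188 switchover + Redis persistence + recovery callback
- ollama_auto_recovery.py (436): 30s background monitoring + 3 consecutive HEALTHY results → switch back + clear_cache
- failover_alerter.py (218): P1.5 Telegram failover alerts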
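The "3 consecutive HEALTHY results → switch back" rule can be illustrated with a small streak counter. This is a sketch under stated assumptions, not the contents of ollama_auto_recovery.py: the class name `RecoveryTracker` and its method are hypothetical, and the 30s polling loop, Redis persistence, and clear_cache call are left out.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryTracker:
    """Counts consecutive healthy probes and signals when to switch back."""

    required_healthy: int = 3  # threshold taken from the commit message
    _streak: int = field(default=0, init=False)

    def observe(self, healthy: bool) -> bool:
        """Record one probe result.

        Returns True exactly when the healthy streak reaches the threshold;
        any unhealthy probe resets the streak to zero.
        """
        if healthy:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.required_healthy:
            self._streak = 0  # start counting afresh after signalling
            return True
        return False


tracker = RecoveryTracker()
probe_results = [True, True, False, True, True, True]
signals = [tracker.observe(ok) for ok in probe_results]
print(signals)
```

Requiring a streak instead of a single healthy probe avoids flapping back to a primary that is only intermittently reachable.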
Service integration:
- ai_router.py: AIProviderEnum.OLLAMA_188 + 120s budget + failover fallback chain
- main.py lifespan: wire callback + start recovery on startup, graceful stop on shutdown
- config.py: OLLAMA_FALLBACK_URL / OLLAMA_HEALTH_CHECK_MODEL / GEMINI_DAILY_QUOTA (billing circuit breaker)
K8s configuration:
- 04-configmap.yaml.patch-188-fallback: injects OLLAMA_FALLBACK_URL=http://192.168.0.188:11434
Tests (2082 lines):
- test_ollama_health_monitor.py (402)
- test_ollama_failover_manager.py (707)
- test_ollama_auto_recovery.py (580)
- test_ai_router_failover_integration.py (257)
- test_lifespan_failover_wiring.py (136)
Dependency chain: the service trio + ai_router + main.py must be committed together; leaving any one out causes an ImportError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-26 20:18:33 +08:00 |
|
AWOOOI CD
|
4a8c3ca5c4
|
chore(cd): deploy bb12647 [skip ci]
|
2026-04-25 02:39:34 +00:00 |
|
AWOOOI CD
|
f676b61282
|
chore(cd): deploy cbd28e2 [skip ci]
|
2026-04-25 01:55:58 +00:00 |
|
AWOOOI CD
|
b8b5c68f31
|
chore(cd): deploy f9f2263 [skip ci]
|
2026-04-24 19:37:26 +00:00 |
|
AWOOOI CD
|
411a285735
|
chore(cd): deploy 250eca9 [skip ci]
|
2026-04-24 19:23:08 +00:00 |
|
Your Name
|
c14f23b33a
|
feat(k8s+notification): TG_GROUP_CUTOVER=true, all alerts cut over to the SRE group
notification_matrix TYPE-5S: DM → GROUP (fills in SignOz events)
prod/dev ConfigMap TG_GROUP_CUTOVER: false → true
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 03:07:28 +08:00 |
|
AWOOOI CD
|
fa453fa1f3
|
chore(cd): deploy 974cc7f [skip ci]
|
2026-04-24 18:52:18 +00:00 |
|
Your Name
|
974cc7f204
|
feat(k8s): prod ConfigMap HERMES_NL_ENABLED=true
CD Pipeline / build-and-deploy (push) Successful in 13m22s
@tsenyangbot @mention is now connected in the SRE group; polling path → Hermes NL → 12-Agent Claude SDK
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 02:43:42 +08:00 |
|
AWOOOI CD
|
f48e0725e8
|
chore(cd): deploy 86ee013 [skip ci]
|
2026-04-24 18:30:57 +00:00 |
|
Your Name
|
86ee013cdf
|
feat(hermes-complete): three Hermes NL enhancements + ConsensusEngine + ADR wrap-up
CD Pipeline / build-and-deploy (push) Successful in 9m32s
## Hermes NL enhancements (nl_gateway.py)
- T1 hermes_dispatch_log DB write (asyncio.create_task, non-blocking)
- T2 Redis rate limiting: per-chat_id 20 req/min, fail-open
- T3 Multi-turn session: hermes:session:{chat_id}:{user_id} TTL=300s, last 3 turns
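As an illustration of T2, here is a minimal fixed-window sketch of a per-chat_id limit of 20 requests per minute that fails open when the counter store is unavailable. It is not the code in nl_gateway.py: the key format `hermes:ratelimit:...`, the `allow_request` function, and the in-memory store are all hypothetical, and the store only mimics the `incr`/`expire` shape a Redis client would offer.

```python
import time


class InMemoryStore:
    """Stand-in for a Redis-like counter store (TTL expiry is not modelled)."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key: str, seconds: int) -> None:
        pass  # a real store would drop the key after `seconds`


def allow_request(store, chat_id, limit=20, window_sec=60, now=None) -> bool:
    """Return True if this request fits in the current window for chat_id."""
    now = time.time() if now is None else now
    window = int(now // window_sec)
    key = f"hermes:ratelimit:{chat_id}:{window}"  # hypothetical key format
    try:
        count = store.incr(key)
        if count == 1:
            store.expire(key, window_sec)  # first hit in this window sets the TTL
    except Exception:
        return True  # fail-open: a broken store must not block users
    return count <= limit


store = InMemoryStore()
decisions = [allow_request(store, chat_id=42, now=0.0) for _ in range(21)]
print(decisions.count(True), decisions[-1])
```

Fail-open trades strict enforcement for availability: if the store is down, requests go through unthrottled instead of being rejected.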
## ConsensusEngine (ADR-095 declarative design)
- consensus_engine.py: CONSENSUS_WEIGHTS class attribute
security=0.4 locked, the remaining 0.6 distributed among the 9 Claude Code agents
- config.py: ENABLE_12AGENT_CONSENSUS=False feature flag
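The weighting described above can be sketched as follows. The commit message states only that security is locked at 0.4 and that the 9 Claude Code agents share 0.6; the even split, the agent names, and both function names below are assumptions made purely for illustration and may not match CONSENSUS_WEIGHTS in consensus_engine.py.

```python
import math


def build_weights(agent_names: list[str], security_weight: float = 0.4) -> dict[str, float]:
    """Give `security` a fixed weight and split the remainder evenly.

    The even split is an assumption; the source does not say how the
    remaining 0.6 is divided among the other agents.
    """
    share = (1.0 - security_weight) / len(agent_names)
    weights = {name: share for name in agent_names}
    weights["security"] = security_weight
    return weights


def weighted_score(votes: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-agent votes in [0, 1]; missing votes count as 0."""
    return sum(weights[name] * votes.get(name, 0.0) for name in weights)


# Hypothetical agent names, used only to make the example runnable.
agents = [f"agent_{i}" for i in range(1, 10)]
weights = build_weights(agents)
score = weighted_score({"security": 1.0}, weights)
print(round(sum(weights.values()), 6), score)
```

Keeping the weights in one declarative mapping makes it easy to assert that they sum to 1 and that the security share has not been changed.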
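## ADR status
- ADR-093/094/095: Proposed → 🟡 Approved, implementation in progress
- Each ADR gets a v1.1 changelog entry
## K8s ConfigMap
- prod 04-configmap.yaml: add 3 feature flags (all false)
- dev 02-configmap.yaml: added in sync
## LOGBOOK
- Records completion of WS0–WS6 + the enhancements, plus guidance on enabling the feature flags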
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 02:22:40 +08:00 |
|
AWOOOI CD
|
ad0e5cbbbc
|
chore(cd): deploy 0044337 [skip ci]
|
2026-04-24 18:20:09 +00:00 |
|
AWOOOI CD
|
c31bc8411f
|
chore(cd): deploy 55f111e [skip ci]
|
2026-04-24 16:21:56 +00:00 |
|
AWOOOI CD
|
6df631c895
|
chore(cd): deploy 0d81b28 [skip ci]
|
2026-04-24 16:02:18 +00:00 |
|
AWOOOI CD
|
ad494288cb
|
chore(cd): deploy c995fe4 [skip ci]
|
2026-04-24 12:49:30 +00:00 |
|
AWOOOI CD
|
8f02a9efe2
|
chore(cd): deploy 97ce5ea [skip ci]
|
2026-04-24 08:05:11 +00:00 |
|
AWOOOI CD
|
3bd105be9a
|
chore(cd): deploy 88af639 [skip ci]
|
2026-04-22 01:18:56 +00:00 |
|
AWOOOI CD
|
757a58cc60
|
chore(cd): deploy 1625e7b [skip ci]
|
2026-04-21 18:10:42 +00:00 |
|
AWOOOI CD
|
ca8361e0bc
|
chore(cd): deploy 6d5f070 [skip ci]
|
2026-04-21 17:56:34 +00:00 |
|
Your Name
|
d0591c54b0
|
fix(security): health-check remediation, all 7 Critical/Major security issues fixed
CD Pipeline / build-and-deploy (push) Failing after 35s
## Critical fixes (C1-C5)
- C1: git rm --cached 03-secrets.yaml (CHANGE_ME template no longer tracked)
- C2: git rm --cached awoooi.db + add *.db to .gitignore (SQLite HARD_RULES violation)
- C3: sentry-tunnel SENTRY_HOST changed to a process.env fallback
- C4: config.py DATABASE_URL: removed the changeme default, now required
- C5: run_migration.py changed to os.environ["DATABASE_URL"]
## Major fixes (M1-M4)
- M1: auto_repair /execute gets CSRF protection + AutoRepairPanel.tsx updated to match
- M2: drift /rollback /adopt get CSRF protection (/internal/scan remains without CSRF)
- M3: terminal /intent gets CSRF protection + terminal.store.ts updated to match
- M4: live-dashboard HOST_IPS + host-grid VIP changed to env vars
## Other
- Added apps/web/.env.example (documents 6 env vars)
- K8s deployment-web: added 3 new env vars
- Integration tests: added real-DB tests for aider_event_repository + ai_router_feedback
- test_terminal.py: fixed the CSRF dependency override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-22 01:27:39 +08:00 |
|
AWOOOI CD
|
49e465954c
|
chore(cd): deploy 4fc1f49 [skip ci]
|
2026-04-21 14:35:32 +00:00 |
|
AWOOOI CD
|
0a72ae21e4
|
chore(cd): deploy 8fd31ec [skip ci]
|
2026-04-21 13:38:44 +00:00 |
|
AWOOOI CD
|
4bc183742f
|
chore(cd): deploy bd73548 [skip ci]
|
2026-04-21 13:26:51 +00:00 |
|
AWOOOI CD
|
a2777aee04
|
chore(cd): deploy 685f5c6 [skip ci]
|
2026-04-21 13:05:41 +00:00 |
|
AWOOOI CD
|
4bc52a9bdc
|
chore(cd): deploy acab1cd [skip ci]
|
2026-04-21 07:29:25 +00:00 |
|
AWOOOI CD
|
3c266190cf
|
chore(cd): deploy 3323a90 [skip ci]
|
2026-04-20 17:13:47 +00:00 |
|
AWOOOI CD
|
e60c064bdc
|
chore(cd): deploy 9a44516 [skip ci]
|
2026-04-20 12:29:49 +00:00 |
|
AWOOOI CD
|
f9ff23f007
|
chore(cd): deploy 156a52f [skip ci]
|
2026-04-20 12:09:31 +00:00 |
|
AWOOOI CD
|
72aea671b3
|
chore(cd): deploy ce918ee [skip ci]
|
2026-04-20 11:48:59 +00:00 |
|
AWOOOI CD
|
770e869f7e
|
chore(cd): deploy 803b389 [skip ci]
|
2026-04-19 20:31:09 +00:00 |
|
AWOOOI CD
|
525102d87e
|
chore(cd): deploy 4188df6 [skip ci]
|
2026-04-19 20:22:13 +00:00 |
|
AWOOOI CD
|
a837172fd5
|
chore(cd): deploy f572561 [skip ci]
|
2026-04-19 15:10:19 +00:00 |
|
AWOOOI CD
|
b9068d495f
|
chore(cd): deploy fa643eb [skip ci]
|
2026-04-19 14:47:23 +00:00 |
|
AWOOOI CD
|
2af623032a
|
chore(cd): deploy 37b6c9b [skip ci]
|
2026-04-19 14:31:48 +00:00 |
|
AWOOOI CD
|
b9c4896c7f
|
chore(cd): deploy 2f5cab2 [skip ci]
|
2026-04-19 14:10:25 +00:00 |
|
AWOOOI CD
|
32959db83d
|
chore(cd): deploy 0004554 [skip ci]
|
2026-04-19 13:29:28 +00:00 |
|
AWOOOI CD
|
f1b13d7b26
|
chore(cd): deploy 7db8845 [skip ci]
|
2026-04-19 12:36:04 +00:00 |
|
AWOOOI CD
|
638053346b
|
chore(cd): deploy ceb61c3 [skip ci]
|
2026-04-19 12:15:43 +00:00 |
|
AWOOOI CD
|
576f9dad18
|
chore(cd): deploy ba18ad2 [skip ci]
|
2026-04-19 11:46:35 +00:00 |
|
AWOOOI CD
|
e84338e615
|
chore(cd): deploy 6ab0ce9 [skip ci]
|
2026-04-19 10:18:43 +00:00 |
|
AWOOOI CD
|
691bdc6cc1
|
chore(cd): deploy e677773 [skip ci]
|
2026-04-19 09:35:27 +00:00 |
|
AWOOOI CD
|
46677a3392
|
chore(cd): deploy df71c9a [skip ci]
|
2026-04-19 09:12:54 +00:00 |
|
AWOOOI CD
|
0d2455ae9a
|
chore(cd): deploy fdf8b73 [skip ci]
|
2026-04-19 09:01:49 +00:00 |
|
AWOOOI CD
|
c77ce63a32
|
chore(cd): deploy 0226344 [skip ci]
|
2026-04-19 08:39:23 +00:00 |
|