Your Name
|
337b2df60d
|
chore(cd): deploy latest image tag for prod manifests
|
2026-06-04 00:13:51 +08:00 |
|
Your Name
|
ab21d8bad2
|
chore: execute W1-redline convergence updates and evidence log
|
2026-06-03 20:10:14 +08:00 |
|
Your Name
|
2d37383fc6
|
fix(monitoring): fix false positive NoAlertsReceived2Hours by filtering only alertmanager source
|
2026-05-28 15:33:17 +08:00 |
|
Your Name
|
3779f6f1e0
|
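fix(metrics): wire flywheel metrics into the main /metrics endpoint, fixing the dead FlywheelExecutionRateMissing alert
INC-20260507-99ADF2 root cause (full-chain analysis per feedback_full_chain_first_then_fix.md):
[Break in the chain] rule layer (added 5/3) vs metrics layer (changed 5/6) vs scrape layer (never synced)
- 577250a6 (5/3), the "anti-silencing" commit, added the FlywheelExecutionRateMissing
rule, which requires the 110 Prom to scrape awoooi_flywheel_execution_success_rate;
- a2c4b3d4 (5/6): Codex changed FlywheelStatsService to use auto_repair_executions
as the source of truth (with 1-9 samples in 24h it returns None so the W-3b watchdog takes over);
- but the awoooi_flywheel_* metrics have only ever been exposed at /api/v1/stats/flywheel/metrics,
while the 110 Prom awoooi-api job scrapes /metrics → absent() is always 1
→ firing for 26h+ since 2026-05-06T04:14 UTC, which makes it a dead alert
[Fix] Touch only awoooi-api; leave the Codex design and the 110 Prom configuration alone:
- Make the main.py /metrics endpoint async and, after generate_latest(), append
FlywheelStatsService.compute() → to_prometheus_lines().
- The existing awoooi-api scrape job then picks up the flywheel metrics automatically.
- Codex's a2c4b3d4 design is fully preserved: returning None for 1-9 samples keeps the W-3b watchdog as a second safeguard.
[Left untouched]
- flywheel_stats_service.py is unchanged: the Codex 5/6 LOGBOOK states explicitly
"Redis playbook counter is inaccurate → use auto_repair_executions as the sole trusted source";
returning None for 1-9 samples is the anti-silencing double safeguard designed to work with
ai_slo_watchdog_job W-3b grace+30min, not a bug.
Verification plan (after deployment):
1. curl /metrics | grep awoooi_flywheel → flywheel metrics appear
2. Prom query awoooi_flywheel_execution_success_rate → non-empty
3. ALERTS{alertname="FlywheelExecutionRateMissing"} → resolved
4. Watch Telegram for 30 minutes; INC-20260507-99ADF2 should no longer arrive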
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
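The fix described above (append the flywheel lines after the default exposition) could look roughly like the sketch below. It is not the code from main.py: `merge_exposition` is a hypothetical helper, `to_prometheus_lines` here is a simplified stand-in for the service method of the same name, and emitting no lines when the rate is None is an assumption about how the None case is rendered. In the real endpoint, `base` would be the bytes returned by prometheus_client's `generate_latest()`.

```python
from typing import Optional


def to_prometheus_lines(success_rate: Optional[float]) -> list[str]:
    """Render the flywheel gauge in Prometheus text format.

    Simplified stand-in for the service method (assumption): returns no
    lines when the rate is None, so the series stays absent.
    """
    if success_rate is None:
        return []
    name = "awoooi_flywheel_execution_success_rate"
    return [
        f"# HELP {name} Flywheel execution success rate",
        f"# TYPE {name} gauge",
        f"{name} {success_rate}",
    ]


def merge_exposition(base: bytes, extra_lines: list[str]) -> bytes:
    """Append extra metric lines to an existing /metrics payload (hypothetical helper)."""
    if not extra_lines:
        return base
    # The text format is line-oriented, so make sure the base ends with a newline.
    if base and not base.endswith(b"\n"):
        base += b"\n"
    return base + ("\n".join(extra_lines) + "\n").encode("utf-8")
```

Keeping the merge in one small function means the scrape job configuration does not change; only the payload served at /metrics grows.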
|
2026-05-07 15:32:47 +08:00 |
|
Your Name
|
c38227e945
|
fix(ai): remove 188 ollama provider
|
2026-05-06 14:33:16 +08:00 |
|
Your Name
|
1b4a6c1e8c
|
fix(awooop): align console with flywheel execution metrics
|
2026-05-06 00:44:53 +08:00 |
|
Your Name
|
894174da5b
|
fix(ops): harden cold-start schedule recovery
|
2026-05-05 22:14:54 +08:00 |
|
Your Name
|
10cd9fc025
|
fix(openclaw): gate alert cloud fallback behind flag
|
2026-05-05 20:53:12 +08:00 |
|
Your Name
|
8161ccf83f
|
fix(ops): persist host resource guardrails
|
2026-05-05 16:13:02 +08:00 |
|
Your Name
|
6b93c8f454
|
fix(chat): route OpenClaw chat through Ollama lane
CD Pipeline / tests (push) Successful in 5m26s
Code Review / ai-code-review (push) Successful in 25s
CD Pipeline / build-and-deploy (push) Successful in 8m11s
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-05 15:57:26 +08:00 |
|
AWOOOI CD
|
3a17a860a0
|
chore(cd): deploy 1cc9de5 [skip ci]
|
2026-05-05 15:41:33 +08:00 |
|
Your Name
|
6ec5c06bad
|
docs(ops): record docker limit cleanup
|
2026-05-05 15:39:46 +08:00 |
|
Your Name
|
44d8322c4d
|
docs(ops): record live runner guardrail fix
|
2026-05-05 15:34:00 +08:00 |
|
Your Name
|
819734f655
|
docs(ops): record runner guardrail follow-up
|
2026-05-05 15:28:31 +08:00 |
|
Your Name
|
1cc9de5722
|
fix(ops): point runner guardrail alerts to host script
CD Pipeline / tests (push) Successful in 5m31s
Code Review / ai-code-review (push) Successful in 30s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 37s
CD Pipeline / build-and-deploy (push) Successful in 7m45s
CD Pipeline / post-deploy-checks (push) Successful in 5m4s
|
2026-05-05 15:25:37 +08:00 |
|
Your Name
|
96c1ba20da
|
fix(ci): cap host-runner helper containers
Code Review / ai-code-review (push) Successful in 27s
|
2026-05-05 15:09:44 +08:00 |
|
Your Name
|
855a39ad95
|
docs(ops): record docker limit alert deploy
|
2026-05-05 15:06:47 +08:00 |
|
Your Name
|
209da7ba33
|
chore(ops): deploy docker limit alert image
CD Pipeline / tests (push) Successful in 5m24s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-05 15:05:23 +08:00 |
|
Your Name
|
d08d1e4951
|
fix(ops): alert on missing docker resource limits
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Successful in 23s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 38s
|
2026-05-05 15:01:31 +08:00 |
|
Your Name
|
e24c8ea051
|
fix(ci): align B5 schema with tenant isolation
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 15:00:07 +08:00 |
|
Your Name
|
72d66e4ae6
|
fix(ops): align stale job cleanup thresholds
Code Review / ai-code-review (push) Successful in 28s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 36s
|
2026-05-05 14:54:17 +08:00 |
|
Your Name
|
5e625f777d
|
fix(ops): add stale gitea job cleanup guard
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Has been cancelled
|
2026-05-05 14:50:47 +08:00 |
|
Your Name
|
df72c77880
|
chore(ops): deploy stale gitea job alert image
CD Pipeline / tests (push) Successful in 5m29s
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-05 14:43:53 +08:00 |
|
Your Name
|
7d45f0cb58
|
fix(ops): alert on stale gitea actions jobs
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Has been cancelled
|
2026-05-05 14:42:09 +08:00 |
|
Your Name
|
fc1a6196df
|
fix(code-review): keep Gemini fallback opt-in
CD Pipeline / tests (push) Successful in 2m2s
Code Review / ai-code-review (push) Successful in 27s
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-05 14:38:44 +08:00 |
|
Your Name
|
3b73cc7f94
|
fix(ci): avoid cd on workflow-only changes
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 14:37:31 +08:00 |
|
Your Name
|
96b860dc2c
|
docs(ops): record ci stale-run guard
|
2026-05-05 14:35:24 +08:00 |
|
Your Name
|
2e128f90db
|
fix(ci): skip stale code review runs
Code Review / ai-code-review (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-05 14:35:09 +08:00 |
|
Your Name
|
228768ff68
|
docs(ops): record host baseline follow-up
|
2026-05-05 14:31:59 +08:00 |
|
Your Name
|
ab0f0a8a62
|
chore(ops): deploy runner classification image
CD Pipeline / tests (push) Successful in 2m35s
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Successful in 26s
|
2026-05-05 14:29:55 +08:00 |
|
Your Name
|
0e14935351
|
fix(ops): classify systemd runner alerts as host resources
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 14:28:18 +08:00 |
|
Your Name
|
a5192d4e03
|
chore(ops): deploy runner alert routing image
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 14:21:17 +08:00 |
|
Your Name
|
34d1c76be9
|
fix(ops): route systemd runner baseline alerts
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 14:19:58 +08:00 |
|
Your Name
|
2b93975d37
|
chore(ops): deploy systemd runner baseline image
CD Pipeline / tests (push) Successful in 2m6s
Code Review / ai-code-review (push) Successful in 26s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-05 14:12:30 +08:00 |
|
Your Name
|
fe618960a8
|
fix(ops): monitor systemd runners in host baseline
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 39s
|
2026-05-05 14:08:43 +08:00 |
|
Your Name
|
8e22110030
|
fix(governance): keep trust drift watchdog on governance agent
CD Pipeline / tests (push) Successful in 2m51s
Code Review / ai-code-review (push) Successful in 24s
CD Pipeline / build-and-deploy (push) Has started running
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-05 14:00:13 +08:00 |
|
Your Name
|
2ff0ef3bb6
|
fix(openclaw): route legacy ollama through failover endpoints
CD Pipeline / tests (push) Failing after 1m49s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 24s
|
2026-05-05 13:55:52 +08:00 |
|
Your Name
|
bb1995f349
|
fix(awooop): use naive utc for run lease timestamps
CD Pipeline / tests (push) Failing after 1m48s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 13:53:07 +08:00 |
|
Your Name
|
e8e6748f70
|
fix(ops): add docker host resource baseline guardrails
CD Pipeline / tests (push) Failing after 1m50s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 25s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 38s
|
2026-05-05 13:45:09 +08:00 |
|
Your Name
|
a57e3d3d75
|
test(consensus): expect redis namespace dual write
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-05-05 13:41:41 +08:00 |
|
Your Name
|
b00a7b050a
|
test(ollama): align inference connect errors with degraded health
CD Pipeline / tests (push) Failing after 2m26s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 28s
|
2026-05-05 13:34:19 +08:00 |
|
Your Name
|
506744ba3a
|
test(ollama): keep slow gcp primary on ollama
CD Pipeline / tests (push) Failing after 2m21s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 26s
|
2026-05-05 13:29:27 +08:00 |
|
Your Name
|
869646459c
|
fix(ollama): treat legacy primary as ollama
CD Pipeline / tests (push) Failing after 1m48s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 28s
|
2026-05-05 13:25:27 +08:00 |
|
Your Name
|
33d4326cce
|
test(ollama): align slow recovery with gcp routing policy
CD Pipeline / tests (push) Failing after 1m51s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 33s
|
2026-05-05 13:21:16 +08:00 |
|
Your Name
|
b3d412f9eb
|
fix(cd): restore gitea workflow yaml parsing
CD Pipeline / tests (push) Failing after 2m20s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 31s
|
2026-05-05 13:17:15 +08:00 |
|
Your Name
|
f78b1b0690
|
fix(ollama): honor provider endpoint selection
Code Review / ai-code-review (push) Successful in 37s
|
2026-05-05 13:14:46 +08:00 |
|
Your Name
|
0ebd0d8a92
|
fix(deploy): emergency deploy of API 2e17325c: governance skip cooldown + watchdog B4
Code Review / ai-code-review (push) Successful in 54s
CI cancel-in-progress prevented CD from running, so kustomization.yaml was updated manually.
Included fixes:
- cooldown on the governance_dispatcher skip path (eliminates the repeated processing every 30s)
- watchdog B4 three-layer fix A2/A3/W6 (eliminates duplicate META SYSTEM alerts)
- Operator Console leWOOOgo modularization fix (e22b8e7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-05-05 12:09:29 +08:00 |
|
Your Name
|
2e17325c3f
|
fix(ollama): update failover_manager URL comments to reflect the ADR-110 nginx proxy topology
Code Review / ai-code-review (push) Successful in 43s
The comments on url_primary/secondary/tertiary still showed the old pre-ADR-110 IPs;
they now use the nginx proxy format 110:11435→GCP-A / 11436→GCP-B / 11437→Local111.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
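As an illustration of the port-to-backend mapping named above, a minimal sketch follows. The three ports and labels come from the commit message; `PROXY_PORT_LABELS`, `backend_label`, the "unknown" fallback, and the `proxy.example` host are hypothetical and not taken from failover_manager.

```python
from urllib.parse import urlsplit

# Port-to-backend mapping as listed in the commit message (ADR-110 topology).
PROXY_PORT_LABELS = {11435: "GCP-A", 11436: "GCP-B", 11437: "Local111"}


def backend_label(url: str) -> str:
    """Return the backend label for a proxy URL, or "unknown" if the port is unmapped."""
    port = urlsplit(url).port  # None when the URL carries no explicit port
    return PROXY_PORT_LABELS.get(port, "unknown")
```

For example, `backend_label("http://proxy.example:11436")` resolves to the GCP-B entry, and a URL without a port falls through to the fallback label.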
|
2026-05-05 11:03:36 +08:00 |
|
Your Name
|
e22b8e7ab2
|
feat(awooop): Operator Console API + frontend (leWOOOgo modularization fix)
Code Review / ai-code-review (push) Successful in 42s
Backend:
- Add platform_operator_service.py (centralizes DB access in the Service layer)
- Remove Depends(get_db) from the Router layer; routers call Service functions instead
- The tenants/contracts/operator_runs routers now conform to the leWOOOgo convention
- __init__.py wires up the four platform routers
Frontend:
- Build out apps/web/src/app/[locale]/awooop/ in full (7 pages)
- layout.tsx: four-tab navigation (tenants/contracts/runs/approvals)
- Use @/i18n/routing (Link/usePathname/useRouter) throughout to avoid i18n path issues
- approvals page: 10s auto-refresh, timeout countdown, red highlight for urgent items
ADR-106/107/112/114/115/116
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-05-05 11:00:20 +08:00 |
|
Your Name
|
aa4ccec429
|
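fix(watchdog): ADR-092 B4: three-layer fix to eliminate duplicate META SYSTEM alerts + hardened Ollama routing
Code Review / ai-code-review (push) Successful in 7m16s
Root causes (full investigation by debugger):
1. Prod was still running old code (the fixes after ec013f66 had not been deployed → alert strings still used the old format)
2. With replicas=2, Pods did not share the grace period → violation_codes diverged → different SHA256 → dedup failed
3. A new Pod ran _check_once() immediately on startup → an extra wave of alerts during rollout
4. W6 violation_codes contained a dynamic low_count → small count changes bypassed dedup
Fixes (A2/A3/W6/C1/C2):
- A2: add a 90s leading sleep to run_ai_slo_watchdog_loop so a rollout does not trigger a check immediately
- A3: _grace_active() now uses a Redis cluster-shared key (watchdog:cluster_grace, ex=1800s, nx=True),
removing grace-period inconsistency between Pods; falls back to a process-local monotonic clock if Redis is down
- W6: drop the dynamic low_count from violation_codes in favor of the stable "W6:trust_drift"
- C1: recovered_host in ollama_auto_recovery.py becomes a dynamic label (GCP-A/B/Local determined from the URL port)
- C2: ConfigMap OLLAMA_FALLBACK_URL now goes through the 110:11437 nginx proxy, unifying the three-tier failover architecture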
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
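Root cause 4 and the W6 fix can be illustrated with a small sketch of hash-based dedup. Only the stable code "W6:trust_drift" and the use of SHA256 come from the commit message; `dedup_key`, the canonicalization (sorted, joined with "|"), and the "W6:low_count=N" format of the old dynamic code are assumptions made for illustration.

```python
import hashlib


def dedup_key(violation_codes: list[str]) -> str:
    """SHA-256 over the sorted, de-duplicated violation codes (assumed canonical form)."""
    canonical = "|".join(sorted(set(violation_codes)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def w6_code_dynamic(low_count: int) -> str:
    """Hypothetical old-style code that embeds a count which changes between checks."""
    return f"W6:low_count={low_count}"


def w6_code_stable() -> str:
    """Stable code named in the commit message; independent of the current count."""
    return "W6:trust_drift"
```

With the dynamic form, a count moving from 3 to 4 yields a different key, so the alert is treated as new; with the stable form the key is identical across checks and the duplicate is suppressed.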
|
2026-05-05 10:31:53 +08:00 |
|