OG T
190cfda65c
revert(k8s): restore commonLabels (Deployment selector is immutable)
...
Revert to commonLabels because:
1. The Deployment selector is immutable, so the environment/system labels cannot be removed
2. commonLabels only affects spec.podSelector, not egress[].to[].podSelector
3. The DNS rule (k8s-app=kube-dns) is not broken by commonLabels
The root cause of the DNS problem was an earlier misconfiguration; the NetworkPolicy YAML has been fixed
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:56:41 +08:00
OG T
42bf6a8729
fix(k8s): fix NetworkPolicy DNS rule broken by kustomize commonLabels
...
Root cause: commonLabels is automatically added to every selector in the NetworkPolicy,
so the DNS egress rule required CoreDNS to carry the system/environment labels (it does not)
Fix: switch to labels + includeSelectors=false, which adds metadata labels only
and does not affect the NetworkPolicy podSelector/namespaceSelector
- 2026-03-27 (Taipei time) DNS resolution failure RCA
- This problem is why the Telegram Bot could not connect
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 19:44:22 +08:00
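The failure mode described in 42bf6a8729 can be illustrated with a minimal Python sketch of equality-based label matching. The `matches` helper and the label values (`awoooi`, `prod`) are hypothetical; only the label keys and the CoreDNS label come from the commit message.

```python
def matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Return True when every key/value pair in the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# CoreDNS pods carry only this label, per the commit message.
coredns_labels = {"k8s-app": "kube-dns"}

# The DNS egress selector as intended in the NetworkPolicy.
intended_selector = {"k8s-app": "kube-dns"}

# The same selector after extra labels are injected into it.
# The label values are placeholders, not taken from the repository.
injected_selector = {**intended_selector, "system": "awoooi", "environment": "prod"}

print(matches(intended_selector, coredns_labels))  # True
print(matches(injected_selector, coredns_labels))  # False
```

With the injected keys, no CoreDNS pod satisfies the selector, which is consistent with the DNS egress traffic being dropped as the commit reports.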
OG T
0a9d94d82b
feat(k8s): CoreDNS GitOps architecture (ADR-026)
...
Problem: the DNS configuration was not under version control, and manual changes were easily lost
Architecture:
- k8s/k3s-system/coredns-custom.yaml: HelmChartConfig
- CD workflow: k3s-system path detection + automatic apply
- ADR-026: CoreDNS GitOps management architecture
DNS upstreams:
- Use 8.8.8.8 + 1.1.1.1
- Do not use /etc/resolv.conf (systemd-resolved)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 18:43:28 +08:00
OG T
539f14bcd5
feat(api): Phase 13.2 #84 RAG Provider + switch to Gemini first
...
1. Add RAGProvider MCP Tool Provider
- search_runbook: semantic search over operations runbooks
- index_documents: index documents
- get_index_stats: get index statistics
2. Update AI_FALLBACK_ORDER to put Gemini first
- Temporary measure: slow Ollama CPU inference was causing mock_fallback
- Planned switch back to Ollama on 2026-03-27
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 18:21:24 +08:00
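The ordering change in 539f14bcd5 can be pictured with a small Python sketch of an ordered provider fallback. Everything here is an assumption for illustration: the list contents, the `call_with_fallback` helper, and the stand-in provider functions are hypothetical and do not reproduce the project's actual code.

```python
from typing import Callable

# Hypothetical order mirroring the "Gemini first" change; the real value may differ.
AI_FALLBACK_ORDER = ["gemini", "ollama", "mock_fallback"]


def call_with_fallback(
    prompt: str,
    providers: dict[str, Callable[[str], str]],
    order: list[str],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first success."""
    errors: list[str] = []
    for name in order:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def slow_ollama(prompt: str) -> str:
    """Stand-in for a provider that times out on CPU inference."""
    raise TimeoutError("CPU inference timed out")


providers = {
    "gemini": lambda p: f"gemini reply to {p!r}",
    "ollama": slow_ollama,
    "mock_fallback": lambda p: "mock reply",
}

print(call_with_fallback("disk full on node-1", providers, AI_FALLBACK_ORDER))
print(call_with_fallback("disk full on node-1", providers, ["ollama", "mock_fallback"]))
```

In this sketch, an Ollama-first order falls through to the mock reply whenever Ollama times out, while a Gemini-first order returns a real reply without touching Ollama.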
OG T
34bfa994c2
fix(k8s): fix NetworkPolicy DNS rule
...
- Use namespaceSelector to explicitly target kube-system
- ADR-011 Appendix B: CoreDNS only has the k8s-app=kube-dns label
- Fix DNS resolution failure in the Telegram alert chain
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 17:41:11 +08:00
OG T
df6ba33a1d
fix(k8s): add Langfuse LLMOps connectivity rule to NetworkPolicy
...
Required for Phase 15.1: allow Pods to connect to Langfuse (192.168.0.110:3100)
Changes:
- Add port 3100 (Langfuse HTTP API)
- Bump version v1.0 → v1.1
- Update explanatory comments
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 01:01:20 +08:00
OG T
1ac8965a7a
feat(api): Phase 15.1 Langfuse LLMOps integration + model upgrade
...
## New features
- Self-hosted Langfuse deployment (192.168.0.110:3100)
- langfuse_client.py - wrapper for tracing LLM calls
- OpenClaw integrated with Langfuse trace
## Model upgrade (approved by the Commander)
- Production default: llama3.2:3b → qwen2.5:7b-instruct
- Summarization tasks: llama3.2:3b (speed first)
## Configuration updates
- requirements.txt: +langfuse>=2.0.0
- config.py: +LANGFUSE_* settings
- models.json: update Ollama model configuration
- K8s: Secret + ConfigMap updates
## Review passed
- Modularity check ✅
- Core tests 31/31 ✅
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 00:32:19 +08:00
OG T
41bd213a8c
fix(nginx): Route /api/sentry-tunnel to Next.js frontend
...
Sentry Tunnel is a Next.js API Route, not FastAPI endpoint.
Must be handled by frontend server to avoid 404.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 00:05:51 +08:00
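The routing fix in 41bd213a8c relies on a more specific path taking precedence over a general `/api/` rule. A minimal Python sketch of longest-prefix selection follows; the route table and upstream names are hypothetical and are not copied from the project's nginx configuration.

```python
# Hypothetical route table: path prefix -> upstream name.
ROUTES = {
    "/api/sentry-tunnel": "nextjs_frontend",
    "/api/": "fastapi_backend",
    "/": "nextjs_frontend",
}


def pick_upstream(path: str) -> str:
    """Choose the upstream whose prefix is the longest match for the request path."""
    candidates = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(candidates, key=len)]


print(pick_upstream("/api/sentry-tunnel"))  # nextjs_frontend
print(pick_upstream("/api/incidents"))      # fastapi_backend
```

Without the dedicated `/api/sentry-tunnel` entry, the request would fall under `/api/` and reach the backend, which matches the 404 the commit describes.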
OG T
22cada563b
fix(config): Share Redis DB 0 with OpenClaw
...
- Change REDIS_URL from DB 10 to DB 0
- AWOOOI and OpenClaw now share the same Redis database
- Incidents created by OpenClaw visible in AWOOOI UI
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 18:44:34 +08:00
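The REDIS_URL change in 22cada563b amounts to changing the database number at the end of the URL. A small Python sketch of reading that number is below; the host name is a placeholder and the `redis_db_index` helper is hypothetical.

```python
from urllib.parse import urlparse


def redis_db_index(url: str) -> int:
    """Return the database number from a redis:// URL path, defaulting to 0."""
    path = urlparse(url).path.lstrip("/")
    return int(path) if path else 0


# Placeholder host; only the DB numbers (10 and 0) come from the commit message.
old_url = "redis://redis.example.internal:6379/10"
new_url = "redis://redis.example.internal:6379/0"

print(redis_db_index(old_url))  # 10
print(redis_db_index(new_url))  # 0
```

Two clients see each other's keys only when they resolve to the same database number, which is why the commit moves AWOOOI onto DB 0.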
OG T
d08290b433
feat(k8s): Add Sentry and Harbor egress to NetworkPolicy (#38)
...
- Allow egress to 192.168.0.110:9000 (Sentry Self-Hosted)
- Allow egress to 192.168.0.110:5000 (Harbor Registry)
- Enables Sentry Tunnel API Route to forward errors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 17:51:06 +08:00
OG T
580c38de94
fix(cd): Fix kustomize image replacement with full image names
...
The kustomize edit set image command requires the OLD_IMAGE to match
exactly what's in the deployment YAML files, including the tag.
Changes:
- Use full image name with :IMAGE_TAG_PLACEHOLDER suffix
- Update kustomization.yaml to match deployment YAML format
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-24 14:05:31 +08:00
OG T
2b1264df05
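docs: complete governance architecture ADR-010/011/012 + CLAUDE.md iron-rule update
...
2026-03-23 major incident remediation and governance:
1. ADR-010: centralized secrets management (Bitwarden + Sealed Secrets)
2. ADR-011: NetworkPolicy change governance (detection + alerting + human decision)
3. ADR-012: dangerous-operation governance (tier classification + CI/CD interception + auditing)
4. UX-001: alert-fatigue solution (time decay + smart grouping)
CLAUDE.md updates:
- Add top-priority iron rules (ClawBot prohibited, OpenClaw as the core, dangerous APIs prohibited)
- Add a Memory lookup table that must be read before starting a task
Incident lessons:
- The Telegram Token was invalidated by logOut three times in a row
- AWOOOI API code calling logOut caused the disaster
- Telegram in the AWOOOI API has been disabled; OpenClaw is the only Gateway
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>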
2026-03-23 19:44:56 +08:00
OG T
7478dc0254
feat(phase6-9): Complete modular architecture and Agent Teams
...
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context
Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture
DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies
Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback
Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
342a0f611a
feat(k8s): enable Signal Worker (Phase 8 go-live)
...
Enable Signal Worker to process Redis Streams signals
and trigger Incident Engine for alert aggregation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 01:08:46 +08:00
OG T
b00f318450
fix(api): correct OTEL gRPC endpoint format and SigNoz query table
...
Root cause analysis:
1. OTEL gRPC endpoint had http:// prefix which is invalid for gRPC
2. SigNoz query was targeting the wrong table (signoz_metrics.distributed_samples_v4)
3. Should query signoz_traces.distributed_signoz_index_v2 for trace data
Fixes:
- Remove http:// prefix from OTEL_EXPORTER_OTLP_ENDPOINT (gRPC needs host:port)
- Update SigNoz client to query the traces table instead of the metrics table
- Fix timestamp format (nanoseconds for DateTime64(9))
- statusCode: 0=Unset, 1=Ok, 2=Error
This should enable OTEL traces to reach SigNoz and GlobalPulse to show real metrics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:41:51 +08:00
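Two of the fixes listed in b00f318450, the scheme-free gRPC endpoint and the nanosecond timestamps, can be sketched in Python as follows. The helper names and the collector host name are hypothetical; the status-code mapping is copied from the commit message.

```python
from datetime import datetime, timezone

# Mapping quoted in the commit message.
STATUS_CODES = {0: "Unset", 1: "Ok", 2: "Error"}


def grpc_target(endpoint: str) -> str:
    """Strip an http:// or https:// scheme so the value is a bare host:port."""
    for scheme in ("http://", "https://"):
        if endpoint.startswith(scheme):
            return endpoint[len(scheme):]
    return endpoint


def to_unix_nanos(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    delta = moment - datetime(1970, 1, 1, tzinfo=timezone.utc)
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


# Placeholder host name, not taken from the repository.
print(grpc_target("http://otel-collector.example:4317"))
print(to_unix_nanos(datetime(2026, 3, 23, tzinfo=timezone.utc)))
```

Integer arithmetic is used for the conversion so that the nanosecond value does not lose precision through floating point.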
OG T
21ce7056fa
fix(otel): correct OTEL endpoint to port 24317 and fix NetworkPolicy
...
- SigNoz OTEL Collector maps container:4317 to host:24317
- Updated NetworkPolicy egress to allow 24317/24318
- Updated ConfigMap with correct OTEL_EXPORTER_OTLP_ENDPOINT
- Fixed OpenClaw port from 8089 to 8088
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:06:07 +08:00
OG T
a2f7d128f3
fix: canonicalize domain - https://awoooi.wooo.work
...
- Add the official domain to CORS
- Set NEXT_PUBLIC_API_URL to https://awoooi.wooo.work
- pydantic-settings WHITELIST now uses a property to avoid JSON parsing
- Nginx is already configured to point to the K3s Worker (121)
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 23:28:36 +08:00
OG T
13200076aa
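fix(ci): AIOPS canonical mode - inline Telegram Token + Worker paused
...
- Telegram notifications keep the AIOPS approach of writing the Token inline
- Worker paused with replicas=0 (to be enabled once Phase 6.5 is complete)
- Simplify the rollout flow
Co-Authored-By: Claude <noreply@anthropic.com>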
2026-03-22 20:05:02 +08:00
OG T
5156800217
fix(k8s): AI_FALLBACK_ORDER also switched to JSON array format
...
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:37:51 +08:00
OG T
721cfd1e3b
fix(k8s): CORS_ORIGINS uses JSON array format
...
pydantic-settings requires JSON format for list[str] fields
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:26:26 +08:00
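The JSON requirement noted in 721cfd1e3b can be imitated with the standard library alone. The sketch below is not pydantic-settings itself; `parse_list_env` is a hypothetical helper that shows why a JSON array decodes into a list of strings while a comma-separated string is rejected. The localhost origin is a placeholder.

```python
import json


def parse_list_env(raw: str) -> list[str]:
    """Decode an environment value as a JSON array of strings."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
        raise ValueError("expected a JSON array of strings")
    return value


# JSON array form: decodes into a Python list.
print(parse_list_env('["https://awoooi.wooo.work", "http://localhost:3000"]'))

# Comma-separated form: not valid JSON, so decoding fails.
try:
    parse_list_env("https://awoooi.wooo.work,http://localhost:3000")
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```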
OG T
d4fbdb0331
fix(k8s): correct image registry path to 192.168.0.110:5000
...
harbor.wooo.work has a TLS certificate problem; switch to connecting directly by IP
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:17:58 +08:00
OG T
f037812f15
feat(phase8): CI/CD pipeline and K8s deployment automation
...
Phase 8 CI/CD blueprint:
- GitHub Actions deploy-prod.yml (reusing the mature AIOPS pattern)
- Signal Worker K8s Deployment
- Telegram Notify closed loop
- Bootstrap automation script
Architecture iron rules:
- Build: 110 vault (Harbor + Self-Hosted Runner)
- Deploy: 120 K3s Master
- Docker Compose is strictly prohibited; K8s is the only permitted deployment
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 18:01:01 +08:00