Commit Graph

49 Commits

Author SHA1 Message Date
OG T
962b1e75a5 refactor: Rename ClawBot → OpenClaw across documentation
- Update .awoooi-agent-rules.md (4 occurrences)
- Update docs/api/openapi.yaml (all schema references)
- Update apps/web/tailwind.config.ts (comment)
- Update apps/api/src/core/config.py (comment)

Legacy CLAWBOT_URL field kept for backward compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:05:53 +08:00
OG T
b0302329f4 fix(web): Pass decision prop to DualStateIncidentCard
Root cause: mapToDualState() was missing decision field,
causing Y/n buttons to be permanently disabled.

Now correctly passes incident.decision to the card component.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:54:42 +08:00
OG T
0aaf6a276b feat(api,web): Phase 6.5 DecisionManager with dual-engine fallback
Backend:
- Add DecisionManager with state machine (INIT→ANALYZING→READY→EXECUTING)
- Implement Expert System rules engine (100% local, never fails)
- Dual-engine: LLM (primary) + Expert System (fallback)
- Auto-generate decision_token for each incident
- 30-second timeout guarantee

Frontend:
- Use decision.state to unlock [Y/n] buttons
- Display AI action suggestion in card
- Show source indicator [AI] or [EXP]
- Generate proposal on-demand if needed

Fixes: UI locked with hourglass when LLM times out

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:19:55 +08:00
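The dual-engine fallback in commit 0aaf6a276b can be pictured with a minimal sketch. Only the state names and the 30-second default come from the commit message; `decide`, `expert_engine`, `slow_llm`, and the single rule inside the expert engine are hypothetical stand-ins, not the project's actual code.

```python
import asyncio
from enum import Enum


class State(Enum):
    INIT = "INIT"
    ANALYZING = "ANALYZING"
    READY = "READY"
    EXECUTING = "EXECUTING"


def expert_engine(incident: dict) -> dict:
    # Hypothetical local rule, purely illustrative (not the project's rule set).
    action = "restart" if incident.get("severity") == "P0" else "observe"
    return {"action": action, "source": "EXP"}


async def decide(incident: dict, llm_engine, timeout: float = 30.0) -> dict:
    """Try the LLM first; fall back to the local rules on timeout or error."""
    state = State.ANALYZING
    try:
        decision = await asyncio.wait_for(llm_engine(incident), timeout)
    except Exception:  # includes asyncio.TimeoutError
        decision = expert_engine(incident)
    state = State.READY
    return {**decision, "state": state.value}


async def slow_llm(incident: dict) -> dict:
    # Stand-in for an LLM call that does not answer in time.
    await asyncio.sleep(1.0)
    return {"action": "scale_up", "source": "AI"}


# A short timeout forces the fallback path for demonstration.
result = asyncio.run(decide({"severity": "P0"}, slow_llm, timeout=0.05))
```

Because the fallback runs locally and synchronously, the caller always reaches READY within roughly the timeout, which is what lets the UI unlock its buttons.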
OG T
c01742ef82 fix(web): Phase 6.5c+ enhance [Y/n] tactile feedback & diagnostics
- Add active:scale-95 active:bg-neutral-800 for physical click feedback
- Add disabled:opacity-30 for clearer disabled state
- Add tooltip "大腦分析中..." ("Brain analyzing...") when proposalId is missing
- Add comprehensive console.log diagnostics for authorization flow
- Add reason parameter "Authorized via WarRoom" for audit trail
- Implement optimistic UI with immediate loading state transition

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:07:10 +08:00
OG T
7db5108a1f feat(web): Phase 7.0 minimalist 5-pillar navigation
- Refactor sidebar to Nothing.tech visual compliance
- Add defensive route stubs for /authorizations, /knowledge-base, /settings
- Dynamic badge for pending approvals count
- Ultra-minimal borders (0.5px), no shadows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:02:21 +08:00
OG T
eee4ab9b36 feat(api): Phase 6.6 implement k8s execution engine with subprocess
ActionExecutor enhancements:
- Add execute_kubectl_command() using asyncio.create_subprocess_shell
- Security: Only kubectl commands allowed, forbidden patterns blocked
- Shadow Mode: Simulate execution without actual kubectl calls
- Capture stdout/stderr with PIPE, handle timeout gracefully

New execute_approved_proposal() function:
- Background task entry point for approved proposals
- Read approval from Redis/DB, verify status='approved'
- Extract kubectl_command from metadata
- Execute via execute_kubectl_command()
- Update status to 'executed' or 'failed' with execution_log

Security guardrails:
- Forbid delete namespace/ns, rm -rf, drop database
- Forbid batch deletion patterns
- 60 second default timeout

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:46:47 +08:00
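A rough sketch of the guardrails and shadow mode listed in commit eee4ab9b36, under stated assumptions: the regex patterns, the return shape, the `validate` helper, and the extra shell-metacharacter check are illustrative guesses, not the project's implementation. Only the function name `execute_kubectl_command`, the use of `asyncio.create_subprocess_shell`, and the 60-second default come from the commit message.

```python
import asyncio
import re

# Illustrative patterns derived from the guardrail list; the real list may differ.
FORBIDDEN = [
    r"\bdelete\s+(namespace|ns)\b",
    r"\brm\s+-rf\b",
    r"\bdrop\s+database\b",
    r"\bdelete\b.*\s--all\b",
]


def validate(command: str) -> None:
    """Raise ValueError unless the command is an allowed kubectl call."""
    if not command.strip().startswith("kubectl "):
        raise ValueError("only kubectl commands are allowed")
    # Assumption: reject shell metacharacters, since a prefix check alone
    # would let "kubectl ...; other-command" through a shell.
    if re.search(r"[;&|`$]", command):
        raise ValueError("shell metacharacters are not allowed")
    for pattern in FORBIDDEN:
        if re.search(pattern, command, re.IGNORECASE):
            raise ValueError(f"forbidden pattern: {pattern}")


async def execute_kubectl_command(command: str, shadow: bool = True,
                                  timeout: float = 60.0) -> dict:
    validate(command)
    if shadow:
        # Shadow mode: report what would run without invoking kubectl.
        return {"status": "simulated", "stdout": "", "stderr": ""}
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        return {"status": "failed", "stdout": "", "stderr": "timeout"}
    status = "executed" if proc.returncode == 0 else "failed"
    return {"status": status, "stdout": out.decode(), "stderr": err.decode()}
```

Defaulting `shadow` to True in this sketch means a caller has to opt in explicitly before anything touches the cluster.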
OG T
28fa8e6af4 feat(web): Phase 6.5c implement [Y/n] execution wiring
DualStateIncidentCard:
- Add proposalId prop for approval actions
- Add onApprovalChange callback for status updates
- Implement handleApprove() calling POST /api/v1/approvals/{id}/sign
- Implement handleReject() calling POST /api/v1/approvals/{id}/reject
- Add ButtonState management (idle/loading/approved/rejected/error)
- Loading spinner during API call
- Success state: green "已授權" ("Authorized") / red "已拒絕" ("Rejected")
- Error state: orange "錯誤" ("Error") with auto-recovery

API Client:
- Fix endpoint mismatch: rename approveApproval to signApproval
- Use correct endpoint /sign instead of /approve
- Add signer parameter for multi-sig support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:37:56 +08:00
OG T
a769738499 feat(api): Phase 6.4h replace mock DI with real ProposalService
- Remove MockEngine and embedded Proposal/Guardrails classes
- Import real ProposalService with OpenClaw LLM integration
- Use get_real_proposal_service() for dependency injection
- ProposalService integrates:
  - OpenClaw LLM (Ollama → Gemini → Claude fallback)
  - Redis Working Memory
  - PostgreSQL Episodic Memory
  - TrustEngine risk assessment
- Add llm_provider, llm_confidence, kubectl_command to response
- Map ApprovalRiskLevel to Tier (LOW=1, MEDIUM=2, CRITICAL=3)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:25:39 +08:00
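The provider fallback order and the risk-to-tier mapping in commit a769738499 might look roughly like the sketch below. The enum values, `first_successful`, and the stand-in provider functions are hypothetical; only the order Ollama → Gemini → Claude and the mapping LOW=1, MEDIUM=2, CRITICAL=3 are taken from the commit message.

```python
from enum import Enum


class ApprovalRiskLevel(Enum):
    LOW = "low"          # enum values are assumptions
    MEDIUM = "medium"
    CRITICAL = "critical"


# Mapping stated in the commit message: LOW=1, MEDIUM=2, CRITICAL=3.
TIER_BY_RISK = {
    ApprovalRiskLevel.LOW: 1,
    ApprovalRiskLevel.MEDIUM: 2,
    ApprovalRiskLevel.CRITICAL: 3,
}


def first_successful(providers, prompt: str):
    """Call each (name, fn) provider in order and return the first result."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def ollama(prompt: str) -> str:
    # Stand-in that simulates an unreachable local model.
    raise ConnectionError("ollama unreachable")


def gemini(prompt: str) -> str:
    return f"gemini answer to: {prompt}"


def claude(prompt: str) -> str:
    return f"claude answer to: {prompt}"


provider, answer = first_successful(
    [("ollama", ollama), ("gemini", gemini), ("claude", claude)],
    "diagnose incident",
)
```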
OG T
be8ed1f7ba fix(web): resolve interface mismatch + add defensive null checks
- P0/P1/P2 now map to 'alert' status (was P0/P1 only)
- Tier mapping: P0=Tier3, P1=Tier2, P2=Tier1
- Added null/undefined guards in mapToDualState()
- Optional chaining on incidents array access
- Safe fallback for missing serviceName, message, timestamp

Fixes frontend warroom showing no cards despite API returning data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:17:58 +08:00
OG T
a825aa9634 fix(ci): exclude secrets.yaml from kubectl apply loop
Prevents CI/CD from overwriting manually patched K8s secrets.
Secrets should be managed separately (GitHub Secrets / sealed-secrets).

Root cause: 03-secrets.yaml contains CHANGE_ME placeholders,
causing pods to crash with "password authentication failed".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:16:27 +08:00
OG T
0aa80c1d32 fix(docker): embed mock types for Docker build compatibility
Remove lewooogo-brain local dependency that breaks Docker context.
Inline Proposal/Guardrails definitions in proposals.py mock.

Phase 6.4i will address proper monorepo Docker packaging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:01:20 +08:00
OG T
cb5d0ecfe4 feat(phase-6.4g-6.5b): API Synaptic Integration + Dual-State WarRoom UI
Phase 6.4g (API synapse integration):
- lewooogo-brain dependency binding in apps/api/pyproject.toml
- POST /api/v1/incidents/{id}/propose route (proposals.py)
- Guardrails integration (8/8 tests passed)

Phase 6.5a (visual cortex build-out):
- DualStateIncidentCard.tsx with Nothing.tech visual compliance
- Ping radar animation for alert state
- Tier-based decision layer UI (AI executing / awaiting personal approval)

Phase 6.5b (neural network wiring):
- Main warroom page integration (page.tsx)
- IncidentResponse → DualState mapper function
- Empty state: "系統穩定。0 活躍異常。" ("System stable. 0 active anomalies.")

Tests:
- test_guardrails.py (8/8)
- test_incident_engine.py (6/6)
- test_skill_loader.py (6/6)
- Frontend build: 0 errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 11:58:28 +08:00
OG T
8eaf2acb0d docs(skills): add guardrails and dry-run principles
- Skill 03: Add proposal guardrails (forbidden commands, namespace binding)
- Skill 04: Add idempotency and garbage collection awareness
- Skill 05: Add dry-run first principle for destructive operations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:59:20 +08:00
OG T
c4dae39dfa docs(skills): add 2026-03-23 production incident learnings
New sections added:

01-frontend-aesthetics:
- Polling + Operation Race Condition pattern

02-lewooogo-backend-core:
- Worker Redis dedicated connection pool (socket_timeout=None)
- SQLite prohibition decree
- Function rename global search requirement

04-awoooi-devops-commander:
- NetworkPolicy Pod Selector (system label)
- Zombie consumer group cleanup
- PostgreSQL initialization checklist

05-awoooi-sre-qa (updated earlier):
- CrashLoopBackOff diagnosis
- Telegram health check
- Frontend race condition diagnosis

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:46:14 +08:00
OG T
1576f2ab20 fix(db): eliminate SQLite brain-split, force PostgreSQL
Root cause: Worker used SQLITE_DATABASE_URL causing "no such table: incidents"
because each Pod had isolated SQLite file, not shared PostgreSQL.

Fixes:
- db/base.py: Use DATABASE_URL (PostgreSQL) instead of SQLITE_DATABASE_URL
- Added SQLite prohibition guard with logging
- Added pool_size and pool_pre_ping for production stability

New: packages/lewooogo-data PgMemoryProvider (Phase 6.4d)
- Episodic Memory implementation for PostgreSQL
- init_pg_engine() with auto table creation
- SQLite forbidden by Commander's decree

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:02:43 +08:00
OG T
9f353343c9 fix(worker): dedicated Redis pool with unlimited timeout for XREADGROUP
Root cause: Worker shared Redis pool with API (socket_timeout=5s),
but XREADGROUP blocks for 5s causing timeout errors every cycle.

Fix:
- Add init_worker_redis_pool() with socket_timeout=None
- Worker now uses get_worker_redis() for XREADGROUP operations
- API continues using get_redis() with short timeout

Also destroyed 50 zombie consumers via:
  XGROUP DESTROY stream:awoooi_signals awoooi_workers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:42:11 +08:00
OG T
80d0ef4a8f feat(packages): Phase 6.4a-c leWOOOgo modular architecture
New packages:
- packages/lewooogo-brain: AI reasoning & decision engine
  - IProposalEngine interface (ABC)
  - IIncidentProcessor interface (ABC)
  - Pydantic models: Proposal, Guardrails, Incident, Signal

- packages/lewooogo-data: Memory provider abstraction
  - IMemoryProvider interface (ABC)
  - IDualMemoryProvider for Working + Episodic memory
  - Generic type support for flexible data models

Documentation:
- ADR-008: Python modular packages architecture decision
- ARCHITECTURE_MEMORY.md: Module map index for AI developers
- LOGBOOK.md: Updated milestones and Phase 6.4 status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:32:07 +08:00
OG T
d050dd1ecc docs(skills): add production debugging patterns from 2026-03-23 incidents
New sections in 05-awoooi-sre-qa.md:
- Worker CrashLoopBackOff diagnosis procedure
- Telegram alert system health check
- Frontend race condition diagnosis (Polling vs API)
- Import name mismatch detection pattern

Lessons learned from:
- 7+ hour outage due to undetected worker crash
- Approval card flicker due to Zustand polling race condition

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:21:13 +08:00
OG T
6d7486634b fix(worker): correct redis function names causing CrashLoopBackOff
signal_worker.py was importing non-existent init_redis/close_redis
Correct names are init_redis_pool/close_redis_pool

Root cause of:
- No Telegram alerts for 7+ hours
- No new approval cards
- No incident processing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:19:01 +08:00
OG T
68f4cf51b6 fix(web): resolve approval card race condition with polling
Race condition between polling (5s interval) and sign/reject operations
caused cards to flicker and reappear after being approved.

Fix:
- Pause polling during sign/reject API calls
- Resume polling after 1 second delay to allow backend state sync
- Apply same pattern to both signApproval and rejectApproval

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:13:35 +08:00
OG T
de5796522f fix(api): fix optimization_suggestions dict access in proposal generation
The optimization_suggestions field is list[dict], not list[object].
Use .get() to access dict keys instead of attribute access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 01:40:00 +08:00
OG T
141df533cc feat(api): Phase 6.4 LLM-based proposal generation with cache
- Add _call_with_cache wrapper in OpenClaw (Redis-based LLM cache)
- Add generate_incident_proposal method for incident analysis
- Integrate ProposalService with OpenClaw LLM
- Fallback to template-based proposals if LLM fails
- Include LLM metadata (provider, confidence, cache status) in proposals

Constitutional clause: caching must be used to protect compute resources; uncached bare calls are strictly forbidden.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 01:33:46 +08:00
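A cache-first LLM wrapper like the `_call_with_cache` mentioned in commit 141df533cc could be sketched as follows. This is an assumption-laden illustration: `DictCache` is an in-memory stand-in for Redis so the example runs without a server, and `call_with_cache`, the key scheme, and the `cached` flag are hypothetical names.

```python
import hashlib
import json


class DictCache:
    """In-memory stand-in for Redis so the sketch runs without a server."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def call_with_cache(cache, provider: str, prompt: str, llm_call) -> dict:
    """Return a cached LLM answer when one exists, else call and store it."""
    digest = hashlib.sha256(f"{provider}:{prompt}".encode()).hexdigest()
    key = f"llm_cache:{digest}"  # hypothetical key scheme
    hit = cache.get(key)
    if hit is not None:
        return {**json.loads(hit), "cached": True}
    answer = llm_call(prompt)
    cache.set(key, json.dumps(answer))
    return {**answer, "cached": False}
```

Hashing the provider together with the prompt keeps answers from different models from overwriting each other; a production version would presumably also set an expiry on each key.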
OG T
1c66a05335 feat(qa): add Playwright frontend visual verification script
- Create apps/web/scripts/verify-frontend.js (headless scout)
- Detects Console errors (zero-error policy)
- Verifies DOM content (INC-*, RPS, Latency)
- Takes full-page screenshot for evidence
- Update SRE Skill with mandatory verification step

Constitutional Law 14/15: No curl-only verification allowed!

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 01:19:08 +08:00
OG T
342a0f611a feat(k8s): enable Signal Worker (Phase 8 go-live)
Enable Signal Worker to process Redis Streams signals
and trigger Incident Engine for alert aggregation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 01:08:46 +08:00
OG T
e4bd030882 fix(api): use INTERVAL syntax to avoid ClickHouse Decimal overflow
The toDateTime64(nanoseconds, 9) caused Decimal overflow.
Switched to simpler `now() - INTERVAL X MINUTE` syntax.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:57:45 +08:00
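The relative-window syntax from commit e4bd030882 can be illustrated with a small query builder. This is only a sketch: the `timestamp` column name, the `count()` aggregate, and the function name are assumptions; the table name follows the v3 table referenced in the neighbouring SigNoz commit.

```python
def build_trace_count_query(minutes: int) -> str:
    """Build a ClickHouse query over a relative window using INTERVAL."""
    # Only a plain positive int is interpolated, so the f-string cannot
    # carry arbitrary SQL.
    if type(minutes) is not int or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return (
        "SELECT count() AS spans "
        "FROM signoz_traces.distributed_signoz_index_v3 "
        f"WHERE timestamp >= now() - INTERVAL {minutes} MINUTE"
    )


query = build_trace_count_query(5)
```

Letting ClickHouse compute the window from `now()` avoids building a nanosecond literal on the client, which is where the overflow described in the commit originated.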
OG T
7f2adab78b fix(api): query correct SigNoz traces table (v3 not v2)
The SigNoz trace data is stored in distributed_signoz_index_v3,
not v2. This fixes GlobalPulse showing all zeros.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:50:15 +08:00
OG T
b00f318450 fix(api): correct OTEL gRPC endpoint format and SigNoz query table
Root cause analysis:
1. OTEL gRPC endpoint had http:// prefix which is invalid for gRPC
2. SigNoz query was targeting wrong table (signoz_metrics.distributed_samples_v4)
3. Should query signoz_traces.distributed_signoz_index_v2 for trace data

Fixes:
- Remove http:// prefix from OTEL_EXPORTER_OTLP_ENDPOINT (gRPC needs host:port)
- Update SigNoz client to query traces table instead of metrics table
- Fix timestamp format (nanoseconds for DateTime64(9))
- statusCode: 0=Unset, 1=Ok, 2=Error

This should enable OTEL traces to reach SigNoz and GlobalPulse to show real metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:41:51 +08:00
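The endpoint fix in commit b00f318450 amounts to reducing a URL-style value to a bare host:port. A minimal sketch, assuming a hypothetical helper `to_grpc_target` and a made-up host name:

```python
def to_grpc_target(endpoint: str) -> str:
    """Strip an http:// or https:// prefix and return a bare host:port."""
    target = endpoint.strip()
    for scheme in ("http://", "https://"):
        if target.startswith(scheme):
            target = target[len(scheme):]
    target = target.rstrip("/")
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return target


# "signoz-otel" is a hypothetical host name; 4317 is the usual OTLP gRPC port.
target = to_grpc_target("http://signoz-otel:4317")
```

Validating at startup, rather than letting the exporter fail silently, would surface a malformed OTEL_EXPORTER_OTLP_ENDPOINT before traces start going missing.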
OG T
fea6524f35 feat(ci): upgrade Telegram notification UX with HTML + Inline Keyboard
- Replace flat text format with structured HTML layout
- Add emoji section headers and visual separators
- Replace raw URLs with Inline Keyboard buttons
- Success: "查看部署紀錄" ("View deployment log") + "開啟正式站" ("Open production site") buttons
- Failure: Only "查看部署紀錄" ("View deployment log") button
- Use JSON payload for proper Telegram API formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:37:26 +08:00
OG T
7c1480186f docs: update LOGBOOK with Phase 8 fixes and Claude Skills
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:21:17 +08:00
OG T
c66b4dfb22 feat(agents): implement 6 core Claude skills for auto-pilot validation
Skills created:
- 01-awoooi-frontend-aesthetics.md: Nothing.tech visual standards + i18n
- 02-lewooogo-backend-core.md: FastAPI four iron laws + OTEL
- 03-openclaw-cognitive-expert.md: Incident Engine + Multi-Sig + Redis
- 04-awoooi-devops-commander.md: K3s + Docker + Tier 1-3 authorization
- 05-awoooi-sre-qa.md: Playwright + automated QA (no human testing)
- 06-awoooi-monorepo-master.md: Git + dependencies + LOGBOOK updates

Added skill routing to .awoooi-agent-rules.md Session startup flow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:20:51 +08:00
OG T
21ce7056fa fix(otel): correct OTEL endpoint to port 24317 and fix NetworkPolicy
- SigNoz OTEL Collector maps container:4317 to host:24317
- Updated NetworkPolicy egress to allow 24317/24318
- Updated ConfigMap with correct OTEL_EXPORTER_OTLP_ENDPOINT
- Fixed OpenClaw port from 8089 to 8088

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 00:06:07 +08:00
OG T
551a305fcf fix(config): rename _OPENCLAW_TG_USER_WHITELIST_RAW to comply with pydantic v2
Pydantic v2 does not allow field names with leading underscores.
Changed from @property pattern to method pattern.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-22 23:40:09 +08:00
OG T
a2f7d128f3 fix: canonicalize domain - https://awoooi.wooo.work
- Add the production domain to CORS
- Set NEXT_PUBLIC_API_URL to https://awoooi.wooo.work
- Switch the pydantic-settings WHITELIST to a property to avoid JSON parsing
- Nginx already configured to point at the K3s Worker (121)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 23:28:36 +08:00
OG T
c2b33a99a3 fix(config): prevent pydantic-settings from auto-parsing WHITELIST as JSON
Use str + property instead of list[int] + validator.
Resolves the parsing error when the value is injected from a K8s Secret.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 23:18:50 +08:00
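The str + property pattern from commit c2b33a99a3 can be sketched without pydantic, using a plain dataclass as a stand-in for the settings class. The class name, field names, and the comma-separated format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelegramSettings:
    # Kept as a plain string so nothing tries to JSON-decode the env value.
    user_whitelist_raw: str = ""

    @property
    def user_whitelist(self) -> list[int]:
        """Parse '123, 456' into [123, 456], ignoring empty items."""
        return [int(p) for p in self.user_whitelist_raw.split(",") if p.strip()]


settings = TelegramSettings(user_whitelist_raw="123, 456,")
whitelist = settings.user_whitelist
```

The raw field stays a string end to end, so a Secret value such as `123,456` never has to be valid JSON.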
OG T
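13200076aa fix(ci): AIOPS canonical mode - hardcoded Telegram token + Worker paused
- Telegram notification reuses the AIOPS approach of writing the token directly
- Worker paused with replicas=0 (to be enabled once Phase 6.5 is complete)
- Simplify the rollout flow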

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 20:05:02 +08:00
OG T
0c80f6a996 fix(worker): add standalone entry point for K8s deployment
- Add __main__ entry point
- Write health files for K8s probes
- Handle graceful shutdown

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:56:15 +08:00
OG T
5156800217 fix(k8s): AI_FALLBACK_ORDER also switched to JSON array format
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:37:51 +08:00
OG T
721cfd1e3b fix(k8s): use JSON array format for CORS_ORIGINS
pydantic-settings requires JSON format for list[str] fields

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:26:26 +08:00
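Commit 721cfd1e3b turns on the fact that pydantic-settings decodes list-typed environment values as JSON. The sketch below uses only the standard `json` module to show why a bare URL is rejected while a JSON array is accepted; the variable names are illustrative.

```python
import json

good = '["https://awoooi.wooo.work"]'  # JSON array, as the fix requires
bad = "https://awoooi.wooo.work"       # bare string, not valid JSON

origins = json.loads(good)  # decodes to a real list of strings

try:
    json.loads(bad)
    bad_is_valid_json = True
except json.JSONDecodeError:
    bad_is_valid_json = False
```

In a ConfigMap this means quoting the whole array as one string value, so the container sees the brackets and inner quotes intact.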
OG T
d4fbdb0331 fix(k8s): correct image registry path to 192.168.0.110:5000
harbor.wooo.work has a TLS certificate problem; connect directly by IP instead

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:17:58 +08:00
OG T
241e105d72 fix(ci): exclude kustomization.yaml from kubectl apply
kustomization.yaml is meant for -k and cannot be applied directly

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:09:31 +08:00
OG T
cc9c9366e7 fix(web): skip ESLint/TypeScript during Docker build
CI/CD separation strategy:
- ESLint runs in a dedicated lint job
- TypeScript runs in a dedicated type-check job
- Both are skipped at build time to speed up the Docker build

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 19:06:29 +08:00
OG T
c6a36ab673 fix: simplify Telegram notification format
2026-03-22 19:04:50 +08:00
OG T
d96bd03128 fix: add root monorepo config files (pnpm-workspace.yaml)
2026-03-22 19:00:45 +08:00
OG T
b5d4b50c52 fix: use IP for Harbor registry (avoid TLS cert issue)
2026-03-22 18:59:02 +08:00
OG T
196d269b92 feat: add all application source code
- apps/api: FastAPI backend with Dockerfile
- apps/web: Next.js frontend with Dockerfile
- apps/sensor: Signal collection agent
- packages: shared packages

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 18:57:44 +08:00
OG T
a840bf975b Revert "ci: temp switch to ubuntu-latest for initial test"
This reverts commit 4bf0422363.
2026-03-22 18:37:41 +08:00
OG T
4bf0422363 ci: temp switch to ubuntu-latest for initial test
2026-03-22 18:36:47 +08:00
OG T
f037812f15 feat(phase8): CI/CD pipeline and K8s deployment automation
Phase 8 CI/CD blueprint:
- GitHub Actions deploy-prod.yml (reusing the mature AIOPS pattern)
- Signal Worker K8s Deployment
- Telegram Notify closed loop
- Bootstrap automation script

Architecture iron rules:
- Build: 110 vault (Harbor + Self-Hosted Runner)
- Deploy: 120 K3s Master
- Docker Compose is strictly forbidden; K8s is the only legitimate deployment

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 18:01:01 +08:00
OG T
ccdf757edd chore: initial commit for AWOOOI project
Phase 0 Day 1 - Project initialization:
- Independent repository (Option A)
- .awoooi-agent-rules.md (AI development contract)
- Project skeleton (apps/web, apps/api, packages, docs)
- ADR template for architecture decisions
- LOGBOOK for progress tracking

Strategic decision: 2026-03-19 Operation Cyber-Shell
Reference: /wooo-aiops/docs/meetings/2026-03-19_FRONTEND_RESTRUCTURE_STRATEGY.md

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-03-19 19:16:12 +08:00