DualStateIncidentCard:
- Add proposalId prop for approval actions
- Add onApprovalChange callback for status updates
- Implement handleApprove() calling POST /api/v1/approvals/{id}/sign
- Implement handleReject() calling POST /api/v1/approvals/{id}/reject
- Add ButtonState management (idle/loading/approved/rejected/error)
- Loading spinner during API call
- Success state: green "已授權" (Authorized) / red "已拒絕" (Rejected)
- Error state: orange "錯誤" (Error) with auto-recovery
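The ButtonState lifecycle above can be sketched as a small state machine. This is a language-neutral sketch in Python, not the component's own code. The transition table, the `ButtonStateMachine` name and the 3-second recovery delay are hypothetical.

```python
from enum import Enum
from typing import Optional


class ButtonState(Enum):
    IDLE = "idle"
    LOADING = "loading"
    APPROVED = "approved"
    REJECTED = "rejected"
    ERROR = "error"


# Hypothetical transition table: approved/rejected are terminal, error recovers to idle.
_TRANSITIONS = {
    ButtonState.IDLE: {ButtonState.LOADING},
    ButtonState.LOADING: {ButtonState.APPROVED, ButtonState.REJECTED, ButtonState.ERROR},
    ButtonState.ERROR: {ButtonState.IDLE},
    ButtonState.APPROVED: set(),
    ButtonState.REJECTED: set(),
}

ERROR_RECOVERY_SECONDS = 3.0  # hypothetical auto-recovery delay


class ButtonStateMachine:
    def __init__(self) -> None:
        self.state = ButtonState.IDLE
        self._error_at: Optional[float] = None

    def transition(self, new_state: ButtonState, now: float = 0.0) -> None:
        if new_state not in _TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        # Remember when the error happened so tick() can auto-recover later.
        self._error_at = now if new_state is ButtonState.ERROR else None

    def tick(self, now: float) -> None:
        """Auto-recover from ERROR to IDLE once the recovery delay has elapsed."""
        if self.state is ButtonState.ERROR and self._error_at is not None:
            if now - self._error_at >= ERROR_RECOVERY_SECONDS:
                self.transition(ButtonState.IDLE, now)
```

Rejecting illegal transitions (for example a second click after APPROVED) is what keeps a double-submit from reaching the API.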
API Client:
- Fix endpoint mismatch: rename approveApproval to signApproval
- Use correct endpoint /sign instead of /approve
- Add signer parameter for multi-sig support
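A minimal sketch of the renamed client calls, written in Python rather than the frontend's language. Only the `/api/v1/approvals/{id}/sign` and `/reject` paths come from this log; the `signer` body field, the `build_*` helper names and the use of `urllib.request` are assumptions.

```python
import json
import urllib.request
from urllib.parse import quote

API_PREFIX = "/api/v1/approvals"  # path prefix from the commit message


def build_sign_request(base_url: str, proposal_id: str, signer: str) -> urllib.request.Request:
    """POST {id}/sign carrying the signer identity (body shape is hypothetical)."""
    url = f"{base_url}{API_PREFIX}/{quote(proposal_id, safe='')}/sign"
    body = json.dumps({"signer": signer}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def build_reject_request(base_url: str, proposal_id: str) -> urllib.request.Request:
    """POST {id}/reject; no request body is assumed."""
    url = f"{base_url}{API_PREFIX}/{quote(proposal_id, safe='')}/reject"
    return urllib.request.Request(url, method="POST")
```

Passing either request to `urllib.request.urlopen()` would send it; percent-encoding the id keeps odd proposal ids from altering the path.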
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the lewooogo-brain local dependency, which breaks the Docker build context.
Inline Proposal/Guardrails definitions in proposals.py mock.
Phase 6.4i will address proper monorepo Docker packaging.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: the Worker used SQLITE_DATABASE_URL, causing "no such table: incidents",
because each Pod had its own isolated SQLite file instead of the shared PostgreSQL database.
Fixes:
- db/base.py: Use DATABASE_URL (PostgreSQL) instead of SQLITE_DATABASE_URL
- Added SQLite prohibition guard with logging
- Added pool_size and pool_pre_ping for production stability
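The SQLite guard and pool settings can be sketched with the stdlib only. The returned kwargs are meant for SQLAlchemy's `create_engine(url, **kwargs)`, whose `pool_size` and `pool_pre_ping` parameters are real; the function name and the default pool size of 10 are assumptions.

```python
import logging
from typing import Dict, Mapping, Tuple

logger = logging.getLogger("db.base")


def resolve_engine_config(env: Mapping[str, str], pool_size: int = 10) -> Tuple[str, Dict[str, object]]:
    """Return (url, engine kwargs) from DATABASE_URL; SQLITE_DATABASE_URL is deliberately ignored."""
    url = env.get("DATABASE_URL", "")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    # Each Pod would get its own SQLite file, so refuse it outright.
    if url.startswith("sqlite"):
        logger.error("SQLite DATABASE_URL rejected")  # avoid logging credentials
        raise RuntimeError("SQLite is prohibited; use the shared PostgreSQL DATABASE_URL")
    # pool_pre_ping tests each pooled connection before use, dropping stale ones.
    return url, {"pool_size": pool_size, "pool_pre_ping": True}
```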
New: packages/lewooogo-data PgMemoryProvider (Phase 6.4d)
- Episodic Memory implementation for PostgreSQL
- init_pg_engine() with auto table creation
- SQLite forbidden by Commander's decree
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: the Worker shared the API's Redis pool (socket_timeout=5s),
but XREADGROUP blocks for up to 5s, so the socket read timed out on every cycle.
Fix:
- Add init_worker_redis_pool() with socket_timeout=None
- Worker now uses get_worker_redis() for XREADGROUP operations
- API continues using get_redis() with short timeout
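Why the Worker needs its own pool: a blocking `XREADGROUP ... BLOCK 5000` holds the socket for up to 5s, so any read timeout at or below that fires on every quiet cycle. A stdlib sketch of that rule; the helper names are hypothetical, and the dicts are intended as keyword arguments for a redis-py connection pool (its `socket_timeout` parameter is real).

```python
from typing import Optional

API_SOCKET_TIMEOUT = 5.0    # seconds, the shared API pool setting from the commit
XREADGROUP_BLOCK_MS = 5000  # the Worker blocks up to 5s per read


def is_block_safe(socket_timeout: Optional[float], block_ms: int) -> bool:
    """A blocking read is safe only if the socket timeout outlasts the block window."""
    if socket_timeout is None:  # None disables the read timeout entirely
        return True
    return socket_timeout > block_ms / 1000


def pool_kwargs(role: str) -> dict:
    """Hypothetical per-role settings: short timeout for the API, none for the Worker."""
    if role == "worker":
        return {"socket_timeout": None}
    return {"socket_timeout": API_SOCKET_TIMEOUT}
```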
Also removed 50 zombie consumers by destroying their consumer group:
XGROUP DESTROY stream:awoooi_signals awoooi_workers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New packages:
- packages/lewooogo-brain: AI reasoning & decision engine
  - IProposalEngine interface (ABC)
  - IIncidentProcessor interface (ABC)
  - Pydantic models: Proposal, Guardrails, Incident, Signal
- packages/lewooogo-data: Memory provider abstraction
  - IMemoryProvider interface (ABC)
  - IDualMemoryProvider for Working + Episodic memory
  - Generic type support for flexible data models
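A minimal sketch of what a generic `IMemoryProvider` ABC might look like. The method names (`store`, `recall`, `recent`) and the dict-backed implementation are hypothetical; the real interface in packages/lewooogo-data is not shown in this log.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


class IMemoryProvider(ABC, Generic[T]):
    """Hypothetical shape of a generic memory provider interface."""

    @abstractmethod
    def store(self, key: str, item: T) -> None: ...

    @abstractmethod
    def recall(self, key: str) -> Optional[T]: ...

    @abstractmethod
    def recent(self, limit: int) -> List[T]: ...


class InMemoryProvider(IMemoryProvider[T]):
    """Dict-backed provider, e.g. for tests; insertion order doubles as recency."""

    def __init__(self) -> None:
        self._items: Dict[str, T] = {}

    def store(self, key: str, item: T) -> None:
        self._items.pop(key, None)  # re-storing a key moves it to the newest position
        self._items[key] = item

    def recall(self, key: str) -> Optional[T]:
        return self._items.get(key)

    def recent(self, limit: int) -> List[T]:
        if limit <= 0:
            return []
        return list(self._items.values())[-limit:][::-1]  # newest first
```

Binding `T` to a concrete model (for example a Pydantic `Incident`) is what the "generic type support" bullet suggests; a PostgreSQL-backed provider would implement the same abstract methods.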
Documentation:
- ADR-008: Python modular packages architecture decision
- ARCHITECTURE_MEMORY.md: Module map index for AI developers
- LOGBOOK.md: Updated milestones and Phase 6.4 status
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New sections in 05-awoooi-sre-qa.md:
- Worker CrashLoopBackOff diagnosis procedure
- Telegram alert system health check
- Frontend race condition diagnosis (Polling vs API)
- Import name mismatch detection pattern
Lessons learned from:
- 7+ hour outage due to undetected worker crash
- Approval card flicker due to Zustand polling race condition
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
signal_worker.py imported the non-existent init_redis/close_redis;
the correct names are init_redis_pool/close_redis_pool.
Root cause of:
- No Telegram alerts for 7+ hours
- No new approval cards
- No incident processing
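A plain smoke import of signal_worker.py in CI would already raise ImportError for this bug. For a targeted check in the spirit of the import-name mismatch pattern, a stdlib sketch; `missing_names` is a hypothetical helper, and in this project it would be pointed at the module that defines `init_redis_pool`/`close_redis_pool` (its path is not given here).

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the names that `from module_name import ...` would fail on.

    Submodules that have not been imported yet are reported as missing too,
    which is acceptable when checking functions such as init_redis_pool.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]
```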
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Race condition between polling (5s interval) and sign/reject operations
caused cards to flicker and reappear after being approved.
Fix:
- Pause polling during sign/reject API calls
- Resume polling after a 1-second delay to allow the backend state to sync
- Apply same pattern to both signApproval and rejectApproval
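The pause/resume ordering can be sketched with asyncio. The real fix lives in the frontend store; this Python version only illustrates the sequence. `PollingGate` and `with_polling_paused` are hypothetical names; the 1-second resume delay comes from the commit. The in-flight counter is an added assumption so that overlapping sign/reject calls do not resume polling early.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

RESUME_DELAY_SECONDS = 1.0  # from the commit: give the backend time to settle


class PollingGate:
    """A polling loop would skip its fetch while `paused` is True."""

    def __init__(self) -> None:
        self.paused = False
        self._inflight = 0
        self._resume_task: Optional[asyncio.Task] = None

    def pause(self) -> None:
        self._inflight += 1
        self.paused = True
        if self._resume_task is not None:
            self._resume_task.cancel()  # a new call supersedes any pending resume
            self._resume_task = None

    def release(self, delay: float) -> None:
        self._inflight -= 1
        if self._inflight == 0:  # resume only when no sign/reject is still in flight
            self._resume_task = asyncio.create_task(self._resume_after(delay))

    async def _resume_after(self, delay: float) -> None:
        await asyncio.sleep(delay)
        self.paused = False


async def with_polling_paused(gate: PollingGate, call: Callable[[], Awaitable[T]],
                              delay: float = RESUME_DELAY_SECONDS) -> T:
    """Run a sign/reject call with polling paused, then resume after `delay`."""
    gate.pause()
    try:
        return await call()
    finally:
        gate.release(delay)  # resume is scheduled even if the call fails
```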
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The optimization_suggestions field is list[dict], not list[object].
Use .get() to access dict keys instead of attribute access.
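An illustration of the fix. The `title` and `priority` keys are hypothetical; only the fact that each suggestion is a plain dict comes from the commit.

```python
from typing import Any, Dict, List


def summarize_suggestions(optimization_suggestions: List[Dict[str, Any]]) -> List[str]:
    lines = []
    for suggestion in optimization_suggestions:
        # suggestion.title would raise AttributeError on a dict;
        # .get() reads the key and tolerates it being absent.
        title = suggestion.get("title", "(untitled)")
        priority = suggestion.get("priority", "unknown")
        lines.append(f"[{priority}] {title}")
    return lines
```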
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable Signal Worker to process Redis Streams signals
and trigger Incident Engine for alert aggregation.
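This log does not describe how the Incident Engine aggregates alerts. As a purely hypothetical illustration, one common approach folds signals that share a fingerprint into one incident while they keep arriving within a time window. The `Signal` fields, the (service, alert_name) fingerprint and the 300-second window are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Signal:
    service: str
    alert_name: str
    timestamp: float  # seconds since epoch


@dataclass
class Incident:
    fingerprint: Tuple[str, str]
    first_seen: float
    last_seen: float
    signals: List[Signal] = field(default_factory=list)


def aggregate(signals: List[Signal], window_seconds: float = 300.0) -> List[Incident]:
    """Group signals by (service, alert_name); a gap longer than the window starts a new incident."""
    open_incidents: Dict[Tuple[str, str], Incident] = {}
    closed: List[Incident] = []
    for sig in sorted(signals, key=lambda s: s.timestamp):
        key = (sig.service, sig.alert_name)
        inc = open_incidents.get(key)
        if inc is not None and sig.timestamp - inc.last_seen <= window_seconds:
            inc.signals.append(sig)
            inc.last_seen = sig.timestamp
            continue
        if inc is not None:
            closed.append(inc)  # gap too large: close the previous incident
        open_incidents[key] = Incident(key, sig.timestamp, sig.timestamp, [sig])
    return closed + list(open_incidents.values())
```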
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Calling toDateTime64(nanoseconds, 9) caused a Decimal overflow.
Switched to simpler `now() - INTERVAL X MINUTE` syntax.
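A sketch of building the relative-time filter in Python. Validating the minute count as a positive integer before interpolation keeps the `INTERVAL X MINUTE` clause safe from injection. The `timestamp` column name and the default of the v3 table named in the following commit message are assumptions about the SigNoz schema.

```python
TRACES_TABLE = "signoz_traces.distributed_signoz_index_v3"  # assumed default table


def recent_span_count_sql(minutes: int, table: str = TRACES_TABLE) -> str:
    if not isinstance(minutes, int) or isinstance(minutes, bool) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    # now() - INTERVAL N MINUTE is evaluated by ClickHouse, so no client-side
    # nanosecond arithmetic (and no DateTime64 Decimal overflow) is involved.
    return (
        f"SELECT count() AS spans FROM {table} "
        f"WHERE timestamp >= now() - INTERVAL {minutes} MINUTE"
    )
```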
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The SigNoz trace data is stored in distributed_signoz_index_v3,
not v2. This fixes GlobalPulse showing all zeros.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause analysis:
1. The OTEL gRPC endpoint had an http:// prefix, which is invalid for gRPC
2. The SigNoz query targeted the wrong table (signoz_metrics.distributed_samples_v4)
3. Trace data should be queried from signoz_traces.distributed_signoz_index_v2
Fixes:
- Remove http:// prefix from OTEL_EXPORTER_OTLP_ENDPOINT (gRPC needs host:port)
- Update SigNoz client to query traces table instead of metrics table
- Fix timestamp format (nanoseconds for DateTime64(9))
- statusCode: 0=Unset, 1=Ok, 2=Error
This should enable OTEL traces to reach SigNoz and GlobalPulse to show real metrics.
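Two of the fixes above are easy to pin down in code: reducing an `http://host:port` value to the bare `host:port` form this deployment's gRPC exporter needed, and mapping the numeric span status. A stdlib sketch; the helper names are hypothetical, and the 4317 default is the standard OTLP/gRPC port.

```python
from urllib.parse import urlsplit

STATUS_CODES = {0: "Unset", 1: "Ok", 2: "Error"}  # span status values from this commit


def grpc_endpoint(value: str, default_port: int = 4317) -> str:
    """Strip an http(s):// scheme so the exporter receives host:port."""
    if "://" not in value:
        value = "//" + value  # make urlsplit treat a bare host:port as the netloc
    parts = urlsplit(value)
    host = parts.hostname or ""
    port = parts.port or default_port
    return f"{host}:{port}"


def status_label(code: int) -> str:
    return STATUS_CODES.get(code, f"Unknown({code})")
```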
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace flat text format with structured HTML layout
- Add emoji section headers and visual separators
- Replace raw URLs with Inline Keyboard buttons
- Success: "查看部署紀錄" (View deployment log) + "開啟正式站" (Open production site) buttons
- Failure: Only "查看部署紀錄" (View deployment log) button
- Use JSON payload for proper Telegram API formatting
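A sketch of the JSON body for the Bot API `sendMessage` method, using `parse_mode` "HTML" and a `reply_markup` with an `inline_keyboard` of URL buttons; those field names follow the Telegram Bot API. The function name and message layout are assumptions; the button labels are the ones listed above.

```python
import html
import json
from typing import Optional


def build_deploy_message(chat_id: str, service: str, success: bool,
                         log_url: str, site_url: Optional[str] = None) -> str:
    """Return a JSON body for POST https://api.telegram.org/bot<token>/sendMessage."""
    status = "✅ Deploy succeeded" if success else "❌ Deploy failed"
    # In HTML parse mode, interpolated text must be escaped.
    text = f"<b>{status}</b>\n━━━━━━━━\n📦 <b>Service:</b> {html.escape(service)}"
    buttons = [{"text": "查看部署紀錄", "url": log_url}]  # "View deployment log"
    if success and site_url:
        buttons.append({"text": "開啟正式站", "url": site_url})  # "Open production site"
    payload = {
        "chat_id": chat_id,
        "text": text,
        "parse_mode": "HTML",
        "reply_markup": {"inline_keyboard": [buttons]},  # one row of URL buttons
    }
    return json.dumps(payload, ensure_ascii=False)
```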
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- SigNoz OTEL Collector maps container:4317 to host:24317
- Updated NetworkPolicy egress to allow 24317/24318
- Updated ConfigMap with correct OTEL_EXPORTER_OTLP_ENDPOINT
- Fixed OpenClaw port from 8089 to 8088
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pydantic v2 does not allow field names with leading underscores.
Changed from @property pattern to method pattern.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>