Commit Graph

3678 Commits

Author SHA1 Message Date
OG T
6f049877fc fix(lint): ruff auto-fix + add lewooogo-core src to git
- Python: ruff --fix fixed 280 lint errors
- lewooogo-core: the src/ directory was untracked, which made CI eslint fail

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:51:37 +08:00
OG T
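f78aab8b2a fix(api): DecisionToken state sync (Y/n persistence fix)
Root cause:
- resolve_incident_after_approval only updated Incident.decision.state
- It did not update the separately stored DecisionToken (decision:{token} key)
- So on the next poll, get_or_create_decision returned the stale token still in READY state
- The frontend kept showing the Y/n buttons

Fix:
- resolve_incident_after_approval now also sets the DecisionToken state to COMPLETED
- Keeps the whole decision chain in a consistent state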

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:46:21 +08:00
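Editor's sketch (not the project's code) of the fix above: after approval, both the decision embedded in the incident and the independently stored `decision:{token}` record must move to COMPLETED, otherwise the next poll finds the stale READY token. A plain dict stands in for Redis; the `incident:{id}` key pattern and the function bodies are assumptions. Only the function names, the `decision:{token}` key, and the READY/COMPLETED states come from the commit.

```python
from enum import Enum


class DecisionState(str, Enum):
    READY = "READY"
    COMPLETED = "COMPLETED"


def resolve_incident_after_approval(store: dict, incident_id: str) -> None:
    """Hypothetical sketch; a dict stands in for Redis."""
    incident = store[f"incident:{incident_id}"]  # assumed key pattern
    incident["status"] = "RESOLVED"
    # What the code already did: update the decision embedded in the incident.
    incident["decision"]["state"] = DecisionState.COMPLETED
    # What the fix adds: update the separately stored token as well.
    token_key = f"decision:{incident['decision']['token']}"
    if token_key in store:
        store[token_key]["state"] = DecisionState.COMPLETED


def get_or_create_decision(store: dict, token: str) -> dict:
    """Hypothetical simplification: return the stored token or create a READY one."""
    return store.setdefault(f"decision:{token}", {"state": DecisionState.READY})
```

Without the second update, `get_or_create_decision` would keep returning the READY token and the UI would keep offering Y/n.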
OG T
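8542632cff fix(ci): Harbor HTTP registry + Telegram secrets
CD fixes:
- Fix the buildx HTTP vs HTTPS issue (insecure registry setting)
- Remove the UAT environment (it violated a hard rule recorded in Memory)
- Add a Telegram notification for Production deployments
- Fix the hardcoded token in deploy-prod.yml (use secrets instead)

docs:
- Add a guidelines/ directory of structured guidance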
- ARCHITECTURE.md, FRONTEND.md, OPERATIONS.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:40:40 +08:00
OG T
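00d94ca71c docs: CLAUDE.md references HARD_RULES.md (keep CLAUDE.md from bloating)
Structure:
- CLAUDE.md: lean index containing only reference links
- docs/HARD_RULES.md: detailed rules

This approach was agreed on long ago and should not have been forgotten.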

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:32:35 +08:00
OG T
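dc30c70e57 docs(CLAUDE.md): add absolute prohibition rules (Hard Rules)
Problem:
- Rules were recorded in Memory but not actually followed
- The CI workflow was changed to ubuntu-latest, violating a hard rule recorded in Memory
- Long-term memory was effectively useless

Fix:
- Write the prohibited items directly into CLAUDE.md
- Add a pre-change checklist
- These rules are loaded automatically at the start of every session

Prohibited items:
- runs-on: ubuntu-latest → self-hosted
- Telegram logOut() → forbidden
- Hardcoded frontend text → next-intl
- SQLite → PostgreSQL
- CORS * → allowlist
- Fake data → real API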

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:31:04 +08:00
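Several of the rules above can also be checked mechanically before a change is committed. The helper below is a hypothetical sketch, not part of the repository; it covers only rules a regex can detect, and the `allow_origins=["*"]` pattern assumes a FastAPI-style CORS configuration.

```python
import re

# (rule description, pattern whose presence indicates a violation)
# Hypothetical rule table; the CORS pattern assumes FastAPI-style allow_origins.
FORBIDDEN = [
    ("runs-on must be self-hosted", re.compile(r"runs-on:\s*ubuntu-latest")),
    ("Telegram logOut() is forbidden", re.compile(r"\blogOut\s*\(")),
    ("SQLite is forbidden; use PostgreSQL", re.compile(r"sqlite(\+\w+)?://", re.IGNORECASE)),
    ("CORS * is forbidden; use an allowlist",
     re.compile(r"allow_origins\s*=\s*\[\s*[\"']\*[\"']\s*\]")),
]


def check_text(path: str, text: str) -> list[str]:
    """Return one 'path:line: rule' message per violating line."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in FORBIDDEN:
            if pattern.search(line):
                violations.append(f"{path}:{lineno}: {rule}")
    return violations
```

Run over `.github/workflows/*.yaml` and the source tree, a non-empty result would fail the check; the remaining rules (fake data, hardcoded UI text) still need human review.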
OG T
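fc995be6e3 fix(ci): switch to self-hosted runner (GitHub billing issue)
Problem:
- At some point the CI workflow had been changed to ubuntu-latest
- This made GitHub Actions fail because of a billing issue

Fix:
- Switch everything back to self-hosted (awoooi-110)

Hard rule:
- Recorded in Memory: feedback_github_billing.md
- GitHub-hosted cloud runners are forbidden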

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:29:38 +08:00
OG T
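7d8eb26ebe feat(telegram): add heartbeat monitoring to prevent silent blind spots
Features:
- send_heartbeat(): sends system status every 30 minutes
- start_heartbeat_monitor(): background heartbeat monitor
- Silence alert: alerts automatically if there has been no message for more than 2 hours

Purpose:
- Prevent a long Telegram silence from being mistaken for "system stable"
- Actively verify that the alerting pipeline is working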

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:26:08 +08:00
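A minimal asyncio sketch of the heartbeat plus silence check described above. The class and its method names are hypothetical (the commit names `send_heartbeat()` and `start_heartbeat_monitor()` but not their internals), `send` is an assumed async Telegram sender, and the clock is injectable so the logic can be tested. One design choice is assumed: heartbeats do not reset the silence timer, since a 30-minute heartbeat would otherwise keep a 2-hour timer from ever firing.

```python
import asyncio
import time
from typing import Awaitable, Callable


class HeartbeatMonitor:
    """Hypothetical heartbeat + silence watchdog (not the project's code)."""

    def __init__(
        self,
        send: Callable[[str], Awaitable[None]],
        heartbeat_every: float = 30 * 60,    # 30 minutes, per the commit
        silence_limit: float = 2 * 60 * 60,  # 2 hours, per the commit
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send = send
        self.heartbeat_every = heartbeat_every
        self.silence_limit = silence_limit
        self.clock = clock
        self.last_alert_at = clock()

    def record_alert(self) -> None:
        """Call from the normal alert path whenever an alert is delivered."""
        self.last_alert_at = self.clock()

    def is_silent(self) -> bool:
        return self.clock() - self.last_alert_at > self.silence_limit

    async def tick(self) -> None:
        """One cycle: optional silence alert, then the status heartbeat.

        Heartbeats deliberately do not reset last_alert_at.
        """
        if self.is_silent():
            await self.send(
                f"ALERT: no alerts for over {self.silence_limit:.0f}s; verify the alert pipeline")
        await self.send("heartbeat: system status OK")

    async def run(self) -> None:
        """Background loop, e.g. started with asyncio.create_task(monitor.run())."""
        while True:
            await asyncio.sleep(self.heartbeat_every)
            await self.tick()
```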
OG T
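eca3759fde fix(telegram): fix broken Telegram notification chain in the Signal Worker flow
Problem:
- The new Phase 6 Signal Worker architecture did not integrate Telegram push
- No Telegram notification at all was sent when a decision became ready
- This was a serious monitoring blind spot!

Fix:
- Add the _push_decision_to_telegram() push function
- Push automatically when a DecisionManager decision becomes READY
- Non-blocking execution (asyncio.create_task)

Telegram notification contents:
- Alert source (LLM/Expert System)
- Affected service
- Suggested action
- Risk level
- Confidence score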

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:22:26 +08:00
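A sketch of the non-blocking push, assuming a hypothetical async `notify` sender and a decision dict carrying the fields listed above; the message layout and the `on_decision_ready` hook are illustrative. The set of task references follows the asyncio documentation's advice to keep a reference to tasks from `create_task`, because the event loop holds only weak references and an unreferenced task may disappear mid-execution.

```python
import asyncio
from typing import Awaitable, Callable

# Strong references to in-flight pushes (the event loop keeps only weak ones).
_background: set[asyncio.Task] = set()


def format_decision(d: dict) -> str:
    """Hypothetical message layout using the fields the commit lists."""
    return (
        f"[{d['source']}] {d['service']}\n"
        f"Action: {d['action']}\n"
        f"Risk: {d['risk']} | Confidence: {d['confidence']:.0%}"
    )


async def _push_decision_to_telegram(d: dict, notify: Callable[[str], Awaitable[None]]) -> None:
    try:
        await notify(format_decision(d))
    except Exception as exc:  # a failed push must not break the decision flow
        print(f"telegram push failed: {exc!r}")


def on_decision_ready(d: dict, notify: Callable[[str], Awaitable[None]]) -> asyncio.Task:
    """Schedule the push without awaiting it; call from inside a running event loop."""
    task = asyncio.create_task(_push_decision_to_telegram(d, notify))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return task
```

The caller returns immediately; the push runs the next time the caller yields to the event loop.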
OG T
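29ceb786ca feat(web): global war room shows the real AI decision chain
Problem:
- ThinkingTerminal used DEMO_DECISION_CHAIN fake data
- Users could not see the OpenClaw AI's actual reasoning

Fix:
- Add convertToDecisionChain() to convert the API format
- Extract real AI data from incident.decision.proposal_data
- Display: decision engine source, reasoning, suggested action, confidence score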

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:17:52 +08:00
OG T
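bd1f94dd72 fix(worker): initialize PostgreSQL connection pool - fix Incident DB persistence
Problem:
- Signal Worker did not initialize PostgreSQL, so the incidents table might not exist
- Incidents were written only to Redis and never persisted to PostgreSQL
- DB state could not be updated correctly after approval

Fix:
- Call init_db() at Signal Worker startup to create the tables
- Call close_db() on shutdown to release the connection pool
- Add PostgreSQL initialization logging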

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:13:49 +08:00
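The startup/shutdown ordering above can be sketched as follows. `init_db`, `close_db`, and the consume loop are passed in as async callables because the real signatures are not shown in the commit; treat the shape as an assumption. The `try/finally` guarantees the pool is released even if the consumer crashes.

```python
import asyncio
import logging

log = logging.getLogger("signal_worker")


async def run_worker(init_db, close_db, consume_signals) -> None:
    """Hypothetical worker lifecycle: DB first, always release the pool."""
    log.info("initializing PostgreSQL")
    await init_db()  # creates tables such as incidents and opens the pool
    log.info("PostgreSQL ready")
    try:
        await consume_signals()
    finally:
        await close_db()  # runs on normal exit and on crashes
        log.info("PostgreSQL pool closed")
```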
OG T
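c8558cda9e fix(api): treat a missing DB record as success on resolve
Root cause: an Incident may exist only in Redis if its DB write failed
Fix: resolve counts as successful as long as the Redis update succeeds (the API reads only from Redis)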

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 23:09:46 +08:00
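A sketch of the success rule above, with hypothetical `redis_update` (returns truthy on success) and `db_update` (returns the number of rows changed) callables standing in for the real storage calls. Redis is the read path, so only a Redis failure fails the resolve; a missing or failing PostgreSQL row is logged.

```python
import logging

log = logging.getLogger("proposal_service")


def resolve(incident_id: str, redis_update, db_update) -> bool:
    """Hypothetical: success is decided by Redis; the DB update is best-effort."""
    if not redis_update(incident_id):
        return False  # the API reads Redis, so this is a real failure
    try:
        rows = db_update(incident_id)
    except Exception:
        log.exception("DB update failed for %s; Redis already updated", incident_id)
        return True
    if rows == 0:
        # The incident may exist only in Redis if its original DB write failed.
        log.warning("incident %s not in DB; treating resolve as success", incident_id)
    return True
```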
OG T
d60cb54c08 fix(api): resolve_incident_after_approval uses direct-update logic
Reason: the indirect update via _persist_incident failed
Fix: switch to a direct Redis + DB update (same logic as the debug endpoint)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 22:31:18 +08:00
OG T
8ef8347f2c chore: more detailed error tracing in the debug endpoint
2026-03-23 22:23:24 +08:00
OG T
58f3339561 chore(api): add debug endpoint to test incident resolve
Temporary test endpoint for verifying the resolve_incident_after_approval logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 22:15:16 +08:00
OG T
03ca124967 fix(api): add explicit commit + trace logging to _persist_incident
Root cause: DB changes were never committed, so Incident status updates were not persisted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 22:02:00 +08:00
OG T
65fa1168b8 feat(api): add metadata field to ApprovalRequestResponse
Makes incident_id visible to the frontend/API for debugging and correlation tracing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 21:51:05 +08:00
OG T
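ac3bf97920 fix(api): update Incident status to RESOLVED after sign-off
Root cause: Incident.status was not updated after a successful sign-off, so the Y/n buttons reappeared after a page refresh

Fix:
- proposal_service.py: add the resolve_incident_after_approval() method
- approvals.py: after sign_approval succeeds, call it to update the Incident status
- Use metadata.incident_id to look up the associated Incident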

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 21:37:50 +08:00
OG T
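7db42ffdac fix(web): Y button sign-off response parsing error - result.status → result.approval.status
Root cause: the API returns {approval: {status: 'approved'}}, but the frontend wrongly checked result.status

Fix:
- dual-state-incident-card.tsx: correctly parse result.approval.status
- api-client.ts: update the return type to match the backend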

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 21:20:41 +08:00
OG T
4c41a6728f fix(web): Fix API contract mismatch for sign/reject endpoints
- signApproval: send signer_id, signer_name, comment (not signer, reason)
- rejectApproval: send rejector_id, rejector_name, reason

Fixes 422 Unprocessable Entity on Y/n button click

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 21:04:44 +08:00
OG T
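2b1264df05 docs: complete governance architecture ADR-010/011/012 + CLAUDE.md hard rule updates
2026-03-23 major incident remediation and governance:

1. ADR-010: Centralized secrets management (Bitwarden + Sealed Secrets)
2. ADR-011: NetworkPolicy change governance (detection + alerting + human decision)
3. ADR-012: Dangerous operation governance (tiered classification + CI/CD interception + audit)
4. UX-001: Alert fatigue solution (time decay + smart grouping)

CLAUDE.md updates:
- Add top-priority hard rules (no ClawBot, OpenClaw as the core, no dangerous APIs)
- Add a Memory lookup table that must be read before starting any task

Incident lessons:
- The Telegram token was invalidated by logOut three times in a row
- AWOOOI API code calling logOut caused the disaster
- AWOOOI API Telegram has been disabled; OpenClaw is the sole gateway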

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 19:44:56 +08:00
OG T
e23493741a fix(telegram): respect C-Suite decision - OpenClaw is sole brain
Architecture correction 2026-03-23 (following the C-Suite decision):
- Hard rule: .188 is the only brain; split-brain is forbidden
- OpenClaw (192.168.0.188) = sole Telegram Gateway
- AWOOOI API (K8s) = Web API + Sensor, no polling
- TELEGRAM_ENABLE_POLLING 預設 False

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 19:25:08 +08:00
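A sketch of a safe-by-default flag. Only the variable name and the False default come from the commit; which strings count as "true" is an assumption, and the real config may use a settings library instead. The default matters because Telegram's `getUpdates` allows only one active poller per bot token, so a second poller (the K8s API next to OpenClaw) would conflict with the sole gateway.

```python
import os

# Assumed set of accepted "true" spellings.
_TRUE = {"1", "true", "yes", "on"}


def telegram_polling_enabled(env=None) -> bool:
    """Return True only if TELEGRAM_ENABLE_POLLING is explicitly switched on."""
    env = os.environ if env is None else env
    # Unset or unrecognized values fall back to False: OpenClaw stays the sole poller.
    return env.get("TELEGRAM_ENABLE_POLLING", "false").strip().lower() in _TRUE
```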
OG T
3e730f16d4 fix(ci): Add Docker login step for Harbor authentication
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:53:23 +08:00
OG T
2aef693c0d fix(ci): Use monorepo root as Docker build context for API
Phase 6.4i requires the API Dockerfile to copy local packages
(lewooogo-brain, lewooogo-data) from the packages/ directory.
Changed build context from 'apps/api' to '.' (root) to allow
the Dockerfile to access the entire monorepo structure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:43:07 +08:00
OG T
7478dc0254 feat(phase6-9): Complete modular architecture and Agent Teams
Phase 6.4 - Modular Architecture:
- Add lewooogo-brain adapters for LLM providers
- Add lewooogo-data dual memory (Redis + PostgreSQL)
- Implement consensus engine for multi-agent decisions
- Add incident memory service for historical context

Phase 9 - Agent Teams (Claude Agent SDK):
- Add base agent class with Claude Sonnet 4 integration
- Implement action planner, blast radius, and security agents
- Add agent API endpoints and proposal workflow
- Integrate ADR-009 OpenClaw Agent Teams architecture

DevOps & CI/CD:
- Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml)
- Add pre-commit hooks and secrets baseline
- Add docker-compose for local development
- Update Kubernetes network policies

Frontend Improvements:
- Add auto-healing error boundary component
- Update i18n messages for agent features
- Enhance dual-state incident card with execution feedback

Documentation:
- Add 7 ADRs covering MCP, design system, architecture decisions
- Update ARCHITECTURE_MEMORY.md with modular design
- Add GLOBAL_RULES.md and SOUL.md for project identity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 18:40:36 +08:00
OG T
6eccb45757 fix(api): Use in-cluster K8s config for executor in K8s pods
- Try load_incluster_config() first (for pods running in K8s)
- Fallback to kubeconfig file (for local development)
- Fixes "K8s connection not available" error in production

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:45:58 +08:00
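The in-cluster-first fallback can be sketched with the loaders injected, so the logic runs without the `kubernetes` client installed. The injection and the stand-in exception are choices of this sketch, not the project's code; in the official client, `config.load_incluster_config()` raises `config.ConfigException` when not running in a pod.

```python
from typing import Callable


class ConfigException(Exception):
    """Stand-in for kubernetes.config.ConfigException so the sketch runs standalone."""


def load_k8s_config(
    load_incluster: Callable[[], None],
    load_kubeconfig: Callable[[], None],
    not_in_cluster: type = ConfigException,
) -> str:
    """Try the pod's service-account config first, then fall back to kubeconfig."""
    try:
        load_incluster()  # succeeds only inside a pod
        return "in-cluster"
    except not_in_cluster:
        load_kubeconfig()  # local development, e.g. ~/.kube/config
        return "kubeconfig"
```

With the real client this would be called as `load_k8s_config(config.load_incluster_config, config.load_kube_config, config.ConfigException)` after `from kubernetes import config`.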
OG T
182410a995 docs(skills): Add Chinese action parsing lesson to Skill 03
Record the 2026-03-23 incident where Y button failed because
LLM generated Chinese action strings that couldn't be parsed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:28:40 +08:00
OG T
3de8a7701d feat(web): Phase 6.5c UX improvements for Y/n execution feedback
- Show actual error message on screen (not just hover tooltip)
- Add retry button after error/timeout
- Add 30-second timeout warning with a "超時" ("timed out") state
- Remove auto-dismiss of error (let user see and retry)
- Truncate long error messages with full text in tooltip

Fixes P0 UX issue: Users can now see what went wrong

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:28:35 +08:00
OG T
d1fb3aa010 fix(api): Expand Chinese action parsing for K8s executor
- Add 擴展...副本數 ("scale ... replica count") pattern (scale variant)
- Add 重新啟動 ("restart") without the 服務 ("service") suffix
- Auto-detect StatefulSet Pod names (xxx-N) for DELETE_POD
- Strip -deployment suffix from resource names

Fixes Y button execution failure when LLM generates Chinese actions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:15:49 +08:00
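A hypothetical parser covering the four cases listed in the commit. The exact phrasings (an optional 的, 到/至 before the number), the action kinds, and the rule that restarting a StatefulSet pod becomes `DELETE_POD` are assumptions; the real patterns are not shown. Names are matched with an explicit ASCII character class because Python's `\w` also matches CJK characters and would swallow text such as 的.

```python
import re
from dataclasses import dataclass
from typing import Optional

NAME = r"(?P<name>[A-Za-z0-9][A-Za-z0-9.-]*)"
# Assumed phrasing: 擴展 <name> (的)副本數 (到/至) <n>
SCALE = re.compile(r"擴展\s*" + NAME + r"\s*(?:的)?\s*副本數\D*?(?P<n>\d+)")
# 重新啟動 <name>, with the 服務 suffix now optional
RESTART = re.compile(r"重新啟動\s*" + NAME + r"(?:\s*服務)?")
# StatefulSet pods are named <statefulset>-<ordinal>, e.g. redis-0 (heuristic)
STATEFUL_POD = re.compile(r"^[a-z0-9.-]+-\d+$")


@dataclass
class Action:
    kind: str                       # "SCALE" | "RESTART" | "DELETE_POD"
    name: str
    replicas: Optional[int] = None


def normalize(name: str) -> str:
    # K8s names are lowercase; drop a trailing "-deployment" the LLM may add.
    return name.lower().removesuffix("-deployment")


def parse_action(text: str) -> Optional[Action]:
    if m := SCALE.search(text):
        return Action("SCALE", normalize(m["name"]), int(m["n"]))
    if m := RESTART.search(text):
        name = normalize(m["name"])
        # A single StatefulSet pod is "restarted" by deleting it; the controller
        # recreates it under the same name.
        kind = "DELETE_POD" if STATEFUL_POD.match(name) else "RESTART"
        return Action(kind, name)
    return None
```

The `-<digits>` heuristic would misclassify a Deployment that happens to be named like `worker-2`, so a real implementation might confirm the owner kind against the API server.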
OG T
41127f1e8b docs: Add CLAUDE.md for Claude Code auto-load configuration
- Skills index and routing table
- Core rules (simplified)
- Props mapping lesson from Y/n button incident

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:05:58 +08:00
OG T
962b1e75a5 refactor: Rename ClawBot → OpenClaw across documentation
- Update .awoooi-agent-rules.md (4 occurrences)
- Update docs/api/openapi.yaml (all schema references)
- Update apps/web/tailwind.config.ts (comment)
- Update apps/api/src/core/config.py (comment)

Legacy CLAWBOT_URL field kept for backward compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 14:05:53 +08:00
OG T
b0302329f4 fix(web): Pass decision prop to DualStateIncidentCard
Root cause: mapToDualState() was missing decision field,
causing Y/n buttons to be permanently disabled.

Now correctly passes incident.decision to the card component.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:54:42 +08:00
OG T
0aaf6a276b feat(api,web): Phase 6.5 DecisionManager with dual-engine fallback
Backend:
- Add DecisionManager with state machine (INIT→ANALYZING→READY→EXECUTING)
- Implement Expert System rules engine (100% local, never fails)
- Dual-engine: LLM (primary) + Expert System (fallback)
- Auto-generate decision_token for each incident
- 30-second timeout guarantee

Frontend:
- Use decision.state to unlock [Y/n] buttons
- Display AI action suggestion in card
- Show source indicator [AI] or [EXP]
- Generate proposal on-demand if needed

Fixes: UI locked with hourglass when LLM times out

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:19:55 +08:00
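A sketch of the dual-engine decision with an explicit state machine. `llm_propose` (async) and `expert_propose` (sync, local rules) are hypothetical callables; only the state names, the AI/EXP sources, and the 30-second guarantee come from the commit. `asyncio.wait_for` cancels the LLM call on timeout, and any LLM failure falls through to the Expert System so the decision always reaches READY.

```python
import asyncio
from enum import Enum


class State(Enum):
    INIT = "INIT"
    ANALYZING = "ANALYZING"
    READY = "READY"
    EXECUTING = "EXECUTING"


# Allowed transitions; anything else is a programming error.
TRANSITIONS = {
    State.INIT: {State.ANALYZING},
    State.ANALYZING: {State.READY},
    State.READY: {State.EXECUTING},
    State.EXECUTING: set(),
}


class DecisionManager:
    """Hypothetical sketch, not the project's class."""

    def __init__(self, llm_propose, expert_propose, timeout: float = 30.0):
        self.llm_propose = llm_propose
        self.expert_propose = expert_propose
        self.timeout = timeout
        self.state = State.INIT
        self.proposal = None

    def _move(self, new: State) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new

    async def decide(self, incident: dict) -> dict:
        self._move(State.ANALYZING)
        try:
            proposal = await asyncio.wait_for(self.llm_propose(incident), self.timeout)
            proposal["source"] = "AI"
        except Exception:  # includes the timeout raised by wait_for
            proposal = self.expert_propose(incident)  # local rules, never fails
            proposal["source"] = "EXP"
        self.proposal = proposal
        self._move(State.READY)  # unlocks the [Y/n] buttons
        return proposal
```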
OG T
c01742ef82 fix(web): Phase 6.5c+ enhance [Y/n] tactile feedback & diagnostics
- Add active:scale-95 active:bg-neutral-800 for physical click feedback
- Add disabled:opacity-30 for clearer disabled state
- Add tooltip "大腦分析中..." ("Brain analyzing...") when proposalId is missing
- Add comprehensive console.log diagnostics for authorization flow
- Add reason parameter "Authorized via WarRoom" for audit trail
- Implement optimistic UI with immediate loading state transition

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:07:10 +08:00
OG T
7db5108a1f feat(web): Phase 7.0 minimalist 5-pillar navigation
- Refactor sidebar to Nothing.tech visual compliance
- Add defensive route stubs for /authorizations, /knowledge-base, /settings
- Dynamic badge for pending approvals count
- Ultra-minimal borders (0.5px), no shadows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 13:02:21 +08:00
OG T
eee4ab9b36 feat(api): Phase 6.6 implement k8s execution engine with subprocess
ActionExecutor enhancements:
- Add execute_kubectl_command() using asyncio.create_subprocess_shell
- Security: Only kubectl commands allowed, forbidden patterns blocked
- Shadow Mode: Simulate execution without actual kubectl calls
- Capture stdout/stderr with PIPE, handle timeout gracefully

New execute_approved_proposal() function:
- Background task entry point for approved proposals
- Read approval from Redis/DB, verify status='approved'
- Extract kubectl_command from metadata
- Execute via execute_kubectl_command()
- Update status to 'executed' or 'failed' with execution_log

Security guardrails:
- Forbid delete namespace/ns, rm -rf, drop database
- Forbid batch deletion patterns
- 60 second default timeout

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:46:47 +08:00
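A sketch of the guarded executor with Shadow Mode. The forbidden patterns paraphrase the commit's list and are not the project's exact rules; defaulting to Shadow Mode is a choice of this sketch. One deliberate deviation: the commit uses `asyncio.create_subprocess_shell`, while this sketch tokenizes with `shlex` and uses `create_subprocess_exec`, so the command never passes through a shell.

```python
import asyncio
import re
import shlex

# Paraphrased guardrails (assumed regexes).
FORBIDDEN = [
    re.compile(r"\bdelete\s+(namespace|ns)\b"),
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bdrop\s+database\b", re.IGNORECASE),
    re.compile(r"\bdelete\b.*\s(--all|-A)\b"),  # batch deletion
]


def validate(command: str) -> list[str]:
    """Reject non-kubectl or forbidden commands; return argv for exec."""
    argv = shlex.split(command)
    if not argv or argv[0] != "kubectl":
        raise ValueError("only kubectl commands are allowed")
    for pattern in FORBIDDEN:
        if pattern.search(command):
            raise ValueError(f"forbidden pattern: {pattern.pattern}")
    return argv


async def execute_kubectl_command(command: str, shadow: bool = True,
                                  timeout: float = 60.0) -> dict:
    argv = validate(command)
    if shadow:  # simulate without calling kubectl
        return {"status": "simulated", "stdout": f"[shadow] would run: {command}", "stderr": ""}
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the killed process
        return {"status": "failed", "stdout": "", "stderr": f"timed out after {timeout}s"}
    status = "executed" if proc.returncode == 0 else "failed"
    return {"status": status, "stdout": out.decode(), "stderr": err.decode()}
```

The returned `status` maps onto the commit's 'executed' / 'failed' outcomes, with stdout/stderr available as the execution log.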
OG T
28fa8e6af4 feat(web): Phase 6.5c implement [Y/n] execution wiring
DualStateIncidentCard:
- Add proposalId prop for approval actions
- Add onApprovalChange callback for status updates
- Implement handleApprove() calling POST /api/v1/approvals/{id}/sign
- Implement handleReject() calling POST /api/v1/approvals/{id}/reject
- Add ButtonState management (idle/loading/approved/rejected/error)
- Loading spinner during API call
- Success state: green "已授權" ("Authorized") / red "已拒絕" ("Rejected")
- Error state: orange "錯誤" ("Error") with auto-recovery

API Client:
- Fix endpoint mismatch: rename approveApproval to signApproval
- Use correct endpoint /sign instead of /approve
- Add signer parameter for multi-sig support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:37:56 +08:00
OG T
a769738499 feat(api): Phase 6.4h replace mock DI with real ProposalService
- Remove MockEngine and embedded Proposal/Guardrails classes
- Import real ProposalService with OpenClaw LLM integration
- Use get_real_proposal_service() for dependency injection
- ProposalService integrates:
  - OpenClaw LLM (Ollama → Gemini → Claude fallback)
  - Redis Working Memory
  - PostgreSQL Episodic Memory
  - TrustEngine risk assessment
- Add llm_provider, llm_confidence, kubectl_command to response
- Map ApprovalRiskLevel to Tier (LOW=1, MEDIUM=2, CRITICAL=3)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:25:39 +08:00
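A sketch of two pieces of this commit: an ordered provider fallback (Ollama → Gemini → Claude) and the risk-to-Tier mapping. Providers are hypothetical `(name, async callable)` pairs, since the OpenClaw integration is not shown; the enum values are assumed strings. The Tier numbers come from the commit, which lists no HIGH level, so none is invented.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable


class ApprovalRiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    CRITICAL = "critical"


RISK_TO_TIER = {
    ApprovalRiskLevel.LOW: 1,
    ApprovalRiskLevel.MEDIUM: 2,
    ApprovalRiskLevel.CRITICAL: 3,
}

Provider = tuple[str, Callable[[str], Awaitable[str]]]


async def propose_with_fallback(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, await call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all LLM providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the answer is what lets a response carry an `llm_provider` field.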
OG T
be8ed1f7ba fix(web): resolve interface mismatch + add defensive null checks
- P0/P1/P2 now map to 'alert' status (was P0/P1 only)
- Tier mapping: P0=Tier3, P1=Tier2, P2=Tier1
- Added null/undefined guards in mapToDualState()
- Optional chaining on incidents array access
- Safe fallback for missing serviceName, message, timestamp

Fixes frontend warroom showing no cards despite API returning data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:17:58 +08:00
OG T
a825aa9634 fix(ci): exclude secrets.yaml from kubectl apply loop
Prevents CI/CD from overwriting manually patched K8s secrets.
Secrets should be managed separately (GitHub Secrets / sealed-secrets).

Root cause: 03-secrets.yaml contains CHANGE_ME placeholders,
causing pods to crash with "password authentication failed".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:16:27 +08:00
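The CI step itself is not shown (it is likely a shell loop), so here is the same filter written in Python purely for illustration. It assumes numbered manifests such as `03-secrets.yaml` in a single directory.

```python
from pathlib import Path


def manifests_to_apply(directory: str) -> list[Path]:
    """Hypothetical filter: every manifest in apply order, except secrets."""
    files = sorted(Path(directory).glob("*.yaml"))
    # Secrets are managed out of band (GitHub Secrets / sealed-secrets);
    # re-applying the CHANGE_ME placeholder file would overwrite patched values.
    return [f for f in files if not f.name.endswith("secrets.yaml")]
```

Each returned path would then be passed to `kubectl apply -f`.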
OG T
0aa80c1d32 fix(docker): embed mock types for Docker build compatibility
Remove lewooogo-brain local dependency that breaks Docker context.
Inline Proposal/Guardrails definitions in proposals.py mock.

Phase 6.4i will address proper monorepo Docker packaging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 12:01:20 +08:00
OG T
cb5d0ecfe4 feat(phase-6.4g-6.5b): API Synaptic Integration + Dual-State WarRoom UI
Phase 6.4g (API synaptic integration):
- lewooogo-brain dependency binding in apps/api/pyproject.toml
- POST /api/v1/incidents/{id}/propose route (proposals.py)
- Guardrails integration (8/8 tests passed)

Phase 6.5a (visual cortex build-out):
- DualStateIncidentCard.tsx with Nothing.tech visual compliance
- Ping radar animation for alert state
- Tier-based decision layer UI (AI 執行中 "AI executing" / 等待親核 "awaiting personal approval")

Phase 6.5b (neural network wiring):
- Main warroom page integration (page.tsx)
- IncidentResponse → DualState mapper function
- Empty state: "系統穩定。0 活躍異常。" ("System stable. 0 active anomalies.")

Tests:
- test_guardrails.py (8/8)
- test_incident_engine.py (6/6)
- test_skill_loader.py (6/6)
- Frontend build: 0 errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 11:58:28 +08:00
OG T
8eaf2acb0d docs(skills): add guardrails and dry-run principles
- Skill 03: Add proposal guardrails (forbidden commands, namespace binding)
- Skill 04: Add idempotency and garbage collection awareness
- Skill 05: Add dry-run first principle for destructive operations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:59:20 +08:00
OG T
c4dae39dfa docs(skills): add 2026-03-23 production incident learnings
New sections added:

01-frontend-aesthetics:
- Polling + Operation Race Condition pattern

02-lewooogo-backend-core:
- Worker Redis dedicated connection pool (socket_timeout=None)
- SQLite prohibition decree
- Function rename global search requirement

04-awoooi-devops-commander:
- NetworkPolicy Pod Selector (system label)
- Zombie consumer group cleanup
- PostgreSQL initialization checklist

05-awoooi-sre-qa (updated earlier):
- CrashLoopBackOff diagnosis
- Telegram health check
- Frontend race condition diagnosis

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:46:14 +08:00
OG T
1576f2ab20 fix(db): eliminate SQLite brain-split, force PostgreSQL
Root cause: Worker used SQLITE_DATABASE_URL causing "no such table: incidents"
because each Pod had isolated SQLite file, not shared PostgreSQL.

Fixes:
- db/base.py: Use DATABASE_URL (PostgreSQL) instead of SQLITE_DATABASE_URL
- Added SQLite prohibition guard with logging
- Added pool_size and pool_pre_ping for production stability

New: packages/lewooogo-data PgMemoryProvider (Phase 6.4d)
- Episodic Memory implementation for PostgreSQL
- init_pg_engine() with auto table creation
- SQLite forbidden by Commander's decree

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 10:02:43 +08:00
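A sketch of the SQLite prohibition guard. The function name is hypothetical and the pool size is a placeholder, not the project's value; `pool_size` and `pool_pre_ping` are real SQLAlchemy engine options. The guard checks the URL scheme's dialect part, so driver variants such as `postgresql+asyncpg` pass and any `sqlite` URL is refused.

```python
import logging
from urllib.parse import urlsplit

log = logging.getLogger("db")


def engine_kwargs(database_url: str) -> dict:
    """Hypothetical guard: return engine options, refusing anything but PostgreSQL."""
    scheme = urlsplit(database_url).scheme  # e.g. "postgresql+asyncpg"
    if scheme.split("+")[0] != "postgresql":
        # A per-pod SQLite file is what caused "no such table: incidents".
        log.error("refusing DATABASE_URL with scheme %r", scheme)
        raise RuntimeError("only PostgreSQL is allowed (SQLite is forbidden)")
    return {
        "pool_size": 5,         # placeholder value
        "pool_pre_ping": True,  # test a pooled connection before handing it out
    }
```

With SQLAlchemy's asyncio extension, the result would feed `create_async_engine(url, **engine_kwargs(url))`.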
OG T
9f353343c9 fix(worker): dedicated Redis pool with unlimited timeout for XREADGROUP
Root cause: Worker shared Redis pool with API (socket_timeout=5s),
but XREADGROUP blocks for 5s causing timeout errors every cycle.

Fix:
- Add init_worker_redis_pool() with socket_timeout=None
- Worker now uses get_worker_redis() for XREADGROUP operations
- API continues using get_redis() with short timeout

Also destroyed 50 zombie consumers via:
  XGROUP DESTROY stream:awoooi_signals awoooi_workers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:42:11 +08:00
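The fix hinges on one relation: a blocking XREADGROUP may wait up to BLOCK milliseconds before returning, so the client's `socket_timeout` must be longer than that, or `None` (no timeout). In the sketch below, `socket_timeout`, `ConnectionPool.from_url`, and `xreadgroup(..., block=...)` are real redis-py options and calls; `blocking_read_is_safe` and `read_signals` are hypothetical helpers, and the stream and group names are taken from the commit. Redis is imported lazily so the check runs without the package installed.

```python
from typing import Optional


def blocking_read_is_safe(socket_timeout: Optional[float], block_ms: int) -> bool:
    """True if a read blocking for block_ms cannot hit the socket timeout first."""
    return socket_timeout is None or socket_timeout * 1000 > block_ms


def init_worker_redis_pool(url: str):
    from redis.asyncio import ConnectionPool
    # Dedicated worker pool: no socket timeout, so long blocking reads are fine.
    return ConnectionPool.from_url(url, socket_timeout=None, decode_responses=True)


async def read_signals(pool, consumer: str, block_ms: int = 5000):
    from redis.asyncio import Redis
    r = Redis(connection_pool=pool)
    # Waits up to block_ms for new entries delivered to this consumer group.
    return await r.xreadgroup("awoooi_workers", consumer,
                              {"stream:awoooi_signals": ">"}, count=10, block=block_ms)
```

`blocking_read_is_safe(5, 5000)` is False, which is exactly the API pool's 5 s timeout racing the 5 s block described in the root cause.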
OG T
80d0ef4a8f feat(packages): Phase 6.4a-c leWOOOgo modular architecture
New packages:
- packages/lewooogo-brain: AI reasoning & decision engine
  - IProposalEngine interface (ABC)
  - IIncidentProcessor interface (ABC)
  - Pydantic models: Proposal, Guardrails, Incident, Signal

- packages/lewooogo-data: Memory provider abstraction
  - IMemoryProvider interface (ABC)
  - IDualMemoryProvider for Working + Episodic memory
  - Generic type support for flexible data models

Documentation:
- ADR-008: Python modular packages architecture decision
- ARCHITECTURE_MEMORY.md: Module map index for AI developers
- LOGBOOK.md: Updated milestones and Phase 6.4 status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:32:07 +08:00
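A sketch of the interface style described above (abstract base classes plus generics). Only the interface names come from the commit; the method names and signatures are hypothetical, and a plain dataclass stands in for the Pydantic `Proposal` model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Proposal:  # stand-in for the Pydantic model
    action: str
    confidence: float


class IProposalEngine(ABC):
    @abstractmethod
    def propose(self, incident_id: str) -> Proposal: ...


class IMemoryProvider(ABC, Generic[T]):
    @abstractmethod
    def get(self, key: str) -> Optional[T]: ...

    @abstractmethod
    def put(self, key: str, value: T) -> None: ...


class DictMemory(IMemoryProvider[T]):
    """In-process implementation for tests; a real one would wrap Redis or PostgreSQL."""

    def __init__(self) -> None:
        self._data: dict[str, T] = {}

    def get(self, key: str) -> Optional[T]:
        return self._data.get(key)

    def put(self, key: str, value: T) -> None:
        self._data[key] = value
```

Because the interfaces are ABCs, forgetting to implement a method fails at instantiation time rather than at first call.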
OG T
d050dd1ecc docs(skills): add production debugging patterns from 2026-03-23 incidents
New sections in 05-awoooi-sre-qa.md:
- Worker CrashLoopBackOff diagnosis procedure
- Telegram alert system health check
- Frontend race condition diagnosis (Polling vs API)
- Import name mismatch detection pattern

Lessons learned from:
- 7+ hour outage due to undetected worker crash
- Approval card flicker due to Zustand polling race condition

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:21:13 +08:00
OG T
6d7486634b fix(worker): correct redis function names causing CrashLoopBackOff
signal_worker.py was importing non-existent init_redis/close_redis
Correct names are init_redis_pool/close_redis_pool

Root cause of:
- No Telegram alerts for 7+ hours
- No new approval cards
- No incident processing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:19:01 +08:00
OG T
68f4cf51b6 fix(web): resolve approval card race condition with polling
Race condition between polling (5s interval) and sign/reject operations
caused cards to flicker and reappear after being approved.

Fix:
- Pause polling during sign/reject API calls
- Resume polling after 1 second delay to allow backend state sync
- Apply same pattern to both signApproval and rejectApproval

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 09:13:35 +08:00
OG T
de5796522f fix(api): fix optimization_suggestions dict access in proposal generation
The optimization_suggestions field is list[dict], not list[object].
Use .get() to access dict keys instead of attribute access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-23 01:40:00 +08:00
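A minimal illustration of the bug class fixed here, with hypothetical field names (`title`, `impact`): because `optimization_suggestions` holds dicts, attribute access such as `s.title` raises `AttributeError`, while `.get()` works and tolerates missing keys.

```python
def summarize_suggestions(optimization_suggestions: list[dict]) -> list[str]:
    """Hypothetical consumer of list[dict] suggestions; field names are assumed."""
    lines = []
    for s in optimization_suggestions:
        # Before the fix: s.title -> AttributeError: 'dict' object has no attribute 'title'
        title = s.get("title", "untitled")
        impact = s.get("impact", "unknown")
        lines.append(f"{title} (impact: {impact})")
    return lines
```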