awoooi

Author	SHA1	Message	Date
OG T	6f049877fc	fix(lint): ruff auto-fix + add lewooogo-core src to git - Python: ruff --fix fixed 280 lint errors - lewooogo-core: src/ directory was untracked, causing CI eslint to fail Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:51:37 +08:00
OG T	f78aab8b2a	fix(api): DecisionToken state sync (Y/n persistence fix) Root cause: - resolve_incident_after_approval only updated Incident.decision.state - It did not update the separately stored DecisionToken (decision:{token} key) - So on the next poll, get_or_create_decision returned the old token in READY state - The frontend kept showing the Y/n buttons Fix: - Also update the DecisionToken state to COMPLETED in resolve_incident_after_approval - Ensure state stays consistent across the whole decision chain Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:46:21 +08:00
OG T	8542632cff	fix(ci): Harbor HTTP registry + Telegram secrets CD fixes: - Fix buildx HTTP vs HTTPS issue (insecure registry setting) - Remove UAT environment (violates a Memory iron rule) - Add Telegram notification for Production deployments - Fix hard-coded Token in deploy-prod.yml (use secrets instead) docs: - Add guidelines/ directory of structured guidelines - ARCHITECTURE.md, FRONTEND.md, OPERATIONS.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:40:40 +08:00
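OG T	00d94ca71c	docs: CLAUDE.md references HARD_RULES.md (bloat prohibited) Structure: - CLAUDE.md: concise index containing only reference links - docs/HARD_RULES.md: detailed rules This approach was agreed on long ago and should not be forgotten. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:32:35 +08:00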
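OG T	dc30c70e57	docs(CLAUDE.md): add absolute prohibitions (Hard Rules) Problem: - Memory recorded the rules but they were not actually followed - CI workflow was changed to ubuntu-latest, violating a Memory iron rule - Long-term memory existed in name only Fix: - Hard-code the prohibited items directly in CLAUDE.md - Add a pre-modification checklist - These rules are loaded automatically in every Session Prohibited items: - runs-on: ubuntu-latest → self-hosted - Telegram logOut() → prohibited - Frontend hard-coding → next-intl - SQLite → PostgreSQL - CORS * → whitelist - Fake data → real API Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:31:04 +08:00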
OG T	fc995be6e3	fix(ci): switch to self-hosted runner (GitHub billing issue) Problem: - CI workflow was changed to ubuntu-latest at some unknown point - This caused GitHub Actions to fail because of a billing issue Fix: - Switch everything back to self-hosted (awoooi-110) Iron rule: - Memory record: feedback_github_billing.md - GitHub cloud runners are prohibited Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:29:38 +08:00
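OG T	7d8eb26ebe	feat(telegram): add heartbeat monitoring to prevent silent blind spots Features: - send_heartbeat(): sends system status every 30 minutes - start_heartbeat_monitor(): background heartbeat monitor - Silence alert: alerts automatically after more than 2 hours without messages Purpose: - Avoid a long Telegram silence being mistaken for "system stable" - Proactively verify that the alert pipeline is working Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:26:08 +08:00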
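OG T	eca3759fde	fix(telegram): fix broken Telegram notification chain in Signal Worker flow Problem: - The new Phase 6 Signal Worker architecture did not integrate Telegram push - Telegram received no notification at all when a decision became ready - This is a serious monitoring blind spot! Fix: - Add _push_decision_to_telegram() push function - DecisionManager pushes automatically when a decision is READY - Non-blocking execution (asyncio.create_task) Telegram notification contents: - Alert source (LLM/Expert System) - Affected services - Suggested action - Risk level - Confidence score Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:22:26 +08:00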
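OG T	29ceb786ca	feat(web): global war room shows the real AI decision chain Problem: - ThinkingTerminal used DEMO_DECISION_CHAIN fake data - Users could not see OpenClaw AI's real reasoning process Fix: - Add convertToDecisionChain() to convert the API format - Extract real AI data from incident.decision.proposal_data - Display: decision engine source, reasoning process, suggested action, confidence score Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:17:52 +08:00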
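OG T	bd1f94dd72	fix(worker): initialize PostgreSQL connection pool - fix Incident DB persistence Problem: - Signal Worker did not initialize PostgreSQL, so the incidents table might not exist - Incidents were written only to Redis and not persisted to PostgreSQL - DB state could not be updated correctly after approval Fix: - Call init_db() at Signal Worker startup to create tables - Call close_db() at shutdown to release the connection pool - Add PostgreSQL initialization logging Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:13:49 +08:00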
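OG T	c8558cda9e	fix(api): treat a missing DB record as success on resolve Root cause: an Incident may exist only in Redis because the DB write failed Fix: count it as success as long as the Redis update succeeds (the API reads only from Redis) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 23:09:46 +08:00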
OG T	d60cb54c08	fix(api): resolve_incident_after_approval uses direct update logic Cause: the indirect update via _persist_incident failed Fix: switch to direct Redis + DB updates (same logic as the debug endpoint) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 22:31:18 +08:00
OG T	8ef8347f2c	chore: more detailed error tracing in debug endpoint	2026-03-23 22:23:24 +08:00
OG T	58f3339561	chore(api): add debug endpoint to test incident resolve Temporary test endpoint for verifying resolve_incident_after_approval logic Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 22:15:16 +08:00
OG T	03ca124967	fix(api): _persist_incident adds explicit commit + trace logging Root cause: DB changes were not committed, so Incident status updates were not persisted Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 22:02:00 +08:00
OG T	65fa1168b8	feat(api): add metadata field to ApprovalRequestResponse Exposes incident_id to the frontend/API for debugging and correlation tracking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 21:51:05 +08:00
OG T	ac3bf97920	fix(api): update Incident status to RESOLVED after sign-off Root cause: Incident.status was not updated after a successful sign-off, so the Y/n buttons reappeared after a page refresh Fix: - proposal_service.py: add resolve_incident_after_approval() method - approvals.py: call it to update Incident status after sign_approval succeeds - Use metadata.incident_id to look up the associated Incident Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 21:37:50 +08:00
OG T	7db42ffdac	fix(web): Y button sign-off response parsing error - result.status → result.approval.status Root cause: the API returns {approval: {status: 'approved'}} but the frontend mistakenly checked result.status Fix: - dual-state-incident-card.tsx: parse result.approval.status correctly - api-client.ts: update the return type to match the backend Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 21:20:41 +08:00
OG T	4c41a6728f	fix(web): Fix API contract mismatch for sign/reject endpoints - signApproval: send signer_id, signer_name, comment (not signer, reason) - rejectApproval: send rejector_id, rejector_name, reason Fixes 422 Unprocessable Entity on Y/n button click Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 21:04:44 +08:00
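OG T	2b1264df05	docs: complete governance architecture ADR-010/011/012 + CLAUDE.md iron rule updates 2026-03-23 major incident fixes and governance: 1. ADR-010: centralized Secrets management (Bitwarden + Sealed Secrets) 2. ADR-011: NetworkPolicy change governance (detection + alerting + human decision) 3. ADR-012: dangerous operation governance (Tier classification + CI/CD interception + auditing) 4. UX-001: alert fatigue solution (time decay + smart grouping) CLAUDE.md updates: - Add highest-priority iron rules (ClawBot prohibited, OpenClaw as core, dangerous APIs prohibited) - Add a Memory lookup table that must be read before starting a task Incident lessons: - Telegram Token was invalidated by logOut three times in a row - AWOOOI API code calling logOut caused the disaster - AWOOOI API Telegram has been disabled; OpenClaw is the sole Gateway Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 19:44:56 +08:00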
OG T	e23493741a	fix(telegram): respect C-Suite decision - OpenClaw is sole brain Architecture correction 2026-03-23 (following the C-Suite resolution): - Iron rule: .188 is the sole brain; split-brain is prohibited - OpenClaw (192.168.0.188) = sole Telegram Gateway - AWOOOI API (K8s) = Web API + Sensor, no Polling - TELEGRAM_ENABLE_POLLING defaults to False Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 19:25:08 +08:00
OG T	3e730f16d4	fix(ci): Add Docker login step for Harbor authentication Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 18:53:23 +08:00
OG T	2aef693c0d	fix(ci): Use monorepo root as Docker build context for API Phase 6.4i requires the API Dockerfile to copy local packages (lewooogo-brain, lewooogo-data) from the packages/ directory. Changed build context from 'apps/api' to '.' (root) to allow the Dockerfile to access the entire monorepo structure. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 18:43:07 +08:00
OG T	7478dc0254	feat(phase6-9): Complete modular architecture and Agent Teams Phase 6.4 - Modular Architecture: - Add lewooogo-brain adapters for LLM providers - Add lewooogo-data dual memory (Redis + PostgreSQL) - Implement consensus engine for multi-agent decisions - Add incident memory service for historical context Phase 9 - Agent Teams (Claude Agent SDK): - Add base agent class with Claude Sonnet 4 integration - Implement action planner, blast radius, and security agents - Add agent API endpoints and proposal workflow - Integrate ADR-009 OpenClaw Agent Teams architecture DevOps & CI/CD: - Add GitHub Actions CI/CD workflows (ci.yaml, cd.yaml) - Add pre-commit hooks and secrets baseline - Add docker-compose for local development - Update Kubernetes network policies Frontend Improvements: - Add auto-healing error boundary component - Update i18n messages for agent features - Enhance dual-state incident card with execution feedback Documentation: - Add 7 ADRs covering MCP, design system, architecture decisions - Update ARCHITECTURE_MEMORY.md with modular design - Add GLOBAL_RULES.md and SOUL.md for project identity Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 18:40:36 +08:00
OG T	6eccb45757	fix(api): Use in-cluster K8s config for executor in K8s pods - Try load_incluster_config() first (for pods running in K8s) - Fallback to kubeconfig file (for local development) - Fixes "K8s connection not available" error in production Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 14:45:58 +08:00
OG T	182410a995	docs(skills): Add Chinese action parsing lesson to Skill 03 Record the 2026-03-23 incident where Y button failed because LLM generated Chinese action strings that couldn't be parsed Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 14:28:40 +08:00
OG T	3de8a7701d	feat(web): Phase 6.5c UX improvements for Y/n execution feedback - Show actual error message on screen (not just hover tooltip) - Add retry button after error/timeout - Add 30-second timeout warning with "超時" state - Remove auto-dismiss of error (let user see and retry) - Truncate long error messages with full text in tooltip Fixes P0 UX issue: Users can now see what went wrong Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 14:28:35 +08:00
OG T	d1fb3aa010	fix(api): Expand Chinese action parsing for K8s executor - Add 擴展...副本數 pattern (scale variant) - Add 重新啟動 without 服務 suffix - Auto-detect StatefulSet Pod names (xxx-N) for DELETE_POD - Strip -deployment suffix from resource names Fixes Y button execution failure when LLM generates Chinese actions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 14:15:49 +08:00
OG T	41127f1e8b	docs: Add CLAUDE.md for Claude Code auto-load configuration - Skills index and routing table - Core rules (simplified) - Props mapping lesson from Y/n button incident Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 14:05:58 +08:00
OG T	962b1e75a5	refactor: Rename ClawBot → OpenClaw across documentation - Update .awoooi-agent-rules.md (4 occurrences) - Update docs/api/openapi.yaml (all schema references) - Update apps/web/tailwind.config.ts (comment) - Update apps/api/src/core/config.py (comment) Legacy CLAWBOT_URL field kept for backward compatibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 14:05:53 +08:00
OG T	b0302329f4	fix(web): Pass decision prop to DualStateIncidentCard Root cause: mapToDualState() was missing decision field, causing Y/n buttons to be permanently disabled. Now correctly passes incident.decision to the card component. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 13:54:42 +08:00
OG T	0aaf6a276b	feat(api,web): Phase 6.5 DecisionManager with dual-engine fallback Backend: - Add DecisionManager with state machine (INIT→ANALYZING→READY→EXECUTING) - Implement Expert System rules engine (100% local, never fails) - Dual-engine: LLM (primary) + Expert System (fallback) - Auto-generate decision_token for each incident - 30-second timeout guarantee Frontend: - Use decision.state to unlock [Y/n] buttons - Display AI action suggestion in card - Show source indicator [AI] or [EXP] - Generate proposal on-demand if needed Fixes: UI locked with hourglass when LLM times out Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 13:19:55 +08:00
OG T	c01742ef82	fix(web): Phase 6.5c+ enhance [Y/n] tactile feedback & diagnostics - Add active:scale-95 active:bg-neutral-800 for physical click feedback - Add disabled:opacity-30 for clearer disabled state - Add tooltip "大腦分析中..." when proposalId is missing - Add comprehensive console.log diagnostics for authorization flow - Add reason parameter "Authorized via WarRoom" for audit trail - Implement optimistic UI with immediate loading state transition Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 13:07:10 +08:00
OG T	7db5108a1f	feat(web): Phase 7.0 minimalist 5-pillar navigation - Refactor sidebar to Nothing.tech visual compliance - Add defensive route stubs for /authorizations, /knowledge-base, /settings - Dynamic badge for pending approvals count - Ultra-minimal borders (0.5px), no shadows Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 13:02:21 +08:00
OG T	eee4ab9b36	feat(api): Phase 6.6 implement k8s execution engine with subprocess ActionExecutor enhancements: - Add execute_kubectl_command() using asyncio.create_subprocess_shell - Security: Only kubectl commands allowed, forbidden patterns blocked - Shadow Mode: Simulate execution without actual kubectl calls - Capture stdout/stderr with PIPE, handle timeout gracefully New execute_approved_proposal() function: - Background task entry point for approved proposals - Read approval from Redis/DB, verify status='approved' - Extract kubectl_command from metadata - Execute via execute_kubectl_command() - Update status to 'executed' or 'failed' with execution_log Security guardrails: - Forbid delete namespace/ns, rm -rf, drop database - Forbid batch deletion patterns - 60 second default timeout Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 12:46:47 +08:00
OG T	28fa8e6af4	feat(web): Phase 6.5c implement [Y/n] execution wiring DualStateIncidentCard: - Add proposalId prop for approval actions - Add onApprovalChange callback for status updates - Implement handleApprove() calling POST /api/v1/approvals/{id}/sign - Implement handleReject() calling POST /api/v1/approvals/{id}/reject - Add ButtonState management (idle/loading/approved/rejected/error) - Loading spinner during API call - Success state: green "已授權" / red "已拒絕" - Error state: orange "錯誤" with auto-recovery API Client: - Fix endpoint mismatch: rename approveApproval to signApproval - Use correct endpoint /sign instead of /approve - Add signer parameter for multi-sig support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 12:37:56 +08:00
OG T	a769738499	feat(api): Phase 6.4h replace mock DI with real ProposalService - Remove MockEngine and embedded Proposal/Guardrails classes - Import real ProposalService with OpenClaw LLM integration - Use get_real_proposal_service() for dependency injection - ProposalService integrates: - OpenClaw LLM (Ollama → Gemini → Claude fallback) - Redis Working Memory - PostgreSQL Episodic Memory - TrustEngine risk assessment - Add llm_provider, llm_confidence, kubectl_command to response - Map ApprovalRiskLevel to Tier (LOW=1, MEDIUM=2, CRITICAL=3) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 12:25:39 +08:00
OG T	be8ed1f7ba	fix(web): resolve interface mismatch + add defensive null checks - P0/P1/P2 now map to 'alert' status (was P0/P1 only) - Tier mapping: P0=Tier3, P1=Tier2, P2=Tier1 - Added null/undefined guards in mapToDualState() - Optional chaining on incidents array access - Safe fallback for missing serviceName, message, timestamp Fixes frontend warroom showing no cards despite API returning data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 12:17:58 +08:00
OG T	a825aa9634	fix(ci): exclude secrets.yaml from kubectl apply loop Prevents CI/CD from overwriting manually patched K8s secrets. Secrets should be managed separately (GitHub Secrets / sealed-secrets). Root cause: 03-secrets.yaml contains CHANGE_ME placeholders, causing pods to crash with "password authentication failed". Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 12:16:27 +08:00
OG T	0aa80c1d32	fix(docker): embed mock types for Docker build compatibility Remove lewooogo-brain local dependency that breaks Docker context. Inline Proposal/Guardrails definitions in proposals.py mock. Phase 6.4i will address proper monorepo Docker packaging. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 12:01:20 +08:00
OG T	cb5d0ecfe4	feat(phase-6.4g-6.5b): API Synaptic Integration + Dual-State WarRoom UI Phase 6.4g (API synaptic integration): - lewooogo-brain dependency binding in apps/api/pyproject.toml - POST /api/v1/incidents/{id}/propose route (proposals.py) - Guardrails integration (8/8 tests passed) Phase 6.5a (visual cortex construction): - DualStateIncidentCard.tsx with Nothing.tech visual compliance - Ping radar animation for alert state - Tier-based decision layer UI (AI 執行中 / 等待親核) Phase 6.5b (neural network wiring): - Main warroom page integration (page.tsx) - IncidentResponse → DualState mapper function - Empty state: "系統穩定。0 活躍異常。" Tests: - test_guardrails.py (8/8) - test_incident_engine.py (6/6) - test_skill_loader.py (6/6) - Frontend build: 0 errors Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 11:58:28 +08:00
OG T	8eaf2acb0d	docs(skills): add guardrails and dry-run principles - Skill 03: Add proposal guardrails (forbidden commands, namespace binding) - Skill 04: Add idempotency and garbage collection awareness - Skill 05: Add dry-run first principle for destructive operations Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 10:59:20 +08:00
OG T	c4dae39dfa	docs(skills): add 2026-03-23 production incident learnings New sections added: 01-frontend-aesthetics: - Polling + Operation Race Condition pattern 02-lewooogo-backend-core: - Worker Redis dedicated connection pool (socket_timeout=None) - SQLite prohibition decree - Function rename global search requirement 04-awoooi-devops-commander: - NetworkPolicy Pod Selector (system label) - Zombie consumer group cleanup - PostgreSQL initialization checklist 05-awoooi-sre-qa (updated earlier): - CrashLoopBackOff diagnosis - Telegram health check - Frontend race condition diagnosis Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 10:46:14 +08:00
OG T	1576f2ab20	fix(db): eliminate SQLite brain-split, force PostgreSQL Root cause: Worker used SQLITE_DATABASE_URL causing "no such table: incidents" because each Pod had isolated SQLite file, not shared PostgreSQL. Fixes: - db/base.py: Use DATABASE_URL (PostgreSQL) instead of SQLITE_DATABASE_URL - Added SQLite prohibition guard with logging - Added pool_size and pool_pre_ping for production stability New: packages/lewooogo-data PgMemoryProvider (Phase 6.4d) - Episodic Memory implementation for PostgreSQL - init_pg_engine() with auto table creation - SQLite forbidden by Commander's decree Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 10:02:43 +08:00
OG T	9f353343c9	fix(worker): dedicated Redis pool with unlimited timeout for XREADGROUP Root cause: Worker shared Redis pool with API (socket_timeout=5s), but XREADGROUP blocks for 5s causing timeout errors every cycle. Fix: - Add init_worker_redis_pool() with socket_timeout=None - Worker now uses get_worker_redis() for XREADGROUP operations - API continues using get_redis() with short timeout Also destroyed 50 zombie consumers via: XGROUP DESTROY stream:awoooi_signals awoooi_workers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 09:42:11 +08:00
OG T	80d0ef4a8f	feat(packages): Phase 6.4a-c leWOOOgo modular architecture New packages: - packages/lewooogo-brain: AI reasoning & decision engine - IProposalEngine interface (ABC) - IIncidentProcessor interface (ABC) - Pydantic models: Proposal, Guardrails, Incident, Signal - packages/lewooogo-data: Memory provider abstraction - IMemoryProvider interface (ABC) - IDualMemoryProvider for Working + Episodic memory - Generic type support for flexible data models Documentation: - ADR-008: Python modular packages architecture decision - ARCHITECTURE_MEMORY.md: Module map index for AI developers - LOGBOOK.md: Updated milestones and Phase 6.4 status Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 09:32:07 +08:00
OG T	d050dd1ecc	docs(skills): add production debugging patterns from 2026-03-23 incidents New sections in 05-awoooi-sre-qa.md: - Worker CrashLoopBackOff diagnosis procedure - Telegram alert system health check - Frontend race condition diagnosis (Polling vs API) - Import name mismatch detection pattern Lessons learned from: - 7+ hour outage due to undetected worker crash - Approval card flicker due to Zustand polling race condition Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 09:21:13 +08:00
OG T	6d7486634b	fix(worker): correct redis function names causing CrashLoopBackOff signal_worker.py was importing non-existent init_redis/close_redis Correct names are init_redis_pool/close_redis_pool Root cause of: - No Telegram alerts for 7+ hours - No new approval cards - No incident processing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 09:19:01 +08:00
OG T	68f4cf51b6	fix(web): resolve approval card race condition with polling Race condition between polling (5s interval) and sign/reject operations caused cards to flicker and reappear after being approved. Fix: - Pause polling during sign/reject API calls - Resume polling after 1 second delay to allow backend state sync - Apply same pattern to both signApproval and rejectApproval Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 09:13:35 +08:00
OG T	de5796522f	fix(api): fix optimization_suggestions dict access in proposal generation The optimization_suggestions field is list[dict], not list[object]. Use .get() to access dict keys instead of attribute access. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-23 01:40:00 +08:00

3678 Commits