Commit Graph

9 Commits

Author SHA1 Message Date
OG T
670cd5df86 refactor(flywheel): 首席架構師審查修正 C1/C2/I1/I2/I3/I4/M1
Some checks are pending
CD Pipeline / build-and-deploy (push) Has started running
C1 — Repository 層修正 (積木化鐵律):
  新增 PlaybookEmbeddingRepository (pgvector UPSERT)
  playbook_embedding_service 改透過 Repository 存取 DB,不再直接 db.execute(text(...))

C2 — Router 層業務邏輯移入 Service 層:
  create_incident_for_approval + extract_affected_services (去掉底線前綴) 移入 incident_service.py
  webhooks.py 改從 incident_service import,自身不再含業務邏輯

I1 — _infra_jobs 提升為 module-level frozenset (_INFRA_JOB_NAMES),避免每次呼叫重建

I2 — _persist_embeddings_to_db 補齊 PlaybookRAGService / list[Playbook] 型別標注

I3 — embedding 格式顯式化: "[" + ",".join(str(float(x)) for x in embedding) + "]"
  防止 pgvector 因格式差異靜默解析失敗

I4 — import asyncio 移至 main.py 頂層,移除 try 區塊內重複 import

M1 — similarity.py: 移除死代碼 `if union > 0 else 0.0`
  union 在兩個集合都非空時不可能為 0

2026-04-10 Asia/Taipei — Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:35:10 +08:00
OG T
c6edfb5614 fix(flywheel): 四階段系統性修復 AUTO_REPAIR NO_MATCH 斷層
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Phase 1 — affected_services 污染根治
  - webhooks.py: _extract_affected_services() 從 labels 精準萃取服務名
    (component > job > pod deployment name > clean target_resource > [])
  - create_incident_for_approval: alert_labels 完整保留進 Signal
  - alert_name 從 alertname 取,不再用 "custom"

Phase 2 — Playbook alertname 變體擴充
  - alert_rules.yaml: 5 條規則新增 HostHighCpuLoad、KubePodCrashLooping 等變體
  - scripts/update_playbook_alert_variants.py: Redis index 已執行更新 

Phase 3 — Jaccard 通用型 Playbook 豁免
  - similarity.py: affected_services=[] → 1.0 豁免(基礎設施 Playbook 不針對特定服務)
  - severity_range=[] → 1.0 豁免(適用所有嚴重度)

Phase 4 — Playbook Embedding 持久化(冷啟動修復)
  - migrations/flywheel_playbook_embeddings.sql: pgvector 持久化表
  - services/playbook_embedding_service.py: 啟動時重建 Redis 向量快取 + 同步 DB
  - main.py: lifespan 啟動時 asyncio.create_task 非阻塞執行

2026-04-10 Asia/Taipei — Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:04:56 +08:00
OG T
cd37befbe6 fix(models): 全面替換 datetime.UTC → timezone.utc 相容 Python 3.10
Some checks failed
CD Pipeline / build-and-deploy (push) Has been cancelled
Type Sync Check / check-type-sync (push) Successful in 59s
terminal.py, incident.py, utils/timezone.py 同樣問題。
CI runner Python 3.10 無 UTC 常數,導致所有模型靜默 import 失敗。

# 2026-04-06 ogt: 完整修復,不再有漏網之魚

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 12:40:27 +08:00
OG T
a94bb57d8b feat(types): ADR-046 IncidentConverter + IncidentEngineAdapter
實作 ADR-046 Option B: IncidentConverter 轉換層,解決
BrainIncident (lewooogo-brain) 與 LocalIncident (apps/api) 型別邊界問題。

變更:
- 新增 src/utils/incident_converter.py
  - brain_to_local(): BrainIncident → LocalIncident
  - local_to_brain(): LocalIncident → BrainIncident
  - ESCALATED → MITIGATING 映射 (brain 無 ESCALATED)
- incident_engine.py: 新增 IncidentEngineAdapter 包裝層
  - process_signal() / get_incident() 輸出轉換為 LocalIncident
  - get_incident_engine() 返回 IncidentEngineAdapter
- incident_memory.py: 加入 brain_to_local import,更新 _record_to_incident 說明
- ADR-046: 標記三個轉換點全部完成

解鎖: #123 proposal_service.py 清理 (下一步)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 22:47:54 +08:00
OG T
e1e3bba296 refactor(api): Phase 22 技術債修復 - 業務邏輯分層
Some checks failed
E2E Health Check / e2e-health (push) Has been cancelled
P2.3: LearningService.get_learning_summary() 業務邏輯移至 Service 層
- Repository 只提供原始統計數據
- Service 計算 best_action 和 learning_status

P2.6: Playbook similarity 計算邏輯抽取
- 新增 src/utils/similarity.py
- Repository 從 utils 導入,不再定義演算法

2026-03-31 Claude Code (首席架構師技術債修復)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-31 18:55:06 +08:00
OG T
938df7f291 fix(api): 全面清除假信心分數 - 遵循 feedback_confidence_truthfulness.md
🔴 違規修正: 規則匹配/Expert System 不是 AI 分析,confidence 必須 = 0.0

修正檔案:
- agents/action_planner.py: 0.9 → 0.0
- agents/blast_radius.py: 0.85/0.5/0.9 → 0.0
- agents/security.py: 計算公式 → 0.0
- signoz_webhook.py: 0.7 → 0.0
- auto_approve.py: default 0.5 → 0.0
- ci_auto_repair.py: 整個計算函數 → return 0.0
- error_analyzer_service.py: default 0.5 → 0.0
- intent_classifier.py: 計算公式 → 0.0
- openclaw.py: default 0.5 → 0.0
- resource_resolver.py: 0.8 → 0.0
- k8s_naming.py: 0.9/0.7 → 0.0

只有 LLM 真實分析返回的 confidence 才能 > 0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-29 16:00:46 +08:00
OG T
96c3ddd8c4 feat(api): Phase 18.1 K8s 資源名稱驗證 (ADR-016)
三層防禦架構確保 kubectl 指令有效:
1. Webhook 入口正規化 (webhooks.py)
2. OpenClaw 產生指令前驗證 (openclaw.py)
3. 靜態映射表 + 模糊匹配 (k8s_naming.py, resource_resolver.py)

新增:
- src/utils/k8s_naming.py: RFC 1123 正規化 + 靜態映射
- src/services/resource_resolver.py: MCP K8s Tool 動態驗證
- docs/adr/ADR-016-k8s-resource-naming.md: 契約文檔
- scripts/e2e_tool_call_verification.py: E2E 驗證腳本 v2.0

修改:
- webhooks.py: Phase 18.1.7 入口正規化
- openclaw.py: Phase 18.1.6 產生指令前驗證
- Skill 03 v1.4: 新增 K8s 資源驗證章節

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-26 11:22:47 +08:00
OG T
749b8bc554 fix(api): 修復時區 import 排序與未使用變數 lint 錯誤
- 修正 import 順序 (standard → third-party → local)
- 修復 datetime/timedelta 未定義錯誤
- 移除未使用的 imports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 09:26:58 +08:00
OG T
2a2dac865a feat(api): 統一使用台北時區 UTC+8 (禁止 UTC)
- 新增 src/utils/timezone.py 時區工具函式
- 修改 11 個後端檔案,全部改用 now_taipei()
- 更新 HARD_RULES.md 加入時區鐵律章節
- 更新 Skills 02/04 加入時區禁令

🔴 HARD RULE: 禁止 datetime.utcnow() / datetime.now(UTC)
 正確做法: from src.utils.timezone import now_taipei

Memory: feedback_timezone_taipei.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-25 09:08:34 +08:00