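C1: Repository layer fix (modularization rule):
  Add PlaybookEmbeddingRepository (pgvector UPSERT).
  playbook_embedding_service now accesses the DB through the Repository and no longer calls db.execute(text(...)) directly.
C2: Move business logic from the Router layer into the Service layer:
  create_incident_for_approval and extract_affected_services (leading underscore dropped) move into incident_service.py.
  webhooks.py now imports them from incident_service and no longer contains business logic itself.
I1: Promote _infra_jobs to a module-level frozenset (_INFRA_JOB_NAMES) so it is not rebuilt on every call.
I2: Add the missing PlaybookRAGService / list[Playbook] type annotations to _persist_embeddings_to_db.
I3: Make the embedding format explicit: "[" + ",".join(str(float(x)) for x in embedding) + "]", so that pgvector does not silently fail to parse because of format differences.
I4: Move import asyncio to the top of main.py and remove the duplicate import inside the try block.
M1: similarity.py: remove the dead code `if union > 0 else 0.0`; union cannot be 0 when both sets are non-empty.

2026-04-10 Asia/Taipei, Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>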
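Item I3 in the commit message above can be illustrated with a minimal sketch. The helper name `to_pgvector_literal` is hypothetical and does not come from the repository; only the formatting expression itself is quoted from the commit message. Passing every component through `float` gives one consistent text form whether the input holds ints or floats.

```python
from collections.abc import Iterable


def to_pgvector_literal(embedding: Iterable[float]) -> str:
    """Render an embedding in bracketed, comma-separated text form.

    Hypothetical helper: only the expression in the return statement
    is taken from the commit message.
    """
    # float() normalizes ints and other numeric types before str()
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


if __name__ == "__main__":
    print(to_pgvector_literal([1, 2.5, -3]))  # prints [1.0,2.5,-3.0]
```

The resulting string would typically be bound as a query parameter and cast to the vector type on the database side; how the repository does this is not shown in the source.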
106 lines
3.2 KiB
Python
"""
Similarity Calculation Utils
=============================
Phase 22 P2: move the similarity calculation logic out of the Repository

Design principles:
- Algorithm logic should be independent of the data access layer
- The Repository handles CRUD only, not algorithms
- The Service layer may use these utility functions

Version: v1.1
Created: 2026-03-31 (Taipei time)
Author: Claude Code (chief architect technical debt fix)
Updated: 2026-04-10 (Taipei time) Claude Sonnet 4.6
  - Phase 3 flywheel fix: exemption for empty affected_services
    Playbook.affected_services=[] denotes a generic infrastructure Playbook
    that does not target a specific service (e.g. high-cpu-restart applies to
    CPU alerts on every host). It receives an exemption score of 1.0, so a
    service-name mismatch does not drag down the overall similarity.
  - Severity exemption: Playbook.severity_range=[] means the Playbook
    applies to all severities
"""

from src.models.playbook import SymptomPattern


def calculate_jaccard_similarity(set_a: set, set_b: set) -> float:
    """
    Compute the Jaccard similarity.

    Jaccard = |A ∩ B| / |A ∪ B|

    Args:
        set_a: set A
        set_b: set B

    Returns:
        float: in the range 0.0 to 1.0
    """
    if not set_a and not set_b:
        return 1.0  # two empty sets are treated as identical
    if not set_a or not set_b:
        return 0.0

    # Both sets are non-empty here, so the union is never empty.
    intersection = len(set_a & set_b)
    union = len(set_a | set_b)
    return intersection / union


def calculate_symptom_similarity(
    pattern_a: SymptomPattern,
    pattern_b: SymptomPattern,
) -> float:
    """
    Compute the symptom similarity.

    Algorithm: weighted Jaccard similarity plus generic Playbook exemptions

    Dimension weights:
    - alert_names: 0.35 (most important)
    - affected_services: 0.30
    - severity: 0.15 (1.0 on any overlap, otherwise 0.0)
    - keywords: 0.20

    Exemption rules (Phase 3 flywheel fix, 2026-04-10):
    - pattern_b.affected_services is empty → generic Playbook, the services
      dimension scores 1.0 (infrastructure Playbooks such as high-cpu-restart
      and crashloop-pod-delete do not target a specific service)
    - pattern_b.severity_range is empty → applies to all severities, the
      severity dimension scores 1.0

    Returns:
        float: similarity score in the range 0.0 to 1.0
    """
    weights = {
        "alert_names": 0.35,
        "affected_services": 0.30,
        "severity": 0.15,
        "keywords": 0.20,
    }

    scores = {
        "alert_names": calculate_jaccard_similarity(
            set(pattern_a.alert_names),
            set(pattern_b.alert_names),
        ),
        # Generic Playbook exemption: no services listed → any service matches → 1.0
        "affected_services": (
            1.0
            if not pattern_b.affected_services
            else calculate_jaccard_similarity(
                set(pattern_a.affected_services),
                set(pattern_b.affected_services),
            )
        ),
        # Generic Playbook exemption: no severities listed → any severity matches → 1.0
        "severity": (
            1.0
            if not pattern_b.severity_range
            or bool(set(pattern_a.severity_range) & set(pattern_b.severity_range))
            else 0.0
        ),
        "keywords": calculate_jaccard_similarity(
            set(pattern_a.keywords),
            set(pattern_b.keywords),
        ),
    }

    return sum(weights[k] * scores[k] for k in weights)
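The effect of the empty `affected_services` exemption is easiest to see with numbers. The following is a minimal, self-contained sketch rather than the project's code: the `SymptomPattern` dataclass is a stand-in with assumed fields (the real model lives in `src.models.playbook` and may differ), the function names are shortened, and the sample alert and service names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SymptomPattern:
    """Stand-in for src.models.playbook.SymptomPattern (fields are assumed)."""
    alert_names: list[str] = field(default_factory=list)
    affected_services: list[str] = field(default_factory=list)
    severity_range: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


WEIGHTS = {"alert_names": 0.35, "affected_services": 0.30,
           "severity": 0.15, "keywords": 0.20}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0  # two empty sets count as identical
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def symptom_similarity(incident: SymptomPattern, playbook: SymptomPattern) -> float:
    scores = {
        "alert_names": jaccard(set(incident.alert_names), set(playbook.alert_names)),
        # Empty list on the Playbook side means "applies to every service".
        "affected_services": 1.0 if not playbook.affected_services else jaccard(
            set(incident.affected_services), set(playbook.affected_services)),
        # Empty list means "all severities"; otherwise any overlap scores 1.0.
        "severity": 1.0 if (
            not playbook.severity_range
            or set(incident.severity_range) & set(playbook.severity_range)
        ) else 0.0,
        "keywords": jaccard(set(incident.keywords), set(playbook.keywords)),
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


# Invented sample data: one incident, one generic and one service-specific Playbook.
incident = SymptomPattern(["HighCPU"], ["api-gateway"], ["critical"], ["cpu", "load"])
generic = SymptomPattern(["HighCPU"], [], [], ["cpu", "restart"])
specific = SymptomPattern(["HighCPU"], ["billing"], [], ["cpu", "restart"])

generic_score = symptom_similarity(incident, generic)
specific_score = symptom_similarity(incident, specific)

print(round(generic_score, 4))   # 0.8667 = 0.35 + 0.30 + 0.15 + 0.20 * (1/3)
print(round(specific_score, 4))  # 0.5667 = 0.35 + 0.00 + 0.15 + 0.20 * (1/3)
```

The two Playbooks differ only in `affected_services`, so the gap between the scores is exactly the 0.30 weight of that dimension: the generic Playbook keeps it through the exemption, while the service-specific one loses it on a name mismatch.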