awoooi/apps/api/src/utils/similarity.py
OG T 670cd5df86
refactor(flywheel): chief architect review fixes C1/C2/I1/I2/I3/I4/M1
C1 - Repository layer fix (building-block modularity rule):
  Added PlaybookEmbeddingRepository (pgvector UPSERT)
  playbook_embedding_service now accesses the DB through the Repository instead of calling db.execute(text(...)) directly
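A minimal sketch of the Repository boundary described in C1. The table name `playbook_embeddings`, its columns, the injected executor, and the `upsert` method signature are all assumptions for illustration; the commit only names `PlaybookEmbeddingRepository` and says it performs a pgvector UPSERT.

```python
import asyncio
from typing import Any, Awaitable, Callable, Mapping

# Hypothetical executor: (sql, params) -> awaitable. In the real service this would
# wrap the async DB session; it is injected here so the sketch runs without a database.
Executor = Callable[[str, Mapping[str, Any]], Awaitable[None]]

# Assumed table and column names; the commit does not show the real schema.
UPSERT_SQL = (
    "INSERT INTO playbook_embeddings (playbook_id, embedding) "
    "VALUES (:playbook_id, CAST(:embedding AS vector)) "
    "ON CONFLICT (playbook_id) DO UPDATE SET embedding = EXCLUDED.embedding"
)


class PlaybookEmbeddingRepository:
    """Owns the SQL so the embedding service never issues raw statements itself."""

    def __init__(self, execute: Executor) -> None:
        self._execute = execute

    async def upsert(self, playbook_id: str, embedding_literal: str) -> None:
        # embedding_literal is expected in pgvector's "[x,y,z]" text form.
        await self._execute(
            UPSERT_SQL, {"playbook_id": playbook_id, "embedding": embedding_literal}
        )


# Usage: a fake executor that records calls instead of talking to PostgreSQL.
calls: list[tuple[str, dict[str, Any]]] = []


async def fake_execute(sql: str, params: Mapping[str, Any]) -> None:
    calls.append((sql, dict(params)))


repo = PlaybookEmbeddingRepository(fake_execute)
asyncio.run(repo.upsert("high-cpu-restart", "[0.1,2.0]"))
# The service layer calls repo.upsert(...) rather than db.execute(text(...)).
```

Injecting the executor keeps the sketch runnable without PostgreSQL; in the real service the Repository would presumably receive the database session instead.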

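C2 - Moved business logic out of the Router layer into the Service layer:
  create_incident_for_approval and extract_affected_services (underscore prefix dropped) moved into incident_service.py
  webhooks.py now imports them from incident_service and no longer contains business logic itself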

I1 - Promoted _infra_jobs to a module-level frozenset (_INFRA_JOB_NAMES) so it is not rebuilt on every call

I2 - Added the missing PlaybookRAGService / list[Playbook] type annotations to _persist_embeddings_to_db

I3 - Made the embedding format explicit: "[" + ",".join(str(float(x)) for x in embedding) + "]"
  so pgvector does not silently fail to parse values because of formatting differences
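The I3 expression can be wrapped in a small helper for illustration; `to_pgvector_literal` is a hypothetical name, not a function from the commit. The usage lines contrast it with Python's default `str(list)` output.

```python
from collections.abc import Iterable


def to_pgvector_literal(embedding: Iterable[float]) -> str:
    # float() turns ints (and numeric scalars such as numpy floats) into plain
    # Python floats; the explicit join always yields "[x,y,z]" with no spaces.
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


values = [1, 2.5, -3]
print(str(values))                  # [1, 2.5, -3]
print(to_pgvector_literal(values))  # [1.0,2.5,-3.0]
```

The default list repr inserts spaces and keeps integers as `1`, whereas the explicit join always emits the same compact float form.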

I4 - Moved import asyncio to the top level of main.py and removed the duplicate import inside the try block

M1 - similarity.py: removed dead code `if union > 0 else 0.0`
  union cannot be 0 once both sets are non-empty
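The M1 reasoning can be spot-checked by brute force over a small, made-up universe of three elements. This is an illustration rather than a proof; the actual argument is the one-line inequality in the comment.

```python
from itertools import combinations


def nonempty_subsets(universe: list[str]) -> list[set[str]]:
    # All 2**n - 1 non-empty subsets of a small universe.
    return [set(c) for r in range(1, len(universe) + 1) for c in combinations(universe, r)]


subsets = nonempty_subsets(["a", "b", "c"])
# Past the early returns both sets are non-empty, so |A ∪ B| >= |A| >= 1.
min_union = min(len(a | b) for a in subsets for b in subsets)
print(len(subsets), min_union)  # 7 1
```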

2026-04-10 Asia/Taipei — Claude Sonnet 4.6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 11:35:10 +08:00

106 lines
3.2 KiB
Python

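"""
Similarity Calculation Utils
=============================
Phase 22 P2: moves similarity calculation logic out of the Repository.

Design principles:
- Algorithm logic should be independent of the data access layer
- The Repository handles CRUD only, not algorithms
- The Service layer may use these utility functions

Version: v1.1
Created: 2026-03-31 (Taipei time)
Author: Claude Code (chief architect technical-debt remediation)
Updated: 2026-04-10 (Taipei time) Claude Sonnet 4.6
- Phase 3 flywheel fix: empty affected_services exemption.
  Playbook.affected_services=[] marks a generic infrastructure Playbook that
  does not target a specific service (e.g. high-cpu-restart applies to CPU
  alerts on any host), so it receives an exemption score of 1.0 instead of
  having the overall similarity dragged down by a service-name mismatch.
- Severity exemption: Playbook.severity_range=[] means the Playbook applies
  to every severity.
"""

from src.models.playbook import SymptomPattern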


def calculate_jaccard_similarity(set_a: set, set_b: set) -> float:
    """
    Compute the Jaccard similarity.

    Jaccard = |A ∩ B| / |A ∪ B|

    Args:
        set_a: set A
        set_b: set B

    Returns:
        float: 0.0 to 1.0
    """
    if not set_a and not set_b:
        return 1.0  # two empty sets are treated as identical
    if not set_a or not set_b:
        return 0.0
    # Both sets are non-empty here, so the union has at least one element.
    intersection = len(set_a & set_b)
    union = len(set_a | set_b)
    return intersection / union


def calculate_symptom_similarity(
    pattern_a: SymptomPattern,
    pattern_b: SymptomPattern,
) -> float:
    """
    Compute symptom similarity.

    Algorithm: weighted Jaccard similarity with an exemption for generic Playbooks.

    Dimension weights:
    - alert_names: 0.35 (most important)
    - affected_services: 0.30
    - severity: 0.15
    - keywords: 0.20

    Exemption rules (Phase 3 flywheel fix, 2026-04-10):
    - pattern_b.affected_services is empty → generic Playbook; the services
      dimension scores 1.0 (infrastructure Playbooks such as high-cpu-restart
      and crashloop-pod-delete do not target a specific service)
    - pattern_b.severity_range is empty → applies to every severity; the
      severity dimension scores 1.0

    Returns:
        float: similarity score from 0.0 to 1.0
    """
    weights = {
        "alert_names": 0.35,
        "affected_services": 0.30,
        "severity": 0.15,
        "keywords": 0.20,
    }
    scores = {
        "alert_names": calculate_jaccard_similarity(
            set(pattern_a.alert_names),
            set(pattern_b.alert_names),
        ),
        # Generic Playbook exemption: no service restriction → any service matches → 1.0
        "affected_services": (
            1.0
            if not pattern_b.affected_services
            else calculate_jaccard_similarity(
                set(pattern_a.affected_services),
                set(pattern_b.affected_services),
            )
        ),
        # Generic Playbook exemption: no severity restriction → any severity matches → 1.0.
        # Otherwise severity is a binary overlap test, not a Jaccard ratio.
        "severity": (
            1.0
            if not pattern_b.severity_range
            or bool(set(pattern_a.severity_range) & set(pattern_b.severity_range))
            else 0.0
        ),
        "keywords": calculate_jaccard_similarity(
            set(pattern_a.keywords),
            set(pattern_b.keywords),
        ),
    }
    return sum(weights[k] * scores[k] for k in weights)
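To see how the exemptions change a score, here is a self-contained restatement of `calculate_symptom_similarity`. It uses a minimal stand-in for `SymptomPattern` (the real model presumably has more fields) and made-up alert, service, and keyword values such as `HighCPU`, `api`, and `db`; it is a sketch for experimentation, not the module itself.

```python
from dataclasses import dataclass, field


@dataclass
class SymptomPattern:
    # Stand-in for src.models.playbook.SymptomPattern: only the fields the scorer reads.
    alert_names: list[str] = field(default_factory=list)
    affected_services: list[str] = field(default_factory=list)
    severity_range: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


WEIGHTS = {"alert_names": 0.35, "affected_services": 0.30, "severity": 0.15, "keywords": 0.20}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def symptom_similarity(incident: SymptomPattern, playbook: SymptomPattern) -> float:
    scores = {
        "alert_names": jaccard(set(incident.alert_names), set(playbook.alert_names)),
        # Empty playbook services: generic playbook, full score on this dimension.
        "affected_services": (
            1.0
            if not playbook.affected_services
            else jaccard(set(incident.affected_services), set(playbook.affected_services))
        ),
        # Severity is an overlap test (1.0 or 0.0), with the same empty-range exemption.
        "severity": (
            1.0
            if not playbook.severity_range
            or set(incident.severity_range) & set(playbook.severity_range)
            else 0.0
        ),
        "keywords": jaccard(set(incident.keywords), set(playbook.keywords)),
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


incident = SymptomPattern(
    alert_names=["HighCPU"], affected_services=["api"],
    severity_range=["critical"], keywords=["cpu", "load"],
)
generic = SymptomPattern(alert_names=["HighCPU"], keywords=["cpu"])
scoped = SymptomPattern(
    alert_names=["HighCPU"], affected_services=["db"],
    severity_range=["critical"], keywords=["cpu"],
)
print(round(symptom_similarity(incident, generic), 2))  # 0.9
print(round(symptom_similarity(incident, scoped), 2))   # 0.6
```

The generic Playbook scores 0.35 + 0.30 + 0.15 + 0.20 × 0.5 ≈ 0.9, while the service-scoped one loses the whole 0.30 services weight. The exemption is one-sided: it only inspects `pattern_b`, so an incident with empty `affected_services` compared against a service-scoped Playbook still scores 0.0 on that dimension.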