V10.489 strengthen PChome manual review matching

This commit is contained in:
OoO
2026-05-29 11:24:27 +08:00
parent 81701e85f4
commit c1b2fa82f5
7 changed files with 217 additions and 2 deletions

View File

@@ -4,6 +4,8 @@
================================================================================
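[Completed]
- V10.489: Added PChome low-score same-product manual review recovery and gate-pass risk boundaries. TS6 超美白香氛誘霜 120g/ml, W 修護保養蝸牛特潤修護面膜 6 sheets, and Derma 大地 Eco 植萃護膚油 2-pack are promoted from low confidence to `identity_review` manual review candidates; Clarins 輕盈美體護理油 vs 身體調和護理油, the 台塑生醫 baby bath/shampoo bundle quantity reversal, and the isLeaf intimate mousse scent-quantity mismatch become hard vetoes; HOOOME 大理石暖燈 vs the generic 經典款 listing stays in `variant_selection_review`. The official price-gap table is still written only after manual adoption. Tests: full `pytest` 1289 passed / 9 skipped.
- V10.488: Added the market intelligence MCP Fetch Run Receipt safe preview gate. It only reviews the operator's dry-run receipt; it does not run the CLI, fetch external sites, or write to the DB.
- V10.486: Added PChome near-threshold risk boundaries. NEW DIRECTIONS 甜杏仁油 vs 酪梨油 is a direct `core_ingredient_line_conflict` hard veto; COCODOR 經典擴香瓶 pick-any multi-variant vs generic, KAMERIA 足膜 pick-any-three vs a single 涼感足膜, and Hakugen 白元入浴劑 orange-box/green-box variants all stay in `variant_selection_review` and do not enter the adoptable gate. Production is deployed at `/health=V10.486`; the 240-row near-threshold audit moved `gate_pass 83→79`, `identity_veto 0→1`, `still_low 157→160`. Tests: `tests/test_marketplace_product_matcher.py`, `tests/test_competitor_match_attempts_persistence.py`, and `tests/test_competitor_match_attempt_rescore_audit.py` passed.
- V10.485: Added the NITORI aroma diffuser short-model guard. The read-only near-threshold pilot found that the only gate pass was 5510 vs J82 LBR, which should not be queued; the matcher now includes short alphanumeric models such as `J82` in the NITORI diffuser model conflict and hard-vetoes them like other differing models such as 5510 / YX168. Production is deployed at `/health=V10.485`; the 120-row near-threshold audit went from `gate_pass=1` to `gate_pass=0`, and the accepted audit is `scanned=89 / gate_pass=89 / still_low=0`. Tests: `tests/test_marketplace_product_matcher.py`, `tests/test_competitor_match_attempts_persistence.py`, and `tests/test_competitor_match_attempt_rescore_audit.py` passed.
- V10.484: Split the PChome manual gate. Clearly identical products such as POWERMAN 男性私密養護液 30ml, PHYSIOGEL AI 冰鎮精華露 200ml 2入, TS6 緊彈水嫩凝膠 40g, DERMA 寶寶洗髮沐浴露 150/500ml, Clarins 黃金亮眼萃 20ml, and Cetaphil 長效潤膚乳 237/473ml can take `exact / total_price / price_alert_exact`; COCODOR 大豆蠟燭 listings where only one side is a pick-any multi-variant catalog stay in `variant_selection_review`; Pavaruni 20-scent candles listed that way on both sides are not caught by mistake by the new catalog guard. Production was previously deployed at `/health=V10.484`, and 1 old accepted COCODOR risk row was rolled back. Tests: `tests/test_marketplace_product_matcher.py`, `tests/test_competitor_match_attempts_persistence.py`, and `tests/test_competitor_match_attempt_rescore_audit.py` passed.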
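
The V10.486 audit figures can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming the three buckets (`gate_pass`, `identity_veto`, `still_low`) partition the 240 audited rows; the `AuditTally` class and `tally_delta` helper are hypothetical names used only for illustration and are not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditTally:
    """Hypothetical container for one near-threshold audit snapshot."""
    gate_pass: int
    identity_veto: int
    still_low: int

    @property
    def scanned(self) -> int:
        # Assumption: every audited row lands in exactly one bucket.
        return self.gate_pass + self.identity_veto + self.still_low


def tally_delta(before: AuditTally, after: AuditTally) -> dict[str, int]:
    """Return the per-bucket change between two snapshots."""
    return {
        "gate_pass": after.gate_pass - before.gate_pass,
        "identity_veto": after.identity_veto - before.identity_veto,
        "still_low": after.still_low - before.still_low,
    }


# Figures quoted in the V10.486 entry.
before = AuditTally(gate_pass=83, identity_veto=0, still_low=157)
after = AuditTally(gate_pass=79, identity_veto=1, still_low=160)

delta = tally_delta(before, after)
print(before.scanned, after.scanned, delta)
```

Under that assumption both snapshots sum to 240 and the deltas cancel out: four pairs left the gate-pass bucket, one became an identity veto, and three dropped back to low confidence.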

View File

@@ -350,7 +350,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
# ==========================================
# System version and paths
# ==========================================
-SYSTEM_VERSION = "V10.488"
+SYSTEM_VERSION = "V10.489"
LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
public_url = PUBLIC_URL  # used for template display

View File

@@ -55,6 +55,7 @@
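- 2026-05-24 addendum: synced the line count of `services/marketplace_product_matcher.py` after the latest PChome rescore audit status wording and the unit-price multiplier fix; this only updates the inventory and does not change the split strategy.
- 2026-05-24 addendum: synced the line count of `services/competitor_intel_repository.py` after the PChome review queue decision envelope merge; this only updates the inventory and does not change the split strategy.
- 2026-05-25 addendum: synced the line count of `services/ollama_service.py` after background embedding gained the `host_health_probes` read skip guard; this only updates the inventory and does not change the Ollama routing decision.
- 2026-05-29 addendum: synced the line count of `services/marketplace_product_matcher.py` after the PChome near-threshold / focused identity recovery series; this only updates the inventory and does not change the split strategy.
## Files at or above 800 lines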
@@ -81,7 +82,7 @@
| 940 | `services/import_service.py` | P2 import service | validators / import writers / report builders |
| 1071 | `services/telegram_templates.py` | P2 Telegram templates | alert template groups / channel-specific formatting / reusable render helpers |
| 867 | `services/token_report_service.py` | P2 token report service | query / aggregation / chart payload / notification formatting |
-| 3786 | `services/marketplace_product_matcher.py` | P2 marketplace matcher | identity parsing / unit-comparable scoring / search term quality / persistence normalization |
+| 4865 | `services/marketplace_product_matcher.py` | P2 marketplace matcher | identity parsing / unit-comparable scoring / search term quality / persistence normalization |
| 865 | `routes/daily_sales_routes.py` | P2 Daily Sales Blueprint | route glue / export helpers / daily query and formatting service |
| 1266 | `services/ollama_service.py` | P2 Ollama client | host health / request client / fallback policy / response parsing |
| 849 | `services/pchome_crawler.py` | P2 PChome crawler | search fetch / parsing / fallback source handling / rate limit policy |

View File

@@ -83,6 +83,8 @@
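- From 2026-05-25 23:45 CST, `V10.484` separates manual gate exact matches from catalog risk: clearly identical products such as POWERMAN 男性私密養護液 30ml, PHYSIOGEL AI 冰鎮精華露 200ml 2入, TS6 緊彈水嫩凝膠 40g, DERMA 寶寶洗髮沐浴露 150/500ml, Clarins 黃金亮眼萃 20ml, and Cetaphil 長效潤膚乳 237/473ml can take `exact / total_price / price_alert_exact`; COCODOR 大豆蠟燭 listings where only one side is a pick-any multi-variant catalog stay in `variant_selection_review`; Pavaruni 20-scent candles listed that way on both sides remain total-price exact. Tests: `tests/test_marketplace_product_matcher.py`, `tests/test_competitor_match_attempts_persistence.py`, and `tests/test_competitor_match_attempt_rescore_audit.py` passed.
- From 2026-05-25 23:55 CST, `V10.485` adds the NITORI aroma diffuser short-model guard: the only gate pass in the near-threshold read-only pilot was 5510 vs J82 LBR, which was judged unfit for the queue; the matcher now includes short alphanumeric models such as `J82` in the NITORI diffuser model conflict and hard-vetoes them like other differing models such as 5510 / YX168. Production is deployed at `/health=V10.485`; the 120-row near-threshold audit went from `gate_pass=1` to `gate_pass=0`, and the accepted audit is `scanned=89 / gate_pass=89 / still_low=0`.
- From 2026-05-29, `V10.486` adds PChome near-threshold risk boundaries: NEW DIRECTIONS 甜杏仁油 vs 酪梨油 is a direct hard veto; COCODOR 經典擴香瓶 pick-any multi-variant, KAMERIA 足膜 pick-any-three, and Hakugen 白元入浴劑 orange-box/green-box variants all stay in `variant_selection_review` and do not enter the adoptable gate. Production is deployed at `/health=V10.486`; the 240-row near-threshold audit moved `gate_pass 83→79`, `identity_veto 0→1`, `still_low 157→160`.
- From 2026-05-29, `V10.488` adds the market intelligence MCP Fetch Run Receipt safe preview gate: it only reviews the operator's dry-run receipt and does not run the CLI, fetch external sites, or write to the DB.
- From 2026-05-29, `V10.489` adds PChome low-score same-product manual review recovery and gate-pass risk boundaries: TS6 超美白香氛誘霜 120g/ml, W 修護保養蝸牛特潤修護面膜 6 sheets, and Derma 大地 Eco 植萃護膚油 2-pack are promoted from low confidence to `identity_review` candidates; Clarins 輕盈美體護理油 vs 身體調和護理油, the 台塑生醫 baby bath/shampoo bundle quantity reversal, and the isLeaf intimate mousse scent-quantity mismatch become hard vetoes; HOOOME 大理石暖燈 vs the generic 經典款 listing stays only in `variant_selection_review` and does not enter total-price accepted.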
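
The tiering described in these entries can be pictured as a short decision ladder. The sketch below is illustrative only: `Routing` and `route_match` are hypothetical names, the ordering (veto first, review next, exact last) is an assumption drawn from the entries above, and the placeholder values `"none"`, `"comparable"`, and `"low_confidence"` are invented for the example. Only `not_comparable`, `manual_review`, `identity_review`, `total_price`, and `price_alert_exact` appear in the commit itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routing:
    comparison_mode: str
    price_basis: str
    alert_tier: str


def route_match(
    hard_veto: bool,
    variant_gap: bool,
    review_only_identity: bool,
    exact_identity: bool,
) -> Routing:
    """Hypothetical routing order: veto, then manual review, then exact."""
    if hard_veto:
        # e.g. a line conflict or a reversed bundle quantity
        return Routing("not_comparable", "none", "none")
    if variant_gap or review_only_identity:
        # e.g. a one-sided design gap or a narrow focused identity:
        # a human must adopt the pair before any price gap is written
        return Routing("comparable", "manual_review", "identity_review")
    if exact_identity:
        return Routing("comparable", "total_price", "price_alert_exact")
    return Routing("comparable", "manual_review", "low_confidence")


vetoed = route_match(True, False, False, True)
reviewed = route_match(False, True, False, True)
exact = route_match(False, False, False, True)
print(vetoed, reviewed, exact, sep="\n")
```

The point of ordering the checks this way is that a risk signal always wins over an exact-identity signal, so a pair can reach `price_alert_exact` only when no veto or review flag was raised.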
## 3. 12 Agent decision envelope integration

View File

@@ -13,6 +13,8 @@
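## 📅 Detailed changelog (archaeology archive)
### 2026-05-24: PChome near-threshold identity recovery, round two
- **V10.489 PChome low-score same-product manual review recovery and gate-pass risk boundaries**: `marketplace_product_matcher` adds three narrowly scoped focused identities: TS6 超美白香氛誘霜 120g/ml, W 修護保養蝸牛特潤修護面膜 6 sheets, and Derma 大地 Eco 植萃護膚油 2-pack. These samples are promoted only to `identity_review / manual_review` and do not enter `price_alert_exact`; the same release adds hard vetoes for Clarins body oils from different lines, named-bundle quantity reversal, and the isLeaf scent-quantity mismatch; the HOOOME 大理石暖燈 one-sided design gap stays in manual review.
- **V10.488 market intelligence MCP Fetch Run Receipt gate**: adds `/api/market_intel/mcp_fetch_run_receipt` and a UI preview. It only reviews the receipt the operator pastes back after a shell dry-run; the API does not run the CLI, fetch external sites, write files, open the DB, or attach a scheduler, and it blocks secret/token fields and side-effect flags.
- **V10.473 background embedding reads host_health skip**: `OllamaService.generate_embedding(..., allow_111_fallback=False)` first checks recent `host_health_probes`; if GCP-A/GCP-B was marked unhealthy by the runtime probe within the 20-minute window, background embedding skips that node and opens a short GCP circuit, without waiting for the 30-second timeout and without falling back to 111. If the DB read fails, it fails open to the original retry so that the observability layer cannot block embedding.
- **V10.472 GCP Ollama failover rootless diagnosis**: adds `scripts/ops/diagnose_ollama_gcp_failover.sh` and a DevOps SOP that check GCP-A/GCP-B/111 direct, the 110 proxy `11435/11436`, and the GCP-B `bge-m3` runtime without root. Current findings: GCP-A `22/11434` refused; GCP-B `22/11434` open but SSH key denied; GCP-B embed OK; 110:11435 502; 110:11436 OK. Repairing the primary requires GCP/SSH or 110 root access.
- **V10.471 GCP-B embedding timeout calibration**: direct measurements of GCP-B `bge-m3` `/api/embed` took about 6.4s / 7.3s / 23.5s; the previous `OLLAMA_EMBED_MAX_TIMEOUT=15` and host health `OLLAMA_HOST_HEALTH_EMBED_TIMEOUT=8` misjudged slow but successful embeddings, so the default is now 30s. Background embedding still runs only on GCP-A/GCP-B and does not fall back to 111.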

View File

@@ -1386,6 +1386,58 @@ def _has_pack_quantity_difference(left: ProductIdentity, right: ProductIdentity)
    return False


NAMED_COMPONENT_QUANTITY_GROUPS = (
    ("嬰兒沐浴精", "嬰幼童洗髮精"),
    ("魅惑麋香", "湛藍海洋", "花妍巧語", "絲絨玫瑰"),
)


def _named_component_quantity_map(identity: ProductIdentity, terms: Iterable[str]) -> dict[str, int]:
    text = identity.searchable_name
    present_terms = tuple(term for term in terms if term in text)
    if len(present_terms) < 2:
        return {}
    quantities: dict[str, int] = {}
    for term in present_terms:
        term_index = text.find(term)
        if term_index < 0:
            continue
        suffix = text[term_index + len(term):term_index + len(term) + 28]
        explicit_count = re.search(
            r"(?:\d+(?:\.\d+)?\s*(?:ml|g|mg|毫升|公克|毫克))?\s*(?:x|乘)\s*(\d+)",
            suffix,
            flags=re.I,
        )
        if explicit_count:
            quantities[term] = int(explicit_count.group(1))
    if len(quantities) == len(present_terms):
        return quantities
    pack_counts = [
        count
        for count, unit in identity.counts
        if _count_unit_family(unit) in COUNT_UNITS or unit in COUNT_UNITS
    ]
    if not quantities and pack_counts and max(pack_counts) == len(present_terms) and re.search(r"[+//、]", text):
        return {term: 1 for term in present_terms}
    return {}


def _has_named_component_quantity_conflict(left: ProductIdentity, right: ProductIdentity) -> bool:
    """A bundle with the same named components but reversed quantities is not the same price target."""
    for terms in NAMED_COMPONENT_QUANTITY_GROUPS:
        left_quantities = _named_component_quantity_map(left, terms)
        right_quantities = _named_component_quantity_map(right, terms)
        shared_terms = set(left_quantities) & set(right_quantities)
        if len(shared_terms) < 2:
            continue
        if any(left_quantities[term] != right_quantities[term] for term in shared_terms):
            return True
    return False


def _spec_score(left: ProductIdentity, right: ProductIdentity) -> tuple[float, bool, tuple[str, ...]]:
    volume_score, volume_conflict = _spec_component(left.volumes_ml, right.volumes_ml)
    weight_score, weight_conflict = _spec_component(left.weights_g, right.weights_g)
@@ -1977,6 +2029,9 @@ def score_marketplace_match(
        reasons.append("catalog_count_omission")
    if _has_pack_quantity_difference(left, right):
        reasons.append("pack_quantity_difference")
    named_component_quantity_conflict = _has_named_component_quantity_conflict(left, right)
    if named_component_quantity_conflict:
        reasons.append("named_component_quantity_conflict")
    variant_descriptor_conflict = _has_variant_descriptor_conflict(left, right, shared_anchor)
    sun_protection_line_conflict = (
        variant_descriptor_conflict
@@ -2037,6 +2092,9 @@ def score_marketplace_match(
    ingredient_line_conflict = _has_core_ingredient_line_conflict(left, right)
    if ingredient_line_conflict:
        reasons.append("core_ingredient_line_conflict")
    clarins_body_oil_line_conflict = _has_clarins_body_oil_line_conflict(left, right)
    if clarins_body_oil_line_conflict:
        reasons.append("clarins_body_oil_line_conflict")
    branded_powder_line_conflict = _has_branded_powder_line_conflict(left, right)
    if branded_powder_line_conflict:
        reasons.append("branded_powder_line_conflict")
@@ -2046,6 +2104,9 @@ def score_marketplace_match(
    selection1990_wax_lamp_design_conflict = _has_selection1990_wax_lamp_design_conflict(left, right)
    if selection1990_wax_lamp_design_conflict:
        reasons.append("selection1990_wax_lamp_design_conflict")
    hooome_wax_lamp_design_gap = _has_hooome_wax_lamp_design_gap(left, right)
    if hooome_wax_lamp_design_gap:
        reasons.append("hooome_wax_lamp_design_gap")
    wax_lamp_size_letter_conflict = _has_wax_lamp_size_letter_conflict(left, right)
    if wax_lamp_size_letter_conflict:
        reasons.append("size_letter_variant_conflict")
@@ -2085,6 +2146,7 @@ def score_marketplace_match(
        or relove_private_cleanser_variant_gap
        or candle_catalog_selection_gap
        or bath_additive_variant_gap
        or hooome_wax_lamp_design_gap
        or makeup_catalog_selection_gap
        or loreal_serum_variant_gap
        or sebamed_shampoo_variant_catalog_gap
@@ -2101,6 +2163,8 @@ def score_marketplace_match(
        hard_veto = True
    if multi_component_count_conflict:
        hard_veto = True
    if named_component_quantity_conflict:
        hard_veto = True
    if _has_refill_pack(left) != _has_refill_pack(right):
        hard_veto = True
    if accessory_case_conflict:
@@ -2149,6 +2213,8 @@ def score_marketplace_match(
        hard_veto = True
    if ingredient_line_conflict:
        hard_veto = True
    if clarins_body_oil_line_conflict:
        hard_veto = True
    if branded_powder_line_conflict:
        hard_veto = True
    if cleanser_lotion_line_conflict:
@@ -3282,6 +3348,29 @@ def _has_core_ingredient_line_conflict(left: ProductIdentity, right: ProductIden
    return bool(left_groups and right_groups and not (left_groups & right_groups))


def _has_clarins_body_oil_line_conflict(left: ProductIdentity, right: ProductIdentity) -> bool:
    if not ({"clarins", "克蘭詩"} & (left.brand_tokens & right.brand_tokens)):
        return False
    pair_text = f"{left.searchable_name} {right.searchable_name}"
    if not any(term in pair_text for term in ("護理油", "身體油", "美體油", "調和護理油")):
        return False
    line_groups = {
        "contour_lightweight": ("輕盈美體", "美體護理油", "contour"),
        "tonic_body": ("身體調和", "調和護理油", "孕期身體調和", "tonic"),
    }
    left_groups = {
        group
        for group, terms in line_groups.items()
        if any(term in left.searchable_name for term in terms)
    }
    right_groups = {
        group
        for group, terms in line_groups.items()
        if any(term in right.searchable_name for term in terms)
    }
    return bool(left_groups and right_groups and left_groups.isdisjoint(right_groups))


def _has_branded_powder_line_conflict(left: ProductIdentity, right: ProductIdentity) -> bool:
    if not ({"港香蘭"} & (left.brand_tokens & right.brand_tokens)):
        return False
@@ -3332,6 +3421,18 @@ def _has_selection1990_wax_lamp_design_conflict(left: ProductIdentity, right: Pr
    return bool(left_groups and right_groups and left_groups.isdisjoint(right_groups))


def _has_hooome_wax_lamp_design_gap(left: ProductIdentity, right: ProductIdentity) -> bool:
    if "hooome" not in (left.brand_tokens & right.brand_tokens):
        return False
    pair_text = f"{left.searchable_name} {right.searchable_name}"
    if not any(term in pair_text for term in ("香氛蠟燭暖燈", "蠟燭暖燈", "融蠟燈")):
        return False
    concrete_design_terms = ("大理石", "雲石", "原木", "半圓罩")
    left_designs = {term for term in concrete_design_terms if term in left.searchable_name}
    right_designs = {term for term in concrete_design_terms if term in right.searchable_name}
    return bool(left_designs or right_designs) and left_designs != right_designs


def _standalone_size_letter_tokens(identity: ProductIdentity) -> set[str]:
    text = identity.searchable_name
    return {
@@ -3860,6 +3961,16 @@ def _has_focused_low_score_exact_identity_line(left: ProductIdentity, right: Pro
        and _has_shared_weight(left, right, 40)
    ):
        return "ts6_private_elastic_gel_40g"
    if (
        {"ts6", "護一生"} & (left.brand_tokens & right.brand_tokens)
        and "超美" in left_text
        and "超美" in right_text
        and "香氛誘霜" in left_text
        and "香氛誘霜" in right_text
        and (120.0 in set(left.weights_g) or 120.0 in set(left.volumes_ml))
        and (120.0 in set(right.weights_g) or 120.0 in set(right.volumes_ml))
    ):
        return "ts6_private_white_fragrance_cream_120"
    if (
        {"ts6", "護一生"} & (left.brand_tokens & right.brand_tokens)
        and "淨白植感慕斯" in left_text
@@ -4037,6 +4148,24 @@ def _has_focused_low_score_exact_identity_line(left: ProductIdentity, right: Pro
        and _has_shared_volume(left, right, 150)
    ):
        return "derma_eco_skin_oil"
    if (
        {"derma", "丹麥德瑪"} & (left.brand_tokens & right.brand_tokens)
        and "大地" in left_text
        and "大地" in right_text
        and "植萃" in left_text
        and "植萃" in right_text
        and "護膚油" in left_text
        and "護膚油" in right_text
        and _has_exact_count_alignment(left, right)
    ):
        return "derma_eco_skin_oil_2pack_review"
    if (
        {"修護保養"} & (left.brand_tokens & right.brand_tokens)
        and "蝸牛特潤修護面膜" in left_text
        and "蝸牛特潤修護面膜" in right_text
        and _has_shared_count(left, right, 6, "")
    ):
        return "w_repair_snail_mask_6pcs_review"
    if (
        {"yuskin", "悠斯晶"} & (left.brand_tokens & right.brand_tokens)
        and "乳霜" in left_text
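
The named-component quantity check added in this file can be tried in isolation. The sketch below reuses the regular expression and the 28-character look-ahead window from `_named_component_quantity_map`, but it works on plain strings instead of `ProductIdentity`; `component_counts` and `quantities_conflict` are hypothetical helpers, and the sample names assume the `*` separator has already been normalised to `x`, which this diff does not show.

```python
import re

# Same pattern as in the diff: an optional size (e.g. 500g) followed by "x N" or "乘 N".
COUNT_AFTER_TERM = re.compile(
    r"(?:\d+(?:\.\d+)?\s*(?:ml|g|mg|毫升|公克|毫克))?\s*(?:x|乘)\s*(\d+)",
    re.I,
)


def component_counts(name: str, terms: tuple[str, ...], window: int = 28) -> dict[str, int]:
    """Map each named component found in `name` to the count that follows it."""
    counts: dict[str, int] = {}
    for term in terms:
        index = name.find(term)
        if index < 0:
            continue
        start = index + len(term)
        match = COUNT_AFTER_TERM.search(name[start:start + window])
        if match:
            counts[term] = int(match.group(1))
    return counts


def quantities_conflict(left: str, right: str, terms: tuple[str, ...]) -> bool:
    """True when at least two shared components exist and any count differs."""
    left_counts = component_counts(left, terms)
    right_counts = component_counts(right, terms)
    shared = left_counts.keys() & right_counts.keys()
    return len(shared) >= 2 and any(left_counts[t] != right_counts[t] for t in shared)


TERMS = ("嬰兒沐浴精", "嬰幼童洗髮精")
LEFT = "嬰兒沐浴精500gx1+嬰幼童洗髮精500gx2"   # assumed already-normalised name
RIGHT = "嬰幼童洗髮精500gx1+嬰兒沐浴精500gx2"  # same components, counts reversed

print(component_counts(LEFT, TERMS))
print(component_counts(RIGHT, TERMS))
print(quantities_conflict(LEFT, RIGHT, TERMS))
```

Both names contain the same two components and the same total of three bottles, which is why a plain pack-count comparison would miss the reversal; comparing counts per named component exposes it.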

View File

@@ -1213,6 +1213,38 @@ def test_marketplace_matcher_keeps_ambiguous_ts6_white_mousse_packaging_out_of_t
    assert diagnostics.alert_tier != "price_alert_exact"


def test_marketplace_matcher_promotes_safe_low_score_identity_review_samples():
    from services.marketplace_product_matcher import score_marketplace_match

    cases = [
        (
            "【TS6 護一生】超美白香氛誘霜120g 私密保養(私密美白)",
            "TS6 護一生超美 白香氛誘霜(120ml)",
            "focused_exact_identity_ts6_private_white_fragrance_cream_120",
        ),
        (
            "【W 修護保養】蝸牛特潤修護面膜6片 醫美術後保養 修護 保濕 皮秒(保濕優 修護強 隱形面膜 雷射術後必備)",
            "【W修護保養】蝸牛特潤修護面膜 28ml 6片裝",
            "focused_exact_identity_w_repair_snail_mask_6pcs_review",
        ),
        (
            "【Derma 丹麥德瑪】大地有機植萃撫紋護膚油-2入組(天然成分 適合孕哺期間使用)",
            "Derma 大地 Eco 植萃護膚油2入組",
            "focused_exact_identity_derma_eco_skin_oil_2pack_review",
        ),
    ]
    for momo_name, competitor_name, expected_reason in cases:
        diagnostics = score_marketplace_match(momo_name, competitor_name)
        assert diagnostics.hard_veto is False
        assert diagnostics.score >= 0.76
        assert diagnostics.price_basis == "manual_review"
        assert diagnostics.alert_tier == "identity_review"
        assert diagnostics.match_type in {"comparable", "exact"}
        assert expected_reason in diagnostics.reasons
        assert "variant_selection_review" not in diagnostics.reasons


def test_marketplace_matcher_keeps_one_sided_candle_catalog_selection_in_review():
    from services.marketplace_product_matcher import score_marketplace_match
@@ -1293,6 +1325,53 @@ def test_marketplace_matcher_sends_bath_additive_box_variants_to_review():
    assert "variant_selection_review" in diagnostics.reasons


def test_marketplace_matcher_rejects_clarins_body_oil_line_conflict():
    from services.marketplace_product_matcher import score_marketplace_match

    diagnostics = score_marketplace_match(
        "【CLARINS 克蘭詩】輕盈美體護理油100ml",
        "CLARINS 克蘭詩身體調和護理油100ml",
    )
    assert diagnostics.hard_veto is True
    assert diagnostics.comparison_mode == "not_comparable"
    assert "clarins_body_oil_line_conflict" in diagnostics.reasons


def test_marketplace_matcher_rejects_named_component_quantity_reversal():
    from services.marketplace_product_matcher import score_marketplace_match

    baby_kit = score_marketplace_match(
        "【台塑生醫】嬰兒沐浴洗髮超值3件組(嬰兒沐浴精500g*1+嬰幼童洗髮精500g*2)",
        "台塑生醫FORTE 嬰幼童洗髮精500g*1+嬰兒沐浴精500g*2",
    )
    private_mousse = score_marketplace_match(
        "【isLeaf】韓國男性私密激淨慕絲200ml二入組(魅惑麋香+湛藍海洋)",
        "韓國isLeaf 男性私密激淨慕絲200ml 湛藍海洋x2+魅惑麋香x2",
    )
    for diagnostics in (baby_kit, private_mousse):
        assert diagnostics.hard_veto is True
        assert diagnostics.comparison_mode == "not_comparable"
        assert "named_component_quantity_conflict" in diagnostics.reasons


def test_marketplace_matcher_sends_hooome_one_sided_design_gap_to_review():
    from services.marketplace_product_matcher import score_marketplace_match

    diagnostics = score_marketplace_match(
        "【HOOOME】經典大理石 香氛蠟燭暖燈-黑色",
        "HOOOME 經典款香氛蠟燭暖燈 黑色",
    )
    assert diagnostics.hard_veto is False
    assert diagnostics.score >= 0.76
    assert diagnostics.price_basis == "manual_review"
    assert diagnostics.alert_tier == "identity_review"
    assert "hooome_wax_lamp_design_gap" in diagnostics.reasons
    assert "variant_selection_review" in diagnostics.reasons


def test_marketplace_matcher_keeps_kiehls_no1_lip_balm_as_product_line_not_color_number():
    from services.marketplace_product_matcher import score_marketplace_match