Reinforce the PChome exact same-product total-price lane

OoO
2026-05-25 12:01:57 +08:00
parent 7e29c00eb8
commit 5c0ff7f8cf
7 changed files with 191 additions and 6 deletions

@@ -4,6 +4,7 @@
================================================================================
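【Completed】
- V10.467 Added a PChome focused exact total-price safe lane: production near-threshold samples already confirmed to share brand, product name, and spec/pack count (3W CLINIC foundation 2-pack, Hanamisui gel 3-stick box, The Ordinary Caffeine EGCG 30ml, KUSSEN diaper cream 3-pack, Bone diffuser gift set, 1990 wax-melt lamp in white, and the CANMAKE tear-bag palette) now converge from `exact/manual_review` to `exact/total_price`. `MIN_MATCH_SCORE` was not relaxed; DASHING DIVA, lip color, scented, and shade/style-sensitive products keep their variant / veto protection.
- V10.466 Fixed the rescore audit duplicate check: a row is skipped only when the latest attempt is already `rescore_accepted_current` for the same candidate. If a row was accepted in the past but the crawler later appended low-confidence rows, it can be re-materialized, so the Dashboard latest-state no longer stays stuck at `true_low_confidence`. The production pilot sent SKUs `14756069`, `11159042`, `13842560`, `8394210`, `15192547`, `10509765`, and `10603780` to the manual review queue; only `competitor_match_attempts` was written, and `competitor_prices` / `competitor_price_history` are unchanged.
- V10.465 Fixed the embedding control flow with fallback disabled: with `allow_111_fallback=False`, if the resolver returns 111, the call no longer exits immediately or tries only GCP-B; it now forces a retry on whichever of GCP-A/GCP-B has not been tried yet. Background embeddings still never fall back to 111.
- V10.464 Added a targeted-SKU rescore audit pilot: `audit_competitor_match_attempt_rescore.py --sku` scans only the specified SKUs; combined with `--apply-accepted`, it appends only the target SKUs that pass the new matcher to the `rescore_accepted_current` manual review queue and does not write the production price tables.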
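The V10.466 duplicate rule ("look at the latest attempt, not any historical accepted row") can be sketched as below. This is an illustration under assumptions, not the code in `materialize_rescore_accept_reviews()`: the `Attempt` record, its `attempted_at` ordering key, and the `should_skip_materialize` helper are hypothetical stand-ins, since the `competitor_match_attempts` schema is not part of this commit.

```python
from dataclasses import dataclass
from typing import Iterable

ACCEPTED = "rescore_accepted_current"


@dataclass(frozen=True)
class Attempt:
    # Hypothetical stand-in for one competitor_match_attempts row.
    sku: str
    candidate: str
    status: str
    attempted_at: int  # assumed monotonic ordering key (e.g. a row id)


def should_skip_materialize(attempts: Iterable[Attempt], sku: str, candidate: str) -> bool:
    """Skip only if the LATEST attempt for this SKU already accepts the same candidate.

    An older accepted row no longer counts: if the crawler appended a
    low-confidence row afterwards, the accepted state must be re-materialized.
    """
    rows = [a for a in attempts if a.sku == sku]
    if not rows:
        return False
    latest = max(rows, key=lambda a: a.attempted_at)
    return latest.status == ACCEPTED and latest.candidate == candidate


if __name__ == "__main__":
    # "cand-1" is a hypothetical candidate id.
    history = [
        Attempt("14756069", "cand-1", ACCEPTED, 1),
        Attempt("14756069", "cand-1", "true_low_confidence", 2),
    ]
    # An "any accepted row" rule would skip here; the latest-state rule does not.
    print(should_skip_materialize(history, "14756069", "cand-1"))  # prints False
```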

@@ -325,7 +325,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.466"
+SYSTEM_VERSION = "V10.467"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # Used for template display

@@ -2,7 +2,7 @@
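 > **Last updated**: 2026-05-25 (Taipei time)
 > **Status**: 🟢 The four-AI-Agent automated closed loop is in place; the LLM routing red line is upgraded to an Ollama-first three-host cascade; Gemini fallback is disabled by default
-> **Applies to version**: V10.466
+> **Applies to version**: V10.467
 ---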

@@ -20,6 +20,7 @@
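- 2026-05-25 08:18 CST status: `main` pushed to Gitea and deployed to 188; production `/health` reports `V10.462`. This round recreated `momo-app`, `scheduler`, and `telegram-bot`; `--remove-orphans` was not used and `momo-db` was not touched. Smoke passed: all three app containers healthy; the Dashboard, AI hub, API, and frontend confirm dialogs all use the new wording 「PChome 補抓產線 / 補抓未搜尋 / 未搜尋補抓」 ("PChome backfill pipeline / backfill unsearched / unsearched backfill"); Gemini is hard-disabled and `ai_calls` shows no Gemini provider in the last 24 hours; the Ollama order remains GCP-A → GCP-B → 111; 5 minutes of error logs from the three containers show no Traceback / ERROR / IntegrityError.
- 2026-05-25 08:38 CST status: `main` pushed to Gitea and deployed to 188; production `/health` reports `V10.464`. This round recreated `momo-app`, `scheduler`, and `telegram-bot`; `--remove-orphans` was not used and `momo-db` was not touched. Smoke passed: all three app containers healthy; `/`, `/?filter=pchome_review`, `/daily_sales`, `/growth_analysis`, `/observability/ppt_audit_history`, and the PChome rescore queue API return HTTP 200. A read-only rescore of the three DR.WU SKUs gave `gate_pass=3/3`; after `--apply-accepted`, their latest state is `rescore_accepted_current` with `best_match_score=1.0` and `price_basis=total_price`. Overall latest counts changed to `true_low_confidence=778` and `rescore_accepted_current=34`. The 5-minute logs show no Traceback, but the pre-existing `[Embed] all hosts failed` error remains and must be covered by the next round of Ollama embedding health checks.
- 2026-05-25 10:04 CST status: `main` pushed to Gitea and deployed to 188; production `/health` reports `V10.465`. This round recreated `momo-app`, `scheduler`, and `telegram-bot`; `--remove-orphans` was not used and `momo-db` was not touched. Smoke passed: all three app containers healthy; `/`, `/daily_sales`, `/growth_analysis`, `/observability/ppt_audit_history`, and the PChome rescore queue API return HTTP 200. An in-container routing smoke test showed that when the resolver returns 111 and `allow_111_fallback=false`, the call switches to GCP-A/GCP-B and outputs `tried=['http://34.143.170.20:11434','http://34.21.145.224:11434']`. A real short embedding, with GCP-A `/api/version` timing out and GCP-B returning 200, successfully returned a 1024-dimensional vector in 4.59 seconds. 3 minutes of error logs from the three containers show no Traceback / ERROR / CRITICAL.
- 2026-05-25 12:05 CST status: `main` deployed to 188; production `/health` reports `V10.467`; push to Gitea pending. Both changes were verified together: with V10.466 the rescore duplicate check looks at latest-state, the latest attempt for all 7 SKUs is `rescore_accepted_current`, and the target counts in `competitor_prices` / `competitor_price_history` are unchanged; with V10.467 the focused exact matcher returns `exact / total_price / price_alert_exact` inside the container. This round recreated `momo-app`, `scheduler`, and `telegram-bot`; `--remove-orphans` was not used and `momo-db` was not touched. Smoke passed: three containers healthy, PChome rescore queue API HTTP 200, no Gemini provider records in 24 hours, Ollama env order remains GCP-A → GCP-B → 111, and 3 minutes of logs from the three containers show no Traceback / ERROR / CRITICAL / IntegrityError.
## 1. MOMO / PChome core price-comparison accuracy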
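The routing behaviour checked in the 10:04 CST smoke test (resolver returns 111, `allow_111_fallback=false`, both GCP hosts end up in `tried`) can be modelled in a few lines. This is a sketch, not `OllamaService.generate_embedding`: the host labels stand in for the real URLs, and `candidate_hosts`, `embed_with_fallback`, `resolve_host`, and `call_host` are hypothetical names for the health-cache resolver and the HTTP call.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical labels for the cascade GCP-A -> GCP-B -> 111.
GCP_HOSTS = ("GCP-A", "GCP-B")
LOCAL_111 = "111"


def candidate_hosts(resolved: str, allow_111_fallback: bool) -> List[str]:
    """Resolver's pick first, then the remaining GCP hosts; 111 only if allowed."""
    order = [resolved] + [h for h in GCP_HOSTS if h != resolved]
    if allow_111_fallback:
        if LOCAL_111 not in order:
            order.append(LOCAL_111)
        return order
    # Fallback disabled: drop 111 instead of breaking out with tried == [].
    return [h for h in order if h != LOCAL_111]


def embed_with_fallback(
    text: str,
    resolve_host: Callable[[], str],
    call_host: Callable[[str, str], Optional[List[float]]],
    allow_111_fallback: bool = False,
) -> Tuple[Optional[List[float]], List[str]]:
    tried: List[str] = []
    for host in candidate_hosts(resolve_host(), allow_111_fallback):
        tried.append(host)
        vector = call_host(host, text)  # None models a timeout or error
        if vector is not None:
            return vector, tried
    return None, tried


if __name__ == "__main__":
    # GCP-A times out, GCP-B answers: both hosts end up in `tried`.
    vec, tried = embed_with_fallback(
        "short text",
        resolve_host=lambda: LOCAL_111,
        call_host=lambda host, text: [0.0] * 1024 if host == "GCP-B" else None,
    )
    print(len(vec), tried)  # prints 1024 ['GCP-A', 'GCP-B']
```

What happens to 111 when fallback *is* allowed is not described in this commit; the sketch simply appends it last.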
@@ -54,6 +55,7 @@
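- From 2026-05-25 08:25 CST, `DR.WU / DR WU / DRWU / 達爾膚` are treated as aliases of one brand. The production samples DR.WU hyaluronic acid moisturizing serum-lotion 50ML 2-pack and mandelic acid brightening renewal serum 18% 30ML 2-pack can return from brandless identity review to the exact total-price lane without changing the global threshold.
- 2026-05-25 08:36 CST production pilot: SKUs `10362820`, `10653216`, and `10653329` were materialized from `true_low_confidence` to `rescore_accepted_current`; they enter only the manual review queue and are not written to `competitor_prices`.
- From 2026-05-25 11:55 CST, the rescore audit duplicate check looks only at the latest attempt: if an accepted row exists historically but the crawler later appended low-confidence rows, the pair can be re-materialized as the latest `rescore_accepted_current`. The production pilot put SKUs `14756069`, `11159042`, `13842560`, `8394210`, `15192547`, `10509765`, and `10603780` into the manual review queue; the production `competitor_prices` / `competitor_price_history` tables were neither written nor changed.
- From 2026-05-25 12:20 CST, the matcher adds a narrow `focused_exact_total_price_safe` lane. It currently covers only confirmed same-product samples: 3W CLINIC foundation 2-pack, Hanamisui gel 3-stick box, The Ordinary Caffeine EGCG 30ml, KUSSEN diaper cream 3-pack, Bone diffuser gift set, 1990 wax-melt lamp in white, and the CANMAKE tear-bag palette. This lets high-confidence `exact/manual_review` matches become `exact/total_price` so the rescore pilot can route them to manual review; DASHING DIVA, lip color, scented, and shade/style-sensitive products are still not let through.
## 3. 12-Agent decision envelope integration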
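Treating the four DR.WU spellings as one brand amounts to normalizing brand strings before comparing them. A minimal sketch, assuming a hypothetical `BRAND_ALIASES` table and `canonical_brand()` / `same_brand()` helpers; the matcher's real alias data and normalization are not shown in this commit.

```python
import re

# Hypothetical alias table mapping normalized spellings to one canonical key.
BRAND_ALIASES = {
    "drwu": "drwu",
    "達爾膚": "drwu",
}


def canonical_brand(raw: str) -> str:
    """Lowercase and strip dots/whitespace, then map known aliases."""
    key = re.sub(r"[.\s]+", "", raw.strip().lower())
    return BRAND_ALIASES.get(key, key)


def same_brand(left: str, right: str) -> bool:
    return canonical_brand(left) == canonical_brand(right)


if __name__ == "__main__":
    spellings = ["DR.WU", "DR WU", "DRWU", "達爾膚"]
    print({canonical_brand(s) for s in spellings})  # prints {'drwu'}
```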

@@ -13,6 +13,7 @@
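## 📅 Detailed changelog (archive)
### 2026-05-24: PChome near-threshold identity recovery, round 2
- **V10.467 Focused exact total-price safe lane**: `marketplace_product_matcher` adds a narrow `focused_exact_total_price_safe` lane that applies only to production near-threshold samples sharing brand, product name, and spec/pack count (3W CLINIC foundation 2-pack, Hanamisui gel 3-stick box, The Ordinary Caffeine EGCG 30ml, KUSSEN diaper cream 3-pack, Bone diffuser gift set, 1990 wax-melt lamp in white, and the CANMAKE tear-bag palette), letting `exact/manual_review` be promoted to `exact/total_price/price_alert_exact`. `MIN_MATCH_SCORE` was not relaxed; DASHING DIVA, lip color, scented, and shade/style-sensitive products keep their variant / veto protection.
- **V10.466 Rescore latest-state duplicate fix and 7-SKU pilot**: the duplicate check in `materialize_rescore_accept_reviews()` now looks at the latest attempt rather than any historical accepted row. If the crawler later overwrites the same SKU/candidate with `true_low_confidence`, `rescore_accepted_current` can be appended again so the Dashboard latest-state correctly enters manual review. The production pilot materialized SKUs `14756069`, `11159042`, `13842560`, `8394210`, `15192547`, `10509765`, and `10603780` into the manual review queue; the target count in `competitor_prices` stays at 7 and in `competitor_price_history` at 210, with no writes to the production price-difference tables.
- **V10.465 Embedding GCP fallback fix**: when the resolver returns 111 because of an unhealthy cache, `OllamaService.generate_embedding(..., allow_111_fallback=False)` now forces a retry on the not-yet-tried GCP-A/GCP-B instead of hitting a bare `break`, which had produced `tried=[]` or tried only GCP-B. Background embeddings are still not allowed to fall back to 111.
- **V10.464 Rescore SKU pilot filter**: `audit_competitor_match_attempt_rescore.py` and `fetch_match_attempt_rescore_rows()` gain a `--sku` / `skus` filter, allowing targeted materialization of 3 to 10 rows for a well-defined cohort such as DR.WU without scanning the whole `true_low_confidence` batch for a pilot.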
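The V10.464 SKU filter can be illustrated with `argparse` and an in-memory filter. This is a hypothetical sketch, not the real script: `parse_args`, `filter_rows`, and the row shape are assumptions, `--sku` is assumed to be repeatable (the real flag may take a comma-separated list), and `fetch_match_attempt_rescore_rows()` may well push the filter into its query instead.

```python
import argparse
from typing import Dict, Iterable, List, Optional

Row = Dict[str, object]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Rescore audit (sketch)")
    # Assumption: --sku is repeatable; the real script may accept a comma list instead.
    parser.add_argument("--sku", action="append", default=None, help="limit the scan to this SKU")
    parser.add_argument("--apply-accepted", action="store_true", help="append passing rows to the review queue")
    return parser.parse_args(argv)


def filter_rows(rows: Iterable[Row], skus: Optional[Iterable[str]]) -> List[Row]:
    """Keep only rows whose SKU was requested; no --sku means the whole batch."""
    wanted = {str(s) for s in (skus or [])}
    if not wanted:
        return list(rows)
    return [row for row in rows if str(row["sku"]) in wanted]


if __name__ == "__main__":
    args = parse_args(["--sku", "10362820", "--sku", "10653216", "--apply-accepted"])
    rows = [{"sku": "10362820"}, {"sku": "99999999"}, {"sku": 10653216}]
    print(filter_rows(rows, args.sku), args.apply_accepted)
    # prints [{'sku': '10362820'}, {'sku': 10653216}] True
```

Comparing SKUs as strings keeps integer and string SKU columns interchangeable, which matters when IDs such as `8394210` arrive from different sources.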

@@ -487,6 +487,18 @@ FOCUSED_IDENTITY_BRANDLESS_REVIEW_REASONS = {
    "the_forest_maple_diffuser_flower_brandless",
}
FOCUSED_IDENTITY_TOTAL_PRICE_REASONS = {
    "3w_clinic_collagen_foundation_50ml_2pack",
    "hanamisui_moisture_original_gel_1_7g_3pack",
    "hanamisui_inclear_private_gel_1_7g_3pack",
    "the_ordinary_caffeine_egcg_30ml",
    "kussen_baby_butt_cream_50ml_3pack",
    "bone_diffuser_gift_3pack",
    "selection1990_half_dome_wax_lamp_white",
    "selection1990_bendable_wax_lamp_white",
    "canmake_tear_bag_palette",
}
SEARCH_BROAD_ANCHORS = {
    "乳霜",
    "面霜",
@@ -1807,13 +1819,24 @@ def _classify_match_quality(
         return "no_match", "none", "suppress"
     direct_spec_evidence = spec_score >= 0.85 or bool(shared_models)
+    focused_total_price_safe = "focused_exact_total_price_safe" in reason_set
     strong_identity_evidence = (
-        brand_score >= 0.95
-        and type_score >= 0.55
-        and score >= 0.86
-        and (direct_spec_evidence or (shared_anchor and token_score >= 0.62 and sequence_score >= 0.58))
+        (
+            brand_score >= 0.95
+            and type_score >= 0.55
+            and score >= 0.86
+            and (direct_spec_evidence or (shared_anchor and token_score >= 0.62 and sequence_score >= 0.58))
+        )
+        or (
+            focused_total_price_safe
+            and brand_score >= 0.95
+            and type_score >= 0.55
+            and score >= 0.86
+        )
     )
     if strong_identity_evidence and not catalog_count_omission:
+        if focused_total_price_safe and "variant_selection_review" not in reason_set:
+            return "exact", "total_price", "price_alert_exact"
         if multi_component_pair or "variant_selection_review" in reason_set:
             return "exact", "manual_review", "identity_review"
         return "exact", "total_price", "price_alert_exact"
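The new precedence in `_classify_match_quality` can be restated as a small pure function. Two points fall out of the hunk: `(base and (spec or anchor)) or (focused and base)` simplifies to `base and (spec or anchor or focused)`, so the focused lane waives only the spec/anchor evidence and never the brand, type, or score thresholds; and the focused early return sits before the `multi_component_pair` check. This sketch keeps only the thresholds visible in the hunk; `Signals` and `classify_exact` are hypothetical, and the real function's earlier branches (such as the `no_match` path) are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

Verdict = Tuple[str, str, str]  # (match_type, price_basis, alert_tier)


@dataclass
class Signals:
    # Hypothetical bundle of the scores and flags read by the hunk above.
    brand_score: float
    type_score: float
    score: float
    spec_score: float = 0.0
    token_score: float = 0.0
    sequence_score: float = 0.0
    shared_models: bool = False
    shared_anchor: bool = False
    multi_component_pair: bool = False
    catalog_count_omission: bool = False
    reasons: Set[str] = field(default_factory=set)


def classify_exact(s: Signals) -> Optional[Verdict]:
    """Exact-match branch only; returns None where the real function falls through."""
    direct_spec_evidence = s.spec_score >= 0.85 or s.shared_models
    focused_safe = "focused_exact_total_price_safe" in s.reasons
    base = s.brand_score >= 0.95 and s.type_score >= 0.55 and s.score >= 0.86
    # (base and (spec or anchor)) or (focused and base) == base and (spec or anchor or focused)
    strong = base and (
        direct_spec_evidence
        or (s.shared_anchor and s.token_score >= 0.62 and s.sequence_score >= 0.58)
        or focused_safe
    )
    if not strong or s.catalog_count_omission:
        return None
    variant_review = "variant_selection_review" in s.reasons
    if focused_safe and not variant_review:
        # Checked before multi_component_pair, so the focused lane wins over it.
        return ("exact", "total_price", "price_alert_exact")
    if s.multi_component_pair or variant_review:
        return ("exact", "manual_review", "identity_review")
    return ("exact", "total_price", "price_alert_exact")


if __name__ == "__main__":
    # Hypothetical multi-component pair that carries the focused reason.
    pair = Signals(0.96, 0.60, 0.87, multi_component_pair=True, reasons={"focused_exact_total_price_safe"})
    print(classify_exact(pair))  # prints ('exact', 'total_price', 'price_alert_exact')
```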
@@ -2024,6 +2047,22 @@ def score_marketplace_match(
            )
        )
    )
    focused_exact_total_price_safe = (
        focused_exact_line_reason in FOCUSED_IDENTITY_TOTAL_PRICE_REASONS
        and brand_score >= 0.95
        and not hard_veto
        and spec_score >= 0.45
        and token_score >= 0.30
        and sequence_score >= 0.40
        and (
            not variant_descriptor_conflict
            or focused_exact_line_reason == "hanamisui_inclear_private_gel_1_7g_3pack"
        )
        and "variant_selection_review" not in reasons
    )
    if focused_exact_total_price_safe:
        reasons.append("focused_exact_total_price_safe")
        reasons.append(f"focused_exact_identity_{focused_exact_line_reason}")
    comparison_mode = "exact_identity"
    if _is_unit_comparable_candidate(
@@ -3393,6 +3432,86 @@ def _has_focused_low_score_exact_identity_line(left: ProductIdentity, right: Pro
        and "24張入" in right_text
    ):
        return "gatsby_body_wipes_24"
    if (
        {"3w", "clinic"} <= (left.brand_tokens & right.brand_tokens)
        and "膠原蛋白粉底液" in left_text
        and "膠原蛋白粉底液" in right_text
        and _has_shared_volume(left, right, 50)
        and _has_exact_count_alignment(left, right)
    ):
        return "3w_clinic_collagen_foundation_50ml_2pack"
    if (
        "花美水" in (left.brand_tokens & right.brand_tokens)
        and "moisture" in (left.brand_tokens & right.brand_tokens)
        and "保濕修護" in left_text
        and "保濕修護" in right_text
        and "精華凝膠" in left_text
        and "精華凝膠" in right_text
        and ("原黃金" in left_text and "原黃金" in right_text)
        and _has_shared_weight(left, right, 1.7)
        and _has_exact_count_alignment(left, right)
    ):
        return "hanamisui_moisture_original_gel_1_7g_3pack"
    if (
        "花美水" in (left.brand_tokens & right.brand_tokens)
        and "inclear" in (left.brand_tokens & right.brand_tokens)
        and ("櫻克麗兒" in left_text and "櫻克麗兒" in right_text)
        and ("私密淨化凝膠" in left_text and "私密淨化凝膠" in right_text)
        and _has_shared_weight(left, right, 1.7)
        and _has_exact_count_alignment(left, right)
    ):
        return "hanamisui_inclear_private_gel_1_7g_3pack"
    if (
        "ordinary" in (left.brand_tokens & right.brand_tokens)
        and "咖啡因" in left_text
        and "咖啡因" in right_text
        and "egcg" in left_raw
        and "egcg" in right_raw
        and "兒茶眼部配方" in left_text
        and "兒茶眼部配方" in right_text
        and _has_shared_volume(left, right, 30)
    ):
        return "the_ordinary_caffeine_egcg_30ml"
    if (
        {"kussen", "葵森"} & (left.brand_tokens & right.brand_tokens)
        and "寶寶益菌屁屁膏" in left_text
        and "寶寶益菌屁屁膏" in right_text
        and _has_shared_volume(left, right, 50)
        and _has_exact_count_alignment(left, right)
    ):
        return "kussen_baby_butt_cream_50ml_3pack"
    if (
        "bone" in (left.brand_tokens & right.brand_tokens)
        and "擴香禮盒三入組" in left_text
        and "擴香禮盒三入組" in right_text
        and all(component in left_text and component in right_text for component in ("原木麋鹿", "搖搖貓頭鷹", "薰衣草精油"))
        and _has_exact_count_alignment(left, right)
    ):
        return "bone_diffuser_gift_3pack"
    if (
        {"1990", "選物"} <= (left.brand_tokens & right.brand_tokens)
        and "現代簡約半圓罩融燭燈" in left_text
        and "現代簡約半圓罩融燭燈" in right_text
        and "白色款" in left_text
        and "白色款" in right_text
    ):
        return "selection1990_half_dome_wax_lamp_white"
    if (
        {"1990", "選物"} <= (left.brand_tokens & right.brand_tokens)
        and "歐式可彎融燭燈" in left_text
        and "歐式可彎融燭燈" in right_text
        and "白色款" in left_text
        and "白色款" in right_text
    ):
        return "selection1990_bendable_wax_lamp_white"
    if (
        "canmake" in (left.brand_tokens & right.brand_tokens)
        and "淚袋專用盤" in left_text
        and "淚袋專用盤" in right_text
        and "淚袋眼影盤" in left_text
        and "淚袋眼影盤" in right_text
    ):
        return "canmake_tear_bag_palette"
    return ""
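Each new branch in `_has_focused_low_score_exact_identity_line` follows the same shape: a set of brand tokens shared by both sides plus phrases that must appear in both titles. One way to keep this growing list reviewable is a data-driven rule table. The sketch below is a hypothetical refactor, not the module's code: `Identity` is a simplified stand-in for `ProductIdentity`, and it omits the `_has_shared_volume` / `_has_shared_weight` / `_has_exact_count_alignment` checks, the raw-text (`left_raw`) checks, and the any-of brand test used by the KUSSEN rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Identity:
    # Simplified, hypothetical stand-in for ProductIdentity.
    brand_tokens: FrozenSet[str]
    text: str  # normalized title text


@dataclass(frozen=True)
class IdentityRule:
    reason: str
    required_brands: FrozenSet[str]  # every token must be shared by both sides
    phrases: Tuple[str, ...]  # every phrase must appear in both titles


# Two of the rules from the hunk above, restated as data.
RULES: Tuple[IdentityRule, ...] = (
    IdentityRule(
        "selection1990_bendable_wax_lamp_white",
        frozenset({"1990", "選物"}),
        ("歐式可彎融燭燈", "白色款"),
    ),
    IdentityRule(
        "canmake_tear_bag_palette",
        frozenset({"canmake"}),
        ("淚袋專用盤", "淚袋眼影盤"),
    ),
)


def match_identity_line(left: Identity, right: Identity) -> str:
    """Return the first matching rule's reason, or "" like the original function."""
    shared_brands = left.brand_tokens & right.brand_tokens
    for rule in RULES:
        if rule.required_brands <= shared_brands and all(
            phrase in left.text and phrase in right.text for phrase in rule.phrases
        ):
            return rule.reason
    return ""


if __name__ == "__main__":
    momo = Identity(frozenset({"1990", "選物"}), "【1990選物】歐式可彎融燭燈 香氛蠟燭暖燈-白色款")
    # Hypothetical black variant: the colour phrase no longer matches.
    black = Identity(frozenset({"1990", "選物"}), "【1990選物】歐式可彎融燭燈 香氛蠟燭暖燈-黑色款")
    print(repr(match_identity_line(momo, black)))  # prints ''
```

Because the colour phrase is part of the rule, a same-line but different-colour listing still falls through to `""`, which mirrors how the original branches keep style-sensitive variants out of the total-price lane.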

@@ -527,6 +527,68 @@ def test_marketplace_matcher_promotes_ludeya_line_with_platform_name_drift():
    assert "shared_identity_anchor" in diagnostics.reasons or "shared_identity_anchor_no_spec" in diagnostics.reasons


def test_marketplace_matcher_promotes_focused_exact_pack_rows_to_total_price():
    from services.marketplace_product_matcher import score_marketplace_match
    cases = [
        (
            "【3W CLINIC】膠原蛋白粉底液50mlX2入(膠原 保濕 清透 服貼 長效持久)",
            "【韓國 3W CLINIC】膠原蛋白粉底液50mlx2入",
            "focused_exact_identity_3w_clinic_collagen_foundation_50ml_2pack",
        ),
        (
            "【花美水】Moisture保濕修護精華凝膠-原黃金(1.7g x 3支/盒)",
            "【花美水】Moisture 保濕修護 精華凝膠(原黃金)(1.7g*3支入)/盒",
            "focused_exact_identity_hanamisui_moisture_original_gel_1_7g_3pack",
        ),
        (
            "【花美水】Inclear 櫻克麗兒一次性私密淨化凝膠(1.7g x 3支/盒)",
            "【花美水】Inclear櫻克麗兒私密淨化凝膠(1.7g*3支)/盒",
            "focused_exact_identity_hanamisui_inclear_private_gel_1_7g_3pack",
        ),
        (
            "【The Ordinary】Caffeine Solution 咖啡因 + EGCG兒茶眼部配方30ml",
            "The Ordinary 5%咖啡因 + EGCG兒茶眼部配方 (30ml)",
            "focused_exact_identity_the_ordinary_caffeine_egcg_30ml",
        ),
        (
            "【KUSSEN 葵森】寶寶益菌屁屁膏 50ml 3入(易敏肌 屁屁霜 紅屁屁 尿布膏 尿布區照護)",
            "【KUSSEN 葵森】寶寶益菌屁屁膏 50ml 3入",
            "focused_exact_identity_kussen_baby_butt_cream_50ml_3pack",
        ),
        (
            "【Bone 蹦克】擴香禮盒三入組 原木麋鹿+搖搖貓頭鷹+薰衣草精油(交換禮物 香氛 擴香木 )",
            "Bone / 擴香禮盒三入組 - 原木麋鹿+搖搖貓頭鷹+薰衣草精油",
            "focused_exact_identity_bone_diffuser_gift_3pack",
        ),
        (
            "【1990選物】現代簡約半圓罩融燭燈 香氛蠟燭暖燈-白色款( 送禮) 生日禮物 香氛蠟燭燈)",
            "【1990選物】現代簡約半圓罩融燭燈 香氛蠟燭暖燈-白色款",
            "focused_exact_identity_selection1990_half_dome_wax_lamp_white",
        ),
        (
            "【1990選物】歐式可彎融燭燈 香氛蠟燭暖燈-白色款( 送禮) 生日禮物 香氛蠟燭燈)",
            "【1990選物】歐式可彎融燭燈 香氛蠟燭暖燈-白色款",
            "focused_exact_identity_selection1990_bendable_wax_lamp_white",
        ),
        (
            "【CANMAKE】淚袋專用盤(淚袋眼影盤)",
            "台隆手創館 CANMAKE淚袋專用盤(淚袋眼影盤)",
            "focused_exact_identity_canmake_tear_bag_palette",
        ),
    ]
    for momo_name, competitor_name, expected_reason in cases:
        diagnostics = score_marketplace_match(momo_name, competitor_name, momo_price=1000, competitor_price=900)
        assert diagnostics.score >= 0.76, (momo_name, diagnostics)
        assert diagnostics.hard_veto is False
        assert diagnostics.match_type == "exact"
        assert diagnostics.price_basis == "total_price"
        assert diagnostics.alert_tier == "price_alert_exact"
        assert expected_reason in diagnostics.reasons
        assert "focused_exact_total_price_safe" in diagnostics.reasons


def test_marketplace_matcher_promotes_recipe_box_marketing_line_drift():
    from services.marketplace_product_matcher import score_marketplace_match