Fix PChome rescore duplicate-enqueue check

OoO
2026-05-25 12:00:12 +08:00
parent 61816c90e8
commit 7e29c00eb8
7 changed files with 63 additions and 9 deletions

@@ -4,6 +4,7 @@
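 ================================================================================
 【Completed】
+- V10.466 fixes the rescore audit duplicate check: a candidate is skipped only when the latest attempt is already `rescore_accepted_current` for that same candidate. If it was accepted earlier but the crawler later appended a low-confidence row, re-materialization is allowed, so the Dashboard latest-state does not stay stuck at `true_low_confidence`. The production pilot sent SKUs `14756069`, `11159042`, `13842560`, `8394210`, `15192547`, `10509765`, and `10603780` to the manual review queue; it wrote only to `competitor_match_attempts`, and `competitor_prices` / `competitor_price_history` are unchanged.
 - V10.465 fixes the embedding fallback-disabled control flow: with `allow_111_fallback=False`, if the resolver returns 111, the call no longer exits immediately or tries only a single GCP-B host; it forces a retry on whichever of GCP-A/GCP-B has not been tried yet. Background embedding still never falls back to 111.
 - V10.464 adds a precise SKU pilot to the rescore audit: `audit_competitor_match_attempt_rescore.py --sku` scans only the specified SKUs; combined with `--apply-accepted`, it appends only the target SKUs that pass the new matcher to the `rescore_accepted_current` manual review queue and does not write to the production price tables.
 - V10.463 adds DR.WU / 達爾膚 brand aliases: same-spec `DR.WU 達爾膚` and `DR.WU` candidates are no longer treated as brandless identity review and instead go through the existing exact_identity / total_price / price_alert_exact gates; `MIN_MATCH_SCORE` is unchanged, and the variant / hard veto protections remain.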
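The V10.463 alias entry above can be illustrated with a minimal sketch. This is not the code in `marketplace_product_matcher`; the alias table, the canonical key `drwu`, and the function names are assumptions made up for the example. It only shows how spellings such as `DR.WU`, `DR WU`, `DRWU`, `達爾膚`, and the combined `DR.WU 達爾膚` could collapse to one brand key.

```python
import re

# Hypothetical alias table: the spellings come from the changelog,
# the canonical key "drwu" is an assumption for this sketch.
BRAND_ALIASES = {"達爾膚": "drwu"}


def normalize_brand(raw: str) -> str:
    """Collapse separator and alias differences into one brand key."""
    # Remove dots, spaces and similar separators, then lowercase:
    # "DR.WU", "DR WU" and "DRWU" all become "drwu".
    compact = re.sub(r"[\s.\-_/]+", "", raw).lower()
    # Map localized aliases onto the canonical key.
    for alias, canonical in BRAND_ALIASES.items():
        compact = compact.replace(alias, canonical)
    # A combined label such as "DR.WU 達爾膚" is now "drwudrwu";
    # collapse the repetition back to a single key.
    for canonical in set(BRAND_ALIASES.values()):
        compact = re.sub(f"(?:{re.escape(canonical)})+", canonical, compact)
    return compact


def same_brand(left: str, right: str) -> bool:
    """True when both labels normalize to the same brand key."""
    return normalize_brand(left) == normalize_brand(right)
```

With a key like this, a brand comparison happens before any score gate, which is consistent with the entry's note that `MIN_MATCH_SCORE` did not need to change.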

@@ -325,7 +325,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.465"
+SYSTEM_VERSION = "V10.466"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # used for template display

@@ -1,8 +1,8 @@
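 # MOMO PRO: AI Competitive Pricing Intelligence Module, Single Source of Truth
-> **Last updated**: 2026-05-24 (Taipei time)
+> **Last updated**: 2026-05-25 (Taipei time)
 > **Status**: 🟢 The four-AI-agent automation loop is live; the LLM routing red line has been upgraded to an Ollama-first three-host cascade; Gemini fallback is disabled by default
-> **Applies to version**: V10.465
+> **Applies to version**: V10.466
 ---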

@@ -53,6 +53,7 @@
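 - From 2026-05-25 08:30 CST, the rescore audit supports a repeatable `--sku` filter for precise selection; a production pilot can name just 3-10 SKUs for a read-only audit or for `--apply-accepted`, which keeps a wide scan from mixing different cohorts into one validation run.
 - From 2026-05-25 08:25 CST, `DR.WU / DR WU / DRWU / 達爾膚` are treated as aliases of the same brand; in production samples, DR.WU 玻尿酸保濕精華乳 50ML、2入組 and 杏仁酸亮白煥膚精華 18% 30ML 2入組 can move from brandless identity review back to the exact total-price lane without changing the global thresholds.
 - 2026-05-25 08:36 CST production pilot: SKUs `10362820`, `10653216`, and `10653329` were materialized from `true_low_confidence` to `rescore_accepted_current`; they enter only the manual review queue, and nothing is written to `competitor_prices`.
+- From 2026-05-25 11:55 CST, the rescore audit duplicate check looks only at the latest attempt: if an accepted row exists in history but the crawler later appended a low-confidence row, the candidate can be re-materialized as the latest `rescore_accepted_current`. The production pilot placed SKUs `14756069`, `11159042`, `13842560`, `8394210`, `15192547`, `10509765`, and `10603780` in the manual review queue; the production `competitor_prices` / `competitor_price_history` tables were not written to or changed.
 ## 3. 12 Agent Decision Envelope Integration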

@@ -13,6 +13,7 @@
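 ## 📅 Detailed Update Log (historical archive)
 ### 2026-05-24: PChome near-threshold identity recovery, round two
+- **V10.466 Rescore latest-state duplicate fix and 7-SKU pilot**: the duplicate check in `materialize_rescore_accept_reviews()` now looks at the latest attempt rather than at any accepted row in history; if the crawler later overwrites the same SKU/candidate with `true_low_confidence`, a new `rescore_accepted_current` row can be appended so that the Dashboard latest-state correctly enters manual review. The production pilot materialized SKUs `14756069`, `11159042`, `13842560`, `8394210`, `15192547`, `10509765`, and `10603780` into the manual review queue; the target count for `competitor_prices` stays at 7 and the target count for `competitor_price_history` stays at 210, and the production price-difference table was not written.
 - **V10.465 Embedding GCP fallback fix**: when `OllamaService.generate_embedding(..., allow_111_fallback=False)` gets 111 back from the resolver because of an unhealthy cache, it now forces a retry on whichever of GCP-A/GCP-B has not been tried, instead of hitting `break` immediately and ending with `tried=[]` or trying only a single GCP-B host; background embedding is still not allowed to fall back to 111.
 - **V10.464 Rescore SKU pilot filter**: `audit_competitor_match_attempt_rescore.py` and `fetch_match_attempt_rescore_rows()` gain a `--sku` / `skus` filter, so a well-defined cohort such as DR.WU can be materialized precisely, 3-10 rows at a time, without scanning the whole `true_low_confidence` set for a pilot.
 - **V10.463 DR.WU / 達爾膚 brand alias**: `marketplace_product_matcher` normalizes `DR.WU / DR WU / DRWU / 達爾膚`, so the same-spec 玻尿酸保濕精華乳 and 杏仁酸亮白煥膚精華 listings in production samples are no longer downgraded to brandless identity review because their brand tokens differ; tests lock in exact / total_price / price_alert_exact.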
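The repeatable `--sku` filter from the V10.464 entry can be sketched with `argparse`. Only the flag names `--sku` and `--apply-accepted` and the `skus` parameter name come from the changelog; the parser layout and the `sku_filter_clause` helper are hypothetical and are not taken from `audit_competitor_match_attempt_rescore.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: only the two flag names come from the changelog."""
    parser = argparse.ArgumentParser(description="rescore audit (sketch)")
    # action="append" makes the flag repeatable: --sku A --sku B -> ["A", "B"]
    parser.add_argument("--sku", action="append", dest="skus", default=None,
                        help="limit the audit to this SKU; may be repeated")
    parser.add_argument("--apply-accepted", action="store_true",
                        help="append accepted rows to the manual review queue")
    return parser


def sku_filter_clause(skus):
    """Build a parameterized IN clause; returns ('', {}) without a filter."""
    if not skus:
        return "", {}
    # One named bind parameter per SKU, so values are never inlined into SQL.
    params = {f"sku_{i}": sku for i, sku in enumerate(skus)}
    placeholders = ", ".join(f":{name}" for name in params)
    return f" AND sku IN ({placeholders})", params
```

Keeping the filter as bind parameters means a pilot over a handful of SKUs reuses the same query shape as the unfiltered scan, with one extra predicate.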

@@ -231,20 +231,22 @@ def _ensure_attempt_table(conn) -> None:
 def _already_materialized(conn, *, source: str, sku: str, candidate_id: str) -> bool:
     row = conn.execute(text("""
-        SELECT 1
+        SELECT attempt_status, COALESCE(best_competitor_product_id, '') AS candidate_id
         FROM competitor_match_attempts
         WHERE sku = :sku
           AND source = :source
-          AND attempt_status = :attempt_status
-          AND COALESCE(best_competitor_product_id, '') = :candidate_id
+        ORDER BY attempted_at DESC, id DESC
         LIMIT 1
     """), {
         "sku": sku,
         "source": source,
-        "attempt_status": RESCORE_ACCEPTED_CURRENT_STATUS,
-        "candidate_id": candidate_id,
     }).first()
-    return row is not None
+    if row is None:
+        return False
+    return (
+        row.attempt_status == RESCORE_ACCEPTED_CURRENT_STATUS
+        and row.candidate_id == candidate_id
+    )


 def materialize_rescore_accept_reviews(
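
The behavioral change in `_already_materialized` can be reproduced in isolation. The sketch below uses the standard-library `sqlite3` module instead of SQLAlchemy, a reduced table that keeps only the columns the check reads, and made-up timestamps; the function and the `demo` helper are illustrative, not the project's code.

```python
import sqlite3

ACCEPTED = "rescore_accepted_current"  # status name as it appears in the diff


def already_materialized(conn, *, source, sku, candidate_id):
    """True only if the newest attempt is an accepted row for this candidate."""
    row = conn.execute(
        """
        SELECT attempt_status, COALESCE(best_competitor_product_id, '')
        FROM competitor_match_attempts
        WHERE sku = ? AND source = ?
        ORDER BY attempted_at DESC, id DESC
        LIMIT 1
        """,
        (sku, source),
    ).fetchone()
    return row is not None and row[0] == ACCEPTED and row[1] == candidate_id


def demo():
    conn = sqlite3.connect(":memory:")
    # Reduced, hypothetical schema: just the columns the check needs.
    conn.execute(
        """
        CREATE TABLE competitor_match_attempts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            sku TEXT, source TEXT, attempt_status TEXT,
            best_competitor_product_id TEXT, attempted_at TEXT
        )
        """
    )
    insert = (
        "INSERT INTO competitor_match_attempts "
        "(sku, source, attempt_status, best_competitor_product_id, attempted_at) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    key = dict(source="pchome", sku="10509765", candidate_id="YUSKIN-120G-2")
    results = [already_materialized(conn, **key)]  # no attempts yet
    conn.execute(insert, ("10509765", "pchome", ACCEPTED,
                          "YUSKIN-120G-2", "2026-05-25 08:00:00"))
    results.append(already_materialized(conn, **key))  # latest row is accepted
    conn.execute(insert, ("10509765", "pchome", "true_low_confidence",
                          "YUSKIN-120G-2", "2026-05-25 09:00:00"))
    results.append(already_materialized(conn, **key))  # crawler row is now latest
    return results
```

Calling `demo()` returns `[False, True, False]`: the third check allows a requeue even though an accepted row still exists in history, which the old `SELECT 1 ... AND attempt_status = :attempt_status` form would have reported as a duplicate.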

@@ -210,6 +210,55 @@ def test_match_attempt_rescore_materializes_accepted_current_for_manual_review()
     assert "matcher_rescore=accepted_current" in stored[0]["error_message"]


+def test_match_attempt_rescore_materialize_allows_requeue_when_latest_is_low_confidence():
+    from services.competitor_match_attempt_rescore_audit import materialize_rescore_accept_reviews
+
+    engine = create_engine("sqlite:///:memory:")
+    rows = [{
+        "sku": "10509765",
+        "attempt_status": "true_low_confidence",
+        "momo_product_id": 10509765,
+        "momo_product_name": "【悠斯晶】經典乳霜120g(2入組)",
+        "momo_price": 599,
+        "candidate_count": 1,
+        "best_competitor_product_id": "YUSKIN-120G-2",
+        "best_competitor_product_name": "【Yuskin悠斯晶】經典乳霜 2盒組(120g/盒)",
+        "best_competitor_price": 540,
+        "best_match_score": 0.779,
+    }]
+
+    with engine.begin() as conn:
+        initial_stats = materialize_rescore_accept_reviews(conn, rows)
+        duplicate_stats = materialize_rescore_accept_reviews(conn, rows)
+        conn.execute(text("""
+            INSERT INTO competitor_match_attempts
+                (sku, source, attempt_status, momo_product_id, momo_product_name,
+                 momo_price, candidate_count, best_competitor_product_id,
+                 best_competitor_product_name, best_competitor_price,
+                 best_match_score, diagnostic_codes, error_message, attempted_at)
+            VALUES
+                ('10509765', 'pchome', 'true_low_confidence', 10509765,
+                 '【悠斯晶】經典乳霜120g(2入組)', 599, 1, 'YUSKIN-120G-2',
+                 '【Yuskin悠斯晶】經典乳霜 2盒組(120g/盒)', 540, 0.779,
+                 '["strong_exact_spec_match"]', 'later crawler low-confidence row',
+                 CURRENT_TIMESTAMP)
+        """))
+        requeue_stats = materialize_rescore_accept_reviews(conn, rows)
+        latest_status = conn.execute(text("""
+            SELECT attempt_status
+            FROM competitor_match_attempts
+            WHERE sku = '10509765'
+            ORDER BY attempted_at DESC, id DESC
+            LIMIT 1
+        """)).scalar_one()
+
+    assert initial_stats["materialized"] == 1
+    assert duplicate_stats["skipped_duplicate"] == 1
+    assert requeue_stats["materialized"] == 1
+    assert requeue_stats["skipped_duplicate"] == 0
+    assert latest_status == "rescore_accepted_current"
+
+
 def test_match_attempt_rescore_retracts_variant_review_from_accepted_queue():
     from services.competitor_match_attempt_rescore_audit import (
         fetch_variant_rescore_accept_review_rows,