V10.560 Wire up the manual price-comparison reinforcement flow

2026-06-01 21:22:52 +08:00
parent 42b1c25418
commit 07db301c54
5 changed files with 77 additions and 13 deletions
--- a/TODO_NEXT_STEPS.txt
+++ b/TODO_NEXT_STEPS.txt
@@ -4,6 +4,7 @@
 ================================================================================

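 [Completed]
+   - V10.560 chains the manual PChome price-comparison reinforcement into a three-stage flow: `/api/ai/pchome-match/backfill` no longer runs only the near-threshold revalidation and the unmatched backfill; it first calls `run_expired_identity_refresh()` in a small batch to refresh stale prices on known `identity_v2` records, so a single click on reinforcement lets the operator cover three tracks at once: stale identity freshness, near-threshold low_score, and pending identity. The result payload adds a `stale_identity_refresh` per-stage breakdown so that later Dashboard / briefing / AI decisions can tell whether a coverage improvement came from refresh, revalidation, or backfill.
   - V10.559 tightens valid-identity freshness for retryable candidates: `_fetch_retryable_candidate_skus()` no longer treats legacy PChome `identity_v2` rows with `expires_at IS NULL` as a valid blocking condition; only a fresh identity with an explicit `expires_at > CURRENT_TIMESTAMP` blocks near-threshold revalidation. Unknown freshness still goes through the V10.551 expired / recovery refresh entry point, and after revalidation a row must still pass the latest matcher, hard-veto, auto write safety, and the existing production-candidate overwrite protection, so accuracy is not sacrificed to raise coverage.
   - V10.558 adds a narrow gate for re-running legacy focused identity reasons: an old attempt that lacks the new `focused_exact_total_price_safe` marker but already carries a named `focused_exact_identity_*`, where that identity belongs to the matcher total-price safe set and the old score has reached the global `MIN_MATCH_SCORE`, may enter near-threshold revalidation. This covers the missed case of historical data without the marker; it still requires no hard veto, `exact_identity`, and no commercial / variant / count / bundle block, and the latest matcher makes the final decision on whether a production price gap can be written.
   - V10.557 tightens the focused reason-based re-run guard: the previous version's reason-based re-run now requires not only `focused_exact_total_price_safe` but also a hit on a named `focused_exact_identity_*` whose identity belongs to the matcher's total-price safe set. This keeps future old attempts that have only the master switch and no named identity evidence out of the re-run; review-only focused lines such as rom&nd / Solone / Summer’s Eve remain locked out of the automatic price-gap line by tests.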
--- a/config.py
+++ b/config.py
@@ -402,7 +402,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
 # ==========================================
 # System version and paths
 # ==========================================
-SYSTEM_VERSION = "V10.559"
+SYSTEM_VERSION = "V10.560"
 LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
 public_url = PUBLIC_URL  # used for template display

--- a/docs/memory/history_logs.md
+++ b/docs/memory/history_logs.md
@@ -13,6 +13,7 @@
 ## 📅 Detailed Changelog (Archive)

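 ### 2026-06-01: Closing the operational loop on PChome price-comparison freshness
+- **V10.560 three-stage chaining of manual PChome price-comparison reinforcement**: `/api/ai/pchome-match/backfill` is now aligned with the daily scheduler. A manual run first refreshes expired `identity_v2` in a small batch, then revalidates near-threshold candidates, and finally backfills high-priority unmatched products. The response adds a `stale_identity_refresh` per-stage breakdown so that later Dashboard, briefing, and AI decisions can distinguish whether a coverage improvement came from restoring stale identity freshness, a matcher re-run, or a fresh-search backfill.
 - **V10.559 retryable valid-identity freshness tightening**: the existing-identity blocking condition in `_fetch_retryable_candidate_skus()` now accepts only `cp.expires_at > CURRENT_TIMESTAMP`, so legacy matches of unknown freshness with `expires_at IS NULL` no longer hold back the near-threshold candidate re-run. Unknown freshness is still handled by the expired identity refresh / recovery path, and the final write must still pass the current matcher, hard-veto, auto write safety, and the stronger existing production match protection.
 - **V10.558 legacy focused identity reason re-run gap fix**: on top of the V10.557 named-identity guard, `_fetch_retryable_candidate_skus()` now covers historical data that lacks the marker: an old attempt without the new `focused_exact_total_price_safe` but with a named `focused_exact_identity_*`, where that identity belongs to the matcher total-price safe set and the old score has reached the global `MIN_MATCH_SCORE`, may enter near-threshold revalidation. It still requires no hard veto, `exact_identity`, and no commercial / variant / count / bundle block, and the latest matcher makes the final decision on whether a production price gap can be written.
 - **V10.557 named-identity guard for the focused reason-based re-run**: the structured reason re-run from V10.555 is tightened further. `_fetch_retryable_candidate_skus()` requires not only `focused_exact_total_price_safe` but also a hit on a named `focused_exact_identity_*` whose identity comes from the matcher's total-price safe set. This keeps old attempts that have only the master switch and no identity evidence out of the re-run; review-only focused lines such as rom&nd, Solone, and Summer’s Eve remain locked out of the automatic price-gap line by tests.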
--- a/routes/ai_routes.py
+++ b/routes/ai_routes.py
@@ -1748,6 +1748,19 @@ def _feeder_result_payload(result):
    }


+def _empty_feeder_result_payload():
+    return {
+        'total_skus': 0,
+        'matched': 0,
+        'skipped_no_result': 0,
+        'skipped_low_score': 0,
+        'errors': 0,
+        'history_written': 0,
+        'attempts_written': 0,
+        'duration_sec': 0.0,
+    }
+
+
 def _pick_result_payload(result):
    return {
        'candidates': int(getattr(result, 'candidates', 0) or 0),
@@ -1756,18 +1769,53 @@ def _pick_result_payload(result):
    }


-def _combined_feeder_payload(revalidation_result, feeder_result):
+def _combined_feeder_payload(revalidation_result, feeder_result, stale_refresh_result=None):
+    stale_refresh_payload = (
+        _feeder_result_payload(stale_refresh_result)
+        if stale_refresh_result is not None
+        else _empty_feeder_result_payload()
+    )
    revalidation_payload = _feeder_result_payload(revalidation_result)
    feeder_payload = _feeder_result_payload(feeder_result)
    return {
-        'total_skus': revalidation_payload['total_skus'] + feeder_payload['total_skus'],
-        'matched': revalidation_payload['matched'] + feeder_payload['matched'],
-        'skipped_no_result': revalidation_payload['skipped_no_result'] + feeder_payload['skipped_no_result'],
-        'skipped_low_score': revalidation_payload['skipped_low_score'] + feeder_payload['skipped_low_score'],
-        'errors': revalidation_payload['errors'] + feeder_payload['errors'],
-        'history_written': revalidation_payload['history_written'] + feeder_payload['history_written'],
-        'attempts_written': revalidation_payload['attempts_written'] + feeder_payload['attempts_written'],
-        'duration_sec': round(revalidation_payload['duration_sec'] + feeder_payload['duration_sec'], 2),
+        'total_skus': (
+            stale_refresh_payload['total_skus']
+            + revalidation_payload['total_skus']
+            + feeder_payload['total_skus']
+        ),
+        'matched': (
+            stale_refresh_payload['matched']
+            + revalidation_payload['matched']
+            + feeder_payload['matched']
+        ),
+        'skipped_no_result': (
+            stale_refresh_payload['skipped_no_result']
+            + revalidation_payload['skipped_no_result']
+            + feeder_payload['skipped_no_result']
+        ),
+        'skipped_low_score': (
+            stale_refresh_payload['skipped_low_score']
+            + revalidation_payload['skipped_low_score']
+            + feeder_payload['skipped_low_score']
+        ),
+        'errors': stale_refresh_payload['errors'] + revalidation_payload['errors'] + feeder_payload['errors'],
+        'history_written': (
+            stale_refresh_payload['history_written']
+            + revalidation_payload['history_written']
+            + feeder_payload['history_written']
+        ),
+        'attempts_written': (
+            stale_refresh_payload['attempts_written']
+            + revalidation_payload['attempts_written']
+            + feeder_payload['attempts_written']
+        ),
+        'duration_sec': round(
+            stale_refresh_payload['duration_sec']
+            + revalidation_payload['duration_sec']
+            + feeder_payload['duration_sec'],
+            2,
+        ),
+        'stale_identity_refresh': stale_refresh_payload,
        'retryable_candidate_revalidation': revalidation_payload,
        'unmatched_priority_backfill': feeder_payload,
    }
@@ -1818,6 +1866,13 @@ def api_pchome_match_backfill():

            engine = create_engine(DATABASE_PATH)
            feeder = CompetitorPriceFeeder(engine=engine)
+            stale_refresh_limit = max(5, min(40, max(5, limit // 3)))
+            update_pchome_backfill_run(
+                run_id,
+                stage='refreshing_stale',
+                message=f'Refreshing {stale_refresh_limit} known PChome identities first to restore usable price-comparison freshness',
+            )
+            stale_refresh_result = feeder.run_expired_identity_refresh(limit=stale_refresh_limit)
            revalidation_limit = min(limit, 80)
            update_pchome_backfill_run(
                run_id,
@@ -1835,7 +1890,11 @@ def api_pchome_match_backfill():
                 message=f'Backfilling {unmatched_limit} high-priority products not yet searched',
            )
            result = feeder.run_unmatched_priority(limit=unmatched_limit)
-            result_payload = _combined_feeder_payload(revalidation_result, result)
+            result_payload = _combined_feeder_payload(
+                revalidation_result,
+                result,
+                stale_refresh_result=stale_refresh_result,
+            )
            update_pchome_backfill_run(
                run_id,
                stage='generating_picks',
@@ -1859,7 +1918,7 @@ def api_pchome_match_backfill():
                result=result_payload,
                pick_result=pick_payload,
                message=(
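-                    f"PChome backfill complete: compared {result_payload['total_skus']} items, "
+                    f"PChome price-comparison reinforcement complete: refreshed/revalidated/backfilled {result_payload['total_skus']} items, "
                     f"added/updated {result_payload['matched']} items, "
                     f"AI picks written: {pick_payload['written']}"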
                ),
@@ -1887,7 +1946,7 @@ def api_pchome_match_backfill():

    return jsonify({
        'success': True,
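-        'message': f'Started the PChome unsearched backfill, prioritizing {limit} high-priced unmatched products; the AI pick list will be recomputed on completion',
+        'message': f'Started the PChome price-comparison reinforcement: stale identities are refreshed first, then near-threshold candidates are revalidated and {limit} high-priced unmatched products are backfilled; the AI pick list will be recomputed on completion',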
        'limit': limit,
        'data': _get_pchome_backfill_status_payload(),
    }), 202
--- a/tests/test_frontend_v2_assets.py
+++ b/tests/test_frontend_v2_assets.py
@@ -530,6 +530,9 @@ def test_ai_product_pick_agent_uses_real_competitor_data_and_dashboard_action():
    assert "stale_recovery_preview" in route_source
    assert "status['coverage'] = _build_pchome_backfill_coverage_payload()" in route_source
    assert "run_unmatched_priority(limit=unmatched_limit)" in route_source
+    assert "stale_refresh_limit = max(5, min(40, max(5, limit // 3)))" in route_source
+    assert "stale_refresh_result = feeder.run_expired_identity_refresh(limit=stale_refresh_limit)" in route_source
+    assert "stale_identity_refresh" in route_source
    assert "run_expired_identity_refresh(limit=limit)" in route_source
    assert "run_expired_identity_search_recovery(limit=limit)" in route_source
    assert "stage='refreshing_stale'" in route_source