V10.566: Add market intelligence source governance gate
All checks were successful
CD Pipeline / deploy (push) Successful in 1m14s

OoO
2026-06-01 22:03:15 +08:00
parent 6cf2d23521
commit 462e354a71
14 changed files with 1490 additions and 235 deletions


@@ -6,6 +6,7 @@
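[Completed]
- V10.565 adds PChome coverage operation recommendations: `/api/ai/pchome-match/backfill/status` now breaks low coverage down into `operation_backlog`, listing stale identity refresh, near-threshold re-evaluation, unmatched backfill, manual review, unit-price review, and an expired-search rescue preview, and also returns `recommended_next_action`. The Dashboard status summary shows next steps such as "run price-comparison reinforcement / refresh expired identity / process reviews", so the coverage KPI links directly to an executable action.
- V10.563 tightens false-recoverable candidates in the production preview: for M.A.C 超持妝輕透濾鏡蜜粉, when an explicit shade (for example `#絕絕紫`) appears only on the PChome side, the candidate is marked `variant_selection_review` and stays `true_low_confidence`, so it no longer occupies the recoverable pool. SAUGELLA 賽吉兒菁萃潔浴凝露 gains mutual exclusion among the 潤澤 / 日用型 / 加強 / 黃金女郎型 variants, so that different intimate-wash products in the same line are not wrongly rescued as matched.
- V10.566 adds Professional Source Governance for market intelligence: robots/REP, sitemap/lastmod, JSON-LD / schema.org structured data, canonical URL, rate limit, public-data boundary, provenance, snapshot hash, and idempotency key become an auditable source contract. It adds `/api/market_intel/mcp_professional_source_governance`, a card on the market intelligence page, and a deployment readiness smoke target. The API/UI only reviews the governance summary pasted back by the operator; it does not fetch external sites, read robots/sitemap, open a DB connection, write files, or attach a scheduler. Only sources that pass governance may be referenced by the later fetch target review.
- V10.561 adds staged front-end feedback for PChome price-comparison reinforcement: the Dashboard's PChome card is renamed from "backfill pipeline" to "price-comparison reinforcement pipeline", and the button and confirmation copy now explain that a run first refreshes stale identities, then re-evaluates near-threshold candidates and backfills unmatched items. The result area adds matched/total summaries for the three stages (refresh / re-evaluate / backfill), so operators no longer see a single vague success count when the backend already reports staged statistics.
- V10.560 wires up the three-stage manual PChome reinforcement flow: in addition to near-threshold re-evaluation and unmatched backfill, `/api/ai/pchome-match/backfill` now first runs a small-batch `run_expired_identity_refresh()` to refresh stale prices of known `identity_v2` records, so one press handles three tracks: stale identity freshness, near-threshold low_score, and pending identity. The result payload adds `stale_identity_refresh` staged statistics so later Dashboard / briefing / AI decisions can tell whether a coverage improvement came from refresh, re-evaluation, or backfill.
- V10.559 tightens retryable valid-identity freshness: `_fetch_retryable_candidate_skus()` no longer treats old PChome `identity_v2` rows with `expires_at IS NULL` as a valid blocking condition; only fresh identities with an explicit `expires_at > CURRENT_TIMESTAMP` block near-threshold revalidation. Unknown freshness still goes through the V10.551 expired / recovery refresh entry, and a re-evaluated result must still pass the latest matcher, hard-veto, auto write safety, and the existing production-candidate overwrite protection, so that accuracy is not sacrificed for coverage.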
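The V10.566 entry above names snapshot hash and idempotency key as parts of the source contract, but this commit only reviews whether they are declared. As a minimal sketch of how the two could relate, assuming a SHA-256 digest over a key-sorted JSON dump and a key layout of `platform_code:source_key:digest-prefix` (both are illustrative assumptions, not taken from the commit):

```python
import hashlib
import json


def snapshot_hash(evidence):
    """Return a SHA-256 hex digest of the evidence, independent of key order."""
    canonical = json.dumps(
        evidence, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def idempotency_key(platform_code, source_key, digest):
    """Hypothetical key layout: source identity plus a digest prefix."""
    return f"{platform_code}:{source_key}:{digest[:16]}"


# Two snapshots with the same content but different key order.
first = snapshot_hash(
    {"canonical_url": "https://shop.example.com/campaign", "price": 199}
)
second = snapshot_hash(
    {"price": 199, "canonical_url": "https://shop.example.com/campaign"}
)
key = idempotency_key("momo", "campaign_home", first)
```

Because the dump sorts keys, both snapshots hash to the same digest, so a re-run over unchanged content would produce the same key and could be skipped by a writer that deduplicates on it.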


@@ -402,7 +402,7 @@ YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', '')
# ==========================================
# System version and paths
# ==========================================
-SYSTEM_VERSION = "V10.565"
+SYSTEM_VERSION = "V10.566"
LOG_FILE_PATH = os.path.join(BASE_DIR, 'logs/system.log')
public_url = PUBLIC_URL  # used for template display


@@ -178,6 +178,7 @@ EwoooC 目前已有 MOMO EDM / 節慶活動資料、`promo_products`、PChome
- 2026-05-31: added the MCP fetch candidate queue writer review decision gate. `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision`, `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_gates`, `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_sample`, and `/api/market_intel/mcp_fetch_candidate_queue_writer_review_decision` review the operator's candidate queue review decision summary after the review inventory passes. They check decision identity, target table, row count, dedupe keys, the current `needs_review` state, the allowed decision set, evidence refs, the matched row exact-identity/variant/overwrite guard, operator confirmations, and forbidden API actions. The API/UI does not read approval tokens, run the CLI, open a DB connection, write decision records, update review_state, write match results, refill the queue, or attach a scheduler; it only releases to decision approval / writer preflight design.
- 2026-05-31: added the MCP fetch candidate queue writer review decision approval gate. `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_approval`, `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_approval_gates`, `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_approval_sample`, and `/api/market_intel/mcp_fetch_candidate_queue_writer_review_decision_approval` review only the operator's human approval summary after the review decision passes. They confirm decision linkage, approval identity, target table, row count, dedupe keys, the `approved_for_writer_preflight` approval result, decision/approval evidence refs, artifact paths, the matched row exact-identity/variant/overwrite guard, operator confirmations, and forbidden API actions. The API/UI does not read approval tokens, run the CLI, open a DB connection, write approval records, write decision records, update review_state, write match results, refill the queue, or attach a scheduler; it only releases to the later writer preflight design. This endpoint has been split into `routes.market_intel_mcp_review_routes` to keep `routes.market_intel_mcp_run_routes` from exceeding the 800-line governance threshold.
- 2026-05-31: added the MCP fetch candidate queue writer review decision approval writer preflight gate. `services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight`, the corresponding gates/sample modules, and `/api/market_intel/mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight` review only the operator's writer preflight summary after human approval passes. They confirm approval linkage, writer_preflight_id, target operation, row count, dedupe keys, the row-by-row mapping from approved decision to target review_state, decision/approval/preflight evidence refs, the matched row exact-identity/variant/overwrite guard, and the operator boundary. The API/UI does not read approval tokens, run the CLI, open a DB connection, write preflight/approval/decision/match records, update review_state, refill the queue, or attach a scheduler; it only releases to the later CLI review / run package design.
- 2026-06-01: added the Professional Source Governance gate. `services.market_intel.mcp_professional_source_governance`, the corresponding gates/sample modules, and `/api/market_intel/mcp_professional_source_governance` consolidate robots/REP, sitemap/lastmod, JSON-LD / schema.org structured data, canonical URL, rate limit, public-data boundary, provenance, snapshot hash, and idempotency key into a source contract. This gate only reviews the operator's source governance summary; it does not fetch external sites, read robots/sitemap, open a DB connection, write files, or attach a scheduler. Only public sources that pass governance may be referenced by the later fetch target review.
- 2026-05-18: added the scheduler attach plan preview. `services.market_intel.scheduler_plan` and `/api/market_intel/scheduler_plan` describe the cadence, gates, fallback, and safety boundary of three future jobs: `campaign_discovery_daily`, `campaign_product_probe`, and `product_match_review_seed`. This stage does not register scheduler jobs, start a crawler, make external connections, or write to the DB; scheduler attachment must wait until the migration, seed, MCP fetch gate, manual sample, and human approval have all passed.
- 2026-05-18: added the match review plan preview. `services.market_intel.match_review_plan` and `/api/market_intel/match_review_plan` define the product matching signals, score thresholds, the `needs_review → confirmed/rejected` HITL flow, and the safety boundary. This stage does not create a review queue, auto-confirm, write `market_product_matches`, or call MCP; price may only serve as a supporting signal and cannot decide a same-product match on its own.
- 2026-05-18: added the opportunity plan preview. `services.market_intel.opportunity_plan` and `/api/market_intel/opportunity_plan` define four rule classes (competitor low-price threat, promotion gap, deep-discount overlap, and campaign ending soon) and a tiering strategy. This stage does not create an opportunity queue, dispatch Telegram messages, generate AI summaries, or write to the DB; high-risk items need a confirmed match and DB evidence before they can be escalated.
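The Professional Source Governance entry above relies on a public-data boundary for every source URL. In this commit that check is `_is_public_http_url` in the gates module, which rejects a URL when any private marker appears as a substring of the path. The sketch below reproduces that logic with a shortened marker list and adds a segment-based variant for comparison; the variant and its name `is_public_http_url_by_segment` are illustrative assumptions, not part of the commit.

```python
from urllib.parse import urlparse

# Shortened copy of the commit's PRIVATE_URL_MARKERS, for illustration only.
PRIVATE_URL_MARKERS = ("/account", "/cart", "/login", "/member", "/my", "/order")
PRIVATE_SEGMENTS = frozenset(marker.strip("/") for marker in PRIVATE_URL_MARKERS)


def is_public_http_url(value):
    """Substring check, mirroring the logic shown in the gates module."""
    if not isinstance(value, str) or not value.strip():
        return False
    parsed = urlparse(value.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    lowered_path = (parsed.path or "").lower()
    return not any(marker in lowered_path for marker in PRIVATE_URL_MARKERS)


def is_public_http_url_by_segment(value):
    """Hypothetical variant: only whole path segments count as private."""
    if not isinstance(value, str) or not value.strip():
        return False
    parsed = urlparse(value.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    segments = [part for part in (parsed.path or "").lower().split("/") if part]
    return not any(part in PRIVATE_SEGMENTS for part in segments)


campaign_ok = is_public_http_url("https://shop.example.com/campaign/2026")
member_blocked = is_public_http_url("https://shop.example.com/member/profile")
substring_hit = is_public_http_url("https://shop.example.com/mystery-sale")
segment_ok = is_public_http_url_by_segment("https://shop.example.com/mystery-sale")
```

The substring form is the stricter of the two: a public path such as `/mystery-sale` is rejected because it contains `/my`. That errs on the safe side for a review gate, at the cost of some false rejections; neither form inspects the query string or the host.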


@@ -55,6 +55,7 @@
- 2026-05-31 note: synced the line count of `services/market_intel/deployment_readiness.py` after the market intelligence MCP fetch candidate queue writer review inventory gate. This change adds `services/market_intel/mcp_fetch_candidate_queue_writer_review_inventory.py` (462 lines), `services/market_intel/mcp_fetch_candidate_queue_writer_review_inventory_gates.py` (183 lines), and `services/market_intel/mcp_fetch_candidate_queue_writer_review_inventory_sample.py` (107 lines), all below the 600-line reminder threshold. `routes/market_intel_mcp_run_routes.py` is currently 717 lines, still under 800, but later MCP gates should keep evaluating a split into a second route extension.
- 2026-05-31 note: synced the line count of `services/market_intel/deployment_readiness.py` after the market intelligence MCP fetch candidate queue writer review decision gate. This change adds `services/market_intel/mcp_fetch_candidate_queue_writer_review_decision.py` (498 lines), `services/market_intel/mcp_fetch_candidate_queue_writer_review_decision_gates.py` (241 lines), and `services/market_intel/mcp_fetch_candidate_queue_writer_review_decision_sample.py` (118 lines), all below the 600-line reminder threshold. `routes/market_intel_mcp_run_routes.py` is currently 772 lines, still under 800 but close to the threshold, so the next MCP route should prioritize splitting out a second route extension.
- 2026-05-31 note: synced the line count of `services/market_intel/deployment_readiness.py` after the market intelligence MCP fetch candidate queue writer review decision approval gate. This change adds `services/market_intel/mcp_fetch_candidate_queue_writer_review_decision_approval.py` (560 lines), `services/market_intel/mcp_fetch_candidate_queue_writer_review_decision_approval_gates.py` (255 lines), `services/market_intel/mcp_fetch_candidate_queue_writer_review_decision_approval_sample.py` (140 lines), and `routes/market_intel_mcp_review_routes.py` (64 lines), all below the 600-line reminder threshold. `routes/market_intel_mcp_run_routes.py` stays at 770 lines; no endpoint was added to it this time, and the second MCP review route extension takes over instead.
- 2026-06-01 note: synced the line count of `services/market_intel/deployment_readiness.py` after the market intelligence Professional Source Governance gate. This change adds `services/market_intel/mcp_professional_source_governance.py` (391 lines), `services/market_intel/mcp_professional_source_governance_gates.py` (266 lines), `services/market_intel/mcp_professional_source_governance_sample.py` (175 lines), and `routes/market_intel_mcp_review_routes.py` (165 lines), all below the 600-line reminder threshold. `services/market_intel/deployment_readiness.py` remains an existing P2 large file and only gains a preview-safe check and a smoke target; later work should continue the small service + route extension pattern.
- 2026-05-24 note: synced the line count of `services/code_review_pipeline_service.py` after merging the background Code Review 111 fallback protection; only the inventory is updated here, and Code Review behavior is unchanged.
- 2026-05-21 note: synced the line count of `services/marketplace_product_matcher.py` after the PChome/LUDEYA product-line name drift matching update; only the inventory is updated here, and the modularization decision is unchanged.
- 2026-05-21 note: synced the line count of `services/marketplace_product_matcher.py` after the MAC/Yuskin/AHC name drift and bundle equivalent matcher update; only the inventory is updated here, and the modularization decision is unchanged.
@@ -107,7 +108,7 @@
| 805 | `routes/bot_api_routes.py` | P2 Bot API Blueprint | route glue / bot action service |
| 1319 | `routes/market_intel_review_report_routes.py` | P2 market intel review report Blueprint | review report route glue / export payload / phase handoff orchestration |
| 917 | `routes/market_intel_routes.py` | P2 market intel Blueprint | page route / API route glue / MCP gate route registration helper |
-| 1914 | `services/market_intel/deployment_readiness.py` | P2 market intel deployment readiness | preflight gates / readiness payload / route contract helpers |
+| 1965 | `services/market_intel/deployment_readiness.py` | P2 market intel deployment readiness | preflight gates / readiness payload / route contract helpers |
| 846 | `services/market_intel/candidate_queue_review_ai_summary_persistence_telegram_dispatch_report_catalog_record_run_receipt.py` | P2 market intel review receipt pipeline | AI summary / persistence / Telegram dispatch / report catalog run receipt orchestration |
## Market Intelligence Pre-Development Restricted Areas


@@ -103,6 +103,7 @@
- As of 2026-05-31, `V10.505` adds the market intelligence MCP Fetch Candidate Queue Writer Review Decision gate. After the review inventory passes, it reviews only the operator's candidate queue review decision summary and requires decision identity, target table, row count, dedupe keys, the current `needs_review` state, allowed decisions, evidence refs, the matched row exact-identity/variant/overwrite guard, and operator confirmation to line up. It still does not read tokens, run the CLI, open a DB connection, write decision records, update review_state, write match results, refill the queue, or attach a scheduler; it only releases to decision approval / writer preflight design.
- As of 2026-05-31, `V10.506` adds the market intelligence MCP Fetch Candidate Queue Writer Review Decision Approval gate. After the review decision passes, it reviews only the operator's human approval summary and requires decision linkage, approval identity, target table, row count, dedupe keys, the `approved_for_writer_preflight` approval result, decision/approval evidence refs, artifact paths, the matched row exact-identity/variant/overwrite guard, and operator confirmation to line up. It still does not read tokens, run the CLI, open a DB connection, write approval records, write decision records, update review_state, write match results, refill the queue, or attach a scheduler; it only releases to the later writer preflight design.
- As of 2026-05-31, `V10.509` adds the market intelligence MCP Fetch Candidate Queue Writer Review Decision Approval Writer Preflight gate. After human approval passes, it reviews only the operator's writer preflight summary and requires approval linkage, writer_preflight_id, target operation, row count, dedupe keys, the row-by-row mapping from approved decision to target review_state, decision/approval/preflight evidence refs, the matched row exact-identity/variant/overwrite guard, and the operator boundary. It still does not read tokens, run the CLI, open a DB connection, write preflight/approval/decision/match records, update review_state, refill the queue, or attach a scheduler; it only releases to the later CLI review / run package design.
- As of 2026-06-01, `V10.566` adds the market intelligence Professional Source Governance gate. It brings robots/REP, sitemap/lastmod, JSON-LD / schema.org structured data, canonical URL, rate limit, public-data boundary, provenance, snapshot hash, and idempotency key into the source contract, and wires up `/api/market_intel/mcp_professional_source_governance`, the UI preview panel, the deployment readiness check, and the production smoke target. It still does not fetch external sites, read robots/sitemap, open a DB connection, write files, or attach a scheduler.
## 3. 12-Agent Decision Envelope Integration


@@ -15,6 +15,7 @@
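### 2026-06-01: PChome price-comparison freshness operations loop
- **V10.565 PChome coverage operation recommendations**: Extends `/api/ai/pchome-match/backfill/status` to break low coverage down into `operation_backlog` (stale identity refresh, near-threshold re-evaluation, unmatched backfill, manual review, unit-price review, and expired-search rescue preview) and adds `recommended_next_action`. The Dashboard status summary shows the recommended next step directly, so users no longer see low coverage without knowing which pipeline to run.
- **V10.563 production preview false-recoverable candidate tightening**: Adds guards for the M.A.C 蜜粉 and SAUGELLA 菁萃潔浴凝露 cases exposed by the production `retryable_candidate_preview`. An explicit shade on only one side for M.A.C (such as `#絕絕紫`) goes to `variant_selection_review` and stays `true_low_confidence`. The SAUGELLA 潤澤 / 日用型 / 加強 / 黃金女郎型 variants are mutually exclusive and hard-vetoed directly, so different intimate-wash products in the same line are not treated as recoverable low_score.
- **V10.566 market intelligence Professional Source Governance**: Adds `/api/market_intel/mcp_professional_source_governance`, the preview service/gates/sample modules, and a card on the market intelligence page, bringing robots/REP, sitemap/lastmod, JSON-LD / schema.org structured data, canonical URL, rate limit, public-data boundary, provenance, snapshot hash, and idempotency key into the source contract. This gate only reviews the operator's governance summary; it does not fetch external sites, read robots/sitemap, open a DB connection, write files, or attach a scheduler. Deployment readiness gains a matching preview-safe check and production smoke target.
- **V10.561 PChome price-comparison reinforcement staged front-end feedback**: The Dashboard's PChome operation card is renamed to "price-comparison reinforcement pipeline", and the manual button and confirmation copy now explain the three-stage flow. The result summary shows matched/total for refresh, re-evaluation, and backfill separately, so operators can tell whether a coverage improvement came from stale identity freshness recovery, near-threshold matcher re-runs, or fresh search backfill of pending products.
- **V10.560 manual PChome price-comparison reinforcement three-stage wiring**: `/api/ai/pchome-match/backfill` is aligned with the daily scheduler. A manual run first refreshes expired `identity_v2` records in a small batch, then re-evaluates near-threshold candidates, and finally backfills high-priority unmatched products. The response adds `stale_identity_refresh` staged statistics so later Dashboard, briefing, and AI decisions can distinguish whether a coverage improvement came from stale identity freshness recovery, matcher re-runs, or fresh search backfill.
- **V10.559 retryable valid-identity freshness tightening**: The existing-identity blocking condition in `_fetch_retryable_candidate_skus()` now accepts only `cp.expires_at > CURRENT_TIMESTAMP`, so old matches of unknown freshness with `expires_at IS NULL` no longer hold back near-threshold candidate re-runs. Unknown freshness is still handled by the expired identity refresh / recovery path, and the final write must still pass the current matcher, hard-veto, auto write safety, and stronger existing production match protection.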
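The V10.566 entry above refers to an operator governance summary without showing its shape. The sketch below builds one source entry from the field names listed under `required_source_fields` in this commit and checks that none are missing. All values (the `example.com` URLs, `governance_id`, `source_key`, the artifact path, and the idempotency strategy string) are hypothetical placeholders; the real sample comes from `build_sample_professional_source_governance_package`, which is not shown here.

```python
REQUIRED_SOURCE_FIELDS = (
    "platform_code", "source_key", "source_url", "canonical_url",
    "robots_url", "sitemap_url", "robots_policy_checked", "robots_allowed",
    "structured_data_types", "crawl_delay_seconds", "max_requests_per_run",
    "evidence_artifact_path", "snapshot_hash_required",
    "idempotency_key_strategy",
)


def missing_required_fields(source):
    """List required source fields that an operator summary omits."""
    return [field for field in REQUIRED_SOURCE_FIELDS if field not in source]


sample_package = {
    "governance_id": "example-governance-001",  # hypothetical
    "governance_scope": "market_intel_public_campaign_source_governance",
    "policy_version": "source_governance_v1",
    "sources": [
        {
            "platform_code": "momo",
            "source_key": "example_campaign_home",  # hypothetical
            "source_url": "https://shop.example.com/campaign",
            "canonical_url": "https://shop.example.com/campaign",
            "robots_url": "https://shop.example.com/robots.txt",
            "sitemap_url": "https://shop.example.com/sitemap.xml",
            "robots_policy_checked": True,
            "robots_allowed": True,
            "structured_data_types": ["Product", "Offer"],
            "crawl_delay_seconds": 2.0,
            "max_requests_per_run": 10,
            # Hypothetical path; the real ARTIFACT_PREFIX is not shown.
            "evidence_artifact_path": "artifacts/example/source_governance.json",
            "snapshot_hash_required": True,
            # Hypothetical strategy label.
            "idempotency_key_strategy": "platform_code:source_key:snapshot_hash",
        }
    ],
}

missing = missing_required_fields(sample_package["sources"][0])
```

This covers only the required field list. A package that passes every gate would also need the `operator_confirmations` block and per-source flags such as `tos_public_page_checked`, `structured_data_preferred`, and `json_ld_first`, which the gates in this commit check separately.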


@@ -13,6 +13,9 @@ from services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_appr
from services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight import (
    build_mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight_preview,
)
from services.market_intel.mcp_professional_source_governance import (
    build_mcp_professional_source_governance_preview,
)


@market_intel_bp.route(
@@ -129,3 +132,34 @@ def market_intel_mcp_fetch_candidate_queue_writer_review_decision_approval_write
            phase=service.phase,
        )
    )


@market_intel_bp.route(
    "/api/market_intel/mcp_professional_source_governance",
    methods=["GET", "POST"],
)
@login_required
def market_intel_mcp_professional_source_governance():
    operator_source_governance = None
    if request.method == "POST":
        payload = request.get_json(silent=True) or {}
        package = (
            payload.get("professional_source_governance_package")
            or payload.get("source_governance_package")
            or payload.get("operator_source_governance")
            or payload.get("market_source_governance")
            or payload
        )
        operator_source_governance = (
            package.get("operator_source_governance")
            or package.get("source_governance")
            or package
        )

    service = MarketIntelService()
    return jsonify(
        build_mcp_professional_source_governance_preview(
            operator_source_governance=operator_source_governance,
            phase=service.phase,
        )
    )
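The handler above accepts several wrapper keys before falling back to the raw body. The standalone sketch below mirrors that fallback chain so it can be exercised without Flask. The function name `unwrap_operator_source_governance` and the `isinstance` guards are additions made for this illustration; the committed handler calls `.get` directly on whatever the wrapper key holds.

```python
WRAPPER_KEYS = (
    "professional_source_governance_package",
    "source_governance_package",
    "operator_source_governance",
    "market_source_governance",
)
INNER_KEYS = ("operator_source_governance", "source_governance")


def unwrap_operator_source_governance(payload):
    """Return the innermost governance dict, trying wrapper keys in order."""
    payload = payload if isinstance(payload, dict) else {}
    package = payload
    for key in WRAPPER_KEYS:
        if payload.get(key):  # first truthy wrapper wins, as with `or`
            package = payload[key]
            break
    if not isinstance(package, dict):  # guard added in this sketch only
        return {}
    inner = package
    for key in INNER_KEYS:
        if package.get(key):
            inner = package[key]
            break
    return inner if isinstance(inner, dict) else {}


wrapped = unwrap_operator_source_governance(
    {
        "professional_source_governance_package": {
            "operator_source_governance": {"governance_id": "g1"}
        }
    }
)
bare = unwrap_operator_source_governance({"governance_id": "g2"})
empty = unwrap_operator_source_governance({})
```

One consequence of the chain in the handler: an empty POST body resolves to `{}` rather than `None`, so the preview builder treats it as a received payload (`operator_source_governance is not None`) and evaluates the gates against empty values.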


@@ -117,6 +117,9 @@ from services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_appr
from services.market_intel.mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight import (
    build_mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight_preview,
)
from services.market_intel.mcp_professional_source_governance import (
    build_mcp_professional_source_governance_preview,
)
from services.market_intel.mcp_manual_fetch_handoff import (
    build_mcp_manual_fetch_handoff_preview,
)
@@ -313,6 +316,11 @@ PRODUCTION_SMOKE_TARGETS = (
    )
    + PRODUCTION_SMOKE_TARGETS[-1:]
)
PRODUCTION_SMOKE_TARGETS = (
    PRODUCTION_SMOKE_TARGETS[:-1]
    + ("/api/market_intel/mcp_professional_source_governance",)
    + PRODUCTION_SMOKE_TARGETS[-1:]
)


def _run_review_preview_safe(payload, mode):
@@ -426,6 +434,11 @@ def build_deployment_readiness_preview(*, service, market_intel_tables, schema_s
            phase=service.phase,
        )
    )
    mcp_professional_source_governance = (
        build_mcp_professional_source_governance_preview(
            phase=service.phase,
        )
    )
    scheduler_plan = service.build_scheduler_plan()
    manual_sample_plan = service.build_manual_sample_plan()
    manual_sample_acceptance = service.build_manual_sample_acceptance()
@@ -1526,6 +1539,23 @@ def build_deployment_readiness_preview(*, service, market_intel_tables, schema_s
            mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight,
            "mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight_preview",
        ),
        "mcp_professional_source_governance_preview_safe": bool(
            mcp_professional_source_governance["mode"]
            == "mcp_professional_source_governance_preview"
            and not mcp_professional_source_governance["network_request_allowed"]
            and not mcp_professional_source_governance["external_network_executed"]
            and not mcp_professional_source_governance["api_uses_external_network"]
            and not mcp_professional_source_governance["api_fetches_robots_txt"]
            and not mcp_professional_source_governance["api_fetches_sitemap"]
            and not mcp_professional_source_governance["api_fetches_source_url"]
            and not mcp_professional_source_governance[
                "api_opens_database_connection"
            ]
            and not mcp_professional_source_governance["api_writes_database"]
            and not mcp_professional_source_governance["api_writes_file"]
            and not mcp_professional_source_governance["payload_persisted"]
            and not mcp_professional_source_governance["scheduler_attached"]
        ),
        "candidate_queue_writer_postwrite_smoke_planned_safe": bool(
            candidate_queue_writer_postwrite_smoke["mode"]
            == "candidate_queue_writer_postwrite_smoke_planned"
@@ -1857,6 +1887,7 @@ def build_deployment_readiness_preview(*, service, market_intel_tables, schema_s
"mcp_fetch_candidate_queue_writer_review_decision": mcp_fetch_candidate_queue_writer_review_decision,
"mcp_fetch_candidate_queue_writer_review_decision_approval": mcp_fetch_candidate_queue_writer_review_decision_approval,
"mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight": mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight,
"mcp_professional_source_governance": mcp_professional_source_governance,
"scheduler_plan": scheduler_plan,
"manual_sample_plan": manual_sample_plan,
"manual_sample_acceptance": manual_sample_acceptance,


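@@ -0,0 +1,391 @@
"""Professional source governance gate for market intelligence.

This module turns mainstream market data collection practices into an
auditable contract: robots/REP, sitemap, structured data, canonical URL,
rate limit, public-data boundary, provenance, snapshot hash, and idempotency.

The API/UI only reviews the governance summary supplied by the operator. It
does not fetch external sites, read robots/sitemap, open a DB connection,
write files, or attach a scheduler.
"""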

from urllib.parse import urlparse

from services.market_intel.mcp_fetch_candidate_queue_writer_post_closeout_inventory_review import (
    FORBIDDEN_SECRET_KEYS,
    SAFE_SECRET_METADATA_KEYS,
    _as_dict,
    _as_list,
    _blocked_side_effects,
    _contains_forbidden_key,
    _safe_int,
    _safe_path,
    _safe_text,
)
from services.market_intel.mcp_fetch_candidate_queue_writer_run_readiness import (
    ARTIFACT_PREFIX,
)
from services.market_intel.mcp_professional_source_governance_gates import (
    CONTRACT_SCOPE,
    SOURCE_POLICY_VERSION,
    _is_public_http_url,
    build_professional_source_governance_gates,
)
from services.market_intel.mcp_professional_source_governance_sample import (
    build_sample_professional_source_governance_package,
)

_BLOCKED_SOURCE_GOVERNANCE_SIDE_EFFECT_KEYS = (
    "allow_api_database_write",
    "allow_api_execution",
    "allow_api_file_write",
    "allow_api_network_fetch",
    "allow_database_write",
    "allow_external_network",
    "allow_scheduler_attach",
    "api_fetches_robots_txt",
    "api_fetches_sitemap",
    "api_fetches_source_url",
    "api_opens_database_connection",
    "api_uses_external_network",
    "api_writes_database",
    "api_writes_file",
    "database_commit_executed",
    "database_write_executed",
    "external_network_executed",
    "fetch_executed",
    "file_written",
    "network_request_allowed",
    "payload_persisted",
    "ready_for_api_database_write",
    "real_write_allowed_by_api",
    "scheduler_attached",
    "write_database",
    "writes_executed",
    "would_write_database",
)

_RAW_PAYLOAD_KEYS = (
    "body_html",
    "full_response_body",
    "html",
    "page_body",
    "page_html",
    "raw_html",
    "raw_page_html",
    "response_body",
)


def _safe_float(value):
    try:
        return float(value or 0)
    except (TypeError, ValueError):
        return 0.0


def _contains_raw_payload(value):
    if isinstance(value, dict):
        for key, nested in value.items():
            if str(key).lower() in _RAW_PAYLOAD_KEYS and bool(nested):
                return True
            if _contains_raw_payload(nested):
                return True
    if isinstance(value, list):
        return any(_contains_raw_payload(item) for item in value)
    return False


def _blocked_source_governance_side_effects(payload):
    found = list(_blocked_side_effects(payload))

    def visit(value, path):
        if isinstance(value, dict):
            for key, item in value.items():
                normalized_key = str(key).lower()
                key_path = f"{path}.{key}" if path else key
                if (
                    normalized_key in _BLOCKED_SOURCE_GOVERNANCE_SIDE_EFFECT_KEYS
                    and bool(item)
                ):
                    found.append(key_path)
                visit(item, key_path)
        elif isinstance(value, list):
            for index, item in enumerate(value):
                visit(item, f"{path}[{index}]")

    visit(payload, "")
    return sorted(set(found))


def _normalize_host(value):
    if not value:
        return None
    parsed = urlparse(value)
    return parsed.netloc.lower() or None


def _source_summary(source):
    source = _as_dict(source)
    source_url = _safe_text(source.get("source_url"), 500)
    canonical_url = _safe_text(source.get("canonical_url"), 500)
    robots_url = _safe_text(source.get("robots_url"), 500)
    sitemap_url = _safe_text(source.get("sitemap_url"), 500)
    structured_data_types = [
        _safe_text(item, 80)
        for item in _as_list(source.get("structured_data_types"))
        if _safe_text(item, 80)
    ]
    max_requests = _safe_int(source.get("max_requests_per_run"))
    crawl_delay_seconds = _safe_float(source.get("crawl_delay_seconds"))
    evidence_artifact_path = _safe_text(source.get("evidence_artifact_path"))
    source_host = _normalize_host(source_url)
    canonical_host = _normalize_host(canonical_url)
    return {
        "platform_code": _safe_text(source.get("platform_code"), 80),
        "source_key": _safe_text(source.get("source_key"), 160),
        "source_url": source_url,
        "canonical_url": canonical_url,
        "robots_url": robots_url,
        "sitemap_url": sitemap_url,
        "lastmod_source": _safe_text(source.get("lastmod_source"), 160),
        "source_url_safe": _is_public_http_url(source_url),
        "canonical_url_safe": _is_public_http_url(canonical_url),
        "robots_url_safe": _is_public_http_url(robots_url),
        "sitemap_url_safe": _is_public_http_url(sitemap_url),
        "canonical_host_matches_source": bool(
            source_host and canonical_host and source_host == canonical_host
        ),
        "robots_policy_checked": bool(source.get("robots_policy_checked")),
        "robots_allowed": bool(source.get("robots_allowed")),
        "tos_public_page_checked": bool(source.get("tos_public_page_checked")),
        "login_required": bool(source.get("login_required")),
        "member_or_order_data": bool(source.get("member_or_order_data")),
        "cart_order_or_pii": bool(source.get("cart_order_or_pii")),
        "anti_bot_bypass_required": bool(source.get("anti_bot_bypass_required")),
        "structured_data_preferred": bool(source.get("structured_data_preferred")),
        "json_ld_first": bool(source.get("json_ld_first")),
        "dom_selector_fallback_allowed": bool(
            source.get("dom_selector_fallback_allowed")
        ),
        "structured_data_types": structured_data_types,
        "selector_version": _safe_text(source.get("selector_version"), 120),
        "crawl_delay_seconds": crawl_delay_seconds,
        "max_requests_per_run": max_requests,
        "public_cache_ttl_hours": _safe_int(source.get("public_cache_ttl_hours")),
        "evidence_artifact_path": evidence_artifact_path,
        "evidence_artifact_path_safe": _safe_path(
            evidence_artifact_path,
            prefixes=(ARTIFACT_PREFIX,),
            suffixes=(".json",),
        ),
        "provenance_required": bool(source.get("provenance_required")),
        "snapshot_hash_required": bool(source.get("snapshot_hash_required")),
        "idempotency_key_strategy": _safe_text(
            source.get("idempotency_key_strategy"), 160
        ),
    }


def _operator_confirmations(payload):
    confirmations = _as_dict(payload.get("operator_confirmations"))
    return {
        "human_reviewed_source_policy": bool(
            confirmations.get("human_reviewed_source_policy")
        ),
        "robots_and_tos_checked_by_operator": bool(
            confirmations.get("robots_and_tos_checked_by_operator")
        ),
        "public_pages_only": bool(confirmations.get("public_pages_only")),
        "no_login_or_member_data": bool(
            confirmations.get("no_login_or_member_data")
        ),
        "no_cart_order_or_pii": bool(confirmations.get("no_cart_order_or_pii")),
        "no_antibot_bypass": bool(confirmations.get("no_antibot_bypass")),
        "structured_data_first": bool(confirmations.get("structured_data_first")),
        "provenance_required": bool(confirmations.get("provenance_required")),
        "no_api_network_fetch": bool(confirmations.get("no_api_network_fetch")),
        "no_database_write": bool(confirmations.get("no_database_write")),
        "no_scheduler_attach": bool(confirmations.get("no_scheduler_attach")),
        "no_secret_payload": bool(confirmations.get("no_secret_payload")),
    }


def _governance_summary(payload):
    payload = _as_dict(payload)
    sources = [_source_summary(source) for source in _as_list(payload.get("sources"))]
    blocked_side_effects = _blocked_source_governance_side_effects(payload)
    raw_payload_submitted = _contains_raw_payload(payload)
    secret_or_token_submitted = _contains_forbidden_key(
        payload,
        FORBIDDEN_SECRET_KEYS,
        safe_keys=SAFE_SECRET_METADATA_KEYS
        | {
            "no_api_network_fetch",
            "no_secret_payload",
            "source_contract_version",
        },
    )
    return {
        "governance_id": _safe_text(payload.get("governance_id"), 160),
        "governance_scope": _safe_text(payload.get("governance_scope"), 160),
        "policy_version": _safe_text(payload.get("policy_version"), 120),
        "source_contract_version": _safe_text(
            payload.get("source_contract_version"), 120
        ),
        "sources": sources,
        "source_count": len(sources),
        "platform_count": len(
            {source["platform_code"] for source in sources if source["platform_code"]}
        ),
        "robots_checked_count": len(
            [
                source
                for source in sources
                if source["robots_policy_checked"] and source["robots_allowed"]
            ]
        ),
        "structured_data_ready_count": len(
            [
                source
                for source in sources
                if source["structured_data_preferred"] and source["json_ld_first"]
            ]
        ),
        "min_crawl_delay_seconds": min(
            [source["crawl_delay_seconds"] for source in sources] or [0.0]
        ),
        "max_requests_per_run": max(
            [source["max_requests_per_run"] for source in sources] or [0]
        ),
        "operator_confirmations": _operator_confirmations(payload),
        "raw_payload_submitted_to_api": raw_payload_submitted,
        "secret_or_token_submitted_to_api": secret_or_token_submitted,
        "blocked_side_effects": blocked_side_effects,
        "api_uses_external_network": bool(payload.get("api_uses_external_network")),
        "api_fetches_robots_txt": bool(payload.get("api_fetches_robots_txt")),
        "api_fetches_sitemap": bool(payload.get("api_fetches_sitemap")),
        "api_fetches_source_url": bool(payload.get("api_fetches_source_url")),
        "api_opens_database_connection": bool(
            payload.get("api_opens_database_connection")
        ),
        "api_writes_database": bool(payload.get("api_writes_database")),
        "api_writes_file": bool(payload.get("api_writes_file")),
        "scheduler_attached": bool(payload.get("scheduler_attached")),
    }


def _source_contract():
    return {
        "contract_scope": CONTRACT_SCOPE,
        "policy_version": SOURCE_POLICY_VERSION,
        "mainstream_practices": [
            {
                "key": "robots_exclusion_protocol",
                "label": "先人工確認 robots.txt / REP不由 API 自動抓取或繞過",
                "reference_url": "https://www.rfc-editor.org/rfc/rfc9309",
            },
            {
                "key": "sitemap_provenance",
                "label": "以 sitemap / lastmod 作為活動來源發現與更新依據之一",
                "reference_url": "https://www.sitemaps.org/protocol.html",
            },
            {
                "key": "structured_data_first",
                "label": "優先解析 JSON-LD / schema.org Product、Offer、ItemList",
                "reference_url": "https://developers.google.com/search/docs/appearance/structured-data/product-snippet",
            },
            {
                "key": "canonical_public_url",
                "label": "所有來源需保留 canonical URL、公開 URL 與 host provenance",
                "reference_url": "https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls",
            },
            {
                "key": "bronze_silver_gold",
                "label": "raw evidence、normalized source、reviewed product/match 分層保存",
                "reference_url": "internal:market_intel_lakehouse_contract",
            },
        ],
        "required_source_fields": [
            "platform_code",
            "source_key",
            "source_url",
            "canonical_url",
            "robots_url",
            "sitemap_url",
            "robots_policy_checked",
            "robots_allowed",
            "structured_data_types",
            "crawl_delay_seconds",
            "max_requests_per_run",
            "evidence_artifact_path",
            "snapshot_hash_required",
            "idempotency_key_strategy",
        ],
        "forbidden_data": [
            "login_page",
            "member_profile",
            "cart_or_checkout",
            "order_data",
            "personal_data",
            "cookie_or_token",
            "anti_bot_bypass",
        ],
        "next_gate": "mcp_fetch_target_review_with_source_governance",
    }


def build_mcp_professional_source_governance_preview(
    *, operator_source_governance=None, phase=None
):
    payload_received = operator_source_governance is not None
    governance_payload = _as_dict(operator_source_governance)
    governance = _governance_summary(governance_payload)
    gates = build_professional_source_governance_gates(
        package_received=payload_received,
        governance=governance,
    )
    blocked_reasons = [gate["key"] for gate in gates if not gate["passed"]]
    accepted = bool(payload_received and not blocked_reasons)
    return {
        "mode": (
            "mcp_professional_source_governance"
            if accepted
            else "mcp_professional_source_governance_preview"
        ),
        "phase": phase,
        "source_governance_payload_received": payload_received,
        "mcp_professional_source_governance_accepted": accepted,
        "ready_for_mcp_fetch_source_contract": accepted,
        "ready_for_api_database_write": False,
        "ready_for_scheduler_attach": False,
        "network_request_allowed": False,
        "external_network_executed": False,
        "api_uses_external_network": False,
        "api_fetches_robots_txt": False,
        "api_fetches_sitemap": False,
        "api_fetches_source_url": False,
        "api_opens_database_connection": False,
        "api_writes_database": False,
        "api_writes_file": False,
        "database_connection_opened": False,
        "database_write_executed": False,
        "database_commit_executed": False,
        "file_written": False,
        "payload_persisted": False,
        "scheduler_attached": False,
        "gate_count": len(gates),
        "passed_gate_count": len([gate for gate in gates if gate["passed"]]),
        "blocked_reasons": blocked_reasons,
        "gates": gates,
        "source_governance_summary": governance,
        "source_contract": _source_contract(),
        "sources": governance["sources"],
        "next_operator_steps": [
            "人工保留 robots / sitemap / ToS / public URL 審核證據。",
            "將通過治理的來源餵給後續 fetch target review不由 API 直接抓外站。",
            "正式 fetch 前仍需 MCP readiness、外部 MCP health、manual run package 與 receipt gate。",
        ],
        "sample_professional_source_governance_package": (
            build_sample_professional_source_governance_package()
        ),
    }
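The preview builder above hardcodes every side-effect flag to `False`, and the deployment readiness check in this commit verifies those flags one by one. The sketch below expresses the same check as a loop over flag names. `is_preview_safe` and `SIDE_EFFECT_FLAGS` are names made up for this illustration, and unlike the committed check (which indexes the dict and would raise `KeyError`), this version treats a missing flag as unsafe.

```python
PREVIEW_MODE = "mcp_professional_source_governance_preview"
SIDE_EFFECT_FLAGS = (
    "network_request_allowed",
    "external_network_executed",
    "api_uses_external_network",
    "api_fetches_robots_txt",
    "api_fetches_sitemap",
    "api_fetches_source_url",
    "api_opens_database_connection",
    "api_writes_database",
    "api_writes_file",
    "payload_persisted",
    "scheduler_attached",
)


def is_preview_safe(preview):
    """True only in preview mode with every side-effect flag present and falsy."""
    if preview.get("mode") != PREVIEW_MODE:
        return False
    return all(
        flag in preview and not preview[flag] for flag in SIDE_EFFECT_FLAGS
    )


# Shape of the builder's output when no operator payload is supplied.
default_preview = {"mode": PREVIEW_MODE, **{flag: False for flag in SIDE_EFFECT_FLAGS}}
# An accepted operator payload switches the mode string.
accepted_preview = {**default_preview, "mode": "mcp_professional_source_governance"}
# A single flipped flag is enough to fail the check.
leaky_preview = {**default_preview, "api_writes_file": True}
```

The readiness check in this commit calls the builder without an operator payload, so it always sees the preview mode. An accepted payload changes `mode`, which is why the check pins the mode string as well as the flags.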


@@ -0,0 +1,266 @@
"""Gate checks for professional market source governance."""

from urllib.parse import urlparse

CONTRACT_SCOPE = "market_intel_public_campaign_source_governance"
SOURCE_POLICY_VERSION = "source_governance_v1"
SUPPORTED_PLATFORMS = ("momo", "pchome", "coupang", "shopee")
ALLOWED_STRUCTURED_DATA_TYPES = {
    "AggregateOffer",
    "BreadcrumbList",
    "ItemList",
    "Offer",
    "Product",
}
PRIVATE_URL_MARKERS = (
    "/account",
    "/cart",
    "/checkout",
    "/login",
    "/member",
    "/members",
    "/my",
    "/order",
    "/orders",
    "/profile",
    "/signin",
    "/user",
)
MIN_CRAWL_DELAY_SECONDS = 1.0
MAX_REQUESTS_PER_RUN = 50


def _is_public_http_url(value):
    if not isinstance(value, str) or not value.strip():
        return False
    parsed = urlparse(value.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if not parsed.netloc:
        return False
    lowered_path = (parsed.path or "").lower()
    return not any(marker in lowered_path for marker in PRIVATE_URL_MARKERS)


def _unique_source_keys(sources):
    keys = [
        f"{source['platform_code']}:{source['source_key']}"
        for source in sources
        if source["platform_code"] and source["source_key"]
    ]
    return bool(keys and len(keys) == len(set(keys)) == len(sources))


def _all_sources_have_allowed_structured_data(sources):
    return bool(
        sources
        and all(
            source["structured_data_preferred"]
            and source["json_ld_first"]
            and bool(
                set(source["structured_data_types"]).intersection(
                    ALLOWED_STRUCTURED_DATA_TYPES
                )
            )
            for source in sources
        )
    )


def _rate_limits_safe(sources):
    return bool(
        sources
        and all(
            source["crawl_delay_seconds"] >= MIN_CRAWL_DELAY_SECONDS
            and 0 < source["max_requests_per_run"] <= MAX_REQUESTS_PER_RUN
            for source in sources
        )
    )


def _source_url_set_safe(source):
    return bool(
        source["source_url_safe"]
        and source["canonical_url_safe"]
        and source["robots_url_safe"]
        and source["sitemap_url_safe"]
    )


def _source_boundaries_safe(source):
    return bool(
        source["tos_public_page_checked"]
        and not source["login_required"]
        and not source["member_or_order_data"]
        and not source["cart_order_or_pii"]
        and not source["anti_bot_bypass_required"]
    )
def build_professional_source_governance_gates(
    *,
    package_received,
    governance,
):
    """Return gate rows. The caller decides whether failed rows block acceptance."""
    sources = governance["sources"]
    confirmations = governance["operator_confirmations"]
    operator_boundaries_confirmed = bool(
        confirmations["human_reviewed_source_policy"]
        and confirmations["robots_and_tos_checked_by_operator"]
        and confirmations["public_pages_only"]
        and confirmations["no_login_or_member_data"]
        and confirmations["no_cart_order_or_pii"]
        and confirmations["no_antibot_bypass"]
        and confirmations["structured_data_first"]
        and confirmations["provenance_required"]
        and confirmations["no_api_network_fetch"]
        and confirmations["no_database_write"]
        and confirmations["no_scheduler_attach"]
        and confirmations["no_secret_payload"]
    )
    return [
        {
            "key": "source_governance_payload_received",
            "label": "Operator source governance summary provided",
            "passed": package_received,
        },
        {
            "key": "source_governance_scope_safe",
            "label": "Governance scope must be locked to public campaign source governance",
            "passed": governance["governance_scope"] == CONTRACT_SCOPE,
        },
        {
            "key": "source_governance_policy_version_recorded",
            "label": "Source governance policy version must be auditable",
            "passed": governance["policy_version"] == SOURCE_POLICY_VERSION,
        },
        {
            "key": "source_governance_sources_present",
            "label": "At least one public platform source must be reviewed",
            "passed": bool(sources),
        },
        {
            "key": "source_governance_supported_platforms",
            "label": "Source platform must be on the MOMO / PChome / Coupang / Shopee allowlist",
            "passed": bool(
                sources
                and all(
                    source["platform_code"] in SUPPORTED_PLATFORMS
                    for source in sources
                )
            ),
        },
        {
            "key": "source_governance_unique_source_keys",
            "label": "Each platform/source_key pair must be unique",
            "passed": _unique_source_keys(sources),
        },
        {
            "key": "source_governance_public_urls_only",
            "label": "source, canonical, robots, and sitemap URLs must be public http/https URLs",
            "passed": bool(
                sources and all(_source_url_set_safe(source) for source in sources)
            ),
        },
        {
            "key": "source_governance_robots_policy_checked_and_allowed",
            "label": "Operator must confirm the robots policy first, and the source must allow crawling",
            "passed": bool(
                sources
                and all(
                    source["robots_policy_checked"] and source["robots_allowed"]
                    for source in sources
                )
            ),
        },
        {
            "key": "source_governance_sitemap_or_lastmod_recorded",
            "label": "Each source must record sitemap or lastmod provenance",
            "passed": bool(
                sources
                and all(
                    source["sitemap_url"] or source["lastmod_source"]
                    for source in sources
                )
            ),
        },
        {
            "key": "source_governance_structured_data_first",
            "label": "Prefer reading JSON-LD / schema.org Product, Offer, or ItemList first",
            "passed": _all_sources_have_allowed_structured_data(sources),
        },
        {
            "key": "source_governance_rate_limits_safe",
            "label": "Per-source crawl delay must be at least 1 second and the per-run request budget at most 50",
            "passed": _rate_limits_safe(sources),
        },
        {
            "key": "source_governance_public_data_only_no_login_or_pii",
            "label": "Must not touch login, member, cart, order, or PII pages",
            "passed": bool(
                sources and all(_source_boundaries_safe(source) for source in sources)
            ),
        },
        {
            "key": "source_governance_no_antibot_bypass",
            "label": "Must not require account pools, anti-bot bypass, or protection cracking",
            "passed": bool(
                sources
                and all(not source["anti_bot_bypass_required"] for source in sources)
            ),
        },
        {
            "key": "source_governance_provenance_and_hash_required",
            "label": "Each source must have provenance, a snapshot hash, and an idempotency key strategy",
            "passed": bool(
                sources
                and all(
                    source["provenance_required"]
                    and source["snapshot_hash_required"]
                    and source["idempotency_key_strategy"]
                    for source in sources
                )
            ),
        },
        {
            "key": "source_governance_evidence_paths_safe",
            "label": "Governance evidence artifact path must match artifacts/market_intel/*.json",
            "passed": bool(
                sources
                and all(source["evidence_artifact_path_safe"] for source in sources)
            ),
        },
        {
            "key": "source_governance_operator_boundaries_confirmed",
            "label": "Operator confirms the API makes no external calls, writes no DB, and attaches no scheduler",
            "passed": operator_boundaries_confirmed,
        },
        {
            "key": "source_governance_no_raw_payload",
            "label": "API payload must not contain raw HTML, page body text, or a full response body",
            "passed": not governance["raw_payload_submitted_to_api"],
        },
        {
            "key": "source_governance_no_secret_or_token_key",
            "label": "API payload must not contain cookie, secret, password, or token keys",
            "passed": not governance["secret_or_token_submitted_to_api"],
        },
        {
            "key": "source_governance_side_effect_free",
            "label": "Payload must not ask the API to make external calls, write files, write to the DB, run a CLI, or attach a scheduler",
            "passed": not governance["blocked_side_effects"],
        },
        {
            "key": "source_governance_api_preview_only",
            "label": "This endpoint only reviews the governance contract; it does not fetch robots/sitemap/source URLs",
            "passed": bool(
                not governance["api_uses_external_network"]
                and not governance["api_fetches_robots_txt"]
                and not governance["api_fetches_sitemap"]
                and not governance["api_fetches_source_url"]
                and not governance["api_opens_database_connection"]
                and not governance["api_writes_database"]
                and not governance["api_writes_file"]
                and not governance["scheduler_attached"]
            ),
        },
    ]
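
The docstring leaves it to the caller to decide whether failed rows block acceptance. A minimal sketch of that caller-side summary, assuming every failed gate is blocking. The `summarize_gates` helper is hypothetical. The output field names follow what the market intel page reads (`gate_count`, `passed_gate_count`, `blocked_reasons`, `mcp_professional_source_governance_accepted`), and using gate keys as the blocked reasons is an assumption.

```python
def summarize_gates(gates):
    """Collapse gate rows into counts, blocked keys, and an accepted flag."""
    blocked = [gate["key"] for gate in gates if not gate["passed"]]
    return {
        "gate_count": len(gates),
        "passed_gate_count": len(gates) - len(blocked),
        "blocked_reasons": blocked,
        # Assumption: acceptance needs at least one gate and no failures.
        "mcp_professional_source_governance_accepted": bool(gates) and not blocked,
    }


example_gates = [
    {"key": "source_governance_sources_present", "passed": True},
    {"key": "source_governance_rate_limits_safe", "passed": False},
]
summary = summarize_gates(example_gates)
```

Treating an empty gate list as not accepted keeps the summary fail-closed, which matches the way each gate above also requires a non-empty `sources` list.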


@@ -0,0 +1,175 @@
"""Sample package for professional source governance review."""
from copy import deepcopy
from services.market_intel.mcp_fetch_candidate_queue_writer_run_readiness import (
ARTIFACT_PREFIX,
)
_SAMPLE_PROFESSIONAL_SOURCE_GOVERNANCE_PACKAGE = {
"operator_source_governance": {
"governance_id": "market-intel-professional-source-governance-sample",
"governance_scope": "market_intel_public_campaign_source_governance",
"policy_version": "source_governance_v1",
"source_contract_version": "market_source_contract_v1",
"sources": [
{
"platform_code": "momo",
"source_key": "momo_edm",
"source_url": "https://www.momoshop.com.tw/edm/cmmedm.jsp",
"canonical_url": "https://www.momoshop.com.tw/edm/cmmedm.jsp",
"robots_url": "https://www.momoshop.com.tw/robots.txt",
"sitemap_url": "https://www.momoshop.com.tw/sitemap.xml",
"lastmod_source": "sitemap_or_http_last_modified",
"robots_policy_checked": True,
"robots_allowed": True,
"tos_public_page_checked": True,
"login_required": False,
"member_or_order_data": False,
"cart_order_or_pii": False,
"anti_bot_bypass_required": False,
"structured_data_preferred": True,
"json_ld_first": True,
"dom_selector_fallback_allowed": True,
"structured_data_types": ["ItemList", "Product", "Offer"],
"selector_version": "momo_campaign_source_v1",
"crawl_delay_seconds": 2.5,
"max_requests_per_run": 12,
"public_cache_ttl_hours": 24,
"evidence_artifact_path": (
ARTIFACT_PREFIX + "professional-source-governance-momo-edm.json"
),
"provenance_required": True,
"snapshot_hash_required": True,
"idempotency_key_strategy": (
"platform_code:source_key:canonical_url_hash"
),
},
{
"platform_code": "pchome",
"source_key": "pchome_home",
"source_url": "https://24h.pchome.com.tw/",
"canonical_url": "https://24h.pchome.com.tw/",
"robots_url": "https://24h.pchome.com.tw/robots.txt",
"sitemap_url": "https://24h.pchome.com.tw/sitemap.xml",
"lastmod_source": "sitemap_or_http_last_modified",
"robots_policy_checked": True,
"robots_allowed": True,
"tos_public_page_checked": True,
"login_required": False,
"member_or_order_data": False,
"cart_order_or_pii": False,
"anti_bot_bypass_required": False,
"structured_data_preferred": True,
"json_ld_first": True,
"dom_selector_fallback_allowed": True,
"structured_data_types": ["ItemList", "Product", "Offer"],
"selector_version": "pchome_campaign_source_v1",
"crawl_delay_seconds": 2.0,
"max_requests_per_run": 10,
"public_cache_ttl_hours": 24,
"evidence_artifact_path": (
ARTIFACT_PREFIX + "professional-source-governance-pchome-home.json"
),
"provenance_required": True,
"snapshot_hash_required": True,
"idempotency_key_strategy": (
"platform_code:source_key:canonical_url_hash"
),
},
{
"platform_code": "coupang",
"source_key": "coupang_tw_home",
"source_url": "https://www.tw.coupang.com/",
"canonical_url": "https://www.tw.coupang.com/",
"robots_url": "https://www.tw.coupang.com/robots.txt",
"sitemap_url": "https://www.tw.coupang.com/sitemap.xml",
"lastmod_source": "sitemap_or_http_last_modified",
"robots_policy_checked": True,
"robots_allowed": True,
"tos_public_page_checked": True,
"login_required": False,
"member_or_order_data": False,
"cart_order_or_pii": False,
"anti_bot_bypass_required": False,
"structured_data_preferred": True,
"json_ld_first": True,
"dom_selector_fallback_allowed": True,
"structured_data_types": ["ItemList", "Product", "Offer"],
"selector_version": "coupang_tw_campaign_source_v1",
"crawl_delay_seconds": 3.0,
"max_requests_per_run": 8,
"public_cache_ttl_hours": 24,
"evidence_artifact_path": (
ARTIFACT_PREFIX
+ "professional-source-governance-coupang-home.json"
),
"provenance_required": True,
"snapshot_hash_required": True,
"idempotency_key_strategy": (
"platform_code:source_key:canonical_url_hash"
),
},
{
"platform_code": "shopee",
"source_key": "shopee_mall",
"source_url": "https://shopee.tw/mall",
"canonical_url": "https://shopee.tw/mall",
"robots_url": "https://shopee.tw/robots.txt",
"sitemap_url": "https://shopee.tw/sitemap.xml",
"lastmod_source": "sitemap_or_http_last_modified",
"robots_policy_checked": True,
"robots_allowed": True,
"tos_public_page_checked": True,
"login_required": False,
"member_or_order_data": False,
"cart_order_or_pii": False,
"anti_bot_bypass_required": False,
"structured_data_preferred": True,
"json_ld_first": True,
"dom_selector_fallback_allowed": True,
"structured_data_types": ["ItemList", "Product", "Offer"],
"selector_version": "shopee_mall_campaign_source_v1",
"crawl_delay_seconds": 3.0,
"max_requests_per_run": 6,
"public_cache_ttl_hours": 24,
"evidence_artifact_path": (
ARTIFACT_PREFIX
+ "professional-source-governance-shopee-mall.json"
),
"provenance_required": True,
"snapshot_hash_required": True,
"idempotency_key_strategy": (
"platform_code:source_key:canonical_url_hash"
),
},
],
"operator_confirmations": {
"human_reviewed_source_policy": True,
"robots_and_tos_checked_by_operator": True,
"public_pages_only": True,
"no_login_or_member_data": True,
"no_cart_order_or_pii": True,
"no_antibot_bypass": True,
"structured_data_first": True,
"provenance_required": True,
"no_api_network_fetch": True,
"no_database_write": True,
"no_scheduler_attach": True,
"no_secret_payload": True,
},
"api_uses_external_network": False,
"api_fetches_robots_txt": False,
"api_fetches_sitemap": False,
"api_fetches_source_url": False,
"api_opens_database_connection": False,
"api_writes_database": False,
"api_writes_file": False,
"scheduler_attached": False,
}
}
def build_sample_professional_source_governance_package():
    return deepcopy(_SAMPLE_PROFESSIONAL_SOURCE_GOVERNANCE_PACKAGE)


@@ -1,3 +1,3 @@
"""Single source of truth for the market intel rollout phase."""
MARKET_INTEL_PHASE = "phase_139_market_intel_mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight"
MARKET_INTEL_PHASE = "phase_140_market_intel_professional_source_governance"


@@ -1150,6 +1150,32 @@
</div>
</div>
<div class="market-intel-panel" data-market-intel-mcp-professional-source-governance>
<div class="market-intel-preview-head">
<div>
<p class="market-intel-muted momo-mono mb-1">MCP / SOURCE GOVERNANCE</p>
<h2 class="market-intel-preview-title">Professional Source Governance</h2>
</div>
<button class="market-intel-icon-button" type="button" title="Refresh Professional Source Governance" data-market-intel-mcp-professional-source-governance-refresh>
<i class="fas fa-rotate-right" aria-hidden="true"></i>
</button>
</div>
<div class="market-intel-preview-meta" data-market-intel-mcp-professional-source-governance-meta>
<span class="market-intel-pill">loading</span>
</div>
<div data-market-intel-mcp-professional-source-governance-body>
<div class="market-intel-empty">Loading Professional Source Governance...</div>
</div>
<div class="market-intel-control-row mt-3">
<textarea class="market-intel-json-input" rows="9" spellcheck="false" data-market-intel-mcp-professional-source-governance-input placeholder="operator source governance JSON"></textarea>
<div class="market-intel-control-actions">
<button class="market-intel-icon-button" type="button" title="Review Professional Source Governance JSON" data-market-intel-mcp-professional-source-governance-review>
<i class="fas fa-check" aria-hidden="true"></i>
</button>
</div>
</div>
</div>
<div class="market-intel-panel" data-market-intel-manual-sample>
<div class="market-intel-preview-head">
<div>
@@ -1677,6 +1703,7 @@
const mcpFetchCandidateQueueWriterReviewDecisionRoot = document.querySelector('[data-market-intel-mcp-fetch-candidate-queue-writer-review-decision]');
const mcpFetchCandidateQueueWriterReviewDecisionApprovalRoot = document.querySelector('[data-market-intel-mcp-fetch-candidate-queue-writer-review-decision-approval]');
const mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot = document.querySelector('[data-market-intel-mcp-fetch-candidate-queue-writer-review-decision-approval-writer-preflight]');
const mcpProfessionalSourceGovernanceRoot = document.querySelector('[data-market-intel-mcp-professional-source-governance]');
const manualSampleRoot = document.querySelector('[data-market-intel-manual-sample]');
const sampleAcceptanceRoot = document.querySelector('[data-market-intel-sample-acceptance]');
const sampleReviewRoot = document.querySelector('[data-market-intel-sample-review]');
@@ -1693,7 +1720,7 @@
const liveInventoryRoot = document.querySelector('[data-market-intel-live-inventory]');
const approvalRoot = document.querySelector('[data-market-intel-approval]');
const deployRoot = document.querySelector('[data-market-intel-deploy]');
if (!root && !writerRoot && !cliRoot && !dbProbeRoot && !seedDiffRoot && !legacyBridgeRoot && !mcpReadinessRoot && !mcpPreflightRoot && !mcpActivationRoot && !mcpFetchGateRoot && !mcpCompletionRoot && !mcpActivationEvidenceRoot && !mcpRuntimeSmokeRoot && !mcpRuntimePromotionRoot && !mcpManualFetchHandoffRoot && !mcpFetchTargetReviewRoot && !mcpFetchRunPackageRoot && !mcpFetchRunReadinessRoot && !mcpFetchRunReceiptRoot && !mcpFetchResultParserReviewRoot && !mcpFetchCandidateHandoffReviewRoot && !mcpFetchCandidateQueueReviewRoot && !mcpFetchCandidateQueueWriterPreflightRoot && !mcpFetchCandidateQueueWriterCliReviewRoot && !mcpFetchCandidateQueueWriterRunPackageReviewRoot && !mcpFetchCandidateQueueWriterRunReadinessRoot && !mcpFetchCandidateQueueWriterRunReceiptReviewRoot && !mcpFetchCandidateQueueWriterRunCloseoutReviewRoot && !mcpFetchCandidateQueueWriterPostCloseoutInventoryReviewRoot && !mcpFetchCandidateQueueWriterReviewHandoffRoot && !mcpFetchCandidateQueueWriterReviewInventoryRoot && !mcpFetchCandidateQueueWriterReviewDecisionRoot && !mcpFetchCandidateQueueWriterReviewDecisionApprovalRoot && !mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot && !manualSampleRoot && !sampleAcceptanceRoot && !sampleReviewRoot && !schedulerRoot && !matchReviewRoot && !opportunityRoot && !opportunityScoringRoot && !opportunityEvidenceRoot && !opportunityAlertRoot && !migrationRoot && !migrationDrillRoot && !catalogReviewRoot && !liveSmokeRoot && !liveInventoryRoot && !approvalRoot && !deployRoot) return;
if (!root && !writerRoot && !cliRoot && !dbProbeRoot && !seedDiffRoot && !legacyBridgeRoot && !mcpReadinessRoot && !mcpPreflightRoot && !mcpActivationRoot && !mcpFetchGateRoot && !mcpCompletionRoot && !mcpActivationEvidenceRoot && !mcpRuntimeSmokeRoot && !mcpRuntimePromotionRoot && !mcpManualFetchHandoffRoot && !mcpFetchTargetReviewRoot && !mcpFetchRunPackageRoot && !mcpFetchRunReadinessRoot && !mcpFetchRunReceiptRoot && !mcpFetchResultParserReviewRoot && !mcpFetchCandidateHandoffReviewRoot && !mcpFetchCandidateQueueReviewRoot && !mcpFetchCandidateQueueWriterPreflightRoot && !mcpFetchCandidateQueueWriterCliReviewRoot && !mcpFetchCandidateQueueWriterRunPackageReviewRoot && !mcpFetchCandidateQueueWriterRunReadinessRoot && !mcpFetchCandidateQueueWriterRunReceiptReviewRoot && !mcpFetchCandidateQueueWriterRunCloseoutReviewRoot && !mcpFetchCandidateQueueWriterPostCloseoutInventoryReviewRoot && !mcpFetchCandidateQueueWriterReviewHandoffRoot && !mcpFetchCandidateQueueWriterReviewInventoryRoot && !mcpFetchCandidateQueueWriterReviewDecisionRoot && !mcpFetchCandidateQueueWriterReviewDecisionApprovalRoot && !mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot && !mcpProfessionalSourceGovernanceRoot && !manualSampleRoot && !sampleAcceptanceRoot && !sampleReviewRoot && !schedulerRoot && !matchReviewRoot && !opportunityRoot && !opportunityScoringRoot && !opportunityEvidenceRoot && !opportunityAlertRoot && !migrationRoot && !migrationDrillRoot && !catalogReviewRoot && !liveSmokeRoot && !liveInventoryRoot && !approvalRoot && !deployRoot) return;
const meta = root ? root.querySelector('[data-market-intel-preview-meta]') : null;
const body = root ? root.querySelector('[data-market-intel-preview-body]') : null;
@@ -1878,6 +1905,12 @@
const mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightReview = mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot ? mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot.querySelector('[data-market-intel-mcp-fetch-candidate-queue-writer-review-decision-approval-writer-preflight-review]') : null;
const mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRefresh = mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot ? mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightRoot.querySelector('[data-market-intel-mcp-fetch-candidate-queue-writer-review-decision-approval-writer-preflight-refresh]') : null;
const mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightEndpoint = "{{ url_for('market_intel.market_intel_mcp_fetch_candidate_queue_writer_review_decision_approval_writer_preflight') }}";
const mcpProfessionalSourceGovernanceMeta = mcpProfessionalSourceGovernanceRoot ? mcpProfessionalSourceGovernanceRoot.querySelector('[data-market-intel-mcp-professional-source-governance-meta]') : null;
const mcpProfessionalSourceGovernanceBody = mcpProfessionalSourceGovernanceRoot ? mcpProfessionalSourceGovernanceRoot.querySelector('[data-market-intel-mcp-professional-source-governance-body]') : null;
const mcpProfessionalSourceGovernanceInput = mcpProfessionalSourceGovernanceRoot ? mcpProfessionalSourceGovernanceRoot.querySelector('[data-market-intel-mcp-professional-source-governance-input]') : null;
const mcpProfessionalSourceGovernanceReview = mcpProfessionalSourceGovernanceRoot ? mcpProfessionalSourceGovernanceRoot.querySelector('[data-market-intel-mcp-professional-source-governance-review]') : null;
const mcpProfessionalSourceGovernanceRefresh = mcpProfessionalSourceGovernanceRoot ? mcpProfessionalSourceGovernanceRoot.querySelector('[data-market-intel-mcp-professional-source-governance-refresh]') : null;
const mcpProfessionalSourceGovernanceEndpoint = "{{ url_for('market_intel.market_intel_mcp_professional_source_governance') }}";
const manualSampleMeta = manualSampleRoot ? manualSampleRoot.querySelector('[data-market-intel-manual-sample-meta]') : null;
const manualSampleBody = manualSampleRoot ? manualSampleRoot.querySelector('[data-market-intel-manual-sample-body]') : null;
const manualSampleRefresh = manualSampleRoot ? manualSampleRoot.querySelector('[data-market-intel-manual-sample-refresh]') : null;
@@ -5767,6 +5800,126 @@
}
};
const renderMcpProfessionalSourceGovernanceMeta = data => {
const summary = data.source_governance_summary || {};
mcpProfessionalSourceGovernanceMeta.innerHTML = [
`mode=${data.mode || 'unknown'}`,
`accepted=${data.mcp_professional_source_governance_accepted ? 'yes' : 'no'}`,
`gates=${data.passed_gate_count || 0}/${data.gate_count || 0}`,
`sources=${summary.source_count || 0}`,
`platforms=${summary.platform_count || 0}`,
`network=${data.external_network_executed ? 'executed' : 'blocked'}`
].map(item => `<span class="market-intel-pill">${escapeHtml(item)}</span>`).join('');
};
const renderMcpProfessionalSourceGovernanceBody = data => {
const blockers = (data.blocked_reasons || []).join(' / ');
const gates = data.gates || [];
const sources = data.sources || data.source_governance_summary?.sources || [];
const contract = data.source_contract || {};
const practices = contract.mainstream_practices || [];
const steps = data.next_operator_steps || [];
const renderCheck = (key, label, status) => `
<div class="market-intel-check">
<div>
<strong>${escapeHtml(key)}</strong>
<small>${escapeHtml(label || '')}</small>
</div>
<span>${escapeHtml(status)}</span>
</div>
`;
mcpProfessionalSourceGovernanceBody.innerHTML = `
<div class="market-intel-empty mb-3">This governance layer turns robots, sitemap, structured data, canonical URL, rate limit, public-data boundaries, and provenance into a source contract. The API does not fetch external sites, read robots/sitemap, write to the DB, or attach a scheduler.${blockers ? ` Blocked: ${escapeHtml(blockers)}` : ''}</div>
<div class="market-intel-deploy-grid">
<div data-market-intel-mcp-professional-source-governance-gates>
<p class="market-intel-deploy-section-title">SOURCE GATES</p>
<div class="market-intel-check-list">${
gates.length
? gates.map(item => renderCheck(item.key, item.label, item.passed ? 'PASS' : 'BLOCK')).join('')
: '<div class="market-intel-empty">No source gates provided yet.</div>'
}</div>
</div>
<div data-market-intel-mcp-professional-source-governance-sources>
<p class="market-intel-deploy-section-title">PUBLIC SOURCES</p>
<div class="market-intel-check-list">${
sources.length
? sources.map(source => renderCheck(
`${source.platform_code || 'unknown'}:${source.source_key || 'missing'}`,
`${source.canonical_url || source.source_url || 'missing'} / delay=${source.crawl_delay_seconds || 0}s / requests=${source.max_requests_per_run || 0}`,
source.source_url_safe && source.robots_allowed && source.structured_data_preferred && source.evidence_artifact_path_safe ? 'READY' : 'BLOCK'
)).join('')
: '<div class="market-intel-empty">No public sources provided yet.</div>'
}</div>
</div>
<div data-market-intel-mcp-professional-source-governance-practices>
<p class="market-intel-deploy-section-title">MAINSTREAM PRACTICES</p>
<div class="market-intel-check-list">${
practices.length
? practices.map(item => renderCheck(item.key, item.label, 'LOCKED')).join('')
: '<div class="market-intel-empty">No professional practices contract provided yet.</div>'
}</div>
</div>
<div data-market-intel-mcp-professional-source-governance-next>
<p class="market-intel-deploy-section-title">BOUNDARY / NEXT</p>
<div class="market-intel-check-list">
${renderCheck('next_gate', contract.next_gate || 'missing', contract.next_gate ? 'NEXT' : 'BLOCK')}
${renderCheck('api_boundary', 'no external fetch / no robots fetch / no sitemap fetch / no DB write / no file write / no scheduler', data.api_uses_external_network || data.api_fetches_robots_txt || data.api_fetches_sitemap || data.api_fetches_source_url || data.api_writes_database || data.api_writes_file || data.scheduler_attached ? 'BLOCK' : 'CLOSED')}
${steps.map((item, index) => renderCheck(`step_${index + 1}`, item, 'NEXT')).join('')}
</div>
</div>
</div>
`;
if (mcpProfessionalSourceGovernanceInput && !mcpProfessionalSourceGovernanceInput.value.trim() && data.sample_professional_source_governance_package) {
mcpProfessionalSourceGovernanceInput.value = JSON.stringify(data.sample_professional_source_governance_package, null, 2);
}
};
const loadMcpProfessionalSourceGovernance = async () => {
if (!mcpProfessionalSourceGovernanceMeta || !mcpProfessionalSourceGovernanceBody) return;
mcpProfessionalSourceGovernanceBody.innerHTML = '<div class="market-intel-empty">Loading Professional Source Governance...</div>';
try {
const response = await fetch(mcpProfessionalSourceGovernanceEndpoint, { credentials: 'same-origin' });
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const data = await response.json();
renderMcpProfessionalSourceGovernanceMeta(data);
renderMcpProfessionalSourceGovernanceBody(data);
} catch (error) {
mcpProfessionalSourceGovernanceMeta.innerHTML = '<span class="market-intel-pill">error</span>';
mcpProfessionalSourceGovernanceBody.innerHTML = `<div class="market-intel-empty">Failed to load Professional Source Governance: ${escapeHtml(error.message)}</div>`;
}
};
const reviewMcpProfessionalSourceGovernance = async () => {
if (!mcpProfessionalSourceGovernanceMeta || !mcpProfessionalSourceGovernanceBody || !mcpProfessionalSourceGovernanceInput) return;
let parsed;
try {
parsed = JSON.parse(mcpProfessionalSourceGovernanceInput.value || '{}');
} catch (error) {
mcpProfessionalSourceGovernanceMeta.innerHTML = '<span class="market-intel-pill">json_error</span>';
mcpProfessionalSourceGovernanceBody.innerHTML = `<div class="market-intel-empty">Invalid JSON: ${escapeHtml(error.message)}</div>`;
return;
}
mcpProfessionalSourceGovernanceBody.innerHTML = '<div class="market-intel-empty">Reviewing Professional Source Governance...</div>';
try {
const response = await fetch(mcpProfessionalSourceGovernanceEndpoint, {
method: 'POST',
credentials: 'same-origin',
headers: {
'Content-Type': 'application/json',
'X-CSRFToken': csrfToken
},
body: JSON.stringify({ professional_source_governance_package: parsed })
});
const data = await response.json();
if (!response.ok && !data.mode) throw new Error(`HTTP ${response.status}`);
renderMcpProfessionalSourceGovernanceMeta(data);
renderMcpProfessionalSourceGovernanceBody(data);
} catch (error) {
mcpProfessionalSourceGovernanceMeta.innerHTML = '<span class="market-intel-pill">error</span>';
mcpProfessionalSourceGovernanceBody.innerHTML = `<div class="market-intel-empty">Professional Source Governance review failed: ${escapeHtml(error.message)}</div>`;
}
};
const renderManualSampleMeta = data => {
manualSampleMeta.innerHTML = [
`mode=${data.mode || 'unknown'}`,
@@ -15308,6 +15461,12 @@
if (mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightReview) {
mcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflightReview.addEventListener('click', reviewMcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflight);
}
if (mcpProfessionalSourceGovernanceRefresh) {
mcpProfessionalSourceGovernanceRefresh.addEventListener('click', loadMcpProfessionalSourceGovernance);
}
if (mcpProfessionalSourceGovernanceReview) {
mcpProfessionalSourceGovernanceReview.addEventListener('click', reviewMcpProfessionalSourceGovernance);
}
if (manualSampleRefresh) {
manualSampleRefresh.addEventListener('click', loadManualSample);
}
@@ -15585,6 +15744,7 @@
loadMcpFetchCandidateQueueWriterReviewDecision();
loadMcpFetchCandidateQueueWriterReviewDecisionApproval();
loadMcpFetchCandidateQueueWriterReviewDecisionApprovalWriterPreflight();
loadMcpProfessionalSourceGovernance();
loadManualSample();
loadSampleAcceptance();
loadSampleReview();
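
The review handler on this page wraps the operator's JSON under the `professional_source_governance_package` key before POSTing it. A minimal Python sketch of building the same request body, for example in a smoke script. The function name `build_review_request_body` is hypothetical, and nothing is actually sent here:

```python
import json


def build_review_request_body(package_text):
    """Parse operator JSON and wrap it in the envelope the page sends."""
    # Mirror the page's behavior: empty input is treated as an empty object.
    parsed = json.loads(package_text or "{}")
    return json.dumps({"professional_source_governance_package": parsed})


body = build_review_request_body('{"operator_source_governance": {"sources": []}}')
```

Invalid input raises `json.JSONDecodeError`, which plays the same role as the page's JSON error branch. A real request would also need the session cookie and CSRF header that the page supplies.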

File diff suppressed because it is too large