docs(recovery): record conversation event index apply
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Failing after 3m18s
AWOOOI Harbor 110 Local Repair / workflow-shape (push) Successful in 0s
AI 技術雷達監控 / ai-technology-watch (push) Successful in 36s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
AWOOOI Harbor 110 Local Repair / harbor-110-local-repair (push) Has been cancelled

Your Name
2026-07-01 07:58:56 +08:00
parent 70edab2b9a
commit 1c6bc6ce08
3 changed files with 82 additions and 2 deletions

@@ -51016,3 +51016,20 @@ production browser smoke:
- Did not read secrets / tokens / `.env` / raw sessions / SQLite / auth; did not read `.runner` contents.
- Did not use GitHub / gh / GitHub API / GitHub Actions.
- Did not reboot any host; no Docker / Nginx / K3s / DB restart; no workflow_dispatch; no DB write / restore / prune.
## 2026-07-01 — 07:58 P0 188 hot-path index controlled apply
**Completed**
- Live evidence answering "why is it still high": the source migration had been pushed but not yet applied to the runtime DB; the live index list for `awooop_conversation_event` still held only the pkey, `idx_conv_event_run`, `idx_conv_event_subject`, and `uix_conv_event_dedup`; CD `#4182` had failed / was backlogged; Harbor repair `#4176` was still Waiting and missing `awoooi-host`.
- 188 preflight: `awooop_conversation_event` table size was about `93 MB`; before the apply, `k3s-postgres-recovery` was at about `7.9277` CPU cores and 188 had `load5=10.57`.
- Ran the repo migration `apps/api/migrations/awooop_conversation_event_hot_path_indexes_2026-07-01.sql` inside the `k3s-postgres-recovery` container on 188 over the postgres local socket; executed only `CREATE INDEX CONCURRENTLY IF NOT EXISTS`, with `lock_timeout=5s` and `statement_timeout=0`.
- Post-apply verifier: all 12 new hot-path indexes are `indisvalid=true` / `indisready=true`; `pg_stat_activity` converged to idle `35`, unknown `5`, active `1`; the first CPU readback showed `k3s-postgres-recovery=1.0552` cores; 20 seconds later it was no longer among the top 3 CPU containers on 188.
- Added the receipt `docs/operations/awooop-conversation-event-hot-path-index-apply-receipt-2026-07-01.snapshot.json` and recorded this lesson in `docs/runbooks/FULL-STACK-COLD-START-SOP.md` v1.84.
**Still in effect**
- Did not read secrets / tokens / `.env` / raw sessions / SQLite / auth; did not read `.runner` contents.
- Did not use GitHub / gh / GitHub API / GitHub Actions.
- Did not reboot any host; no Docker / Nginx / K3s / DB restart; no workflow_dispatch; no DROP / TRUNCATE / restore / prune.
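**Next steps**
- 188 DB CPU has dropped; 110 is still high, and the cause remains the `gitea` / queue / `awoooi-host` control path: 110 `load5=27.22`, `gitea=3.4019` cores, Harbor repair `#4176 Waiting`, no matching `awoooi-host`. The mainline next step is to continue 110 Gitea queue / controlled lane recovery, without restoring the generic runner and without rebooting the host.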
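
The post-apply check described in this entry can be sketched as follows. This is a minimal illustration, not the repo's verifier: the catalog query, the `IndexState` tuple, and `unhealthy_indexes` are assumptions introduced here. The check matters because a `CREATE INDEX CONCURRENTLY` that fails partway leaves an invalid index behind, and a later `IF NOT EXISTS` run would then skip it by name.

```python
from typing import Iterable, NamedTuple

# Catalog query a verifier could run (assumed; not taken from the repo).
INDEX_STATE_SQL = """
SELECT c.relname, i.indisvalid, i.indisready
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE i.indrelid = 'awooop_conversation_event'::regclass
"""


class IndexState(NamedTuple):
    """One row of INDEX_STATE_SQL (hypothetical helper type)."""
    name: str
    indisvalid: bool
    indisready: bool


def unhealthy_indexes(rows: Iterable[IndexState],
                      expected: Iterable[str]) -> dict[str, str]:
    """Map each expected index that is missing or unusable to a reason."""
    seen = {row.name: row for row in rows}
    problems: dict[str, str] = {}
    for name in expected:
        row = seen.get(name)
        if row is None:
            problems[name] = "missing"
        elif not (row.indisvalid and row.indisready):
            problems[name] = "invalid_or_not_ready"
    return problems
```

An empty result for all 12 expected names corresponds to the "all `indisvalid=true` / `indisready=true`" outcome recorded above.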

@@ -0,0 +1,61 @@
{
  "schema_version": "awooop_conversation_event_hot_path_index_apply_receipt_v1",
  "generated_at": "2026-07-01T07:58:00+08:00",
  "scope": "188:k3s-postgres-recovery:awoooi_prod:awooop_conversation_event",
  "source_commit": "c29771a2d1b592e94fe3a1051b3a9d3842ec20f4",
  "migration": "apps/api/migrations/awooop_conversation_event_hot_path_indexes_2026-07-01.sql",
  "rollback": "apps/api/migrations/awooop_conversation_event_hot_path_indexes_2026-07-01_down.sql",
  "operation": {
    "type": "controlled_db_migration",
    "statements": "CREATE INDEX CONCURRENTLY IF NOT EXISTS only",
    "lock_timeout": "5s",
    "statement_timeout": "0",
    "runtime_write_performed": true,
    "destructive_db_operation_performed": false,
    "drop_truncate_restore_performed": false,
    "service_restart_performed": false,
    "secret_value_read": false,
    "runner_token_read": false
  },
  "pre_apply": {
    "table_size": "93 MB",
    "indexes_present": [
      "awooop_conversation_event_pkey",
      "idx_conv_event_run",
      "idx_conv_event_subject",
      "uix_conv_event_dedup"
    ],
    "k3s_postgres_recovery_cpu_cores": 7.9277,
    "host_188_load5": 10.57
  },
  "post_apply": {
    "indexes_valid_ready": [
      "idx_awooop_conv_event_project_provider_event_recent",
      "idx_awooop_conv_event_project_provider_lower_recent",
      "idx_awooop_conv_event_project_provider_recent",
      "idx_awooop_conv_event_project_run_id_text_recent",
      "idx_awooop_conv_event_source_refs_alert_ids_gin",
      "idx_awooop_conv_event_source_refs_approval_ids_gin",
      "idx_awooop_conv_event_source_refs_event_ids_gin",
      "idx_awooop_conv_event_source_refs_fingerprints_gin",
      "idx_awooop_conv_event_source_refs_incident_ids_gin",
      "idx_awooop_conv_event_source_refs_sentry_issue_ids_gin",
      "idx_awooop_conv_event_source_refs_signoz_alerts_gin",
      "idx_conv_event_recent"
    ],
    "pg_stat_activity": {
      "idle": 35,
      "unknown": 5,
      "active": 1
    },
    "k3s_postgres_recovery_cpu_cores_after_first_readback": 1.0552,
    "k3s_postgres_recovery_top3_after_20_seconds": false,
    "host_188_load5_after_20_seconds": 10.04
  },
  "remaining_blockers": [
    "host_110_gitea_cpu_pressure",
    "harbor_110_repair_no_matching_runner:awoooi-host",
    "cd_4182_failure_or_waiting_backlog"
  ],
  "safe_next_step": "continue_110_gitea_queue_control_path_recovery_without_generic_runner_or_host_reboot"
}
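
A receipt of this shape lends itself to a mechanical sanity check before it is cited as evidence. The sketch below is illustrative only: `receipt_violations` and the chosen invariants are assumptions, not an existing repo tool. It uses only field names that appear in the receipt above.

```python
# Flags under "operation" that the receipt asserts did NOT happen.
MUST_BE_FALSE = (
    "destructive_db_operation_performed",
    "drop_truncate_restore_performed",
    "service_restart_performed",
    "secret_value_read",
    "runner_token_read",
)


def receipt_violations(receipt: dict) -> list[str]:
    """Return human-readable problems; an empty list means no violation found."""
    violations: list[str] = []
    operation = receipt.get("operation", {})
    for flag in MUST_BE_FALSE:
        # A missing flag counts as a violation: absence is not evidence.
        if operation.get(flag) is not False:
            violations.append(f"operation.{flag} must be false")

    pre = set(receipt.get("pre_apply", {}).get("indexes_present", []))
    post = set(receipt.get("post_apply", {}).get("indexes_valid_ready", []))
    if not post:
        violations.append("post_apply.indexes_valid_ready is empty")
    # An index reported as newly valid should not already be in the pre-apply list.
    for name in sorted(pre & post):
        violations.append(f"{name} listed both pre-apply and as newly applied")
    return violations
```

Typical use would be `receipt_violations(json.load(open(path)))` on the snapshot file, treating any non-empty result as a reason not to rely on the receipt.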

@@ -1,7 +1,7 @@
# AWOOOI Full-Stack Cold Start and Host Reboot SOP
> Version: v1.83
> Last updated: 2026-06-30 Asia/Taipei
> Version: v1.84
> Last updated: 2026-07-01 Asia/Taipei
> Scope: 110 / 120 / 121 / 188 full-stack reboot recovery. 112 Kali is recorded as P3 optional and is not part of this recovery path.
---
@@ -22,6 +22,8 @@ v1.80 / v1.81 credential escrow intake scorecard rule: same-round owner response
v1.83 Gitea CD running retry rule: `read-public-gitea-actions-queue.py --json` must check both `latest_visible_cd_failure_classifier` and `latest_visible_cd_inflight_classifier`. While the final `BLOCKER harbor_registry_public_route_unavailable` has not yet appeared, as long as `latest_visible_cd_harbor_public_route_retrying_unavailable=true` and `latest_visible_cd_harbor_latest_registry_v2_status` is not `200/401`, treat it as in-flight production deploy blocker evidence; if the Harbor repair workflow is at the same time `Waiting` or has no matching `awoooi-host`, the next step is to restore the 110 local repair control path, not to wait for the CD timeout, rerun an ineffective CD, use workflow_dispatch, or treat `Running` as meaning the latest version is deployed.
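
The v1.83 rule can be read as a small decision function. The sketch below rests on several assumptions that are not confirmed by this document: that the `--json` output is a flat object keyed by the field names in the rule, that the two in-flight conditions must both hold, and that the repair state is available under the hypothetical keys `harbor_repair_status` and `harbor_repair_no_matching_runner`.

```python
from typing import Optional


def cd_inflight_blocker(queue: dict) -> bool:
    """True when the in-flight CD run already counts as deploy-blocker evidence."""
    retrying = queue.get(
        "latest_visible_cd_harbor_public_route_retrying_unavailable") is True
    status = queue.get("latest_visible_cd_harbor_latest_registry_v2_status")
    # Compare as strings so both 200 and "200" are accepted (type is assumed).
    return retrying and str(status) not in {"200", "401"}


def next_step(queue: dict) -> Optional[str]:
    """Return the rule's next step, or None when the rule does not apply."""
    if not cd_inflight_blocker(queue):
        return None
    # Hypothetical keys describing the Harbor repair workflow state.
    repair_waiting = queue.get("harbor_repair_status") == "Waiting"
    no_runner = queue.get("harbor_repair_no_matching_runner") is True
    if repair_waiting or no_runner:
        return "restore_110_local_repair_control_path"
    return None
```

Returning `None` rather than a default action keeps the sketch from inventing behavior for cases the rule does not cover.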
2026-07-01 07:58 live host-pressure update: the sustained high CPU on 188 was not ordinary post-reboot noise but hot-path index drift on `awooop_conversation_event` in `k3s-postgres-recovery`. The live DB had only `awooop_conversation_event_pkey`, `idx_conv_event_run`, `idx_conv_event_subject`, and `uix_conv_event_dedup` left, and was missing the base `idx_conv_event_recent` and the provider/source_refs hot-path indexes; at that time `k3s-postgres-recovery` was at about `7.9277` CPU cores and 188 had `load5=10.57`. A controlled DB migration was applied per `apps/api/migrations/awooop_conversation_event_hot_path_indexes_2026-07-01.sql`, executing only `CREATE INDEX CONCURRENTLY IF NOT EXISTS` with `lock_timeout=5s`, with no DROP / TRUNCATE / restore / DB restart / Docker restart / secret read. The post-apply verifier showed all 12 new indexes with `indisvalid=true` and `indisready=true`, `pg_stat_activity` converged to active `1`, the first readback showed `k3s-postgres-recovery` down to about `1.0552` cores, and 20 seconds later it was no longer among the top 3 CPU containers on 188. Receipt: `docs/operations/awooop-conversation-event-hot-path-index-apply-receipt-2026-07-01.snapshot.json`. The high load on 110 is not the same DB problem: `gitea` on 110 is still at about `3.4019` cores, and the public queue is still `blocked_harbor_110_repair_no_matching_runner` / `awoooi-host`; the next step is fixed as 110 Gitea queue / controlled lane recovery; the generic runner must not be restored and the host must not be rebooted.
v1.82 bounded summary rule: the SSH helpers in `post-start-quick-check.sh` and `188-host-hygiene-maintenance-checklist.sh` must have a command timeout, a single connection attempt, ServerAlive, and no password prompt; whenever any 110 / 188 read-only control path hangs, it must converge to a blocker / evidence instead of letting `post-reboot-readiness-summary.sh` wait indefinitely. If backup / escrow evidence cannot be read, `ESCROW_MISSING_COUNT=unknown` must be emitted together with `DR_ESCROW_BLOCKED=1` and `DR_ESCROW_EVIDENCE_UNKNOWN=1`, and `backup_core_readback_recovery` and `credential_escrow_evidence` must be added to `NEXT_REQUIRED_GATES`; unknown must not be interpreted as DR or backup green.
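
The bounded-helper requirements in the v1.82 rule map onto standard OpenSSH client options. The sketch below shows one possible mapping in Python rather than the shell used by the actual scripts; the function names, timeout values, and returned dictionary shapes are assumptions for illustration, and the comma-separated `NEXT_REQUIRED_GATES` format is likewise assumed.

```python
import subprocess


def bounded_ssh_argv(host: str, remote_command: str,
                     connect_timeout: int = 10) -> list[str]:
    """Build an ssh command line that cannot prompt or retry indefinitely."""
    return [
        "ssh",
        "-o", "BatchMode=yes",            # never prompt for a password
        "-o", "ConnectionAttempts=1",     # single connection attempt
        "-o", f"ConnectTimeout={connect_timeout}",
        "-o", "ServerAliveInterval=5",    # detect a dead session
        "-o", "ServerAliveCountMax=2",
        host,
        remote_command,
    ]


def run_bounded(host: str, remote_command: str,
                command_timeout: float = 30.0) -> dict:
    """Run a read-only check; a hang becomes blocker evidence, not a wait."""
    try:
        done = subprocess.run(bounded_ssh_argv(host, remote_command),
                              capture_output=True, text=True,
                              timeout=command_timeout)
    except subprocess.TimeoutExpired:
        return {"status": "blocker", "evidence": "ssh_command_timeout"}
    if done.returncode != 0:
        return {"status": "blocker", "evidence": f"ssh_exit_{done.returncode}"}
    return {"status": "ok", "stdout": done.stdout}


def escrow_unknown_summary() -> dict[str, str]:
    """Summary keys to emit when backup / escrow evidence cannot be read."""
    return {
        "ESCROW_MISSING_COUNT": "unknown",
        "DR_ESCROW_BLOCKED": "1",
        "DR_ESCROW_EVIDENCE_UNKNOWN": "1",
        "NEXT_REQUIRED_GATES":
            "backup_core_readback_recovery,credential_escrow_evidence",
    }
```

The overall `timeout` on `subprocess.run` covers the case where the connection succeeds but the remote command itself stalls, which the ssh options alone do not bound.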
2026-06-29 09:13 previous live summary: the `scripts/reboot-recovery/post-reboot-readiness-summary.sh --no-color` artifact `/tmp/awoooi-post-reboot-readiness-20260629-091918/summary.txt` returned `POST_START_RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`, `POST_START_SERVICE_WARNINGS=0`, `SERVICE_GREEN=1`, `PRODUCT_DATA_GREEN=1`, `STOCK_FRESHNESS_STATUS=ok`, `STOCK_LATEST_TRADING_DATE=2026-06-26`, `BACKUP_CORE_GREEN=1`, `HOST_188_HYGIENE_BLOCKED=0`, `WAZUH_MANAGER_REGISTRY_ACCEPTED=6`, `RUNTIME_ACTION_AUTHORIZED=0`, `NEXT_REQUIRED_GATES=credential_escrow_evidence`. This baseline has been superseded by the evidence collected after the 2026-06-30 20:18 full-host reboot and must no longer be used to claim the current state is green.