fix(recovery): bound 110 pressure readback
Some checks failed
CD Pipeline / workflow-shape (push) Has been cancelled
CD Pipeline / cancel-stale-cd (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
@@ -25,6 +25,24 @@
 **Boundaries**: Changed only the Gitea public queue readback / closure verifier / tests / LOGBOOK; did not use GitHub / `gh` / the GitHub API; no workflow_dispatch; did not read secrets / tokens / `.env` / raw sessions / SQLite / auth; did not read the contents of authorized_keys or `.runner`; did not run a 110 runtime apply; no reboot, and no restart of Docker / Nginx / K3s / DB / firewall.

+## 2026-07-01 — 13:35 110 CPU/load misattribution fix and cold-start bounded readback
+
+**Issues fixed along the mainline**:
+- Live 110 still shows a high `node_load5`, but the real-time Prometheus CPU mode does not show all 12 cores saturated; the mtime of `docker_stats.prom` can be stale, so an old `docker_container_cpu_cores{container_name="gitea"}` sample must not be treated as the current CPU culprit.
+- `scripts/reboot-recovery/diagnose-110-ssh-publickey-auth.sh` now emits `DOCKER_STATS_TEXTFILE_MTIME_SECONDS`, `DOCKER_STATS_TEXTFILE_AGE_SECONDS`, and `DOCKER_STATS_TEXTFILE_FRESHNESS=fresh|stale|missing`, and explicitly classifies `scrape_error timeout` / `systemctl_timeout_budget_exhausted` as a 110 systemd control-plane timeout.
+- The 110 `systemctl` readback in `scripts/reboot-recovery/full-stack-cold-start-check.sh` is now wrapped in a 3-second bounded helper; if SSH to 110 fails, it always emits `SSH_110_BLOCKER remote_control_channel_unavailable` and `SSH_110_NEXT_ACTION local_console_run_recover_110_control_path_and_harbor_local_check`, so the cold-start scorecard cannot stall in an unpredictable wait.
+- `scripts/ci/wait-host-web-build-pressure.sh` no longer uses a `docker_stats.prom` older than 300 seconds as a container CPU blocker or for attribution, so stale samples cannot pull the mainline onto a mistaken Gitea CPU side branch.
+
+**Verification**:
+- `python3.11 -m pytest scripts/reboot-recovery/tests/test_cold_start_monitor_bounded_probes.py ops/runner/test_cd_controlled_runtime_profile.py -q`: `44 passed`.
+- `bash -n scripts/reboot-recovery/full-stack-cold-start-check.sh scripts/reboot-recovery/diagnose-110-ssh-publickey-auth.sh scripts/ci/wait-host-web-build-pressure.sh` and `git diff --check`: passed.
+
+**Still maintained**:
+- Did not read secrets / tokens / `.env` / raw sessions / SQLite / auth; did not read the contents of `.runner` / authorized_keys; did not use GitHub / `gh` / the GitHub API; no workflow_dispatch; no host reboot, no Docker / Nginx / K3s / DB restart, no firewall change.
+
+**Next steps**:
+- 110 still needs `scripts/reboot-recovery/recover-110-control-path-and-harbor-local.sh --check` to be run from the local console or a recovered SSH control path; while external SSH userauth still times out, do not retry the Harbor workflow, which only adds queue pressure.
+
 ## 2026-07-01 — 13:11 P0-7 Windows 99 VMware autostart controlled apply

 **Issues fixed along the mainline**:
@@ -70,6 +70,8 @@ v1.82 bounded summary rule: `post-start-quick-check.sh` and `188-host-hygiene-m
2026-06-25 21:14 StockPlatform natural-cron / full-wrapper refresh supersedes the 20:25 product-data blocker wording. After waiting for official schedules instead of manual ingestion, `intelligence-sync` 21:00 finished `status=0`, `core.margin_short_daily` reached `2026-06-25` / 1976 rows, and `ai-recommendation-pipeline` 21:10 finished `STOCKPLATFORM_AI_RECOMMENDATION_PIPELINE_OK as_of_date=2026-06-25` with `draft_count=120`, `candidate_count=120`, and `rag_documents=1000`. StockPlatform `/api/v1/system/freshness` now returns `status=ok`, `latest_trading_date=2026-06-25`, blockers `[]`, with price / chips / margin / AI recommendations all on `2026-06-25`. The 21:14 full wrapper returns cold-start `PASS=89 WARN=0 BLOCKED=0` and overall `POST_START_QUICK_CHECK PASS=38 WARN=2 BLOCKED=0`, `RESULT=FULL_STACK_GREEN_DR_ESCROW_BLOCKED`. The only remaining recovery red gate is DR credential escrow evidence `escrow_missing=5`; Wazuh manager registry accepted remains `0` as a security evidence blocker, not a reboot service blocker.
+2026-07-01 13:35 110 CPU/load interpretation rule update: check the mtime / freshness of `docker_stats.prom` first; a file older than 300 seconds must not be used for current container CPU attribution or as a blocker. If 110 `node_load5` is high but Prometheus CPU mode still shows idle, `awoooi_host_gitea_actions_active_process_count=0`, and orphan browser count=0, the main blocker must not be misrecorded as Gitea / Playwright / Stock smoke CPU. In that case, look first at `NODE_LOAD_CLASSIFIER`, `DOCKER_STATS_TEXTFILE_FRESHNESS`, and `SYSTEMD_UNIT ... classifier=systemctl_show_timeout|systemctl_timeout_budget_exhausted` from `diagnose-110-ssh-publickey-auth.sh`. When external SSH userauth times out, cold-start must emit `SSH_110_BLOCKER remote_control_channel_unavailable` and `SSH_110_NEXT_ACTION local_console_run_recover_110_control_path_and_harbor_local_check`; the next step is to run `recover-110-control-path-and-harbor-local.sh --check` from the 110 local console or a recovered control path, not to rerun the Harbor workflow or to blame Gitea on the basis of old docker stats.
+
2026-06-25 20:25 orphan Chrome cleanup / scorecard refresh supersedes the 20:11 CPU wording. 110 high CPU was traced to two `stockplatform-review-bulk-ux` Chrome process groups `2756503` and `2829627` with root Chrome process `PPID=1`, elapsed about 5h, no active parent smoke, and sustained GPU/renderer CPU. With user approval, only those two process groups received targeted `SIGTERM` at 20:24. Post-check showed no remaining PGID entries; `vmstat` showed CPU idle around `85-90%`, `si/so=0`, and no immediate swap thrash. No Docker/systemd/Nginx/firewall/K8s action, CI cancellation, manual data ingestion, manual DB write, Wazuh/SOC runtime change, or secret read was performed. The 20:25 full post-start wrapper then returned cold-start `PASS=89 WARN=0 BLOCKED=0`, but overall `POST_START_QUICK_CHECK PASS=37 WARN=2 BLOCKED=1`, `RESULT=BLOCKED`, because StockPlatform data freshness was still blocked at that time and DR remained incomplete.
2026-06-25 20:11 StockPlatform cron-source recovery supersedes the 19:35 source-version wording. StockPlatform Gitea `main` and live `/home/wooo/stockplatform-v2` are now at `fb91aa4c6272469d1d26e0820169629eac17d28a fix(ops): restore production cron recovery entrypoints`; six missing production cron entrypoint scripts are restored, `run-intelligence-sync.sh` contains the Docker-backed `psql` shim, and live contract check confirms every `scripts/ops/*.sh` referenced by `install-production-cron.sh` exists. The only live write performed for StockPlatform recovery was a fast-forward `git pull --ff-only origin main` on 110; no Docker/systemd/Nginx/firewall/K8s restart, manual ingestion run, manual DB write, or secret read was performed. Natural cron evidence after the pull is now green for the repaired entrypoints: `source-remediation-queue` 19:56 and 20:00 succeeded, `market-index-ingestion` 20:00 succeeded, `price-ingestion` 20:02 succeeded, `margin-short-ingestion` 20:05 succeeded, `chips-ingestion` 20:06 succeeded, and `ai-recommendation-pipeline` 20:10 ran but correctly produced the internal blocker `core_margin_short_daily_incomplete,official_margin_short_daily_official_pending`. StockPlatform `/api/v1/system/freshness` therefore still returns `status=blocked` because the 2026-06-25 official margin-short source is pending and `ai.recommendations` must stay on 2026-06-24 until that gate clears. This is no longer a route, source-version, or missing-cron-script blocker; it is a product-data freshness blocker waiting on official source availability and the next valid AI pipeline run.
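Editorial sketch, not part of the commit: the 300-second freshness rule recorded in the LOGBOOK hunks above can be illustrated in Python roughly as follows. The function names `textfile_freshness` and `may_attribute_container_cpu` are hypothetical; only the 300-second threshold and the `fresh` / `stale` / `missing` labels come from the source.

```python
import os
import time
from typing import Optional

# Assumption: mirrors the 300-second threshold named in the LOGBOOK entry.
MAX_AGE_SECONDS = 300


def textfile_freshness(path: str, now: Optional[float] = None,
                       max_age: float = MAX_AGE_SECONDS) -> str:
    """Classify a textfile as "fresh", "stale", or "missing" by its mtime."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # Unreadable or absent file: there is no sample to judge at all.
        return "missing"
    if now is None:
        now = time.time()
    return "fresh" if (now - mtime) <= max_age else "stale"


def may_attribute_container_cpu(path: str, now: Optional[float] = None) -> bool:
    """Only a fresh sample may back a current container CPU attribution."""
    return textfile_freshness(path, now) == "fresh"
```

Under this sketch, a stale or missing `docker_stats.prom` simply yields no attribution, which matches the intent stated above that old samples must not name a CPU culprit.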
@@ -10,6 +10,7 @@ CD_WORKFLOW = ROOT / ".gitea" / "workflows" / "cd.yaml"
 HARBOR_110_REPAIR_WORKFLOW = (
     ROOT / ".gitea" / "workflows" / "harbor-110-local-repair.yaml"
 )
+WAIT_HOST_PRESSURE = ROOT / "scripts" / "ci" / "wait-host-web-build-pressure.sh"


 def _workflow_text() -> str:
@@ -117,6 +118,15 @@ def test_harbor_login_has_public_route_retry_and_safe_secret_transport() -> None
     assert "--password " not in block


+def test_host_pressure_gate_ignores_stale_docker_stats_for_cpu_attribution() -> None:
+    text = WAIT_HOST_PRESSURE.read_text(encoding="utf-8")
+
+    assert "MAX_DOCKER_METRICS_AGE_SECONDS" in text
+    assert 'mtime="$(stat -c %Y "$DOCKER_METRICS_FILE"' in text
+    assert 'if [ "$age" -gt "$MAX_DOCKER_METRICS_AGE_SECONDS" ]; then' in text
+    assert "docker_container_cpu_cores" in text
+
+
 def test_harbor_110_local_repair_workflow_is_dispatch_only_and_bounded() -> None:
     text = HARBOR_110_REPAIR_WORKFLOW.read_text(encoding="utf-8")
@@ -35,6 +35,7 @@ MAX_ACTIVE_CI_PROCESS_GROUPS="${HOST_WEB_BUILD_PRESSURE_MAX_ACTIVE_CI_PROCESS_GR
 MAX_ACTIVE_CI_CONTAINERS="${HOST_WEB_BUILD_PRESSURE_MAX_ACTIVE_CI_CONTAINERS:-1}"
 MAX_ORPHAN_BROWSER_GROUPS="${HOST_WEB_BUILD_PRESSURE_MAX_ORPHAN_BROWSER_GROUPS:-0}"
 MAX_POSTGRES_CPU_CORES="${HOST_WEB_BUILD_PRESSURE_MAX_POSTGRES_CPU_CORES:-2.0}"
+MAX_DOCKER_METRICS_AGE_SECONDS="${HOST_WEB_BUILD_PRESSURE_MAX_DOCKER_METRICS_AGE_SECONDS:-300}"
 POSTGRES_CONTAINER_NAME="${HOST_WEB_BUILD_PRESSURE_POSTGRES_CONTAINER:-k3s-postgres-recovery}"
 METRICS_FILE="${HOST_RUNAWAY_PROCESS_METRICS_FILE:-${HOST_WEB_BUILD_PRESSURE_METRICS_FILE:-/home/wooo/node_exporter_textfiles/host_runaway_process.prom}}"
 DEFAULT_DOCKER_METRICS_FILE="/home/$(id -un)/node_exporter_textfiles/docker_stats.prom"
@@ -122,6 +123,13 @@ docker_metric_labeled_value() {
   if [ ! -r "$DOCKER_METRICS_FILE" ]; then
     return 1
   fi
+  local now mtime age
+  now="$(date +%s)"
+  mtime="$(stat -c %Y "$DOCKER_METRICS_FILE" 2>/dev/null || stat -f %m "$DOCKER_METRICS_FILE" 2>/dev/null || echo 0)"
+  age=$((now - mtime))
+  if [ "$age" -gt "$MAX_DOCKER_METRICS_AGE_SECONDS" ]; then
+    return 1
+  fi
   awk -v metric="$name" -v key="$label_key" -v val="$label_value" '
     $1 ~ ("^" metric "\\{") && $0 ~ (key "=\"" val "\"") {
       value = $NF
@@ -67,13 +67,26 @@ probe_node_exporter() {
   echo "NODE_EXPORTER=ok"
   awk '
     $1 == "node_boot_time_seconds" {print "NODE_BOOT_TIME_SECONDS="$2}
-    $1 == "node_time_seconds" {print "NODE_TIME_SECONDS="$2}
+    $1 == "node_time_seconds" {node_time=$2; print "NODE_TIME_SECONDS="$2}
     $1 == "node_load1" {print "NODE_LOAD1="$2}
     $1 == "node_load5" {print "NODE_LOAD5="$2}
     $1 == "node_load15" {print "NODE_LOAD15="$2}
     $1 == "node_procs_blocked" {print "NODE_PROCS_BLOCKED="$2}
     $1 == "node_memory_MemAvailable_bytes" {print "NODE_MEM_AVAILABLE_BYTES="$2}
     $1 == "node_memory_MemTotal_bytes" {print "NODE_MEM_TOTAL_BYTES="$2}
+    /^node_textfile_mtime_seconds/ && /docker_stats\.prom/ {docker_stats_mtime=$NF}
+    END {
+      if (docker_stats_mtime != "") {
+        print "DOCKER_STATS_TEXTFILE_MTIME_SECONDS=" docker_stats_mtime
+        if (node_time != "") {
+          age = node_time - docker_stats_mtime
+          printf "DOCKER_STATS_TEXTFILE_AGE_SECONDS=%.0f\n", age
+          print "DOCKER_STATS_TEXTFILE_FRESHNESS=" (age <= 300 ? "fresh" : "stale")
+        }
+      } else {
+        print "DOCKER_STATS_TEXTFILE_FRESHNESS=missing"
+      }
+    }
   ' <<<"$metrics"
   cpu_count="$(awk -F'cpu="' '/^node_cpu_seconds_total/ {split($2, a, "\""); seen[a[1]]=1} END {for (cpu in seen) n++; print n+0}' <<<"$metrics")"
   load1="$(awk '$1 == "node_load1" {print $2; exit}' <<<"$metrics")"
@@ -96,8 +109,10 @@ probe_node_exporter() {
       sub(/^.*sub_state="/, "", substate)
       sub(/".*$/, "", substate)
       classifier=active
-      if (active == "scrape_error" && substate ~ /timed out/) {
+      if (active == "scrape_error" && substate ~ /(timed out|timeout)/) {
         classifier="systemctl_show_timeout"
+      } else if (active == "scrape_skipped" && substate ~ /systemctl_timeout_budget_exhausted/) {
+        classifier="systemctl_timeout_budget_exhausted"
       }
       printf "SYSTEMD_UNIT unit=%s active_state=%s classifier=%s\n", unit, active, classifier
     }
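Editorial sketch, not part of the commit: the awk classifier in the hunk above maps an `active_state` / `sub_state` pair to a label. A rough Python equivalent, with the hypothetical function name `classify_systemd_unit`, might look like this; the two classifier strings and the match patterns are taken from the diff, everything else is an assumption for illustration.

```python
import re

# Same alternation the new awk condition uses: "timed out" or "timeout".
_TIMEOUT_RE = re.compile(r"timed out|timeout")


def classify_systemd_unit(active_state: str, sub_state: str) -> str:
    """Return the classifier label for one scraped systemd unit row."""
    if active_state == "scrape_error" and _TIMEOUT_RE.search(sub_state):
        return "systemctl_show_timeout"
    if (active_state == "scrape_skipped"
            and "systemctl_timeout_budget_exhausted" in sub_state):
        return "systemctl_timeout_budget_exhausted"
    # Default mirrors `classifier=active`: pass the active state through.
    return active_state
```

Any row that matches neither timeout case keeps its original state as the label, so an ordinary `active` unit is reported unchanged.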
@@ -312,36 +312,43 @@ check_110() {
   fi

   if ! out=$(host_cmd "wooo@192.168.0.110" '
+    sc() {
+      if command -v timeout >/dev/null 2>&1; then
+        timeout 3 systemctl "$@" 2>/dev/null || true
+      else
+        systemctl "$@" 2>/dev/null || true
+      fi
+    }
     echo "HOST $(hostname) $(uptime)"
     echo "MEM $(free -h | awk "/Mem:/ {print \$2,\$3,\$7}")"
-    echo "DOCKER_SYSTEMD $(systemctl is-active docker 2>/dev/null || true)"
+    echo "DOCKER_SYSTEMD $(sc is-active docker)"
     echo "HARBOR_CODE $(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://127.0.0.1:5000/v2/ || true)"
     echo "GITEA_CODE $(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://127.0.0.1:3001/ || true)"
     echo "PROM_CODE $(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://127.0.0.1:9090/-/ready || true)"
     echo "AM_CODE $(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://127.0.0.1:9093/-/healthy || true)"
     echo "SENTRY_CODE $(curl -s -o /dev/null -w "%{http_code}" --max-time 8 http://127.0.0.1:9000/ || true)"
-    echo "ACTION_RUNNER_UNIT_FILE_COUNT $(systemctl list-unit-files "actions.runner.*" --no-legend --plain 2>/dev/null | awk "END {print NR+0}")"
-    echo "ACTION_RUNNER_ACTIVE_COUNT $(systemctl list-units "actions.runner.*" --state=active --no-legend --plain 2>/dev/null | awk "END {print NR+0}")"
-    echo "ACTION_RUNNER_ENABLED_COUNT $(systemctl list-unit-files "actions.runner.*" --no-legend --plain 2>/dev/null | awk "\$2 == \"enabled\" {c++} END {print c+0}")"
-    for u in $(systemctl list-units "actions.runner.*" --all --no-legend --plain 2>/dev/null | awk "{print \$1}"); do
-      systemctl show "$u" -p ActiveState -p SubState -p CPUQuotaPerSecUSec -p MemoryMax -p WatchdogUSec -p NRestarts | sed "s/^/RUNNER $u /"
+    echo "ACTION_RUNNER_UNIT_FILE_COUNT $(sc list-unit-files "actions.runner.*" --no-legend --plain | awk "END {print NR+0}")"
+    echo "ACTION_RUNNER_ACTIVE_COUNT $(sc list-units "actions.runner.*" --state=active --no-legend --plain | awk "END {print NR+0}")"
+    echo "ACTION_RUNNER_ENABLED_COUNT $(sc list-unit-files "actions.runner.*" --no-legend --plain | awk "\$2 == \"enabled\" {c++} END {print c+0}")"
+    for u in $(sc list-units "actions.runner.*" --all --no-legend --plain | awk "{print \$1}"); do
+      sc show "$u" -p ActiveState -p SubState -p CPUQuotaPerSecUSec -p MemoryMax -p WatchdogUSec -p NRestarts | sed "s/^/RUNNER $u /"
     done
     for u in awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
-      load=$(systemctl show "$u" -p LoadState --value 2>/dev/null || true)
-      unitfile=$(systemctl show "$u" -p UnitFileState --value 2>/dev/null || true)
-      active=$(systemctl show "$u" -p ActiveState --value 2>/dev/null || true)
-      mainpid=$(systemctl show "$u" -p MainPID --value 2>/dev/null || true)
+      load=$(sc show "$u" -p LoadState --value)
+      unitfile=$(sc show "$u" -p UnitFileState --value)
+      active=$(sc show "$u" -p ActiveState --value)
+      mainpid=$(sc show "$u" -p MainPID --value)
       unit_ok=0
       if [ "$load" = "masked" ] && [ "$unitfile" = "masked" ] && [ "$active" = "inactive" ]; then
         unit_ok=1
       fi
       echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active mainpid=$mainpid ok=$unit_ok"
     done
-    cd_lane_load=$(systemctl show awoooi-cd-lane.service -p LoadState --value 2>/dev/null || true)
-    cd_lane_unitfile=$(systemctl show awoooi-cd-lane.service -p UnitFileState --value 2>/dev/null || true)
-    cd_lane_active=$(systemctl show awoooi-cd-lane.service -p ActiveState --value 2>/dev/null || true)
-    cd_lane_mainpid=$(systemctl show awoooi-cd-lane.service -p MainPID --value 2>/dev/null || true)
-    cd_lane_execstart=$(systemctl show awoooi-cd-lane.service -p ExecStart --value 2>/dev/null || true)
+    cd_lane_load=$(sc show awoooi-cd-lane.service -p LoadState --value)
+    cd_lane_unitfile=$(sc show awoooi-cd-lane.service -p UnitFileState --value)
+    cd_lane_active=$(sc show awoooi-cd-lane.service -p ActiveState --value)
+    cd_lane_mainpid=$(sc show awoooi-cd-lane.service -p MainPID --value)
+    cd_lane_execstart=$(sc show awoooi-cd-lane.service -p ExecStart --value)
     cd_lane_sentinel=missing
     [ -e /run/awoooi-cd-lane-enabled ] && cd_lane_sentinel=present
     cd_lane_capacity_ok=0
@@ -372,16 +379,16 @@ elif [ "$cd_lane_sentinel" = "present" ] && [ "$cd_lane_active" = "active" ] &&
       cd_lane_mode=controlled_open
     fi
     echo "CD_LANE_CONTROLLED mode=$cd_lane_mode load=$cd_lane_load unitfile=$cd_lane_unitfile active=$cd_lane_active mainpid=$cd_lane_mainpid sentinel=$cd_lane_sentinel capacity=$cd_lane_capacity_ok labels=$cd_lane_labels_ok binary_elf=$cd_lane_binary_elf process_count=$cd_lane_process_count ok=$cd_lane_ok"
-    cd_lane_drain_load=$(systemctl show awoooi-cd-lane-drain.service -p LoadState --value 2>/dev/null || true)
-    cd_lane_drain_unitfile=$(systemctl show awoooi-cd-lane-drain.service -p UnitFileState --value 2>/dev/null || true)
-    cd_lane_drain_active=$(systemctl show awoooi-cd-lane-drain.service -p ActiveState --value 2>/dev/null || true)
-    cd_lane_drain_mainpid=$(systemctl show awoooi-cd-lane-drain.service -p MainPID --value 2>/dev/null || true)
-    cd_lane_drain_cpu_accounting=$(systemctl show awoooi-cd-lane-drain.service -p CPUAccounting --value 2>/dev/null || true)
-    cd_lane_drain_cpu_quota=$(systemctl show awoooi-cd-lane-drain.service -p CPUQuotaPerSecUSec --value 2>/dev/null || true)
-    cd_lane_drain_memory_accounting=$(systemctl show awoooi-cd-lane-drain.service -p MemoryAccounting --value 2>/dev/null || true)
-    cd_lane_drain_memory_max=$(systemctl show awoooi-cd-lane-drain.service -p MemoryMax --value 2>/dev/null || true)
-    cd_lane_drain_tasks_accounting=$(systemctl show awoooi-cd-lane-drain.service -p TasksAccounting --value 2>/dev/null || true)
-    cd_lane_drain_tasks_max=$(systemctl show awoooi-cd-lane-drain.service -p TasksMax --value 2>/dev/null || true)
+    cd_lane_drain_load=$(sc show awoooi-cd-lane-drain.service -p LoadState --value)
+    cd_lane_drain_unitfile=$(sc show awoooi-cd-lane-drain.service -p UnitFileState --value)
+    cd_lane_drain_active=$(sc show awoooi-cd-lane-drain.service -p ActiveState --value)
+    cd_lane_drain_mainpid=$(sc show awoooi-cd-lane-drain.service -p MainPID --value)
+    cd_lane_drain_cpu_accounting=$(sc show awoooi-cd-lane-drain.service -p CPUAccounting --value)
+    cd_lane_drain_cpu_quota=$(sc show awoooi-cd-lane-drain.service -p CPUQuotaPerSecUSec --value)
+    cd_lane_drain_memory_accounting=$(sc show awoooi-cd-lane-drain.service -p MemoryAccounting --value)
+    cd_lane_drain_memory_max=$(sc show awoooi-cd-lane-drain.service -p MemoryMax --value)
+    cd_lane_drain_tasks_accounting=$(sc show awoooi-cd-lane-drain.service -p TasksAccounting --value)
+    cd_lane_drain_tasks_max=$(sc show awoooi-cd-lane-drain.service -p TasksMax --value)
     cd_lane_drain_limits_ok=0
     if [ "$cd_lane_drain_cpu_accounting" = "yes" ] \
       && [ -n "$cd_lane_drain_cpu_quota" ] && [ "$cd_lane_drain_cpu_quota" != "infinity" ] \
@@ -442,6 +449,8 @@ done
     docker ps --format "DOCKER {{.Names}}\t{{.Status}}" | head -120
   ' 2>&1); then
     fail "ssh 110 read-only check"
+    echo "SSH_110_BLOCKER remote_control_channel_unavailable"
+    echo "SSH_110_NEXT_ACTION local_console_run_recover_110_control_path_and_harbor_local_check"
     echo "$out"
     return
   fi
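Editorial sketch, not part of the commit: the `sc()` helper added in the hunks above bounds every `systemctl` call to 3 seconds and swallows failures so that a hung control plane yields an empty field instead of a stalled scorecard. The same idea in Python, using the hypothetical name `bounded_output`, could look roughly like this. Unlike `sc()`, it discards partial output on timeout, which is a simplification.

```python
import subprocess
from typing import Sequence


def bounded_output(cmd: Sequence[str], timeout_seconds: float = 3.0) -> str:
    """Run cmd and return its stdout, or "" if it hangs or cannot start."""
    try:
        completed = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # like `2>/dev/null` in sc()
            timeout=timeout_seconds,    # child is killed once this elapses
            text=True,
            check=False,                # like `|| true`: ignore exit status
        )
    except subprocess.TimeoutExpired:
        return ""
    except OSError:
        # Binary missing or not executable: treat as "no data", not a crash.
        return ""
    return completed.stdout
```

A caller would use it as `bounded_output(["systemctl", "is-active", "docker"])`; an empty string then means "no answer within the budget", which the caller has to classify separately from a real state such as `inactive`.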
@@ -27,6 +27,9 @@ def test_full_stack_cold_start_check_bounds_ssh_probes() -> None:
     assert "-o ServerAliveCountMax=1" in text
     assert "timeout ${SSH_COMMAND_TIMEOUT_SECONDS}s bash -lc" in text
     assert "printf -v quoted_cmd '%q' \"$cmd\"" in text
+    assert 'timeout 3 systemctl "$@"' in text
+    assert "SSH_110_BLOCKER remote_control_channel_unavailable" in text
+    assert "SSH_110_NEXT_ACTION local_console_run_recover_110_control_path_and_harbor_local_check" in text


 def test_recovery_scorecard_bounds_offsite_evidence_ssh() -> None:
@@ -118,8 +121,11 @@ def test_110_ssh_publickey_auth_diagnosis_is_bounded_and_read_only() -> None:
     assert "NODE_EXPORTER=ok" in text
     assert "NODE_LOAD1_PER_CPU" in text
     assert "NODE_LOAD_CLASSIFIER" in text
+    assert "DOCKER_STATS_TEXTFILE_AGE_SECONDS" in text
+    assert "DOCKER_STATS_TEXTFILE_FRESHNESS" in text
     assert "SYSTEMD_UNIT unit=%s active_state=%s classifier=%s" in text
     assert "systemctl_show_timeout" in text
+    assert "systemctl_timeout_budget_exhausted" in text
     assert "cat /home" not in text
     assert "cat ~/.ssh/authorized_keys" not in text
     assert "cat \"$home_dir/.ssh/authorized_keys\"" not in text