ops(runner): inventory workflow labels [skip ci]
@@ -1,3 +1,64 @@

## 2026-05-24 | T141 workflow label matrix

**Trigger**:

- T140 only answered the live runner config question: the single 110 host runner declares `awoooi-host`, `ewoooc-host`, and `ubuntu-latest` at the same time.
- Before any runner label isolation, we must know which `runs-on` labels each repo's workflows actually use, so that CI/CD for `ewoooc` or `stockplatform-v2` is not cut off.

**Fix**:

- Added `ops/runner/audit-workflow-labels.py`.
  - Reads Gitea `.gitea/workflows/*.yml` / `.yaml` read-only and extracts `runs-on`.
  - Gitea auth is resolved from the environment or the current repo's `gitea` remote; the token is never printed.
  - When Gitea is unreadable, `--local-repo OWNER/NAME=/path/to/repo` serves as a fallback.
  - Outputs a per-workflow line inventory, a label summary, and inventory warnings.
- `ops/runner/README.md` gains a layer 7 section covering the workflow label matrix and how to interpret isolation.

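The auth rule in the list above (environment first, then the `gitea` remote, token never printed) can be sketched roughly as follows. This is an illustration, not the script's own code: the helper names, the sample remote URL and credentials, and the use of `urlsplit` are assumptions; only the `GITEA_BASE` / `GITEA_USER` / `GITEA_TOKEN` names come from the script's docstring.

```python
import os
from urllib.parse import urlsplit


def auth_from_env(env=None):
    """Return (base, user, token) when all three variables are set, else None."""
    env = os.environ if env is None else env
    base = env.get("GITEA_BASE", "").rstrip("/")
    user = env.get("GITEA_USER", "")
    token = env.get("GITEA_TOKEN", "")
    return (base, user, token) if base and user and token else None


def auth_from_remote(url):
    """Extract basic-auth credentials embedded in an http(s) remote URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None
    if not (parts.username and parts.password and parts.hostname):
        return None
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return (f"{parts.scheme}://{host}", parts.username, parts.password)


def describe(auth):
    """Log-safe description: the token is masked, never echoed."""
    if auth is None:
        return "auth=unavailable"
    base, user, _token = auth
    return f"base={base} user={user} token=***"


# Hypothetical remote URL and credentials, for illustration only.
remote = "http://ci-bot:s3cr3t@gitea.example.internal:3000/wooo/awoooi.git"
auth = auth_from_env({}) or auth_from_remote(remote)
print(describe(auth))
```

Passing an explicit empty mapping to `auth_from_env` forces the remote-URL path here; in real use the default would read `os.environ`.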
**Verification**:

```text
python3 -m py_compile ops/runner/audit-workflow-labels.py -> pass
ops/runner/audit-workflow-labels.py --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2 -> pass

Evidence:
wooo/awoooi:
  awoooi-host -> .gitea/workflows/cd.yaml lines 64 / 313 / 1132
  ubuntu-latest -> ansible-lint, cd-dev, code-review, deploy-alerts, e2e-health, run-migration, type-sync

wooo/ewoooc:
  ewoooc-host -> .gitea/workflows/cd.yaml line 67

wooo/stockplatform-v2:
  ubuntu-latest -> .gitea/workflows/ci.yaml lines 12 / 23
  Gitea API for stockplatform-v2 returned 404 in this session; local repo fallback used.
```

**Interpretation**:

- AWOOI production CD already uses `awoooi-host`, but AWOOI code-review / health / aux workflows still run on the shared `ubuntu-latest`.
- EwoooC CD explicitly uses `ewoooc-host`, and that foreign label still lives in the same user-level runner config on 110.
- Stockplatform-v2 CI runs on `ubuntu-latest`, so it shares one runner queue with AWOOI's non-CD workflows.
- The real fix is not to keep adding labels to the same config. It is a runner registration / service split, or moving non-AWOOI repos to a dedicated runner. Until a replacement runner is ready, `ewoooc-host` and `ubuntu-latest` must not be removed.

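The "do not remove before a replacement is ready" rule above can be expressed as a small guard. This is a sketch under assumptions: the mapping is transcribed by hand from the T141 evidence, and `removal_blockers` and its `replacement_ready` parameter are hypothetical names, not part of the audit script.

```python
# Labels -> repos whose workflows reference them (from the T141 evidence).
WORKFLOW_USAGE = {
    "awoooi-host": {"wooo/awoooi"},
    "ewoooc-host": {"wooo/ewoooc"},
    "ubuntu-latest": {"wooo/awoooi", "wooo/stockplatform-v2"},
}


def removal_blockers(label, replacement_ready=frozenset()):
    """Repos that would be left without a runner if `label` were dropped.

    `replacement_ready` (hypothetical input) lists labels that another
    runner already serves, so removing them here strands nobody.
    """
    if label in replacement_ready:
        return []
    return sorted(WORKFLOW_USAGE.get(label, ()))


for label in sorted(WORKFLOW_USAGE):
    blockers = removal_blockers(label)
    status = "blocked by " + ", ".join(blockers) if blockers else "safe to remove"
    print(f"{label}: {status}")
```

With no replacement runner declared, every label currently in use comes back as blocked, which matches the interpretation above.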
**Current overall progress**:

- AwoooP alert observability chain: 99.998%.
- Incident-level source correlation visibility: 98.8%.
- Source correlation apply state-chain verifiability: 99.72%.
- Source correlation freshness / rolling gate: 98.2%.
- Frontend AI automation management UI sync: 99.999%.
- Dashboard snapshot / SSE console noise reduction: 99.2%.
- CI/CD runner hygiene: 99.5%.
- Runner ownership convergence: 96%.
- Runner pool inventory: 70% → 82%.
- Workflow label matrix: 0% → 85%.
- API image build layer hygiene: 88%.
- Deploy rollout-risk observability: 91%.
- CI/CD evidence frontend visibility: 92%.
- Pipeline stage observability: 88%.
- Build host pressure governance: 86%.
- Overall AI automation management productization: 99.965% → 99.966%.

## 2026-05-24 | T140 runner pool live inventory

**Trigger**:

@@ -2665,6 +2665,14 @@ After Phase 6 completes
- Interpretation: T135 already converged runner ownership from two runners competing for jobs to a single host runner in control. The next stage should not re-enable the Docker-wrapped runner; it should work on runner pool / repo label isolation, API image `apt-get` / `chown -R` layering, Web build cache/offload, and Playwright apt source-list hygiene.
- Progress update: AwoooP alert observability chain about 99.998%; incident-level source correlation visibility about 98.8%; source correlation apply state-chain verifiability about 99.72%; source correlation freshness / rolling gate about 98.2%; frontend AI automation management UI sync about 99.999%; dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; runner ownership convergence about 96%; build host pressure governance about 82%; overall AI automation management productization about 99.960%.

**T141 workflow label matrix (2026-05-24 Taipei)**:
- Trigger: T140 only answered the live runner config on 110; it did not yet answer which `runs-on` labels each repo's workflows actually use. Before runner label isolation, we must avoid cutting off CI/CD for `ewoooc` or `stockplatform-v2`.
- Fix: added `ops/runner/audit-workflow-labels.py`, which reads Gitea `.gitea/workflows/*.yml` / `.yaml` read-only and extracts `runs-on`. Gitea auth is resolved from env or the current repo's `gitea` remote, and the token is never printed. When Gitea is unreadable, `--local-repo OWNER/NAME=/path/to/repo` serves as a fallback. `ops/runner/README.md` gains the layer 7 workflow label matrix.
- Evidence: three jobs in AWOOI `.gitea/workflows/cd.yaml` use `awoooi-host`, while code-review / e2e-health / deploy-alerts / cd-dev / ansible-lint / type-sync / run-migration use `ubuntu-latest`; EwoooC `.gitea/workflows/cd.yaml` uses `ewoooc-host`; two jobs in the local stockplatform-v2 `.gitea/workflows/ci.yaml` use `ubuntu-latest`. The Gitea API returned 404 for that repo in this session, so the local fallback was used.
- Interpretation: the AWOOI production CD label is already dedicated, but the runner queue is still shared, because the same 110 runner service declares `awoooi-host`, `ewoooc-host`, and `ubuntu-latest` together. The real fix is a runner registration / service split, or moving non-AWOOI repos to a dedicated runner. Do not just keep adding labels to the same config, and do not remove `ewoooc-host` or `ubuntu-latest` before a replacement runner is ready.
- Verification: `python3 -m py_compile ops/runner/audit-workflow-labels.py` pass; `ops/runner/audit-workflow-labels.py --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2` pass.
- Progress update: AwoooP alert observability chain about 99.998%; incident-level source correlation visibility about 98.8%; source correlation apply state-chain verifiability about 99.72%; source correlation freshness / rolling gate about 98.2%; frontend AI automation management UI sync about 99.999%; dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.5%; runner ownership convergence about 96%; runner pool inventory about 82%; workflow label matrix about 85%; API image build layer hygiene about 88%; deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 92%; pipeline stage observability about 88%; build host pressure governance about 86%; overall AI automation management productization about 99.966%.

**T140 runner pool live inventory (2026-05-24 Taipei)**:
- Trigger: T139 already writes CI/CD stage transitions back to AwoooP Deployments, but the shared runner pool remains a risk source for deployment evidence and the post-deploy queue. Changing runner labels directly would affect repos such as `ewoooc` / `stockplatform-v2`, so a re-runnable live inventory comes first.
- Fix: added `ops/runner/audit-runner-pool.sh`, a read-only inventory of `gitea-act-runner-host.service`, the labels in `/home/wooo/act-runner/config.yaml`, the Docker-wrapped `gitea-runner`, active `GITEA-ACTIONS-TASK-*` containers, and per-repo counts from the last 2 hours of the runner journal. `ops/runner/README.md` gains the layer 6 shared runner label inventory and explicitly forbids treating `capacity: 2` as a fix.

@@ -296,6 +296,42 @@ recent 2h repo counts: none
- The next step is to read the labels each repo's workflows actually use, then plan repo label isolation or a
  dedicated runner registration; do not remove the live `ewoooc-host` before a replacement runner exists.

### Layer 7 fix: workflow label matrix

Runner config only shows which labels this runner is willing to accept; it cannot answer which repos actually use them.
T141 adds a workflow label inventory tool:

```bash
ops/runner/audit-workflow-labels.py \
  --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2
```

The tool reads `runs-on` from `.gitea/workflows/*.yml` / `.yaml` through the Gitea API; a local fallback can be
specified when Gitea is unreadable. The Gitea token is resolved only from env or the current repo's `gitea` remote
and is never printed.

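To make the `runs-on` extraction concrete, here is a minimal sketch that applies the same line-based pattern as the script's `RUNS_ON_RE` to a small workflow snippet. The sample workflow text and the `extract_runs_on` helper name are made up for illustration. Because the pattern expects the label on the same line as `runs-on:`, a label written as a YAML block list on the following lines would not be captured by this approach.

```python
import re

# Same pattern as RUNS_ON_RE in ops/runner/audit-workflow-labels.py.
RUNS_ON_RE = re.compile(r"^\s*runs-on:\s*(?P<label>.+?)\s*(?:#.*)?$")


def extract_runs_on(text):
    """Return (line_number, label) pairs for every single-line runs-on entry."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = RUNS_ON_RE.match(line)
        if match:
            # Drop surrounding whitespace and optional quotes around the label.
            found.append((line_number, match.group("label").strip().strip("'\"")))
    return found


# Hypothetical workflow content, not taken from any of the audited repos.
SAMPLE = """\
jobs:
  tests:
    runs-on: awoooi-host  # dedicated
  lint:
    runs-on: "ubuntu-latest"
"""

print(extract_runs_on(SAMPLE))
```

Line numbers are 1-based, which is what lets the inventory point back at a specific line in each workflow file.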
T141 evidence summary (2026-05-24 Taipei):

```text
wooo/awoooi:
  awoooi-host: cd.yaml tests / build-and-deploy / post-deploy-checks
  ubuntu-latest: code-review, e2e-health, deploy-alerts, cd-dev, ansible-lint, type-sync, run-migration

wooo/ewoooc:
  ewoooc-host: cd.yaml deploy

wooo/stockplatform-v2:
  ubuntu-latest: ci.yaml hygiene / frontend
```

Risk interpretation:

- `awoooi-host` is already the dedicated AWOOI CD label, but the same runner service still declares
  `ewoooc-host` and `ubuntu-latest` as well, so the runner queue is still shared.
- `ubuntu-latest` is the main shared entry point; AWOOI code-review / e2e-health and stockplatform-v2 CI
  can still queue behind each other.
- For real isolation, the next step must be a new runner registration / service split, or moving non-AWOOI repos to
  another runner. Adding more labels to the same runner config is not enough, because with `capacity: 1` it is still a single queue.

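One way to read the matrix for shared entry points is to group repos by label and keep only the labels referenced by more than one repo. The sketch below does that on pairs transcribed by hand from the T141 evidence summary; the `shared_labels` helper is a hypothetical name, not a function in the audit script.

```python
from collections import defaultdict

# (repo, label) pairs transcribed from the T141 evidence summary.
INVENTORY = [
    ("wooo/awoooi", "awoooi-host"),
    ("wooo/awoooi", "ubuntu-latest"),
    ("wooo/ewoooc", "ewoooc-host"),
    ("wooo/stockplatform-v2", "ubuntu-latest"),
]


def shared_labels(inventory):
    """Map each label used by more than one repo to its sorted repo list."""
    repos_by_label = defaultdict(set)
    for repo, label in inventory:
        repos_by_label[label].add(repo)
    return {
        label: sorted(repos)
        for label, repos in repos_by_label.items()
        if len(repos) > 1
    }


print(shared_labels(INVENTORY))
```

Note that this only detects sharing at the label level. Labels used by a single repo can still share a queue when one runner service declares all of them, which is the situation described above.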
---
Version: v2.0 | Updated: 2026-03-29 | Author: Claude Code
Changes: v1.0→v2.0 serial builds replace Job Concurrency Groups

259
ops/runner/audit-workflow-labels.py
Executable file
@@ -0,0 +1,259 @@
#!/usr/bin/env python3
"""Read-only inventory for Gitea workflow runner labels.

The script never prints credentials. It reads workflow files from Gitea when
GITEA_BASE/GITEA_USER/GITEA_TOKEN are available, or derives them from the
current repository's `gitea` remote when that remote embeds basic auth.

Example:
  ops/runner/audit-workflow-labels.py \
    --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2
"""

from __future__ import annotations

import argparse
import base64
import json
import os
import re
import subprocess
import sys
import urllib.error
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


DEFAULT_REPOS = ("wooo/awoooi", "wooo/ewoooc", "wooo/stockplatform-v2")
WORKFLOW_DIRS = (".gitea/workflows",)
RUNS_ON_RE = re.compile(r"^\s*runs-on:\s*(?P<label>.+?)\s*(?:#.*)?$")


@dataclass(frozen=True)
class GiteaAuth:
    base: str
    user: str
    token: str


@dataclass(frozen=True)
class WorkflowLabel:
    repo: str
    source: str
    branch: str
    file_path: str
    line_number: int
    label: str


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--repo",
        action="append",
        dest="repos",
        help="Repository in owner/name form. Defaults to AWOOI, EwoooC, stockplatform-v2.",
    )
    parser.add_argument("--branch", default="main", help="Branch/ref to inspect. Default: main.")
    parser.add_argument(
        "--local-repo",
        action="append",
        default=[],
        metavar="OWNER/NAME=PATH",
        help="Local fallback repository path used when Gitea content is unavailable.",
    )
    return parser.parse_args()


def derive_gitea_auth() -> GiteaAuth | None:
    # Prefer explicit environment variables, as the module docstring describes.
    base = os.environ.get("GITEA_BASE", "").rstrip("/")
    user = os.environ.get("GITEA_USER", "")
    token = os.environ.get("GITEA_TOKEN", "")
    if base and user and token:
        return GiteaAuth(base=base, user=user, token=token)

    # Otherwise fall back to basic auth embedded in the `gitea` remote URL.
    try:
        remote_url = subprocess.check_output(
            ["git", "remote", "get-url", "gitea"],
            text=True,
            stderr=subprocess.DEVNULL,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return None

    match = re.match(r"http://([^:]+):([^@]+)@([^/]+)", remote_url)
    if not match:
        return None
    user, token, host = match.groups()
    return GiteaAuth(base=f"http://{host}", user=user, token=token)


def build_opener(auth: GiteaAuth) -> urllib.request.OpenerDirector:
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, auth.base, auth.user, auth.token)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))


def get_json(opener: urllib.request.OpenerDirector, auth: GiteaAuth, path: str) -> object:
    with opener.open(auth.base + path, timeout=10) as response:
        return json.load(response)


def parse_runs_on(repo: str, source: str, branch: str, file_path: str, content: str) -> list[WorkflowLabel]:
    labels: list[WorkflowLabel] = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        match = RUNS_ON_RE.match(line)
        if not match:
            continue
        label = match.group("label").strip().strip("'\"")
        labels.append(
            WorkflowLabel(
                repo=repo,
                source=source,
                branch=branch,
                file_path=file_path,
                line_number=line_number,
                label=label,
            )
        )
    return labels


def fetch_gitea_labels(repo: str, branch: str, auth: GiteaAuth) -> tuple[list[WorkflowLabel], str | None]:
    opener = build_opener(auth)
    labels: list[WorkflowLabel] = []
    owner, name = repo.split("/", 1)

    for workflow_dir in WORKFLOW_DIRS:
        api_dir = f"/api/v1/repos/{owner}/{name}/contents/{workflow_dir}?ref={branch}"
        try:
            entries = get_json(opener, auth, api_dir)
        except urllib.error.HTTPError as exc:
            return labels, f"gitea_http_{exc.code}:{workflow_dir}"
        except Exception as exc:  # noqa: BLE001 - inventory should report and continue.
            return labels, f"gitea_error:{type(exc).__name__}:{workflow_dir}"

        if not isinstance(entries, list):
            continue

        for entry in entries:
            if not isinstance(entry, dict) or entry.get("type") != "file":
                continue
            # Use a separate variable so the repo `name` above is not shadowed.
            file_name = str(entry.get("name", ""))
            if not re.search(r"\.ya?ml$", file_name):
                continue

            file_path = f"{workflow_dir}/{file_name}"
            api_file = f"/api/v1/repos/{owner}/{name}/contents/{file_path}?ref={branch}"
            try:
                item = get_json(opener, auth, api_file)
                if not isinstance(item, dict):
                    continue
                content = base64.b64decode(str(item.get("content", ""))).decode("utf-8", "replace")
            except Exception as exc:  # noqa: BLE001
                return labels, f"gitea_file_error:{type(exc).__name__}:{file_path}"
            labels.extend(parse_runs_on(repo, "gitea", branch, file_path, content))

    return labels, None


def parse_local_repo_args(values: Iterable[str]) -> dict[str, Path]:
    paths: dict[str, Path] = {}
    for value in values:
        if "=" not in value:
            raise SystemExit(f"invalid --local-repo value: {value}")
        repo, path = value.split("=", 1)
        paths[repo] = Path(path).expanduser().resolve()
    return paths


def fetch_local_labels(repo: str, branch: str, repo_path: Path) -> tuple[list[WorkflowLabel], str | None]:
    labels: list[WorkflowLabel] = []
    if not repo_path.exists():
        return labels, f"local_missing:{repo_path}"

    for workflow_dir in WORKFLOW_DIRS:
        directory = repo_path / workflow_dir
        if not directory.exists():
            continue
        for path in sorted(directory.glob("*.y*ml")):
            content = path.read_text(encoding="utf-8", errors="replace")
            labels.extend(parse_runs_on(repo, "local", branch, str(path.relative_to(repo_path)), content))
    return labels, None


def label_owner(label: str) -> str:
    value = label.strip().strip("'\"")
    if value == "awoooi-host":
        return "awoooi_dedicated"
    if value == "ewoooc-host":
        return "foreign_dedicated"
    if value == "ubuntu-latest" or "ubuntu-latest" in value:
        return "shared_queue"
    if value.startswith("ubuntu-") or value.startswith("["):
        return "shared_queue"
    return "unknown_or_custom"


def print_labels(labels: list[WorkflowLabel], errors: list[str]) -> None:
    print("== workflow label inventory ==")
    if labels:
        print("repo\tsource\tbranch\tfile\tline\truns_on\towner")
        for item in labels:
            print(
                f"{item.repo}\t{item.source}\t{item.branch}\t{item.file_path}\t"
                f"{item.line_number}\t{item.label}\t{label_owner(item.label)}"
            )
    else:
        print("labels_found=0")

    print("\n== label summary ==")
    summary: dict[str, set[str]] = {}
    for item in labels:
        summary.setdefault(item.label, set()).add(item.repo)
    if summary:
        for label, repos in sorted(summary.items(), key=lambda pair: (label_owner(pair[0]), pair[0])):
            print(f"label={label} owner={label_owner(label)} repo_count={len(repos)} repos={','.join(sorted(repos))}")
    else:
        print("summary=none")

    print("\n== inventory warnings ==")
    if errors:
        for error in errors:
            print(error)
    else:
        print("warnings=none")


def main() -> int:
    args = parse_args()
    repos = args.repos or list(DEFAULT_REPOS)
    local_paths = parse_local_repo_args(args.local_repo)
    auth = derive_gitea_auth()

    labels: list[WorkflowLabel] = []
    errors: list[str] = []

    for repo in repos:
        repo_labels: list[WorkflowLabel] = []
        error: str | None = None
        if auth is not None:
            repo_labels, error = fetch_gitea_labels(repo, args.branch, auth)
        else:
            # Always record the missing auth so the local fallback below is
            # attempted for repos that were given a --local-repo path.
            error = "gitea_auth_unavailable"

        if error and repo in local_paths:
            local_labels, local_error = fetch_local_labels(repo, args.branch, local_paths[repo])
            if local_labels:
                repo_labels = local_labels
                errors.append(f"{repo}: {error}; local_fallback=used")
            elif local_error:
                errors.append(f"{repo}: {error}; {local_error}")
            else:
                errors.append(f"{repo}: {error}; local_fallback=no_workflows")
        elif error:
            errors.append(f"{repo}: {error}")

        labels.extend(repo_labels)

    print_labels(labels, errors)
    return 0


if __name__ == "__main__":
    sys.exit(main())