ops(runner): inventory workflow labels [skip ci]

Your Name
2026-05-24 09:52:04 +08:00
parent 22b45006b7
commit 4407b46bb6
4 changed files with 364 additions and 0 deletions


@@ -1,3 +1,64 @@
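## 2026-05-24 | T141 workflow label matrix
**Trigger**
- T140 only answered the live runner config question: a single host runner on 110 declares `awoooi-host`, `ewoooc-host`, and `ubuntu-latest` at the same time.
- Before any runner label isolation, we need to know which `runs-on` values each repo's workflows actually use, so that CI/CD for `ewoooc` and `stockplatform-v2` is not cut off.
**Fix**
- Added `ops/runner/audit-workflow-labels.py`:
  - Read-only: reads Gitea `.gitea/workflows/*.yml` / `.yaml` and extracts `runs-on`.
  - Gitea auth is resolved from the environment or the current repo's `gitea` remote; the token is never printed.
  - When Gitea is unreadable, `--local-repo OWNER/NAME=/path/to/repo` can be used as a fallback.
  - Outputs a per-workflow line inventory, a label summary, and inventory warnings.
- `ops/runner/README.md` gains a layer 7 section covering the workflow label matrix and how to read it for isolation.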
**Verification**
```text
python3 -m py_compile ops/runner/audit-workflow-labels.py -> pass
ops/runner/audit-workflow-labels.py --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2 -> pass
Evidence:
  wooo/awoooi:
    awoooi-host -> .gitea/workflows/cd.yaml lines 64 / 313 / 1132
    ubuntu-latest -> ansible-lint, cd-dev, code-review, deploy-alerts, e2e-health, run-migration, type-sync
  wooo/ewoooc:
    ewoooc-host -> .gitea/workflows/cd.yaml line 67
  wooo/stockplatform-v2:
    ubuntu-latest -> .gitea/workflows/ci.yaml lines 12 / 23
  Gitea API for stockplatform-v2 returned 404 in this session; local repo fallback used.
```
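The per-workflow inventory is printed as tab-separated rows under the `== workflow label inventory ==` header, so it can be folded back into a repo-to-label matrix. The following is a minimal sketch under that assumption; `parse_inventory` and `SAMPLE` are hypothetical names, and the sample rows are hand-written from the evidence above rather than captured from a real run.

```python
import csv
import io


def parse_inventory(output):
    """Fold the tab-separated inventory rows into {repo: {runs_on, ...}}."""
    # Everything before the summary header belongs to the inventory section.
    section = output.split("== label summary ==", 1)[0]
    # Only the header row and data rows contain tabs.
    lines = [line for line in section.splitlines() if "\t" in line]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    matrix = {}
    for row in reader:
        matrix.setdefault(row["repo"], set()).add(row["runs_on"])
    return matrix


# Hand-written sample (assumption): column order follows the script's header row.
SAMPLE = "\n".join([
    "== workflow label inventory ==",
    "repo\tsource\tbranch\tfile\tline\truns_on\towner",
    "wooo/awoooi\tgitea\tmain\t.gitea/workflows/cd.yaml\t64\tawoooi-host\tawoooi_dedicated",
    "wooo/ewoooc\tgitea\tmain\t.gitea/workflows/cd.yaml\t67\tewoooc-host\tforeign_dedicated",
    "wooo/stockplatform-v2\tlocal\tmain\t.gitea/workflows/ci.yaml\t12\tubuntu-latest\tshared_queue",
    "",
    "== label summary ==",
    "label=awoooi-host owner=awoooi_dedicated repo_count=1 repos=wooo/awoooi",
])

if __name__ == "__main__":
    for repo, labels in sorted(parse_inventory(SAMPLE).items()):
        print(repo, sorted(labels))
```

Keeping the matrix as sets makes it easy to compare against a runner's declared labels in a later step.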
**Assessment**
- AWOOI production CD already uses `awoooi-host`, but the AWOOI code-review / health / aux workflows still run on the shared `ubuntu-latest`.
- EwoooC CD explicitly uses `ewoooc-host`, and this foreign label still lives in the same user-level runner config on 110.
- Stockplatform-v2 CI runs on `ubuntu-latest`, so it shares one runner queue with AWOOI's non-CD workflows.
- The real fix is not adding more labels to the same config; it is a runner registration / service split, or moving non-AWOOI repos to a dedicated runner. Until a replacement runner is ready, `ewoooc-host` and `ubuntu-latest` must not be removed.
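The assessment above boils down to two set operations: which repos request at least one label the 110 runner accepts, and which labels are requested by more than one repo. The sketch below illustrates this with data transcribed by hand from the T141 evidence; `RUNNER_LABELS`, `WORKFLOW_LABELS`, `repos_on_runner`, and `shared_labels` are hypothetical names and are not part of the audit script.

```python
# Labels the single 110 host runner declares (from the T140 finding).
RUNNER_LABELS = {"awoooi-host", "ewoooc-host", "ubuntu-latest"}

# Labels each repo's workflows request (from the T141 evidence).
WORKFLOW_LABELS = {
    "wooo/awoooi": {"awoooi-host", "ubuntu-latest"},
    "wooo/ewoooc": {"ewoooc-host"},
    "wooo/stockplatform-v2": {"ubuntu-latest"},
}


def repos_on_runner(runner_labels, workflow_labels):
    """Repos with at least one runs-on label that this runner accepts."""
    return sorted(
        repo for repo, labels in workflow_labels.items() if labels & runner_labels
    )


def shared_labels(workflow_labels):
    """Labels requested by more than one repo, mapped to those repos."""
    seen = {}
    for repo, labels in workflow_labels.items():
        for label in labels:
            seen.setdefault(label, set()).add(repo)
    return {label: sorted(repos) for label, repos in seen.items() if len(repos) > 1}


if __name__ == "__main__":
    print(repos_on_runner(RUNNER_LABELS, WORKFLOW_LABELS))
    print(shared_labels(WORKFLOW_LABELS))
```

All three repos land on the same runner, and `ubuntu-latest` is the only label requested by more than one repo, which matches the reading that dedicated labels alone do not separate the queue.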
**Current overall progress**
- AwoooP alert observability chain: 99.998%.
- Incident-level source correlation visibility: 98.8%.
- Source correlation apply state-chain verifiability: 99.72%.
- Source correlation freshness / rolling gate: 98.2%.
- Frontend AI automation management UI sync: 99.999%.
- Dashboard snapshot / SSE console noise reduction: 99.2%.
- CI/CD runner hygiene: 99.5%.
- Runner ownership convergence: 96%.
- Runner pool inventory: 70% → 82%.
- Workflow label matrix: 0% → 85%.
- API image build layer hygiene: 88%.
- Deploy rollout-risk observability: 91%.
- CI/CD evidence frontend visibility: 92%.
- Pipeline stage observability: 88%.
- Build host pressure governance: 86%.
- Full AI automation management productization: 99.965% → 99.966%.
## 2026-05-24 | T140 runner pool live inventory
**Trigger**


@@ -2665,6 +2665,14 @@ Phase 6 完成後
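- Assessment: T135 already converged runner ownership from two runners competing for jobs to a single host runner in control. The next stage must not re-enable the Docker-wrapped runner; it should cover runner pool / repo label isolation, API image `apt-get` / `chown -R` layering, Web build cache/offload, and Playwright apt source-list hygiene.
- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.2%; Runner ownership convergence about 96%; Build host pressure governance about 82%; full AI automation management productization about 99.960%.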
**T141 workflow label matrix (2026-05-24, Taipei)**
- Trigger: T140 only answered the 110 live runner config question and did not yet show which `runs-on` values each repo's workflows actually use. Before runner label isolation, CI/CD for `ewoooc` and `stockplatform-v2` must not be cut off.
- Fix: added `ops/runner/audit-workflow-labels.py`, which reads Gitea `.gitea/workflows/*.yml` / `.yaml` read-only and extracts `runs-on`. Gitea auth is resolved from env or the current repo's `gitea` remote, and the token is never printed. When Gitea is unreadable, `--local-repo OWNER/NAME=/path/to/repo` can be used as a fallback. `ops/runner/README.md` gains a layer 7 workflow label matrix section.
- Evidence: three jobs in AWOOI `.gitea/workflows/cd.yaml` use `awoooi-host`, while code-review / e2e-health / deploy-alerts / cd-dev / ansible-lint / type-sync / run-migration use `ubuntu-latest`. EwoooC `.gitea/workflows/cd.yaml` uses `ewoooc-host`. Two jobs in the local stockplatform-v2 `.gitea/workflows/ci.yaml` use `ubuntu-latest`; the Gitea API returned 404 for that repo in this session, so the local fallback was used.
- Assessment: the AWOOI production CD label is already dedicated, but the runner queue is still shared because the same runner service on 110 declares `awoooi-host`, `ewoooc-host`, and `ubuntu-latest` together. The real fix is a runner registration / service split, or moving non-AWOOI repos to a dedicated runner. Adding more labels to the same config is not enough, and `ewoooc-host` / `ubuntu-latest` must not be removed before a replacement runner is ready.
- Verification: `python3 -m py_compile ops/runner/audit-workflow-labels.py` pass; `ops/runner/audit-workflow-labels.py --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2` pass.
- Progress update: AwoooP alert observability chain about 99.998%; Incident-level source correlation visibility about 98.8%; Source correlation apply state-chain verifiability about 99.72%; Source correlation freshness / rolling gate about 98.2%; frontend AI automation management UI sync about 99.999%; Dashboard snapshot / SSE console noise reduction about 99.2%; CI/CD runner hygiene about 99.5%; Runner ownership convergence about 96%; Runner pool inventory about 82%; Workflow label matrix about 85%; API image build layer hygiene about 88%; Deploy rollout-risk observability about 91%; CI/CD evidence frontend visibility about 92%; Pipeline stage observability about 88%; Build host pressure governance about 86%; full AI automation management productization about 99.966%.
**T140 runner pool live inventory (2026-05-24, Taipei)**
- Trigger: T139 already writes CI/CD stage transitions back to AwoooP Deployments, but the shared runner pool remains a risk source for deployment evidence and the post-deploy queue. Changing runner labels directly would affect repos such as `ewoooc` / `stockplatform-v2`, so a re-runnable live inventory comes first.
- Fix: added `ops/runner/audit-runner-pool.sh`, a read-only inventory of `gitea-act-runner-host.service`, the labels in `/home/wooo/act-runner/config.yaml`, the Docker-wrapped `gitea-runner`, active `GITEA-ACTIONS-TASK-*` containers, and runner journal repo counts for the last 2 hours. `ops/runner/README.md` gains a layer 6 shared runner label inventory section and explicitly forbids treating `capacity: 2` as a fix.


@@ -296,6 +296,42 @@ recent 2h repo counts: none
- The next step is to read the labels each repo's workflows actually use, then plan repo label isolation or a separate runner
registration; the live `ewoooc-host` must not be removed while no replacement runner exists.
### Layer 7 fix: workflow label matrix
The runner config only shows which labels this runner is willing to accept; it cannot tell which repos actually use them.
T141 adds a workflow label inventory tool:
```bash
ops/runner/audit-workflow-labels.py \
--local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2
```
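The tool reads `runs-on` from `.gitea/workflows/*.yml` / `.yaml` through the Gitea API. When Gitea is unreadable, a
local fallback can be specified. The Gitea token is resolved only from env or the current repo's `gitea` remote and is never printed.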
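To illustrate what the extraction captures, here is a minimal sketch that applies the same line-based pattern the script uses (`RUNS_ON_RE`) to a small workflow snippet. `extract_runs_on` and `SAMPLE` are hypothetical names, and the YAML is an invented example, not taken from any of the audited repos.

```python
import re

# Same pattern as in audit-workflow-labels.py: value after `runs-on:`,
# with an optional trailing comment dropped.
RUNS_ON_RE = re.compile(r"^\s*runs-on:\s*(?P<label>.+?)\s*(?:#.*)?$")


def extract_runs_on(text):
    """Return (line_number, label) pairs for every single-line runs-on entry."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = RUNS_ON_RE.match(line)
        if match:
            # Strip surrounding quotes so 'x' and "x" compare equal to x.
            found.append((line_number, match.group("label").strip().strip("'\"")))
    return found


# Invented workflow snippet (assumption) for illustration only.
SAMPLE = """\
jobs:
  tests:
    runs-on: awoooi-host  # dedicated
  lint:
    runs-on: "ubuntu-latest"
"""

if __name__ == "__main__":
    print(extract_runs_on(SAMPLE))
    # [(3, 'awoooi-host'), (5, 'ubuntu-latest')]
```

Because the match is per line, a `runs-on:` key whose value is written as a multi-line YAML list on the following lines would not be captured by this pattern.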
T141 evidence summary (2026-05-24, Taipei):
```text
wooo/awoooi:
  awoooi-host: cd.yaml tests / build-and-deploy / post-deploy-checks
  ubuntu-latest: code-review, e2e-health, deploy-alerts, cd-dev, ansible-lint, type-sync, run-migration
wooo/ewoooc:
  ewoooc-host: cd.yaml deploy
wooo/stockplatform-v2:
  ubuntu-latest: ci.yaml hygiene / frontend
```
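Risk assessment:
- `awoooi-host` is already a dedicated AWOOI CD label, but the same runner service still declares
`ewoooc-host` and `ubuntu-latest`, so the runner queue remains shared.
- `ubuntu-latest` is the main shared entry point: AWOOI code-review / e2e-health and stockplatform-v2 CI
can still queue behind each other.
- Real isolation requires a new runner registration / service split, or moving non-AWOOI repos to
another runner. Adding more labels to the same runner config is not enough, because `capacity: 1` still means a single queue.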
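The rule about not removing a label before a replacement runner exists can be checked mechanically before any config change. The sketch below is one way to do that under stated assumptions: `orphaned_labels` is a hypothetical helper, `host-110` and `shared-runner` are made-up runner names, and the label data is transcribed from the T141 evidence.

```python
def orphaned_labels(runners, workflow_labels, runner, remove):
    """Labels repos still request that no runner would accept after the removal.

    runners: {runner_name: set of declared labels}
    workflow_labels: {repo: set of runs-on labels}
    """
    remaining = {
        name: set(labels) - set(remove) if name == runner else set(labels)
        for name, labels in runners.items()
    }
    accepted = set().union(*remaining.values())
    orphaned = {}
    for repo, labels in workflow_labels.items():
        missing = sorted(set(labels) - accepted)
        if missing:
            orphaned[repo] = missing
    return orphaned


WORKFLOW_LABELS = {
    "wooo/awoooi": {"awoooi-host", "ubuntu-latest"},
    "wooo/ewoooc": {"ewoooc-host"},
    "wooo/stockplatform-v2": {"ubuntu-latest"},
}

# Today: one runner declares everything (runner name is hypothetical).
TODAY = {"host-110": {"awoooi-host", "ewoooc-host", "ubuntu-latest"}}

# After a split: a second, hypothetical runner takes the non-AWOOI labels.
AFTER_SPLIT = dict(TODAY, **{"shared-runner": {"ewoooc-host", "ubuntu-latest"}})

if __name__ == "__main__":
    to_remove = {"ewoooc-host", "ubuntu-latest"}
    print(orphaned_labels(TODAY, WORKFLOW_LABELS, "host-110", to_remove))
    print(orphaned_labels(AFTER_SPLIT, WORKFLOW_LABELS, "host-110", to_remove))
```

With only the current runner, removing the two labels orphans workflows in all three repos; once the second runner is registered, the same removal leaves nothing orphaned.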
---
Version: v2.0 | Updated: 2026-03-29 | Author: Claude Code
Changes: v1.0→v2.0 sequential build replaces Job Concurrency Groups


@@ -0,0 +1,259 @@
#!/usr/bin/env python3
"""Read-only inventory for Gitea workflow runner labels.

The script never prints credentials. It reads workflow files from Gitea when
GITEA_BASE/GITEA_USER/GITEA_TOKEN are available, or derives them from the
current repository's `gitea` remote when that remote embeds basic auth.

Example:
    ops/runner/audit-workflow-labels.py \
        --local-repo wooo/stockplatform-v2=/Users/ogt/stockplatform-v2
"""
from __future__ import annotations

import argparse
import base64
import json
import os
import re
import subprocess
import sys
import urllib.error
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable

DEFAULT_REPOS = ("wooo/awoooi", "wooo/ewoooc", "wooo/stockplatform-v2")
WORKFLOW_DIRS = (".gitea/workflows",)
# Line-based match: captures the value after `runs-on:` and drops a trailing comment.
RUNS_ON_RE = re.compile(r"^\s*runs-on:\s*(?P<label>.+?)\s*(?:#.*)?$")


@dataclass(frozen=True)
class GiteaAuth:
    base: str
    user: str
    token: str


@dataclass(frozen=True)
class WorkflowLabel:
    repo: str
    source: str
    branch: str
    file_path: str
    line_number: int
    label: str


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--repo",
        action="append",
        dest="repos",
        help="Repository in owner/name form. Defaults to AWOOI, EwoooC, stockplatform-v2.",
    )
    parser.add_argument("--branch", default="main", help="Branch/ref to inspect. Default: main.")
    parser.add_argument(
        "--local-repo",
        action="append",
        default=[],
        metavar="OWNER/NAME=PATH",
        help="Local fallback repository path used when Gitea content is unavailable.",
    )
    return parser.parse_args()


def derive_gitea_auth() -> GiteaAuth | None:
    # Prefer the environment variables named in the module docstring.
    base = os.environ.get("GITEA_BASE")
    user = os.environ.get("GITEA_USER")
    token = os.environ.get("GITEA_TOKEN")
    if base and user and token:
        return GiteaAuth(base=base.rstrip("/"), user=user, token=token)
    # Otherwise fall back to basic auth embedded in the `gitea` remote URL.
    try:
        remote_url = subprocess.check_output(
            ["git", "remote", "get-url", "gitea"],
            text=True,
            stderr=subprocess.DEVNULL,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return None
    # Accept both http and https remotes of the form scheme://user:token@host/...
    match = re.match(r"(https?)://([^:/@]+):([^@]+)@([^/]+)", remote_url)
    if not match:
        return None
    scheme, user, token, host = match.groups()
    return GiteaAuth(base=f"{scheme}://{host}", user=user, token=token)


def build_opener(auth: GiteaAuth) -> urllib.request.OpenerDirector:
    # Send Basic credentials on every request. HTTPBasicAuthHandler only replies
    # to a 401 challenge, so it never authenticates against an endpoint that
    # answers anonymous requests with 404 instead.
    credentials = base64.b64encode(f"{auth.user}:{auth.token}".encode("utf-8")).decode("ascii")
    opener = urllib.request.build_opener()
    opener.addheaders.append(("Authorization", f"Basic {credentials}"))
    return opener


def get_json(opener: urllib.request.OpenerDirector, auth: GiteaAuth, path: str) -> object:
    with opener.open(auth.base + path, timeout=10) as response:
        return json.load(response)


def parse_runs_on(repo: str, source: str, branch: str, file_path: str, content: str) -> list[WorkflowLabel]:
    labels: list[WorkflowLabel] = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        match = RUNS_ON_RE.match(line)
        if not match:
            continue
        # Drop surrounding quotes so quoted and bare labels compare equal.
        label = match.group("label").strip().strip("'\"")
        labels.append(
            WorkflowLabel(
                repo=repo,
                source=source,
                branch=branch,
                file_path=file_path,
                line_number=line_number,
                label=label,
            )
        )
    return labels


def fetch_gitea_labels(repo: str, branch: str, auth: GiteaAuth) -> tuple[list[WorkflowLabel], str | None]:
    opener = build_opener(auth)
    labels: list[WorkflowLabel] = []
    owner, name = repo.split("/", 1)
    for workflow_dir in WORKFLOW_DIRS:
        api_dir = f"/api/v1/repos/{owner}/{name}/contents/{workflow_dir}?ref={branch}"
        try:
            entries = get_json(opener, auth, api_dir)
        except urllib.error.HTTPError as exc:
            return labels, f"gitea_http_{exc.code}:{workflow_dir}"
        except Exception as exc:  # noqa: BLE001 - inventory should report and continue.
            return labels, f"gitea_error:{type(exc).__name__}:{workflow_dir}"
        if not isinstance(entries, list):
            continue
        for entry in entries:
            if not isinstance(entry, dict) or entry.get("type") != "file":
                continue
            # Use a separate variable so the repo `name` above is not overwritten.
            file_name = str(entry.get("name", ""))
            if not re.search(r"\.ya?ml$", file_name):
                continue
            file_path = f"{workflow_dir}/{file_name}"
            api_file = f"/api/v1/repos/{owner}/{name}/contents/{file_path}?ref={branch}"
            try:
                item = get_json(opener, auth, api_file)
                if not isinstance(item, dict):
                    continue
                content = base64.b64decode(str(item.get("content", ""))).decode("utf-8", "replace")
            except Exception as exc:  # noqa: BLE001
                return labels, f"gitea_file_error:{type(exc).__name__}:{file_path}"
            labels.extend(parse_runs_on(repo, "gitea", branch, file_path, content))
    return labels, None


def parse_local_repo_args(values: Iterable[str]) -> dict[str, Path]:
    paths: dict[str, Path] = {}
    for value in values:
        if "=" not in value:
            raise SystemExit(f"invalid --local-repo value: {value}")
        repo, path = value.split("=", 1)
        paths[repo] = Path(path).expanduser().resolve()
    return paths


def fetch_local_labels(repo: str, branch: str, repo_path: Path) -> tuple[list[WorkflowLabel], str | None]:
    labels: list[WorkflowLabel] = []
    if not repo_path.exists():
        return labels, f"local_missing:{repo_path}"
    for workflow_dir in WORKFLOW_DIRS:
        directory = repo_path / workflow_dir
        if not directory.exists():
            continue
        for path in sorted(directory.glob("*.y*ml")):
            # The glob is broader than needed; keep only .yml / .yaml files.
            if path.suffix not in (".yml", ".yaml"):
                continue
            content = path.read_text(encoding="utf-8", errors="replace")
            labels.extend(parse_runs_on(repo, "local", branch, str(path.relative_to(repo_path)), content))
    return labels, None


def label_owner(label: str) -> str:
    value = label.strip().strip("'\"")
    if value == "awoooi-host":
        return "awoooi_dedicated"
    if value == "ewoooc-host":
        return "foreign_dedicated"
    # Any ubuntu-* label, an inline list, or a value mentioning ubuntu-latest
    # is treated as the shared queue.
    if "ubuntu-latest" in value or value.startswith("ubuntu-") or value.startswith("["):
        return "shared_queue"
    return "unknown_or_custom"


def print_labels(labels: list[WorkflowLabel], errors: list[str]) -> None:
    print("== workflow label inventory ==")
    if labels:
        print("repo\tsource\tbranch\tfile\tline\truns_on\towner")
        for item in labels:
            print(
                f"{item.repo}\t{item.source}\t{item.branch}\t{item.file_path}\t"
                f"{item.line_number}\t{item.label}\t{label_owner(item.label)}"
            )
    else:
        print("labels_found=0")
    print("\n== label summary ==")
    summary: dict[str, set[str]] = {}
    for item in labels:
        summary.setdefault(item.label, set()).add(item.repo)
    if summary:
        for label, repos in sorted(summary.items(), key=lambda pair: (label_owner(pair[0]), pair[0])):
            print(f"label={label} owner={label_owner(label)} repo_count={len(repos)} repos={','.join(sorted(repos))}")
    else:
        print("summary=none")
    print("\n== inventory warnings ==")
    if errors:
        for error in errors:
            print(error)
    else:
        print("warnings=none")


def main() -> int:
    args = parse_args()
    repos = args.repos or list(DEFAULT_REPOS)
    local_paths = parse_local_repo_args(args.local_repo)
    auth = derive_gitea_auth()
    labels: list[WorkflowLabel] = []
    errors: list[str] = []
    for repo in repos:
        repo_labels: list[WorkflowLabel] = []
        error: str | None = None
        if auth is not None:
            repo_labels, error = fetch_gitea_labels(repo, args.branch, auth)
        else:
            # Always record missing auth; otherwise the local fallback below
            # would be skipped for repos that have a --local-repo path.
            error = "gitea_auth_unavailable"
        if error and repo in local_paths:
            local_labels, local_error = fetch_local_labels(repo, args.branch, local_paths[repo])
            if local_labels:
                repo_labels = local_labels
                errors.append(f"{repo}: {error}; local_fallback=used")
            elif local_error:
                errors.append(f"{repo}: {error}; {local_error}")
            else:
                errors.append(f"{repo}: {error}; local_fallback=no_workflows")
        elif error:
            errors.append(f"{repo}: {error}")
        labels.extend(repo_labels)
    print_labels(labels, errors)
    return 0


if __name__ == "__main__":
    sys.exit(main())