feat(aiops): structure agent loop shadow output
Some checks failed
CD Pipeline / tests (push) Successful in 2m50s
Code Review / ai-code-review (push) Successful in 33s
CD Pipeline / build-and-deploy (push) Failing after 25m48s
CD Pipeline / post-deploy-checks (push) Has been cancelled

Your Name
2026-05-01 15:09:57 +08:00
parent f53d7e5584
commit b0da6da1e9
5 changed files with 165 additions and 3 deletions


@@ -10,7 +10,7 @@
| Field | Value |
|------|-----|
| **Version** | v1.5 |
| **Version** | v1.6 |
| **Created** | 2026-03-25 23:30 (Taipei) |
| **Created by** | Claude Code |
| **Last modified** | 2026-05-01 15:45 (Taipei) |
@@ -20,6 +20,7 @@
| Version | Date | Performed by | Changes |
|------|------|--------|----------|
| v1.6 | 2026-05-01 | Codex | Agent Loop shadow structured metadata, non-decisive confidence delta guard |
| v1.5 | 2026-05-01 | Codex | OpenClaw Agent Loop read-only shadow canary + prod feature flag |
| v1.4 | 2026-05-01 | Codex | MCP Agent Loop governance, audit schema, Agent role tool permissions |
| v1.3 | 2026-03-26 18:00 | Claude Code | Added Grafana MCP (#83) + SignOz query_logs |
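@@ -59,6 +60,7 @@ Phase 13.2 Tool implementation (P0, highest priority):
- The internal RAG/MCP knowledge layer continues to use PostgreSQL + pgvector + Redis hot cache; do not build a separate, isolated database for "MCP RAG" unless there is evidence of scale, isolation, or latency needs.
- `incident_id` uses `VARCHAR(64)` in the MCP audit schema because AWOOOI incidents are `INC-*` strings, not UUIDs.
- The OpenClaw Agent Loop may initially run only as a shadow canary: when `ENABLE_OPENCLAW_AGENT_LOOP_SHADOW=true`, it gets read-only tools only and does not change the main decision. It may be promoted to a decisive path only after `mcp_audit_log`, latency, and LLM quality have been verified.
- Shadow canary output must be normalized into `agent_loop_shadow.structured` with `decision_impact=none` fixed. Initially, `confidence_delta` may only record conservative metadata between 0 and -0.15; shadow results must not be used to raise confidence or override the main decision.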
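The `confidence_delta` rule above amounts to clamping a value into `[-0.15, 0.0]`. The sketch below restates it as a standalone function; the name `clamp_shadow_confidence_delta` is hypothetical, and the NaN and overflow guards are assumptions that go beyond the committed `_coerce_agent_loop_confidence_delta`. Python's `json.loads` accepts the non-standard `NaN` literal by default, and when NaN is the first argument, `min` and `max` return it unchanged, so without a guard NaN would pass through.

```python
import math


def clamp_shadow_confidence_delta(value: object) -> float:
    """Clamp a shadow-canary confidence delta into [-0.15, 0.0].

    Hypothetical helper. Anything that cannot be read as a float (None,
    "abc", NaN, an int too large for a float) becomes 0.0, and positive
    values are zeroed so the shadow can never boost confidence.
    """
    try:
        delta = float(value)
    except (TypeError, ValueError, OverflowError):
        # OverflowError: float() of a very large int, e.g. 10**400.
        return 0.0
    if math.isnan(delta):
        # min(nan, 0.0) and max(nan, -0.15) would both return nan.
        return 0.0
    # Upper bound 0.0 (no boosts), lower bound -0.15 (conservative penalty).
    return round(max(min(delta, 0.0), -0.15), 3)
```

For example, `-0.42` is recorded as `-0.15`, `0.2` as `0.0`, and `"-0.05"` as `-0.05`.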
### Completed Tool Features


@@ -1802,6 +1802,7 @@ Focus on:
"task_type": "diagnose",
},
)
structured_shadow = self._parse_agent_loop_shadow_response(result.raw_response or "")
proposal["agent_loop_shadow"] = {
"enabled": True,
"success": result.success,
@@ -1809,6 +1810,9 @@ Focus on:
"tokens": result.tokens,
"latency_ms": round(result.latency_ms, 1),
"error": result.error,
"decision_impact": "none",
"structured": structured_shadow,
"confidence_delta": structured_shadow.get("confidence_delta", 0.0),
"preview": (result.raw_response or "")[:700],
}
logger.info(
@@ -1818,6 +1822,8 @@ Focus on:
success=result.success,
tools_available=len(available_tools),
latency_ms=round(result.latency_ms, 1),
confidence_delta=structured_shadow.get("confidence_delta", 0.0),
parse_status=structured_shadow.get("parse_status"),
)
except Exception as exc:
logger.warning(
@@ -1826,6 +1832,106 @@ Focus on:
error=str(exc),
)
    @classmethod
    def _parse_agent_loop_shadow_response(cls, raw_response: str) -> dict:
        """
        Normalize read-only Agent Loop output into durable metadata.
        The shadow result is intentionally non-decisive. Downstream code can
        inspect this structure for quality review, but it must not override the
        main proposal until ADR-105 canary graduation.
        """
        text = (raw_response or "").strip()
        if not text:
            return {
                "parse_status": "empty",
                "root_cause_check": "",
                "evidence_used": [],
                "confidence_delta": 0.0,
                "missing_evidence": [],
                "human_or_ai_next_step": "",
            }
        payload = cls._extract_json_object(text)
        if not isinstance(payload, dict):
            return {
                "parse_status": "unparsed",
                "root_cause_check": "",
                "evidence_used": [],
                "confidence_delta": 0.0,
                "missing_evidence": [],
                "human_or_ai_next_step": "",
                "raw_preview": text[:700],
            }
        return {
            "parse_status": "ok",
            "root_cause_check": cls._clip_shadow_text(payload.get("root_cause_check"), max_chars=500),
            "evidence_used": cls._coerce_shadow_list(payload.get("evidence_used"), max_items=5),
            "confidence_delta": cls._coerce_agent_loop_confidence_delta(
                payload.get("confidence_delta", 0.0)
            ),
            "missing_evidence": cls._coerce_shadow_list(payload.get("missing_evidence"), max_items=5),
            "human_or_ai_next_step": cls._clip_shadow_text(
                payload.get("human_or_ai_next_step"), max_chars=500
            ),
        }

    @staticmethod
    def _extract_json_object(text: str) -> dict | None:
        """Extract the first JSON object from plain or fenced LLM output."""
        candidates = [text]
        fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, flags=re.DOTALL | re.IGNORECASE)
        if fenced:
            candidates.insert(0, fenced.group(1))
        object_match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if object_match:
            candidates.append(object_match.group(0))
        for candidate in candidates:
            try:
                parsed = json.loads(candidate)
            except (TypeError, json.JSONDecodeError):
                continue
            if isinstance(parsed, dict):
                return parsed
        return None

    @staticmethod
    def _clip_shadow_text(value: object, *, max_chars: int) -> str:
        if value is None:
            return ""
        return str(value).strip()[:max_chars]

    @classmethod
    def _coerce_shadow_list(cls, value: object, *, max_items: int) -> list[str]:
        if value is None:
            return []
        if isinstance(value, list):
            items = value
        else:
            items = [value]
        normalized = []
        for item in items:
            clipped = cls._clip_shadow_text(item, max_chars=240)
            if clipped:
                normalized.append(clipped)
                if len(normalized) >= max_items:
                    break
        return normalized

    @staticmethod
    def _coerce_agent_loop_confidence_delta(value: object) -> float:
        """
        Keep canary deltas conservative: metadata may lower confidence later,
        but positive boosts are recorded as 0 until the shadow path graduates.
        """
        try:
            delta = float(value)
        except (TypeError, ValueError):
            return 0.0
        return round(max(min(delta, 0.0), -0.15), 3)

    def _build_agent_loop_shadow_prompt(
        self,
        *,


@@ -2,9 +2,12 @@ import pytest
from src.plugins.mcp.interfaces import MCPTool, MCPToolProvider, MCPToolResult
from src.plugins.mcp.registry import AuditedMCPToolProvider
from src.services.ai_providers.interfaces import AIResult
from src.services.ai_providers.agent_loop import AgentToolExecutor
from src.services.ai_providers.permissions import filter_tools_for_agent, is_tool_allowed
from src.services.ai_providers.interfaces import AIResult
from src.services.ai_providers.permissions import (
filter_tools_for_agent,
is_tool_allowed,
)
from src.services.ai_providers.tool_schema import (
anthropic_tool_schema,
openai_tool_schema,
@@ -69,6 +72,38 @@ def test_tool_schema_round_trips_provider_safe_names():
assert tool_by_provider_name([tool], safe_name) is tool
def test_openclaw_agent_loop_shadow_parser_normalizes_json():
    from src.services.openclaw import OpenClawService
    raw = """```json
{
"root_cause_check": "current RCA still needs pod evidence",
"evidence_used": ["event spike", "error rate"],
"confidence_delta": -0.42,
"missing_evidence": ["deployment rollout history"],
"human_or_ai_next_step": "query rollout history with read-only tools"
}
```"""
    parsed = OpenClawService._parse_agent_loop_shadow_response(raw)
    assert parsed["parse_status"] == "ok"
    assert parsed["root_cause_check"] == "current RCA still needs pod evidence"
    assert parsed["evidence_used"] == ["event spike", "error rate"]
    assert parsed["confidence_delta"] == -0.15
    assert parsed["missing_evidence"] == ["deployment rollout history"]


def test_openclaw_agent_loop_shadow_parser_never_boosts_confidence():
    from src.services.openclaw import OpenClawService
    parsed = OpenClawService._parse_agent_loop_shadow_response(
        '{"root_cause_check":"looks good","confidence_delta":0.2}'
    )
    assert parsed["confidence_delta"] == 0.0
@pytest.mark.asyncio
async def test_audited_provider_strips_internal_audit_context(monkeypatch):
audit_calls = []
@@ -188,4 +223,7 @@ async def test_openclaw_agent_loop_shadow_uses_read_only_tools(monkeypatch):
)
assert proposal["agent_loop_shadow"]["success"] is True
assert proposal["agent_loop_shadow"]["decision_impact"] == "none"
assert proposal["agent_loop_shadow"]["structured"]["parse_status"] == "ok"
assert proposal["agent_loop_shadow"]["structured"]["root_cause_check"] == "ok"
assert [tool.name for tool in fake_ai_provider.seen_tools] == ["list_incidents"]


@@ -6,6 +6,20 @@
---
## 2026-05-01 | Agent Loop shadow structured metadata guard
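This follows up on production observations after the P1 canary went live: `ENABLE_OPENCLAW_AGENT_LOOP_SHADOW=True` and max iteration 2 are in effect in the API pod, and `mcp_audit_log` already contains MCP calls, but no new `openclaw_agent_loop_shadow` production incident log has appeared yet. The next step is to make sure that whenever the shadow runs, it leaves a structured result that can be evaluated and governed, rather than changing the main decision directly.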
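### Completed
- `OpenClawService._maybe_run_openclaw_agent_loop_shadow()` normalizes the Agent Loop raw JSON into `agent_loop_shadow.structured`, including `root_cause_check`, `evidence_used`, `confidence_delta`, `missing_evidence`, `human_or_ai_next_step`, and `parse_status`.
- Shadow metadata is fixed at `decision_impact=none` and does not override `action`, `risk_level`, `confidence`, or the Nemotron result.
- During the canary, `confidence_delta` is initially restricted to `[-0.15, 0.0]`; if the LLM returns a positive value, it is reset to 0 so the shadow cannot be misused as a shortcut for boosting confidence.
- ADR-105 and the Tool Integration skill were updated in the same change to add the structured shadow guard.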
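### Verification
- Production observation: the API pod reports `agent_loop_shadow True max_iter 2`.
- Production observation: `mcp_audit_log` currently holds 198 rows; the most recent samples still come from the existing sense/govern MCP paths, and there is no Agent Loop shadow incident to score yet.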
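Once shadow incidents start to appear, one way to review them is to aggregate their `structured` blocks. The sketch below is hypothetical: `summarize_shadow_results` and the `"missing"` status for shadow blocks without structured output are illustrative names, not part of this commit, and it assumes each proposal is a dict carrying the `agent_loop_shadow` metadata described above.

```python
from collections import Counter
from statistics import mean


def summarize_shadow_results(proposals: list[dict]) -> dict:
    """Aggregate agent_loop_shadow metadata across proposals for review.

    Hypothetical helper. Proposals without an enabled shadow block are
    skipped; the mean delta covers only results whose output parsed ("ok").
    """
    statuses: Counter[str] = Counter()
    deltas: list[float] = []
    for proposal in proposals:
        shadow = proposal.get("agent_loop_shadow") or {}
        if not shadow.get("enabled"):
            continue
        structured = shadow.get("structured") or {}
        status = structured.get("parse_status", "missing")
        statuses[status] += 1
        if status == "ok":
            deltas.append(float(structured.get("confidence_delta", 0.0)))
    return {
        "shadow_runs": sum(statuses.values()),
        "parse_status": dict(statuses),
        "mean_confidence_delta": round(mean(deltas), 3) if deltas else None,
    }
```

A high share of `unparsed` or `empty` results would point at prompt or model problems before any question of promoting the shadow path arises.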
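## 2026-05-01 | Agent Loop P1 canary + CD Argo revision gate + SSH MCP four-node closed loop
This follows up on the open items left after the ADR-105 foundation and production verification: after pushing the deploy commit, CD would mistakenly treat the previous Argo revision as Synced/Healthy; the SSH MCP key was not yet authorized on 120/121; and the Agent Loop was still only a provider capability, with no production canary yet.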


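@@ -74,6 +74,8 @@ OpenClaw first adopts read-only shadow investigation, rather than directly replacing the main decision
- Allowed tools: read-only tools for Kubernetes / Prometheus / SignOz / Database / RAG / Grafana
- Provider: local Ollama first; no new paid Gemini/Claude calls
- Impact scope: only attaches `agent_loop_shadow` metadata; does not override `action`, `risk_level`, `confidence`, or the Nemotron tool result
- Structured metadata: the shadow raw response must be normalized into `agent_loop_shadow.structured`, including `root_cause_check`, `evidence_used`, `confidence_delta`, `missing_evidence`, `human_or_ai_next_step`, and `parse_status`
- Confidence delta: during the canary phase, only conservative metadata between 0 and -0.15 may be recorded, and positive values are always reset to 0; any change to scoring or the auto-execute gate requires a separate ADR/Logbook entry and must pass a production audit
- Failure strategy: log a warning, then fall back to the existing proposal / Nemotron / Playbook path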
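Together, the impact-scope and failure-strategy rules say the shadow may attach metadata but must never change the decision, and that errors fall back to the existing path. The committed code achieves this by writing only `agent_loop_shadow` and catching exceptions; the sketch below adds a belt-and-braces snapshot-and-restore guard on top. It assumes a proposal is a plain dict and that `action`, `risk_level`, and `confidence` are the decisive keys (the Nemotron result key is not shown in this commit, so it is left out). `run_shadow_non_decisively` and `DECISIVE_FIELDS` are hypothetical names.

```python
import copy
import logging
from typing import Callable

logger = logging.getLogger("agent_loop_shadow_sketch")

# Hypothetical: fields the shadow path must never change (ADR impact scope).
DECISIVE_FIELDS = ("action", "risk_level", "confidence")


def run_shadow_non_decisively(proposal: dict, run_shadow: Callable[[dict], None]) -> dict:
    """Run a shadow investigation that may only attach metadata.

    On any exception the failure is logged and the proposal is returned as
    is, so the caller keeps using the existing path. Decisive fields are
    snapshotted first and restored afterwards, so even a buggy shadow
    cannot alter, add, or remove them.
    """
    snapshot = {
        key: copy.deepcopy(proposal[key]) for key in DECISIVE_FIELDS if key in proposal
    }
    try:
        run_shadow(proposal)
    except Exception as exc:  # failure strategy: warn, then fall back
        logger.warning("agent loop shadow failed: %s", exc)
    for key in DECISIVE_FIELDS:
        if key in snapshot:
            if proposal.get(key) != snapshot[key]:
                logger.warning("shadow modified %s; restoring original value", key)
                proposal[key] = snapshot[key]
        else:
            # The shadow must not introduce a decisive field that was absent.
            proposal.pop(key, None)
    return proposal
```

A shadow that tries to set `confidence` to `0.99` leaves it at the original value while its `agent_loop_shadow` metadata is kept.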
## Acceptance