docs(logbook): record run incident audit closure [skip ci]
## 2026-05-31|Run detail Incident audit timeline: production closure

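**Background**:

- Users keep pointing out that the Telegram detail / history views and the frontend Run page must clearly answer: whether the alert is a duplicate, which flow stage it reached, which MCP / self-built MCP servers were used, whether it matched Sentry / SigNoz, whether it matched a PlayBook / Ansible, whether anything was actually executed, whether it was written to KM / Learning, and whether it ultimately converged automatically through AI or is stuck waiting on a human.
- The previous entry wired the six-stage "single Incident handling flow" into Run detail; this round adds a per-incident audit timeline that reads the production Incident timeline directly, without adding any fake data.
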
**Changes in this round**:

- `apps/web/src/app/[locale]/awooop/runs/[run_id]/page.tsx`:
  - After Run detail loads `/api/v1/platform/runs/{run_id}/detail`, it resolves the Incident id, preferring `awooop_status_chain.source_id`, and then loads `/api/v1/incidents/{incident_id}/timeline`.
  - Added `IncidentAuditTimelinePanel`, which shows the stage count, event count, Direct / Candidate / Applied, Final verification, processing stages, match and adoption evidence, and audit events.
  - Audit events consolidate entries from `automation_operation_log`, `knowledge_entries`, `incident_evidence`, and `alert_operation_log`, plus verifier / km / executor / ai_router related events.
  - The panel places the status-chain's Ansible / PlayBook selection and the Incident timeline's executor / KM / verifier evidence on the same page.
- `apps/web/messages/zh-TW.json` / `apps/web/messages/en.json`: added the missing i18n strings.

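
A minimal TypeScript sketch of the lookup order and audit-event selection described above. Only `awooop_status_chain.source_id`, the two endpoint paths, and the audit table and stage names come from this entry; the `incident_id` fallback field, the `source` / `stage` event fields, and every function name are illustrative assumptions, not the actual `page.tsx` code.

```typescript
// Hypothetical shapes: only awooop_status_chain.source_id and the endpoint
// paths come from the logbook; the other fields are assumptions.
interface RunDetail {
  awooop_status_chain?: { source_id?: string | null } | null;
  incident_id?: string | null; // assumed fallback field
}

interface TimelineEvent {
  source?: string; // assumed: originating table, e.g. "knowledge_entries"
  stage?: string; // assumed: pipeline stage, e.g. "verifier"
  message?: string;
}

const AUDIT_SOURCES: ReadonlySet<string> = new Set([
  "automation_operation_log",
  "knowledge_entries",
  "incident_evidence",
  "alert_operation_log",
]);
const AUDIT_STAGES: ReadonlySet<string> = new Set(["verifier", "km", "executor", "ai_router"]);

// Prefer the status-chain source_id; fall back to the assumed incident_id field.
function resolveIncidentId(detail: RunDetail): string | null {
  const fromChain = detail.awooop_status_chain?.source_id?.trim();
  if (fromChain) return fromChain;
  const fallback = detail.incident_id?.trim();
  return fallback ? fallback : null;
}

function runDetailPath(runId: string): string {
  return `/api/v1/platform/runs/${encodeURIComponent(runId)}/detail`;
}

function incidentTimelinePath(incidentId: string): string {
  return `/api/v1/incidents/${encodeURIComponent(incidentId)}/timeline`;
}

// Keep only events from the audit tables or the audit-relevant stages.
function selectAuditEvents(events: TimelineEvent[]): TimelineEvent[] {
  return events.filter(
    (e) =>
      (e.source !== undefined && AUDIT_SOURCES.has(e.source)) ||
      (e.stage !== undefined && AUDIT_STAGES.has(e.stage)),
  );
}
```

Keeping these as pure helpers (an assumed design choice) lets the page fetch the run detail first, skip the second request when `resolveIncidentId` returns `null`, and test the filtering without any network access.
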
**Commits and deployment**:

```text
bdcb0594 fix(web): add incident audit timeline to run detail
8f73058b chore(cd): deploy bdcb059 [skip ci]
8699fe0c fix(api): align kb extractor ollama model
3c1f94a2 chore(cd): deploy 8699fe0 [skip ci]
```

**Post-deploy red-light triage**:

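- The `bdcb0594` runtime rollout succeeded, and the production browser smoke test confirmed that the Run detail audit timeline works; however, CI post-deploy-checks failed.
- Investigation confirmed that the failure did not come from the Run detail UI but from the KB extractor, which was still hard-coded to call the removed `llama3.2:3b` model, producing `kb_ollama_all_endpoints_failed`.
- `8699fe0c` switched the KB extractor to `settings.OLLAMA_TOOL_MODEL` / `hermes3:latest` and wired in endpoint cooldown; after redeploying, the post-deploy checks were all green.
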
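
The endpoint cooldown added in `8699fe0c` can be pictured with the TypeScript sketch below. It is an assumed design, not the Python extractor itself: the route names mirror the production `ollama_route_order`, the model is passed in from configuration rather than hard-coded (the root cause above), and `EndpointCooldown`, `callWithFallback`, and the cooldown length are hypothetical.

```typescript
// Assumed design sketch of per-endpoint cooldown; not the actual Python extractor.
type EndpointCall<T> = (endpoint: string, model: string) => Promise<T>;

class EndpointCooldown {
  private readonly until = new Map<string, number>(); // endpoint -> cooldown expiry (ms)
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs: number, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  isCooling(endpoint: string): boolean {
    const expiry = this.until.get(endpoint);
    return expiry !== undefined && expiry > this.now();
  }

  markFailed(endpoint: string): void {
    this.until.set(endpoint, this.now() + this.cooldownMs);
  }

  markOk(endpoint: string): void {
    this.until.delete(endpoint);
  }
}

// Try endpoints in route order, skipping any that are still cooling down.
async function callWithFallback<T>(
  routeOrder: readonly string[],
  model: string, // taken from configuration (e.g. the tool-model setting), never hard-coded
  cooldown: EndpointCooldown,
  call: EndpointCall<T>,
): Promise<T> {
  for (const endpoint of routeOrder) {
    if (cooldown.isCooling(endpoint)) continue;
    try {
      const result = await call(endpoint, model);
      cooldown.markOk(endpoint);
      return result;
    } catch {
      cooldown.markFailed(endpoint); // start this endpoint's cooldown window
    }
  }
  // Same error code the CI post-deploy check reported.
  throw new Error("kb_ollama_all_endpoints_failed");
}
```

With a cooldown of, say, 60 seconds (an arbitrary example value), an endpoint that just failed is skipped on the following calls instead of being retried on every extraction, and `kb_ollama_all_endpoints_failed` is raised only when no endpoint in the route order is usable.
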
**Verification**:

```text
local:
python3 -m json.tool apps/web/messages/zh-TW.json apps/web/messages/en.json -> pass
pnpm --dir apps/web exec tsc --noEmit --tsBuildInfoFile /tmp/awoooi-run-incident-audit-after-merge-20260531.tsbuildinfo -> pass
python3 scripts/security/security-mirror-progress-guard.py -> SECURITY_MIRROR_PROGRESS_GUARD_OK
NEXT_PUBLIC_API_URL=https://awoooi.wooo.work pnpm --dir apps/web run build -> pass

production rollout:
awoooi-api 192.168.0.110:5000/awoooi/api:8699fe0c7ff12275f131acdbc563097d220be2e4 2/2
awoooi-web 192.168.0.110:5000/awoooi/web:8699fe0c7ff12275f131acdbc563097d220be2e4 2/2
awoooi-worker 192.168.0.110:5000/awoooi/api:8699fe0c7ff12275f131acdbc563097d220be2e4 1/1

production API:
/api/v1/health:
status=healthy
mock_mode=false
ollama_route_order=["ollama_gcp_a","ollama_gcp_b","ollama_local"]
components.ollama.status=up
/api/v1/incidents/INC-20260530-0DD83C/timeline:
ascii_timeline=webhook:ok > investigator:ok > ai_router:ok > llm:ok > target:ok > blast:ok > safe:ok > executor:fail > verifier:warn > km:ok
timeline stages include webhook / investigator / ai_router / llm / target / blast / safe / executor / verifier / km
events include automation_operation_log ansible_candidate_matched, ansible_check_mode_executed, knowledge_entries KM entry written, incident_evidence verifier warning

production browser:
https://awoooi.wooo.work/zh-TW/awooop/runs/d17ff68c-6459-5ad4-b0d1-408fc5d6711d?project_id=awoooi
canScroll=true
horizontalOverflow=false
visible: Incident 稽核時間線 (Incident audit timeline)
visible: 處理階段 / 匹配與採用證據 / 稽核事件 (processing stages / match and adoption evidence / audit events)
visible: Automation: ansible_check_mode_executed
visible: KM entry written
visible: Final verification warning
productionUrlErrors=[]
screenshot=/tmp/awoooi-run-incident-audit-prod-8699fe0c-20260531.png

CI/CD:
8699fe0c post-deploy success
summary=API=✅; Web=✅; AlertChain=✅; SourceLink=✅; Monitoring=✅; Smoke=✅
```

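
The two one-line summaries in the log above, `ascii_timeline` and the post-deploy `summary`, are easy to check mechanically. A hedged TypeScript sketch: the input formats are copied from the recorded output, while `parseAsciiTimeline`, `stagesNeedingAttention`, and `parseCheckSummary` are hypothetical helpers, not part of the AwoooI API or CI scripts.

```typescript
// Hypothetical helpers; the input formats are copied from the verification log.
interface StageResult {
  stage: string;
  status: string; // "ok" | "warn" | "fail" in the recorded output
}

// "ascii_timeline=webhook:ok > investigator:ok > ..." -> [{ stage: "webhook", status: "ok" }, ...]
function parseAsciiTimeline(line: string): StageResult[] {
  return line
    .replace(/^ascii_timeline=/, "")
    .split(">")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const idx = part.lastIndexOf(":");
      return idx === -1
        ? { stage: part, status: "unknown" }
        : { stage: part.slice(0, idx), status: part.slice(idx + 1) };
    });
}

// Any stage that did not finish "ok" still needs attention.
function stagesNeedingAttention(stages: StageResult[]): StageResult[] {
  return stages.filter((s) => s.status !== "ok");
}

// "summary=API=✅; Web=✅; ..." -> Map of check name -> passed?
function parseCheckSummary(line: string): Map<string, boolean> {
  const checks = new Map<string, boolean>();
  for (const item of line.replace(/^summary=/, "").split(";")) {
    const [name, mark] = item.split("=").map((s) => s.trim());
    if (name) checks.set(name, mark === "✅");
  }
  return checks;
}
```

Applied to the recorded `INC-20260530-0DD83C` timeline, the attention list is `executor:fail` and `verifier:warn`, which matches the "Final verification warning" seen in the browser smoke test; the CI summary parses to six checks, all passing.
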
**Current overall progress**:

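- MOMO PostgreSQL backup integrated into AwoooP failure-notify / Ansible controlled apply: 100%.
- AwoooP truth-chain / status-chain visualization of Ansible apply evidence: about 98%.
- Stability of Telegram detail/history and frontend Run drill-down: about 98%.
- Single-flow transparency across MCP / Sentry / SigNoz / KM / PlayBook / Ansible: about 81%; a single Run detail page now shows the flow stages, Incident timeline, MCP investigation, Ansible check/apply evidence, KM writes, verifier results, and the next manual step.
- Frontend AI automation management UI sync: about 82%; Run detail now has the key audit chain, but the home page / Work Items / Approvals still need the same state machine turned into a cross-page product view.
- Overall AI automation flywheel: about 72%; it still needs more organic alert samples, 24h closed-loop statistics, Sentry / SigNoz source match coverage, and an owner review workflow.
- 24h full AI Agent auto-remediation production claim: 0%; for now we can only claim that "specific controlled apply events have been verified to converge", not that a fully automated remediation loop has been achieved.
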
## 2026-05-31|KB extractor Ollama model drift fix

**Background**: