ewoooc/templates/admin/host_health.html
OoO 849e189b60
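feat(p45): UI/UX upgrade to ewoooc_base.html + 7 "AI Observability" sidebar items + new overview page
The commander asked: "Has the visual grid UI/UX for those six pages been sorted out? And are there any new pages?"
Answer: no, it has been postponed repeatedly since Phase 38. This commit makes up for it.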

I-1: 6 pages migrated from base.html to ewoooc_base.html
- host_health / ai_calls_dashboard / budget / promotion_review /
  quality_trend / ppt_audit_history all switched
- {% extends "base.html" %} → {% extends "ewoooc_base.html" %}
- {% block content %} → {% block ewooo_content %}
- Inherited automatically: sidebar 240px / topbar 64px / fonts (Inter+JetBrains+Noto Sans TC)
  / ewoooc-tokens.css / ewoooc-shell.css / search box / beige background

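I-2: add an "AI Observability" nav group to _ewoooc_shell.html
- 7 items: Observability Overview / Host Health / AI Calls / Budget Control /
  RAG Promotion Review / Feedback Trend / PPT Visual Audit
- Mapped to active_page='obs_*' so the active item highlights correctly
- Numbered 07-13 (System Administration moves to 14)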

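I-3: new page at /observability/ + /observability/overview
- routes/admin_observability_routes.py::observability_overview
- Single page aggregating KPIs joined across 8 tables:
  • 24h uptime for the three hosts (host_health_probes, one card per host)
  • AI calls, 24h (ai_calls: total/tokens/cost/error rate/RAG hit/cache hit)
  • Month-to-date cost
  • Budget alerts (listed automatically when ratio ≥ alert_pct)
  • AIOps, 7d (incidents + heal_logs: self-heal success rate)
  • MCP, 24h (mcp_calls: tool calls + cache rate + cost)
  • RAG learning, 30d (learning_episodes: pending review + promotion rate)
  • PPT visual audit, 7d (ppt_audit_results: pass rate)
  • Entry cards for the 6 main sub-pages (each with a one-line description)
- Web counterpart of the Phase 44 daily Telegram summary
- Fail-safe throughout (a failed query only skips its own card and does not block the page)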
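
The fail-safe behaviour described in the last bullet could be sketched roughly as follows. This is a minimal illustration, not the code in admin_observability_routes.py: the `collect_cards` helper, the loader names, and the sample values are all hypothetical, and real loaders would run SQL against the tables listed above.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("observability_overview")


def collect_cards(loaders: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run each card loader on its own; a failing loader only drops its own card."""
    cards: Dict[str, Any] = {}
    for name, loader in loaders.items():
        try:
            cards[name] = loader()
        except Exception:
            # Log and skip, so one broken query never blocks the whole page.
            logger.exception("card %s failed, skipping", name)
    return cards


def _broken_query() -> Any:
    # Stand-in for a query against a table that is missing or unreachable.
    raise RuntimeError("table missing")


# Hypothetical loaders; the values are made up for the demo.
demo_cards = collect_cards({
    "ai_calls_24h": lambda: {"total": 120, "error_rate": 1.5},
    "mcp_24h": _broken_query,
    "month_cost": lambda: {"usd": 42.0},
})
print(sorted(demo_cards))
```

A template can then guard each card with `{% if cards.mcp_24h %}`, in the same way host_health.html guards `aiops_summary` and `mcp_24h`.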

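Upgrade summary:
- UI framework: base.html → ewoooc_base.html (sidebar + topbar + token css now in effect)
- Design charter: 8 cards + cross-JOIN panorama over 8 tables + single-page overview
- Entry points: 7 sidebar items + observability home page
- Table coverage: 4 tables (Phase 38) → 8 tables (Phase 45)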

Note: the full design token rework (Bootstrap class → --momo-* token / caramel orange)
is deferred to a later phase; this commit focuses on the framework upgrade and the new overview page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:34:18 +08:00

{% extends "ewoooc_base.html" %}
{% block title %}Host Health Monitoring{% endblock %}
{% block ewooo_content %}
<div class="container-fluid mt-3">
<h2 class="mb-3"><i class="fas fa-heartbeat me-2"></i>Host Health Monitoring
<small class="text-muted">Live status of the three Ollama hosts, MCP, and cost throttling</small>
</h2>
<!-- Three Ollama hosts -->
<div class="card mb-3">
<div class="card-header"><strong><i class="fas fa-server me-2"></i>Three Ollama hosts (live HTTP /api/tags probe)</strong></div>
<div class="card-body p-0">
<table class="table mb-0">
<thead class="table-light">
<tr><th>Role</th><th>Host</th><th>HTTP health</th><th>Unhealthy mark</th><th>Loaded models</th><th>Action</th></tr>
</thead>
<tbody>
{% for h in ollama_hosts %}
<tr>
<td><strong>{{ h.label }}</strong></td>
<td><code>{{ h.host }}</code></td>
<td>
{% if h.healthy %}
<span class="badge bg-success"><i class="fas fa-check me-1"></i>HTTP OK</span>
{% else %}
<span class="badge bg-danger"><i class="fas fa-times me-1"></i>Offline</span>
{% endif %}
</td>
<td>
{% if h.unhealthy_mark %}
<span class="badge bg-warning"><i class="fas fa-exclamation-triangle me-1"></i>Marked unhealthy (30 s)</span>
{% else %}
<span class="badge bg-light text-dark"></span>
{% endif %}
</td>
<td>
{% for m in h.models %}
<span class="badge bg-info text-dark me-1">{{ m }}</span>
{% endfor %}
{% if not h.models %}<small class="text-muted">None / not connected</small>{% endif %}
</td>
<td>
{% if h.unhealthy_mark or not h.healthy %}
<button class="btn btn-sm btn-outline-danger"
onclick="triggerAutoHeal({{ h.label|tojson|forceescape }})">
<i class="fas fa-band-aid me-1"></i>AutoHeal
</button>
{% else %}
<small class="text-muted"></small>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<!-- MCP servers -->
<div class="card mb-3">
<div class="card-header"><strong><i class="fas fa-plug me-2"></i>MCP services (Phase 10/10.5)</strong></div>
<div class="card-body p-0">
<table class="table mb-0">
<thead class="table-light">
<tr><th>Service</th><th>Status</th></tr>
</thead>
<tbody>
{% for server, healthy in mcp_status.items() %}
<tr>
<td><code>{{ server }}</code></td>
<td>
{% if healthy %}
<span class="badge bg-success"><i class="fas fa-check me-1"></i>OK</span>
{% else %}
<span class="badge bg-secondary">Not enabled / offline</span>
{% endif %}
</td>
</tr>
{% else %}
<tr><td colspan="2" class="text-muted small">MCP_ROUTER_ENABLED=false or mcp-stack not deployed</td></tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<!-- Cost Throttle status (Phase 20) -->
<div class="card mb-3">
<div class="card-header"><strong><i class="fas fa-dollar-sign me-2"></i>Cost throttle status (Phase 20)</strong></div>
<div class="card-body p-0">
{% if throttle_state %}
<table class="table mb-0">
<thead class="table-light">
<tr><th>Provider</th><th>Spent</th><th>Budget</th><th>Month-end projection</th><th>Usage</th><th>Status</th></tr>
</thead>
<tbody>
{% for provider, info in throttle_state.items() %}
<tr {% if info.throttled %}class="table-warning"{% endif %}>
<td><code>{{ provider }}</code></td>
<td>${{ "%.2f"|format(info.spent) }}</td>
<td>${{ "%.2f"|format(info.budget) }}</td>
<td>${{ "%.2f"|format(info.projected) }}</td>
<td>{{ "%.0f"|format(info.ratio * 100) }}%</td>
<td>
{% if info.throttled %}
<span class="badge bg-danger"><i class="fas fa-exclamation-triangle me-1"></i>Throttled</span>
{% else %}
<span class="badge bg-success"><i class="fas fa-check me-1"></i>OK</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p class="text-muted m-3 small">
COST_THROTTLE_ENABLED=false or not yet evaluated (the evaluation runs as an hourly cron job)
</p>
{% endif %}
</div>
</div>
<!-- AIOps 7d summary (added in Phase 39 D-5) -->
{% if aiops_summary %}
<div class="card mb-3" style="border-left: 4px solid #0d6efd;">
<div class="card-header">
<strong><i class="fas fa-shield-virus me-2"></i>AIOps self-healing: 7-day summary</strong>
<small class="text-muted">Data source: incidents + heal_logs (ADR-013 AutoHeal closed loop)</small>
</div>
<div class="card-body">
<div class="row g-2">
<div class="col-md-2 col-sm-4">
<div class="border rounded p-2 text-center">
<small class="text-muted d-block">Total incidents</small>
<strong style="font-size: 1.4em;">{{ aiops_summary.incidents_total }}</strong>
</div>
</div>
<div class="col-md-2 col-sm-4">
<div class="border rounded p-2 text-center">
<small class="text-muted d-block">Open</small>
<strong class="{% if aiops_summary.incidents_open > 0 %}text-danger{% endif %}" style="font-size: 1.4em;">
{{ aiops_summary.incidents_open }}
</strong>
</div>
</div>
<div class="col-md-2 col-sm-4">
<div class="border rounded p-2 text-center">
<small class="text-muted d-block">Resolved</small>
<strong class="text-success" style="font-size: 1.4em;">{{ aiops_summary.incidents_resolved }}</strong>
</div>
</div>
<div class="col-md-2 col-sm-4">
<div class="border rounded p-2 text-center">
<small class="text-muted d-block">P0/P1</small>
<strong class="{% if (aiops_summary.incidents_p0 + aiops_summary.incidents_p1) > 0 %}text-danger{% endif %}" style="font-size: 1.4em;">
{{ aiops_summary.incidents_p0 + aiops_summary.incidents_p1 }}
</strong>
</div>
</div>
<div class="col-md-2 col-sm-4">
<div class="border rounded p-2 text-center">
<small class="text-muted d-block">Self-heal success rate</small>
<strong class="{% if aiops_summary.heal_success_rate >= 80 %}text-success{% elif aiops_summary.heal_success_rate >= 50 %}text-warning{% else %}text-danger{% endif %}" style="font-size: 1.4em;">
{{ "%.0f"|format(aiops_summary.heal_success_rate) }}%
</strong>
</div>
</div>
<div class="col-md-2 col-sm-4">
<div class="border rounded p-2 text-center">
<small class="text-muted d-block">Avg. self-heal time</small>
<strong style="font-size: 1.4em;">{{ aiops_summary.heals_avg_ms }} ms</strong>
</div>
</div>
</div>
<div class="mt-2 small text-muted">
<i class="fas fa-info-circle me-1"></i>
{{ aiops_summary.heals_total }} self-heal attempts in 7d
(succeeded {{ aiops_summary.heals_success }} · failed {{ aiops_summary.heals_failed }})
</div>
</div>
</div>
{% endif %}
<!-- MCP 24h workload (added in Phase 39 D-2) -->
{% if mcp_24h %}
<div class="card mb-3">
<div class="card-header"><strong><i class="fas fa-bolt me-2"></i>MCP services: 24h workload</strong>
<small class="text-muted">Data source: mcp_calls table, showing the scale of AI×MCP orchestration</small>
</div>
<div class="card-body p-0">
<table class="table mb-0">
<thead class="table-light">
<tr>
<th>Service</th>
<th class="text-end">Calls</th>
<th class="text-end">Success rate</th>
<th class="text-end">Cache hit rate</th>
<th class="text-end">Tools used</th>
<th class="text-end">Avg. latency</th>
<th class="text-end">Cost (USD)</th>
</tr>
</thead>
<tbody>
{% for s in mcp_24h %}
<tr>
<td><code>{{ s.server }}</code></td>
<td class="text-end">{{ "{:,}".format(s.total_calls) }}</td>
<td class="text-end">
<strong class="{% if s.success_rate >= 95 %}text-success{% elif s.success_rate >= 80 %}text-warning{% else %}text-danger{% endif %}">
{{ "%.1f"|format(s.success_rate) }}%
</strong>
</td>
<td class="text-end">
<span class="{% if s.cache_rate >= 30 %}text-success{% endif %}">
{{ "%.1f"|format(s.cache_rate) }}%
</span>
</td>
<td class="text-end">{{ s.tools_used }}</td>
<td class="text-end">{{ s.avg_ms }} ms</td>
<td class="text-end">${{ "%.4f"|format(s.total_cost) }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
{% endif %}
<!-- Health trend over the past 24h (added in Phase 38) -->
{% if health_history %}
<div class="card mb-3">
<div class="card-header"><strong><i class="fas fa-chart-line me-2"></i>Health trend, past 24 hours</strong>
<small class="text-muted">Data source: host_health_probes (written automatically on every refresh)</small>
</div>
<div class="card-body p-0">
<table class="table mb-0">
<thead class="table-light">
<tr>
<th>Role</th>
<th class="text-end">Total probes</th>
<th class="text-end">Up</th>
<th class="text-end">Down</th>
<th class="text-end">Uptime</th>
<th class="text-end">Avg. response (ms)</th>
</tr>
</thead>
<tbody>
{% for h in health_history %}
<tr>
<td><strong>{{ h.host_label }}</strong></td>
<td class="text-end">{{ h.total }}</td>
<td class="text-end text-success">{{ h.up_count }}</td>
<td class="text-end text-danger">{{ h.down_count }}</td>
<td class="text-end">
<strong class="{% if h.uptime_pct >= 99 %}text-success{% elif h.uptime_pct >= 90 %}text-warning{% else %}text-danger{% endif %}">
{{ "%.1f"|format(h.uptime_pct) }}%
</strong>
</td>
<td class="text-end">{{ h.avg_ms }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
{% endif %}
<p class="text-muted mt-3"><small>
<i class="fas fa-robot me-1"></i>Operation Ollama-First v5.0 / Phase 40: Host Health Monitoring (incl. 24h history / MCP / AIOps / AutoHeal L2)
</small></p>
</div>
<script>
async function triggerAutoHeal(hostLabel) {
if (!confirm(`Trigger AutoHeal?\n\nHost: ${hostLabel}\n\nThis runs the matching ADR-013 playbook (DOCKER_RESTART / SSH_CMD / ALERT_ONLY) and writes to the incidents table.`)) return;
try {
const r = await fetch('/observability/host_health/trigger_autoheal', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({host_label: hostLabel}),
});
const d = await r.json();
if (d.ok) {
alert(`✅ AutoHeal dispatched\nAction: ${d.action || 'n/a'}\nMessage: ${d.message || ''}`);
window.location.reload();
} else {
alert('❌ ' + (d.error || d.message || 'Trigger failed'));
}
} catch (e) {
alert('Error: ' + e);
}
}
</script>
{% endblock %}
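
The template repeats the same "two thresholds, three colours" logic inline for uptime (99 / 90), self-heal success rate (80 / 50), and MCP success rate (95 / 80). As a rough sketch only, that logic could live in a small Python helper on the route side. The names `status_class` and `uptime_pct` and the threshold constants below are hypothetical and do not appear in the repository; only the threshold values and CSS classes are taken from the template.

```python
def status_class(value: float, good: float, warn: float) -> str:
    """Map a percentage to a Bootstrap text class using two thresholds."""
    if value >= good:
        return "text-success"
    if value >= warn:
        return "text-warning"
    return "text-danger"


def uptime_pct(up_count: int, total: int) -> float:
    """Share of successful probes as a percentage; 0.0 when there are no probes."""
    return 100.0 * up_count / total if total else 0.0


# (good, warn) pairs copied from the inline conditions in the template.
UPTIME = (99, 90)
HEAL = (80, 50)
MCP_SUCCESS = (95, 80)

# Sample probe counts, made up for the demo.
pct = uptime_pct(up_count=285, total=288)
print(f"{pct:.1f}%", status_class(pct, *UPTIME))
```

If a helper like this were registered as a Jinja filter or passed into the render context, each `{% if ... %}{% elif ... %}{% else %}` chain in the class attributes could shrink to a single expression. Note that the class is decided on the unrounded value, so a figure just under a threshold can display as the rounded threshold while still falling in the lower band.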