All checks were successful
CD Pipeline / deploy (push) Successful in 2m37s
The Commander asked: "Is the visual grid UI/UX for those six pages done yet? And are there any new pages?"
Answer: No. It has been postponed ever since Phase 38. This commit catches up on it.
I-1: 6 pages, base.html → ewoooc_base.html
- host_health / ai_calls_dashboard / budget / promotion_review /
quality_trend / ppt_audit_history all switched over
- {% extends "base.html" %} → {% extends "ewoooc_base.html" %}
- {% block content %} → {% block ewooo_content %}
- Inherited automatically: sidebar 240px / topbar 64px / fonts (Inter+JetBrains+Noto Sans TC)
/ ewoooc-tokens.css / ewoooc-shell.css / search box / beige background
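The two substitutions in I-1 are mechanical, so they could be applied with a small script. The following is only a sketch under assumptions: `migrate_template` is a hypothetical helper name, not something in this repository, and the commit may well have been edited by hand.

```python
import re

# Hypothetical helper (not part of the repository): rewrite one template's
# source so it extends the new shell and uses the new content block name.
EXTENDS_OLD = '{% extends "base.html" %}'
EXTENDS_NEW = '{% extends "ewoooc_base.html" %}'
# Match only the block literally named "content"; other blocks such as
# "title" are left untouched.
CONTENT_BLOCK_RE = re.compile(r"\{%\s*block\s+content\s*%\}")


def migrate_template(source: str) -> str:
    """Return the template source with both I-1 substitutions applied."""
    source = source.replace(EXTENDS_OLD, EXTENDS_NEW)
    return CONTENT_BLOCK_RE.sub("{% block ewooo_content %}", source)
```

Running it twice on the same text is harmless, because neither pattern matches the already migrated form.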
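I-2: add an "AI Observability" nav group to _ewoooc_shell.html
- 7 items: Observability Overview / Host Health / AI Calls / Budget Control /
RAG Promotion Review / Feedback Trend / PPT Visual Audit
- Mapped to active_page='obs_*', so the active item is highlighted correctly
- Numbered 07-13 (System Administration moves to 14)
I-3: new page /observability/ + /observability/overview
- routes/admin_observability_routes.py::observability_overview
- A single page aggregating KPIs joined across 8 tables:
• 24h uptime of the three hosts (host_health_probes, one card per host)
• AI calls, 24h (ai_calls: total/tokens/cost/error rate/RAG hit/cache hit)
• Month-to-date cost
• Budget alerts (listed automatically when ratio ≥ alert_pct)
• AIOps, 7d (incidents + heal_logs: self-heal success rate)
• MCP, 24h (mcp_calls: tool calls + cache rate + cost)
• RAG learning, 30d (learning_episodes: pending review + promotion rate)
• PPT visual audit, 7d (ppt_audit_results: pass rate)
• Entry cards for the 6 main sub-pages (each with a one-line description)
- The web counterpart of the Phase 44 daily Telegram summary
- Fail-safe throughout (a failed query only skips its own card and does not block the whole page)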
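The fail-safe behaviour described above can be sketched as follows. This is an illustration under assumptions, not the code in routes/admin_observability_routes.py: `collect_cards` and the loader names are hypothetical, and each loader stands in for one card's query.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def collect_cards(loaders: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every card loader; a failing loader only drops its own card."""
    cards: Dict[str, Any] = {}
    for name, loader in loaders.items():
        try:
            cards[name] = loader()
        except Exception:
            # Log with traceback and continue, so one broken query cannot
            # prevent the remaining cards from rendering.
            logger.exception("overview card %r failed; skipping it", name)
    return cards
```

A template can then guard each card with `{% if card_name %}`, as the host health page does for `aiops_summary` and `mcp_24h`, so a skipped card simply does not appear.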
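Upgrade mapping:
- UI framework: base.html → ewoooc_base.html ✅ (sidebar + topbar + token css now in effect)
- Design constitution: 8 cards + panoramic view joined across 8 tables + single-page overview
- Entry points: 7 sidebar items + the observability home page
- Table coverage: 4 tables (Phase 38) → 8 tables (Phase 45)
Note: the full design token overhaul (Bootstrap class → --momo-* token / caramel orange)
is left for a later phase; this commit focuses on "framework upgrade + new overview page".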
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
297 lines
12 KiB
HTML
{% extends "ewoooc_base.html" %}

{% block title %}Host Health Monitoring{% endblock %}

{% block ewooo_content %}
<div class="container-fluid mt-3">
  <h2 class="mb-3"><i class="fas fa-heartbeat me-2"></i>Host Health Monitoring
    <small class="text-muted">Live status of the three Ollama hosts, MCP, and cost throttling</small>
  </h2>

  <!-- Three Ollama hosts -->
  <div class="card mb-3">
    <div class="card-header"><strong><i class="fas fa-server me-2"></i>Three Ollama hosts (live HTTP /api/tags probe)</strong></div>
    <div class="card-body p-0">
      <table class="table mb-0">
        <thead class="table-light">
          <tr><th>Role</th><th>Host</th><th>HTTP health</th><th>Unhealthy mark</th><th>Loaded models</th><th>Action</th></tr>
        </thead>
        <tbody>
          {% for h in ollama_hosts %}
          <tr>
            <td><strong>{{ h.label }}</strong></td>
            <td><code>{{ h.host }}</code></td>
            <td>
              {% if h.healthy %}
              <span class="badge bg-success"><i class="fas fa-check me-1"></i>HTTP OK</span>
              {% else %}
              <span class="badge bg-danger"><i class="fas fa-times me-1"></i>Offline</span>
              {% endif %}
            </td>
            <td>
              {% if h.unhealthy_mark %}
              <span class="badge bg-warning"><i class="fas fa-exclamation-triangle me-1"></i>Marked unhealthy (30 s)</span>
              {% else %}
              <span class="badge bg-light text-dark">—</span>
              {% endif %}
            </td>
            <td>
              {% for m in h.models %}
              <span class="badge bg-info text-dark me-1">{{ m }}</span>
              {% endfor %}
              {% if not h.models %}<small class="text-muted">None / not connected</small>{% endif %}
            </td>
            <td>
              {% if h.unhealthy_mark or not h.healthy %}
              {# tojson emits double quotes, so the attribute must be single-quoted #}
              <button class="btn btn-sm btn-outline-danger"
                      onclick='triggerAutoHeal({{ h.label|tojson }})'>
                <i class="fas fa-band-aid me-1"></i>AutoHeal
              </button>
              {% else %}
              <small class="text-muted">—</small>
              {% endif %}
            </td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
    </div>
  </div>

  <!-- MCP servers -->
  <div class="card mb-3">
    <div class="card-header"><strong><i class="fas fa-plug me-2"></i>MCP services (Phase 10/10.5)</strong></div>
    <div class="card-body p-0">
      <table class="table mb-0">
        <thead class="table-light">
          <tr><th>Service name</th><th>Status</th></tr>
        </thead>
        <tbody>
          {% for server, healthy in mcp_status.items() %}
          <tr>
            <td><code>{{ server }}</code></td>
            <td>
              {% if healthy %}
              <span class="badge bg-success"><i class="fas fa-check me-1"></i>OK</span>
              {% else %}
              <span class="badge bg-secondary">— Not enabled / offline</span>
              {% endif %}
            </td>
          </tr>
          {% else %}
          <tr><td colspan="2" class="text-muted small">MCP_ROUTER_ENABLED=false, or mcp-stack is not deployed</td></tr>
          {% endfor %}
        </tbody>
      </table>
    </div>
  </div>

  <!-- Cost throttle status (Phase 20) -->
  <div class="card mb-3">
    <div class="card-header"><strong><i class="fas fa-dollar-sign me-2"></i>Cost throttle status (Phase 20)</strong></div>
    <div class="card-body p-0">
      {% if throttle_state %}
      <table class="table mb-0">
        <thead class="table-light">
          <tr><th>Provider</th><th>Spent</th><th>Budget</th><th>Month-end projection</th><th>Usage</th><th>Status</th></tr>
        </thead>
        <tbody>
          {% for provider, info in throttle_state.items() %}
          <tr {% if info.throttled %}class="table-warning"{% endif %}>
            <td><code>{{ provider }}</code></td>
            <td>${{ "%.2f"|format(info.spent) }}</td>
            <td>${{ "%.2f"|format(info.budget) }}</td>
            <td>${{ "%.2f"|format(info.projected) }}</td>
            <td>{{ "%.0f"|format(info.ratio * 100) }}%</td>
            <td>
              {% if info.throttled %}
              <span class="badge bg-danger"><i class="fas fa-exclamation-triangle me-1"></i>Throttled</span>
              {% else %}
              <span class="badge bg-success"><i class="fas fa-check me-1"></i>OK</span>
              {% endif %}
            </td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
      {% else %}
      <p class="text-muted m-3 small">
        COST_THROTTLE_ENABLED=false, or the first evaluation has not run yet (hourly cron job)
      </p>
      {% endif %}
    </div>
  </div>

  <!-- AIOps 7d summary (added in Phase 39 D-5) -->
  {% if aiops_summary %}
  <div class="card mb-3" style="border-left: 4px solid #0d6efd;">
    <div class="card-header">
      <strong><i class="fas fa-shield-virus me-2"></i>AIOps self-healing, 7-day summary</strong>
      <small class="text-muted">Data source: incidents + heal_logs (ADR-013 AutoHeal closed loop)</small>
    </div>
    <div class="card-body">
      <div class="row g-2">
        <div class="col-md-2 col-sm-4">
          <div class="border rounded p-2 text-center">
            <small class="text-muted d-block">Total incidents</small>
            <strong style="font-size: 1.4em;">{{ aiops_summary.incidents_total }}</strong>
          </div>
        </div>
        <div class="col-md-2 col-sm-4">
          <div class="border rounded p-2 text-center">
            <small class="text-muted d-block">Open</small>
            <strong class="{% if aiops_summary.incidents_open > 0 %}text-danger{% endif %}" style="font-size: 1.4em;">
              {{ aiops_summary.incidents_open }}
            </strong>
          </div>
        </div>
        <div class="col-md-2 col-sm-4">
          <div class="border rounded p-2 text-center">
            <small class="text-muted d-block">Resolved</small>
            <strong class="text-success" style="font-size: 1.4em;">{{ aiops_summary.incidents_resolved }}</strong>
          </div>
        </div>
        <div class="col-md-2 col-sm-4">
          <div class="border rounded p-2 text-center">
            <small class="text-muted d-block">P0/P1</small>
            <strong class="{% if (aiops_summary.incidents_p0 + aiops_summary.incidents_p1) > 0 %}text-danger{% endif %}" style="font-size: 1.4em;">
              {{ aiops_summary.incidents_p0 + aiops_summary.incidents_p1 }}
            </strong>
          </div>
        </div>
        <div class="col-md-2 col-sm-4">
          <div class="border rounded p-2 text-center">
            <small class="text-muted d-block">Self-heal success rate</small>
            <strong class="{% if aiops_summary.heal_success_rate >= 80 %}text-success{% elif aiops_summary.heal_success_rate >= 50 %}text-warning{% else %}text-danger{% endif %}" style="font-size: 1.4em;">
              {{ "%.0f"|format(aiops_summary.heal_success_rate) }}%
            </strong>
          </div>
        </div>
        <div class="col-md-2 col-sm-4">
          <div class="border rounded p-2 text-center">
            <small class="text-muted d-block">Avg self-heal time</small>
            <strong style="font-size: 1.4em;">{{ aiops_summary.heals_avg_ms }} ms</strong>
          </div>
        </div>
      </div>
      <div class="mt-2 small text-muted">
        <i class="fas fa-info-circle me-1"></i>
        {{ aiops_summary.heals_total }} self-heal attempts over 7d
        (succeeded {{ aiops_summary.heals_success }} · failed {{ aiops_summary.heals_failed }})
      </div>
    </div>
  </div>
  {% endif %}

  <!-- MCP 24h workload (added in Phase 39 D-2) -->
  {% if mcp_24h %}
  <div class="card mb-3">
    <div class="card-header"><strong><i class="fas fa-bolt me-2"></i>MCP services, 24h workload</strong>
      <small class="text-muted">Data source: mcp_calls table, showing the scale of AI×MCP orchestration</small>
    </div>
    <div class="card-body p-0">
      <table class="table mb-0">
        <thead class="table-light">
          <tr>
            <th>Service</th>
            <th class="text-end">Calls</th>
            <th class="text-end">Success rate</th>
            <th class="text-end">Cache hit rate</th>
            <th class="text-end">Tools used</th>
            <th class="text-end">Avg duration</th>
            <th class="text-end">Cost (USD)</th>
          </tr>
        </thead>
        <tbody>
          {% for s in mcp_24h %}
          <tr>
            <td><code>{{ s.server }}</code></td>
            <td class="text-end">{{ "{:,}".format(s.total_calls) }}</td>
            <td class="text-end">
              <strong class="{% if s.success_rate >= 95 %}text-success{% elif s.success_rate >= 80 %}text-warning{% else %}text-danger{% endif %}">
                {{ "%.1f"|format(s.success_rate) }}%
              </strong>
            </td>
            <td class="text-end">
              <span class="{% if s.cache_rate >= 30 %}text-success{% endif %}">
                {{ "%.1f"|format(s.cache_rate) }}%
              </span>
            </td>
            <td class="text-end">{{ s.tools_used }}</td>
            <td class="text-end">{{ s.avg_ms }} ms</td>
            <td class="text-end">${{ "%.4f"|format(s.total_cost) }}</td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
    </div>
  </div>
  {% endif %}

  <!-- Health trend over the past 24h (added in Phase 38) -->
  {% if health_history %}
  <div class="card mb-3">
    <div class="card-header"><strong><i class="fas fa-chart-line me-2"></i>Health trend, past 24 hours</strong>
      <small class="text-muted">Data source: host_health_probes (written automatically on every refresh)</small>
    </div>
    <div class="card-body p-0">
      <table class="table mb-0">
        <thead class="table-light">
          <tr>
            <th>Role</th>
            <th class="text-end">Total probes</th>
            <th class="text-end">Up</th>
            <th class="text-end">Down</th>
            <th class="text-end">Uptime</th>
            <th class="text-end">Avg response (ms)</th>
          </tr>
        </thead>
        <tbody>
          {% for h in health_history %}
          <tr>
            <td><strong>{{ h.host_label }}</strong></td>
            <td class="text-end">{{ h.total }}</td>
            <td class="text-end text-success">{{ h.up_count }}</td>
            <td class="text-end text-danger">{{ h.down_count }}</td>
            <td class="text-end">
              <strong class="{% if h.uptime_pct >= 99 %}text-success{% elif h.uptime_pct >= 90 %}text-warning{% else %}text-danger{% endif %}">
                {{ "%.1f"|format(h.uptime_pct) }}%
              </strong>
            </td>
            <td class="text-end">{{ h.avg_ms }}</td>
          </tr>
          {% endfor %}
        </tbody>
      </table>
    </div>
  </div>
  {% endif %}

  <p class="text-muted mt-3"><small>
    <i class="fas fa-robot me-1"></i>Operation Ollama-First v5.0 / Phase 40: Host health monitoring (incl. 24h history / MCP / AIOps / AutoHeal L2)
  </small></p>
</div>

<script>
// Ask for confirmation, then POST the host label to the AutoHeal endpoint.
async function triggerAutoHeal(hostLabel) {
  if (!confirm(`Trigger AutoHeal?\n\nHost: ${hostLabel}\n\nThis runs the matching ADR-013 playbook (DOCKER_RESTART / SSH_CMD / ALERT_ONLY) and writes to the incidents table.`)) return;
  try {
    const r = await fetch('/observability/host_health/trigger_autoheal', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({host_label: hostLabel}),
    });
    const d = await r.json();
    if (d.ok) {
      alert(`✅ AutoHeal dispatched\nAction: ${d.action || '—'}\nMessage: ${d.message || ''}`);
      window.location.reload();
    } else {
      alert('❌ ' + (d.error || d.message || 'Trigger failed'));
    }
  } catch (e) {
    alert('Error: ' + e);
  }
}
</script>
{% endblock %}