AWOOOI CD
|
fdb8c2b97b
|
chore(cd): deploy a86ecf3 [skip ci]
|
2026-04-12 04:28:38 +00:00 |
|
OG T
|
a86ecf32a2
|
fix(cd): fix non-fast-forward push failure + deploy the 8be87b0 fix build
CD Pipeline / build-and-deploy (push) Successful in 19m9s
1. kustomization.yaml: c439277 → 8be87b0 (auto_approve/decision_manager/webhooks)
2. cd.yaml: run fetch+rebase before git push, so commits that land during the CI run do not cause a non-fast-forward rejection
8be87b0 includes:
- auto_approve: high-risk actions opened to auto-execution + DESTRUCTIVE_PATTERNS interception
- decision_manager: classify_notification() wired in + early return on NO_ACTION + MCP context collection
- webhooks: target_resource fix (extract from name/container labels; DockerContainerUnhealthy no longer gets target=alertname)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 12:17:02 +08:00 |
|
OG T
|
08de73be5a
|
chore(cd): deploy 8be87b0 - auto_approve/decision_manager/webhooks fixes go live
|
2026-04-12 12:13:39 +08:00 |
|
OG T
|
3086123962
|
docs(logbook): Memory cleanup - LOGBOOK compressed from 1176 to 46 lines
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 12:12:02 +08:00 |
|
OG T
|
796517f64a
|
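docs(logbook): SSH MCP connectivity verified + manual-operations checklist fully cleared
- 188(ollama) + 110(wooo) SSH from API Pod: OK
- authorized_keys: ALREADY EXISTS (both hosts)
- 192.168.0.111 confirmed not to exist in the five-host architecture; stale Memory entry corrected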
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 12:08:37 +08:00 |
|
OG T
|
c7677750b5
|
docs(adr-070): complete the record of c439277's three full-automation fixes + Tier 3 CR patches
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 00:09:18 +08:00 |
|
OG T
|
4c2b69248b
|
docs(logbook): full record of the c439277 Tier 3 Code Review patches
E2E Health Check / e2e-health (push) Successful in 33s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 22:06:27 +08:00 |
|
OG T
|
8be87b0f32
|
fix(review): Chief Architect Code Review - c439277 Tier 3 red-zone patches
CD Pipeline / build-and-deploy (push) Failing after 8m39s
Critical:
- C1: decision_manager _collect_mcp_context: fix Python ternary precedence bug in the container variable
  Before: `A or B or C[0] if list else ""` (the ternary governs the whole expression)
  After: `A or B or (C[0] if list else "")` (explicit parentheses)
- C2: wrap every MCP call in asyncio.wait_for with timeout=5s so it cannot block the main decision path
  Also add a warning log for unknown hosts (C4)
- C3+M1: _DESTRUCTIVE_PATTERNS completed and moved to a module-level constant
  Added: delete pods (plural)/kubectl drain/kubectl cordon/kubectl rollout undo/
  docker rm/docker stop/docker kill/rm -rf/"replicas": 0 (JSON patch)
Important:
- I1: webhooks.py IP exclusion now uses is_internal_ip(), covering all RFC 1918 ranges (10.x/172.16-31.x/192.168.x)
- I4: add test_destructive_patterns.py; all 25 tests pass
  Covers: constant exists, interception, false-positive regression, critical always intercepted
🔴 Tier 3 red zone: pushed after passing Chief Architect Code Review
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
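
The C1 precedence issue can be reproduced in isolation. The sketch below is illustrative only: the function and parameter names are hypothetical stand-ins for the three sources of the container name, not the code in decision_manager.

```python
def pick_container_buggy(a: str, b: str, candidates: list) -> str:
    # Python parses this as: (a or b or candidates[0]) if candidates else ""
    # so an empty candidate list discards a and b as well.
    return a or b or candidates[0] if candidates else ""


def pick_container_fixed(a: str, b: str, candidates: list) -> str:
    # The parentheses limit the conditional to the last operand only.
    return a or b or (candidates[0] if candidates else "")


print(repr(pick_container_buggy("api", "", [])))  # '' (the label value is lost)
print(repr(pick_container_fixed("api", "", [])))  # 'api'
```

A conditional expression has lower precedence than `or`, which is why the unparenthesized form silently changes meaning.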
|
2026-04-11 22:05:52 +08:00 |
|
AWOOOI CD
|
45cf1b869f
|
chore(cd): deploy c439277 [skip ci]
|
2026-04-11 14:04:07 +00:00 |
|
OG T
|
c439277fc3
|
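feat(aiops): ADR-070 full-automation direction - three major fixes
CD Pipeline / build-and-deploy (push) Has been cancelled
1. auto_approve.py: allow high-risk auto-execution (low/medium/high all opened)
- min_confidence 0.65→0.50 (confidence threshold lowered)
- add DESTRUCTIVE_PATTERNS to intercept genuinely dangerous commands
  (scale=0, delete deployment/pvc/namespace, drop table)
- core rule: critical risk and destructive operations → manual; everything else → fully automatic
2. decision_manager.py: add _collect_mcp_context()
- collect real environment state (SSH/K8s MCP) before LLM analysis
- Host/Docker alerts → ssh_get_container_status + ssh_get_top_processes
- K8s alerts → k8s_get_events
- inject a "Current environment state (live MCP query)" section into diagnosis_context
3. webhooks.py: fix target_resource extraction
- add extraction from name/container/job labels
- DockerContainerUnhealthy no longer gets target=alertname
- IP addresses excluded automatically (values starting with 192.x are not used as target)
🔴 Tier 3 red zone: requires Chief Architect approval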
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
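
A minimal sketch of the gating rule described for auto_approve.py, assuming the decision reduces to risk level, destructive-pattern match, and confidence. The pattern list, the `--replicas=0` spelling of "scale=0", the `>=` comparison, and the function name are all assumptions made for illustration; the real module may differ.

```python
import re

# Hypothetical patterns, loosely based on the examples named in the commit message.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"--replicas[= ]0\b"),
    re.compile(r"\bdelete\s+(deployment|pvc|namespace)\b"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]
MIN_CONFIDENCE = 0.50  # lowered from 0.65 according to the commit message


def should_auto_execute(risk: str, confidence: float, command: str) -> bool:
    """Return True when a proposed action may run without human review."""
    if risk == "critical":
        return False  # critical always goes to a human
    if any(p.search(command) for p in DESTRUCTIVE_PATTERNS):
        return False  # destructive commands always go to a human
    return confidence >= MIN_CONFIDENCE


print(should_auto_execute("high", 0.6, "kubectl rollout restart deployment/api"))
print(should_auto_execute("high", 0.9, "kubectl scale deployment api --replicas=0"))
```

Checking the risk level and the patterns before the confidence threshold means a confident model can never override the two hard stops.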
|
2026-04-11 21:39:52 +08:00 |
|
OG T
|
99cc420429
|
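docs(review): post Chief Architect Code Review - complete records for ADR-064/067 + Skill 02
ADR-064: add I1 integration record (get_incident_type three-tier fallback; design decision that rule.id ≠ incident_type)
ADR-067: add D1 centralization completion record (mapping table for the 9 purpose keys)
Skill 02: add get_incident_type usage rules + the Ollama D1 model-centralization ban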
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:35:25 +08:00 |
|
OG T
|
d77b2add73
|
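fix(review): Chief Architect Code Review patches - I1 get_incident_type logic fix + tests added
CD Pipeline / build-and-deploy (push) Failing after 8m13s
Code Review found 2 Critical + 2 Important issues:
Critical:
- rule.id semantically means "rule identifier" and lives in a different namespace from incident_type; the two must not be mixed
  Removed the rule_id fallback path; when a matched YAML rule has no incident_type, fall through to the static dict
- get_incident_type() critical path had no test coverage
  Added test_get_incident_type.py: 11 tests in 4 categories (static/YAML priority/YAML error/custom), all passing
Important:
- ALERTNAME_TO_TYPE deferred import moved to module top level (no circular-import risk)
- alert_types.py TODO was stale → updated to the correct post-I1-integration description
Tech debt noted: NetworkPolicy ArgoCD egress ClusterIP 10.43.16.201/32 must be updated after ArgoCD is reinstalled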
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:33:19 +08:00 |
|
OG T
|
b2dfcf9b0d
|
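fix(telegram): safety guard interception now sends a manual-review card instead of a ❌ failure message
Problem: when the AI cannot confirm the deployment name, every alert sends a spam message:
"❌ Auto-repair failed kubectl scale deployment unknown"
Fix:
- safety guard interception → token.state returns to READY (not ERROR)
- call _push_decision_to_telegram instead, sending a TYPE-4 manual-review card
- mcp_all_failed=True makes classify_notification select TYPE-4
- the path where the target cannot be found in K8s is handled the same way
Effect: the commander sees a review card requesting manual intervention rather than a "repair failed" error message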
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:33:19 +08:00 |
|
AWOOOI CD
|
33a6f34104
|
chore(cd): deploy 615822d [skip ci]
|
2026-04-11 13:29:38 +00:00 |
|
OG T
|
615822dcf3
|
feat(I1): ADR-064 Rule Engine integration - infer incident_type dynamically
CD Pipeline / build-and-deploy (push) Successful in 11m28s
- alert_rule_engine.py: add get_incident_type(alertname)
  First look up incident_type/rule_id from the YAML rules via match.alertname
  Fallback: ALERTNAME_TO_TYPE static dict → "custom"
- webhooks.py: alert_type now uses get_incident_type(alertname),
  replacing the static ALERTNAME_TO_TYPE.get() lookup
- the 19 alertnames covered by YAML rules take effect automatically (no manual dict edits)
- a new alertname triggers generic_fallback → auto_generate_rule(), after which it is added to the YAML automatically
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
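
The lookup order can be sketched as follows. This is an illustration under stated assumptions: the in-memory rule list stands in for the parsed YAML, `match.alertname` is assumed to be a single string, and the `incident_type` value on the sample rule is invented. The rule_id fallback is left out because the d77b2add73 review commit in this log removes it.

```python
# Sample entry taken from this log (HostHighCpuLoad → "host_cpu"); the real dict has 56 entries.
ALERTNAME_TO_TYPE = {"HostHighCpuLoad": "host_cpu"}

# Hypothetical parsed form of the YAML rules; field layout is an assumption.
YAML_RULES = [
    {"id": "rule-backup", "match": {"alertname": "HostBackupFailed"},
     "incident_type": "host_backup_failed"},
    {"id": "rule-no-type", "match": {"alertname": "HostHighCpuLoad"}},
]


def get_incident_type(alertname: str, rules: list = YAML_RULES) -> str:
    """Resolve an incident type: YAML rule first, static dict second, "custom" last."""
    for rule in rules:
        if rule.get("match", {}).get("alertname") == alertname:
            incident_type = rule.get("incident_type")
            if incident_type:
                return incident_type
            break  # matched rule has no incident_type: fall through to the static dict
    return ALERTNAME_TO_TYPE.get(alertname, "custom")


print(get_incident_type("HostBackupFailed"))  # host_backup_failed
print(get_incident_type("HostHighCpuLoad"))   # host_cpu
print(get_incident_type("SomethingNew"))      # custom
```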
|
2026-04-11 21:21:41 +08:00 |
|
OG T
|
1ede9f933f
|
refactor(M3): extract alertname_to_type into src/constants/alert_types.py
CD Pipeline / build-and-deploy (push) Has been cancelled
- add src/constants/__init__.py + alert_types.py
- ALERTNAME_TO_TYPE constant (56 entries) migrated from the inline dict in webhooks.py to the module
- webhooks.py now uses ALERTNAME_TO_TYPE.get(alertname, "custom")
- TODO I1: integrate ADR-064 Rule Engine dynamic inference next Sprint (this is an intermediate state)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:19:52 +08:00 |
|
AWOOOI CD
|
37dfbaf26c
|
chore(cd): deploy f23176c [skip ci]
|
2026-04-11 13:19:04 +00:00 |
|
OG T
|
f23176cbb9
|
fix(k8s): ArgoCD MCP network connectivity fix - ARGOCD_URL switched to 120:30443
CD Pipeline / build-and-deploy (push) Has started running
- NetworkPolicy v1.4: add ArgoCD MCP egress rules
- argocd namespace Pod selector (port 8080, ClusterIP fallback)
- 192.168.0.120:30443 NodePort (ClusterIP DNAT across namespaces is unstable)
- ARGOCD_URL: 192.168.0.125 → 192.168.0.120:30443 (K3s Master NodePort, more stable)
- Verified: 192.168.0.120:30443 is reachable from inside the Pod, apps=[awoooi-prod]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:10:52 +08:00 |
|
AWOOOI CD
|
4a00573a20
|
chore(cd): deploy 4b591d1 [skip ci]
|
2026-04-11 13:07:59 +00:00 |
|
OG T
|
4b591d130f
|
chore: ArgoCD MCP egress NetworkPolicy + LOGBOOK Session 6
CD Pipeline / build-and-deploy (push) Has been cancelled
- k8s NetworkPolicy v1.4: add argocd namespace egress (port 80/443)
- LOGBOOK: Session 6 audit entry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:59:25 +08:00 |
|
AWOOOI CD
|
59dff1a478
|
chore(cd): deploy f2c18c4 [skip ci]
|
2026-04-11 12:54:21 +00:00 |
|
OG T
|
f2c18c4e63
|
feat(D1): centralize models.json - ADR-067 removes hardcoded model names from the five major Ollama applications
CD Pipeline / build-and-deploy (push) Successful in 12m56s
- models.json v1.3.0: providers.ollama.models adds 9 purpose keys
(drift_summary/drift_intent/log_anomaly/nemoclaw/playbook_draft/
code_review/embedding/rag_generate/image_analysis)
- drift_narrator_service: NARRATOR_MODEL → get_model("ollama","drift_summary")
- drift_interpreter: MODEL → get_model("ollama","drift_intent")
- log_summary_service: SUMMARY_MODEL → get_model("ollama","log_anomaly")
- local_code_review_service: _MODEL_OLLAMA → get_model("ollama","code_review")
- image_analysis_service: _MODEL → get_model("ollama","image_analysis")
- decision_manager: nemoclaw + playbook_draft, two call sites → get_model()
- embedding_service: get_embedding_service() factory → get_model("ollama","embedding")
- knowledge_service: OllamaEmbeddingService(model=...) → get_model()
All model names are now managed centrally in models.json; changing a model requires editing only one file.
LOGBOOK update: D1 done + B2 confirmed complete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
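
A sketch of what a purpose-keyed lookup could look like, assuming models.json nests model names under providers.<provider>.models.<purpose> as the commit message suggests. The JSON excerpt is invented for illustration; the two model names appear elsewhere in this log, but their pairing with these purpose keys is an assumption.

```python
import json

# Hypothetical excerpt of models.json; the real file defines 9 purpose keys for ollama.
MODELS_JSON = """
{
  "version": "1.3.0",
  "providers": {
    "ollama": {
      "models": {
        "drift_intent": "qwen2.5:7b-instruct",
        "nemoclaw": "deepseek-r1:14b"
      }
    }
  }
}
"""
_CONFIG = json.loads(MODELS_JSON)


def get_model(provider: str, purpose: str) -> str:
    """Look up the model name configured for a provider and purpose key."""
    try:
        return _CONFIG["providers"][provider]["models"][purpose]
    except KeyError as exc:
        # Fail loudly rather than falling back to a hardcoded name.
        raise KeyError(f"no model configured for {provider}/{purpose}") from exc


print(get_model("ollama", "nemoclaw"))  # deepseek-r1:14b
```

Raising on an unknown key keeps services from quietly reintroducing a hardcoded default.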
|
2026-04-11 20:45:53 +08:00 |
|
AWOOOI CD
|
694471891f
|
chore(cd): deploy 82e1c05 [skip ci]
|
2026-04-11 12:45:05 +00:00 |
|
OG T
|
82e1c05df8
|
fix(review): Code Review C1/C2/I2/M2 patches
CD Pipeline / build-and-deploy (push) Has been cancelled
C1 drift_interpreter: hardcoded 192.168.0.111 → settings.OLLAMA_URL
  Violated the feedback_frontend_internal_ip_ban hard rule (the backend service layer is likewise forbidden from hardcoding internal IPs)
C2 km_conversion_service: BUG-004 follow-up, also sync the vectorized field in Redis Working Memory
  The original fix only updated the DB; vectorized in the Redis incident:{id} JSON was not synced
  → audits reading Redis still showed False, so the fly-wheel closed-loop metric stayed inaccurate
  Fix: after the DB update, GET → JSON patch vectorized=True → SET (preserving the original TTL)
I2 decision_manager: _ALERTNAME_KEYWORDS HostHighDiskUsage→HostOutOfDiskSpace
  + add DockerContainerExited
  + add debug log on the fallback path
M2 decision_manager: import json as _json moved from inside the for loop to the top of the method
docs: ADR-072 adds Code Review findings and tech-debt record
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:36:59 +08:00 |
|
OG T
|
e447f97616
|
fix(telegram): wire in classify_notification + stop HostBackupFailed from sending spurious buttons
Three issues fixed together:
1. classify_notification() dead code wired in
- _push_decision_to_telegram() now calls classify_notification() first
- TYPE-1 (info only) → send_info_notification(), no buttons
- TYPE-4D (Config Drift) → send_drift_card()
- remaining TYPE-2/3/4 → send_approval_card() (existing buttons)
- decision_state + auto_executed are injected into proposal_data by the caller
2. alert_rules.yaml: add host_backup_failed rule
- HostBackupFailed / VeleroBackupFailed / VeleroBackupNotRun → NO_ACTION
- no longer goes through generic_fallback → no longer generates kubectl rollout restart deployment/backup
3. _verify_k8s_deployment_exists(): host-level alerts are no longer conservatively allowed through
- alerts prefixed Host*/Docker*/Backup*/Velero*/SSH* → return False when K8s MCP is unavailable
- _auto_execute() receiving NO_ACTION or an empty kubectl_command → returns early without executing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
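
The routing in item 1 amounts to a small dispatch on the classified type. A minimal sketch, assuming classify_notification() yields a string such as "TYPE-1"; the function below is hypothetical and returns the sender's name instead of calling it, so it runs without the Telegram client.

```python
def pick_sender(notification_type: str) -> str:
    """Return the name of the sender a classified notification is routed to."""
    if notification_type == "TYPE-1":
        return "send_info_notification"  # info only, no buttons
    if notification_type == "TYPE-4D":
        return "send_drift_card"         # config drift
    return "send_approval_card"          # TYPE-2/3/4 keep the approval buttons


for kind in ("TYPE-1", "TYPE-4D", "TYPE-4"):
    print(kind, "->", pick_sender(kind))
```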
|
2026-04-11 20:35:48 +08:00 |
|
OG T
|
9382814d14
|
docs(adr-072): BUG-001~008 all complete
ADR-072 status updated to "all fixes complete"
BUG-007 confirmed to need no fix (all 42 rules in alerts-unified.yml have severity)
BUG-008 fixed (f34fe19)
LOGBOOK adds P2 completion entry + next-step notes
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:30:29 +08:00 |
|
OG T
|
f34fe19134
|
fix(aiops): ADR-072 BUG-008 alertname_to_type 9→56 entries
CD Pipeline / build-and-deploy (push) Has been cancelled
Expanded the static map from 9 entries to fully cover all 42 alertnames in alerts-unified.yml:
- host_alerts: HostDown/HostHighCpuLoad/HostOutOfMemory/HostOutOfDiskSpace/HostBackupFailed
- k8s: K3sNodeNotReady/KubePodCrashLooping/KubeDeploymentReplicasMismatch/Velero* (8 entries)
- database: PostgreSQL*/Redis* (10 entries)
- service_alerts: *Down (8 entries)
- external: *Down/SSLExpiring (5 entries)
- alert_chain: AlertChainBroken*/NoAlerts/Unhealthy (4 entries)
- docker_health: DockerContainerUnhealthy/Exited (2 entries)
- auto_repair: AutoRepairLowSuccessRate/PermanentFixRequired (2 entries)
- legacy compatibility: HighCPUUsage/HighMemoryUsage/DiskSpaceLow/SSLCertExpiringSoon/TargetDown
Expected effect: the 69/112 incidents typed "custom" drop substantially; HostHighCpuLoad → "host_cpu"
BUG-007 confirmed to need no fix: all 42 rules in alerts-unified.yml already have a severity label
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:29:34 +08:00 |
|
OG T
|
85c71bf73c
|
docs(adr-072): update bug-fix status + LOGBOOK
ADR-072: BUG-001~006 marked fixed (P0 commit 88e3197, P1 commit 5aa0244)
LOGBOOK: add entry for ADR-072 P0+P1 all fixed
P2 pending: BUG-007 severity labels + BUG-008 alertname_to_type
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:26:27 +08:00 |
|
OG T
|
5aa0244c9a
|
fix(aiops): ADR-072 P1 bug fixes - BUG-004/005/006
CD Pipeline / build-and-deploy (push) Has been cancelled
BUG-004 KM vectorization 108/112 = False:
  km_conversion_service: after the KM entry is created (embedding already triggered in the background),
  write back incidents.vectorized = True so the flywheel closed-loop (ADR-068) learning metric is correct
BUG-005 15 ready decisions with no reviewer:
  decision_manager: add resend_stale_ready_tokens(),
  which scans Redis decision:* keys, finds tokens with state=ready whose dedup_key has expired,
  and re-pushes the Telegram review card
  main.py: lifespan startup schedules resend_stale_ready_tokens() (non-blocking via asyncio.create_task)
BUG-006 outcome/verification_result all null:
  _push_auto_repair_result: before the Telegram push, write
  incidents.outcome + incidents.verification_result to the DB
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:24:41 +08:00 |
|
OG T
|
2185e1755c
|
fix(aiops): ADR-072 P0 bug fixes - BUG-001/002/003
CD Pipeline / build-and-deploy (push) Has started running
BUG-001 drift_interpreter: nvidia_provider was refactored to return a NvidiaProviderResult object (not a 4-tuple)
→ call qwen2.5:7b-instruct on Ollama directly via httpx, bypassing nvidia_provider
→ eliminates the permanent "too many values to unpack" failure on all K8s config drift alerts
BUG-002 deployment_name="unknown": host-level alerts (HostHighCpuLoad etc.) carry no component/job/pod label
→ _auto_execute() adds _resolve_target_from_k8s() as a remedy
→ K8s MCP kubectl get pods queries the affected Pod dynamically; stripping the hash suffix yields the deployment name
BUG-003 invalid deployment passes the safety guard:
→ _auto_execute() adds a _verify_k8s_deployment_exists() existence check after the safety guard passes
→ deployment/pod not found in K8s → refuse to execute and write DecisionToken.error
→ when K8s MCP is unavailable, conservatively allow through (do not block the main flow)
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
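
The "strip the hash suffix" step in BUG-002 can be sketched as below. It assumes the usual Deployment pod naming of `<deployment>-<replicaset-hash>-<pod-suffix>`; the function name and the sample pod name are hypothetical, and pods owned by a StatefulSet or DaemonSet follow other naming schemes that this does not handle.

```python
def deployment_from_pod_name(pod_name: str) -> str:
    """Drop the last two dash-separated segments of a Deployment pod name."""
    parts = pod_name.rsplit("-", 2)
    # Fewer than three segments means there is no hash suffix to strip.
    return parts[0] if len(parts) == 3 else pod_name


print(deployment_from_pod_name("awoooi-api-7d9f8b6c5d-x2k4q"))  # awoooi-api
```

Splitting from the right keeps deployment names that themselves contain dashes intact.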
|
2026-04-11 20:20:39 +08:00 |
|
AWOOOI CD
|
2ad2a7ba45
|
chore(cd): deploy f323633 [skip ci]
|
2026-04-11 12:18:44 +00:00 |
|
OG T
|
f3236338a5
|
fix(security): Code Review P0+P1+P2 fully patched - MCP Phase 2b-3 + decision_manager
CD Pipeline / build-and-deploy (push) Has been cancelled
P0: decision_manager _fetch_metrics_snapshot: wrong argument type
- prom._instant_query(str) → prom._instant_query({"query": str})
- result parsing: r.get("status")=="success" → r.get("result", [])
P1: prometheus_provider: guard against PromQL injection via alertname
- add _RE_SAFE_ALERTNAME allow-list regex
P1: decision_manager: guard against dangerous-character injection in kubectl actions
- add _ALLOWED_KUBECTL_PATTERN allow-list; malformed commands are rejected outright
P1: decision_manager: 6 asyncio.create_task() calls at risk of garbage collection
- add _background_tasks: set + _fire_and_forget() helper
- all bare create_task calls now use _fire_and_forget
P1: ssh_provider: Group B write tools now require known_hosts
- refuse to run when known_hosts is unset or the file is missing, to prevent MITM
P2: sentry_provider: allow-list validation of query semantics
- add _RE_SAFE_SENTRY_QUERY; reject queries containing special characters
P2: argocd_provider: verify=False replaced by the ARGOCD_VERIFY_TLS environment-variable switch
- add _tls_verify() helper, default false (self-signed cert)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:10:33 +08:00 |
|
OG T
|
083b1a5449
|
fix(cd): fix gitea remote setup logic - remove+add replaces add||set-url
CD Pipeline / build-and-deploy (push) Has been cancelled
Original `add 2>/dev/null || set-url` logic: when the remote does not exist, set-url fails too
New logic: force remove first (failure allowed), then add, guaranteeing the remote exists
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:07:54 +08:00 |
|
OG T
|
09982fdfaa
|
docs(session6): full Telegram audit + ADR-072 bug list + spec consolidation
- LOGBOOK: Session 6 Redis DB10 audit results (8 systemic issues, graded P0-P2)
- ADR-072: AIOps closed-loop bug-fix list (drift_interpreter/deployment_name/KM vectorization, etc.)
- Spec document v2.2: confirm Sprint A/B/C + MCP 1-4 + ADR-071 all complete; mark ADR-072 as the next step
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:04:50 +08:00 |
|
OG T
|
a1432c03ed
|
docs: ADR-070/071 + ssh-mcp-setup runbook + Skill-04 v2.7
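- ADR-070: decision record for the fully automated AIOps closed loop, MCP Phase 1-4
- ADR-071: decision record for the four alert-notification types + the KM three-stage data closed loop
- docs/runbooks/ssh-mcp-setup.md: SOP for SSH MCP setup/verification/rotation
- Skill-04: v2.7 adds the full record of Sprint C DR + the 10 ADR-070 MCP providers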
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:04:47 +08:00 |
|
OG T
|
0f46799d56
|
docs(logbook): MCP full acceptance complete + Sentry/Prometheus bug-fix record
|
2026-04-11 19:54:05 +08:00 |
|
OG T
|
b5aa607a30
|
fix(mcp): correct Prometheus URL (110:9090) + switch Sentry DSN to internal HTTP
CD Pipeline / build-and-deploy (push) Failing after 8m45s
- PROMETHEUS_URL: 188:9090 → 110:9090 (correct Prometheus server location)
- SENTRY_DSN: https://sentry.wooo.work → http://192.168.0.110:9000 (eliminates the SSL hostname mismatch)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 19:51:20 +08:00 |
|
OG T
|
a6e6f389e2
|
chore: remove the temporary comments used to trigger CD
CD Pipeline / build-and-deploy (push) Failing after 8m9s
|
2026-04-11 19:15:04 +08:00 |
|
OG T
|
40d6536b62
|
ci: trigger CD - MCP Phase 3/4 + SSH MCP fully enabled (providers comment updated)
CD Pipeline / build-and-deploy (push) Waiting to run
|
2026-04-11 19:14:17 +08:00 |
|
OG T
|
a0d0d66809
|
ci: trigger CD
|
2026-04-11 19:14:17 +08:00 |
|
OG T
|
5c2cdff37f
|
ci: trigger CD - MCP Phase 3/4 + SSH MCP enabled
|
2026-04-11 19:13:46 +08:00 |
|
OG T
|
95b61802be
|
fix(mcp): ssh-mcp-key volumeMount path fix - subPath aligned with ssh_provider.py
CD Pipeline / build-and-deploy (push) Failing after 7m45s
- ssh_mcp_key → /run/secrets/ssh_mcp_key (SSH_KEY_PATH)
- known_hosts → /etc/ssh-mcp/known_hosts (SSH_MCP_KNOWN_HOSTS_FILE)
Also: K8s Secret rebuilt (contains ssh_mcp_key + known_hosts)
Public key added to authorized_keys on 188/110
SSH connection check: 188 OK / 110 OK
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:59:29 +08:00 |
|
OG T
|
9f5120bde1
|
docs(logbook): end of Session - MCP Phase 2a SSH volume + full enablement complete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:36:11 +08:00 |
|
OG T
|
b1c1091787
|
feat(mcp): MCP Phase 2a - SSH MCP key volume + SSH/ArgoCD/Sentry MCP enabled
CD Pipeline / build-and-deploy (push) Failing after 7m58s
- 06-deployment-api.yaml: ssh-mcp-key volume definition (optional: true, 0400)
- 04-configmap.yaml: SSH_MCP_ENABLED/KNOWN_HOSTS_FILE + ARGOCD_MCP_ENABLED + SENTRY_MCP_ENABLED
MCP Phase 1-4 fully implemented; all 10 providers enabled (ArgoCD/Sentry/SSH need a manually created Secret)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:35:52 +08:00 |
|
OG T
|
5d78c5492b
|
feat(argocd-mcp): enable ArgoCD MCP Provider + token injection flow
CD Pipeline / build-and-deploy (push) Has been cancelled
- config.py: ARGOCD_URL → https://192.168.0.125:30443 (the actual HTTPS NodePort)
- config.py: ARGOCD_MCP_ENABLED=True + SENTRY_MCP_ENABLED=True (enabled by default)
- cd.yaml: add step injecting the ARGOCD_API_TOKEN Gitea Secret → K8s Secret
- K8s: ARGOCD_API_TOKEN manually injected into awoooi-secrets + API pods restarted via rollout restart
- ArgoCD: apiKey capability enabled for the admin account
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:32:28 +08:00 |
|
OG T
|
f14ca4b117
|
docs(logbook): end-of-Session-4 update - MCP Phase 3/4 fully complete + ADR-070 loop closed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:17:21 +08:00 |
|
OG T
|
7eb49f9c20
|
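feat(mcp-phase4c): AI dynamic rule generation - new alertnames automatically get a Playbook draft
CD Pipeline / build-and-deploy (push) Failing after 8m29s
_generate_playbook_draft_if_new():
- triggered asynchronously when no Playbook matches (does not block the main decision flow)
- first uses semantic_search(threshold=0.92) to confirm the KM has no Playbook of the same name
- calls qwen2.5:7b-instruct (Ollama 188) to generate a five-section structured draft
  (symptoms/root cause/diagnostic steps/remediation actions/acceptance criteria)
- writes KnowledgeEntry(type=PLAYBOOK, status=DRAFT, source=AI_EXTRACTED)
- writes a PLAYBOOK_DRAFT_CREATED event to AlertOperationLog
- failures are silent (debug log only)
Completes all three MCP Phase 4 items:
4a NemoClaw second opinion (confidence < 0.7)
4b K8s state snapshot k8s_state_after
4c AI dynamic Playbook draft generation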
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:16:39 +08:00 |
|
OG T
|
0fa3b35a1c
|
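feat(mcp-phase4b): capture K8s Pod state after auto-repair and write it to k8s_state_after
CD Pipeline / build-and-deploy (push) Failing after 24s
After _push_auto_repair_result() succeeds:
- calls K8sProvider.kubectl_get(pods, label=app=<service>)
- result truncated to 500 characters and written to incidents.k8s_state_after
- km_conversion_service._build_content() already supports displaying this field
- failures are silent (debug log only) and do not block the main flow
Completes the KM three-stage data closed loop: symptoms (labels) + context (metrics_before) + action (action) + effect (metrics_after + k8s_state_after)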
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:15:31 +08:00 |
|
OG T
|
f3ee577f9d
|
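feat(mcp-phase4a): NemoClaw second opinion - confidence < 0.7 triggers a deepseek-r1:14b re-review
CD Pipeline / build-and-deploy (push) Has been cancelled
- _nemoclaw_second_opinion(): calls deepseek-r1:14b on Ollama 188 for independent reasoning
- parses the <think>...</think> CoT format and keeps only the body text
- 30s timeout; degrades silently on failure
- output truncated to 300 characters
- _dual_engine_analyze(): triggers the second opinion asynchronously when LLM confidence < 0.7
- result is attached to proposal_data["advisory_note"]
- _push_decision_to_telegram(): advisory_note is sent as a follow-up message under the NemoClaw bot identity
- format: "NemoClaw second opinion (confidence=0.xx)"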
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
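
The CoT-stripping and truncation steps can be sketched with a regex. This is an illustration, not the code in _nemoclaw_second_opinion(): the function name is hypothetical, and it assumes the model wraps its reasoning in a single well-formed <think>...</think> block.

```python
import re

# DOTALL lets the reasoning block span multiple lines; the match is non-greedy.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_answer(raw: str, limit: int = 300) -> str:
    """Drop <think>...</think> reasoning and cap the remaining text at `limit` characters."""
    body = _THINK_RE.sub("", raw).strip()
    return body[:limit]


sample = "<think>CPU is high.\nLikely a runaway job.</think>\nRestart the worker pod."
print(extract_answer(sample))  # Restart the worker pod.
```

An unclosed <think> tag would leave the reasoning in place, so a production version may want an explicit fallback for truncated responses.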
|
2026-04-11 09:14:54 +08:00 |
|
OG T
|
a2cc985f60
|
feat(mcp-phase3): ArgoCD MCP + Sentry MCP + full Provider registration
CD Pipeline / build-and-deploy (push) Has been cancelled
ArgoCDProvider (3 tools):
- argocd_list_apps: list all Apps + sync/health status
- argocd_get_app_status: detailed status + list of problem resources
- argocd_get_sync_history: the most recent N deployment records
- input validation: allow-list regex for app_name
- requires ARGOCD_API_TOKEN + ARGOCD_MCP_ENABLED=true
SentryProvider (3 tools):
- sentry_list_issues: list recent Issues (filtered by status)
- sentry_get_issue: details + last 5 stacktrace frames
- sentry_search_issues: PromQL-style search
- issue_id allow-list validation (digits only)
- requires SENTRY_AUTH_TOKEN + SENTRY_MCP_ENABLED=true
providers/__init__.py: register Prometheus + SSH + ArgoCD + Sentry, for all 10 providers
config.py: add ARGOCD_URL / ARGOCD_API_TOKEN / ARGOCD_MCP_ENABLED / SENTRY_MCP_ENABLED
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 09:11:53 +08:00 |
|