Your Name
|
6efd186750
|
docs(security): 建立高價值配置控管清冊 [skip ci]
|
2026-06-11 11:29:58 +08:00 |
|
Your Name
|
cfb866d055
|
feat(governance): add agent market automation surfaces
Ansible Lint / lint (push) Successful in 35s
CD Pipeline / tests (push) Failing after 13s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Failing after 11s
|
2026-06-04 21:50:55 +08:00 |
|
Your Name
|
21dcfbd991
|
fix(governance): collapse km slo fallback series
Code Review / ai-code-review (push) Successful in 11s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 22s
CD Pipeline / tests (push) Successful in 1m6s
CD Pipeline / build-and-deploy (push) Successful in 5m17s
CD Pipeline / post-deploy-checks (push) Successful in 1m38s
|
2026-05-14 19:37:15 +08:00 |
|
Your Name
|
d2a4a17969
|
fix(governance): stabilize adr100 km growth slo
Code Review / ai-code-review (push) Successful in 22s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 25s
CD Pipeline / tests (push) Successful in 1m11s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-14 19:33:52 +08:00 |
|
Your Name
|
cdb8bf6802
|
docs(governance): record adr100 slo emitter rollout
|
2026-05-14 19:22:39 +08:00 |
|
Your Name
|
13cf02b740
|
feat(governance): emit adr100 slo metrics
Code Review / ai-code-review (push) Successful in 10s
CD Pipeline / tests (push) Successful in 1m0s
CD Pipeline / build-and-deploy (push) Successful in 3m21s
CD Pipeline / post-deploy-checks (push) Successful in 1m16s
|
2026-05-14 18:57:03 +08:00 |
|
Your Name
|
9ef9633aff
|
fix(alerts): bypass proxy timeout for GCP Ollama
|
2026-05-06 08:55:14 +08:00 |
|
Your Name
|
ed7c6946cb
|
docs(awooop): define private Ollama mesh gateway
Code Review / ai-code-review (push) Successful in 10s
|
2026-05-05 22:56:22 +08:00 |
|
Your Name
|
13e51802fe
|
feat(awooop): Phase 0 全 ADR + Phase 1 control plane schema(含 critic 四項修正)
## Phase 0(文件層,全部 Accepted)
- ADR-106/107:AwoooP 平台架構 + 儲存策略
- ADR-111~118:Bootstrap → RLS 七項核心 ADR
- ADR-119~124:SAGA → Singleton Decomposition 六項 ADR
- ADR-UI-01~04:Operator Console 四個 UI ADR
## Phase 1(DB schema + migration)
- awooop_phase1_control_plane_2026-05-04.sql:7 張新表 + trigger + RLS
- Step 1:三角色(platform_admin/migration BYPASSRLS,awooop_app 受 RLS)
- Step 13:GRANT awooop_app 最小權限(7 條)
- Step 14:RLS fail-closed,移除 __platform__ 後門
- awooop_phase1_batch1_rls_2026-05-04.sql:高流量四表三步式 ADD COLUMN
- awooop_phase1_batch1_backfill.py:SKIP LOCKED 分批回填腳本
- awooop_models.py:7 個 SQLAlchemy 2.x models
## Critic 修正(4 Critical + 3 Major)
- C-1:ADD CONSTRAINT IF NOT EXISTS → DO 塊 + pg_constraint 查詢
- C-2:__mapper_args__ 字串 list → primary_key=True on mapped_column
- C-3:__platform__ RLS 後門 → 全移除,改用 BYPASSRLS role
- C-4:awooop_app role 從未建立 → Step 1 + 7 條 GRANT
- M-1:active_pointer_guard SECURITY DEFINER(FORCE RLS 跨租戶保護)
- M-2:pg_partman create_parent 加冪等防護
- M-3:immutability trigger 新增身份欄位保護(project_id/family/contract_id)
## Task 1.2 修補
- agent_loader.py:硬編碼 Mac 路徑 → AGENTS_DIR 環境變數
- Dockerfile:補 COPY .claude/agents/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-05-04 13:37:11 +08:00 |
|
Your Name
|
b1ef05fa8c
|
feat(ollama): ADR-110 GCP 三層容災架構(GCP-A → GCP-B → Local → Gemini)
Code Review / ai-code-review (push) Successful in 50s
CD Pipeline / tests (push) Failing after 1m14s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
## 變更摘要
- Primary: http://34.143.170.20:11434 (GCP-A SSD, 9x 載速 + 2x 推理)
- Secondary: http://34.21.145.224:11434 (GCP-B SSD)
- Fallback: http://192.168.0.111:11434 (M1 Pro Local HDD,最後防線)
- 廢止 ADR-105「111 唯一鐵律」,新建 ADR-110
## 核心改動
- config.py: 新增 OLLAMA_SECONDARY_URL;validator 加 GCP IP 白名單(34.143.170.20, 34.21.145.224)
- ollama_failover_manager.py: 三層 Ollama 決策矩陣;並行健康檢查三台;health_111 → health_gcp_a
- ollama_health_monitor.py: host label 萃取改為通用版(支援 GCP 公網 IP)
- failover_alerter.py: 故障/恢復主機動態顯示,不再硬編碼「Ollama 111 (GPU)」
- ollama_auto_recovery.py: notify_recovery 改為 ollama_gcp_a;recovered_host 動態
- k8s/awoooi-prod: configmap + deployment + network-policy 同步更新(egress 加 GCP /32)
- 服務層: 10 個服務檔案硬編碼 192.168.0.111 改為讀 settings.OLLAMA_URL
- 測試: URL 常數更新,新增三層容災場景,GCP IP 白名單驗證測試
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-03 22:49:23 +08:00 |
|
Your Name
|
0f009d9459
|
docs(adr): ADR-109 telegram_gateway unified dedup layer (P0 #1 design doc)
P0 #1 (徹底長期修系列) — 33 個 send_xxx 方法各自寫 dedup 改為統一在
`_send_request()` 一層處理,未來新增 send_xxx 方法傳兩個 kwargs
(dedup_scope + dedup_fingerprint) 即自動繼承 dedup,不再有「漏修一條鏈
就轟炸統帥」的設計缺陷。
當前是 Proposed 狀態,等首席架構師審。Tier 2 橙區。
包含:
- 33 個 send_xxx 的 dedup_scope mapping
- 5-6 小時 / 3 commits 漸進式重構計畫
- 與 ADR-108 (incident_id fingerprint) 的協同關係
兩個 ADR 都是「徹底長期修」系列的 design 階段,等統帥批准執行。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-03 01:54:19 +08:00 |
|
Your Name
|
62698158b0
|
docs(adr): ADR-108 incident_id fingerprint derivation (P1 design doc)
P1 (徹底長期修系列) — 治本所有 dedup 問題:把 incident_id 從 uuid4()[:6]
隨機改為 fingerprint hash 派生,open 期間同 fingerprint 強制復用同一 INC。
當前是 Proposed 狀態,等首席架構師審。Tier 3 紅區改動,不批不動 code。
包含:
- 影響面盤點(1435 引用點,預計實際需改 ~10 檔 ~20 處)
- 4 phase 漸進式遷移(~7 小時)
- 跨日 reuse 行為決策
- 5 條風險與緩解
- 5 條驗收標準
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-03 01:53:09 +08:00 |
|
Your Name
|
7795f027d2
|
fix(aiops): persist emergency intervention traces
CD Pipeline / tests (push) Successful in 2m56s
Code Review / ai-code-review (push) Failing after 39s
CD Pipeline / build-and-deploy (push) Successful in 12m54s
CD Pipeline / post-deploy-checks (push) Successful in 4m40s
|
2026-05-01 20:34:33 +08:00 |
|
Your Name
|
8e49f2ea88
|
fix(ci): preserve ssh mcp known hosts [skip ci]
|
2026-05-01 17:18:32 +08:00 |
|
Your Name
|
433f7b068e
|
fix(aiops): close ssh and telegram remediation gaps
CD Pipeline / tests (push) Successful in 2m7s
Code Review / ai-code-review (push) Successful in 42s
CD Pipeline / build-and-deploy (push) Successful in 13m14s
CD Pipeline / post-deploy-checks (push) Successful in 4m29s
|
2026-05-01 16:53:02 +08:00 |
|
Your Name
|
b0da6da1e9
|
feat(aiops): structure agent loop shadow output
CD Pipeline / tests (push) Successful in 2m50s
Code Review / ai-code-review (push) Successful in 33s
CD Pipeline / build-and-deploy (push) Failing after 25m48s
CD Pipeline / post-deploy-checks (push) Has been cancelled
|
2026-05-01 15:09:57 +08:00 |
|
Your Name
|
f8e44971c1
|
feat(aiops): enable read-only agent loop canary
CD Pipeline / tests (push) Successful in 1m43s
Code Review / ai-code-review (push) Successful in 31s
CD Pipeline / build-and-deploy (push) Successful in 10m22s
CD Pipeline / post-deploy-checks (push) Successful in 4m3s
|
2026-05-01 14:20:16 +08:00 |
|
Your Name
|
7e4d995e4b
|
feat(aiops): add mcp agent loop foundation
CD Pipeline / tests (push) Successful in 1m59s
Code Review / ai-code-review (push) Successful in 28s
run-migration / migrate (push) Failing after 24s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-01 13:21:19 +08:00 |
|
Your Name
|
9db87f177e
|
fix(aiops): suppress repeated llm alert loops
CD Pipeline / tests (push) Successful in 1m37s
Code Review / ai-code-review (push) Successful in 28s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-01 13:02:07 +08:00 |
|
Your Name
|
11673d80ea
|
fix(aiops): route backup decisions through ssh
CD Pipeline / tests (push) Successful in 1m35s
Code Review / ai-code-review (push) Successful in 34s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-05-01 12:50:01 +08:00 |
|
Your Name
|
95110971f3
|
fix(telegram): close remaining DM alert routes
CD Pipeline / tests (push) Successful in 1m27s
Code Review / ai-code-review (push) Successful in 29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
|
2026-04-30 23:02:17 +08:00 |
|
Your Name
|
61f5a6a419
|
fix(telegram): route alerts to SRE war room
CD Pipeline / tests (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
Code Review / ai-code-review (push) Has been cancelled
|
2026-04-30 15:01:23 +08:00 |
|
Your Name
|
f5f41543c9
|
docs: ADR-105 推翻 A2 + LOGBOOK 2026-04-29 LLM 飛輪復活戰
ADR-105 完整記錄推翻 A2 鐵律的決策:
- Context: A2 歷史背景 + 2 個月後事實基礎變化(GPU + qwen2.5:7b)
- Decision: 4 處修改(IntentType.DIAGNOSE override / chain / openclaw.py task_type / 6 regression test)
- Consequences: 正面(飛輪復活)+ 負面(Ollama 單點)+ 已知債(ADR-106-109 後續)
- Validation: 部署前 1635 tests 全綠,部署後 5 項驗證指標
- Rollback: env 切換 / git revert
LOGBOOK 加 2026-04-29 條目:
- 真根因:4 provider 全死 + A2 鐵律排除 Ollama
- CD 連環血淚:5 個 commit 全 failure(setup_test_schema.sql 缺欄)
- 已落地(不依賴 CD):Prometheus 17 條 rule + Gemini sanitize
- Memory 索引同步更新(指向 project_revert_a2_ollama_primary.md)
注意:docs/ 不在 cd.yaml paths trigger,此 commit 不影響 CD。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-29 20:59:53 +08:00 |
|
Your Name
|
025a493f06
|
feat(p3.2+adr-100): Model Version Tracker + SLO 自治 + KB rot cleaner
run-migration / migrate (push) Failing after 12s
CD Pipeline / build-and-deploy (push) Has been cancelled
Wave 8 P3.2 模型版本追蹤 + ADR-100 SLO 自我治理 + 配套:
P3.2 — Model Version Tracking:
- model_version_probe.py (268 行) — 探測 Ollama / OpenRouter 等 provider 的 model version
- model_version_tracker.py (101 行) — 對齊 PG provider_version_history 表
- migrations/p3_2_provider_version_history.sql + rollback — 25 行 schema
- db/models.py +32 行 — ProviderVersionHistory ORM
ADR-100 — AI 自主化 SLO:
- docs/adr/ADR-100-ai-autonomous-slo.md (167 行) — 飛輪 SLO 設計與閾值
- ops/monitoring/slo-rules.yml (254 行) — Prometheus SLO recording rules + alerts
- ops/monitoring/tests/test_slo_rules.yaml (242 行) — promtool unit tests
整合修改:
- main.py +72 行 — Lifespan 啟動 model_version_probe + KB rot cleaner schedule
- gitea_webhook.py +45 行 — webhook 接收 model 版本變化通知
- ci_auto_repair.py / evidence_snapshot.py / pre_decision_investigator.py — 配合接線
新測試:
- test_kb_rot_cleaner_schedule.py (120 行) — 9 tests pass
- test_slo_rules.yaml — promtool 驗收
Tests: 9 passed (test_kb_rot_cleaner_schedule)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Multiple Engineers (P3.2 + ADR-100) <noreply@anthropic.com>
|
2026-04-27 14:54:19 +08:00 |
|
Your Name
|
86ee013cdf
|
feat(hermes-complete): Hermes NL 三項補強 + ConsensusEngine + ADR 收尾
CD Pipeline / build-and-deploy (push) Successful in 9m32s
## Hermes NL 補強(nl_gateway.py)
- T1 hermes_dispatch_log DB 寫入(asyncio.create_task 非阻擋)
- T2 Redis 速率限制:per-chat_id 20 req/min,fail-open
- T3 Multi-turn session:hermes:session:{chat_id}:{user_id} TTL=300s,最近 3 輪
## ConsensusEngine(ADR-095 宣告式設計)
- consensus_engine.py: CONSENSUS_WEIGHTS class 屬性
security=0.4 鎖定,9 個 Claude Code agent 分配 0.6
- config.py: ENABLE_12AGENT_CONSENSUS=False feature flag
## ADR 狀態
- ADR-093/094/095: Proposed → 🟡 批准實作中
- 各 ADR 加 v1.1 變更紀錄
## K8s ConfigMap
- prod 04-configmap.yaml: 加 3 個 feature flags(均 false)
- dev 02-configmap.yaml: 同步加入
## LOGBOOK
- 記錄 WS0–WS6 + 補強完成,feature flags 啟用指引
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 02:22:40 +08:00 |
|
Your Name
|
5675e7c3b0
|
fix(phase2+aiops): Phase 2 Agent timeout + AI Router intent hint + signoz incident_id
## Phase 2 Agent timeout(防止單步 LLM 拖垮整場辯證)
- critic_agent.py: asyncio.wait_for + PHASE2_STEP_TIMEOUT_SEC=20s
- diagnostician_agent.py: 同等超時保護
- solver_agent.py: 同等超時保護
## AI Router 優化
- ai_router.py: _resolve_intent_from_context()
Phase 2 agents 傳 intent_hint → Router 快路徑,不重跑 intent LLM
## SignOz Webhook 修復
- signoz_webhook.py: incident_id 補傳 send_approval_card()(移除 TODO 2026-04-05)
## Alert 處理流程修復
- webhooks.py: _should_bypass_alertmanager_llm()
Host 類 NO_ACTION 告警直接走人工排查卡片,不再誤觸 LLM Agent Debate
- incident_repository.py: update_incident_status 加 resolved_at 參數
- incident_service.py / proposal_service.py / incident_approval_service.py: 小修
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 02:10:06 +08:00 |
|
Your Name
|
054d0ae422
|
docs(ws0): Hermes × 12-Agent Telegram 整合治理文件(ADR-093/094/095)
## 新增
- ADR-093: Telegram 告警全面遷移至 SRE 戰情室群組
- 混合策略 allowlist 模式(TYPE-3/4/4D/8M → 群組 + user_id binding)
- nonce 新格式 apr:{short_id}:{action}:{user_id_hash} + Redis 後端映射
- Feature flag TG_GROUP_CUTOVER 灰階切流
- ADR-094: Hermes 自然語言介面(@mention 對話)
- Option C:單 bot + Claude Agent SDK 虛擬分派
- Webhook secret_token + allowed_updates = [message, callback_query, chat_member]
- Prompt Injection 防護:query/describe/summarize only,mutate 走 ApprovalRecord
- Redis session TTL=300s + turn>=5 壓縮
- ADR-095: 12-Agent Claude SDK 整合 × Telegram 視覺分派
- 12 位 agent 完整 emoji/hashtag/handle 表格
- ConsensusEngine weights 擴充(security=0.4 鎖定)
- display_names.py 命名隔離(.claude/agents/ vs src/agents/)
## 更新
- ADR-009: 加 v0.3 變更紀錄指向 ADR-095
- ADR-075: 加更新引用表(ADR-093 D4 allowlist 子條款、ADR-094/095)
- docs/design/hermes-telegram-flows/hermes-flows.html: F1-F7 完整流程圖
## Pre-Flight 確認
- approval_records 表尚不存在 → 將用 BIGINT 全新建立
- docker-compose.yml:78 明碼 token 🔴 P0 待 WS1 修復
- awoooi_migrator 角色尚未建立 → WS2 建立
- claude-agent-sdk 升至 0.1.66(最新)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-25 02:10:06 +08:00 |
|
Your Name
|
e2742ce9f3
|
docs: BUTTON_DATA_INVALID 根治 + Gitea Code Review 修復 記錄
LOGBOOK + ADR-092 附錄 C — 2026-04-21 修復紀錄
E2E 驗證: telegram_approval_card_sent message_id=25045 (SignOzDown) ✓
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-21 21:59:00 +08:00 |
|
Your Name
|
994817a23a
|
docs: ADR-092 附錄 A+B + LOGBOOK + MASTER §8 記錄四修與 C1-C4 全流程串接
- ADR-092: 附錄 A(B1-B4 四修 root cause + commit)+ 附錄 B(C1-C4 斷點修復表 + 架構鐵律)
- LOGBOOK: 新增 2026-04-20 晚 C1-C4 章節(斷點清單 + commits + 驗收步驟)
- MASTER §8: 追加 C1-C4 changelog(§3/§1.1 對齊 + 修復後行為說明)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-20 20:24:41 +08:00 |
|
Your Name
|
712d146129
|
docs(adr+skills): ADR-092 AI Decision LLM 層 + Skill 03 更新統一 LLM pattern
首席架構師 2026-04-19 Review 92/100 Grade A 後的完整文檔化:
**ADR-092 新建 (AI Decision LLM 擴展架構)**:
- 背景: 14 scanner 中 8 個純 threshold,違反 feedback_ai_autonomous_direction
- 決策: 4 個 LLM service + 統一 pattern (llm_json_parser)
- 約束 5 鐵律: 失敗不 raise / AI 只建議不動作 / openclaw 統一入口 /
aol 留痕 / 繁中 + JSON schema
- 節流: Daily cron + 事件觸發 (red_ratio>30% 且 scanned>=50)
- autonomy_score 0-100 量化追蹤
- 實作成果 + P1 剩餘 + 回滾計畫
**Skill 03 openclaw-cognitive-expert 更新**:
- 新增「2026-04-19 AI Decision LLM 擴展層」章節
- Pattern code 範本 (不是每次重寫 3-path parse)
- 4 LLM service 對照表 + required_key
- 擴加 5 鐵律清單
- autonomy_score 追蹤使用說明
下 session Claude 接手時能快速看到 LLM service pattern,不會重複造輪子.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-19 22:42:58 +08:00 |
|
OG T
|
2524aa983a
|
docs(adr): ADR-091 Telegram 子系統 Round 3 全景審計正式文件
- 11 按鈕 × handler 覆蓋矩陣定版
- 三缺一鐵律(callback格式+handler+能力)升級 ADR 層級
- callback_data 雙格式(nonce vs INFO)正式認定
- Long Polling by design 確認
- approval 三戳鐵律(editMarkup + editText + DB message_id)
- NO_ACTION 不誤標 FAILED 救 MASTER §7.1 #11
對應 commits 877c847 → 4b8be32,git tag v7.3.0
Memory: project_phase7_round3_telegram_subsystem.md
|
2026-04-19 01:32:52 +08:00 |
|
OG T
|
5ae82d1d1f
|
feat(db): ADR-090 L4 AIOps 地基 — 資產盤點 × 7 項自動化覆蓋矩陣永久化 DB
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-04-18 下午(台北時區)—— ogt + Claude Opus 4.7 (1M)
MoWoooWorkDown 假警報 RCA 暴露三重結構性失守:
- 110/188 主機 load 18/16 × 13 天 / cadvisor 288% / K3s 120/121 無監控
- Prometheus 僅 35 targets / 58 rules(覆蓋不到三成)
- HostHighCpuLoad 量錯維度(CPU idle vs load_avg)
統帥戰略指令:
- 全景資產 × 七大自動化 × 永久化 DB
- AI 四分工(OpenClaw × NemoTron × Hermes × Claude LLM)
- 所有自動化操作歷程必進 DB,不靠 MD(MD 會漂移)
本 commit 交付:
1. SQL migration (apps/api/migrations/adr090_asset_inventory_foundation.sql)
- 11 張表 + 33 indexes + 20 CHECK + 3 UNIQUE + 16 FK
- pgcrypto extension dependency
- 完整 idempotent(CREATE IF NOT EXISTS + single transaction)
- 已 apply 進 awoooi_prod(188 PG),驗證通過
2. ADR-090 (docs/adr/ADR-090-monitoring-blindspot-governance.md)
- 決策紀錄 + 7 引擎對映 + 4 替代方案否決
3. 主戰略文件 (docs/superpowers/specs/2026-04-18-blindspot-governance-capacity-l4.md)
- §0-§14: 背景 / 根因 / Schema DDL / 4 層防禦 / 7 Phase 實施 /
HARD_RULES / AI 分工矩陣 / 驗收指標 / 技術債 / 回滾 / 接手協議
4. MASTER §8 Living Changelog 追加 Phase 7 啟動條目
11 張表:
asset_inventory / asset_discovery_run / asset_coverage_snapshot /
asset_relationship / alert_rule_catalog / asset_change_event /
asset_compliance_snapshot / host_capacity_snapshot /
capacity_violation_event / automation_operation_log /
ai_collaboration_trace
首筆 bootstrap 記錄已 seed 進 asset_discovery_run
(run_id=6760c5bf-57e5-4a40-b82d-31b794464652)
相關 Memory (未 commit,存於 ~/.claude/...):
- project_blindspot_governance.md (跨 session 指針)
- feedback_monitor_self_monitoring.md (監控工具必須被監控)
- feedback_secrets_leak_incidents_2026-04-18.md (憑證外洩三防線)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-18 13:18:46 +08:00 |
|
OG T
|
ba8cf6105d
|
docs(adr): ADR-086 Telegram UI 清洗規範 + ADR-087 AutoApprove kubectl 閘門
ADR-086: Telegram 通知卡片 UI 清洗規範
- _parse_debate_summary() 設計決定與各 TYPE 欄位清洗規則
- TYPE-3 鍵盤重構:批准/拒絕永遠第一行
- 技術債:_parse_debate_summary 提升模組層級(P1-1)
ADR-087: AutoApprove 安全強化 — kubectl 強制執行閘門
- 條件 1d 設計:_raw_action 語意 + NO_EXECUTABLE_ACTION reason
- Solver Nemo 格式 kubectl 驗證
- 降級指令改為真實 kubectl 唯讀調查
- min_trust_score=0 保留理由記錄(TrustEngine 記憶體持久化技術債)
- P0-2 風險記錄:kubectl exec 未加入 _DESTRUCTIVE_PATTERNS
2026-04-17 ogt + Claude Sonnet 4.6(亞太): Session 技術債清理
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-17 15:25:34 +08:00 |
|
OG T
|
7da64eaad2
|
feat(Phase 3): 學習閉環重建 — 三根因修復 + 2x EWMA + Evolver Agent
CD Pipeline / build-and-deploy (push) Failing after 19m7s
Type Sync Check / check-type-sync (push) Failing after 1m18s
ADR-083 Phase 3 學習閉環重建:
**三根因修復**
- approval_execution.py: fire-and-forget create_task → await asyncio.wait_for(timeout=30) × 2
(成功路徑 L265 + 失敗路徑 L353,超時記錄 learning_trigger_timeout metric,主流程不 crash)
- models/approval.py: ApprovalRequestBase 新增 matched_playbook_id 欄位
- decision_manager.py: _auto_execute 建立 ApprovalRequest 時填充 matched_playbook_id
- learning_service.py: 雙路徑查找 _matched_pb_id(matched_playbook_id + metadata fallback)
**2x EWMA 負向強化**
- models/playbook.py: 新增 trust_score: float = 0.3(EWMA 動態信任度欄位)
- repositories/playbook_repository.py: update_stats 加 EWMA
成功: trust = 0.9 × old + 0.1 × 1.0
失敗: trust = 0.8 × old + 0.2 × 0.0(衰減速度 2x)
trust < 0.1 → log warning,等 Evolver 封存
**Evolver Agent(新建)**
- services/playbook_evolver.py: 三功能全靜態規則
1. 低信任封存: trust < 0.1 → DEPRECATED
2. 休眠封存: 30d 未使用 AND trust < 0.5 → DEPRECATED
3. 相似合併: 症狀 Jaccard > 0.9 → 保留高 trust,封存低 trust
AIOPS_P3_EVOLVER_ENABLED=False 預設關閉
**文件**
- ADR-083 學習閉環重建
- MASTER §8 Phase 3 完工記錄
AIOPS_P3_ENABLED=False(預設),骨架就位等統帥批准開啟
Co-Authored-By: Claude Sonnet 4.6(亞太)<noreply@anthropic.com>
|
2026-04-15 14:01:37 +08:00 |
|
OG T
|
5ddba6d6e0
|
feat(adr-082): Phase 2 多 Agent 協作 — 5 角色辯證系統骨架上線
新增 5 個 Agent + Orchestrator + DecisionManager 接線:
- protocol.py: DiagnosisReport / ActionPlan / ReviewVerdict / CriticReport / DecisionPackage 型別系統
- DiagnosticianAgent: RCA 根因分析,confidence < 0.4 → ABSTAIN
- SolverAgent: 修復方案軍師,blast_radius 評分 + 降級 rule-based mock
- ReviewerAgent: 安全審查,HARD_RULES 靜態 pattern + blast_radius 閾值 (>50 revision, >80 reject)
- CriticAgent: 刻意唱反調,強制 3 問批判性思維,critical challenge → REJECT
- CoordinatorAgent: 純規則聚合,6 級決策閘,REQUEST_REVISION → 強制人工
- AgentOrchestrator: 30s 全局超時,Reviewer ‖ Critic 並行,DB Immutable Event Sourcing + Redis Streams
- DecisionManager: AIOPS_P2_ENABLED gate + _package_to_proposal_data 橋接既有 proposal_data 格式
- AgentSession DB table + 4 個複合 index
- ADR-082 決策記錄
Gate 2 修復(7 項):
- CRITICAL: DELETE FROM regex lookahead 位置錯誤(移至 FROM 後)
- CRITICAL: REQUEST_REVISION 可抵達 auto-execute 路徑(改回 requires_human_approval=True)
- IMPORTANT: _extract_json flat regex 不支援巢狀 JSON(改 find/rfind 邊界提取)
- IMPORTANT: all_degraded 遺漏 verdict.degraded(補全 4 個 Agent)
- IMPORTANT: Solver ABSTAIN guard 放行降級假設(改為無論 hypotheses 有無均跳過)
- IMPORTANT: dataclasses.asdict() Enum 未序列化導致 DB 寫入靜默失敗(加 json.dumps default handler)
- IMPORTANT: P2 gate 直讀屬性繞過父 Phase 守衛(改用 is_phase_enabled(2))
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-15 13:48:55 +08:00 |
|
OG T
|
db9e304a14
|
feat(adr-080): Phase 0 防護欄建立 — AI 自主化飛輪啟動
- docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
(1456 行,§0-§8 全填完:42-cell 戰術矩陣、7 Phase 計畫、7 ADR 摘要、
15 KPI、21 Feature Flags、10 風險場景)
- docs/adr/ADR-080-ai-autonomy-flywheel-overview.md
(7 Phase 結構 + 4 北極星 + 7 架構師 Review Gates + Phase 退出條件)
- apps/api/src/core/feature_flags.py
(AIOpsFeatureFlags: P1~P6 總開關全 False + 15 細粒度子開關
is_phase_enabled() / is_sub_flag_enabled() + bool cast 安全)
- apps/api/src/jobs/__init__.py + baseline_snapshot.py
(Phase 0 基線快照 Job:MCP calls / Playbook confidence / general 比例
/ learning loop rate / auto_repair — 寫入 aiops:baseline:latest)
- apps/api/tests/test_feature_flags.py (21 tests — 全綠)
- docs/HARD_RULES.md → v1.9
(新增 Phase 退出條件鐵律:禁止未過 exit conditions 宣告 Phase 完成)
- CLAUDE.md 防失憶閘門 1:強制讀 MASTER §0 Session Resume Protocol
Gate 0 Pass — 21/21 tests green
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-15 12:44:53 +08:00 |
|
OG T
|
e3d7c92100
|
docs(Phase 5): ADR-079 狀態 Completed + LOGBOOK 午夜收官
- ADR-079 Sprint 5.0-5.4 全數完成,狀態改 Completed
- LOGBOOK 新增午夜條目記錄 Phase 5 落地
本日 26 commits 總覽:
cc42aa0 aae7c12 43c9689 dedd7c2 dd0a778 0f48a50 b8b124c 8de807c
f54dea4 6cac507 10b74af aa4e575 8b7e9cb 914c7e7 ca862c5 10e3043
72dd0c5 3f8d087 2a37d1c 094aa95 2e2f5a1 36754a8 581b244 208c28e
de8bbd8 a92562d
涵蓋:
- GAP-A1/A2/A3/A4 (4 個 gap + Phase 2)
- GAP-B1/B4 (timeout fix)
- GAP-C1/C2/C3 (BP-1 + retry + SSH KM)
- GAP-D1/D5 (信任度 + 日報 + Postmortem)
- Phase 5 全 Sprint (分類按鈕完整化)
- 4 BLOCKER 修復 + Bug A 診斷 + Bug B 真修
- 下架死按鈕 + 重啟新按鈕(從 registry 動態產生)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-15 10:46:40 +08:00 |
|
OG T
|
10e3043ce8
|
fix(UX): 下架 28 個鬼魂分類按鈕 + ADR-079 Phase 5 補完計畫
CD Pipeline / build-and-deploy (push) Has been cancelled
統帥 2026-04-14 20:00 完整 audit 揭露:
_CATEGORY_BUTTONS 28 個按鈕全死 3 天(從 2026-04-11 commit 325b3851)
- callback_data 格式全錯(3-part 不符 parser 4-part/2-part)
- grep apps/api/src 無任何 dispatch handler
- 統帥今天真踩到:點「查程序」沒反應 → 信任破壞
首席架構師裁示 (C 分級):
A. 立刻下架(本 commit):_CATEGORY_BUTTONS = {} fallback 通用按鈕
B. Phase 5 完整化(ADR-079 規劃,3-5 天,另 Sprint 實作)
保留通用按鈕(全 ✅):
- 批准 / 拒絕 / 靜默(4-part nonce)
- 詳情 / 歷史 / 重診(2-part info)
新增防禦性文件:
- ADR-079 — Phase 5 工作分解 + 每按鈕 checklist
- feedback_no_ghost_buttons.md(memory)— 鬼魂按鈕鐵律
設計原則永久入檔: 寧可沒按鈕,不可有死按鈕
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-14 20:19:25 +08:00 |
|
OG T
|
aa4e5757a2
|
fix: 技術債清理 — report_generation 重試機制 + GAP-A4 文件化
CD Pipeline / build-and-deploy (push) Successful in 15m46s
技術債 #1: postmortem 發送失敗靜默吞掉
- 3 次指數退避重試 (2s → 4s → 6s)
- 全失敗後送簡化降級通知到 SRE 群組
- 防止事後檢討默默消失
技術債 #2 (QueryBuilder 抽象): DEFER
- 全專案僅 1 處用 outcome JSON path query
- 違反「Don't design for hypothetical future requirements」
- 待第二 caller 出現再抽
技術債 #3 (E2E 測試): 已涵蓋
- test_gap_a4_placeholder_resolution.py TestMatchRuleRejection
- Mission C prod 鏈路實測(KubePodCrashLooping)
- Playwright K8s/Telegram staging 留待 staging 環境就緒
新增文件:
- ADR-078-gap-a4-placeholder-resolution.md
- LOGBOOK 2026-04-14 深夜收官條目
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-14 18:46:25 +08:00 |
|
OG T
|
6cac5071e4
|
docs: MASTER 藍圖結案報告 + ADR-077 + LOGBOOK 收尾
本日 Session 終極收案(9 commits, 11/11 Task, 52 新測試):
- docs/reports/2026-04-14-MASTER-BLUEPRINT-CLOSURE.md — 完整結案報告
- docs/adr/ADR-077-master-blueprint-completion.md — 架構審查 + 決議紀錄
- docs/LOGBOOK.md — 新增深夜收官條目
審查裁定: CONDITIONAL PASS
通訊渠道: 全走 Telegram,SMTP 不需要
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2026-04-14 18:36:59 +08:00 |
|
OG T
|
079d0e89b9
|
docs(adr-075): 加入實作記錄 + LOGBOOK 更新(Phase 1+2+CR 全完成)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 18:44:57 +08:00 |
|
OG T
|
d32d494320
|
docs: 四階段細化實施步驟 + 架構轉型截圖定案 + 防偏差守則
規格書 v2.0 新增:
- §十一 四階段細化實施步驟(階段1~4各含驗收清單)
- 階段1: CD解鎖+debounce+alertname+冷啟動Playbook+KM向量化(9步)
- 階段2: DB Migration+classify_alert_early+outcome寫入(5步)
- 階段3: 分診站+SSH路由+TYPE-1/E/F+action解析+risk_level(Tier3,7步)
- 階段4: KMConversionService+手動修復記錄(4步)
- §十二 防偏差守則(不跳步驟/Tier3授權/不改範圍/異常立刻報告)
ADR-073 更新:架構轉型截圖定案(舊架構中斷→新架構分診飛輪)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 13:30:37 +08:00 |
|
OG T
|
f2b427d87c
|
docs(adr): ADR-073 補充 ADR-071 整合工作序 + ADR-074 監控補全 Sprint
新增:
- Sprint ADR-073-B 補充:DB Migration + 檢傷分類站 + KMConversionService(ADR-071-A/A0/B/C/G/H)
- Sprint ADR-074:飛輪健康度Exporter + 主機間網路 + DNS + Gitea CD + 備份還原測試等9項監控缺口
- 參考指向完整規格書 2026-04-12-aiops-complete-flywheel-repair-design.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 13:21:27 +08:00 |
|
OG T
|
c7677750b5
|
docs(adr-070): 補全 c439277 全自動化三大修復 + Tier 3 CR 修補記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-12 00:09:18 +08:00 |
|
OG T
|
99cc420429
|
docs(review): 首席架構師 Code Review 後 — ADR-064/067 + Skill 02 補全記錄
ADR-064: 補 I1 整合記錄(get_incident_type 三層降級、rule.id ≠ incident_type 設計決策)
ADR-067: 補 D1 集中化完成記錄(9 purpose keys 對應表)
Skill 02: 補 get_incident_type 使用規範 + Ollama D1 模型中央化禁令
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 21:35:25 +08:00 |
|
OG T
|
82e1c05df8
|
fix(review): Code Review C1/C2/I2/M2 修補
CD Pipeline / build-and-deploy (push) Has been cancelled
C1 drift_interpreter: 寫死 192.168.0.111 → settings.OLLAMA_URL
違反 feedback_frontend_internal_ip_ban 鐵律(後端 service 層同樣禁止寫死內網 IP)
C2 km_conversion_service: BUG-004 補同步 Redis Working Memory vectorized 欄位
原修復只更新 DB,Redis incident:{id} JSON 的 vectorized 未同步
→ 審計查 Redis 仍顯示 False,fly-wheel 閉環指標仍不準
修復:DB 更新後 GET → JSON patch vectorized=True → SET(保留原 TTL)
I2 decision_manager: _ALERTNAME_KEYWORDS HostHighDiskUsage→HostOutOfDiskSpace
+ 補 DockerContainerExited
+ fallback 路徑加 debug log
M2 decision_manager: import json as _json 從 for 迴圈移至方法頂部
docs: ADR-072 新增 Code Review 發現與技術債記錄
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:36:59 +08:00 |
|
OG T
|
9382814d14
|
docs(adr-072): 全部完成 BUG-001~008
ADR-072 狀態更新為「全部修復完成」
BUG-007 確認不需修(alerts-unified.yml 全 42 規則均有 severity)
BUG-008 已修復(f34fe19)
LOGBOOK 新增 P2 完成條目 + 下一步說明
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:30:29 +08:00 |
|
OG T
|
85c71bf73c
|
docs(adr-072): 更新 Bug 修復狀態 + LOGBOOK
ADR-072: BUG-001~006 標記已修復 (P0 commit 88e3197, P1 commit 5aa0244)
LOGBOOK: 新增 ADR-072 P0+P1 全修復條目
P2 待修: BUG-007 severity labels + BUG-008 alertname_to_type
2026-04-11 Claude Sonnet 4.6 Asia/Taipei
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:26:27 +08:00 |
|
OG T
|
09982fdfaa
|
docs(session6): Telegram 全面審計 + ADR-072 Bug 清單 + 規格整合
- LOGBOOK: Session 6 Redis DB10 審計結果(8個系統性問題,P0-P2分級)
- ADR-072: AIOps 閉環 Bug 修復清單(drift_interpreter/deployment_name/KM vectorization等)
- 規格文件 v2.2: 確認 Sprint A/B/C + MCP 1-4 + ADR-071 全部完成,標記下一步為 ADR-072
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:04:50 +08:00 |
|
OG T
|
a1432c03ed
|
docs: ADR-070/071 + ssh-mcp-setup runbook + Skill-04 v2.7
- ADR-070: 全自動 AIOps 閉環 MCP Phase 1-4 決策文件
- ADR-071: 告警通知四類型 + KM 三段資料閉環決策文件
- docs/runbooks/ssh-mcp-setup.md: SSH MCP 建立/驗證/輪換 SOP
- Skill-04: v2.7 新增 Sprint C DR + ADR-070 MCP 10 providers 完整記錄
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-04-11 20:04:47 +08:00 |
|