Aiops on Hi, I'm Muhammad Amal

https://muhammadamal.my.id/tags/aiops/
Recent content in Aiops on Hi, I'm Muhammad Amal
Last updated: Fri, 23 May 2025 09:00:00 +0700

Postmortem Automation with LLMs, Drafts That Don't Lie
Fri, 23 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/postmortem-automation-with-llms-drafts-that-dont-lie/
A draft-only postmortem pipeline that respects timestamps, refuses to invent causes, and produces a blameless template a human can finish in 30 minutes.

Chaos Engineering with AI Augmented Hypotheses
Wed, 21 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/chaos-engineering-with-ai-augmented-hypotheses/
AI-proposed chaos hypotheses, human-approved blast radii, and LitmusChaos execution on Kubernetes 1.32 with rollback on SLO breach.

SLOs and Burn Rate Alerting in 2025, A Practical Guide
Mon, 19 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/slos-and-burn-rate-alerting-in-2025-a-practical-guide/
Practical SLO design, error budget math, and multi-window burn rate alerting rules ready to paste into Prometheus 3.0.

Incident Response Automation with LangGraph, A Step by Step Tutorial
Fri, 16 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/incident-response-automation-with-langgraph-a-step-by-step-tutorial/
Treat incident response as a typed state machine in LangGraph 0.2, with deterministic transitions, audit logging, and bounded LLM use.

Anomaly Detection on Prometheus Metrics, A Hands On Guide
Wed, 14 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/anomaly-detection-on-prometheus-metrics-a-hands-on-guide/
A working senior SRE’s tour through metric anomaly detection, from cheap z-score rules to isolation forest sidecars on Prometheus 3.0.

Building an SRE Copilot for On Call Engineers
Mon, 12 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/building-an-sre-copilot-for-on-call-engineers/
A senior backend engineer’s design for an LLM-powered on-call assistant with tool use, context shaping, and a read-only blast radius.

AI Driven Log Analysis at Scale, A Production Tutorial
Fri, 09 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/ai-driven-log-analysis-at-scale-a-production-tutorial/
A production pattern for AI log analysis using template mining, vector retrieval, and bounded LLM summarization on Loki 3.3.

Auto Remediation Pipelines with LLM Agents and Argo Events
Wed, 07 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/auto-remediation-pipelines-with-llm-agents-and-argo-events/
A practical walkthrough of LLM-proposed, deterministically executed remediation using Argo Events and Argo Workflows on Kubernetes 1.32.

AIOps in May 2025, What Actually Works in Production
Mon, 05 May 2025 09:00:00 +0700
https://muhammadamal.my.id/blog/aiops-in-may-2025-what-actually-works-in-production/
Field notes on AIOps in production: what to adopt, what to defer, and where LLMs earn their keep on the platform team.