Slm on Hi, I'm Muhammad Amal

Recent content in Slm on Hi, I'm Muhammad Amal
https://muhammadamal.my.id/tags/slm/
Wed, 29 Jan 2025 09:00:00 +0700

Benchmarking SLMs for Your Use Case, From Lmeval to Custom Suites
https://muhammadamal.my.id/blog/benchmarking-slms-for-your-use-case-lmeval-to-custom/
Wed, 29 Jan 2025 09:00:00 +0700
Public leaderboards lie about your task. Build a benchmark that measures what your users actually need.

Local RAG with SLMs, Private Knowledge Without the Cloud
https://muhammadamal.my.id/blog/local-rag-with-slms-private-knowledge-without-cloud/
Mon, 27 Jan 2025 09:00:00 +0700
End-to-end local RAG, no cloud. Embeddings, vectors, retrieval, and grounded generation on a single box.

Structured Output and Function Calling on Local SLMs
https://muhammadamal.my.id/blog/structured-output-and-function-calling-on-local-slms/
Wed, 22 Jan 2025 09:00:00 +0700
Get production-grade JSON and tool calls out of 3B models. Constrained decoding, schemas, and what actually works.

Fine Tuning SLMs with LoRA and QLoRA, A Hands On Tutorial
https://muhammadamal.my.id/blog/fine-tuning-slms-with-lora-and-qlora-hands-on/
Mon, 20 Jan 2025 09:00:00 +0700
When prompting plateaus, LoRA and QLoRA take you the next mile. A real fine-tuning walkthrough on consumer GPUs.

Serving SLMs at Scale with vLLM, A Production Guide
https://muhammadamal.my.id/blog/serving-slms-at-scale-with-vllm-production-guide/
Wed, 15 Jan 2025 09:00:00 +0700
When Ollama and llama.cpp stop scaling, vLLM is what you reach for. PagedAttention, batching, and the real tradeoffs.

llama.cpp Deep Dive, Quantization, GGUF, and Inference Speed
https://muhammadamal.my.id/blog/llama-cpp-deep-dive-quantization-gguf-inference-speed/
Mon, 13 Jan 2025 09:00:00 +0700
Where Ollama ends, llama.cpp begins. Quantization, GGUF, KV cache, and squeezing tokens per second.

Running SLMs Locally with Ollama, A Step by Step Tutorial
https://muhammadamal.my.id/blog/running-slms-locally-with-ollama-step-by-step/
Wed, 08 Jan 2025 09:00:00 +0700
Everything I do to ship a local SLM behind Ollama 0.5, from install to a real production endpoint.

Small Language Models in January 2025, A Practical Survey
https://muhammadamal.my.id/blog/slm-landscape-january-2025-practical-survey/
Mon, 06 Jan 2025 09:00:00 +0700
Where the small language model landscape actually stands in January 2025, from a backend engineer’s bench.