MLX
Apple's array framework for machine learning on Apple Silicon. Native Metal support, unified memory, first-class LLM inference.
MLX is Apple's machine learning framework built specifically for Apple Silicon. Unlike running llama.cpp through Metal, MLX is designed ground-up for the unified memory architecture of M-series chips, which means model weights and KV cache can be shared between CPU and GPU with no copy overhead. For local inference on a Mac, this delivers noticeably better tokens-per-second than the generic options at the same memory footprint. The ecosystem now includes mlx-lm for LLM inference with a simple Python API, mlx-vlm for vision-language models, and community-maintained quantized weights for most popular open-source LLMs. For anyone doing serious local work on a MacBook Pro or Mac Studio, MLX is the default inference layer in 2026.
Similar Tools
llama.cpp
C++ inference engine for LLMs. GGUF format, quantization, CPU and Metal/CUDA support. The foundation most local tools build on.
LM Studio
Desktop app for discovering, downloading, and running local LLMs. Clean chat UI, OpenAI-compatible API server, and automatic GPU detection. MLX engine optimized for Apple Silicon.
vLLM
High-throughput inference server for LLMs. PagedAttention memory management. The go-to for serious local or self-hosted serving.
Ollama
The easiest way to run LLMs locally. One command to pull and run any model. OpenAI-compatible API. 52M+ monthly downloads. Supports GGUF, Safetensors, and custom Modelfiles.
Get started with MLX
Apple's array framework for machine learning on Apple Silicon. Native Metal support, unified memory, first-class LLM inference.
Try MLXGet weekly tool reviews
Honest takes on AI dev tools, frameworks, and infrastructure - delivered to your inbox.
Subscribe FreeMore Local AI Tools
Ollama
The easiest way to run LLMs locally. One command to pull and run any model. OpenAI-compatible API. 52M+ monthly downloads. Supports GGUF, Safetensors, and custom Modelfiles.
LM Studio
Desktop app for discovering, downloading, and running local LLMs. Clean chat UI, OpenAI-compatible API server, and automatic GPU detection. MLX engine optimized for Apple Silicon.
Jan
Open-source ChatGPT alternative that runs 100% offline. Desktop app with local models, cloud API connections, custom assistants, and MCP integration. AGPLv3 licensed.
Related Guides
Run AI Models Locally with Ollama and LM Studio
Install Ollama and LM Studio, pull your first model, and run AI locally for coding, chat, and automation - with zero cloud dependency.
Getting StartedMCP Installation Scopes - Claude Code
Local, project, user, and plugin-level MCP configurations.
Claude CodeScheduled Tasks (Desktop) - Claude Code
GUI-based scheduling on your local machine for recurring work.
Claude CodeRelated Posts

Cerebras Stock Is a Public Test of AI Inference Demand
Google Trends put CBRS stock on the board after Cerebras' first public-company earnings. The developer takeaway is not a...

Apple's LanguageModel Protocol: Xcode 27 Just Made Model Lock-In Optional
Apple shipped a LanguageModel protocol at WWDC 2026 that lets iOS and macOS developers swap between Claude, Gemini, and...

DiffusionGemma: Google Bets Diffusion Can Make Text Generation 4x Faster
Google released DiffusionGemma today, a 26B MoE open model that generates entire 256-token blocks in parallel instead of...

KV Caching: A Practical Guide to Optimizing Transformer Inference
How KV caching speeds up LLM inference - the math, the code, the memory tradeoffs, and when it stops helping. Every dev...
