Select your hardware to see which models run locally. No dedicated GPU? No problem—Ollie also works flawlessly with your favorite cloud APIs.
VRAM estimates are based on Q4_K_M quantization, the most common format for running LLMs locally via Ollama. Actual usage varies with context length, system overhead, and concurrent applications. Apple Silicon uses unified memory, so models load into the shared system RAM pool, though macOS reserves a portion of it for the OS and other processes. Models marked "Tight fit" will run but may be slow with long conversations.
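For the curious, the sketch below shows one way such an estimate can be derived. The constants (roughly 4.85 bits per weight for Q4_K_M, an fp16 KV cache, a fixed overhead) and the estimate_vram_gb helper are illustrative assumptions, not Ollie's actual implementation.

```python
# A rough sketch of a Q4_K_M VRAM estimate. All constants here are
# assumptions for illustration, not values taken from Ollie.

def estimate_vram_gb(params_billion: float,
                     context_tokens: int = 4096,
                     layers: int = 32,
                     kv_heads: int = 8,
                     head_dim: int = 128) -> float:
    """Approximate VRAM needed to run a Q4_K_M-quantized model."""
    BITS_PER_WEIGHT = 4.85  # Q4_K_M averages ~4.85 bits/weight
    weights_gb = params_billion * 1e9 * BITS_PER_WEIGHT / 8 / 1024**3

    # KV cache: 2 tensors (K and V) x layers x kv_heads x head_dim,
    # 2 bytes per element (fp16), per token of context.
    kv_gb = 2 * layers * kv_heads * head_dim * 2 * context_tokens / 1024**3

    overhead_gb = 0.75  # runtime buffers and scratch space (assumed)
    return weights_gb + kv_gb + overhead_gb

# e.g. an 8B model with a 4k-token context lands near 6 GB
print(f"{estimate_vram_gb(8):.1f} GB")
```

Longer contexts grow the KV-cache term linearly, which is why a model that fits comfortably at a short context can become a "Tight fit" in long conversations.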
Connect Ollama, Gemini, OpenAI, and more — all from one sovereign, private AI suite.
Download Ollie