
If you want to run AI models locally without relying on a cloud API (like the public ChatGPT website), Ollama gives you a way to do exactly that! In this guide, we will walk you through installing Ollama, downloading a model, and testing it with a simple prompt, all from your terminal!
Before that, let's start with a quick introduction to Ollama.
Ollama is an open-source framework that lets you run Large Language Models (LLMs) such as Llama 3, Mistral, and DeepSeek directly on your own hardware (laptop or desktop). It handles the "heavy lifting" of AI, like managing memory, talking to your GPU, and downloading massive model files, so you can just type a command and start chatting!
Of course, you're wondering: why does Ollama exist?
Before Ollama, running a model locally was a nightmare: you had to hunt down the model weights yourself, set up an inference engine, write configuration by hand, and hope your GPU and RAM could cope.
Ollama exists to simplify this into a single click or a single terminal command. It packages the model weights, configuration, and the "inference engine" into one neat bundle!
Your data never leaves your computer. This is a game-changer for businesses or individuals handling sensitive medical, legal, or personal data.
No "per-token" fees. You pay for the electricity to run your computer, but the AI is free to use forever.
It works on a plane, in a basement, or anywhere without an internet connection.
It automatically sets up a local API (at localhost:11434) that mimics OpenAI's API, making it incredibly easy to build your own apps (a quick example of calling it follows below).
It uses "quantization" (shrinking the model size) to let you run powerful AI on standard consumer laptops, not just $10,000 servers.
Without further ado, let's go through how to download Ollama, install its models, and run a model!
Ollama provides both a GUI app and a command-line interface (CLI) that let you interact with models locally.
Steps:
First, download the Ollama installer from ollama.com and install it for your operating system. Then, to confirm the installation is working, run
ollama --version
in your command prompt as shown below:
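If the installation succeeded, you should see a version string printed back, roughly like this (the exact number will differ depending on when you installed it):

ollama version is 0.5.7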
In the same terminal, if you want to download a model and start using it right away, use the run command. If the model isn't on your computer yet, Ollama will automatically pull it first. Since I already have gemma3 installed, I'll use llama3 as the example model, which you can download with the command below:
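One way to download it is with the pull command, which fetches the model without starting a chat (llama3 here is just an example; any model from the Ollama library works):

# downloads the llama3 model to your machine
ollama pull llama3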
After that, you can use the run command with the name of the model you just downloaded; an example is shown below:
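For example, to start an interactive chat with the model pulled above:

ollama run llama3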
If a chat prompt appears after you type the command, then congratulations! You have successfully installed and run your model locally via your command prompt!
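The interactive prompt should look roughly like this (the exact wording can vary between Ollama versions):

>>> Send a message (/? for help)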
Here are also some example responses the model might give you after you type an input!
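As an illustration only (the actual wording will differ from run to run, since model output is not deterministic), a short exchange might look something like this:

>>> What is Ollama in one sentence?
Ollama is a tool that lets you download and run large language models locally on your own computer.
>>> /bye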
And there you have it! This is how you can run your AI models locally using Ollama. While running a model locally is great for data privacy and developer-friendly, there are some notable drawbacks to keep in mind.
If you try to run a model larger than your available RAM (or VRAM on your GPU), it will either be painfully slow or crash.
Rule of thumb: 8GB RAM for 3B–7B models; 16GB+ for 8B–13B models; 32GB+ for 30B+ models.
If Ollama can't find your GPU (or it runs out of VRAM), it will silently switch to your CPU. You'll know this happened because the response speed will drop from "lightning fast" to "one word every two seconds."
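If you are unsure which one you are on, recent versions of Ollama include a ps command that shows each loaded model and whether it is sitting on the GPU or the CPU:

# shows loaded models and the processor (GPU/CPU) they are using
ollama ps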
Models are HUGE in storage size. A single "medium" model (like Llama 3 8B) takes about 5GB. If you download 10 different models to "test" them, you'll lose 50GB of disk space very quickly.
By default, Ollama doesn't "remember" you between different sessions unless you use a UI (like Open WebUI) or write a script to manage the conversation history.
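If you do want memory across turns, the usual trick is to resend the earlier messages with every request. Here is a minimal sketch against Ollama's chat API from a bash-style shell, again assuming the llama3 model from earlier:

# previous turns are replayed in the "messages" array so the model has context
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "stream": false,
  "messages": [
    {"role": "user", "content": "My name is Sam."},
    {"role": "assistant", "content": "Nice to meet you, Sam!"},
    {"role": "user", "content": "What is my name?"}
  ]
}'

Because the whole history travels with the request, the model can answer the last question even though Ollama itself stores nothing between calls.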
What's more, Ollama can play a significant role in your future projects!
Ollama plays two critical roles in any project: it is the Librarian and the Engine.
As the Librarian, Ollama handles the "messy" parts of working with AI models so you don't have to.
It pulls the massive model files (like Llama 3 or Mistral) from the internet.
It organizes them on your hard drive so they're ready to use.
It "shrinks" the models (quantization) so they can
actually run on a normal laptop instead of a giant server.
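You can see this librarian role directly from the CLI. For example (the model name here is just an example):

# list every model on disk, with its size
ollama list
# delete a model you no longer need, to get the disk space back
ollama rm llama3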
As the Engine, this is Ollama's main job during a project: it sits in the background and waits for instructions.
It creates a local "doorway" (at localhost:11434). Your web app or script knocks on this door and sends a prompt (a quick sketch of this follows below).
When a prompt arrives, Ollama "starts" the model's brain, uses your computer's RAM/GPU to think, and generates the answer.
It keeps the AI separate from your code. If the AI model crashes because it ran out of memory, your actual website or app won't crash; only the Ollama "engine" stalls.
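As a rough sketch of that "doorway", this is how a script would knock on it using Ollama's generate endpoint (again assuming llama3, or any other model you have pulled):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain what a local API is in one sentence.",
  "stream": false
}'

Your app only ever talks to this HTTP endpoint, so if the model stalls or runs out of memory, the request fails but your app itself keeps running.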
There you have it! I'm looking forward to seeing your upcoming projects integrated with Ollama. Hopefully this article was helpful to you, and good luck with your future work with Ollama! Thank you, and let me know your feedback on this article. I will keep improving it :)