
If you want to run AI models locally without relying on a cloud API (like the public ChatGPT website), Ollama gives you a way to do exactly that! In this guide, we will walk you through installing Ollama, downloading a model, and testing it with a simple prompt, all from your terminal!
Before that, let's start with a quick introduction to Ollama.
Ollama is an open-source framework that lets you run Large Language Models (LLMs) such as Llama 3, Mistral, and DeepSeek directly on your own hardware (laptop or desktop). It handles the "heavy lifting" of AI, like managing memory, talking to your GPU, and downloading massive model files, so you can just type a command and start chatting!
Of course, you're wondering: why does Ollama exist?
Before Ollama, running a model locally was a nightmare: you had to hunt down the model weights yourself, set up an inference engine, write configuration by hand, and hope your GPU and RAM could cope.
Ollama exists to simplify this into a single click or a single terminal command. It packages the model weights, configuration, and the "inference engine" into one neat bundle!
Your data never leaves your computer. This is a game-changer for businesses or individuals handling sensitive medical, legal, or personal data.
No "per-token" fees. You pay for the electricity to run your computer, but the AI is free to use forever.
It works on a plane, in a basement, or anywhere without an internet connection.
It automatically sets up a local API (at localhost:11434) that mimics OpenAI's API, making it incredibly easy to build your own apps (a quick example of calling it follows below).
It uses "quantization" (shrinking the model size) to let you run powerful AI on standard consumer laptops, not just $10,000 servers.
Without further ado, let's go through how to download Ollama, install its models, and run a model!
Ollama provides both a GUI app and a command-line interface (CLI) that let you interact with models locally.
Steps:
First, download the Ollama installer from ollama.com and install it for your operating system. Then, to confirm the installation is working, run
ollama --version
in your command prompt as shown below:
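If the installation succeeded, you should see a version string printed back, roughly like this (the exact number will differ depending on when you installed it):

ollama version is 0.5.7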
In the same terminal, if you want to download a model and start using it right away, use the run command. If the model isn't on your computer yet, Ollama will automatically pull it first. Since I already have gemma3 installed, I'll use llama3 as the example model, which you can download with the command below:
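One way to download it is with the pull command, which fetches the model without starting a chat (llama3 here is just an example; any model from the Ollama library works):

# downloads the llama3 model to your machine
ollama pull llama3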
After that, you can use the run command with the name of the model you just downloaded; an example is shown below:
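For example, to start an interactive chat with the model pulled above:

ollama run llama3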
If a chat prompt appears after you type the command, then congratulations! You have successfully installed and run your model locally via your command prompt!
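The interactive prompt should look roughly like this (the exact wording can vary between Ollama versions):

>>> Send a message (/? for help)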
Here are also some example responses the model might give you after you type an input!
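As an illustration only (the actual wording will differ from run to run, since model output is not deterministic), a short exchange might look something like this:

>>> What is Ollama in one sentence?
Ollama is a tool that lets you download and run large language models locally on your own computer.
>>> /bye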
And there you have it! This is how you can run your AI models locally using Ollama. While running a model locally is great for data privacy and developer-friendly, there are some notable drawbacks to keep in mind.
If you try to run a model larger than your available RAM (or VRAM on your GPU), it will either be painfully slow or crash.
Rule of thumb: 8GB RAM for 3B–7B models; 16GB+ for 8B–13B models; 32GB+ for 30B+ models.
If Ollama can't find your GPU (or it runs out of VRAM), it will silently switch to your CPU. You'll know this happened because the response speed will drop from "lightning fast" to "one word every two seconds."
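If you are unsure which one you are on, recent versions of Ollama include a ps command that shows each loaded model and whether it is sitting on the GPU or the CPU:

# shows loaded models and the processor (GPU/CPU) they are using
ollama ps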
Models are HUGE in storage size. A single "medium" model (like Llama 3 8B) takes about 5GB. If you download 10 different models to "test" them, you'll lose 50GB of disk space very quickly.
By default, Ollama doesn't "remember" you between different sessions unless you use a UI (like Open WebUI) or write a script to manage the conversation history.
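If you do want memory across turns, the usual trick is to resend the earlier messages with every request. Here is a minimal sketch against Ollama's chat API from a bash-style shell, again assuming the llama3 model from earlier:

# previous turns are replayed in the "messages" array so the model has context
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "stream": false,
  "messages": [
    {"role": "user", "content": "My name is Sam."},
    {"role": "assistant", "content": "Nice to meet you, Sam!"},
    {"role": "user", "content": "What is my name?"}
  ]
}'

Because the whole history travels with the request, the model can answer the last question even though Ollama itself stores nothing between calls.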
What's more, Ollama can play a significant role in your future projects!
Ollama plays two critical roles in any project: it is the Librarian and the Engine.
As the Librarian, Ollama handles the "messy" parts of working with AI models so you don't have to.
It pulls the massive model files (like Llama 3 or Mistral) from the internet.
It organizes them on your hard drive so they're ready to use.
It "shrinks" the models (quantization) so they can
actually run on a normal laptop instead of a giant server.
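You can see this librarian role directly from the CLI. For example (the model name here is just an example):

# list every model on disk, with its size
ollama list
# delete a model you no longer need, to get the disk space back
ollama rm llama3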
As the Engine, this is Ollama's main job during a project: it sits in the background and waits for instructions.
It creates a local "doorway" (at localhost:11434). Your web app or script knocks on this door and sends a prompt (a quick sketch of this follows below).
When a prompt arrives, Ollama "starts" the model's brain, uses your computer's RAM/GPU to think, and generates the answer.
It keeps the AI separate from your code. If the AI model crashes because it ran out of memory, your actual website or app won't crash; only the Ollama "engine" stalls.
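As a rough sketch of that "doorway", this is how a script would knock on it using Ollama's generate endpoint (again assuming llama3, or any other model you have pulled):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain what a local API is in one sentence.",
  "stream": false
}'

Your app only ever talks to this HTTP endpoint, so if the model stalls or runs out of memory, the request fails but your app itself keeps running.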
There you have it! I'm looking forward to seeing your upcoming projects integrated with Ollama. Hopefully this article was helpful to you, and good luck with your future work with Ollama! Thank you, and let me know your feedback on this article. I will keep improving it :)