How To Run LLMs Locally

Nilesh Raut

If you use the Claude API, the OpenAI API, Cursor, or other AI coding tools every day, your API bill can grow very fast.

A lot of developers are now moving to local LLM setups because they want:

  • Lower AI costs
  • Offline AI access
  • Better privacy
  • Faster experimentation
  • No API limits

The good news is that you can now run capable AI models directly on your laptop using tools like Ollama.

This setup works great for:

  • Coding help
  • Refactoring
  • Learning
  • Documentation
  • AI chat
  • Small local agents

Let’s set it up step by step.


Step 1: Install Ollama

Download Ollama from its official site (ollama.com) and install it like any other software.

After installation, open CMD or Terminal and check:

ollama --version

If you see a version number, it is installed correctly.
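If you want to run the same check from a script (for example in a setup script for your team), here is a minimal Python sketch. It assumes the `ollama` binary is on your PATH and that the version is printed as a dotted number such as `0.5.7`; the helper names and the sample string are only illustrative.

```python
import re
import shutil
import subprocess


def extract_version(text):
    """Return the first dotted version number (x.y.z) found in text, or None."""
    match = re.search(r"\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


def installed_ollama_version():
    """Return the Ollama version string, or None if the CLI is not on PATH."""
    if shutil.which("ollama") is None:
        return None
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    # Search both streams in case the version is reported alongside warnings.
    return extract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_ollama_version() or "ollama not found on PATH")
```

If this prints a version number, the install worked the same way as the manual check above.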


Step 2: Download Your First AI Model

Now pull a model locally.

Example:

ollama pull llama3

Or for coding:

ollama pull qwen2.5-coder:7b

The first download may take a few minutes because models are several GB in size.
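To get a rough feel for "a few minutes", you can estimate the download time from the model size and your connection speed. This is only back-of-the-envelope arithmetic: the 4.7 GB size and 100 Mbps speed below are hypothetical numbers, and real downloads are usually a bit slower than the theoretical figure.

```python
def download_minutes(size_gb, speed_mbps):
    """Estimate download time in minutes.

    size_gb    -- file size in gigabytes (1 GB = 1000 MB)
    speed_mbps -- connection speed in megabits per second
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    megabits = size_gb * 8 * 1000  # 8 bits per byte, 1000 MB per GB
    return megabits / speed_mbps / 60


# Hypothetical example: a 4.7 GB model on a 100 Mbps connection.
print(round(download_minutes(4.7, 100), 1))  # prints 6.3
```

On a slower connection the same model takes proportionally longer, so it is worth pulling models before you need them offline.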


Step 3: Run the Model

Start chatting with the model:

ollama run llama3

Example:

>>> Explain Docker in simple words

You now have a local AI assistant running directly on your machine.

No API required.
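"No API" here means no cloud API. Ollama itself serves a local HTTP endpoint (by default on port 11434), so your own scripts can talk to the model too. Below is a minimal sketch using only the Python standard library. It assumes Ollama is running on the default address and that the `llama3` model from Step 2 is already pulled; the function names are just for illustration.

```python
import json
import urllib.request

# Ollama's default local address; change it if you configured another host or port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt, timeout=120):
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `ask("llama3", "Explain Docker in simple words")` should return the same kind of answer you get in the interactive prompt. Nothing leaves your machine, because the request goes to localhost.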


Step 4: Use It Inside VS Code

Install one of these VS Code extensions:

  • Continue.dev
  • Cline

Both work with Ollama locally.

In the Continue.dev config file (JSON format), add a model entry:

{
  "models": [
    {
      "title": "Local AI",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ]
}

Now VS Code can use your local model for:

  • Code generation
  • Refactoring
  • Debugging
  • Chat
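If you switch models often, you may prefer to generate the model entry instead of editing the JSON by hand. Here is a small sketch that adds an Ollama entry to a config dictionary with the same `title` / `provider` / `model` fields shown above. It works on an in-memory dict only; where the real config file lives depends on your Continue.dev version, so reading and writing the file is left out on purpose. The function name is hypothetical.

```python
import json


def add_ollama_model(config, title, model):
    """Return a copy of config with an Ollama model entry added.

    If an entry for the same Ollama model already exists, no duplicate
    is added, so running this repeatedly is safe.
    """
    models = list(config.get("models", []))
    already_present = any(
        m.get("provider") == "ollama" and m.get("model") == model for m in models
    )
    if not already_present:
        models.append({"title": title, "provider": "ollama", "model": model})
    return {**config, "models": models}


config = add_ollama_model({}, "Local AI", "qwen2.5-coder:7b")
print(json.dumps(config, indent=2))
```

The printed JSON matches the snippet from this step, so you can paste it into your config or extend the sketch to write the file yourself.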


Step 5: Open a Chat UI in Your Browser

You can also use a ChatGPT-like interface locally.

Install Open WebUI using Docker:

docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main

Open:

http://localhost:3000

Now you have your own private AI chat app.
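Why port 3000? In the Docker command, `-p 3000:8080` maps port 3000 on your machine to port 8080 inside the container, so the number before the colon is the one you open in the browser. The tiny sketch below makes that rule explicit. It only handles the simple `HOST:CONTAINER` form, and the function name is made up for this example.

```python
def host_url(port_mapping, host="localhost"):
    """Turn a Docker '-p HOST:CONTAINER' mapping into the URL to open."""
    host_port, container_port = port_mapping.split(":")
    if not (host_port.isdigit() and container_port.isdigit()):
        raise ValueError(f"expected HOST:CONTAINER ports, got {port_mapping!r}")
    # Only the host side of the mapping is visible from your browser.
    return f"http://{host}:{host_port}"


print(host_url("3000:8080"))  # prints http://localhost:3000
```

So if port 3000 is already taken on your machine, you can change only the left-hand number (for example `-p 8081:8080`) and open that port instead.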


Recommended Models

| Model | Best For |
| --- | --- |
| Qwen2.5 Coder | Coding |
| DeepSeek Coder | Refactoring |
| Llama 3 | General AI |
| Phi | Low-end laptops |
| Mistral | Fast responses |

Minimum Hardware

Basic setup:

  • 16GB RAM recommended
  • SSD storage
  • NVIDIA GPU helps a lot

CPU-only inference works too, but responses are slower.
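A common rule of thumb for why RAM matters: the model weights alone need roughly (number of parameters) × (bits per weight) / 8 bytes. The sketch below applies that to a 7B model at a few precisions. Treat it as a lower bound under stated assumptions: the runtime, the context window, and your other apps all need memory on top, and the exact quantization of a given model download is not something this article specifies.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# A 7B-parameter model at common precisions (weights only, no runtime overhead).
for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: ~{weights_gb(7, bits):.1f} GB")
```

By this estimate a 4-bit 7B model needs about 3.5 GB for weights, which is why it can fit comfortably on a 16GB machine, while the same model at 16-bit (about 14 GB) would leave almost no room for anything else.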


Why Developers Like Local AI

Main reasons:

  • No monthly API bills
  • More privacy
  • Works offline
  • Full control
  • Easy experimentation

For daily coding workflows, local LLMs are becoming surprisingly useful.

Cloud models are still stronger for advanced reasoning, but local AI is now good enough for many real-world tasks.


Final Thoughts

If you are spending too much on AI APIs, this is probably the easiest way to reduce costs.

Start simple:

  • Install Ollama
  • Pull one coding model
  • Connect it to VS Code

That alone can replace a large percentage of your daily AI usage.
