Running LLMs Locally with Ollama: No GPU, No Cloud, No Excuses

In a previous post, I discussed setting up WSL2 on a Windows machine, focusing on limiting CPU and memory utilization and configuring an additional swap disk. Building on that, this article will guide you through setting up Ollama and VS Code to prepare for AI application development.



Introduction to Ollama

Ollama makes running open-source LLMs locally dead simple — no cloud, no API keys, no GPU needed. Just one command (ollama run phi) and you're chatting with a model that lives entirely on your machine.

Built by a small team of ex-devtool and ML engineers at Ollama Inc., the project wraps the powerful but low-level llama.cpp engine in a smooth developer experience — handling model downloads, quantization, and inference behind the scenes.

Whether you’re hacking on a side project, building an AI CLI tool, or just exploring what local models can do, Ollama cuts through the setup pain and gets you straight to the fun part.

Think of it as Homebrew, but for brains — local-first, developer-friendly, and privacy-respecting. It just works.

Minimalism and simplicity are absolutely central to Ollama’s brand and product philosophy.

Ollama Home Page

Installing Ollama

Installation of Ollama is straightforward. Let's jump in.

Step-by-step: Install Ollama on WSL Ubuntu

  • Ensure you are running WSL2, not WSL1. Run the command below on Windows:
wsl --version
  • Install dependencies:
sudo apt update
sudo apt install -y curl gpg
  • Install Ollama using the official install script:
curl -fsSL https://ollama.com/install.sh | sh
  • Check if the Ollama service is installed and running:
sudo systemctl status ollama

If the above command gives an error, run the following to enable the service:

sudo systemctl enable --now ollama

If WSL is quirky about services, run manually:

ollama serve &
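Either way, you can quickly confirm the server is reachable. Ollama listens on port 11434 by default and exposes a version endpoint over its local HTTP API; the check below assumes the default port and a standard install.

# Should return a small JSON payload such as {"version":"0.x.y"}
curl http://localhost:11434/api/version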

Check the available CPU cores and RAM by running the following commands, respectively:

nproc
free -h

Note: More details on WSL resource allocation and swap setup are in this earlier post.

Ollama supports both CPU and GPU inference, but it won't use integrated GPUs like the Intel UHD 620, which lack ML acceleration support (no CUDA, ROCm, or tensor cores).
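If you're unsure whether a loaded model is actually running on the CPU or a GPU, ollama ps lists running models along with the processor they're using (the exact column layout may vary between Ollama versions):

# With a model loaded, the PROCESSOR column shows CPU or GPU
ollama ps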

Ollama Models

Ollama is to LLMs what Docker is to containers — it brings repeatability, portability, and simplicity to local model usage.

Model Registry: Run ollama run <model> to pull from the central registry if not cached locally.

Model Images: Models like mistral, phi, and tinydolphin are packaged with weights and metadata in .gguf format.
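To peek at what a model image actually contains, such as its parameters, prompt template, and license, you can inspect it with ollama show; the example below assumes mistral has already been downloaded:

# Prints model details: architecture, parameter count, template, license, etc.
ollama show mistral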

Custom Models: Define a Modelfile based on an existing model and build your own:

FROM mistral
SYSTEM "You are a helpful assistant."

Then build it:

ollama create my-custom-model -f Modelfile
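Once built, the custom model runs like any other local model (Modelfiles also support further directives such as PARAMETER for tuning defaults, which I'm not covering here):

# Chat with the custom model
ollama run my-custom-model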

Download a model:

ollama run mistral
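If you just want to fetch a model without opening an interactive chat session, which is handy in setup scripts, pulling works too:

# Downloads the model layers without starting a chat
ollama pull mistral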

List downloaded models:

ollama list

Sample output:

NAME               ID              SIZE      MODIFIED
smollm2:1.7b       cef4a1e09247    1.8 GB    2 weeks ago
...
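Models can take up several gigabytes each, so it's worth removing any you no longer need:

# Frees the disk space used by a downloaded model
ollama rm <model>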

Running the smallest model from my local list:

ollama run smollm2:1.7b
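Since the end goal here is AI app development, it's worth noting that the same model is also reachable over Ollama's local REST API on port 11434, which is what editors and scripts typically use. A minimal sketch with curl (the prompt text is just an example):

# One-shot generation request against the local API
curl http://localhost:11434/api/generate -d '{
  "model": "smollm2:1.7b",
  "prompt": "Explain what WSL2 is in one sentence.",
  "stream": false
}'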

Ollama vs. ChatGPT Review

I asked a model running in Ollama to rewrite my intro for clarity, then fed both versions to ChatGPT (GPT-4o). Here's what it had to say.

Version 1 (original paragraph, written by me):

In one of the earlier posts , I had described about WSL2 and how we set it up on a Windows Machine specifically configuring it to limit the max CPU and Memory Utilization and setting up an additional swap disk. In this article, we will take this forward and see how we setup Ollama, VS Code to get ready for AI app development.

Version 2 (proposed rewrite from Ollama):

In one of the earlier posts, I described WSL2 and how it can be set up on a Windows machine. Specifically, this involves configuring it to limit CPU and memory utilization, as well as setting up an additional swap disk. In this article, we will take that further and explore how to prepare your environment for AI application development using tools like Ollama and VS Code.

Why Version 2 Wins:

  • Better grammar
  • Clear structure
  • Smoother flow
  • Professional tone
  • Proper verb use

Final Thoughts

Ollama is easy to set up. On CPU-only hardware it's not great for real-time chat, but it works well as a background dev assistant for tasks like these (see the sketch after the list):

  • Code review suggestions
  • Document summarization
  • Test case generation
  • Auto-generating FAQs
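A minimal sketch of that workflow, driving a local model from the shell; the model, prompt, and file name are just examples, and the $(cat …) substitution is a common way to feed file contents into a one-off, non-interactive prompt:

# Summarize a local document with a one-off prompt
ollama run smollm2:1.7b "Summarize the following document as five bullet points: $(cat README.md)"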

