Hey everyone! I'm back with an exciting tool that lets you run Llama 2, Code Llama, and more directly in your terminal using a simple Docker command. Say hello to Ollama, the AI chat program that makes interacting with LLMs as easy as spinning up a docker container.
What is Ollama?
Ollama Supported LLMs
Ollama offers a range of open-source models you can find at ollama.ai/library. Check out these examples of models you can download:
ollama run mistral
ollama run llama2
ollama run codellama
|Llama 2 Uncensored
ollama run llama2-uncensored
|Llama 2 13B
ollama run llama2:13b
|Llama 2 70B
ollama run llama2:70b
ollama run orca-mini
ollama run vicuna
Please remember: To run the 3B models, make sure you have a minimum of 8 GB of RAM. For the 7B models, you'll need 16 GB, and for the 13B models, you should have 32 GB.
Get Started using Ollama with Docker
Ollama can take advantage of GPU acceleration in Docker containers designed for Nvidia GPUs. I didn't use the GPU option for testing, and it ran smoothly!
Just ensure you have a computer with Linux and Docker installed. Then, use the following command to download the Ollama image to your computer.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Don't forget, this is running on your CPU, not the GPU. If you're looking for that extra oomph with GPU support, check out the Ollama blog post for Docker image that supports Nvidia GPU.
docker exec -it ollama ollama run llama2
This will download the Llama 2 model to your system. If you use the "ollama run" command and the model isn't already downloaded, it will perform a download. To get the model without running it, simply use "ollama pull llama2."
Once the model is downloaded you can initiate the chat sequence and begin your conversation. You can exit or start a new chat by pressing Ctrl+D. If you exit, you will have to re run the command above to initiate the chat again.
Ok, so performance is completely dependant on the hardware and resources your system has. I ran llama2's 7B version on an LXC container with 15GB of ram and 4 CPU cores. Here is the response speed I was getting.
I was blown away at how fast the responses where out of an LXC container...
Going below 3 CPU cores made it almost unusable, resulting in frequent freezes. Keep in mind, this was just for testing purposes. AI chatbots always work better on dedicated systems or bare metal.
Ollama Desktop Versions
I'd like to point out that Ollama offers desktop apps for both MacOS and Linux, and it seems they're working on a Windows version as well, though I haven't personally tried the desktop versions, having focused on the Docker version as it seems more suited for self-hosting.
Final Notes and Thoughts
I have to admit, I'm genuinely impressed with how Ollama performs in this testing environment. It's so user friendly that I can see myself using it more often. Downloading models is simple and well documented on the Ollama website.
Even though a command-line tool like this might not offer the same convenience and flashy features as a web interface, there's a certain satisfaction in whipping up a chatbot quickly right in your terminal. There's a sense of accomplishment when you see it in action.
If you're new to AI chatbots and LLMs and eager to dive into the topic, don't hesitate to browse through my other articles right here for a deeper understanding. I'll be covering a lot more about the topic here on Noted so stay tuned. 😊
This was a basic example of how to setup Ollama using Docker. There is a lot I didn't cover, like customizing your own models, prompts and even the Rest API. To read more about these Ollama features, visit the following links:
Ollama website: https://ollama.ai
Ollama Github: https://github.com/jmorganca/ollama
Ollama Discord: https://discord.com/invite/ollama