github.com/sammcj/gollama
Gollama is a macOS / Linux tool for managing Ollama models.
It provides a TUI (Text User Interface) for listing, inspecting, deleting, copying, and pushing Ollama models, as well as optionally linking them to LM Studio.
The application allows users to interactively select models, sort, filter, edit, run, unload and perform actions on them using hotkeys.
The project started off as a rewrite of my llamalink project, but I decided to expand it to include more features and make it more user-friendly.
It's in active development, so there are some bugs and missing features; however, I'm finding it useful for managing my models every day, especially for cleaning up old models.
See also: ingest, for parsing directories/repos of code into Markdown formatted for LLMs.
go install github.com/sammcj/gollama@HEAD
I don't recommend this method as it's not as easy to update, but you can use the following command:
curl -sL https://raw.githubusercontent.com/sammcj/gollama/refs/heads/main/scripts/install.sh | bash
Download the most recent release from the releases page and extract the binary to a directory in your PATH.
e.g. unzip gollama*.zip -d gollama && mv gollama/gollama /usr/local/bin
If the gollama command is not found after installation, add the Go bin directory to your PATH in .zshrc or .bashrc.
echo 'export PATH=$PATH:$HOME/go/bin' >> ~/.zshrc
source ~/.zshrc
To run the gollama application, use the following command:
gollama
Tip: I like to alias gollama to g for quick access:
echo "alias g=gollama" >> ~/.zshrc
Space: Select
Enter: Run model (Ollama run)
i: Inspect model
t: Top (show running models)
D: Delete model
e: Edit model
c: Copy model
U: Unload all models
p: Pull an existing model
ctrl+p: Pull (get) new model
P: Push model
n: Sort by name
s: Sort by size
m: Sort by modified
k: Sort by quantisation
f: Sort by family
l: Link model to LM Studio
L: Link all models to LM Studio
r: Rename model (Work in progress)
q: Quit
Link (l), Link All (L) and Link in the reverse direction (link-lmstudio):
When linking models to LM Studio, Gollama creates a Modelfile with the template from LM Studio and a set of default parameters that you can adjust.
Note: Linking requires admin privileges if you're running Windows.
-l: List all available Ollama models and exit
-L: Link all available Ollama models to LM Studio and exit
-link-lmstudio: Link all available LM Studio models to Ollama and exit
--dry-run: Show what would be linked without making any changes (use with -link-lmstudio or -L)
-s <search term>: Search for models by name
  'term1|term2' returns models that match either term
  'term1&term2' returns models that match both terms
-e <model>: Edit the Modelfile for a model
-ollama-dir: Custom Ollama models directory
-lm-dir: Custom LM Studio models directory
-cleanup: Remove all symlinked models and empty directories and exit
-no-cleanup: Don't clean up broken symlinks
-u: Unload all running models
-v: Print the version and exit
-h, or --host: Specify the host for the Ollama API
-H: Shortcut for -h http://localhost:11434 (connect to local Ollama API)
--vram: Estimate vRAM usage for a model. Accepts:
  Ollama model names (e.g. llama3.1:8b-instruct-q6_K, qwen2:14b-q4_0)
  Hugging Face model IDs (e.g. NousResearch/Hermes-2-Theta-Llama-3-8B)
--fits: Available memory in GB for context calculation (e.g. 6 for 6GB)
--vram-to-nth or --context: Maximum context length to analyze (e.g. 32k or 128k)
--quant: Override quantisation level (e.g. Q4_0, Q5_K_M)
Gollama can also be called with -l to list models without the TUI.
gollama -l
Gollama can be called with -e to edit the Modelfile for a model.
gollama -e my-model
Gollama can be called with -s to search for models by name.
gollama -s my-model # returns models that contain 'my-model'
gollama -s 'my-model|my-other-model' # returns models that contain either 'my-model' or 'my-other-model'
gollama -s 'my-model&instruct' # returns models that contain both 'my-model' and 'instruct'
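The search operators above can be illustrated with a small Python sketch. This is not Gollama's implementation (Gollama is written in Go); the `matches` function is hypothetical and assumes plain, case-sensitive substring matching, with `|` meaning "either term" and `&` meaning "both terms".

```python
def matches(model_name: str, query: str) -> bool:
    """Hypothetical matcher mirroring the -s query syntax shown above."""
    if "|" in query:
        # 'a|b': the name must contain at least one of the terms.
        terms = [t for t in query.split("|") if t]
        return any(t in model_name for t in terms)
    if "&" in query:
        # 'a&b': the name must contain every term.
        terms = [t for t in query.split("&") if t]
        return all(t in model_name for t in terms)
    # Plain query: simple substring match.
    return query in model_name


# Illustrative model names only.
models = ["my-model:instruct", "my-model:base", "my-other-model:7b"]
print([m for m in models if matches(m, "my-model&instruct")])
```

How Gollama itself handles case, whitespace, or mixed `|` and `&` queries is not specified here, so treat those details as open.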
Gollama includes a comprehensive vRAM estimation feature. It accepts an Ollama model name (e.g. my-model:mytag), or huggingface model ID (e.g. author/name).
To estimate (v)RAM usage:
gollama --vram llama3.1:8b-instruct-q6_K
📊 VRAM Estimation for Model: llama3.1:8b-instruct-q6_K
| QUANT/CTX | BPW | 2K | 8K | 16K | 32K | 49K | 64K |
| ------- | ---- | --- | --- | --------------- | --------------- | --------------- | --------------- |
| IQ1_S | 1.56 | 2.2 | 2.8 | 3.7(3.7,3.7) | 5.5(5.5,5.5) | 7.3(7.3,7.3) | 9.1(9.1,9.1) |
| IQ2_XXS | 2.06 | 2.6 | 3.3 | 4.3(4.3,4.3) | 6.1(6.1,6.1) | 7.9(7.9,7.9) | 9.8(9.8,9.8) |
| IQ2_XS | 2.31 | 2.9 | 3.6 | 4.5(4.5,4.5) | 6.4(6.4,6.4) | 8.2(8.2,8.2) | 10.1(10.1,10.1) |
| IQ2_S | 2.50 | 3.1 | 3.8 | 4.7(4.7,4.7) | 6.6(6.6,6.6) | 8.5(8.5,8.5) | 10.4(10.4,10.4) |
| IQ2_M | 2.70 | 3.2 | 4.0 | 4.9(4.9,4.9) | 6.8(6.8,6.8) | 8.7(8.7,8.7) | 10.6(10.6,10.6) |
| IQ3_XXS | 3.06 | 3.6 | 4.3 | 5.3(5.3,5.3) | 7.2(7.2,7.2) | 9.2(9.2,9.2) | 11.1(11.1,11.1) |
| IQ3_XS | 3.30 | 3.8 | 4.5 | 5.5(5.5,5.5) | 7.5(7.5,7.5) | 9.5(9.5,9.5) | 11.4(11.4,11.4) |
| Q2_K | 3.35 | 3.9 | 4.6 | 5.6(5.6,5.6) | 7.6(7.6,7.6) | 9.5(9.5,9.5) | 11.5(11.5,11.5) |
| Q3_K_S | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7) | 7.7(7.7,7.7) | 9.7(9.7,9.7) | 11.7(11.7,11.7) |
| IQ3_S | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7) | 7.7(7.7,7.7) | 9.7(9.7,9.7) | 11.7(11.7,11.7) |
| IQ3_M | 3.70 | 4.2 | 5.0 | 6.0(6.0,6.0) | 8.0(8.0,8.0) | 9.9(9.9,9.9) | 12.0(12.0,12.0) |
| Q3_K_M | 3.91 | 4.4 | 5.2 | 6.2(6.2,6.2) | 8.2(8.2,8.2) | 10.2(10.2,10.2) | 12.2(12.2,12.2) |
| IQ4_XS | 4.25 | 4.7 | 5.5 | 6.5(6.5,6.5) | 8.6(8.6,8.6) | 10.6(10.6,10.6) | 12.7(12.7,12.7) |
| Q3_K_L | 4.27 | 4.7 | 5.5 | 6.5(6.5,6.5) | 8.6(8.6,8.6) | 10.7(10.7,10.7) | 12.7(12.7,12.7) |
| IQ4_NL | 4.50 | 5.0 | 5.7 | 6.8(6.8,6.8) | 8.9(8.9,8.9) | 10.9(10.9,10.9) | 13.0(13.0,13.0) |
| Q4_0 | 4.55 | 5.0 | 5.8 | 6.8(6.8,6.8) | 8.9(8.9,8.9) | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_S | 4.58 | 5.0 | 5.8 | 6.9(6.9,6.9) | 8.9(8.9,8.9) | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_M | 4.85 | 5.3 | 6.1 | 7.1(7.1,7.1) | 9.2(9.2,9.2) | 11.4(11.4,11.4) | 13.5(13.5,13.5) |
| Q4_K_L | 4.90 | 5.3 | 6.1 | 7.2(7.2,7.2) | 9.3(9.3,9.3) | 11.4(11.4,11.4) | 13.6(13.6,13.6) |
| Q5_K_S | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8) | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_0 | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8) | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_M | 5.69 | 6.1 | 6.9 | 8.0(8.0,8.0) | 10.2(10.2,10.2) | 12.4(12.4,12.4) | 14.6(14.6,14.6) |
| Q5_K_L | 5.75 | 6.1 | 7.0 | 8.1(8.1,8.1) | 10.3(10.3,10.3) | 12.5(12.5,12.5) | 14.7(14.7,14.7) |
| Q6_K | 6.59 | 7.0 | 8.0 | 9.4(9.4,9.4) | 12.2(12.2,12.2) | 15.0(15.0,15.0) | 17.8(17.8,17.8) |
| Q8_0 | 8.50 | 8.8 | 9.9 | 11.4(11.4,11.4) | 14.4(14.4,14.4) | 17.4(17.4,17.4) | 20.3(20.3,20.3) |
To find the best quantisation type for a given memory constraint (e.g. 6GB) you can provide --fits <number of GB>:
gollama --vram NousResearch/Hermes-2-Theta-Llama-3-8B --fits 6
📊 VRAM Estimation for Model: NousResearch/Hermes-2-Theta-Llama-3-8B
| QUANT/CTX | BPW | 2K | 8K | 16K | 32K | 49K | 64K |
| --------- | ---- | --- | --- | ------------ | ------------- | -------------- | --------------- |
| IQ1_S | 1.56 | 2.4 | 3.8 | 5.7(4.7,4.2) | 9.5(7.5,6.5) | 13.3(10.3,8.8) | 17.1(13.1,11.1) |
| IQ2_XXS | 2.06 | 2.9 | 4.3 | 6.3(5.3,4.8) | 10.1(8.1,7.1) | 13.9(10.9,9.4) | 17.8(13.8,11.8) |
...
This will display a table showing vRAM usage for various quantisation types and context sizes.
The vRAM estimator works by:
Note: The estimator will attempt to use CUDA vRAM if available, otherwise it will fall back to system RAM for calculations.
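As a rough illustration of how figures of this kind can arise, the sketch below adds two commonly used terms: weight memory (parameter count × bits per weight ÷ 8) and an fp16 KV cache that grows linearly with context length. This is a back-of-the-envelope approximation, not Gollama's actual estimator, and it ignores runtime overhead. The function names and the architecture numbers in the usage lines (8 billion parameters, 32 layers, 8 KV heads, head dimension 128) are assumptions chosen for illustration.

```python
GIB = 1024 ** 3


def weights_gib(n_params: float, bpw: float) -> float:
    """Memory for model weights: parameters * bits-per-weight / 8 bytes."""
    return n_params * bpw / 8 / GIB


def kv_cache_gib(ctx_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache: one key and one value vector per layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len / GIB


def estimate_gib(n_params: float, bpw: float, ctx_len: int,
                 n_layers: int, n_kv_heads: int, head_dim: int) -> float:
    """Weights plus fp16 KV cache; no allowance for runtime overhead."""
    return (weights_gib(n_params, bpw)
            + kv_cache_gib(ctx_len, n_layers, n_kv_heads, head_dim))


# Assumed, illustrative architecture values (not read from any model file).
for ctx in (2048, 8192, 16384):
    total = estimate_gib(8e9, 4.55, ctx, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"4.55 BPW at {ctx} tokens: {total:.1f} GiB")
```

Numbers from this sketch will not match the tables above exactly, since the real estimator may account for overhead and other factors that are not modelled here.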
Gollama uses a JSON configuration file located at ~/.config/gollama/config.json. The configuration file includes options for sorting, columns, API keys, log levels, etc.
Example configuration:
{
  "default_sort": "modified",
  "columns": [
    "Name",
    "Size",
    "Quant",
    "Family",
    "Modified",
    "ID"
  ],
  "ollama_api_key": "",
  "ollama_api_url": "http://localhost:11434",
  "lm_studio_file_paths": "",
  "log_level": "info",
  "log_file_path": "/Users/username/.config/gollama/gollama.log",
  "sort_order": "Size",
  "strip_string": "my-private-registry.internal/",
  "editor": "",
  "docker_container": ""
}
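A configuration of this shape can be read with any JSON parser. The sketch below is illustrative only and is not Gollama's code: it parses a trimmed copy of the example and shows one plausible reading of strip_string (described below) as simple prefix removal. The helper name `display_name` is hypothetical.

```python
import json

# Trimmed copy of the example configuration above.
EXAMPLE_CONFIG = """
{
  "default_sort": "modified",
  "ollama_api_url": "http://localhost:11434",
  "strip_string": "my-private-registry.internal/"
}
"""


def display_name(model_name: str, strip_string: str) -> str:
    """Drop the configured prefix from a model name, if it is present."""
    if strip_string and model_name.startswith(strip_string):
        return model_name[len(strip_string):]
    return model_name


config = json.loads(EXAMPLE_CONFIG)
print(display_name("my-private-registry.internal/llama3.1:8b",
                   config["strip_string"]))
```

To inspect a real configuration, the same `json.loads` call can be given the contents of ~/.config/gollama/config.json instead of the inline string.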
strip_string can be used to remove a prefix from model names as they are displayed in the TUI. This can be useful if you have a common prefix, such as a private registry, that you want to remove for display purposes.
docker_container - experimental - if set, gollama will attempt to perform any run operations inside the specified container.
editor - experimental - if set, gollama will use this editor to open the Modelfile for editing.
Clone the repository:
git clone https://github.com/sammcj/gollama.git
cd gollama
Build:
go get
make build
Run:
./gollama
Logs are written to gollama.log, which is stored at $HOME/.config/gollama/gollama.log by default.
The log level can be set in the configuration file.
Contributions are welcome! Please fork the repository and create a pull request with your changes.
Sam
KimCookieYa
Denis Balan
Doug Coleman
Jose Almaraz
Jose Roberto Almaraz
Oleksii Filonenko
SouthWolf
anrgct
ondrej
Thank you to folks such as Matt Williams, Fahd Mirza and AI Code King for giving this a shot and providing feedback.
Copyright © 2024 Sam McLeod
This project is licensed under the MIT License. See the LICENSE file for details.