How-to: Use Different LLM Models with Ollama

This guide shows you how to select and configure different Large Language Models (LLMs) using Ollama for the generation stage of the RAG pipeline.

To follow this guide, you need to install and set up Ollama locally (a quick check that the server is reachable follows these steps):

  1. Download Ollama from ollama.com
  2. Start the Ollama server: ollama serve
  3. Pull a model: ollama pull llama3.1
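
Before going further, you can check that the server is reachable. The sketch below queries Ollama's local REST API (GET /api/tags lists the models you have pulled) and assumes the default endpoint on port 11434:

import requests

# Check that the local Ollama server is up and list the pulled models.
# Assumes the default endpoint http://localhost:11434.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Ollama is running; pulled models:", models)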

Selecting a Model

Specify the model using the --model (or -m) flag:

nerxiv prompt \
  --file-path paper.hdf5 \
  --model llama3.1:70b

Model            Size  Best For                                Speed   Quality
gpt-oss:20b      20B   Quick extraction, good accuracy         Fast    Very good
llama3.1:8b      8B    Quick extraction, general queries       Fast    Good
llama3.1:70b     70B   Complex reasoning, accurate extraction  Slow    Excellent
qwen2.5:32b      32B   Technical text, good reasoning          Medium  Very good
deepseek-r1:14b  14B   Scientific reasoning, formulas          Medium  Very good
mistral:7b       7B    Fast processing, simple queries         Fast    Good

Model Selection Guide

Below are a few examples of how to select the LLM model. You can read more about choosing model parameters in How-to: Adjust LLM Parameters.

# Quick extraction and general queries
nerxiv prompt --file-path paper.hdf5 --model llama3.1:8b
# Complex reasoning and accurate extraction (slower)
nerxiv prompt --file-path paper.hdf5 --model llama3.1:70b
# Scientific reasoning and formulas
nerxiv prompt --file-path paper.hdf5 --model deepseek-r1:14b

Installing Models

Before using a model, pull it from Ollama:

# List available models
ollama list

# Pull a specific model
ollama pull llama3.1:8b

# Pull a larger model (may take time)
ollama pull llama3.1:70b
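
If you prefer to manage models from Python instead of the ollama CLI, the sketch below uses Ollama's REST API (POST /api/pull); it assumes the default local endpoint, and pulling a large model can take a while:

import requests

# Pull a model through Ollama's REST API instead of `ollama pull`.
# Assumes the default endpoint http://localhost:11434.
resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"model": "llama3.1:8b", "stream": False},
    timeout=None,  # pulling a large model can take many minutes
)
resp.raise_for_status()
print(resp.json().get("status"))  # "success" once the pull completes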

Using Custom Ollama Endpoints

If Ollama is running on a remote server or on a custom port, pass its base URL as an LLM option:

nerxiv prompt \
  --file-path paper.hdf5 \
  -llmo base_url=http://192.168.1.100:11434
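
If you drive generation from Python instead, the sketch below assumes LLMGenerator accepts a base_url option mirroring the CLI's -llmo base_url=...; this keyword is an assumption, so check the nerxiv API reference:

from nerxiv.rag import LLMGenerator

retrieved_chunks = "..."  # text chunks from the retrieval stage
generator = LLMGenerator(
    model="llama3.1:70b",
    text=retrieved_chunks,
    base_url="http://192.168.1.100:11434",  # assumed keyword, mirroring -llmo base_url=...
)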

Python API

For programmatic control:

from nerxiv.rag import LLMGenerator
from nerxiv.prompts import PROMPT_REGISTRY

# Get prompt template
query_entry = PROMPT_REGISTRY["material_formula"]
prompt_template = query_entry.prompt

# retrieved_chunks holds the text chunks returned by the retrieval stage of the RAG pipeline
# Initialize the generator with custom settings
generator = LLMGenerator(
    model="llama3.1:70b",
    text=retrieved_chunks,
    temperature=0.2,
    num_ctx=8192,
    top_p=0.9,
    format="json"
)

# Generate answer
prompt = prompt_template.build(text=retrieved_chunks)
answer = generator.generate(prompt=prompt)
print(answer)
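
Since the generator above was created with format="json", the returned answer should be a JSON string; a minimal way to parse it, assuming generate() returns the raw string:

import json

# Assumes generate() returns the model's raw JSON string when format="json"
data = json.loads(answer)
print(data)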

Comparing Models

To compare different models on the same paper, run the following in your terminal:

# Test with different models
for model in llama3.1:8b llama3.1:70b qwen2.5:32b; do
  echo "Testing $model..."
  nerxiv prompt --file-path paper.hdf5 --model $model --query material_formula
done
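
The same comparison can be driven from Python with the LLMGenerator API shown above; a minimal sketch, where retrieved_chunks stands in for the chunks returned by the retrieval stage:

from nerxiv.rag import LLMGenerator
from nerxiv.prompts import PROMPT_REGISTRY

prompt_template = PROMPT_REGISTRY["material_formula"].prompt
retrieved_chunks = "..."  # text chunks from the retrieval stage
prompt = prompt_template.build(text=retrieved_chunks)

# Run the same prompt through several models and compare the answers
for model in ["llama3.1:8b", "llama3.1:70b", "qwen2.5:32b"]:
    generator = LLMGenerator(model=model, text=retrieved_chunks)
    print(f"--- {model} ---")
    print(generator.generate(prompt=prompt))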

Troubleshooting

Model Not Found

Error: model 'llama3.1:70b' not found

Solution: Pull the model first:

ollama pull llama3.1:70b

Out of Memory

Error: failed to allocate memory

Solution: Use a smaller model or reduce the context window:

nerxiv prompt --file-path paper.hdf5 --model llama3.1:8b -llmo num_ctx=4096

Ollama Not Running

Error: connection refused

Solution: Start Ollama server:

ollama serve

Slow Generation

If generation is too slow, try the following (the first two options are combined in the sketch after this list):

  1. Use a smaller model: llama3.1:8b instead of llama3.1:70b
  2. Reduce the context window: -llmo num_ctx=4096
  3. Use a GPU if available
  4. Reduce the number of retrieved chunks: --n-top-chunks 3
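
The first two options can also be combined through the Python API; a minimal sketch, reusing the model and num_ctx parameters shown in the Python API section:

from nerxiv.rag import LLMGenerator

retrieved_chunks = "..."  # text chunks from the retrieval stage
# Smaller model plus a reduced context window for faster generation
generator = LLMGenerator(
    model="llama3.1:8b",
    text=retrieved_chunks,
    num_ctx=4096,
)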