AI & Machine Learning None

Ollama REST API

Run large language models locally with REST API

Ollama is a local AI runtime that allows developers to run large language models (LLMs) on their own hardware through a simple REST API. It supports popular models like Llama 2, Mistral, Code Llama, and custom models with optimized performance for local inference. Developers use Ollama to build AI applications without cloud dependencies, maintain data privacy, and reduce inference costs.

Base URL http://localhost:11434/api

API Endpoints

Method	Endpoint	Description
POST	`/generate`	Generate a completion from a specified model given a single prompt string. Returns the generated text and context.
POST	`/chat`	Generate chat completions with conversation history. Accepts an array of messages with role and content fields.
POST	`/embeddings`	Generate embeddings from a model for a given text. Returns a vector array representing the semantic meaning.
POST	`/pull`	Download a model from the Ollama library by name. Streams download progress until complete.
POST	`/push`	Upload a local model to the Ollama library. Requires model name and streams upload progress.
POST	`/create`	Create a new model from a Modelfile specification. Requires name and modelfile parameters in the request body.
DELETE	`/delete`	Delete a model and its data from local storage. Requires the model name in the request body.
POST	`/copy`	Copy an existing model to a new name. Requires source and destination model names in the request.
GET	`/tags`	List all locally available models with their names, sizes, and modification dates.
POST	`/show`	Show detailed information about a specific model including modelfile, parameters, and template. Requires model name.
GET	`/ps`	List currently running models with their names, sizes, and expiration times.
POST	`/blobs/:digest`	Create a blob for model file uploads using the SHA256 digest as the path parameter.

Code Examples

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama2',
    prompt: 'Why is the sky blue?',
    stream: false
  })
});
const data = await response.json();
console.log(data.response);

import requests

response = requests.post('http://localhost:11434/api/generate',
  json={
    'model': 'llama2',
    'prompt': 'Why is the sky blue?',
    'stream': False
  }
)
print(response.json()['response'])

Use Ollama from Claude / Cursor / ChatGPT

Ollama is a self-hosted protocol — it lives on a host you operate (default http://localhost:11434/api). A hosted MCP gateway can't reach localhost on your machine, so the usual one-click setup doesn't apply. These are the tools an MCP for Ollama would expose:

ollama_generate Generate text completions from locally running LLMs with custom prompts and parameters

ollama_chat Maintain multi-turn conversations with local AI models using chat history

ollama_embed Generate vector embeddings from text using local embedding models for semantic search

ollama_list_models List all locally available models and their details including size and modified date

ollama_pull_model Download and install models from the Ollama library to local storage

Run an Ollama MCP locally

The local-CLI version of these tools is on the way (npx @meru/rest-mcp --vendor=ollama · BYO connection string · zero secrets sent to us). For now use the patterns below in your own MCP server, or self-host one from the IOX templates.

Build your own Ollama MCP →

Ollama REST API

API Endpoints

Sponsor this page

Code Examples

Use Ollama from Claude / Cursor / ChatGPT

Run an Ollama MCP locally

Related APIs