Self-hosted
AI & Machine Learning None

Ollama REST API

Run large language models locally with REST API

Ollama is a local AI runtime that allows developers to run large language models (LLMs) on their own hardware through a simple REST API. It supports popular models like Llama 2, Mistral, Code Llama, and custom models with optimized performance for local inference. Developers use Ollama to build AI applications without cloud dependencies, maintain data privacy, and reduce inference costs.

Base URL http://localhost:11434/api

API Endpoints

MethodEndpointDescription
POST/generateGenerate a completion from a specified model given a single prompt string. Returns the generated text and context.
POST/chatGenerate chat completions with conversation history. Accepts an array of messages with role and content fields.
POST/embeddingsGenerate embeddings from a model for a given text. Returns a vector array representing the semantic meaning.
POST/pullDownload a model from the Ollama library by name. Streams download progress until complete.
POST/pushUpload a local model to the Ollama library. Requires model name and streams upload progress.
POST/createCreate a new model from a Modelfile specification. Requires name and modelfile parameters in the request body.
DELETE/deleteDelete a model and its data from local storage. Requires the model name in the request body.
POST/copyCopy an existing model to a new name. Requires source and destination model names in the request.
GET/tagsList all locally available models with their names, sizes, and modification dates.
POST/showShow detailed information about a specific model including modelfile, parameters, and template. Requires model name.
GET/psList currently running models with their names, sizes, and expiration times.
POST/blobs/:digestCreate a blob for model file uploads using the SHA256 digest as the path parameter.

Code Examples

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Use Ollama from Claude / Cursor / ChatGPT

Ollama is a self-hosted protocol — it lives on a host you operate (default http://localhost:11434/api). A hosted MCP gateway can't reach localhost on your machine, so the usual one-click setup doesn't apply. These are the tools an MCP for Ollama would expose:

ollama_generate Generate text completions from locally running LLMs with custom prompts and parameters
ollama_chat Maintain multi-turn conversations with local AI models using chat history
ollama_embed Generate vector embeddings from text using local embedding models for semantic search
ollama_list_models List all locally available models and their details including size and modified date
ollama_pull_model Download and install models from the Ollama library to local storage

Run an Ollama MCP locally

The local-CLI version of these tools is on the way (npx @meru/rest-mcp --vendor=ollama · BYO connection string · zero secrets sent to us). For now use the patterns below in your own MCP server, or self-host one from the IOX templates.

Build your own Ollama MCP →

Related APIs