Ollama REST API
Run large language models locally with REST API
Ollama is a local AI runtime that allows developers to run large language models (LLMs) on their own hardware through a simple REST API. It supports popular models like Llama 2, Mistral, Code Llama, and custom models with optimized performance for local inference. Developers use Ollama to build AI applications without cloud dependencies, maintain data privacy, and reduce inference costs.
http://localhost:11434/api
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /generate | Generate a completion from a specified model given a single prompt string. Returns the generated text and context. |
| POST | /chat | Generate chat completions with conversation history. Accepts an array of messages with role and content fields. |
| POST | /embeddings | Generate embeddings from a model for a given text. Returns a vector array representing the semantic meaning. |
| POST | /pull | Download a model from the Ollama library by name. Streams download progress until complete. |
| POST | /push | Upload a local model to the Ollama library. Requires model name and streams upload progress. |
| POST | /create | Create a new model from a Modelfile specification. Requires name and modelfile parameters in the request body. |
| DELETE | /delete | Delete a model and its data from local storage. Requires the model name in the request body. |
| POST | /copy | Copy an existing model to a new name. Requires source and destination model names in the request. |
| GET | /tags | List all locally available models with their names, sizes, and modification dates. |
| POST | /show | Show detailed information about a specific model including modelfile, parameters, and template. Requires model name. |
| GET | /ps | List currently running models with their names, sizes, and expiration times. |
| POST | /blobs/:digest | Create a blob for model file uploads using the SHA256 digest as the path parameter. |
Sponsor this page
AvailableReach developers actively building with Ollama. See live pageview data and self-serve checkout — your slot goes live in minutes.
View inventory & pricing →Code Examples
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false
}'
Use Ollama from Claude / Cursor / ChatGPT
Ollama is a self-hosted protocol — it lives on a host you operate (default http://localhost:11434/api). A
hosted MCP gateway can't reach localhost on your machine, so the usual one-click setup doesn't apply.
These are the tools an MCP for Ollama would expose:
ollama_generate
Generate text completions from locally running LLMs with custom prompts and parameters
ollama_chat
Maintain multi-turn conversations with local AI models using chat history
ollama_embed
Generate vector embeddings from text using local embedding models for semantic search
ollama_list_models
List all locally available models and their details including size and modified date
ollama_pull_model
Download and install models from the Ollama library to local storage
Run an Ollama MCP locally
The local-CLI version of these tools is on the way (npx @meru/rest-mcp --vendor=ollama · BYO connection string · zero secrets sent to us). For now use the patterns below in your own MCP server, or self-host one from the IOX templates.