AI & Machine Learning
Ollama REST API
Run large language models locally with a REST API
Ollama is a local AI runtime that lets developers run large language models (LLMs) on their own hardware through a simple REST API. It supports popular models such as Llama 2, Mistral, and Code Llama, as well as custom models, with performance optimized for local inference. Developers use Ollama to build AI applications without cloud dependencies, keep data private, and reduce inference costs.
Base URL
http://localhost:11434/api
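A quick way to confirm the server is reachable is to list the locally installed models (this assumes Ollama is running on its default port):

```shell
# Returns a JSON object of the form {"models": [...]}
curl http://localhost:11434/api/tags
```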
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /generate | Generate a response from a model with a single prompt |
| POST | /chat | Generate chat completions with conversation history |
| POST | /embeddings | Generate embeddings from a model for a given text |
| POST | /pull | Download a model from the Ollama library |
| POST | /push | Upload a model to the Ollama library |
| POST | /create | Create a new model from a Modelfile |
| DELETE | /delete | Delete a model and its data |
| POST | /copy | Copy a model to a new name |
| GET | /tags | List all locally available models |
| POST | /show | Show information about a specific model |
| GET | /ps | List currently running models |
| POST | /blobs/:digest | Create a blob for model file uploads |
Code Examples

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```
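Unlike /generate, the /chat endpoint accepts a messages array, so prior turns of a conversation can be passed on each call. A minimal sketch, assuming the llama2 model has already been pulled:

```shell
# Single-turn chat request; append prior messages to the array for multi-turn context
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": false
}'
```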
Connect Ollama to AI
Deploy an Ollama MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Ollama through these tools:
ollama_generate
Generate text completions from locally running LLMs with custom prompts and parameters
ollama_chat
Maintain multi-turn conversations with local AI models using chat history
ollama_embed
Generate vector embeddings from text using local embedding models for semantic search
ollama_list_models
List all locally available models and their details including size and modified date
ollama_pull_model
Download and install models from the Ollama library to local storage
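The ollama_embed tool corresponds to the /embeddings endpoint listed above. Called directly, the request looks like this (the model name nomic-embed-text is an example and must be pulled first):

```shell
# Returns a JSON object containing an "embedding" array of floats
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'
```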
Deploy in 60 seconds
Describe what you need, AI generates the code, and IOX deploys it globally.
Deploy Ollama MCP Server →