An MCP Server that works with Roo Code or Cline.Bot (currently untested with Claude Desktop or the Copilot MCP VS Code extension) to optimize costs by intelligently routing coding tasks between local LLMs and paid APIs.
LocalLama MCP Server is designed to reduce token usage and costs by dynamically deciding whether to offload a coding task to a local, less capable instruct LLM (served via LM Studio or Ollama, for example) or to send it to a paid API.
```bash
# Clone the repository
git clone https://github.com/yourusername/locallama-mcp.git
cd locallama-mcp

# Install dependencies
npm install

# Build the project
npm run build
```
Copy the `.env.example` file to create your own `.env` file:

```bash
cp .env.example .env
```

Then edit the `.env` file with your specific configuration:
```
# Local LLM Endpoints
LM_STUDIO_ENDPOINT=http://localhost:1234/v1
OLLAMA_ENDPOINT=http://localhost:11434/api

# Configuration
DEFAULT_LOCAL_MODEL=qwen2.5-coder-3b-instruct
TOKEN_THRESHOLD=1500
COST_THRESHOLD=0.02
QUALITY_THRESHOLD=0.7

# Benchmark Configuration
BENCHMARK_RUNS_PER_TASK=3
BENCHMARK_PARALLEL=false
BENCHMARK_MAX_PARALLEL_TASKS=2
BENCHMARK_TASK_TIMEOUT=60000
BENCHMARK_SAVE_RESULTS=true
BENCHMARK_RESULTS_PATH=./benchmark-results

# API Keys (replace with your actual keys)
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Logging
LOG_LEVEL=debug
```
Local LLM Endpoints

- `LM_STUDIO_ENDPOINT`: URL where your LM Studio instance is running
- `OLLAMA_ENDPOINT`: URL where your Ollama instance is running

Configuration

- `DEFAULT_LOCAL_MODEL`: The local LLM model to use when offloading tasks
- `TOKEN_THRESHOLD`: Maximum token count before considering offloading to a local LLM
- `COST_THRESHOLD`: Cost threshold (in USD) that triggers local LLM usage
- `QUALITY_THRESHOLD`: Quality score below which to use paid APIs regardless of cost (see the routing sketch after this list)

API Keys

- `OPENROUTER_API_KEY`: Your OpenRouter API key for accessing various LLM services

New Tools

- `clear_openrouter_tracking`: Clears OpenRouter tracking data and forces an update
- `benchmark_free_models`: Benchmarks the performance of free models from OpenRouter

When integrating with Cline.Bot or Roo Code, you can pass these environment variables directly in your MCP configuration, as shown in the Cline.Bot example below.
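To make the three thresholds concrete, the decision they describe boils down to a comparison like the following. This is an illustrative sketch, not the server's actual code; the `RoutingInput` and `decideRoute` names are hypothetical.

```typescript
// Illustrative sketch of the routing decision the thresholds describe.
// `RoutingInput` and `decideRoute` are hypothetical names, not the server's API.
interface RoutingInput {
  tokenCount: number;    // estimated token count of the task
  estimatedCost: number; // estimated paid-API cost in USD
  qualityScore: number;  // expected local-model quality for this task (0-1)
}

function decideRoute(
  input: RoutingInput,
  t = { token: 1500, cost: 0.02, quality: 0.7 }, // TOKEN_/COST_/QUALITY_THRESHOLD defaults
): 'local' | 'paid' {
  // Below the quality floor, use the paid API regardless of cost.
  if (input.qualityScore < t.quality) return 'paid';
  // Large or expensive tasks are offloaded to the local LLM to save cost.
  if (input.tokenCount > t.token || input.estimatedCost > t.cost) return 'local';
  return 'paid';
}
```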
To start the server:

```bash
npm start
```
The server integrates with OpenRouter to access a variety of free and paid models from different providers. Key features include:

- A `clear_openrouter_tracking` tool to force a fresh update of models

To use the OpenRouter integration:

1. Set your `OPENROUTER_API_KEY` in the environment variables
2. If the model list looks stale or empty, run the `clear_openrouter_tracking` tool through the MCP interface

The current OpenRouter integration provides access to approximately 240 models, including 30+ free models from providers like Google, Meta, Mistral, and Microsoft.
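Under the hood, identifying free models amounts to filtering OpenRouter's public model listing by price. A minimal standalone sketch, not the server's actual implementation and with the response fields simplified:

```typescript
// Sketch: list OpenRouter models whose prompt and completion pricing are both zero.
interface OpenRouterModel {
  id: string;
  name: string;
  pricing: { prompt: string; completion: string }; // prices arrive as strings
}

async function getFreeModels(apiKey: string): Promise<OpenRouterModel[]> {
  const res = await fetch('https://openrouter.ai/api/v1/models', {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const { data } = (await res.json()) as { data: OpenRouterModel[] };
  return data.filter(
    (m) => parseFloat(m.pricing.prompt) === 0 && parseFloat(m.pricing.completion) === 0,
  );
}
```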
To use this MCP Server with Cline.Bot, add it to your Cline MCP settings:
```json
{
  "mcpServers": {
    "locallama": {
      "command": "node",
      "args": ["/path/to/locallama-mcp"],
      "env": {
        "LM_STUDIO_ENDPOINT": "http://localhost:1234/v1",
        "OLLAMA_ENDPOINT": "http://localhost:11434/api",
        "DEFAULT_LOCAL_MODEL": "qwen2.5-coder-3b-instruct",
        "TOKEN_THRESHOLD": "1500",
        "COST_THRESHOLD": "0.02",
        "QUALITY_THRESHOLD": "0.7",
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
      },
      "disabled": false
    }
  }
}
```
Once configured, you can use the MCP tools in Cline.Bot:
- `get_free_models`: Retrieve the list of free models from OpenRouter
- `clear_openrouter_tracking`: Force a fresh update of OpenRouter models if you encounter issues
- `benchmark_free_models`: Benchmark the performance of free models from OpenRouter

Example usage in Cline.Bot:
```
/use_mcp_tool locallama clear_openrouter_tracking {}
```
This will clear the tracking data and force a fresh update of the models, which is useful if you're not seeing any free models or if you want to ensure you have the latest model information.
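The other tools follow the same invocation pattern; for example, assuming `benchmark_free_models` also accepts an empty argument object:

```
/use_mcp_tool locallama benchmark_free_models {}
```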
The project includes a comprehensive benchmarking system to compare local LLM models against paid API models:
```bash
# Run a simple benchmark
node run-benchmarks.js

# Run a comprehensive benchmark across multiple models
node run-benchmarks.js comprehensive
```
Benchmark results are stored in the `benchmark-results` directory. The repository also includes pre-generated benchmark results that provide valuable insights into the performance of different models.
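If you want to post-process the saved results yourself, a small script along these lines works. Note that the JSON shape below is an assumption for illustration; inspect the files in `benchmark-results` for the actual field names.

```typescript
// Sketch: summarize saved benchmark runs. The result shape is assumed, not
// the server's documented format; check the files for the actual fields.
import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

interface BenchmarkResult {
  model: string;
  taskId: string;
  durationMs: number;
  success: boolean;
}

function summarize(dir = './benchmark-results'): void {
  for (const file of readdirSync(dir).filter((f) => f.endsWith('.json'))) {
    const results = JSON.parse(readFileSync(join(dir, file), 'utf8')) as BenchmarkResult[];
    const ok = results.filter((r) => r.success).length;
    const avgMs = results.reduce((sum, r) => sum + r.durationMs, 0) / results.length;
    console.log(`${file}: ${ok}/${results.length} succeeded, avg ${avgMs.toFixed(0)} ms`);
  }
}
```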
To run the server in development mode:

```bash
npm run dev
```
To run the test suite:

```bash
npm test
```
The `.gitignore` file is configured to prevent sensitive data from being committed to the repository. Keep your API keys in the `.env` file, which is excluded from version control.

This project is licensed under the ISC License.