Documentation
Query thousands of datasets via MCP or REST API
Introduction
Turn your LLM into a data analyst with the Subsets data warehouse. We index thousands of datasets from statistical agencies and make them queryable via MCP or REST.
All data is indexed via open source connectors, so you can inspect lineage and verify the numbers.
If you prefer to keep things local, you can sync datasets to your machine. We use Delta Lake internally, which allows for efficient syncing. Once synced, query with our local MCP server or CLI.
Remote
Connect to our hosted MCP server. No installation required.
Everything runs on our servers. Search and query all datasets directly without downloading anything.
MCP Setup
Claude Code
claude mcp add --transport sse subsets https://mcp.subsets.io/sse
Claude Desktop
Add to your config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "subsets": {
      "url": "https://mcp.subsets.io/sse"
    }
  }
}
Authentication
Browse and search datasets without logging in. To run SQL queries, Claude will prompt you to authenticate via OAuth.
When you ask Claude to query data, it will give you a code. Enter it at subsets.io/authorize to grant access.
MCP Tools
Tools available to Claude via the hosted MCP server.
search_catalog
    Search the dataset catalog
    query: string
    limit: int (optional, default: 20, max: 100)
inspect_datasets
    Get metadata, schema, and stats
    dataset_ids: string[] (max: 20 datasets)
execute_query (auth required)
    Run SQL against datasets
    query: string
    output_format: json | tsv (optional, default: tsv)
REST API
Prefer REST? Use your API key from subsets.io/settings.
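For scripting, the three REST calls in this section could be wrapped in Python roughly as follows. This is a minimal sketch using only the standard library: the base URL, paths, and JSON body are copied from the curl examples, while the function names (`search_request`, `summary_request`, `sql_request`) are hypothetical and not part of any Subsets client. The functions only build the requests; the response format is not described here, so parsing is left to the caller.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://api.subsets.io"  # base URL taken from the curl examples


def search_request(q: str, limit: int = 10) -> Request:
    """Build GET /datasets?q=...&limit=... (the curl example sends no API key)."""
    return Request(f"{API_BASE}/datasets?{urlencode({'q': q, 'limit': limit})}")


def summary_request(dataset_id: str) -> Request:
    """Build GET /datasets/<dataset_id>/summary."""
    return Request(f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/summary")


def sql_request(query: str, api_key: str) -> Request:
    """Build POST /sql/query with a bearer token and a JSON body."""
    return Request(
        f"{API_BASE}/sql/query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send one of these, pass it to `urllib.request.urlopen`, for example `urlopen(search_request("gdp"))`, and read the response body.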
Search datasets
curl "https://api.subsets.io/datasets?q=gdp&limit=10"
Get dataset info
curl "https://api.subsets.io/datasets/bea_gdp_chained_dollars_quarterly/summary"
Execute SQL (requires auth)
curl -X POST https://api.subsets.io/sql/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM bea_gdp_chained_dollars_quarterly LIMIT 10"}'
Local
Run everything on your machine. Sync datasets once, query offline forever.
The CLI manages your local dataset collection and runs the MCP server. Use search to discover datasets from the catalog, add to download them, and query to run SQL locally with DuckDB.
Install
Clone the repo and install with uv. Then log in with your API key from subsets.io/settings.
git clone https://github.com/subsetsio/subsets-mcp-server.git
cd subsets-mcp-server
uv tool install .
subsets login
MCP Setup
Claude Code
claude mcp add subsets -- subsets mcp
Claude Desktop
Add to your config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "subsets": {
      "command": "subsets",
      "args": ["mcp"]
    }
  }
}
CLI
Manage your local dataset collection. Search hits the remote catalog, queries run locally.
subsets search <query>
    Search the remote catalog to discover datasets
    Example: subsets search inflation
subsets inspect <dataset_id>
    View dataset details, schema, and stats
    Example: subsets inspect wdi_gdp_per_capita
subsets add <dataset_id>
    Download a dataset to your local collection
    Example: subsets add wdi_gdp_per_capita
subsets list
    List datasets in your local collection
subsets query <sql>
    Run SQL against local datasets (DuckDB)
    Example: subsets query "SELECT * FROM wdi_gdp_per_capita LIMIT 10"
subsets sync
    Update local datasets to latest versions
subsets remove <dataset_id>
    Remove a dataset from your local collection
subsets status
    Show config and disk usage
subsets mcp
    Start the local MCP server
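The usual order of these commands is discover, inspect, download, then query. As an illustration, that sequence could be driven from a Python script along these lines. This is a sketch only: the subcommands and arguments are the ones listed above, but `subsets_cmd`, `run`, and `WORKFLOW` are hypothetical helper names, and the output format of each command is not assumed.

```python
import subprocess


def subsets_cmd(*args: str) -> list[str]:
    """Build the argv for a `subsets` subcommand."""
    return ["subsets", *args]


def run(*args: str) -> str:
    """Run a subcommand and return its stdout; raises if it exits non-zero."""
    result = subprocess.run(
        subsets_cmd(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


# Discover, inspect, download, then query, using the examples from the CLI list.
WORKFLOW = [
    subsets_cmd("search", "inflation"),
    subsets_cmd("inspect", "wdi_gdp_per_capita"),
    subsets_cmd("add", "wdi_gdp_per_capita"),
    subsets_cmd("query", "SELECT * FROM wdi_gdp_per_capita LIMIT 10"),
]
```

With the CLI installed and `subsets login` completed, `run("query", "SELECT * FROM wdi_gdp_per_capita LIMIT 10")` would execute the query locally and return whatever the CLI prints.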
MCP Tools
Tools available to Claude via the local MCP server. Discovery tools hit the remote catalog, queries run locally.
search_catalog
    Search the remote catalog to discover datasets
    query: string
    limit: int (optional, default: 20, max: 100)
inspect_datasets
    Get metadata, schema, and stats from remote catalog
    dataset_ids: string[] (max: 20 datasets)
add_datasets
    Download datasets to local collection
    dataset_ids: string[]
search_local
    Search your local collection
    query: string (optional)
execute_query
    Run SQL against local datasets (DuckDB)
    query: string
    output_format: json | tsv (optional, default: tsv)
sync_datasets
    Update local datasets to latest versions
    dataset_ids: string[] (optional, default: all)
remove_datasets
    Remove datasets from local collection
    dataset_ids: string[]
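Claude normally constructs these calls for you, but it can help to see what one looks like. The sketch below builds MCP `tools/call` JSON-RPC requests for three of the tools and checks arguments against the limits listed above. The tool names, parameter names, and limits come from this page and the envelope follows the MCP specification; the Python helper names are hypothetical, the lower bound of 1 is an assumption, and whether the server rejects or clamps out-of-range values is not documented here.

```python
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in an MCP `tools/call` JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def search_catalog(query: str, limit: int = 20) -> dict:
    if not 1 <= limit <= 100:  # documented max is 100; min of 1 is assumed
        raise ValueError("limit must be between 1 and 100")
    return tool_call("search_catalog", {"query": query, "limit": limit})


def inspect_datasets(dataset_ids: list[str]) -> dict:
    if not 1 <= len(dataset_ids) <= 20:  # documented max is 20 datasets
        raise ValueError("pass between 1 and 20 dataset ids")
    return tool_call("inspect_datasets", {"dataset_ids": dataset_ids})


def execute_query(query: str, output_format: str = "tsv") -> dict:
    if output_format not in ("json", "tsv"):  # the two documented formats
        raise ValueError("output_format must be 'json' or 'tsv'")
    return tool_call(
        "execute_query", {"query": query, "output_format": output_format}
    )
```

Serializing the result of `execute_query("SELECT * FROM wdi_gdp_per_capita LIMIT 10")` with `json.dumps` gives the message an MCP client would send to the server over its transport.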