Documentation

Query thousands of datasets via MCP or REST API

Introduction

Turn your LLM into a data analyst with the Subsets data warehouse. We index thousands of datasets from statistical agencies and make them queryable via MCP or REST.

All data is indexed via open source connectors, so you can inspect lineage and verify the numbers.

If you prefer to keep things local, you can sync datasets to your machine. We use Delta Lake internally, which allows for efficient syncing. Once synced, query with our local MCP server or CLI.

Remote

Connect to our hosted MCP server. No installation required.

Everything runs on our servers. Search and query all datasets directly without downloading anything.

MCP Setup

Claude Code

Bash
claude mcp add --transport sse subsets https://mcp.subsets.io/sse

Claude Desktop

Add to your config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

JSON
{
  "mcpServers": {
    "subsets": {
      "url": "https://mcp.subsets.io/sse"
    }
  }
}

Authentication

Browse and search datasets without logging in. To run SQL queries, Claude will prompt you to authenticate via OAuth.

When you ask Claude to query data, it will give you a code. Enter it at subsets.io/authorize to grant access.

MCP Tools

Tools available to Claude via the hosted MCP server.

search_catalog

Search the dataset catalog

query: string
limit: int (optional, default: 20, max: 100)

inspect_datasets

Get metadata, schema, and stats

dataset_ids: string[] (max: 20 datasets)

execute_query (auth required)

Run SQL against datasets

query: string
output_format: json | tsv (optional, default: tsv)

REST API

Prefer REST? Use your API key from subsets.io/settings.

Search datasets

Bash
curl "https://api.subsets.io/datasets?q=gdp&limit=10"
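When calling the search endpoint from code rather than curl, the query term should be URL-encoded so multi-word searches are escaped correctly. A minimal Python sketch; the `q` and `limit` parameters come from the curl example above, while the function name is illustrative:

```python
from urllib.parse import urlencode

def search_url(query: str, limit: int = 10) -> str:
    # Build the dataset search URL from the curl example above,
    # URL-encoding the query so spaces and special characters are escaped.
    return "https://api.subsets.io/datasets?" + urlencode({"q": query, "limit": limit})

search_url("gdp per capita")
# → "https://api.subsets.io/datasets?q=gdp+per+capita&limit=10"
```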

Get dataset info

Bash
curl "https://api.subsets.io/datasets/bea_gdp_chained_dollars_quarterly/summary"

Execute SQL (requires auth)

Bash
curl -X POST https://api.subsets.io/sql/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM bea_gdp_chained_dollars_quarterly LIMIT 10"}'
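The same request can be made from Python with only the standard library. The endpoint, headers, and `query` field mirror the curl example above; the helper names and the split into a request builder are illustrative, and the response is returned as raw text since the response format is not specified here:

```python
import json
import urllib.request

def build_query_request(sql: str, api_key: str) -> urllib.request.Request:
    # Mirrors the curl example: POST a JSON body with a bearer token.
    return urllib.request.Request(
        "https://api.subsets.io/sql/query",
        data=json.dumps({"query": sql}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def run_query(sql: str, api_key: str) -> str:
    # Returns the raw response body as text.
    with urllib.request.urlopen(build_query_request(sql, api_key)) as resp:
        return resp.read().decode()
```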

Local

Run everything on your machine. Sync datasets once, query offline forever.

The CLI manages your local dataset collection and runs the MCP server. Use search to discover datasets from the catalog, add to download them, and query to run SQL locally with DuckDB.

Install

Clone the repo and install with uv. Then log in with your API key from subsets.io/settings.

Bash
git clone https://github.com/subsetsio/subsets-mcp-server.git
cd subsets-mcp-server
uv tool install .
subsets login

MCP Setup

Claude Code

Bash
claude mcp add subsets -- subsets mcp

Claude Desktop

Add to your config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

JSON
{
  "mcpServers": {
    "subsets": {
      "command": "subsets",
      "args": ["mcp"]
    }
  }
}

CLI

Manage your local dataset collection. Search hits the remote catalog; queries run locally.

subsets search <query>

Search the remote catalog to discover datasets

Bash
subsets search inflation

subsets inspect <dataset_id>

View dataset details, schema, and stats

Bash
subsets inspect wdi_gdp_per_capita

subsets add <dataset_id>

Download a dataset to your local collection

Bash
subsets add wdi_gdp_per_capita

subsets list

List datasets in your local collection

subsets query <sql>

Run SQL against local datasets (DuckDB)

Bash
subsets query "SELECT * FROM wdi_gdp_per_capita LIMIT 10"

subsets sync

Update local datasets to latest versions

subsets remove <dataset_id>

Remove a dataset from your local collection

subsets status

Show config and disk usage

subsets mcp

Start the local MCP server

MCP Tools

Tools available to Claude via the local MCP server. Discovery tools hit the remote catalog; queries run locally.

search_catalog

Search the remote catalog to discover datasets

query: string
limit: int (optional, default: 20, max: 100)

inspect_datasets

Get metadata, schema, and stats from the remote catalog

dataset_ids: string[] (max: 20 datasets)

add_datasets

Download datasets to the local collection

dataset_ids: string[]

search_local

Search your local collection

query: string (optional)

execute_query

Run SQL against local datasets (DuckDB)

query: string
output_format: json | tsv (optional, default: tsv)

sync_datasets

Update local datasets to the latest versions

dataset_ids: string[] (optional, default: all)

remove_datasets

Remove datasets from the local collection

dataset_ids: string[]