Remote MCP Server Guide¶
Build an MCP server, run it locally, and deploy it as a remote service any MCP client (agents, IDEs)
can call over the network. A complete, runnable reference implementation ships alongside this guide:
example-mcp-server/.
🤖 LLM agents: if you're an agent helping a user wrap their own code as a remote MCP server, read the agent playbook →
for-llm-agents.md first. It has the build steps, the conventions to follow, and the decisions to confirm with your user before writing code. Machine-readable site index: /llms.txt.
Who this is for¶
You're comfortable with Python and know the basics of MCP (an LLM/agent calls "tools" your server exposes), and you want to put an MCP server online so an agent can reach it.
Prerequisites
- Python 3.10+ and uv (or plain pip).
- For the remote part only: a Cloudflare account with a domain on it.
What you'll have at the end: a working MCP server reachable at https://<your-host>/mcp, protected by
an auth token, that an agent can connect to and call.
Reading path: Concepts → Architecture → Run locally → Deploy remotely → Connect a client → Troubleshooting. If you just want it running, skip to Run it locally.
Concepts (1 minute)¶
MCP (Model Context Protocol) is a standard way for an LLM/agent to discover and call your code. Your server exposes tools (functions the agent can call), and optionally resources (readable data) and prompts (templates). Full spec: https://modelcontextprotocol.io.
An MCP server can run two ways:
| | Local (stdio) | Remote (HTTP) ← this guide |
|---|---|---|
| How the client reaches it | spawns your process locally | over the network at a URL |
| Good for | quick local tools | heavy/long jobs, GPU, shared across teams, wrapping a service |
| Transport | stdio | Streamable HTTP (endpoint at /mcp) |
This guide uses the official mcp Python SDK (FastMCP) with Streamable HTTP.
Architecture (mental model)¶
flowchart LR
A["Agent / IDE"] --> B["MCP Client"]
B -->|"HTTPS /mcp + token"| C["Cloudflare Access<br/>(checks service token)"]
C --> D["Cloudflare Tunnel<br/>(cloudflared)"]
D --> E["Local FastMCP server<br/>127.0.0.1:8900"]
E --> F["Tool / Job / File"]
What each hop does:
1. Agent / IDE wants to call a tool.
2. MCP Client opens a Streamable-HTTP session to https://<your-host>/mcp, sending the auth headers.
3. Cloudflare Access checks the service token at the edge; no token → blocked (401/403).
4. Cloudflare Tunnel (cloudflared) forwards allowed traffic to your machine: no open ports, no public IP, TLS handled for you.
5. FastMCP server (bound to 127.0.0.1) handles the MCP request.
6. It runs a tool, kicks off a job, or serves a file.
Locally (the next section) you talk straight to step 5 — no Cloudflare needed.
Run it locally¶
Get example-mcp-server/ (it ships with this guide; on the docs site use the Download button).
cd example-mcp-server
python --version # need 3.10+
uv sync # create venv + install deps (or: pip install mcp)
python server.py # start the server (if the venv is not active: uv run python server.py, or .venv/bin/python server.py)
You should see uvicorn start:
INFO: Started server process [12345]
INFO: Uvicorn running on http://127.0.0.1:8900 (Press CTRL+C to quit)
In a second shell, verify it:
curl -s http://127.0.0.1:8900/healthz
# -> {"ok": true}
python smoke_test.py http://127.0.0.1:8900/mcp
Expected success output:
tools: ['add', 'echo', 'start_render', 'get_render_status', 'get_render_result']
add(2,3) -> {'sum': 5.0}
start_render -> {'job_id': '2669ffa4...', 'state': 'queued'}
status -> running 1/3 [1/3] preparing assets
status -> running 2/3 [2/3] rendering frames
status -> running 3/3 [3/3] encoding output
status -> succeeded 3/3 done
result -> {'ready': True, 'url': 'http://127.0.0.1:8900/files/2669ffa4...'}
OK ✅
No .env is needed for local runs. When that works, you have a real MCP server — now make it remote.
Server reference¶
Defining tools¶
A tool is a decorated function. Its docstring is what the agent reads to decide when to call it, so
make it precise. Return JSON-serializable data; on failure return {"error": "..."} instead of raising.
Validate inputs.
from typing import Annotated

from mcp.server.fastmcp import FastMCP
from pydantic import Field

mcp = FastMCP("example-mcp", host="127.0.0.1", port=8900)


@mcp.tool()
def add(
    a: Annotated[float, Field(description="First number to add.")],
    b: Annotated[float, Field(description="Second number to add.")],
) -> dict:
    """Add two numbers and return {"sum": a+b}."""
    return {"sum": a + b}


def main():
    mcp.run(transport="streamable-http")  # endpoint at /mcp


if __name__ == "__main__":
    main()
Per-parameter descriptions. FastMCP derives inputSchema from the type hints, but the docstring only
becomes the tool-level description: FastMCP does not parse a docstring Args: block into
per-parameter docs. To describe each argument (which the agent sees in the schema), annotate it with
Annotated[T, Field(description="...")], as above. Keep the Python default outside the Annotated
(x: Annotated[int, Field(description="...")] = 8) so the parameter stays optional. Field can also carry
validation (ge/le/min_length/pattern) and examples, which flow into the schema too. Note that these
constraints make a call with an out-of-range value fail validation, so don't put them on a parameter you intend to silently clamp.
Tool requirements: the name matches ^[A-Za-z0-9_-]+$ and is unique, the description is non-empty, and the
inputSchema is valid JSON Schema (FastMCP derives it from the type hints, with per-parameter descriptions coming from Annotated[..., Field(...)]).
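The conventions in this section (validate inputs, return {"error": ...} rather than raising, clamp in code rather than through Field constraints) can be sketched with plain functions. This is an illustration only, not code from example-mcp-server: scale, its 1..16 clamp range, and is_valid_tool_name are hypothetical names, and the function would still need the @mcp.tool() decorator to become a tool.

```python
import math
import re

# Character set quoted in the text above for valid tool names.
TOOL_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_tool_name(name: str) -> bool:
    """True if the whole name matches ^[A-Za-z0-9_-]+$."""
    return TOOL_NAME_RE.fullmatch(name) is not None


def scale(value: float, factor: int = 8) -> dict:
    """Hypothetical tool body: multiply value by factor (factor clamped to 1..16)."""
    # Validate: answer bad input with an error payload instead of raising.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return {"error": "value must be a number"}
    if not math.isfinite(value):
        return {"error": "value must be finite"}
    # Clamp silently in code; a Field(ge=1, le=16) constraint would reject instead.
    factor = max(1, min(16, factor))
    return {"result": value * factor, "factor": factor}


print(scale(2.0, 100))      # factor is clamped to 16
print(scale(float("nan")))  # error payload, no exception
```

The caller always gets a JSON-serializable dict back, so an agent can read the error text and retry with corrected arguments.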
HTTP endpoints¶
| Method & Path | Purpose | Auth |
|---|---|---|
| POST /mcp | MCP Streamable HTTP (initialize, tools/list, tools/call, …) | required |
| GET /healthz | Liveness probe → {"ok": true} | required* |
| GET /files/{id} | Download an artifact produced by a job | required* |
* Behind Cloudflare Access, every path on the host requires the service token. With the app-level
Bearer option (3B), /mcp is gated by the built-in verifier, /files by an in-handler check, and
/healthz is intentionally public.
Add non-MCP routes with @mcp.custom_route:
from starlette.requests import Request
from starlette.responses import JSONResponse, FileResponse


@mcp.custom_route("/healthz", methods=["GET"])
async def healthz(request: Request):
    return JSONResponse({"ok": True})
Async job contract (long tasks)¶
Never block a tool for minutes. Split long work into three tools:
| Tool | Input | Output |
|---|---|---|
| start_<x> | job spec | {job_id, state} |
| get_<x>_status | job_id | {job_id, state, stage, total_stages, message, error?} |
| get_<x>_result | job_id | {ready, url, …} |
state ∈ {queued, running, succeeded, failed} — exactly these four; clients treat
succeeded / failed as terminal. The client calls start_*, polls get_*_status (typically every
10–60 s) until terminal, then calls get_*_result. See start_render / get_render_status /
get_render_result in example-mcp-server/server.py (a background thread does the work; the demo job
store is in-memory, so in production use a database, Redis, or files to let jobs survive restarts).
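The client side of this contract (poll until a terminal state, then fetch the result) can be sketched as a small loop. This is a minimal sketch under stated assumptions: poll_until_terminal is a hypothetical helper, and the get_status callable stands in for whatever code calls get_<x>_status through your MCP session. The demo feeds it canned payloads instead of a real server.

```python
import time
from typing import Callable

TERMINAL_STATES = {"succeeded", "failed"}
VALID_STATES = {"queued", "running"} | TERMINAL_STATES


def poll_until_terminal(
    get_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status(job_id) until the state is terminal; return that payload."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        state = status.get("state")
        if state not in VALID_STATES:
            raise ValueError(f"unexpected job state: {state!r}")
        if state in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state} after {timeout_s}s")
        sleep(interval_s)


# Demo with a canned sequence of states instead of a real server.
_canned = iter(["queued", "running", "running", "succeeded"])
final = poll_until_terminal(
    lambda job_id: {"job_id": job_id, "state": next(_canned)},
    "demo-job",
    sleep=lambda s: None,  # skip real waiting in the demo
)
print(final)  # {'job_id': 'demo-job', 'state': 'succeeded'}
```

Rejecting states outside the four documented ones surfaces a contract mismatch early instead of polling forever.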
Progress reporting (required for long jobs)¶
Status responses drive the caller's UI — consuming agents render a live progress bar automatically from these fields, so report them on every poll:
| Field | Type | Meaning |
|---|---|---|
| stage | int | current step, 0..total_stages |
| total_stages | int | total number of steps |
| message | str | one-line human-readable current step, e.g. "[2/3] rendering frames" |
| job_id | str | echo it in every payload so clients can correlate polls |
Rules:
- get_<x>_status must return immediately: clients poll it, so never block inside it.
- Derive total_stages from the real pipeline (e.g. len(STEPS), or thread the total through your progress callback). A hardcoded total silently desyncs when you add a step, and the caller's progress bar then shows 6/5.
- On success, set stage = total_stages so the bar completes at 100%.
- get_<x>_result returns ready: true only after state == "succeeded".
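The rules above can be illustrated with a small in-memory job store. This is a sketch, not the code in example-mcp-server/server.py: STEPS, run_job, start_job, get_status, get_result, and the local PUBLIC_BASE_URL value are assumptions chosen for illustration, and the per-step work is left as a comment.

```python
import threading
import uuid

# Hypothetical pipeline; total_stages is always derived from this list.
STEPS = ["preparing assets", "rendering frames", "encoding output"]
PUBLIC_BASE_URL = "http://127.0.0.1:8900"  # assumed local origin

_JOBS: dict[str, dict] = {}
_LOCK = threading.Lock()


def _update(job_id: str, **fields) -> None:
    with _LOCK:
        _JOBS[job_id].update(fields)


def run_job(job_id: str) -> None:
    """Worker: walk through STEPS, recording progress as each step begins."""
    total = len(STEPS)
    try:
        for i, name in enumerate(STEPS, start=1):
            _update(job_id, state="running", stage=i, message=f"[{i}/{total}] {name}")
            # ... the real work for this step would run here ...
        _update(job_id, state="succeeded", stage=total, message="done")
    except Exception as exc:  # report failures through the status payload
        _update(job_id, state="failed", error=str(exc), message="failed")


def start_job() -> dict:
    job_id = uuid.uuid4().hex
    with _LOCK:
        _JOBS[job_id] = {"job_id": job_id, "state": "queued", "stage": 0,
                         "total_stages": len(STEPS), "message": "queued"}
    threading.Thread(target=run_job, args=(job_id,), daemon=True).start()
    return {"job_id": job_id, "state": "queued"}


def get_status(job_id: str) -> dict:
    """Return a snapshot immediately; never waits on the worker."""
    with _LOCK:
        job = _JOBS.get(job_id)
        return dict(job) if job else {"error": "unknown job_id", "job_id": job_id}


def get_result(job_id: str) -> dict:
    if get_status(job_id).get("state") != "succeeded":
        return {"ready": False, "job_id": job_id}
    return {"ready": True, "job_id": job_id, "url": f"{PUBLIC_BASE_URL}/files/{job_id}"}
```

Because total_stages comes from len(STEPS), adding a fourth step updates the total everywhere without touching the status code.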
File delivery¶
Return a downloadable URL, never a local path (remote clients can't read your disk):
import os

PUBLIC_BASE_URL = os.getenv("PUBLIC_BASE_URL", "https://example-mcp.<your-zone>")


@mcp.custom_route("/files/{job_id}", methods=["GET"])
async def serve_file(request: Request):
    job_id = request.path_params["job_id"]
    ...  # 404 if not ready; this elided step also resolves `path`
    return FileResponse(path, filename=f"{job_id}.bin")

# get_render_result returns: {"ready": True, "url": f"{PUBLIC_BASE_URL}/files/{job_id}"}
Deploy remotely¶
1. Configure & run the server¶
Bind to 127.0.0.1 (only the tunnel reaches it). Set env (e.g. in .env):
| Var | Default | Purpose |
|---|---|---|
| MCP_HOST | 127.0.0.1 | bind address |
| MCP_PORT | 8900 | port |
| PUBLIC_BASE_URL | (none) | public origin, used to build file URLs |
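Reading these variables might look like the sketch below. The Settings dataclass and load_settings are hypothetical names, and two behaviours are assumptions rather than documented defaults: falling back to the server's own origin when PUBLIC_BASE_URL is unset, and rejecting values that still contain a <...> placeholder.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    public_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    host = env.get("MCP_HOST", "127.0.0.1")
    port = int(env.get("MCP_PORT", "8900"))
    # Assumed fallback for local runs: the server's own origin.
    base = env.get("PUBLIC_BASE_URL") or f"http://{host}:{port}"
    if "<" in base or ">" in base:
        raise ValueError(f"PUBLIC_BASE_URL still contains a placeholder: {base}")
    return Settings(host, port, base.rstrip("/"))


print(load_settings({}))
print(load_settings({"MCP_PORT": "9000",
                     "PUBLIC_BASE_URL": "https://example-mcp.example.com/"}))
```

Failing fast on a leftover placeholder gives a clearer message than a URL validation error deeper in the stack.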
2. Expose over HTTPS¶
The server binds 127.0.0.1, so something in front must terminate TLS and forward to it. Two paths — pick
the one matching your auth choice in step 3:
- Option A — Cloudflare Tunnel (recommended): Cloudflare gives you TLS, a hostname, and edge auth (the service token in 3A); your app needs no auth code.
- Option B — your own reverse proxy: you terminate TLS and the app verifies a Bearer token with FastMCP's built-in token verification (3B). No Cloudflare involved.
Option A — Cloudflare Tunnel¶
No open ports, no public IP, no self-managed TLS.
cloudflared tunnel login
cloudflared tunnel create example-mcp
# edit ~/.cloudflared/config.yml (template: deploy/cloudflared.config.example.yml)
cloudflared tunnel route dns example-mcp example-mcp.<your-zone>
cloudflared tunnel run example-mcp
# ~/.cloudflared/config.yml
tunnel: <TUNNEL_ID>
credentials-file: /home/<user>/.cloudflared/<TUNNEL_ID>.json
ingress:
  - hostname: example-mcp.<your-zone>
    service: http://127.0.0.1:8900   # /mcp, /healthz, /files all go here
  - service: http_status:404
Use a first-level subdomain (example-mcp.<your-zone>), not a deeper one (example-mcp.api.<your-zone>). Free Cloudflare Universal SSL only covers the apex and *.<your-zone>; a 2-level subdomain has no cert and its custom domain hangs at "Verifying".
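The coverage rule above (apex plus exactly one label below it) is easy to check before creating the DNS route. A minimal sketch follows; covered_by_universal_ssl is a hypothetical helper and example.com stands in for your zone.

```python
def covered_by_universal_ssl(hostname: str, zone: str) -> bool:
    """True if hostname is the zone apex or exactly one label below it."""
    hostname = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    if hostname == zone:
        return True
    suffix = "." + zone
    if not hostname.endswith(suffix):
        return False  # not inside this zone at all
    prefix = hostname[: -len(suffix)]
    return bool(prefix) and "." not in prefix


print(covered_by_universal_ssl("example-mcp.example.com", "example.com"))      # True
print(covered_by_universal_ssl("example-mcp.api.example.com", "example.com"))  # False
```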
Option B — Your own reverse proxy (no Cloudflare)¶
Terminate HTTPS yourself in front of the 127.0.0.1:8900 origin. Caddy is the least effort (automatic
Let's Encrypt certificates):
# Caddyfile — serves https://example-mcp.example.com
example-mcp.example.com {
reverse_proxy 127.0.0.1:8900 # forwards /mcp, /healthz, /files
}
(nginx + certbot, or a cloud load balancer, work the same way.) Point the host's DNS A/AAAA record at
your machine and open :443. There is now no edge auth — the proxy forwards everything — so you MUST
add the app-level Bearer check (Option B of step 3) or your tools are open to anyone who finds the URL.
3. Authenticate every request¶
A public endpoint must be gated, or anyone can call your tools. Use the option matching how you exposed it.
Option A — Cloudflare Access service token (pairs with 2A)¶
1. Zero Trust → Access → Applications → Add → Self-hosted; domain = example-mcp.<your-zone>.
2. Access → Service Auth → Create Service Token → copy Client ID and Client Secret.
3. Add a policy: Action = Service Auth, Include = that token.
Only requests with CF-Access-Client-Id / CF-Access-Client-Secret reach your origin. Give each server
its own token, named per server (e.g. EXAMPLE_CF_CLIENT_ID / EXAMPLE_CF_CLIENT_SECRET). The app itself
needs no auth code — the edge enforces it.
Option B — App-level Bearer token (pairs with 2B)¶
With no Cloudflare Access there is no edge auth, so verify a shared secret inside the app. FastMCP has
token verification built in: pass a TokenVerifier to the constructor and the SDK itself gates /mcp,
answering missing/bad tokens with a spec-correct 401 + WWW-Authenticate: Bearer header:
import hmac
import os

from mcp.server.auth.provider import AccessToken, TokenVerifier
from mcp.server.auth.settings import AuthSettings
from mcp.server.fastmcp import FastMCP

EXPECTED = os.environ["MCP_BEARER_TOKEN"]  # a long random secret, supplied via env


class StaticVerifier(TokenVerifier):
    """Accept exactly one shared token (constant-time compare)."""

    async def verify_token(self, token: str) -> AccessToken | None:
        # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
        if hmac.compare_digest(token.encode(), EXPECTED.encode()):
            return AccessToken(token=token, client_id="shared-secret", scopes=[])
        return None


# MCP_HOST, MCP_PORT and PUBLIC_BASE_URL are the env-derived settings from step 1.
mcp = FastMCP(
    "example-mcp", host=MCP_HOST, port=MCP_PORT,
    token_verifier=StaticVerifier(),
    auth=AuthSettings(                  # OAuth-shaped plumbing the SDK insists on:
        issuer_url=PUBLIC_BASE_URL,     # nominal "issuer", never contacted
        resource_server_url=None,       # skip the RFC 9728 metadata endpoint
    ),
)
Built-in auth covers only /mcp — @mcp.custom_route paths stay public by design (the SDK intends
them for health checks). That's exactly right for /healthz (the container HEALTHCHECK — plain urllib,
no token — keeps working) but wrong for /files, so gate that one inside its handler:
@mcp.custom_route("/files/{job_id}", methods=["GET"])
async def serve_file(request: Request):
    token = request.headers.get("authorization", "").removeprefix("Bearer ").strip()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    if not hmac.compare_digest(token.encode(), EXPECTED.encode()):
        return JSONResponse({"error": "unauthorized"}, status_code=401)
    ...  # 404 if not ready, then FileResponse (unchanged)
- Nothing else changes: mcp.run(transport="streamable-http") stays (no uvicorn wrapper, no custom middleware), and the client just sends Authorization: Bearer <token> (see Connect a client).
- The two AuthSettings fields are required because the SDK models auth as an OAuth resource server; for a static shared secret they are inert: issuer_url is informational and resource_server_url=None disables the /.well-known/oauth-protected-resource metadata route. Inert but still validated: issuer_url must parse as a URL, so a literal https://example-mcp.<your-zone> placeholder left in PUBLIC_BASE_URL crashes at startup (ValidationError: … invalid international domain name). Export the real value first.
- TokenVerifier is the real seam: swap StaticVerifier for a JWT or token-introspection verifier later without touching anything else. (The standalone FastMCP package, gofastmcp.com, ships ready-made verifiers behind the same idea, e.g. JWTVerifier; note its StaticTokenVerifier is dev/test-only per its own docs.)
- The shipped example server is unauthenticated by design (it expects to sit behind Access); this constructor swap is for when you front it with your own TLS instead. Hand the consumer the token the same way you would a service token: over a secure channel, never in the repo.
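For the "long random secret" itself, the standard library is enough. The sketch below shows one way to generate a value for MCP_BEARER_TOKEN and to compare tokens safely; the helper names are hypothetical. It compares bytes rather than str because hmac.compare_digest only accepts str arguments that are pure ASCII.

```python
import hmac
import secrets


def new_bearer_token(nbytes: int = 32) -> str:
    """Random URL-safe token built from nbytes of entropy (32 by default)."""
    return secrets.token_urlsafe(nbytes)


def token_matches(presented: str, expected: str) -> bool:
    """Constant-time comparison on bytes, so non-ASCII input cannot raise."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def bearer_from_header(authorization: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    return authorization.removeprefix("Bearer ").strip()


token = new_bearer_token()
print(len(token) >= 43)  # True: 32 random bytes encode to 43 URL-safe characters
print(token_matches(bearer_from_header(f"Bearer {token}"), token))  # True
print(token_matches("wrong", token))  # False
```

Generate the token once, export it as MCP_BEARER_TOKEN on the server, and pass the same value to the consumer over a secure channel.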
4. Keep it running (systemd)¶
Run the server and the tunnel as user services so they survive crashes/reboots (templates:
deploy/example-mcp.service, deploy/cloudflared-example-mcp.service):
systemctl --user daemon-reload
systemctl --user enable --now example-mcp cloudflared-example-mcp
loginctl enable-linger "$USER" # survive logout / reboot
4b. Or run it in a container (Docker)¶
Containerizing is an alternative to the systemd path above — pick one. A container pins the
runtime and deps and restarts on its own; the worked example ships a Dockerfile, a standalone
docker-compose.yml, and a .dockerignore so you can build and run it directly:
cd example-mcp-server
docker build -t example-mcp .
docker run --rm -p 8900:8900 example-mcp # MCP at http://localhost:8900/mcp
# verify (second shell):
curl -s http://localhost:8900/healthz # -> {"ok": true}
python smoke_test.py http://localhost:8900/mcp
Or with the bundled standalone compose file (host-published on 8900, auto-restart, healthcheck):
Container specifics:
- MCP_HOST=0.0.0.0 inside the container (the image sets this by default), not 127.0.0.1. The localhost bind that's correct on a host keeps the port unreachable from outside the container, so the published port / compose network can't see it. Binding 0.0.0.0 here is safe because the container boundary (and, when remote, Cloudflare Access) is the real perimeter; there is no app-level auth.
- The image declares a HEALTHCHECK that hits GET /healthz with stdlib urllib (no curl in the slim image), so docker ps / compose report the container healthy only once the server answers. Use it to health-order dependents.
- Set PUBLIC_BASE_URL at run time (e.g. -e PUBLIC_BASE_URL=https://example-mcp.<your-zone>) so the URLs from get_render_result point at your public origin, not 127.0.0.1.
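A urllib-only health probe of the kind the HEALTHCHECK bullet describes could look like the sketch below. It is an assumption about the shape of such a script, not the one shipped in the image: check, its default URL, and the 3-second timeout are illustrative choices.

```python
import json
import urllib.request


def check(url: str = "http://127.0.0.1:8900/healthz", timeout: float = 3.0) -> int:
    """Return 0 if the endpoint answers 200 with {"ok": true}, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            healthy = resp.status == 200 and isinstance(body, dict) and body.get("ok") is True
            return 0 if healthy else 1
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers undecodable or non-JSON bodies.
        return 1


# In a healthcheck script you would end with: raise SystemExit(check())
print(check("http://127.0.0.1:1/healthz", timeout=1.0))  # 1: nothing listens there
```

Returning a process exit code (0 healthy, 1 unhealthy) matches what Docker expects from a HEALTHCHECK command.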
Remote + auth are unchanged. A plain docker run is unauthenticated — Docker only replaces the
"keep it running" step. To go remote you still front the container with one of the step 2–3 paths: either
Cloudflare Tunnel + Access (point the tunnel ingress at the published port,
service: http://127.0.0.1:8900; the service token gates every path at the edge), or your own reverse
proxy + the built-in Bearer verifier (apply the 3B constructor swap, rebuild the image, and set
MCP_BEARER_TOKEN in the container env). Either way the auth lives outside docker run itself.
Connect a client / agent¶
Point any MCP client at https://example-mcp.<your-zone>/mcp with the service-token headers:
import asyncio
import os

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

HEADERS = {
    "CF-Access-Client-Id": os.environ["EXAMPLE_CF_CLIENT_ID"],
    "CF-Access-Client-Secret": os.environ["EXAMPLE_CF_CLIENT_SECRET"],
}


async def main():
    async with streamablehttp_client("https://example-mcp.<your-zone>/mcp", headers=HEADERS) as (r, w, _):
        async with ClientSession(r, w) as s:
            await s.initialize()
            print([t.name for t in (await s.list_tools()).tools])  # ['add', 'echo', 'start_render', ...]
            print(await s.call_tool("add", {"a": 2, "b": 3}))      # result payload carries {'sum': 5.0}


asyncio.run(main())
For a Bearer-token deployment (step 3 Option B), send one Authorization header instead of the two CF
headers — everything else is identical (per-server naming again: the consumer's EXAMPLE_MCP_TOKEN holds
the same secret the server reads as MCP_BEARER_TOKEN):
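A minimal sketch of that header swap, assuming the consumer-side variable names used in this guide (EXAMPLE_MCP_TOKEN, EXAMPLE_CF_CLIENT_ID, EXAMPLE_CF_CLIENT_SECRET); auth_headers is a hypothetical helper whose result would be passed as headers= to streamablehttp_client:

```python
import os
from typing import Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick the header set for whichever deployment the environment describes."""
    if "EXAMPLE_MCP_TOKEN" in env:  # Bearer deployment (step 3 Option B)
        return {"Authorization": f"Bearer {env['EXAMPLE_MCP_TOKEN']}"}
    # Cloudflare Access deployment (step 3 Option A)
    return {
        "CF-Access-Client-Id": env["EXAMPLE_CF_CLIENT_ID"],
        "CF-Access-Client-Secret": env["EXAMPLE_CF_CLIENT_SECRET"],
    }


print(auth_headers({"EXAMPLE_MCP_TOKEN": "s3cret"}))
# {'Authorization': 'Bearer s3cret'}
```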
Agent frameworks register it the same way: a streamable-HTTP MCP server at the /mcp URL with whichever
auth headers your deployment uses.
Hand-off: the registration JSON¶
When your server is deployed, the deliverable to the consuming agent team is one JSON block — most agent frameworks (nanobot, Claude Desktop, …) register MCP servers in exactly this shape — plus the secret values (service token or Bearer token) sent over a secure channel:
{
"mcpServers": {
"example-mcp": {
"type": "streamableHttp",
"url": "https://example-mcp.<your-zone>/mcp",
"headers": {
"CF-Access-Client-Id": "${EXAMPLE_CF_CLIENT_ID}",
"CF-Access-Client-Secret": "${EXAMPLE_CF_CLIENT_SECRET}"
},
"enabledTools": ["add", "echo", "start_render", "get_render_status", "get_render_result"]
}
}
}
For a Bearer-token deployment, swap the headers block for a single Authorization entry (the rest is
unchanged):
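The Bearer variant of the block can be generated rather than hand-edited. The sketch below assumes the consuming framework substitutes ${VAR} inside a header value the same way it does for a whole value, and that EXAMPLE_MCP_TOKEN is the consumer-side variable name; registration is a hypothetical helper.

```python
import json


def registration(name: str, url: str, token_var: str, tools: list[str]) -> dict:
    """Build an mcpServers entry whose Authorization header keeps a ${VAR} placeholder."""
    return {
        "mcpServers": {
            name: {
                "type": "streamableHttp",
                "url": url,
                "headers": {"Authorization": "Bearer ${" + token_var + "}"},
                "enabledTools": tools,
            }
        }
    }


block = registration(
    "example-mcp",
    "https://example-mcp.<your-zone>/mcp",
    "EXAMPLE_MCP_TOKEN",
    ["add", "echo", "start_render", "get_render_status", "get_render_result"],
)
print(json.dumps(block, indent=2))
```

The printed JSON has the same shape as the Access version, with the two CF headers replaced by one Authorization entry.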
- Keep ${VAR} placeholders in the JSON: the consumer stores the real token values in their own .env and the framework substitutes them at startup. Never put the secret itself in the JSON.
- Name the env vars after your server (EXAMPLE_CF_*) so a consumer can hold tokens for several MCP servers side by side without collisions.
- enabledTools is the consumer-side whitelist: list exactly the tools you intend them to call.
- Alongside the JSON, hand over: the tool list with one-line descriptions, the async-job contract fields if you have long tasks (see Server reference), and the Client ID/Secret or Bearer token (transmit it securely; an Access secret is shown only once at creation).
Troubleshooting¶
| Symptom | Likely cause | Fix |
|---|---|---|
| ModuleNotFoundError: No module named 'mcp' | wrong/old Python (e.g. system 3.8) | use Python 3.10+; uv sync then .venv/bin/python server.py |
| client hangs / ConnectError to localhost | server not running on that port | confirm python server.py is up; curl 127.0.0.1:<port>/healthz |
| curl: (35) … handshake failure / code 000 on the public URL | no TLS cert yet, or nothing serving | check the tunnel + origin (below); for a new custom domain wait for the cert to issue |
| public URL returns 530 | tunnel has no active connection (origin unreachable) | cloudflared tunnel info <name>; restart cloudflared tunnel run <name>; verify curl 127.0.0.1:<port>/healthz locally |
| 401 / 403 | missing/wrong auth (Access headers, or the Authorization header in Bearer mode) | Access: send both CF-Access-Client-Id and CF-Access-Client-Secret; check the Service Auth policy on the Access app. Bearer (3B): send Authorization: Bearer <token> matching the server's MCP_BEARER_TOKEN |
| custom domain stuck at "Verifying" | 2-level subdomain not covered by free Universal SSL | use a first-level subdomain (or enable Advanced Certificate Manager / Total TLS) |
| (CI) wrangler … Project not found [code: 8000007] | the Pages/target project doesn't exist | create it first (e.g. wrangler pages project create <name> --production-branch=main) |
Healthy smoke test prints the tools: [...], add(2,3) -> {'sum': 5.0}, the start_render →
status → result lines, and OK ✅ (see Run it locally). If you get that locally but not remotely, the
problem is in the tunnel/Access (or proxy/Bearer) layer, not your server.
Checklist¶
- [ ] Server binds 127.0.0.1; not exposed directly (tunnel or reverse proxy only).
- [ ] Auth in place. Option A: Cloudflare Access service token (edge gates all paths incl. /files); or Option B: built-in Bearer verifier on /mcp + in-handler check on /files (/healthz public by design).
- [ ] First-level subdomain (free SSL coverage).
- [ ] Secrets in env / .env, never committed.
- [ ] Inputs validated; tools return JSON, {"error": ...} on failure.
- [ ] Long tasks use the async job contract and report stage/total_stages/message in status; tools never block.
- [ ] Hand-off JSON delivered (mcpServers entry with ${VAR} placeholders + per-server env-var names); token secret sent securely, never committed.
- [ ] Artifacts returned as URLs, not local paths.
- [ ] GET /healthz present.
- [ ] Server kept running: either systemd (Restart=always, linger enabled) or a container (restart: unless-stopped, MCP_HOST=0.0.0.0, /healthz HEALTHCHECK); tunnel + Access (or reverse proxy + Bearer) still front it.