Building an MCP Server for Fitness Data: Lessons from Arvo
How we built a Model Context Protocol server to let AI assistants query workout history, training insights, and exercise data. Architecture decisions, tool design, and what we learned.
What is an MCP server and how does it work with fitness data?
A Model Context Protocol (MCP) server lets AI assistants like Claude access your fitness data through structured tools. Instead of copy-pasting workout logs into a chat, the AI can directly query your training history, analyze volume trends, and suggest programming adjustments. Arvo's MCP server exposes 8 tools covering workout history, exercise stats, volume analysis, and AI-generated insights.
TL;DR
- Arvo's MCP server exposes 8 tools that let AI assistants query workout history, analyze volume, track progression, and surface training insights.
- Tool design matters more than you think: tools with narrow, specific parameters (getVolumeByMuscle) outperform broad ones (analyzeTraining) because LLMs handle constrained choices better.
- Authentication was the hardest part — MCP doesn't prescribe an auth model, so we built an OAuth2 PKCE flow with scoped permissions per tool.
- The killer use case wasn't what we expected: users ask Claude to compare their volume to evidence-based recommendations, not to generate workouts (Arvo already does that).
- MCP adoption is early but growing — exposing your data to AI assistants is a competitive moat for fitness apps.
Why Fitness Data Needs MCP
Here's a scene that plays out thousands of times a day: someone opens ChatGPT, types “analyze my training,” and then spends five minutes copy-pasting their workout logs from whatever app they use. The AI gets a lossy, unstructured dump — missed exercises, rounded numbers, no context about RPE or progression. It does its best, but it's working with a napkin sketch of your training history.
The problem isn't the AI. It's the interface. Fitness data is trapped inside apps, accessible only through their UIs. There's no standard way for an AI assistant to say “give me this user's last 10 workouts as structured data.”
Model Context Protocol (MCP) changes this. Created by Anthropic, MCP is an open standard that lets AI assistants connect to external data sources through typed, discoverable tools. Instead of “here are my last 10 workouts [paste],” the AI calls getRecentWorkouts(limit: 10) and gets structured JSON back — every set, rep, weight, RPE score, and timestamp intact.
For Arvo, this unlocks a fundamentally different interaction. Claude can analyze your actual training volume across weeks, spot plateaus in specific lifts, compare your programming to evidence-based volume recommendations, and flag recovery issues — all using real data instead of whatever you remember to paste.
A quick primer on MCP's three primitives: tools are functions the AI can call (this is what we use most), resources are data the AI can read on demand, and prompts are reusable templates. Arvo's server is almost entirely tool-based — we want the AI to actively query data, not passively receive it.
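The three primitives can be modeled roughly like this — an illustrative TypeScript sketch, not the actual MCP SDK types:

```typescript
// Illustrative model of MCP's three primitives -- type names are our own,
// not the real SDK's. Tools dominate Arvo's server.
type JsonSchema = Record<string, unknown>;

interface McpTool {
  kind: "tool";
  name: string;            // semantic cue the LLM uses to decide when to call it
  description: string;
  inputSchema: JsonSchema; // parameters, discoverable by the client at startup
}

interface McpResource {
  kind: "resource";
  name: string;
  uri: string;             // data the AI can read on demand
}

interface McpPrompt {
  kind: "prompt";
  name: string;
  template: string;        // reusable prompt template
}

type McpPrimitive = McpTool | McpResource | McpPrompt;

// Arvo's server is almost entirely tool-based:
const recentWorkouts: McpPrimitive = {
  kind: "tool",
  name: "getRecentWorkouts",
  description: "Get the user's recent workout sessions",
  inputSchema: { limit: { type: "number", maximum: 30, default: 7 } },
};
```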
The 8 Tools We Built
We started with 14 tool definitions, cut to 8 after testing. Fewer, sharper tools outperform a large surface area — LLMs get confused when they have too many similar options. Here are the final definitions (simplified from the actual Zod schemas):
```typescript
const tools = [
  {
    name: 'getRecentWorkouts',
    description: 'Get the user\'s recent workout sessions',
    parameters: { limit: z.number().max(30).default(7) },
  },
  {
    name: 'getExerciseHistory',
    description: 'Get progression data for a specific exercise',
    parameters: { exerciseName: z.string(), weeks: z.number().default(8) },
  },
  {
    name: 'getVolumeByMuscle',
    description: 'Weekly set volume per muscle group',
    parameters: { weeks: z.number().default(4) },
  },
  {
    name: 'getTrainingInsights',
    description: 'AI-generated training insights and flags',
    parameters: {},
  },
  {
    name: 'getProgressionTrend',
    description: 'Load progression trend for top exercises',
    parameters: { topN: z.number().default(5) },
  },
  {
    name: 'compareToRecommendations',
    description: 'Compare user volume to evidence-based targets',
    parameters: { muscleGroup: z.string().optional() },
  },
  {
    name: 'getWorkoutStreak',
    description: 'Training consistency and streak data',
    parameters: {},
  },
  {
    name: 'getSplitAnalysis',
    description: 'Analyze current training split structure',
    parameters: {},
  },
];
```

The design principle behind this set: narrow tools beat broad tools. We originally had an analyzeTraining tool that tried to do everything — volume, progression, insights, and recommendations in one call. It produced mediocre results because the LLM couldn't predict what shape the response would take. Splitting it into specific tools like getVolumeByMuscle and compareToRecommendations meant the AI could compose exactly the analysis the user asked for, calling two or three tools in sequence with predictable outputs.
Tool Design Lessons
After several iterations and observing how Claude, GPT, and other models interact with our tools, four patterns emerged that meaningfully improved response quality.
Lesson 1: Name tools as actions, not nouns. getVolumeByMuscle outperforms volumeData. LLMs use tool names as semantic cues for when to call them. A verb-based name like compareToRecommendations makes it obvious that this tool should be called when a user asks “am I doing enough chest work?” A noun-based name like recommendations is ambiguous — is it for reading recommendations or generating them?
Lesson 2: Return structured data, not prose. Early versions of getVolumeByMuscle returned strings like “Your chest volume is 14 sets which is within the optimal range.” This created two problems: the AI would parrot the string instead of synthesizing across multiple tool calls, and the formatting was locked in. Now it returns { chest: { sets: 14, mev: 10, mav: 16, status: "optimal" } } and the AI formats the data however best fits the conversation — a table, a bullet list, or woven into a paragraph.
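The before/after can be sketched in a few lines — a minimal TypeScript sketch using the chest thresholds from the example above (MEV 10, MAV 16); the helper names are illustrative, not Arvo's actual implementation:

```typescript
// Before (illustrative): a prose return locks in one formatting and invites
// the AI to parrot the string instead of synthesizing across tool calls.
function chestVolumeProse(sets: number): string {
  return `Your chest volume is ${sets} sets which is within the optimal range.`;
}

// After: a structured return -- the AI formats it however fits the conversation.
type VolumeStatus = {
  sets: number;
  mev: number; // minimum effective volume
  mav: number; // maximum adaptive volume
  status: "low" | "optimal" | "high";
};

function chestVolumeStructured(sets: number, mev = 10, mav = 16): VolumeStatus {
  const status = sets < mev ? "low" : sets > mav ? "high" : "optimal";
  return { sets, mev, mav, status };
}
```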
Lesson 3: Make parameters optional with sane defaults. Most of our tools have zero to two required parameters. When someone asks “how's my training going?”, the AI needs to call tools without asking clarifying questions first. If getVolumeByMuscle required a muscle group parameter, the AI would have to ask “which muscle group?” before doing anything useful. With weeks defaulting to 4 and no required params, it can immediately return a full overview.
Lesson 4: Cap result sizes. Returning 90 days of workout data in a single call overwhelms the context window and degrades response quality. Every tool has a default limit and a maximum. getRecentWorkouts defaults to 7 sessions, caps at 30. getExerciseHistory defaults to 8 weeks. If the user needs more, they can ask and the AI will call with a higher limit — but the default path stays fast and focused.
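The default-and-cap behavior can be expressed in plain TypeScript — a hedged sketch of the pattern, not the actual server code (the real schemas use Zod's `z.number().max(30).default(7)`, which rejects over-limit values rather than clamping them as this helper does):

```typescript
// Resolve a tool's limit parameter: missing values fall back to a default,
// requested values are clamped to a hard maximum so one call can never
// flood the context window.
function resolveLimit(requested: number | undefined, def: number, max: number): number {
  if (requested === undefined) return def;                  // default path: fast and focused
  return Math.min(Math.max(1, Math.floor(requested)), max); // never exceed the cap
}

const recentDefault = resolveLimit(undefined, 7, 30); // no argument -> 7 sessions
const recentCapped  = resolveLimit(90, 7, 30);        // 90 requested -> capped at 30
```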
Authentication: The Hard Part
MCP defines how tools are discovered and called. It does not define how users authenticate. This is left entirely to the server implementer, and it's where most of the complexity lives.
Our approach uses an OAuth2 PKCE flow:
- User opens Arvo's settings and navigates to the MCP integration page
- They click “Generate MCP Token” — this creates a scoped, read-only token tied to their account
- They paste the token into their Claude Desktop (or other MCP client) configuration
- Every tool call from the AI includes this token for authentication
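The client-side configuration in step 3 typically looks something like this — a sketch of a Claude Desktop `claude_desktop_config.json` entry, where the package name and environment variable are illustrative placeholders:

```json
{
  "mcpServers": {
    "arvo": {
      "command": "npx",
      "args": ["-y", "arvo-mcp-server"],
      "env": { "ARVO_MCP_TOKEN": "paste-generated-token-here" }
    }
  }
}
```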
The auth middleware on the server side is straightforward:
```typescript
async function authenticateRequest(token: string) {
  // Resolve the token to a Supabase user; invalid tokens yield no user.
  const { data, error } = await supabase.auth.getUser(token);
  if (error || !data.user) throw new McpError('UNAUTHORIZED');
  // Scopes restrict which tools this token may call (read-only by default).
  const scopes = await getTokenScopes(token);
  return { userId: data.user.id, scopes };
}
```

The tricky part is the architecture. MCP servers typically run locally on the user's machine (via stdio transport), but our data lives in Supabase. We can't embed a Supabase service-role key in a locally-running process — that would be a security disaster. Instead, the local MCP server acts as a thin proxy: it receives tool calls from Claude, forwards them to Arvo's REST API with the user's auth token, and the API queries Supabase with Row Level Security (RLS) ensuring users can only access their own data. The user's token never touches Supabase directly from the client — it's validated server-side through our API layer.
What Users Actually Ask Claude
We had assumptions about how people would use the MCP integration. We expected the primary use case to be workout generation — “create me a push day.” That made sense: it's what people do with ChatGPT today.
We were wrong. Arvo already generates workouts with its multi-agent periodization engine, and users know that. What they wanted was something Arvo's chat interface doesn't do: open-ended analysis. They wanted Claude to be a training analyst that reads their Arvo data, not a replacement for Arvo's workout generator.
The top use cases, ranked by query frequency:
Top MCP Use Cases
| Use Case | Tools Called | % of Queries |
|---|---|---|
| Volume audit | compareToRecommendations, getVolumeByMuscle | 34% |
| Plateau analysis | getExerciseHistory, getTrainingInsights | 28% |
| Program review | getSplitAnalysis, getVolumeByMuscle | 19% |
| Consistency check | getWorkoutStreak | 12% |
| Raw data export | getRecentWorkouts | 7% |
Some real examples of what users ask:
- Volume auditing: “Am I training enough back? Compare my volume to what the research recommends.” Claude calls compareToRecommendations and getVolumeByMuscle, cross-references the user's actual sets with MEV/MAV/MRV landmarks, and identifies undertrained muscle groups.
- Plateau detection: “My bench press has stalled for 3 weeks, what should I change?” Claude pulls 8 weeks of bench history via getExerciseHistory, confirms the stall in the data, then checks getTrainingInsights for related flags like insufficient chest volume or high fatigue accumulation.
- Program analysis: “Is my PPL split balanced?” Claude calls getSplitAnalysis to understand the split structure, then getVolumeByMuscle to check if any muscle groups are disproportionately under- or over-trained relative to the split's design.
- Accountability: “How consistent have I been this month?” getWorkoutStreak returns current streak, longest streak, training frequency over the last 30 days, and missed-day patterns.
The pattern is clear: users treat the MCP integration as a second opinion on their training. Arvo generates the program; Claude audits it. These are complementary, not competing.
Technical Architecture
The full request flow from AI client to data:
```
┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│   Claude /   │────▶│    Arvo MCP    │────▶│   Arvo API   │
│  AI Client   │◀────│ Server (local) │◀────│  (Supabase)  │
└──────────────┘     └────────────────┘     └──────────────┘
      stdio             HTTP + Auth            RLS queries
```

Three layers, each with a clear responsibility:
- AI Client (Claude Desktop, Cursor, etc.) communicates with the MCP server over stdio — the simplest transport. The client discovers available tools on startup and calls them as needed during conversations.
- Arvo MCP Server runs as a local process on the user's machine. It validates tool parameters, forwards requests to Arvo's API with the user's auth token attached, and transforms responses into the format the AI client expects. No data is stored locally.
- Arvo API (backed by Supabase) handles authentication, authorization, and data access. Every query runs through Row Level Security — even if someone tampered with the local MCP server, they could only access data belonging to the authenticated user.
Performance is solid for an interactive use case: p50 = 180ms, p95 = 450ms per tool call. Most of that time is the Supabase query — the MCP server itself adds less than 20ms of overhead. Since Claude typically calls 2–3 tools per user question, the total data-fetching time stays around a second or less, even toward the 95th percentile.
For the full setup instructions and tool reference, see the MCP documentation.
What's Next for Fitness + MCP
Today's server is read-only. That's a deliberate choice — we wanted to nail the data access patterns before letting AI modify anything. But the roadmap is clear:
Write tools. “Move my leg day to Thursday” should work. A rescheduleSession tool with confirmation flow (the AI proposes the change, you approve) would cover the most-requested write operation. Swapping exercises, adjusting target sets, and marking deload weeks are next.
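The propose/approve pattern could look something like this — a speculative TypeScript sketch of the confirmation flow, with all names hypothetical since this tool doesn't exist yet:

```typescript
// Sketch of a propose/confirm write flow: the AI proposes a change, but
// nothing is written until the user explicitly approves the proposal.
type Proposal = {
  id: string;
  action: "rescheduleSession";
  from: string;        // current training day
  to: string;          // proposed training day
  confirmed: boolean;
};

const pending = new Map<string, Proposal>();

// Tool 1: the AI calls this; the server records a pending proposal.
function proposeReschedule(fromDay: string, toDay: string): Proposal {
  const p: Proposal = {
    id: `p${pending.size + 1}`,
    action: "rescheduleSession",
    from: fromDay,
    to: toDay,
    confirmed: false, // no write happens at this step
  };
  pending.set(p.id, p);
  return p; // the AI surfaces this to the user for approval
}

// Tool 2: called only after the user approves; this is where the write occurs.
function confirmProposal(id: string): Proposal | undefined {
  const p = pending.get(id);
  if (!p) return undefined;
  p.confirmed = true; // only now would the server persist the change
  return p;
}

const proposal = proposeReschedule("Wednesday", "Thursday");
const applied = confirmProposal(proposal.id);
```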
Cross-app integration. The real power of MCP is composability. Imagine one Claude conversation with both your Arvo training data and your MyFitnessPal nutrition data. “I've been stalling on bench — am I eating enough protein on push days?” Each app provides its own MCP server; Claude orchestrates across both. No integration partnership required.
Persistent coaching mode. Today, each Claude conversation starts fresh. With MCP, the AI could maintain a running context of your training — noticing that your squat has been trending up for 6 weeks and proactively suggesting a deload before you ask. This shifts from reactive Q&A to proactive coaching.
The competitive angle. MCP adoption is still early, but it's accelerating. The first fitness app with solid MCP support gets a distribution advantage through every AI assistant that supports the protocol — Claude, ChatGPT, Cursor, and whatever comes next. Your data becomes accessible everywhere, and the app that provides that access becomes the one users stick with.
If you're curious about the AI architecture that generates the training data MCP exposes, read about Arvo's multi-agent periodization engine.
MCP (Model Context Protocol) is an open standard created by Anthropic. Arvo's MCP integration is available to Pro subscribers. See our MCP documentation for setup instructions and the developer docs for the full API reference.