The Dataset Imperative to Guide AI-Driven Healthspan and Performance

With the launch of LLM health assistants, carefully designed and curated datasets will determine how healthy we become. Physician- and performance scientist-led health companies have a distinct advantage.

Jan 25, 2026

I participated this weekend as a mentor at Yale’s Healthcare Hackathon. More than 220 participants (47% Yale-affiliated, spanning CT, NYC, CA, and Brazil) proposed over 50 health-focused ventures.

Nearly half involved using large language models (LLMs) to make health data more understandable to individuals.

Everyone innovation-adjacent sees the opportunity—including the major AI companies—which makes the past two weeks particularly intriguing.

The Launch of ChatGPT Health, Claude for Healthcare, and Amazon’s AI Health Assistant Marks a New Era

“based on our de-identified analysis of conversations, over 230 million people globally ask health and wellness related questions on ChatGPT every week”

Within the last two weeks, OpenAI, Anthropic, and Amazon each launched an LLM product that helps people interpret their health and medical data. There is no doubt that LLMs will guide medical and health decision-making in the future. The rationale is simple: healthcare has spent the last 150 years classifying diseases, defining diagnostic criteria, and developing evidence-based guidelines to standardize treatment. Matching a person’s data against those codified criteria and guidelines is a perfectly tractable problem for AI.

Today, the prevailing workflow looks like this:

Upload Healthcare Data → Align to Medical Guidelines → Return Optimal Treatment Plan → Prescribe Rx (we also recently saw a pilot in Utah where AI can autonomously refill prescription meds without a doctor in the loop)

Yet very few people feel they truly understand how healthy they are.

There are many reasons for this—some deeply structural—but LLMs are among the first ubiquitous tools capable of translating complex health information into something people can actually understand.

Even with the largely unknown security risks of uploading one’s entire medical and health history into an LLM, the value proposition of feeling empowered to understand one’s own health is so compelling that hundreds of millions of people will choose to do it anyway.

What I’ve been thinking about most is how these LLMs reason about health—and, critically, what data and health frameworks fuel their recommendations.

Health Is Not the Absence of Disease—and LLMs Need New Benchmarks

Related post: From Static to Functional Measurements: Towards a Truer Approximation of Health, by Brooks Leitner, MD, PhD (January 24, 2025)

Health, defined not as the absence of disease but as functional capacity, lacks clear, widely adopted benchmarks.

As a parallel health system is built alongside the traditional sick-care system, solving this problem becomes existential.

LLMs trained primarily on disease, decline, and pathology will naturally optimize for those outcomes. If we want models that guide health promotion, performance, and longevity, we need fundamentally different datasets.

To Train the Models That Guide Health Promotion, Deep Expertise Must Shape the Data

I believe it’s a good thing that nearly everyone now has access to a model capable of helping them understand complex health information that was previously inaccessible.

But if LLMs will guide medical treatment, health promotion, and performance enhancement, the real question becomes: how do health-tech companies and academic institutions arm these models with the right datasets?

At minimum, there are three dataset components that must be optimized to guide the health plans of the future:

1. Benchmarks of Health
2. Trajectories of Health Improvement (because decline is the default dataset)
3. Personalization

Health and fitness companies that measure meaningful benchmarks of healthspan and performance—especially VO₂max, body composition, and insulin sensitivity—integrated with next-generation biomarkers (e.g., omics, biobanks) will have a sustained advantage.

Because generative AI models are designed to predict the next state in a sequence, longitudinal datasets showing how people achieve high benchmarks—or depart from healthy states—are essential. Today, those datasets are rare.
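To make the "next state in a sequence" idea concrete, here is a minimal sketch of how one person's repeated measurements could be turned into training examples. The `Visit` fields, the `next_state_pairs` helper, and every value below are hypothetical and exist only for illustration; this is not a description of how any particular model is trained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One measurement visit. Field names are illustrative assumptions."""
    month: int      # months since the first visit
    vo2max: float   # mL/kg/min
    hba1c: float    # percent


def next_state_pairs(visits, context=2):
    """Turn one person's visits into (history, next visit) training pairs.

    Each pair holds the `context` most recent visits and the visit that
    followed them, which is the shape a next-state predictor learns from.
    """
    ordered = sorted(visits, key=lambda v: v.month)
    pairs = []
    for i in range(context, len(ordered)):
        history = tuple(ordered[i - context:i])
        pairs.append((history, ordered[i]))
    return pairs


# Made-up values for illustration only, not real participant data.
trajectory = [
    Visit(month=0, vo2max=38.0, hba1c=5.7),
    Visit(month=6, vo2max=41.5, hba1c=5.5),
    Visit(month=12, vo2max=45.0, hba1c=5.4),
]
pairs = next_state_pairs(trajectory)
```

A single snapshot produces no pairs at all under this framing, which is one way to see why one-off measurements cannot teach a model what improvement looks like.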

Why Personalization Has Stalled—and How Omics Changes the Equation

Personalization has yet to be fully realized. I believe omics has the potential to unlock it in ways the health and longevity field has not yet achieved.

Consider two people with identical VO₂max values (45 mL/kg/min) and HbA1c levels (5.4%). Even if they differ by age and sex, most AI systems today would generate nearly identical protocols.

That’s the ceiling of personalization when we rely solely on traditional clinical and fitness metrics.
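A toy sketch of that ceiling, assuming a simple rule-based recommender: the `Profile` type, the `protocol` function, and the thresholds are hypothetical, chosen only to illustrate the point, and are not clinical guidance. This deliberately simplified version ignores age and sex entirely, so two different people with the same two numbers receive exactly the same plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical input record; fields mirror the example in the text."""
    age: int
    sex: str
    vo2max: float   # mL/kg/min
    hba1c: float    # percent


def protocol(profile):
    """Toy plan keyed only on VO2max and HbA1c.

    Thresholds are illustrative assumptions, not clinical guidance.
    """
    plan = []
    if profile.vo2max < 50:
        plan.append("add interval training")
    else:
        plan.append("maintain aerobic volume")
    if profile.hba1c >= 5.7:
        plan.append("review carbohydrate intake")
    else:
        plan.append("maintain current diet")
    return plan


# Two different people with the metric values used in the text.
a = Profile(age=32, sex="F", vo2max=45.0, hba1c=5.4)
b = Profile(age=58, sex="M", vo2max=45.0, hba1c=5.4)
same_plan = protocol(a) == protocol(b)
```

With only two input metrics, the function has nothing else to branch on; any further personalization would have to come from additional inputs.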

Now imagine having access to the transcriptome or metabolome underlying fitness or insulin sensitivity. From decades of academic literature, we know we’d observe variation across molecular pathways related to inflammation, growth factors, carbohydrate metabolism, and fat metabolism—independently targetable biological systems responsive to diet, exercise prescription, and even specific drugs or supplements (e.g., creatine).

Simply measuring thousands of molecules isn’t enough. The real challenge—and opportunity—is identifying the meaningful few that drive outcomes.

The Scientist-Led Advantage in the Healthspan Era

My thesis is that researcher-led companies, those combining integrative physiology, medicine, laboratory science, bioinformatics, and AI, are uniquely positioned to curate these datasets responsibly and effectively. (As an aside, the same distinction shows up among AI leaders: Anthropic’s CEO Dario Amodei and DeepMind’s Demis Hassabis, a Nobel Prize-winning scientist, are scientists by training, in contrast to OpenAI’s Sam Altman. Dario’s thoughts at Davos are worth a listen.)

They are best equipped to consider the full biological context required to usher in a new era of personalized healthspan and performance.

If you want to understand how we’re integrating omics into a practical guide to improve health and performance, reach out to my team and me at hello@vohealth.co.

Z.B. Quarn

Jan 26

Once LLMs become the layer people use to interpret health data, the real differentiator won’t just be the model. It will end up being the datasets and frameworks behind the recommendations. If training is mostly disease and decline, the outputs will naturally skew toward risk management instead of true health optimization.

The next era needs real benchmarks of healthspan plus longitudinal data that shows what improvement actually looks like over time. On the ops side, platforms like Alora already capture structured, visit-level outcomes in the real world, exactly the kind of context models will need to support better decisions, not just better explanations.


Healthspan, Decoded.
