What Is a Biological Foundation Model? How AI Learns to Read the Epigenome

You’ve probably used an AI foundation model already, even if the phrase means nothing to you. The systems behind tools like ChatGPT are foundation models. They’re trained on enormous amounts of text until they’ve absorbed the patterns of human language so well that one model can be pointed at all kinds of jobs: writing, summarizing, translating, and answering questions.

Now take that same idea and train it on biology instead of language. Not on words and sentences, but on the biochemical signals your cells use to run themselves. That’s what a biological foundation model is, and it may turn out to be one of the more consequential tools in the next era of medicine.

What makes something a “foundation” model

Most AI you’ve run into over the years was narrow. One model, one job. A spam filter couldn’t recognize faces. A model trained to read chest X-rays was useless on an EKG.

A foundation model breaks that pattern. It’s trained broadly enough on a rich underlying set of signals that it picks up the deep structure of an entire domain, and that general grasp can then be adapted to lots of specific tasks. The “foundation” is the shared base of understanding everything else is built on.

The important part is that the model isn’t memorizing answers. Instead, it’s learning the underlying logic of biological patterns. Once that logic is learned, a wide range of clinical applications becomes possible.

Biology has a language, and the epigenome is where it’s spoken

So what’s the right “language” for a foundation model of human health and aging?

Your genome, the DNA sequence itself, is fundamental but static. It’s essentially the same in nearly every cell, and it stays largely fixed over your life. Think of it as a dictionary of fixed words rather than a live, evolving conversation.

The epigenome is where the conversation happens. It’s the layer of biochemical tags, including DNA methylation, that decides which genes are switched on or off at any given moment. The epigenome dictates the identity of each cell and explains why, despite having the same genome, we have so many different cell types. It changes substantially in response to age, disease, diet, stress, and environment. If the genome is the dictionary, the epigenome is closer to what your body is saying right now.

That makes it a remarkably good thing to train an AI on. The signal is dense: modern arrays read about a million methylation marks from a few drops of blood, compared with the roughly fifty markers that a standard blood panel reports.¹ The epigenetic marks shift with what your body is doing now, and because they sit across the whole genome (there are approximately 28 million widely distributed DNA methylation sites), they touch every system in the body at once.

For a long time these epigenetic signals were effectively unreadable at scale. The data was too vast and tangled for people to interpret by hand, and computing was too expensive to brute-force the problem. Two things changed. Epigenetics matured into a validated, dynamic readout of the body, and machine learning became powerful and cheap enough to interpret it.² A biological foundation model lives right at that meeting point.

The flywheel: why the model keeps getting smarter

What sets a biological foundation model apart from a one-off algorithm? Its compounding nature.

Every test run on a platform like this reads about a million signals from a new person, and each of those samples becomes new training data. More tests mean more data. More data means a sharper model. A sharper model produces more accurate insights. More accurate insights pull in more clinical adoption. And more adoption means more tests again.

People sometimes call that loop a data flywheel. It’s self-reinforcing. The platform doesn’t just hold its value over time; it actually improves the more it gets used. Infinite Epigenetics built its model, which it calls Biological Intelligence™, on one of the world’s largest DNA-methylation datasets, more than 120,000 samples and climbing with every test.
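
The "more data means a sharper model" step of the loop can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the signal strength, noise level, and sample sizes are made-up numbers, and the "model" is just an average of noisy readings, not anything Infinite Epigenetics has described.

```python
import random
import statistics

TRUE_EFFECT = 0.30   # hypothetical "true" signal the model is trying to learn
NOISE_SD = 1.0       # hypothetical person-to-person measurement noise
TRIALS = 200         # repeat each experiment to average out luck

def mean_abs_error(n_samples, rng):
    """Average error of a simple estimate built from n_samples noisy readings."""
    errors = []
    for _ in range(TRIALS):
        readings = [rng.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(n_samples)]
        estimate = statistics.fmean(readings)
        errors.append(abs(estimate - TRUE_EFFECT))
    return statistics.fmean(errors)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample_sizes = [10, 100, 1000]
errors = [mean_abs_error(n, rng) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(f"{n:>5} samples -> typical error {err:.3f}")
```

In this toy setting the typical error falls roughly with the square root of the sample count, so a hundred times more data gives an estimate about ten times tighter. The gains keep coming as data accumulates, but each additional sample helps a little less than the one before.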

One model, many powerful answers

Because a foundation model learns the underlying grammar of biology rather than one narrow task, the same model can be adapted to a surprising range of questions from a single blood sample. It can flag disease risk before symptoms appear. It can estimate biological age, meaning how your body is aging on a molecular and cellular level rather than how many birthdays you’ve had. It can anticipate who is likely to respond to a given therapy before a trial starts. The list keeps growing, from metabolic and cardiovascular risk through applications still sitting on the research roadmap.

The leverage is in what comes after the foundation is built. Once the base exists, adding a new application costs far less than building a new model from scratch. The same finger prick that screens for one condition can, in principle, be read for many. Build the base once, apply it broadly.
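
The "build the base once, apply it broadly" idea can be sketched in a few lines. Everything here is an illustrative assumption: the names (`encode`, `make_head`, the task labels), the sizes, and the random placeholder weights are hypothetical and are not the company's actual architecture. The point is only the shape: one shared base, many small task-specific readouts.

```python
import random

N_SITES = 1000   # stand-in for the ~1,000,000 sites a real array reads
EMBED_DIM = 8    # size of the shared representation (arbitrary choice)

rng = random.Random(0)

# The "foundation": one shared projection from raw methylation values to an
# embedding. In a real model these weights would be learned from data; here
# they are random placeholders.
BASE_WEIGHTS = [[rng.gauss(0, 1) for _ in range(N_SITES)] for _ in range(EMBED_DIM)]

def encode(sample):
    """Map one sample (methylation values in [0, 1]) to a shared embedding."""
    return [sum(w * x for w, x in zip(row, sample)) / N_SITES for row in BASE_WEIGHTS]

def make_head(seed):
    """Build a task head: a small linear readout over the shared embedding."""
    head_rng = random.Random(seed)
    weights = [head_rng.gauss(0, 1) for _ in range(EMBED_DIM)]

    def head(embedding):
        return sum(w * e for w, e in zip(weights, embedding))

    return head

# Hypothetical applications, each reading the same embedding.
heads = {
    "biological_age": make_head(1),
    "cardio_risk": make_head(2),
}

sample = [rng.random() for _ in range(N_SITES)]   # synthetic, not real data
embedding = encode(sample)                        # costly step, done once per sample
scores = {name: head(embedding) for name, head in heads.items()}

# Adding an application later reuses the embedding and adds only a new head.
heads["metabolic_risk"] = make_head(3)
scores["metabolic_risk"] = heads["metabolic_risk"](embedding)

base_params = N_SITES * EMBED_DIM   # parameters in the shared base
head_params = EMBED_DIM             # parameters in each extra head
print(f"base: {base_params} parameters, each new head: {head_params}")
```

In this sketch the base holds a thousand times more parameters than any single head, which is why the third application is cheap to bolt on. A real system would still need labeled data and clinical validation for each new readout, so "cheap" here refers to the modeling step only.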

The breakthrough behind foundation models was the realization that if you train an AI deeply enough on the right signal, general understanding and a lot of specific abilities emerge from it. For human health and aging, the right signal is the epigenome: dynamic, dense, and finally readable now that AI can do the reading. A biological foundation model learns to read the operating system of the body and translate it into earlier, clearer, and more personal health insights. And because it improves with every test, it’s a system that keeps learning.

Infinite Epigenetics is building a biological AI foundation model trained on one of the world’s largest DNA-methylation datasets. Every test run by its operating companies, TruDiagnostic and Tally Health, adds to the shared dataset behind the model. Written for general education; not investment advice.

Sources