Foundation Model

A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks — GPT-4, Claude, Gemini, and Llama are all foundation models.

A foundation model is a large AI model trained on broad data at scale, then adapted to downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models.

For most companies, the real decision is not "should we train a model?" but "which foundation model do we build on, and how do we adapt it?" The adaptation stack is usually prompting, retrieval, and occasionally fine-tuning. Training from scratch is almost never the right call unless you have a data advantage that no foundation model provider can replicate.
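
To make the first two layers of that stack concrete, here is a minimal sketch of retrieval feeding a prompt. Everything in it is an assumption for illustration: the `retrieve` and `build_prompt` helpers are hypothetical, and the word-overlap scoring is a toy stand-in for the embedding search a production system would more likely use.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query.

    Toy scoring for illustration only; real systems typically rank
    with embeddings or a search index instead of raw word overlap.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


docs = [
    "refund policy allows returns within 30 days",
    "shipping takes five days",
    "office hours are nine to five",
]
prompt = build_prompt("what is the refund policy", docs, k=1)
```

The resulting `prompt` string is what would be sent to whichever foundation model you build on. No model weights change in this flow, which is why prompting and retrieval usually come before fine-tuning.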

Foundation models also make AI a platform game. Vendor differences show up in cost, latency, capability ceiling, and policy behavior — and those differences matter at the margins. The bigger risk: if you build too tightly around one provider's quirks, switching later gets expensive. Design your system so the model is a swappable component, not a load-bearing wall.
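
One way to keep the model swappable is to have application code depend on a small interface rather than on any vendor SDK. The sketch below assumes a hypothetical `TextModel` protocol and two stand-in providers. None of these names come from a real library, and a real adapter would wrap the vendor's client inside `complete`.

```python
from dataclasses import dataclass
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical minimal interface the application depends on."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in provider; a real adapter would call a vendor SDK here."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class UppercaseModel:
    """Second stand-in provider with different behavior."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt.upper()}"


def summarize(model: TextModel, text: str) -> str:
    # Application logic sees only the TextModel interface,
    # so changing providers does not touch this function.
    return model.complete(f"Summarize: {text}")


first = summarize(EchoModel("provider-a"), "quarterly report")
second = summarize(UppercaseModel("provider-b"), "quarterly report")
```

Switching providers here means passing a different object to `summarize`. Provider-specific quirks such as prompt formats, retries, and rate limits stay inside the adapter classes instead of leaking into application code.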

The term was coined in a 2021 Stanford paper that mapped the opportunities and risks of large, general-purpose models. It has since become the default framing for how organizations think about deploying AI.

Further reading:

On the Opportunities and Risks of Foundation Models (Stanford HAI, 2021) — the paper that coined the term and mapped the landscape of risks and opportunities.
Stanford CRFM — the research center dedicated to foundation model study, benchmarking, and transparency.
