The release of ChatGPT just over a year ago has led to widespread market interest in Foundation Models (FMs) and Large Language Models (LLMs). While we are still in the early days of AI, many companies are rapidly exploring adoption of these models. For example, Notion, Figma, and Zoom have already deeply integrated LLMs into a core part of their product offerings. At the same time, larger enterprises including Morgan Stanley, PwC, and Walmart are rolling out both internal and customer-facing solutions that leverage AI. All of these efforts have kicked off a race for the one thing powering it all: Graphics Processing Units (GPUs). As we start 2024, we wanted to reflect on the state of GPU capacity and opportunities for startups.
Using GPUs in model buildout
The advent of FMs has led to a mad dash to acquire as many GPUs as possible. Most of the current capacity buildout is going towards the initial model creation step, known as training.
Model training is the process of feeding a machine learning model a large dataset during its creation. To help LLMs and FMs achieve "reasoning," they are generally trained on publicly available datasets like Common Crawl, along with private datasets curated by the model provider. This information is then "parameterized" into a format the model can use: each parameter is given a weight that encapsulates how much it should influence the model's outputs.
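To make the idea of weighted parameters concrete, here is a minimal, deliberately toy training-loop sketch in PyTorch. The single linear layer, synthetic data, and hyperparameters are illustrative stand-ins rather than anything a real FM provider uses; the point is simply that training repeatedly nudges each parameter's weight to reduce a loss.

```python
import torch
from torch import nn

# Toy stand-in for a foundation model: one linear layer whose
# weights are the "parameters" being learned.
model = nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic batch standing in for a real (tokenized) dataset.
inputs = torch.randn(32, 128)
targets = torch.randn(32, 128)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()    # compute a gradient for every parameter
    optimizer.step()   # adjust each weight to reduce the loss
```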
This process is highly compute-intensive and requires GPUs, which are designed to run computation in a massively parallel fashion. For example, some speculate that OpenAI required 25K Nvidia A100 GPUs running for over 100 days straight to train its 1.76T-parameter GPT-4 model. Meta took roughly 1.7M GPU hours to train its 70B-parameter Llama 2 model (equating to roughly 10K GPUs running for about a week). And just recently, Meta publicly announced it will utilize compute equivalent to 600K Nvidia H100s to train its upcoming Llama 3 model.
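As a rough sanity check on those GPU-hour figures, here is the back-of-envelope math. The 10K-GPU cluster size comes from the estimate above, and the split between GPU count and duration is only illustrative.

```python
# Llama 2 70B: ~1.7M GPU-hours of training, per Meta's reported figure.
gpu_hours = 1_700_000
gpus = 10_000            # assumed cluster size for this estimate

hours = gpu_hours / gpus
print(f"{gpus:,} GPUs for ~{hours:.0f} hours (~{hours / 24:.0f} days)")
# -> 10,000 GPUs for ~170 hours (~7 days)
```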
A training job requires an enormous amount of compute capacity: thousands to hundreds of thousands of GPUs interconnected with a high-throughput network fabric, all running a single job together. To fill this market gap, we have seen new companies founded, including CoreWeave, Foundry, Lambda Labs, and Together AI.
The role of GPUs in model inference
After the training stage, a large foundation model still requires ongoing compute capacity to serve requests, a stage known as inference.
Training requires a large cluster of interconnected GPUs running over an extended period of time. Model inference, however, can require far less compute capacity, and demand fluctuates with how often the model is being prompted.
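For contrast, here is a minimal sketch of what serving a single prompt can look like, using the open-source Hugging Face transformers library. The model name is just an example (Llama 2 weights are gated and require access), and a single GPU, or even a CPU, is enough to run it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # example model; access must be granted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

prompt = "Why do training and inference need different amounts of compute?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```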
New startups have emerged alongside the cloud providers to offer model inference, looking to differentiate themselves on developer experience, product design, and lower costs. This category includes players like Anyscale, Baseten, Banana.dev, Fermyon, Fly.io, Modal, and RunPod.
The rise of "models-as-a-service" companies
Some companies are happy owning the underlying infrastructure required for deploying their models. Others want a solution at a much higher level of abstraction. This has led to the rise of "models-as-a-service" companies, which focus on providing a single-click solution to deploy a wide assortment of the most popular LLMs and FMs.
This category includes Hugging Face, best known as the go-to platform for model sharing, which now also offers the ability to host its 60K+ models on behalf of customers. Other providers include Anyscale, Replicate, Fireworks, and Lepton AI. These providers offer a smaller selection of models, focusing on the most popular open-source ones while competing on performance, cost, and warm-start times.
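As a rough illustration of this higher level of abstraction, here is what consuming a hosted model can look like through Hugging Face's huggingface_hub client. The model name is only an example, and the exact call would vary by provider.

```python
from huggingface_hub import InferenceClient

# The provider runs the model; we only send prompts over HTTP.
client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2")  # example hosted model

reply = client.text_generation(
    "Summarize the difference between training and inference in one sentence.",
    max_new_tokens=60,
)
print(reply)
```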
Questions about the future of GPU cloud startups
We are strong believers in the future use cases of LLMs and FMs being built out today. By extension, we are investing heavily in the AI-native infrastructure powering this technology shift. Within the GPU cloud world, some questions that we are still thinking about include:
1. Where’s the money?
As companies race to adopt LLMs, how much incremental revenue will these businesses be able to drive through these product additions? Both training an in-house LLM and consistently running inference against a third-party model are currently very expensive.
2. Margin expansion?
Since GPU clouds can be considered resellers of Nvidia GPUs today, how will the margins of these companies look over time? These providers will continue to build out their software offerings to improve their margin structure and create differentiated solutions for their end customers.
3. What about the cloud providers?
Most GPU cloud customers have defaulted to the large cloud providers, including services from AWS (Bedrock) and Microsoft Azure (Azure OpenAI Service). Given this, GPU cloud startups will be forced to innovate as they compete with large cloud providers that already have established customer bases.
At Unusual Ventures, we are big believers in the future of generative AI, and are actively investing at the infrastructure layer. Please reach out directly at rwexler@unusual.vc, or learn more at our website, https://www.unusual.vc/