Cloud Only

You can't run Large Language Models on your laptop, bro.

Computing for commerce and industry started out in the cupboard at most companies. They’d invest in a server or two and create a network. This bloomed into companies hosting ever larger on-premise infrastructures.

Then cloud happened and workloads began to migrate there. You could still get by with your own on-premise data centre, though. There were big downsides, but it was possible.

We began to refer to apps that were completely migrated, or were only ever built on cloud, as cloud native.

Then AI happened. LLMs, specifically.

Now, pragmatically and realistically, you can only build with LLMs on cloud. The sheer amount of compute and data infrastructure required makes it infeasible to do on-premise at anything approaching enterprise scale.

We’re moving into a new paradigm. Generative AI applications are cloud only.

There is an emerging trifecta of strategies for companies wishing to use LLMs:

  1. Thin Client - use public (usually paid) APIs that deliver inference, e.g. OpenAI. Only the user application side needs to be managed by you (a minimal sketch of this and the RYO approach follows this list).

  2. Roll Your Own (RYO) - deploy and fine-tune publicly available models from Hugging Face. You need to manage the inference as well as the user app.

  3. Build Your Own (BYO) - create your own model from scratch. You do everything, from the data science up.
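
To make the first two strategies concrete, here is a minimal sketch, assuming the official openai and transformers Python packages and purely illustrative model names: the thin-client route calls a hosted inference API, while RYO loads and serves an open model yourself.

```python
# Thin Client: the provider hosts the model; you only manage the calling application.
# Requires the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name, not a recommendation
    messages=[{"role": "user", "content": "Summarise our Q3 sales report in one paragraph."}],
)
print(response.choices[0].message.content)

# Roll Your Own: you host and serve the model weights yourself.
# Requires the `transformers` and `torch` packages, plus serious GPU capacity for anything non-trivial.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # tiny stand-in for a real open LLM
print(generator("The outlook for Q4 is", max_new_tokens=40)[0]["generated_text"])
```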

Our contention at Temrel is that you can only do RYO and BYO on cloud. You have no other realistic choice.

Why is it so?

Scalable Compute Resources

LLMs require an enormous amount of computational power, often utilizing multiple high-end GPUs or TPUs in parallel. Training such models on-premise is rarely feasible for most organizations given the hardware investment needed, not to mention the scarcity of the components themselves.

Cloud platforms offer on-demand access to state-of-the-art computational resources without upfront capital costs. They can scale out to accommodate the needs of training large models and then scale in to save costs when not in use. What’s more, the big players tend to get a steadier supply of the essential kit.
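
As a rough illustration of what "on-demand" means in practice, here is a minimal sketch using the AWS SDK (boto3) to launch a GPU instance; the AMI ID is a placeholder and the instance type is just an example, not a recommendation.

```python
# A minimal sketch of on-demand GPU provisioning with the AWS SDK (boto3).
# The AMI ID and instance type are placeholders; check availability and pricing
# in your region before running anything like this.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a deep learning AMI of your choice
    InstanceType="p4d.24xlarge",      # example 8x A100 instance; swap for what your job needs
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched GPU instance {instance_id}")
```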

Massive Data Storage & Throughput

LLMs also often require vast amounts of data for training. Efficient training necessitates high-speed access to these datasets, and the ability to store, retrieve, and manage them with minimal latency.

Cloud storage ‘primitives’ like AWS S3, as well as third-party data platforms like Snowflake, offer unparalleled storage capacity, throughput, and redundancy. They allow for seamless integration with compute resources, ensuring that data can be quickly accessed and fed to the model during training. Storing and managing petabytes of data on-premise would be logistically challenging and costly.
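
As an illustration, here is a minimal sketch of streaming a training shard straight out of S3 with boto3 rather than copying it to local disk first; the bucket and key names are hypothetical.

```python
# A minimal sketch of streaming a training shard from S3.
# Bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")

obj = s3.get_object(Bucket="my-training-data", Key="shards/shard-00001.jsonl")
line_count = 0
for raw_line in obj["Body"].iter_lines():
    record = raw_line.decode("utf-8")  # one training example per line
    line_count += 1  # in practice you would tokenise and batch the record here
print(f"Streamed {line_count} records without touching local disk")
```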

Elasticity & Flexibility

The needs of LLM projects can vary over time. For example, you might require vast amounts of compute resources during the training phase but less during the inference phase.

All cloud platforms allow for rapid provisioning and de-provisioning of resources based on demand, ensuring that you pay only for what you use. An on-premise setup would lack this kind of flexibility and could lead to underutilized or strained resources. Given the top-dollar cost of the required hardware, the waste can be substantial.
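
The flip side of the provisioning sketch above is scaling back in. A minimal sketch, again with boto3 and a placeholder instance ID:

```python
# A minimal sketch of scaling back in once a training run finishes:
# stop (or terminate) the expensive GPU instance so you are no longer billed for its compute.
# The instance ID is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])  # pause: keeps the disk, stops compute billing
# ec2.terminate_instances(InstanceIds=["i-0123456789abcdef0"])  # or tear it down entirely
print("GPU capacity released until the next training run")
```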

Perhaps more realistically, you won’t be able to source enough hardware for your needs, leading to underwhelming output and a poor user experience, which can be just as damaging to project success.

Advanced Tooling Ecosystem

LLMs require sophisticated tooling for monitoring, optimization, and management. Furthermore, they often need to integrate with other services like data pipelines, analytics platforms, and more.

Cloud providers offer a suite of tools and services that seamlessly integrate with each other. This holistic ecosystem simplifies the task of managing LLMs and integrating them into larger workflows. Replicating this level of integration on-premise would demand significant time and effort.
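
As one small example of that ecosystem, here is a minimal sketch of publishing a custom training metric to Amazon CloudWatch with boto3 so that dashboards and alarms can be layered on top; the namespace and metric name are hypothetical.

```python
# A minimal sketch of plugging a training run into cloud-native monitoring:
# publish a custom metric to Amazon CloudWatch. Namespace and metric name are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_data(
    Namespace="LLM/Training",            # hypothetical namespace for your project
    MetricData=[
        {
            "MetricName": "ValidationLoss",  # hypothetical metric emitted once per epoch
            "Value": 2.37,
            "Unit": "None",
        }
    ],
)
print("Metric published; dashboards and alarms can now be built on top of it")
```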

Collaboration & Distributed Workflows

AI research and development for LLMs often involves collaboration among teams located in different parts of the world, all of whom need simultaneous access to models, data, and tools.

Cloud platforms support globally distributed access, ensuring that teams, regardless of their location, have consistent and efficient access to resources. Setting up a globally accessible, high-performance, on-premise infrastructure for LLMs would be daunting and fraught with challenges.