LLMOps Introduction

Ali Issa
3 min read · May 12, 2024


Image generated by DALL·E 3

I recently completed a short course called LLMOps by DeepLearning.AI, in collaboration with Google Cloud, instructed by Erwin Huizenga.

In a typical corporate setting, when presented with a new idea, we often start by conducting a Proof of Concept (POC) to assess the project’s feasibility. This step, usually performed on an employee’s computer, is crucial for gauging the potential success of the project. However, what happens when the POC proves promising and we need to transition it into production?

Questions arise about how to deploy the model, monitor its performance, manage the infrastructure, handle large volumes of requests, and ensure low latency. These considerations are often overlooked during a POC. In our current era, especially when dealing with LLMs, it’s crucial to understand MLOps and LLMOps to effectively transition products into production.

Here are some of my key takeaways from the course:

1. Automation and monitoring

Both should be integral parts of our ML system, encompassing everything from integration, testing, releasing, and deployment to infrastructure management.

2. The MLOps framework consists of several components (sketched in code after the list):

  • Data ingestion: Fetching the latest data from the database.
  • Data validation: Checking for missing data.
  • Data transformation: Transforming data into a format suitable for the LLM.
  • Model: Training the model for our use case.
  • Model analysis: Evaluating the model’s performance.
  • Serving: Deploying the model.
  • Logging: Logging the results of LLM calls and monitoring the data.
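
To make these stages concrete, here is a minimal, purely illustrative sketch in Python. The function bodies, column names, and prompt text are hypothetical stand-ins, not code from the course:

```python
import pandas as pd

def ingest() -> pd.DataFrame:
    # Data ingestion: fetch the latest rows from the database (stubbed here).
    return pd.DataFrame({"question": ["How do I sort a list in Python?"],
                         "answer": ["Use sorted() or list.sort()."]})

def validate(df: pd.DataFrame) -> pd.DataFrame:
    # Data validation: fail fast if anything is missing.
    if df.isnull().values.any():
        raise ValueError("missing values in ingested data")
    return df

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Data transformation: shape rows into the input/output pairs the LLM expects.
    return pd.DataFrame({
        "input_text": "Answer this question: " + df["question"],
        "output_text": df["answer"],
    })

# Training, model analysis, serving, and logging would follow as further stages.
prepared = transform(validate(ingest()))
```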

3. MLOps vs System design

MLOps focuses on LLM development and management in production, while System Design encompasses the entire flow from front end to back end, including data engineering. In MLOps, we concentrate more on conducting experiments, training and evaluating the model, managing prompts, and so on. In System Design, we discuss designing the steps of an LLM chain (when multiple steps are executed), grounding the LLM to ensure the information it generates is factual, and tracking conversation history.

4. Orchestration

Orchestration defines how our components are connected and the order in which they are executed. Automation, on the other hand, involves running the built pipeline without human intervention. Tools like Apache Airflow and Kubeflow are used to automate these workflows.
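
As an illustration of what this looks like in practice, here is a minimal Apache Airflow DAG. It assumes Airflow 2.4 or later (for the schedule argument); the DAG id and task callables are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("fetching the latest data")

def validate():
    print("checking for missing data")

def transform():
    print("formatting data for the LLM")

# schedule=None: the pipeline runs only when triggered.
with DAG(dag_id="llm_data_pipeline", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)

    # Orchestration: declare which components connect and in what order they run.
    t_ingest >> t_validate >> t_transform
```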

5. LLMOps pipeline

It includes data preparation and versioning, model training, and pipeline configuration and workflow. The evaluation metrics produced along the way are used to decide whether to start the deployment process. Responsible AI checks are used to verify the safety of the answers the LLM generates.
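
As a sketch, such a pipeline could be expressed with the Kubeflow Pipelines (KFP) v2 SDK. The component bodies, names, and bucket path below are hypothetical, not the course’s actual pipeline:

```python
from kfp import dsl

@dsl.component
def prepare_data(source_table: str) -> str:
    # Data preparation and versioning would happen here; return the dataset URI.
    return f"gs://my-bucket/datasets/{source_table}.jsonl"  # hypothetical bucket

@dsl.component
def train_model(dataset_uri: str) -> str:
    # Model training/tuning step; returns an identifier for the tuned model.
    return "tuned-model-001"

@dsl.pipeline(name="llmops-pipeline")
def llmops_pipeline(source_table: str = "training_data"):
    data_task = prepare_data(source_table=source_table)
    train_model(dataset_uri=data_task.output)
```

Compiling this definition (for example with kfp.compiler) produces a specification that a runner such as Vertex AI Pipelines can execute end to end.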

6. Prompt usage

Always use the same prompt at inference time that was used during training; otherwise, expect different behavior, unless you trained the model with several versions of the instructions.
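
One simple way to enforce this is to define the instruction template once and share it between tuning-data preparation and inference. A minimal sketch (the template text and helper names are hypothetical):

```python
# Single source of truth for the prompt, shared by training and serving.
PROMPT_TEMPLATE = "Answer the following Python question:\n{question}"

def build_training_example(question: str, answer: str) -> dict:
    # Tuning data is built with exactly the same template used at inference.
    return {"input_text": PROMPT_TEMPLATE.format(question=question),
            "output_text": answer}

def build_inference_prompt(question: str) -> str:
    return PROMPT_TEMPLATE.format(question=question)
```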

7. Data Warehouse

It is a central repository for storing and analyzing large amounts of data, providing insights and information. Pandas is a great fit when the data fits in memory, while SQL is better suited when dealing with data warehouses.
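
For example, with the BigQuery client library the heavy filtering happens in the warehouse, and only the result lands in pandas. This assumes the google-cloud-bigquery and db-dtypes packages are installed; the project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id

sql = """
SELECT input_text, output_text
FROM `my-project.my_dataset.training_data`
LIMIT 10000
"""

# The warehouse does the heavy lifting; pandas holds only the small result.
df = client.query(sql).to_dataframe()
```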

8. Handling data

It’s advisable to do the heavy processing in SQL and export the results to a solid-state drive (SSD) or a cloud storage bucket; both allow efficient reading during training and tuning. Also track where the data comes from and what transformations it has undergone.
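
As a sketch, pandas can write an exported result straight to a cloud bucket, with the source and transformation recorded in the path. This assumes the gcsfs and pyarrow packages; the bucket path is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"input_text": ["..."], "output_text": ["..."]})

# Record provenance in the path: the source table and the applied transformation.
out_uri = "gs://my-bucket/exports/training_data-prompt_formatted.parquet"
df.to_parquet(out_uri)  # written once, read efficiently during training and tuning
```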

9. File formats to train and evaluate LLMs

Save data in JSONL (a text-based format where each object is on its own row), which is good for small to medium-sized datasets. TFRecord is a good choice for efficient training, and Parquet is a suitable format for large and complex datasets.
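
Writing JSONL needs nothing beyond the standard library. A minimal sketch (the records are hypothetical):

```python
import json

records = [
    {"input_text": "Answer this question: How do I sort a list in Python?",
     "output_text": "Use sorted() or list.sort()."},
]

# JSONL: one JSON object per line.
with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```

Pandas can read the file back with pd.read_json("train.jsonl", lines=True).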

10. Data versioning

Versioning our data is also important: when conducting several experiments, each run often uses a different train/test split. Versioned files let us trace which artifact was created from which data warehouse snapshot.
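
One lightweight approach is to stamp every exported split with a generation timestamp, so each artifact’s origin stays traceable. A minimal sketch (the file names are hypothetical):

```python
import datetime

# Version tag: the generation time, embedded in every artifact name.
version = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

train_file = f"tune_data_train-{version}.jsonl"
eval_file = f"tune_data_eval-{version}.jsonl"
print(train_file)  # e.g. tune_data_train-20240512-101500.jsonl
```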

11. Sending data between different components

If we have a pipeline where data must be passed from one component to another, and how the data is parsed depends on the receiving component, it’s best to pass the location of the stored data instead of the data itself. This way, we avoid shipping data around in memory and the pipeline stays scalable.
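
A minimal sketch of passing data by reference between components (the URI scheme and helper name are hypothetical; reading Parquet from a bucket assumes the gcsfs and pyarrow packages):

```python
import pandas as pd

def transform_component(source_uri: str) -> str:
    # Receive only a pointer to the data, never the data itself.
    df = pd.read_parquet(source_uri)

    df["input_text"] = "Answer this question: " + df["question"]

    # Write the result back to shared storage and hand the new location on.
    out_uri = source_uri.replace(".parquet", "-transformed.parquet")
    df.to_parquet(out_uri)
    return out_uri  # the next component loads from this URI itself
```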

There was a lot more information discussed in this short course, but these are some of the main points.

If you like what you see, hit the follow button! You can also find me on LinkedIn, and we can follow each other there too. 😊
