SoftStackers


How Our Clients are Operationalizing Generative AI

First, Customer Story Time!

Recently, I worked with a well-known retail brand that wanted to implement a chatbot to handle customer inquiries. As is typical with GenAI, they struggled to get accurate and relevant responses from their model. That kind of struggle is pretty common during the proof of concept (POC) phase, and, with that in mind, everything seemed promising: the model generated insightful responses, and the team was excited about the potential cost savings and improved customer engagement. But moving to production is a different animal, and it exposed several challenges. Response times varied, accurate outputs were critical, and maintaining the model's performance required continuous fine-tuning. We helped them refine their prompts, curate data, develop more few-shot examples, and test different foundation models (FMs). After systematically evaluating models, it was clear that Anthropic’s Claude model via Amazon Bedrock would provide quick, accurate answers to their customers’ common questions. It’s early days, but in early testing the chatbot improved their customer satisfaction scores and freed up their human agents to handle more complex issues.
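For teams curious what that final call path can look like, here is a minimal sketch of answering a customer question with Claude through the Bedrock Runtime Converse API. The model ID, region, system prompt, and context string are placeholders for illustration, not our client's actual configuration.

```python
# Minimal sketch: answering a customer question with Claude via Amazon Bedrock.
# Assumes AWS credentials are configured and Bedrock model access is enabled;
# the model ID, region, and prompts below are illustrative placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SYSTEM_PROMPT = (
    "You are a retail support assistant. Answer only from the provided "
    "policy context. If the answer is not in the context, say so."
)

def answer_customer_question(question: str, policy_context: str) -> str:
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
        system=[{"text": SYSTEM_PROMPT}],
        messages=[
            {
                "role": "user",
                "content": [{"text": f"Context:\n{policy_context}\n\nQuestion: {question}"}],
            }
        ],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    print(answer_customer_question("What is your return window?", "Returns accepted within 30 days."))
```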

Obviously, generative AI and large language models (LLMs) are on fire. While funding is slowing a bit as the market evaluates the ROI of leveraging them, companies are still eager to see how GenAI can improve their products or operations. But integrating these models into daily business activities presents several challenges.

I’m going to spend some time sharing insights into how we operationalize generative AI applications using MLOps principles. What we are seeing unfold is a new buzzword/acronym (lord knows we need more of those!) called Foundation Model Operations (FMOps), and we will take a closer look at LLM Operations (LLMOps!.. I know) for text-to-text applications.

Understanding MLOps and Its Evolution

MLOps, short for Machine Learning Operations, is all about getting machine learning models from development to production efficiently. Like most customer- and employee-facing technologies, it’s composed of people, processes, and tech working together to keep everything running smoothly. Implementing MLOps requires some talented individuals, including data engineers, data scientists, accountable business stakeholders, AWS architects, and, if necessary, compliance teams. In developing these solutions, the team needs to work across various environments, from data lakes holding secure private data to sandbox accounts and deployment pipelines, to ensure models transition smoothly from development to production.

Primary Components of MLOps:

1. Data Management: Engineers gather and prepare data from multiple sources.

2. Model Development: Data scientists create and refine models based on specific KPIs.

3. Deployment: ML engineers deploy models using CI/CD pipelines.

4. Governance: Auditors ensure compliance with data privacy and regulatory standards.
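To make those four components a little more concrete, here is a purely conceptual sketch that maps them onto a plain Python pipeline skeleton. Every function, metric, and threshold below is an illustrative assumption, not a prescription for how your pipeline should look.

```python
# Conceptual sketch only: the four MLOps components as pipeline stages.
# Function names, metrics, and the 0.80 quality gate are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PipelineRun:
    dataset_version: str
    model_version: str
    metrics: dict
    approved: bool = False

def manage_data() -> str:
    """1. Data management: gather and prepare data from multiple sources."""
    return "customer-inquiries-v3"

def develop_model(dataset_version: str) -> tuple[str, dict]:
    """2. Model development: train/refine a model against agreed KPIs."""
    return "intent-classifier-v7", {"f1": 0.86}

def deploy(model_version: str) -> None:
    """3. Deployment: promote the model through a CI/CD pipeline."""
    print(f"Deploying {model_version} to production endpoint")

def govern(run: PipelineRun) -> bool:
    """4. Governance: record lineage and enforce an approval gate."""
    print(f"Audit record: {run}")
    return run.metrics.get("f1", 0.0) >= 0.80  # example quality gate

if __name__ == "__main__":
    dataset = manage_data()
    model, metrics = develop_model(dataset)
    run = PipelineRun(dataset, model, metrics)
    run.approved = govern(run)
    if run.approved:
        deploy(run.model_version)
```

In real deployments each stage is its own tooling (data pipelines, training jobs, CI/CD, audit systems), but the handoffs and the approval gate are the parts that tend to break first.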

Transitioning to FMOps and LLMOps

Figure: the iterative process of managing the LLM lifecycle (Design, Test, Catalog, Deploy, and Monitor).

Generative AI extends the principles of MLOps to foundation models (FMs) like ChatGPT or Claude, which handle insanely large datasets and complex architectures. FMOps focuses on operationalizing these models across various use cases, including text-to-image and text-to-audio, while LLMOps zeroes in on text-based applications.

Key Differences from Traditional MLOps:

1. People and Processes: Specialized roles like data labelers and prompt engineers become essential.

2. Model Selection: Choosing between proprietary and open-source models based on criteria like licensing, performance, compliance, and cost (see the scoring sketch after this list).

3. Evaluation and Monitoring: New metrics and human-in-the-loop evaluations ensure model accuracy and relevance.

4. Deployment: Handling data privacy concerns, especially with proprietary models.
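One way to keep the model selection conversation honest is a simple weighted scorecard. The sketch below is illustrative only; the candidate models, weights, and 1-5 scores are made-up placeholders you would replace with your own evaluation results.

```python
# Sketch of a weighted scorecard for FM selection. The candidate models,
# criteria weights, and 1-5 scores below are illustrative, not benchmarks.
WEIGHTS = {"licensing": 0.2, "performance": 0.35, "compliance": 0.25, "cost": 0.2}

CANDIDATES = {
    "proprietary-model-a": {"licensing": 3, "performance": 5, "compliance": 4, "cost": 2},
    "open-source-model-b": {"licensing": 5, "performance": 3, "compliance": 3, "cost": 5},
}

def weighted_score(scores: dict) -> float:
    return sum(WEIGHTS[criterion] * value for criterion, value in scores.items())

if __name__ == "__main__":
    for model, scores in sorted(CANDIDATES.items(), key=lambda kv: weighted_score(kv[1]), reverse=True):
        print(f"{model}: {weighted_score(scores):.2f}")
```

Sorting candidates by that score gives you a starting shortlist; the human-in-the-loop evaluation in point 3 still gets the final say.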

The Journey of Generative AI Users

The users/groups in the Generative AI space fall into three categories: providers, fine-tuners, and consumers. Each group follows different operational steps based on their expertise and requirements.

1. Providers: Build FMs from scratch, requiring deep ML and data science skills (not to mention insanely deep pockets and a desire to make Nvidia rich!)

2. Fine-tuners: Adapt existing models to specific contexts, balancing computational power and accuracy.

3. Consumers: This is the majority of organizations, which utilize generative AI services via APIs or a UI and focus on application development or usage rather than ML intricacies.

Consumers make up the bulk of the market and our customer base.

Operational Steps for Consumers:

- Work Backwards: Select and test appropriate FMs based on use cases.

- Develop backend systems (private datasets, prompt engineering, ideal models, agents, etc.) to handle inputs and generate outputs; see the backend sketch after this list.

- Implement frontend interfaces for user interaction and feedback. We usually lean on React, out-of-the-box solutions like Amazon Q, or AWS-native services like Amazon QuickSight.
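Here is a rough sketch of how those backend pieces typically fit together for a consumer team. Both retrieve_context() and invoke_model() are hypothetical stand-ins for your own datastore lookup and FM client (for example, the Bedrock call shown earlier).

```python
# Sketch of the backend pieces a consumer team typically wires together:
# private-data retrieval, prompt assembly, model invocation, and feedback capture.
# retrieve_context() and invoke_model() are hypothetical stand-ins, not a real API.
def retrieve_context(query: str) -> str:
    # Swap in your vector store or knowledge base lookup over private data.
    return "Return policy: items can be returned within 30 days with a receipt."

def invoke_model(prompt: str) -> str:
    # Swap in your FM call (Bedrock, a hosted open-source model, etc.).
    return f"[model answer based on a prompt of {len(prompt)} chars]"

def handle_request(user_input: str) -> dict:
    context = retrieve_context(user_input)
    prompt = (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_input}"
    )
    answer = invoke_model(prompt)
    # Return everything the frontend needs to render the answer and to
    # collect thumbs-up / thumbs-down feedback for later evaluation.
    return {"answer": answer, "prompt": prompt, "feedback": None}

if __name__ == "__main__":
    print(handle_request("Can I return a jacket I bought last week?"))
```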

Developing Generative AI Applications

Developing generative AI applications involves creating both backend and frontend components. Backend developers integrate FMs, while frontend developers design user interfaces for smooth interaction. As mentioned above, there are several options depending on a client’s goals, which ultimately lead to a build-vs-buy decision.

To focus on the build option, here are the key tasks involved:

  • Prompt Engineering: Designing effective prompts to guide model responses. This sounds really simple; it’s not prompting in the end-user sense. Many times it comes down to creating a strategic pipeline or decision tree and passing those prompts through a limited context window while maintaining output accuracy (see the sketch after this list).

  • Application Development: Building interfaces for users to input prompts and receive outputs. Again, we typically look at React to develop the UI.

  • Evaluation and Feedback: Continuously assessing model performance and incorporating user feedback. This process does not end. Ongoing QA is essential as the models are updated, as we build a database of good vs. bad outputs, and as our datasets change.
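To illustrate the prompt engineering point above, here is a simplified sketch of routing a request through a small decision tree and packing few-shot examples into a fixed context budget. The routing rules, examples, and token estimate are illustrative assumptions, not a production recipe.

```python
# Sketch of production-style prompt assembly: route the request through a simple
# decision tree, then pack as many few-shot examples as the context budget allows.
# The routing rules, examples, and 4-chars-per-token estimate are illustrative.
FEW_SHOT = {
    "returns": [
        ("Can I return sale items?", "Sale items can be returned within 14 days."),
        ("Do I need a receipt?", "Yes, a receipt or order number is required."),
    ],
    "shipping": [
        ("When will my order arrive?", "Standard shipping takes 3-5 business days."),
    ],
}

def route(question: str) -> str:
    q = question.lower()
    return "returns" if "return" in q or "refund" in q else "shipping"

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic, not a real tokenizer

def build_prompt(question: str, max_tokens: int = 200) -> str:
    topic = route(question)
    prompt = f"You answer {topic} questions for a retail brand.\n"
    for q, a in FEW_SHOT[topic]:
        candidate = prompt + f"Q: {q}\nA: {a}\n"
        if estimate_tokens(candidate + question) > max_tokens:
            break  # stop adding examples once the budget is exhausted
        prompt = candidate
    return prompt + f"Q: {question}\nA:"

if __name__ == "__main__":
    print(build_prompt("How do refunds work?"))
```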

By implementing most of these steps, our retail customer was able to move from a promising POC to a robust production system, improving both customer experience and operational efficiency. As with most GenAI projects, even production outputs should be evaluated for truthfulness, and explainability is key, but we can get closer than we have ever been to using the technology to generate true value.
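As a final illustration, here is a minimal sketch of the "good vs. bad outputs" record-keeping that ongoing QA depends on, using a local SQLite table as a stand-in for whatever store you actually run in production. The table layout and verdict labels are illustrative assumptions.

```python
# Sketch of good-vs-bad output tracking for ongoing QA. SQLite is used here only
# as a self-contained stand-in; the schema and labels are illustrative.
import sqlite3

conn = sqlite3.connect("genai_qa.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS outputs (
        prompt TEXT,
        response TEXT,
        model_id TEXT,
        verdict TEXT  -- 'good' or 'bad', from reviewers or user feedback
    )"""
)

def record(prompt: str, response: str, model_id: str, verdict: str) -> None:
    conn.execute(
        "INSERT INTO outputs VALUES (?, ?, ?, ?)", (prompt, response, model_id, verdict)
    )
    conn.commit()

def bad_output_rate(model_id: str) -> float:
    total = conn.execute(
        "SELECT COUNT(*) FROM outputs WHERE model_id = ?", (model_id,)
    ).fetchone()[0]
    bad = conn.execute(
        "SELECT COUNT(*) FROM outputs WHERE model_id = ? AND verdict = 'bad'", (model_id,)
    ).fetchone()[0]
    return bad / total if total else 0.0

if __name__ == "__main__":
    record("What is the return window?", "30 days with a receipt.", "model-a", "good")
    print(f"bad output rate: {bad_output_rate('model-a'):.0%}")
```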

Conclusion

Operationalizing generative AI requires extending MLOps practices into FMOps and LLMOps. It involves new roles, processes, and technologies tailored to the requirements of foundation models. By understanding these unique requirements, businesses can effectively integrate generative AI into their operations, unlocking its transformative potential.

For further insights, check out resources like the "MLOps Foundation Roadmap for Enterprises with Amazon SageMaker", which served as inspiration for this post and provides a deeper exploration of end-to-end solutions available through platforms like Amazon SageMaker JumpStart. We'll be sharing more detailed guides on monitoring generative AI models and integrating advanced architectural patterns into FMOps/LLMOps soon.

By adopting these practices, businesses can stay ahead in the rapidly evolving landscape of artificial intelligence, ensuring they not only deploy but also sustain powerful generative AI solutions effectively.

Contact us if you’d like to have a conversation around how you can adopt these services. No commitment required. We just love talking about GenAI. 😊