Enterprise LLM Platforms, AI Strategy and Continuous Learning
If we prove that this ChatGPT thing actually works, how can we quickly catch up with it? How can we employ this new flavour of AI across the company in a pragmatic way?
Hi!
In this newsletter, I’ll share with you new insights from companies that are building ML-driven products (mostly ChatGPT-driven these days).
These insights come from my discussions with decision makers, driven by my research “Identifying patterns for AI solutions in business”.
Ok. So everybody is talking about AI. Even Microsoft has bet on being known as the AI company in its Super Bowl ad this year.
This reflects a common trend: when talking about products powered by AI, companies are not interested in just a single project, service or feature. If the evaluation works out, they want to start moving toward a long-term strategy that eventually incorporates AI across the organisation.
Common thought is: “If we prove that this ChatGPT thing actually works, how can we quickly catch up with it? How can we employ this new flavour of AI across the company in a pragmatic way?”
Here is one way to think about a systematic adoption of AI:
We start with processes for evaluating AI-driven features and shipping them.
Along the way, we ensure that AI continuously learns more about the business and adapts to it. We can achieve this by collecting data and user feedback at every step, and by deploying new models regularly.
At some point in the future, it might be worth setting up an AI platform to enable self-service and direct collaboration within the company.
All along the way we keep these execution details aligned with the global strategy of the company.
Let’s quickly go through these points.
Set up processes for evaluating AI-driven features and shipping them
Bigger companies tend to struggle with AI evaluation more than smaller ones, due to organisational inertia. It doesn’t help that the technology is still changing and evolving rapidly.
To make things even more challenging, most of the technology and expertise is thinly spread across companies and siloed within teams.
For example, you can read a lot about fine-tuning LLMs and building RAG systems with vector databases. The internet is full of tutorials. However, you will hear only whispers about the disappointing results of fine-tuned models or the low quality of AI systems built on vector databases. You will hear even less about using LLMs for working with spreadsheets or navigating business-specific domains.
How can companies even make progress in such a world? The same way that has worked for decades: switch AI product teams into a startup mindset, focus on rapid experiments and set up a fast feedback loop between product teams and consumers.
While talking to the customers of the technology (real customers or internal users), it is also important to get them motivated to collaborate and help train their AI assistants. People will never be replaced by AI, but they can accomplish much more with it.
Collect data to let AI continuously learn more about your business
This one is easy. We take a pause from building AI (no matter how exciting the technology is) and make sure that our bases are covered. Before building anything, we need to ensure that user feedback is captured at every single step.
We don’t need to build a perfect product on the first try. That is usually hard on its own, even more so with a brand-new technology.
There isn’t enough data, expertise or intuition yet.
Instead, we can allow our first alpha release to be a mediocre AI-driven product that provides a tiny bit of assistance in a specific business process. The important bit: it has to be eager to learn and accept user feedback.
For some inspiration, check out the Capture Feedback section at ML Labs. As a subscriber you already have access to it.
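To make this concrete, here is a minimal sketch in Python of what feedback capture could look like. Everything in it (the `FeedbackEvent` fields, the verdict values, the JSONL file) is a hypothetical illustration, not a prescription; the point is simply to attach a user verdict and an optional comment to every AI interaction and store it somewhere you can query later.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical sketch: every AI-driven interaction produces one feedback event.
@dataclass
class FeedbackEvent:
    feature: str       # which AI-driven feature produced the answer
    prompt: str        # what was sent to the model
    response: str      # what the model returned
    verdict: str       # e.g. "accepted", "edited", "rejected"
    comment: str = ""  # optional free-form note from the user
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_feedback(event: FeedbackEvent, path: Path = Path("feedback.jsonl")) -> None:
    """Append the event to a JSONL file (a database table works just as well)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")

# Example: the user rejected a draft e-mail suggested by the assistant.
log_feedback(FeedbackEvent(
    feature="draft-email",
    prompt="Summarise this thread and draft a reply...",
    response="Dear Ms. Smith, ...",
    verdict="rejected",
    comment="Tone is too formal for this customer",
))
```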
Once we have feedback and usage data flowing in, it is time to start reviewing it regularly (in a semi-automatic way with ChatGPT, of course). Any new findings can be continuously incorporated into the deployed solution to make it better.
This doesn’t even have to be fine-tuning (we probably don’t have enough data yet). It could be good enough to just tweak the prompt and input data based on the statistics. Anything works, as long as it makes the model better adapted to business nuances and to the workflows of each individual user.
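To make that review loop concrete, here is a hedged sketch in Python, building on the hypothetical feedback log from the previous example. It computes an acceptance rate per feature and asks ChatGPT (via the standard OpenAI Python SDK) to group the complaints for any feature that falls below an arbitrary threshold. The file format, field names and threshold are all assumptions for the illustration.

```python
import json
from collections import defaultdict
from pathlib import Path

from openai import OpenAI  # pip install openai

def load_events(path: Path = Path("feedback.jsonl")) -> list[dict]:
    """Read the feedback log written by the capture sketch above."""
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]

def acceptance_rates(events: list[dict]) -> dict[str, float]:
    """Share of 'accepted' verdicts per AI-driven feature."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [accepted, total]
    for e in events:
        counts[e["feature"]][1] += 1
        counts[e["feature"]][0] += e["verdict"] == "accepted"
    return {feature: ok / total for feature, (ok, total) in counts.items()}

def summarise_complaints(events: list[dict], feature: str) -> str:
    """Ask ChatGPT to cluster the negative comments for one feature."""
    comments = [e["comment"] for e in events
                if e["feature"] == feature and e["verdict"] != "accepted" and e["comment"]]
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Group these complaints about an AI feature into themes "
                       "and suggest one prompt change per theme:\n" + "\n".join(comments),
        }],
    )
    return response.choices[0].message.content

events = load_events()
for feature, rate in acceptance_rates(events).items():
    if rate < 0.7:  # arbitrary threshold for this sketch
        print(f"{feature}: {rate:.0%} accepted")
        print(summarise_complaints(events, feature))
```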
For example, in my own personal AI assistant, task-specific models get updated with new versions as soon as there is enough new data for the next upgrade. Usually this happens once every few days.
Ideally, we’ll make a habit of repeating that process for every single AI-driven feature that is being developed.
Yes, this process cannot scale well across teams and departments. It is OK to do things that don’t scale for a little while (Paul Graham wrote about that eloquently).
You see, as your teams repeat the process a couple of times, they will start building an intuition about what it takes for your company to deliver an AI-driven product feature that continuously learns and gets better.
If all goes as planned, at this point your IT teams will become a bottleneck for rolling out AI-driven features and integrating them with existing systems.
This would be a good time to draw a boundary between the teams that run the underlying AI infrastructure and the people who use it to define, use and improve new AI-driven workflows.
In other words:
Set up an AI platform to enable self-service and direct collaboration within the company
An internal AI platform allows development teams to focus on the foundational technology, while users build and maintain new AI skills on their own. They can do that at their own pace, relying on the platform to directly exchange knowledge, expertise and recipes. Development teams are no longer needed in this loop.
Your teams will be in a good position to build such a platform, because they have already repeated this process of building new AI features a couple of times. They know what it takes and they know the pain points.
By the way, “platform” might be too big a word. It can be as simple as a configurable rules engine that can call out to a few APIs.
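As a hedged illustration of how small that can be, here is a sketch of such a rules engine in Python: business users maintain the rules as plain configuration, while the platform team maintains the registry of actions that call out to the actual APIs. All rule names, patterns and actions are invented for the example.

```python
import re
from typing import Callable

# Hypothetical action registry, maintained by the platform team.
# In a real setup each action would call an internal API or an LLM endpoint.
ACTIONS: dict[str, Callable[[str], str]] = {
    "summarise_ticket": lambda text: f"[LLM summary of: {text[:40]}...]",
    "translate_to_english": lambda text: f"[translation of: {text[:40]}...]",
}

# Rules that business users maintain themselves, e.g. as YAML or a settings page.
RULES = [
    {"match": r"(?i)invoice|rechnung", "action": "translate_to_english"},
    {"match": r"(?i)support ticket", "action": "summarise_ticket"},
]

def run_rules(text: str) -> list[str]:
    """Apply every matching rule to an incoming document."""
    return [
        ACTIONS[rule["action"]](text)
        for rule in RULES
        if re.search(rule["match"], text)
    ]

print(run_rules("New support ticket: printer on floor 3 is offline"))
```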
Align execution details with the global strategy of the company
If your teams have been following the steps above, you don’t need to worry (too much) about the tactical details and implementation nuances.
Why? We already know that implementation nuances don’t matter much at a really large scale. Processes, data and organisational structures matter more. They simply last longer.
Now let’s look back at our three bullet points for setting up AI adoption:
Set up processes for evaluating AI-driven features and shipping them.
Collect data and user feedback to let AI continuously learn about your business and adapt to it.
Build an AI platform within the company to enable self-service and direct collaboration.
See? We have already been working on the important parts all along! This is the foundation for establishing an AI strategy at the company.
Before wrapping up the outline of a strategy, we also need at least three feedback loops:
Executive: ensure that individual development teams stay aligned with the long-term strategic goals of the company.
Tech: ensure that AI initiatives stay up-to-date with the latest tech improvements, while being pragmatic about the choices they make.
Collaboration: keep different AI initiatives on the same page, ensure collaboration and some standardisation. This will help with resource pooling and moving people between the teams.
All this will help us set up a stable organisational system that is capable of some self-correction.
To wrap things up: while setting up a company strategy for adopting AI/LLM/GPT in a pragmatic way, think about:
R&D processes with rapid iterations and fast feedback loops.
Systematic capture of user feedback to drive continuous AI improvement.
Using upcoming organisational bottlenecks to drive further change in the company.
AI Research (or where does all this come from?)
If you want to learn more on the topic, I invite you to collaborate together on the research “Identifying patterns for AI solutions in business”.
Here is the gist:
Duration: 45-minute focused session
Your input: highlight your key business challenges and review potential AI applications together
My role: provide expert AI insights and potential solutions
Outcome: collaborative exploration to identify practical AI applications for business
You can find more details and sign up on my website.
Trustbit LLM Benchmarks for January 2024
There aren’t that many dramatic changes in the benchmarks, which is an indication that the world of LLMs is starting to cool down a little bit.
There is still one important milestone: Mistral 7B OpenChat v3 managed to beat the oldest version of ChatGPT-3.5 on our benchmark.
It could be a coincidence, but soon after that OpenAI further lowered prices on ChatGPT-3.5:
for the third time in the past year, we will be decreasing prices on GPT-3.5 Turbo to help our customers scale. Input prices for the new model are reduced by 50% to $0.0005 /1K tokens and output prices are reduced by 25% to $0.0015 /1K tokens.
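For a sense of scale, here is a quick back-of-the-envelope calculation at those prices; the token counts for a "typical" request are my own assumption, purely for illustration.

```python
# GPT-3.5 Turbo prices quoted above: $0.0005 per 1K input tokens, $0.0015 per 1K output tokens.
INPUT_PRICE_PER_1K = 0.0005
OUTPUT_PRICE_PER_1K = 0.0015

# Assumed request shape, just for this sketch.
input_tokens, output_tokens = 2_000, 500

cost = input_tokens / 1000 * INPUT_PRICE_PER_1K + output_tokens / 1000 * OUTPUT_PRICE_PER_1K
print(f"~${cost:.4f} per request, ~${cost * 1_000_000:,.0f} per million requests")
```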
By the way, Mistral AI has been working on bigger and better models, on top of their widely used Mistral 7B and Mixtral 8x7B. There is an even bigger model called “mistral-medium” that is available solely through their API.
I expect both Mixtral 8x7B (also known as `mistral-small` behind the Mistral API) and `mistral-medium` to perform competitively on the LLM leaderboard. However, at the moment there is a small regression in the second generation of Mistral models: they ignore instructions and talk way too much, which makes them unusable for building complex AI-driven features.
As a result, the second generation of Mistral models fails the Trustbit LLM Benchmark rather badly.
Since the issue is relatively minor and has already been acknowledged by the engineers, we don’t include the current results in the leaderboard. Once it is fixed, we’ll run a proper benchmark.
For more details, check out Trustbit LLM Leaderboard for January 2024.
Until the next time!