Google I/O happened this week. It was fun to watch a large company trying to catch up with OpenAI, which has only ~375 employees. During Google I/O, a few new models were announced, including versions that can run on-device. This is important: it puts competitive pressure on the market. Plus, Google is rolling out new features in Bard.
Then, LMSYS.ORG released a new leaderboard showing that Anthropic’s Claude model is better than OpenAI’s ChatGPT-3.5, but still worse than ChatGPT-4.
Anthropic followed up by expanding Claude’s context window from 9k to 100k tokens, which roughly equates to 75,000 words of English.
You can fit “The Great Gatsby” in that window and ask the model to write an epilogue. Or ask it to analyse an 85-page Form 10-K corporate filing.
The larger variant of OpenAI’s ChatGPT-4 can handle only about a third of that: 32k tokens.
Competition is good, so OpenAI started responding to that pressure by making ChatGPT plugins available to its Plus subscribers.
Plugins allow ChatGPT to interact with the outside world. It can use the latest trends to draft marketing content, generate on-brand visuals or even automatically post them to social networks.
Everybody else will now be trying to keep up.
This also means that the technology will continue developing and changing at break-neck speed, which will impact companies that build products with ML-driven features under the hood.
Designs and implementations might become obsolete overnight.
For instance, consider a smart assistant for answering company-specific questions. It draws on multiple knowledge bases (e.g. Confluence, internal SharePoint, GitHub Issues and Stack Overflow questions). The current “state of the art” is to use vector databases with embeddings to supply ChatGPT with relevant information for the answer. This requires careful work and benchmarking.
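To make that concrete, here is a minimal sketch of the embeddings approach in Python. The `embed()` function is a placeholder assumption standing in for a real embedding API (e.g. OpenAI’s text-embedding-ada-002), and a NumPy array stands in for the vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding API here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)  # unit-length, so dot product = cosine similarity

# Illustrative knowledge-base chunks pulled from the sources above.
chunks = [
    "Confluence: VPN setup guide for new employees.",
    "GitHub issue #42: login fails when the SSO token expires.",
    "SharePoint: travel reimbursement policy, updated 2023.",
]
index = np.stack([embed(c) for c in chunks])  # one row per chunk

def top_k(question: str, k: int = 2) -> list[str]:
    scores = index @ embed(question)  # similarity of the question to every chunk
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Only the retrieved chunks go into the prompt; the model never sees
# the rest of the knowledge base, which is why chunking and ranking
# quality matter so much.
context = "\n".join(top_k("Why does login stop working after a while?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```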
However, why even bother with all that complexity if you could soon load the entire knowledge base (or the most important part of it that fits into 75,000 words) into the model and ask questions directly?
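In that world, the retrieval machinery collapses into simple prompt assembly. A rough sketch, using the common ~4-characters-per-token heuristic for English text (the real budget and tokenizer depend on the model):

```python
def estimated_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    # Use the model's actual tokenizer for anything serious.
    return len(text) // 4

def build_prompt(docs: list[str], question: str, token_budget: int = 100_000) -> str:
    corpus = "\n\n".join(docs)
    if estimated_tokens(corpus) > token_budget:
        raise ValueError("Knowledge base does not fit; fall back to retrieval.")
    return f"{corpus}\n\nQuestion: {question}"
```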
So, how do you ship features in a world where any work could go stale and useless any week now?
Easy. Technical implementations are irrelevant. You should design your product development process so that you can rebuild things easily.
Research is Traction, not Waste
Let me tell you another story along these lines.
Last week, my colleague and I were working on a prototype: a lead generation tool for a specific niche. It can scan through public corporate reports and find answers to very specific questions. Answers to these questions are gold for a company that can turn such hints into a perfect sales opportunity.
While setting up the prototype, I knew exactly what the objective was and which questions we had to ask. The execution path to success was short. However, I gave my colleague a wider, more “wasteful” task:
Please take this document and build a small chatbot that can answer any question. To keep the experiment clean, I’m not going to tell you what the question is going to be. What I can tell you for certain: you will not expect it. Do your best.
The prototype took a couple of days. Once ready, I asked “The Question”. The prototype failed to retrieve the information.
This was expected. This was a huge win.
Why?
The whole point of this exercise is to de-risk ongoing development of multiple prototypes and features in the pipeline.
Any unaddressed uncertainty can cause days, weeks or even months of waste down the road, depending on how far we need to backtrack in order to fix it.
The only way to de-risk potential problems is to address them as soon as possible. This is the best point in time to pivot and avoid these problems, or to incorporate a solution into the design.
The only approach more efficient than that is to avoid some of the research altogether by pooling resources with others moving in the same direction.
The smaller your team is, the more experiments and research you need to run to get feedback from the real world and correct course as soon as possible.
The same applies to product development even without ML: in order to find product-market fit, you need to release early, get to the market and fail fast. Even if that sometimes means cutting corners, creating imperfect solutions and throwing failed experiments out.
Your progress will be measured in the number of failures you have learned from. Each failure is a precious data point. Each data point helps you plot a better course and grow moats for your business.
This failed prototype experiment was a huge win from this perspective. We learned the limitation of our initial approach: generic GPT-driven indexes aren’t very good out of the box for a smart assistant working over wide knowledge bases.
The sales lead generator prototype was still good enough: once we knew about the problem, fixing it was a matter of a few minutes.
Additionally, I estimate that this experiment has saved at least 3-4 days of effort on the next product prototype (one that has to work actively with large knowledge bases). Based on the failure, we didn’t waste any time trying to make the default embeddings approach work. Instead, we took a step back and investigated the performance of different ChatGPT information retrieval indexes on this kind of data: cost, latency, accuracy.
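A tiny evaluation harness along these lines is enough to compare index variants on those axes. This is a sketch, not our actual code: `ask` stands in for any full retrieve-prompt-answer pipeline, and the string-match scoring is a deliberate simplification:

```python
import time

def evaluate(ask, qa_pairs):
    """Score one question-answering pipeline on accuracy and latency.

    `ask` is any callable mapping a question string to an answer string.
    Cost (tokens spent per call) would be a third column in a fuller version.
    """
    hits, latencies = 0, []
    for question, expected in qa_pairs:
        start = time.perf_counter()
        answer = ask(question)
        latencies.append(time.perf_counter() - start)
        hits += int(expected.lower() in answer.lower())  # crude substring-match accuracy
    return {
        "accuracy": hits / len(qa_pairs),
        "avg_latency_s": sum(latencies) / len(latencies),
    }

# Usage: run the same questions through each index variant and compare.
# results = {name: evaluate(pipeline, qa_pairs) for name, pipeline in pipelines.items()}
```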
Your business can use this approach, too. If you are building products with Machine Learning and Language Models, then continuous learning and adaptation are a must. Thanks to the competitive pressure, the technology itself will continue evolving at break-neck speed. It will not wait for anybody.
However, you can turn this into a business advantage, even with a small team and a limited budget. Design your product development processes to be flexible, run real-world experiments and ship features, even if they are imperfect. Even if a feature fails, it is still one more data point. It can help you avoid future waste and plot a better course for the business.
I encourage you to apply this approach to your own business. What feature could you ship right now? What experiment could you run? What data could you start gathering?
If you would like to exchange thoughts, feel free to reply in the comments or reach out directly for further discussion.
It could be fun to write a little toolkit for making "ChatGPT projections".