What to do in a Cambrian explosion?
Technical progress is accelerating. Don't try to chase it all: gather data and prepare for all that tech to be commoditised. When the dust settles, you'll be ready.
Down the hill we go
One of the most notable events this week was a potential leak from a Google engineer: “We Have No Moat, And Neither Does OpenAI”.
I don’t agree with the business analysis, but I still recommend reading the leak. It gives a great overview of how fast the tech has been moving in recent months.
Here is just a short segment of the timeline:
Feb 24 2023. Meta announces LLaMA. It is a good large language model that took a lot of resources to train. Access was limited to researchers only.
within a week - LLaMA was leaked to the public. It wasn't possible to use it legally, but people worldwide could experiment.
within days - people learned how to compress LLMs to fit on smaller GPUs and even run them on laptops, phones and ultimately on a Raspberry Pi 4 (retail price of around 100 EUR)!
the next day - thanks to Stanford Alpaca and the LoRA repo, everybody could fine-tune LLMs on a single consumer-grade GPU
within a week - a cross-university project called Vicuna reached parity with Google Bard. While some aspects (data sources and evaluation) are questionable, the results are really impressive. It is a fine-tune of LLaMA, with training costs of just $300!
within a week - gpt4all was created. It is not just a model, but an ecosystem of open-source language models and chatbots.
gpt4all was released on March 25, just a month after the LLaMA announcement. Naturally, things didn’t stop there; they are still going. Andrej Karpathy calls this a “Cambrian explosion”.
As I said in the previous newsletter, specific models and technical details are currently irrelevant for building products. Things could change completely a couple more times. The trends are what matter.
One trend: when every student, entrepreneur and researcher is enthusiastic about the tech and wants to play with it on their own hardware, they will find a way. It will be clever, scrappy and full of horrible hacks. But it will work.
Then big companies will take the best parts and commoditise them. This will make things even cheaper and simpler to use.
Let’s look at a few of the most promising technical capabilities (potentially short-lived, so I’m trying not to get too attached to them). They illustrate the trends and already hint at some capabilities of future products.
Promising tech capabilities
First, LoRA. It is a clever way to fine-tune a large language model for your own tasks at a fraction of the cost. You can personalise a large language model on consumer hardware in a few hours! This was unheard of before. It enables information retrieval systems that learn new things at a deep level in near-realtime.
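To make this concrete, here is a minimal sketch of LoRA fine-tuning with the Hugging Face transformers and peft libraries. The base model, dataset file and hyperparameters are my own illustrative assumptions, not the exact setup used by the projects above.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "EleutherAI/pythia-1.4b"           # any causal LM you may legally fine-tune
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the base weights and trains small low-rank adapters instead,
# which is why it fits on a single consumer-grade GPU.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["query_key_value"],  # attention projection in GPT-NeoX-style models
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # typically well under 1% of all weights

# One JSON object per line with a "text" field, e.g. your product's domain texts.
data = load_dataset("json", data_files="my_domain_texts.jsonl")["train"]
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4, fp16=True),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
model.save_pretrained("lora-out")         # saves only the small adapter weights
```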
Next, Unlimiformer. It is a potentially promising way to incorporate information retrieval (with embeddings and vector databases) into an existing LLM, essentially giving LLMs unlimited context. The paper hasn’t been peer-reviewed, but the direction is interesting.
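Unlimiformer itself retrieves inside the model's attention layers, so the sketch below is not the paper's method - it only illustrates the same underlying idea at the prompt level: embed your text chunks, look up the nearest ones for each query, and feed only those into a context-limited LLM. The embedding model and corpus are assumptions for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")            # small open embedding model

chunks = ["...your long documents, split into passages..."]  # the corpus
index = encoder.encode(chunks, normalize_embeddings=True)    # shape: (n_chunks, dim)

def retrieve(query: str, k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                    # dot product == cosine on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# The retrieved chunks become the part of the prompt that scales with your data:
question = "What did the report say about churn?"
prompt = "\n\n".join(retrieve(question)) + f"\n\nQuestion: {question}"
```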
MosaicML has released a new model called MPT-7B. It is a commercially usable transformer trained on 1T tokens. It can follow human instructions (à la InstructGPT) and works better than LLaMA-7B. The most impressive part: it was trained on inputs of up to 65k tokens and can handle up to 84k tokens.
Other open-source models can typically handle 2k-4k tokens. Even the biggest, not yet widely available version of GPT-4 handles only 32k tokens.
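For reference, this is roughly how MPT-7B is loaded through Hugging Face transformers. The model names come from the MosaicML release; the prompt is illustrative, and since MPT ships custom model code, review it before enabling trust_remote_code.

```python
import transformers

name = "mosaicml/mpt-7b-instruct"   # "mosaicml/mpt-7b-storywriter" is the 65k-context variant
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,   # MPT uses custom modelling code from the model repo
    torch_dtype="auto",
)

inputs = tokenizer("Summarise the following meeting notes:\n...", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```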
Just imagine: what if the next iteration of the “Cambrian explosion” used this wide-context foundation model as a baseline?
And one last finishing touch: LMSYS (of Vicuna fame) released an LLM leaderboard that uses a chess-style Elo rating system and human feedback.
Is it time to train and run my own LLMs?
It all seems to come together, right? Use the LMSYS leaderboard to identify the best LLMs, fine-tune them cheaply with LoRA, plug in Unlimiformer, and then use them in your products for profit💰💰💰? Or go straight to MPT-7B with its huge context window.
Yes, this can be a valuable R&D exercise. Yet, as a technical consultant, I would advise deferring training until you can prove it is the best choice right now - especially if your business depends on shipping something fast.
Let’s unpack that.
Proving a hypothesis is the normal approach in any data-driven product development. Any other way is just throwing money to the wind, making wild bets and wasting time.
In order to prove a hypothesis, we need data. It doesn’t have to be fancy (although tools like Amplitude can help), but you should be able to show with numbers that your hypothesis is working. Or not working - that is fine, too.
Coincidentally, data is the most important long-term resource in the current AI/ML race. The LLM tech itself is irrelevant; it will change and evolve. Hard data is what models are trained on. It is what is needed to train or fine-tune models for specific needs. It is also what is needed to prove hypotheses.
Another tricky part about data: it has a tendency to disappear forever if you don’t actually gather and record it.
Long story short: start gathering data for proving product hypotheses as soon as possible. Quite often, it is the same data you could later use for training.
By the way, if you are into CQRS/DDD/ES, then event sourcing might be a good approach. But even appending all product events to a text log file (e.g. one JSON object per line) can be good enough for a start - something like the sketch below.
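A minimal sketch of that log-file approach; the event fields and names are purely illustrative - record whatever your product hypotheses need.

```python
import json, time, uuid
from pathlib import Path

LOG = Path("product_events.jsonl")

def record_event(kind: str, **payload) -> None:
    """Append one product event as a JSON line; the file doubles as future training data."""
    event = {"id": str(uuid.uuid4()), "ts": time.time(), "kind": kind, **payload}
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")

# Log both what the user asked and how they reacted to the answer:
record_event("llm_request", prompt="How do I export my data?", model="gpt-4")
record_event("llm_feedback", request_id="42", rating="thumbs_up")
```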
Technology is rolling downhill at accelerating speed. There is no point in trying to go deep and chase it all. It is more efficient to observe the trends from the sidelines and use the saved energy to gather data and develop moats for your business.
Once the dust settles, the most important technology will be commoditised and publicly available. You’ll be ready for the next upgrade. Until then, just stick to the best paid technology, like GPT-4, for the fastest time to market.
Till the next week!
PS: While writing this newsletter I figured out an architecture for a personal GPT-driven assistant that could be trusted to actively reorganise and even prune its own memory: use a vector database as append-only storage, a content-addressed store (CAS) for documents, and an event-sourced index to link all the bits together. Even if the LLM goes crazy and decides to wipe all of its own data, I could rewind everything to a previous stable version.
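Here is a rough, in-memory sketch of how I read that idea - the vector database and embeddings are left out for brevity, and all names are hypothetical: documents go into a content-addressed store, every index change is an append-only event, and “rewind” simply replays a prefix of the log.

```python
import hashlib

cas: dict[str, str] = {}     # content hash -> document text (never overwritten)
events: list[dict] = []      # append-only log of memory index changes

def put_document(text: str) -> str:
    digest = hashlib.sha256(text.encode()).hexdigest()
    cas[digest] = text       # same text -> same hash, so nothing is ever lost
    return digest

def remember(key: str, text: str) -> None:
    events.append({"op": "link", "key": key, "doc": put_document(text)})

def forget(key: str) -> None:
    events.append({"op": "unlink", "key": key})   # "pruning" is just another event

def index_at(version: int) -> dict[str, str]:
    """Rebuild the memory index as it looked after the first `version` events."""
    index: dict[str, str] = {}
    for e in events[:version]:
        if e["op"] == "link":
            index[e["key"]] = e["doc"]
        else:
            index.pop(e["key"], None)
    return index

# Even if the assistant decides to wipe its memory, history survives:
remember("meeting-notes", "Discussed churn numbers with the team...")
forget("meeting-notes")
assert index_at(2) == {}               # current state: memory looks empty
assert "meeting-notes" in index_at(1)  # ...but any earlier stable version can be restored
```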