ML Under the Hood

Breaking the curse of LLM v2

New releases of large language models focus on efficiency. Sometimes quality is sacrificed. LLaMA v2 was a nice surprise.

Jul 18, 2023

In June I wrote a newsletter about GPU starvation at the major LLM/GPT providers. Companies were trying to make their models run faster and use less hardware. This affected the development of their products.

Let’s talk about the new versions of GPT-4, Claude and LLaMA v2.

OpenAI GPT 0613 Update

The GPT-4 API got its first major update since March. The notable feature there is function calling, which returns JSON. As OpenAI announced:

These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature. Function calling allows developers to more reliably get structured data back from the model.

A similar change also applies to GPT-3.5-Turbo. Both models are “updated and improved”, according to the documentation.
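To make the function calling flow concrete, here is a minimal sketch using the OpenAI Python SDK as it shipped in mid-2023 (pre-1.0 interface). The get_order_status function and its schema are hypothetical, made up purely for illustration.

```python
# Minimal sketch of the 0613 function-calling flow with the
# mid-2023 (pre-1.0) OpenAI Python SDK. The get_order_status
# function and its schema are hypothetical.
import json
import openai

functions = [
    {
        "name": "get_order_status",
        "description": "Look up the status of a customer order",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order identifier"},
            },
            "required": ["order_id"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "Where is my order A123?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model decided a function call is needed and returned
    # JSON arguments that follow the declared schema.
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
else:
    print(message["content"])
```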

However, if you dig deeper, the major improvement seems to be in the performance of these models, while quality stays the same. In some cases, it even gets worse.

Preliminary results of the Trustbit LLM Product benchmarks show that GPT-4 got much better at “code”, “marketing” and “reason” tasks, while tasks related to automating business processes around CRM systems got worse.

Refuel AI also notes a similar trend:

We found that the new gpt-3.5-turbo model had poorer labeling quality on six out of eight datasets. However, the new model is significantly faster (~40% lower turnaround time). The labeling performance for the gpt-4 model was more or less the same.

So if we focus on model quality, the jump from GPT-4 API v0314 to v0613 is not even a jump, but more of a limp.

Anthropic Claude v2

Anthropic focused on performance in Claude v2 as well: “Claude 2 has improved performance, longer responses”.

The quality of the new model got worse on the product benchmarks.

LLaMA v2

It looks like Meta (Facebook) did manage to break the curse of the subpar second release, though.

The second version of LLaMA was just announced:

  • A more permissive license that allows commercial use (see gotchas below).

  • Better quality.

  • Includes 7B, 13B and 70B (huge) parameter variants.

Meta also made a really smart move: they kept the model architecture similar to v1. This lets everyone leverage all the infrastructure and tools that the community has already built.

If you have read my previous newsletter on the Cambrian explosion: we are now witnessing the start of another spiral.

Model compatibility has already enabled the community to start porting LLaMA v2 weights to the GGML format, which can run on a CPU.
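For a sense of what running such a port looks like, here is a minimal sketch assuming the llama-cpp-python bindings and an already converted, quantized GGML file; the model path and prompt are placeholders, not files shipped by Meta.

```python
# Minimal sketch: run a GGML-converted LLaMA v2 model on CPU via
# the llama-cpp-python bindings. The model path is a placeholder
# for whatever converted, quantized file you actually have.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.ggmlv3.q4_0.bin")

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```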

There are a few gotchas about the license, though.

The first one: companies with more than 700M monthly active users can use the model only if Meta allows it. That is so anti-Google.

The second one comes from the Model Card: working in English is in scope; any other language is out of scope. That is a serious blow, especially for minor languages.

Meta claims that it has invested a lot of effort in safety and guardrails. Most of that effort probably went into English, so they added this clause to avoid being liable for other languages.

Expect updated benchmarks with LLaMA v2 to be published closer to the end of the month.
