
AI Series: Models

Original: Models

Work in Progress

This chapter is still being written & reviewed. Please do post links & discussion in the comments below, or open a pull request!

Some ideas:

The emergence of Large Language Models, notably with the advent of GPT-3, ChatGPT, Midjourney and Whisper, ushered in a new era. Beyond revolutionising language models, these models also pushed innovation in other domains such as Vision (ViT, DALL-E, Stable Diffusion, SAM, etc.), Audio (wav2vec [91], Bark) and even multimodal models.

https://static.premai.io/book/models_llms-landscape.png

Fig. 27 Page 7, A Survey of Large Language Models [90]#

1.Proprietary Models#

1.1.Text#

For performance comparisons, Chatbot Arena helps (though it’s a bit old and doesn’t reflect the latest results).

PaLM-2#

PaLM-2 is Google's next-generation large language model, trained heavily on multilingual text spanning more than 100 languages.

PaLM-2 also excels at tasks such as advanced reasoning, translation and code generation. Compared with its predecessor PaLM, PaLM-2 is smaller yet more efficient, with better performance, including faster inference, fewer parameters to serve and a lower serving cost.

On reasoning tasks in some domains, PaLM-2 shows stronger results than OpenAI's GPT-4. Its multilingual capabilities allow it to understand idioms, riddles and allusive texts from a variety of languages. PaLM-2 also has the advantage of quick responses, providing three responses at a time. A paper with more details has also been released.

ChatGPT#

ChatGPT is a language model developed by OpenAI. It is fine-tuned from a model in the GPT-3.5 series and was trained on an Azure AI supercomputing infrastructure. ChatGPT is designed for conversational AI applications, such as chatbots and virtual assistants.

ChatGPT is sensitive to tweaks to the input phrasing or attempting the same prompt multiple times. It’s still not fully reliable and can “hallucinate” facts and make reasoning errors.
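In practice, ChatGPT is consumed as a hosted service through OpenAI's chat API. A minimal sketch of such a call (assuming the `openai` Python package v1+ and an `OPENAI_API_KEY` environment variable; the model name and parameters are illustrative, and the interface differs between package versions):

```python
# Minimal sketch: querying ChatGPT (gpt-3.5-turbo) through OpenAI's chat API.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set;
# older versions of the package expose a different interface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain RLHF in two sentences."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```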

GPT-4#

GPT-4 is a language model developed by OpenAI. It is the successor to GPT-3 and has been made publicly available via the paid chatbot product ChatGPT Plus and via OpenAI’s API.

It is a multimodal model that accepts image and text inputs and produces text outputs, although the multimodal capability has not yet been released to the public.

It exhibits human-level performance on various professional and academic benchmarks, can follow complex instructions in natural language, and solves difficult problems with high accuracy.

It can handle input prompts of up to 32k tokens, a significant increase over GPT-3.5's 4k tokens.

It can solve more complex mathematical and scientific problems than GPT-3.5, such as advanced calculus problems, and can simulate chemical reactions more effectively than its predecessor. It is also more reliable and creative, and handles more nuanced instructions than GPT-3.5.
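Note that these limits are measured in tokens, not characters. A quick way to check whether a prompt fits a given context window is to count tokens locally, for example with OpenAI's `tiktoken` tokenizer (a sketch; `cl100k_base` is the encoding used for the GPT-3.5/GPT-4 families):

```python
# Sketch: counting tokens to check whether a prompt fits a model's context window.
# Assumes the `tiktoken` package; cl100k_base is the encoding for GPT-3.5/GPT-4.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Some long document to summarise... " * 200
n_tokens = len(enc.encode(prompt))

for window in (4_096, 32_768):  # GPT-3.5 vs GPT-4-32k context sizes
    fits = "fits" if n_tokens <= window else "does not fit"
    print(f"{n_tokens} tokens {fits} in a {window}-token window")
```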

Despite its capabilities, GPT-4 still sometimes “hallucinates” facts and makes reasoning errors.

Claude#

Claude 2 is a language model developed by Anthropic. Announced on 11 July 2023, it offers better performance and longer responses than its predecessor Claude, and can be accessed via the API and their website.

According to Anthropic, users find Claude easy to converse with: it clearly explains its reasoning, is less likely to produce harmful outputs, and has a longer memory. Coding, maths and reasoning have all improved relative to previous models.

1.2.Audio#

StableAudio#

StableAudio is a proprietary model developed by Stability AI. It is designed to improve the accuracy of audio processing tasks, such as speech recognition and speaker identification.

1.3.Vision#

Midjourney#

Midjourney is a proprietary model for Image generation developed by Midjourney.

2.Open-Source Models#

Note: “Open source” does not necessarily mean “open licence”.

| Subsection | Description |
| --- | --- |
| Before Public Awareness | Pre-ChatGPT; before widespread LLM use, a time of slow progress. |
| Early Models | Post-ChatGPT; the time of Stable Diffusion and LLaMA. |
| Current Models | Post-LLaMA leak; open-source LLMs quickly catching up to closed-source, new solutions emerging (e.g. for the GPU-poor): Alpaca 7B, LLaMA variants, etc. |

If this landscape is seen as a story of how quickly LLMs have improved, ChatGPT plays a major role.

Early well-performing LLMs were proprietary and accessible only through organisations' paid APIs, which limited transparency, raised concerns about data privacy, bias, model alignment and robustness, and restricted the ability to serve domain-specific use cases without interference from RLHF alignment.

2.1.Before Public Awareness#

Recognising the need for openness, the LLM research community responded by creating open-source variants, laying the foundation for increased transparency and the development of more powerful models.

There were a few notable open LLMs in the pre-ChatGPT era, such as BLOOM, GPT-NeoX 20B [93], GPT-J 6B and OPT [94].

GPT-J 6B#

GPT-J 6B is an early English-only causal language model which, at the time of its release, was the largest publicly available GPT-3-style language model. The code and weights were open sourced, along with a blog post by Aran Komatsuzaki, one of the model's authors.

Uniqueness#
Limitations#

Before ChatGPT’s (GPT-3.5) public release, GPT-3 was one of the “best” base language models; it was released roughly 2.1 years before ChatGPT. Following that, we have had LLMs like Bard, Claude, GPT-4 and others.

2.2.Early Models#

There have been a few visible landmarks across modalities of AI models that strongly catalysed the growth of open source:

Stable Diffusion#

Stable Diffusion is a latent text-to-image diffusion model [97]. It was created by Stability AI with support from LAION, using 512x512 images from a subset of the LAION-5B database for training. Similar to Google’s Imagen [98], this model uses a frozen CLIP ViT-L/14 [99] text encoder to condition the model on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB of VRAM.
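Because the weights are openly released, the model can be run locally. A minimal sketch using the Hugging Face `diffusers` library (the checkpoint id and generation settings are illustrative):

```python
# Sketch: text-to-image generation with Stable Diffusion via Hugging Face diffusers.
# Assumes `diffusers`, `transformers` and `torch` are installed and a CUDA GPU is available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative checkpoint id
    torch_dtype=torch.float16,          # half precision to fit in ~10GB VRAM
)
pipe = pipe.to("cuda")

image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("astronaut.png")
```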

Uniqueness#

While training:

Limitations#

LLaMA#

Under LLaMA [100], Meta AI released a collection of foundation language models ranging from 7B to 65B parameters, pre-trained over a corpus containing more than 1.4 trillion tokens. It was designed to be versatile and applicable for many different use cases, and possibly fine-tuned for domain specific tasks if required.

It showed better performance across domains compared to its competitors.

https://static.premai.io/book/models_llama-scores.png

Fig. 28 LLaMA: Open and Efficient Foundation Language Models [100]#

LLaMA 13B outperforms GPT-3 (175B) on most benchmarks while being more than 10x smaller, and LLaMA 65B is competitive with models like Chinchilla 70B [101] and PaLM 540B. LLaMA 65B performs similarly to the closed-source GPT-3.5 on the MMLU and GSM8K benchmarks [100].

Uniqueness#

The LLaMA architecture draws several key inspirations from other LLMs (a minimal sketch of these components follows the list):

  1. Pre-normalisation (GPT-3): uses RMSNorm to normalise the input of each Transformer sub-layer [102].
  2. SwiGLU activation (PaLM): replaces the ReLU activation function with SwiGLU [103].
  3. Rotary embeddings (GPTNeo): replaces absolute positional embeddings with rotary positional embeddings [95].
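As referenced above, here is a minimal PyTorch sketch of the first two components, RMSNorm pre-normalisation and a SwiGLU feed-forward block (dimensions are illustrative and do not match LLaMA's actual configuration):

```python
# Sketch of two LLaMA building blocks: RMSNorm pre-normalisation and a SwiGLU
# feed-forward layer. Dimensions are illustrative, not LLaMA's actual sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalise by the root mean square of the activations (no mean subtraction).
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: a SiLU-gated linear unit instead of a plain ReLU MLP.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)                 # (batch, sequence, hidden)
x = SwiGLUFeedForward(512, 1376)(RMSNorm(512)(x))
print(x.shape)  # torch.Size([2, 16, 512])
```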
Limitations#

Interestingly, within a week of LLaMA’s launch its weights were leaked to the public. facebookresearch/llama#73 had a huge impact on the community, with all kinds of innovations emerging, even though the licence restrictions still did not permit commercial usage.

2.3.Current Models#

Two weeks after the LLaMA weights leak, Stanford released Alpaca 7B.

Alpaca 7B#

It is a 7B-parameter model fine-tuned from the LLaMA 7B model on 52K instruction-following data points. It performs qualitatively similarly to OpenAI’s text-davinci-003 while being smaller and cheaper to reproduce, costing less than 600 USD. The GitHub repository is here.

https://static.premai.io/book/models_alpaca-finetuning.png

Fig. 29 Alpaca 7B fine-tuning strategy#

Uniqueness#
Limitations#

Right after that, alpaca-lora came out; using low-rank fine-tuning (LoRA) it made it possible to reproduce Alpaca within hours on a single NVIDIA RTX 4090 GPU, with inference possible even on a Raspberry Pi.
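The key idea behind alpaca-lora is LoRA: the base weights stay frozen while small low-rank adapter matrices are trained. A sketch of how such a setup is typically wired with the Hugging Face `peft` library (the model id and hyperparameters are illustrative, not alpaca-lora's exact recipe):

```python
# Sketch: attaching LoRA adapters to a causal LM with Hugging Face peft.
# The model id and hyperparameters are illustrative, not alpaca-lora's exact recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # illustrative id

lora_cfg = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
# `model` can now be passed to a normal Trainer; the base weights stay frozen.
```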

Things moved fast from here, with the first promising inference speeds for LLaMA without a GPU being achieved using 4-bit quantisation by LLaMA GGML. A new wave of quantised models started coming from the community.
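For a feel of what CPU-only inference on a quantised checkpoint looks like, here is a sketch using the `llama-cpp-python` bindings (the model path is a placeholder for any locally downloaded 4-bit GGML/GGUF file):

```python
# Sketch: CPU-only inference on a 4-bit quantised LLaMA checkpoint via llama-cpp-python.
# The model path is a placeholder for any locally downloaded quantised file.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",  # 4-bit quantised weights
    n_ctx=2048,        # context window
    n_threads=8,       # CPU threads to use
)

out = llm(
    "Q: Name three open-source LLMs.\nA:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```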

A day later, Vicuna came in.

Vicuna#

Vicuna was released as a joint effort by UC Berkeley, CMU, Stanford, UC San Diego and MBZUAI. It was trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT, and GPT-4 was used for its evaluation. They released a demo, with code and weights under a non-commercial licence following LLaMA.

https://static.premai.io/book/models_vicuna-finetuning.png

Fig. 30 Vicuna fine-tuning strategy#

Uniqueness#
Limitations#

After the release they also conducted a deeper study of the GPT-4-based evaluation approach.

Then came updates like LLaMA-Adapter [107] and Koala, and in less than a month Open Assistant launched a model and a dataset for alignment via RLHF [108].

Overall, the landscape of LLaMA variants looked somewhat like this, even though the figure doesn’t show all of them:

https://static.premai.io/book/models_llama-variants.png

Fig. 31 Page 10, A Survey of Large Language Models [90]#

A month later, WizardLM dropped, gaining a lot of popularity mainly due to its ground-breaking performance compared to other open LLMs. Within the next few days the community produced an open reproduction of LLaMA, named OpenLLaMA.

WizardLM#

WizardLM was created by fine-tuning LLaMA on a generated instruction dataset produced by Evol-Instruct [109].
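Evol-Instruct works by repeatedly asking a strong LLM to rewrite an instruction into a harder one, then collecting responses to the evolved instructions. A rough sketch of the in-depth evolution loop (the prompt wording is illustrative, not the paper's exact template, and `complete` stands for any text-completion callable you supply):

```python
# Rough sketch of Evol-Instruct's "in-depth evolution" loop: repeatedly ask an LLM
# to rewrite an instruction into a harder one. The prompt wording is illustrative,
# not the exact template from the paper; `complete` is any text-completion callable.
from typing import Callable

EVOLVE_TEMPLATE = (
    "Rewrite the following instruction so that it is harder to answer, for example "
    "by adding constraints, requiring more reasoning steps, or asking for a concrete "
    "example input. Keep it answerable by a human.\n\n"
    "Instruction: {instruction}\n\nRewritten instruction:"
)

def evolve(instruction: str, complete: Callable[[str], str], rounds: int = 3) -> list[str]:
    """Return the original instruction plus `rounds` successively harder rewrites."""
    generations = [instruction]
    for _ in range(rounds):
        prompt = EVOLVE_TEMPLATE.format(instruction=generations[-1])
        generations.append(complete(prompt).strip())
    return generations

# Usage: pass any LLM call, e.g. a wrapper around an OpenAI or local-model client.
# evolved = evolve("Write a function that reverses a string.", complete=my_llm_call)
```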

Uniqueness#
Limitations#

OpenLLaMA#

Students at UC Berkeley started the OpenLM Research group, through which, in collaboration with Stability AI, they trained and released OpenLLaMA v1, a permissively licensed open-source reproduction of Meta AI’s LLaMA. They released a series of 3B, 7B and 13B models trained on different mixes of datasets, and the released weights can serve as a drop-in replacement for LLaMA.

Uniqueness#
Limitations#

Around the same time, MosaicML released its MPT model series, and TII also released the Falcon models.

MPT#

MosaicML released the MPT (MosaicML Pretrained Transformer) model series, consisting of:

Uniqueness#
Limitations#

Falcon#

TII released the Falcon series of 40B, 7.5B and 1.3B parameter LLMs, trained on their open-sourced and curated RefinedWeb dataset. After its release it dominated Hugging Face’s Open LLM Leaderboard as the state-of-the-art open-source LLM for more than 2 months.

Uniqueness#
Limitations#

LLaMA-2#

On 18th July, Meta AI released LLaMA-2, breaking most SotA records for open-source LLM performance.

Meta AI released facebookresearch/llama with both pre-trained and fine-tuned variants in a series of 7B, 13B and 70B parameter sizes.

Below are some win-rate graphs for LLaMA-2 from evaluation comparisons against popular LLMs, in which it roughly ties with GPT-3.5 and performs noticeably better than Falcon, MPT and Vicuna.

https://static.premai.io/book/models_llama2-rates.png

Fig. 33 Page 3, LLaMA 2: Open Foundations and Fine-Tuned Chat Models [114]#

Uniqueness#
Limitations#

So far we have mostly been looking at LLMs in general rather than other models; let’s look at the vision domain now.

Stable Diffusion XL#

StabilityAI released the Stable Diffusion XL 1.0 (SDXL) models on 26th July, currently the state of the art among open-source text-to-image and image-to-image generation models. They released a base model and a refinement model, the latter used to improve the visual fidelity of samples generated by SDXL.

A few months earlier they had released the Stable Diffusion XL [117] base and refinement models versioned as 0.9, under a licence permitting research-purpose usage only.
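A sketch of the two-stage base + refiner setup with the `diffusers` library (checkpoint ids and settings are illustrative): the base model produces latents, which the refiner then denoises further to improve fidelity.

```python
# Sketch: SDXL two-stage generation with diffusers — the base model produces latents,
# the refiner improves their visual fidelity. Checkpoint ids/settings are illustrative.
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share components to save VRAM
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a close-up portrait of a red fox in autumn, photorealistic"
latents = base(prompt=prompt, output_type="latent").images   # stage 1: base model
image = refiner(prompt=prompt, image=latents).images[0]      # stage 2: refiner
image.save("fox.png")
```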

SDXL consistently surpasses all previous versions of Stable Diffusion models by a significant margin:

https://static.premai.io/book/models_sdxl-winrate.png

Fig. 38 SDXL Winrate#

Uniqueness#
Limitations#

In the image-generation domain, Midjourney is currently one of the most popular proprietary solutions for everyday users.

Following the timeline and going back to the text domain, coder models are gaining a lot of popularity too, especially given the code-generation and code-analysis capabilities of OpenAI’s Codex and GPT-4. There have been several releases of code LLMs, such as WizardCoder [118], StarCoder, Code LLaMA (state of the art) and many more.

Code LLaMA#

The Code LLaMA release by Meta AI (about 1.5 months after LLaMA 2’s release) caught a lot of attention for being fully open source, and currently its fine-tuned variants are state of the art among open-source coder models.

Uniqueness#
Limitations#

Persimmon 8B#

Persimmon 8B is a standard decoder-only transformer model released under an Apache-2.0 license. Both code and weights are available at persimmon-ai-labs/adept-inference.

Uniqueness#
Limitations#

Mistral 7B#

Mistral 7B was released by Mistral AI, a French startup which recently raised a sizeable seed round. The team comprises ex-DeepMind and ex-Meta researchers who worked on the LLaMA, Flamingo [122] and Chinchilla projects.

Uniqueness#
Limitations#

3.Comparisons#

Here we review the characteristics of popular models in the text and vision domains.

Comparing large language models against a single ground truth is a very difficult task, and comparing vision models is even harder. Alongside generalisation ability, it is very important to watch for racial, gender, religious and other biases a model may carry. There are many popular leaderboards for tracking the aggregate or task-specific performance of these models, based on community-curated evaluation datasets/tasks that measure a variety of capabilities.

Our current comparison approach involves evaluating each model on each dataset and averaging the scores across datasets. Combined with human evaluation and GPT-4-based comparisons, this gives a relatively trustworthy score for tracking the current best models. However, the current approach is still insufficient: even pillar models like GPT-4 have issues, and it is hard to determine how much data similar to the evaluation sets was actually part of the training sets.

3.1.Language#

The Open LLM Leaderboard shows that Falcon 180B is currently just ahead of Meta’s LLaMA-2 70B, and TII claims it ranks just behind OpenAI’s GPT-4 and performs on par with Google’s PaLM-2 Large (which powers Bard) despite being half that model’s size. However, it required 4x more compute to train and is 2.5 times larger than LLaMA-2, which makes it not very cost-effective for commercial usage.

For practical commercial usage, models below 14B parameters have been good candidates, and Mistral 7B, LLaMA-2 7B and Persimmon 8B do a great job of demonstrating that.

Overall, let’s take a look at the attributes of the LLMs discussed above to get the bigger picture.

Table 5 Under 15 Billion Parameters#

| LLM | Params [B] | Dataset | Release Details | Tokens [B] | VRAM [GB] | Licence | Commercial Usage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mistral 7B | 7.3 | - | Blog | - | 17+ | Apache-2.0 | Yes |
| LLaMA-2 13B | 13 | - | [114] | 2000 | 29+ | LLaMA-2 | Yes |
| LLaMA-2 7B | 7 | - | [114] | 2000 | 15.8+ | LLaMA-2 | Yes |
| Persimmon 8B | 9.3 | - | Blog | 737 | 20.8+ | Apache-2.0 | Yes |
| WizardLM 13B | 13 | evol-instruct | [109] | ~2000 | 30+ | LLaMA-2 | Yes |
| WizardLM 7B | 7 | evol-instruct | [109] | ~2000 | 15.8+ | Non-Commercial | No |
| Falcon 7B | 7 | RefinedWeb (partial) | - | 1500 | 16+ | Apache-2.0 | Yes |
| MPT 7B | 6.7 | RedPajama | Blog | 1000 | 15.5+ | Apache-2.0 | Yes |
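The VRAM column above roughly tracks what it takes to hold the weights in 16-bit precision plus some overhead for activations and buffers. A back-of-the-envelope sketch of that estimate (the 10% overhead factor is an assumption, not a measured value):

```python
# Back-of-the-envelope VRAM estimate: parameters x bytes-per-parameter, plus a rough
# overhead factor for activations/buffers. The 10% overhead is an assumption.
def vram_gb(params_billion: float, bytes_per_param: int = 2, overhead: float = 0.10) -> float:
    return params_billion * bytes_per_param * (1 + overhead)

for name, params in [("Mistral 7B", 7.3), ("LLaMA-2 13B", 13), ("Persimmon 8B", 9.3)]:
    print(f"{name}: ~{vram_gb(params):.1f} GB at fp16, ~{vram_gb(params, 1):.1f} GB at int8")
```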

3.2.Vision#

StabilityAI’s SDXL vs Midjourney comparison shows that it is on par in terms of user favourability.

https://static.premai.io/book/models_sdxl-midjourney.png

Fig. 44 Page 14, SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis [117]#

Note

The above experiment is against Midjourney v5.1, whereas the current latest version is Midjourney v5.2.

Future#

To recap current advancements, we can see that a few key moments were:

Even though Open Source AI is advancing, it is evident that it remains heavily regulated by major corporations such as Meta, OpenAI, Nvidia, Google, Microsoft, and others. These entities often control critical parameters, creating a myth of open source AI [125], including:

Returning to the actual state of things, there are significant gaps that need to be addressed to achieve true progress in the development of intelligent models. For instance, recent analyses have revealed limited generalisation capabilities [126]: current LLMs learn things in the specific direction of an input context window of an occurrence and may not generalise when asked in other directions.

The rise of MoE (Mixture-of-Experts) models has attracted attention and research interest, especially after rumours about GPT-4's architecture. The open-source community has already made progress implementing various MoE variants (e.g. XueFuzhao/OpenMoE), showing the direction of travel towards more versatile model architectures.
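A minimal sketch of the core MoE idea in PyTorch: a learned router sends each token to its top-k expert feed-forward networks and mixes their outputs. Dimensions and top-k are illustrative; real implementations also add load-balancing losses, capacity limits and expert parallelism.

```python
# Minimal Mixture-of-Experts layer sketch: a router picks the top-k expert MLPs per
# token and combines their outputs. Sizes/top-k are illustrative; production MoE layers
# also add load-balancing losses, capacity limits and parallelism.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int, hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Route each token to its top-k experts.
        scores = F.softmax(self.router(x), dim=-1)             # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalise per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)            # 10 tokens with hidden size 64
print(MoELayer(64, 256)(tokens).shape)  # torch.Size([10, 64])
```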

On the other hand, various applications using quantised versions of models are emerging, since quantisation makes it possible to run large models (>30B parameters) at low precision, even on CPU-only machines. A lot of contributions in this area are coming from the ggerganov/ggml community and TheBloke.
