NingG +

AI 系列:Desktop Apps

原文:Desktop Apps

ChatGPT 和 GPT-4 在过去半年内席卷了人工智能领域,但开源模型也在迎头赶上。要达到 OpenAI 模型的性能水平,还有很多工作要做。在许多情况下,与部署在云服务器上的大型语言模型相比,ChatGPT 和 GPT-4 显然更胜一筹,因为每个 OpenAI API 请求的成本相对较低,而云服务(如 AWS、Azure 和 Google Cloud)上的模型托管成本较高。但对于某些业务案例,开源模型始终比像 ChatGPT/GPT-4 这样的封闭 API 更有价值。像法律、医疗保健、金融等行业的人对数据和客户隐私有所顾虑。

一个新而令人兴奋的领域是支持在本地运行强大语言模型的桌面应用程序(desktop apps)。可以说,成功的桌面应用程序,在某些敏感情况下比基于云的服务更有用。这是因为数据、模型和应用程序都可以在通常可用的硬件上本地运行。本文,我会介绍一些新兴的 LLM 桌面应用解决方案,包括不同应用的优势、局限性以及差异点的比较。

Table 13 Comparison of Desktop Apps#

Desktop App Supported Models GPU support Layout Configuration Extra Features OS Future Roadmap
LM Studio 🟡 GGML 🟢 Yes Clean, clear tabs. Hardware config choices (GPU, RAM, etc.). Can choose multiple inference params (temperature, repeat penalty, etc.). Local server deployments Windows, Linux, MacOS Not mentioned
GPT4All 🟡 GGML 🔴 No Unclear tabs. Minimal hardware config options. Can choose inference params. Contribute & use training data from the GPT4All datalake Windows, Linux, MacOS Building open-source datalake for future model training
koboldcpp 🟡 GGML 🔴 No Cluttered UI. Some hardware config options. Unique inference/app params e.g. scenarios. Cool story, character, and adventure modes Windows, Linux, MacOS Not mentioned 🟡 GGML 🔴 No Clear tabs. Minimal hardware config options. Can choose inference params. Light/dark modes Windows, Linux, MacOS Text-to-audio, OpenAI functions
Ollama 🔴 few GGML models 🟡 Yes (metal) Basic, terminal-based UI. Multiple hardware configurations, need to save as a file prior to running. Multiple inference params, need to save as a file. Run from terminal MacOS Windows, Linux support

LM Studio#

LM Studio is an app to run LLMs locally.

UI and Chat#

LM Studio is a desktop application supported for Windows and Mac OS that gives us the flexibility to run LLMs on our PC. You can download any ggml model from the HuggingFace models hub and run the model on the prompts given by the user.

The UI is pretty neat and well contained:

Fig. 67 LM Studio UI#

There’s a search bar that can be used to search for models from the HuggingFace models to power the chat.

Fig. 68 LM Studio Model Search#

The Chat UI component is similar to ChatGPT to have conversations between the user and the assistant.

Fig. 69 LM Studio Chat Interface#

This is how the TheBloke/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q5_K_S.bin responds to a simple conversation starter.

Fig. 70 LM Studio Chat Example#

Local Server#

One useful aspect is the ability to build a Python or Node.js application based on an underlying LLM.

Fig. 71 LM Studio Local Server#

This enables the user to build applications that are powered by LLMs and using ggml models from the HUggingFace model library (without API key restrictions).

Think of this server like a place where you make API calls to and get the response. The only change is that this is a local server and not a cloud based server. This makes it quite exciting to use the hardware in your system to power the LLM application that you are building.

Let’s spin up the server by hitting the Start server button🎉. That was a quick one and by default it is served in port 1234 and if you want to make use of some other port then you can edit that left to the Start server button that you pressed earlier. There are also few parameters that you can modify to handle the request but for now let’s leave it as default.

Go to any Python editor of your choice and paste the following code by creating a new .py file.

import openai
\# endpoint:port of your local inference server (in LM Studio)
openai.api\_key\=''  \# empty
prefix \= "### Instruction:\\n"
suffix \= "\\n\### Response:"

def get\_completion(prompt, model\="local model", temperature\=0.0):
    formatted\_prompt \= f"{prefix}{prompt}{suffix}"
    messages \= \[{"role": "user", "content": formatted\_prompt}\]
    print(f'\\nYour prompt: {prompt}\\n')
    response \= openai.ChatCompletion.create(
    return response.choices\[0\].message\["content"\]

prompt \= "Please give me JS code to fetch data from an API server."
response \= get\_completion(prompt, temperature\=0)
print(f"LLM's response:{response}")

This is the code that I ran using the command python3 <filename>.py and the results from server logs and terminal produced are shown below:

Fig. 72 LM Studio Local Server Example#

Model Configurations & Tools#

By default we have a few presets already provided by LM studio but we can tweak them and create a preset of our own to be used elsewhere. The parameters that are modifiable are:

Tools focus on the response and UI of the application. The parameters modifiable are as follows:



The GPT4All homepage states that

GPT4All is an ecosystem to train and deploy powerful and customised large language models that run locally on consumer grade CPUs.

UI and Chat#

The UI for GPT4All is quite basic as compared to LM Studio – but it works fine.

Fig. 73 GPT4All UI#

However, it is less friendly and more clunky/ has a beta feel to it. For one, once I downloaded the LLaMA-2 7B model, I wasn’t able to download any new model even after restarting the app.

Local Server#

Like LM Studio, there is a support for local server in GPT4All. But it took some time to find that this feature exists and was possible only from the documentation. The results seem far better than LM Studio with control over number of tokens and response though it is model dependent. Here’s the code for the same:

import openai
openai.api\_base \= "http://localhost:4891/v1"
openai.api\_key \= ""
\# Set up the prompt and other parameters for the API request
prompt \= "Who is Michael Jordan?"
model \= "Llama-2-7B Chat"
\# Make the API request
response \= openai.Completion.create(
\# Print the generated completion

The response can be found for the example prompt:

Fig. 74 GPT4All UI Example#

Model Configurations & Tools#

As you can see – there is not too much scope for model configuration, and unlike LM Studio – I couldn’t use my GPU here.

Fig. 75 GPT4All UI Model Configuration#


LostRuins/koboldcpp is a fun twist on LLMs – adding game like scenarios and adventures. It supports adding base ggml models as the LLM engine, and spinning stories based on user inputs.

UI and Chat#

The UI is pretty basic – and you get some surprising answers. Here I ask a simple icebreaker question – and you see that it responds that it is a friendly AI that likes to play games.

Fig. 76 koboldcpp UI#


You can also enter different sorts of scenarios and modes.

Fig. 77 koboldcpp Scenarios#

Below is the Julius Caesar scenario!

Fig. 78 koboldcpp Julius Caesar Chat#

Model Configuration and Tools#

Many of the model configurations are similar to the default that is offered. But there are some interesting twists like story mode, adventure mode, and instruct mode.

Fig. 79 koboldcpp Julius Model Configuration#

The App from louisgv/ (not to be confused with LocalAI from mudler/LocalAI) is a simple application for loading LLMs after you manually download a ggml model from online.

UI and Chat#

The UI and chat are pretty basic. One bug that I noticed was that it wasn’t possible to load models from the UI – I had to manually download the model and then use the app.

Fig. 80 UI#

Model Configuration and Tools#

Pretty standard prompt related configurations. It appears there is no GPU.


Ollama is an LLM based conversational chat bot that can be run from a MAC terminal. It is simple to get started. Currently, it is available only for the Mac OS but support for Windows and Linux are coming soon.

UI and Chat#

Neat clean and crisp UI, just >>> in the terminal and you can paste your prompt. The response time will vary according to the model size but responses are mostly acceptable. I tested the LLaMA model which is the most recently supported model and the results were good.

Fig. 81 Ollama Example#

Note: It just takes some time initially for the model to download locally, but later whenever you need to access the model there is no lag in accessing the requested model.

Model Configuration and Tools#

The list of ~20 models can be accessed here.

They are constantly growing and multiple changes have happened quite recently. It can support models ranging from lite to robust models. It also has special support for specific functionality like performing Mathematical calculations. There is a WizardMath model that addresses these use case – read more about this in their official blog published by the Ollama team.


ollama run “model location in the system”

微信公众号 ningg, 联系我


微信搜索: 公众号 ningg, 联系我, 交个朋友.