LM Studio has quickly become the best way to run local LLMs on an Apple Silicon Mac: no offense to vLLM, Ollama, and other terminal-based approaches, but LLMs have many levers for tweaking output, and sometimes you need a UI to manage them. Now that LM Studio supports MLX models, it's one of the most efficient, too.
I'm not bullish on MCP, but at the least this approach gives a good way to experiment with it for free.
This isn't true. You can `ollama run {model}`, then `/set parameter num_ctx {ctx}`, then `/save`. It's recommended to `/save {model}:{ctx}` so the setting persists across model updates.
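Concretely, the session looks something like this (the model name and context size here are placeholders, not recommendations):

```
$ ollama run llama3.1
>>> /set parameter num_ctx 16384
>>> /save llama3.1:16k
>>> /bye
```

Saving under a new tag (`llama3.1:16k`) means a later `ollama pull llama3.1` won't clobber your customized context length.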
As of two weeks ago, if I did this it would reset the moment Cline made an API call, but LM Studio would work correctly. I'll have to try again. I even confirmed Cline was not overriding the num_ctx setting.
I just wish they'd give the UI a facelift. Right now it's too colorful for me, with many different shades of similar colors. I wish they'd copy a color palette from Google AI Studio, Trae, or PyCharm.
As near as I can tell, it's supposed to make calling other people's tools easier. But I don't want to spin up an entire server to invoke a calculator. So far it seems to make building my own local tools harder, unless there's some guidebook I'm missing.
You're not spinning up a whole server, lol. Most MCPs can be run locally and talked to over stdio; they're just apps the LLM can call, and what they talk to or do is up to the MCP author. It's easier to have an MCP that communicates what it can do and handles the back-and-forth than to write non-standard middleware for, say, calls to an API, or for AppleScript, or VMware, or something else...
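To make the "just an app talked to over stdio" point concrete, here's a stripped-down sketch. The method names (`tools/list`, `tools/call`) and the response shape follow the MCP spec, but the `TOOLS` dict and `handle_line` function are illustrative, not a real SDK; an actual server would use the official MCP SDK rather than hand-rolling JSON-RPC like this.

```python
import json

# Hypothetical sketch: an stdio MCP server is just a process that reads
# JSON-RPC requests line by line on stdin and writes responses to stdout.
TOOLS = {
    "add": lambda a, b: a + b,  # the "calculator" from the comment above
}

def handle_line(line: str) -> str:
    req = json.loads(line)
    if req["method"] == "tools/list":
        # Advertise what this server can do.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif req["method"] == "tools/call":
        # Dispatch to a plain local function.
        tool = TOOLS[req["params"]["name"]]
        value = tool(**req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        result = {}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# A real server would loop over stdin, which the host (LM Studio, an IDE
# plugin, etc.) owns:
#   for line in sys.stdin:
#       print(handle_line(line), flush=True)
```

There's no HTTP server anywhere in that picture; the host process spawns the tool and pipes messages to it.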
I wish the documentation was clearer on that point; I went looking through their site and didn't see any examples that weren't oversimplified REST API calls. I imagine they might have updated it since then, or I missed something.
It's a protocol that doesn't dictate how you are calling the tool. You can use in-memory transport without needing to spin up a server. Your tool can just be a function, but with the flexibility of serving to other clients.
Are there any examples of that? All the documentation I saw seemed to be about building an MCP server, with very little about connecting an existing inference infrastructure to local functions.
If you go to huggingface.co, you can tell it what specs you have and when you go to a model, it'll show you what variations of that model are likely to run well.
So if you go to this[0] random model, on the right there is a list of quantizations by bit width, and the ones you can run will be shown in green.
LM Studio's model search is pretty good at showing what models will fit in your VRAM.
For my 16gb of VRAM, those models do not include anything that's good at coding, even when I provide the API documents via PDF upload (another thing that LM Studio makes easy).
So, not really, but LM Studio at least makes it easier to find that out.