
Choosing a model for BYOK mode
If you want to use a local LLM with VS Code’s bring-your-own-model system, the first thing you need is a way to host the model. VS Code has no model-hosting mechanism of its own, although a VS Code extension could conceivably offer one in the future. That said, hosting models is complicated enough that the job really calls for a dedicated app.
One easy way to host models is via a product like LM Studio, a convenient GUI for standing up, serving, and managing LLMs on one’s own hardware. The model host does not have to be the same system you run VS Code on, either. It can be on a server box you control, or on a cloud instance.
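Before pointing VS Code at a model host, it can help to confirm the host is reachable and see which models it serves. The following is a minimal sketch, not a definitive recipe. It assumes the host exposes an OpenAI-compatible HTTP API with a /v1/models endpoint, which LM Studio’s local server does, and that it listens on port 1234, LM Studio’s default. The function names, and any host address you pass in, are illustrative.

```python
import json
from urllib.request import urlopen


def models_url(host: str = "localhost", port: int = 1234) -> str:
    """Build the model-listing URL for an OpenAI-compatible server.

    Assumption: port 1234 is LM Studio's default; change it if your
    server is configured differently.
    """
    return f"http://{host}:{port}/v1/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style listing.

    Assumption: the response has the shape {"data": [{"id": ...}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(host: str = "localhost", port: int = 1234,
                   timeout: float = 5.0) -> list[str]:
    """Ask the model host which models it currently serves."""
    with urlopen(models_url(host, port), timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

Calling list_model_ids() checks a server on the same machine. Passing the address of a server box or cloud instance, for example list_model_ids("192.168.1.50"), checks a remote host, provided that host accepts connections from your network.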
The choice of model is also important. Many models are powerful but won’t run well on commodity hardware because they’re simply too big. A good rule of thumb is to choose a model that fits into your GPU’s available VRAM while leaving room for a sizable token context (the larger the context, the better). The model should also be suited to coding and development work. Some models in this vein that fit comfortably into 8GB of VRAM include:

