"Are you using ollama for the local hosting?"

Reporting in.
Been nerding out hard tonight.
I've been researching locally hosted LLMs and trying them on my Nvidia 4070 for years now, and only today was I actually impressed.
New open source models like Alibaba's Qwen3 pack a huge punch for their size and run great on limited hardware.
There are also Unsloth re-quantizations of these models that substantially increase performance and let you pack a bigger model into the GPU.
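If anyone wants to try the same thing, here's a rough sketch of pulling one of those quants through Ollama's Python client and throwing a prompt at it. The Hugging Face repo and quant tag below are just an example of the naming scheme, so pick whatever actually fits your VRAM:

```python
# Rough sketch: pull an Unsloth-quantized Qwen3 GGUF through Ollama and test it.
# Requires a running local Ollama server and `pip install ollama`.
# The repo/tag below is illustrative; check Hugging Face for the quant you want.
import ollama

MODEL = "hf.co/unsloth/Qwen3-8B-GGUF:Q4_K_M"  # example name, not gospel

ollama.pull(MODEL)  # downloads the GGUF into Ollama's local model store

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Write a docstring for an LRU cache class."}],
)
print(response["message"]["content"])
```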
Combine that with an 'agentic' VS Code plugin (continue.dev) connected to the local LLM and I can chat with my source code and gradually compose it by telling the AI what to do.
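For reference, pointing continue.dev at the local model is just a config entry. This is roughly what mine looks like in the older config.json format (from memory, so double-check the continue.dev docs; newer versions use a config.yaml), with the model tag swapped for whatever you pulled:

```json
{
  "models": [
    {
      "title": "Qwen3 (local via Ollama)",
      "provider": "ollama",
      "model": "qwen3:8b"
    }
  ]
}
```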
[Attachment 369925: screenshot of the AI-written test]
The test for memory cache engines in that screenshot is about 95% written by the AI, and it produced the code a little faster than I would have done by hand.
It was completely accurate and the LLM required minimal steering on my part.
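For anyone who can't view the attachment, this is the kind of thing I mean: an ordinary unit test along these lines. (Hypothetical illustration with a made-up SimpleLRUCache, not the actual code from the screenshot.)

```python
# Hypothetical illustration only: a small LRU cache plus the sort of pytest-style
# test the model generated for me. Not the actual code from the attachment.
from collections import OrderedDict


class SimpleLRUCache:
    """Minimal least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def test_evicts_least_recently_used_entry():
    cache = SimpleLRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")        # "a" is now the most recently used
    cache.put("c", 3)     # capacity exceeded, so "b" should be evicted
    assert cache.get("b") is None
    assert cache.get("a") == 1
    assert cache.get("c") == 3
```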
I can't imagine what kind of model you could run on an Nvidia 5090; as parameter count increases, so do intelligence and accuracy. With the right hardware, these open-source models can start to compete with what the big companies are selling.
So basically yes, I'd say we're past the threshold where these tools are useful!
Do you have equipment to measure power draw during longer requests?
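If not, you don't necessarily need a wall meter for a rough number: the card reports its own board power through NVML. Something like this sketch (using the nvidia-ml-py/pynvml bindings) can log draw while a long request runs:

```python
# Sketch: poll the GPU's self-reported board power draw once per second.
# Needs the NVML Python bindings: pip install nvidia-ml-py (imports as pynvml).
# This reads what the driver reports, not wall power, so treat it as an estimate.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

try:
    while True:
        milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)
        print(f"{milliwatts / 1000.0:.1f} W")
        time.sleep(1)  # sample interval
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()
```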