About six to nine months ago a lot of you were hot on self hosting LLMs. I wasn’t really keeping up, but i tried by installing ollama and open-webui on hardware that i already owned.
I’m currently working on a large language model “agent”. I’m playing catch up on that subject too. From what i can gather, you can write a web ui that lets users submit their question to the LLM - nothing new so far.
The “agent” aspect of all of this has two parts…
- this intervening web app has a bunch of methods that it can call on behalf of the upstream LLM, and
- a “system” message is included at the beginning of the whole engagement detailing how those methods can be used on the LLM’s behalf to achieve a result more informed than what the LLM would have been able to achieve by itself.
So, that initial question froom the user sets in motion a back and forth between the llm and your agent. The LLM can include in it’s response that it would like some data retrieved from your database (for example), and the LLM can in corporate the results in it’s subsequent answers. What then plays out is a kind of “tennis match” of back and forth between the LLM and the agent trying to fulfil the user’s request.
So, while i’m wrestling with all of this unfamiliar stuff, i thought i’m put all of this to you guys. Do any of you have any experience with this kind of thing?
I know that @techman went all in on building his own self hosted LLM. I think you got it to the point where it could get an answer back to you quickly enough that you had little need to use cloud hosted alternatives. My machine has 32G of RAM and an old GPU with 4G. It’s been okay for testing, but the models that i can host on it fall short of what i’ve seen Gemini and ChatGPT achieve. I feel silly saying this, but those models just seemed “smarter”. The models that i can run on my old gear do not pass the turing test IMO, whereas the cloud hosted ones do.
Would you say that it’s time for me to trade one of my cars in for a computer that can do what i’m trying to do @techman ?