Large Language Model Agents

About six to nine months ago a lot of you were hot on self-hosting LLMs. I wasn't really keeping up, but I tried by installing ollama and open-webui on hardware that I already owned.

I'm currently working on a large language model "agent", and I'm playing catch-up on that subject too. From what I can gather, you can write a web UI that lets users submit their question to the LLM - nothing new so far.

The “agent” aspect of all of this has two parts…

  • this intervening web app has a bunch of methods that it can call on behalf of the upstream LLM, and
  • a “system” message is included at the beginning of the whole engagement detailing how those methods can be used on the LLM’s behalf to achieve a result more informed than what the LLM would have been able to achieve by itself.

So, that initial question from the user sets in motion a back and forth between the LLM and your agent. The LLM can indicate in its response that it would like some data retrieved from your database (for example), and it can incorporate the results in its subsequent answers. What then plays out is a kind of "tennis match" between the LLM and the agent as they try to fulfil the user's request.
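To make that "tennis match" concrete, here's a minimal sketch in Python. The model is faked with a stub so the loop actually runs; in practice `fake_llm` would be a call to Ollama or a cloud API, and the message/tool format shown here is only illustrative, not any particular vendor's protocol:

```python
import json

def get_customer_count(region):
    """Pretend database query that the agent can run for the LLM."""
    return {"NSW": 42, "VIC": 37}.get(region, 0)

TOOLS = {"get_customer_count": get_customer_count}

def fake_llm(messages):
    """Stand-in for a real model: first turn it asks for data,
    second turn it answers using the tool result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_customer_count", "args": {"region": "NSW"}}
    data = json.loads([m for m in messages if m["role"] == "tool"][-1]["content"])
    return {"answer": f"You have {data['result']} customers in NSW."}

def run_agent(question):
    messages = [{"role": "system", "content": "You may request tools."},
                {"role": "user", "content": question}]
    while True:  # the back-and-forth "tennis match"
        reply = fake_llm(messages)
        if "answer" in reply:                      # the model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # agent runs the tool
        messages.append({"role": "tool",
                         "content": json.dumps({"result": result})})

print(run_agent("How many customers do we have in NSW?"))
```

The point is just the shape of the loop: the agent keeps relaying tool results back into the conversation until the model stops asking for more.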

So, while I'm wrestling with all of this unfamiliar stuff, I thought I'd put it to you guys. Do any of you have experience with this kind of thing?

I know that @techman went all in on building his own self-hosted LLM. I think you got it to the point where it could get an answer back to you quickly enough that you had little need for the cloud-hosted alternatives. My machine has 32 GB of RAM and an old GPU with 4 GB. It's been okay for testing, but the models that I can host on it fall short of what I've seen Gemini and ChatGPT achieve. I feel silly saying this, but those models just seem "smarter". The models that I can run on my old gear do not pass the Turing test IMO, whereas the cloud-hosted ones do.

Would you say that it's time for me to trade one of my cars in for a computer that can do what I'm trying to do, @techman?

1 Like

I wouldn't, because as you've already deduced, any small LLM, say a 70B model, has a lot less training than the huge models online, and as such starts hallucinating earlier, or just doesn't know the answer.

I still use my main in-house model, "michaelneale/deepseek-r1-goose:70b" (42 GB), to bounce ideas off. It's a great partner to discuss designs with when I don't have a kick-arse engineer buddy handy.

And I don’t have any kick-arse engineer buddies anyway.

But it can't solve things I can't, and it only knows trivial stuff I've forgotten or never bothered to learn. Still, the old truism "two heads are better than one" holds true in my case.

When I really need to talk to an eight-armed, sword-whirling dervish monster of an AI about something hard, I use an online instance of Kimi-K2, because it's awesome, one of the few models that knows Forth, and dirt cheap. A typical effort costs me around 2 cents, where Claude would cost 50 cents or more.

I always use Kimi-K2 via 'Aider', an agentic editor that keeps the AI muzzled and away from my PC except where I give it specific permission to access a file, because it's bad mannered and can do all kinds of damage. Any file I give it permission to write is versioned, so I can restore it if needed and run a 'diff' to see what really happened.

I use Kimi-K2 via 'Aider' through 'OpenRouter', where I maintain a $10 credit. I'm down to $8 after 3 months.

So would I spend money on two expensive GPUs again? No. I'd still buy one and dual-use it as a video card, and then still have "michaelneale/deepseek-r1-goose:70b", which has replaced Google and is indispensable in many ways.

As for obtaining a kick-arse GPU, I'd wait 6 months, because I suspect we will all be drowning in dirt-cheap H20s as Nvidia and everyone else dump them when the AI bubble bursts and the market is flooded with second-hand units. I'd buy 4 of them then at $100 each: one to use and three spares. So start saving :slight_smile:

P.S. I never use CrapGPT, as I find it very weak; it's probably powered by a bunch of mice running inside a cage in Sam Altman's office.

Cheers,

Terry

1 Like

Not directly answering your question (as I have no experience with anything agentic, yet), but I do want to have a play with smolagents when I get my current “offline homework” wrapped up and can come back to the homelab. I believe it’ll work just fine with Ollama and if anyone is going to have something expert level scripted up in a half day using smolagents, it’ll be you, @jdownie :joy:.

Also on my todo list is to check out some of the MiniMax models. I’ve seen some odds and ends online about their models and understand that they have some agent specific stuff as well (minimax.io / MiniMax Agent). Paging @skalyan to the thread in case he’s got experience with agents or has experimented with the MiniMax model(s).

Honestly, buy a second hand Apple Silicon MacBook or Mini with as much RAM as you can. I am astonished at how well they perform when up against my desktop PCs when performing LLM type tasks!

1 Like

Hello all,
Apologies for not being at the meetings but I can definitely chime in here.
So agentic LLMs / AI are LLMs that can use tools to obtain information before answering the user. Think of it as an internal monologue for an LLM: "OK, what are my steps to answer this? I need database information; do I have a tool for that? Yes, so I'll use that tool and get the information. Now, with that information, can I answer the user's question?" I've played a fair bit with these using N8N, but OpenWeb UI does allow agent tools, I believe.
Is it worth self-hosting agentic LLMs? Personally, no. They are complex behemoths, and your electricity bill may cost more than using one through OpenRouter.
N8N lets you host the logic for the tools yourself. For example, you could use OpenAI's ChatGPT models, but the tool authentication and usage live on your self-hosted instance rather than on someone else's MCP server (MCP, the Model Context Protocol, is a standard for servers that expose tools to agentic LLMs). You can also use Docker to spin up MCP servers from different providers and point your agentic LLMs at those. Ollama also recently introduced cloud-hosted models: Ollama hosts the LLM on their cloud, but the Ollama install on your machine is the entry point to it.
Summary: there are so many ways to utilise agentic LLMs; enjoy the ride of learning what works for you.
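That internal "do I have a tool for that?" step is, at heart, a registry lookup with a graceful fallback. A minimal sketch in Python (tool names here are made up; frameworks like N8N or OpenWeb UI handle this dispatch for you):

```python
def fetch_orders(customer_id):
    """Stand-in for a database lookup tool."""
    return [{"id": 1, "customer": customer_id}]

# The agent's "toolbox": names the LLM may ask for, mapped to callables.
TOOL_REGISTRY = {"fetch_orders": fetch_orders}

def handle_tool_request(name, args):
    # "Do I have a tool for that?" - look it up, or fall back gracefully
    # so the model can be told the tool doesn't exist.
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return {"error": f"no tool named {name!r}"}
    return {"result": tool(**args)}

handle_tool_request("fetch_orders", {"customer_id": 7})
handle_tool_request("summon_unicorn", {})  # unknown tool -> error reply
```

Returning a structured error (rather than raising) matters: the error message goes back into the conversation, so the model knows the tool wasn't available and can try something else.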

:thinking:

I'm writing a class in Python that contains all of the @staticmethods that I want to make available to the LLM. I explain how to use that class with docstrings, and then include pydoc's output in my initial system message to the LLM. That's how I'm setting out on this journey. So far I've only implemented one method: pi(n), which returns pi to n decimal places. Its answer is deliberately wrong, because I want to be able to see when this whole agent thing is ignoring my provided tools and answering the question with its own logic. This approach is what revealed my self-hosted models to be insufficient for my needs.
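Roughly what I mean, sketched with `inspect` instead of `pydoc` (the class name, docstring format, and the deliberately wrong `pi` are just placeholders for the idea, not my exact code):

```python
import inspect

class AgentTools:
    """Methods the agent can run on the LLM's behalf."""

    @staticmethod
    def pi(n):
        """Return pi to n decimal places as a string."""
        # Deliberately wrong, so I can tell when the LLM ignores
        # the tool and answers from its own training instead.
        return "3." + "9" * n

def build_system_message(cls):
    """Collect each method's signature and docstring into a tool
    manifest for the initial system message."""
    lines = ["You may call these tools:"]
    for name, fn in inspect.getmembers(cls, inspect.isfunction):
        lines.append(f"- {name}{inspect.signature(fn)}: {inspect.getdoc(fn)}")
    return "\n".join(lines)

print(build_system_message(AgentTools))
```

If the model's answer contains the real digits of pi rather than the nines, I know it bypassed my tool.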

You (@shirbo) made reference to N8N. Have you done anything similar to what I'm attempting, using N8N instead of my Python approach? Have you implemented your own routines and made them available to a cloud-hosted LLM?

Wow, that's a clever way to do it! Wait till you get a coding LLM to write its own tools (which would be plausible).

I haven't made a public MCP server for a cloud LLM, as I haven't researched the authentication component enough to know my data would be safe. However, using the Agent node in N8N and hooking it up to a home-built MCP server is doable; I just haven't done it.

I have an N8N agent node with a Google Calendar tool to read my calendar or add to it, plus an HTTP request tool to get travel times between locations via the Google API. The agent can then check whether a calendar event exists and, if not, calculate the travel time to the destination and add it to my calendar.

I can't take credit for this Python approach. I'm helping out a guy at work who has done heaps of stuff using streamlit, and I'm playing catch-up. I'm not in love with streamlit, but what he's achieved with it is very impressive. I'm trying to write my own front end to get streamlit out of my way, but I'm stalled troubleshooting websockets (which I'm also unfamiliar with). I'm learning a lot of stuff :slight_smile: