Please suggest some good self-hostable RAG for my LLM.

Maroon@lemmy.world · 2 years ago

Please suggest some good self-hostable RAG for my LLM.

chagall@lemmy.world · 2 years ago

You should ask @brucethemoose@lemmy.world. He seems to know all about this stuff.

brucethemoose@lemmy.world · 2 years ago

> I have an old Lenovo laptop with an NVIDIA graphics card.

@Maroon@lemmy.world The biggest question I have for you is what graphics card, but generally speaking this is… less than ideal.

To answer your question, Open Web UI is the new hotness: https://github.com/open-webui/open-webui

I personally use exui for a lot of my LLM work, but that’s because I’m an uber minimalist.

And on your setup, I would host the best model you can on kobold.cpp or the built-in llama.cpp server (just not Ollama) and use Open Web UI as your front end. You can also use llama.cpp to host an embeddings model for RAG, if you wish.
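For the embeddings part, here is a rough sketch of what talking to a local llama.cpp server could look like, assuming it was started with its embeddings option enabled and exposes an OpenAI-style `/v1/embeddings` route. The host, port, path, and model name are placeholders, not something from this thread, so check your build's server docs before relying on them.

```python
import json
import urllib.request

# Assumed address of a local llama.cpp server with embeddings enabled;
# host, port, and path are placeholders, adjust them to your setup.
EMBEDDINGS_URL = "http://localhost:8080/v1/embeddings"


def build_embedding_request(texts, model="local-embedding-model"):
    """Build an OpenAI-style embeddings payload (the model name is a placeholder)."""
    return {"model": model, "input": list(texts)}


def parse_embedding_response(body):
    """Extract vectors from an OpenAI-style response, ordered by their 'index' field."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, url=EMBEDDINGS_URL):
    """POST the texts to the server and return one vector per input text."""
    payload = json.dumps(build_embedding_request(texts)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_embedding_response(json.load(response))
```

Calling `embed(["some chunk of a document"])` should then give back one vector per chunk, which is what a front end or a vector store needs for retrieval.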

This is a general ranking of the “best” models for document answering and summarization: https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard

…But generally, I prefer to not mess with RAG retrieval and just slap the context I want into the LLM myself, and for this, the performance of your machine is kind of critical (depending on just how much “context” you want it to cover). I know this is !selfhosted, but once you get your setup dialed in, you may consider making calls to an API like Groq, Cerebras or whatever, or even renting a Runpod GPU instance if that’s in your time/money budget.
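A minimal sketch of the "slap the context in yourself" approach, assuming the documents are already plain text. The function name, the prompt wording, and the character budget are all made up for illustration; a real setup would count tokens with the model's own tokenizer instead of characters.

```python
def build_prompt(question, documents, max_chars=12000):
    """Paste whole documents into one prompt until a rough character budget is hit.

    max_chars is a crude stand-in for the model's context window.
    documents is a list of (title, text) pairs.
    """
    parts = []
    used = 0
    for title, text in documents:
        block = f"### {title}\n{text.strip()}\n"
        if used + len(block) > max_chars:
            break  # stop before overflowing the budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer using only the documents below.\n\n"
        f"{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [("notes.txt", "The server runs on port 8080.")]
    print(build_prompt("Which port does the server use?", docs))
```

The longer the pasted context, the more memory and prompt-processing time the model needs, which is why machine performance matters so much here.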

Zelyios@lemmy.world · 2 years ago

You can use h2oGPT, which lets you build a RAG setup over documents you choose without writing any code.

Antiochus@lemmy.one · 2 years ago

I’m not sure how well it would work in a self-hosted or server-type context, but GPT4All has built-in RAG functionality. There’s also a flatpak in addition to the Windows, Mac, and .deb installs.

BaroqueInMind@lemmy.one · 2 years ago

Why not use this and pick whichever LLM you want for the RAG side? It lets you self-host and choose any model for both chat and RAG analysis. I have it set to Hermes3 8B for chat and a 1.3B Llama3 for RAG.

BitSound@lemmy.world · 2 years ago

Not sure how Ollama integration works in general, but these are two good libraries for the vector-search side of RAG:

https://github.com/facebookresearch/faiss

https://pypi.org/project/chromadb/
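To show what those libraries are doing at their core, here is a plain-Python sketch of nearest-neighbour search over embedding vectors. This is not the FAISS or Chroma API, just a brute-force stand-in with made-up function names; the real libraries add indexing, persistence, and speed on top of the same idea.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In a RAG pipeline, the indices returned by `top_k` would point back at the document chunks whose text gets pasted into the prompt.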

filister@lemmy.world · 2 years ago

Why don’t you build your own?