I am using LangChain with a SQLite database - it works pretty well on a 16GB GPU, but I started running it on a crappy NUC, which also worked, just with poorer results.
The real lightbulb moment is when you realise the ONLY thing a RAG pipeline passes to the LLM is a short string of search results with small chunks of text. This changes it from 'magic' to 'ahh, ok - I need better search results'. With small models you cannot pass a lot of search results (TOP_K=5 is probably the limit), otherwise the small models 'forget context'.
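To make that concrete, here is a minimal sketch of what the retrieval step actually hands the model - a hypothetical helper of my own naming, not a LangChain API:

```python
def build_prompt(question: str, hits: list[str], top_k: int = 5) -> str:
    """All the LLM ever sees: the question plus the top-k retrieved chunks."""
    context = "\n\n".join(hits[:top_k])
    return (
        "Answer using only this context:\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# With a small model, anything past top_k simply never reaches it.
prompt = build_prompt("What is our refund policy?",
                      ["chunk1", "chunk2", "chunk3", "chunk4", "chunk5", "chunk6"])
print(prompt)
```

If the right chunk isn't in those few snippets, no amount of prompting fixes it - hence "I need better search results".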
It is fun trying to get decent results - and it is a rabbithole, next step I am going into is pre-summarising files and folders.
You can modify this; there are settings for
- how much context
- chunk size
We had to do this: the 3 best matches at about 1,000 characters each were far more effective than the default I ran into of 15-20 snippets of 4 sentences each.
We also found a setting for when to cut off and/or start a chunk, and set it to double new lines.
Then we just structured our agentic memory into meaningful chunks with two new lines between each, and it gelled perfectly.
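The idea is simple enough to sketch without LangChain (in LangChain itself this roughly corresponds to a text splitter with `separator="\n\n"`); the helper below is my own illustration, not the actual setting:

```python
def chunk_on_blank_lines(text: str, max_chars: int = 1000) -> list[str]:
    """Split on double new lines (our meaningful-chunk boundaries),
    then pack consecutive pieces into chunks of up to max_chars each."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk rather than split a paragraph mid-thought
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

notes = "first memory\n\nsecond memory\n\n" + "x" * 995
print(chunk_on_blank_lines(notes, max_chars=1000))
```

Because each chunk ends on a paragraph boundary, the snippets the retriever returns are self-contained thoughts instead of arbitrary 4-sentence windows.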
I am working on a local RAG LLM designed for lower-end PCs - giving people a way to try searching their own documents. It was such a learning curve to get to this stage that I'm hoping others can learn from my mistakes.
Switching models when running locally is fairly easy - as long as you have them downloaded you can switch them in and out with just a config setting. I can't quite remember, but you may need to rebuild the vectorstore when switching, though.
LangChain has embeddings for the major providers:
def build_vectorstore(docs):
    """
    Create a vectorstore from documents using the configured embedding model.
    """
    # Choose embedding model based on config
    if cfg.EMBED_MODEL.lower() == "openai":
        from langchain_openai import OpenAIEmbeddings
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    elif cfg.EMBED_MODEL.lower() == "huggingface":
        from langchain_community.embeddings import HuggingFaceEmbeddings
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    elif cfg.EMBED_MODEL.lower() == "nomic-embed-text":
        from langchain_ollama import OllamaEmbeddings
        embeddings = OllamaEmbeddings(model=cfg.EMBED_MODEL)
    else:
        raise ValueError(f"Unknown embedding model: {cfg.EMBED_MODEL}")
    # ... then build and return the store from docs + embeddings
Additionally, there was an option, on by default, to use Clippy in place of confirmation dialogs. You'd try to close an unsaved file and, instead of the usual Windows dialog, you'd get Clippy asking whether you'd like to save changes.
That would be violating the second design principle:
"When robots and people coexist in the same spaces, the robots must not take away from people’s agency, particularly when the robots are failing, as inevitably they will at times."
With a physical robot, if it fails and freezes, it turns into a hazard.
With Clippy, it intrusively stops humans from being able to do what they are doing.
> Some people love programming, for the sake of programming itself.
And this is what is causing the friction against LLMs (which are quite useful for getting up to speed with a new concept/language): the programming itself is the fun bit - I still want to do that bit!
Very well done - it took me several attempts to get the hang of Blender, and I still only know/use half those shortcuts. Thanks for the easy-to-follow tips!
I think Google has lost everyone's faith when it comes to keeping projects around - especially when they involve data locked into a mildly complex system without a complete migration path out.
I would be wary of investing time in learning or using any new products from them.
Cyc seemed to be the best application for proper AI, in my opinion - all the ML and LLM tricks are statistically really good, but you need to run the output through Cyc to check for common sense.
I am really pleased they continue to work on this - it is a lot of work, and it needs to be done and checked manually, but once done the base stuff shouldn't change much, and it will be a great common-sense check for generated content.
Our ETL process is heavily monitored so we never miss a day's data, but we got a surprising error: "can't build aggregates - missing data, aborting MV refresh, data will be a day old".
It was the year to date (YTD) calculation - no data for 29/2/2023 to compare to today.
Frankly unless you've considered it ahead of time and thought you'd handled it, IMO you want the error. What's the correct thing to do here? I don't think it's necessarily -365d, it might be, but if I was GP I'd be glad for the chance to consider it and decide what's correct - instead of it just blowing up or even worse silently going whichever way's wrong and undetected for a while.