A simple mobile web app, inspired by Fuzzy-Search/realtime-bakllava, that uses the llama.cpp server backend in multimodal mode to describe and narrate what the phone camera sees.
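A minimal sketch of the backend side, assuming a LLaVA-style GGUF model and projector (the model filenames below are placeholders): the llama.cpp server is started with `--mmproj` to load the multimodal projector, and the app then posts base64-encoded camera frames to the `/completion` endpoint, referencing the image in the prompt.

```shell
# Launch the llama.cpp server in multimodal mode (hypothetical model paths).
./server -m models/ggml-model-q4_k.gguf \
         --mmproj models/mmproj-model-f16.gguf \
         --host 0.0.0.0 --port 8080

# Describe one captured frame: base64-encode it and reference it as [img-1].
curl http://localhost:8080/completion -d '{
  "prompt": "USER: [img-1] Describe what you see.\nASSISTANT:",
  "image_data": [{"data": "'"$(base64 -w0 frame.jpg)"'", "id": 1}],
  "n_predict": 128
}'
```

In the app itself, the same request is made from the browser with `fetch`, using a frame grabbed from the camera stream via a canvas element.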
I built this in a few hours, using a single ChatGPT thread to generate most of the code and iterate on the project. Here's the workflow: https://chat.openai.com/share/ea84ec69-5617-45e8-8772-ac2dcf...