Codebased combines Tree-sitter for code awareness (finding functions, data structures, constants, etc., not just lines of code), full-text search using SQLite, and semantic search using OpenAI embeddings + FAISS.
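To make the SQLite side more concrete, here is a minimal sketch of indexing symbol records (the kind a Tree-sitter pass might emit) into an FTS5 table and querying it. The schema, the sample records, and the `build_index`/`search` helpers are hypothetical and are not Codebased's actual code. The sketch also assumes your Python's bundled SQLite was compiled with FTS5, which is true of most current builds.

```python
import sqlite3

# Hypothetical symbol records: (path, kind, name, source snippet).
# This is an illustrative schema, not Codebased's real one.
SYMBOLS = [
    ("mm/slab.c", "function", "kmem_cache_alloc",
     "void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)"),
    ("include/linux/list.h", "struct", "list_head",
     "struct list_head { struct list_head *next, *prev; };"),
    ("include/linux/gfp.h", "constant", "GFP_KERNEL",
     "#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)"),
]


def build_index(symbols):
    """Create an in-memory FTS5 table holding one row per symbol."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(path, kind, name, body)")
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)", symbols)
    return conn


def search(conn, query, limit=10):
    """Return (path, kind, name) for rows matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT path, kind, name FROM symbols "
        "WHERE symbols MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_index(SYMBOLS)
    print(search(conn, "list_head"))
```

FTS5's default `unicode61` tokenizer treats underscores as separators, so `list_head` is matched as the two-token phrase "list head". An index built for code search might prefer a tokenizer that keeps identifiers whole.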
Despite being implemented in Python, supporting semantic search, and making multiple API calls for embedding and re-ranking, it runs searches against the Linux kernel faster than ripgrep (~1 second vs. ~2 seconds; obviously this depends on system, temperature, time of day, tidal forces, etc.).
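The post doesn't say how the full-text and semantic result lists are combined before re-ranking. One common and cheap option is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name, the `k=60` default, and the sample IDs are assumptions, not Codebased's method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of result IDs into a single list.

    Each item scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    full_text = ["list_head", "hlist_head", "list_add"]  # e.g. from SQLite FTS
    semantic = ["list_add", "list_head", "klist"]        # e.g. from FAISS
    print(reciprocal_rank_fusion([full_text, semantic]))
```

Items found by both retrievers rise to the top, and the fused list could then be passed to a re-ranker. RRF only looks at ranks, so it needs no calibration between BM25-style scores and embedding similarities.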
Up next:
- A Perplexity-like agent for interpreting results, making multiple follow-up searches, etc.
- Custom embedding and re-ranking stack
- Agent for running shell commands, editing code, etc., similar to SWE-agent: https://arxiv.org/pdf/2405.15793