You can do this on Windows too; it's useful if you want tiny executables that use minimal resources.
I wrote this little system-wide mute utility for Windows that way. It's annoying to be missing some parts of the CRT, but not too bad. Code here: https://github.com/pablocastro/minimute
You have your usual Win32 API functions in libraries like Kernel32, User32, and GDI32, but for versions after Windows XP, those don't actually make system calls themselves. The actual system calls live in NTDLL and Win32U. There are lots of functions you can import, and they're basically one instruction long: just SYSENTER for the native version, or a switch back to 64-bit mode for a WOW64 DLL. The names of these functions always begin with Nt, like NtCreateFile. There's a corresponding kernel-mode call that starts with Zw instead, so in kernel mode you have ZwCreateFile.
But the system call numbers used with SYSENTER do get reordered every time there's a major version change to Windows, so rather than issuing SYSENTER yourself, you just call into NTDLL or Win32U when you want to make a system call directly.
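To make that concrete, here's a minimal sketch (mine, not from minimute) of calling an Nt-prefixed export in NTDLL directly using Python's ctypes, instead of going through the Kernel32 wrapper. It's Windows-only, so the actual call is guarded by a platform check.

```python
import ctypes
import sys

def nt_close(handle):
    """Call NtClose straight from NTDLL, bypassing the Kernel32
    CloseHandle wrapper. Returns the NTSTATUS code (0 == STATUS_SUCCESS)."""
    ntdll = ctypes.WinDLL("ntdll")   # the DLL holding the real syscall stubs
    fn = ntdll.NtClose               # an Nt-prefixed export, as described above
    fn.argtypes = [ctypes.c_void_p]
    fn.restype = ctypes.c_uint32     # NTSTATUS
    return fn(handle)

if sys.platform == "win32":
    # A bogus handle should yield a nonzero NTSTATUS (STATUS_INVALID_HANDLE)
    # rather than success.
    print(hex(nt_close(0xDEADBEEF)))
```

NTDLL's version numbering of syscalls is hidden behind these exports, which is exactly why calling through them stays stable across Windows releases.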
Windows isn't quite like Linux in that apps typically don't make syscalls directly. You could say what's in NTDLL is the system call contract, but in practice you call the subsystem-specific API, typically the Win32 API, which is huge compared to the Linux syscall list because it includes all sorts of things like UI, COM (!), etc.
The project has some of the properties discussed above, such as not having a typical main() (or WinMain), because there's no CRT to call it.
Fair point on latency. We (Azure AI Search) target both scenarios with different features. For instant search you can just do the usual hybrid + rerank combo, or if you want query rewriting to improve user queries, you can enable QR at a moderate latency hit. We evaluated this approach at length here: https://techcommunity.microsoft.com/blog/azure-ai-foundry-bl...
Of course, agentic retrieval is just better quality-wise for a broader set of scenarios; it's the usual quality-latency trade-off.
We don't do SPLADE today. We've explored it and may get back to it at some point, but we ended up investing more in reranking to boost precision, since we've found we have fewer challenges on the recall side.
Code-golfed Game of Life in JavaScript we wrote a while back with some friends. It's always surprising how much abuse JavaScript can take: https://shorterlife.github.io/challenge/
We also included supporting data in that write-up showing you can improve significantly on top of Hybrid/RRF using a reranking stage (assuming you have a good reranker model), so we shipped one as an optional step in our search engine.
Here's a bit of quantification of the point Doug makes. Indeed, for a number of scenarios you get better results if you combine vector search and keyword search into a hybrid retrieval step, and then do reranking on top of that.
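As a toy illustration of the fusion step (a sketch, not our production implementation), Reciprocal Rank Fusion merges a keyword ranking and a vector ranking into one list; a reranker would then rescore the fused top-k:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    across every ranking it appears in (rank is 1-based)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical doc ids: one list from BM25, one from vector search.
keyword_hits = ["d3", "d1", "d7"]
vector_hits = ["d1", "d5", "d3"]
fused = rrf_fuse([keyword_hits, vector_hits])
print(fused)  # -> ['d1', 'd3', 'd5', 'd7']; docs in both lists rise to the top
```

Documents retrieved by both systems accumulate score from each list, which is what makes the fusion robust even though the two scoring scales (BM25 vs. cosine similarity) are incomparable.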
This is interesting. I recently built a search tool that needed to locate documents by keyword or by semantics, so I implemented a hybrid search straight away: BM25 + embeddings (from `gte-base`), with a cross-encoder for reranking.
I found that the lexical search was adding nothing; the embeddings alone produced almost identical results for keyword queries. (The re-ranker, however, made a big difference.)
It depends on the scenario. For example, for concept-seeking queries, vectors tend to do better (there's less likely to be word overlap between query and content), whereas for keyword searches (a product name, a serial number, project codenames, etc.) BM25 + keywords does much better. If your workload is all concept-seeking queries, it's reasonable that keywords don't add much.
If you look at the table in the section "3. Hybrid Retrieval brings out the best of Keyword and Vector Search" of that article, we show how significantly the metrics vary as a function of query type.
Check out FeatureBase when you get a chance. Vectors and super-fast operations on sets. I'm using it for managing key terms extracted from the text and stored along with the vectors.
The Cognitive Search team continues to grow. We're looking for engineering managers in multiple locations. Join us to work at the intersection of cloud services, information retrieval, and machine learning.
We're growing our teams in a number of different areas. If you have expertise in database systems internals (query processing, optimization, distributed query execution, storage engines), search (Lucene, Elasticsearch, infrastructure), or cloud systems (any cloud platform), we probably have a role that could be interesting for you.
Our current positions are all for folks with at least 2-3 years of experience and go up to fairly senior. We don't have entry-level positions on these teams right now (though other parts of Microsoft likely do).
If you're interested, apply via the links above, or email me and I'll route you to the right hiring manager: pablo <dot> castro <at> company's domain.