A header-only C vector database library
85 points by abdimoalim 2 days ago | 44 comments

eatonphil 2 days ago
As data stores go, this is basically in-memory only. The save and load process is manually triggered by the user, and the save process isn't crash safe, nor does it do any integrity checks.
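
For illustration, here is a minimal sketch of what a crash-safe save with a basic integrity check could look like on a POSIX system: write to a temporary file, flush it to disk, then rename it over the old file. The names `save_atomic` and `load_checked`, the `.tmp` suffix, and the trailing FNV-1a checksum layout are all assumptions made up for this example; none of them come from the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* fsync (POSIX) */

/* FNV-1a hash: a cheap corruption check, not a security measure. */
static uint32_t fnv1a(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: writes buf followed by its checksum to "<path>.tmp",
 * forces it to disk, then renames it over path. Returns 0 on success. */
static int save_atomic(const char *path, const void *buf, size_t len) {
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    uint32_t sum = fnv1a(buf, len);
    int ok = fwrite(buf, 1, len, f) == len
          && fwrite(&sum, sizeof sum, 1, f) == 1
          && fflush(f) == 0
          && fsync(fileno(f)) == 0;
    if (fclose(f) != 0) ok = 0;
    /* rename() replaces the old file in one step, so a crash leaves
     * either the complete old file or the complete new one. */
    if (!ok || rename(tmp, path) != 0) { remove(tmp); return -1; }
    return 0;
}

/* Hypothetical helper: reads the file into buf and verifies the trailing
 * checksum. Returns the payload size in bytes, or -1 on any failure. */
static long load_checked(const char *path, void *buf, size_t cap) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char *p = buf;
    size_t n = fread(p, 1, cap, f);
    fclose(f);
    uint32_t sum;
    if (n < sizeof sum) return -1;
    n -= sizeof sum;
    memcpy(&sum, p + n, sizeof sum);
    return sum == fnv1a(p, n) ? (long)n : -1;
}
```

Full durability would also require an fsync on the containing directory after the rename; that step is left out to keep the sketch short.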

I also don't think it has any indexes? So search time grows linearly with the number of entries.
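
To make "no index" concrete, a flat scan can be sketched as below: every query computes a distance to every stored vector, so cost is proportional to count times dimension. The function names and the row-major `float` layout are assumptions for this example and are not taken from the library's source.

```c
#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two vectors of length dim. */
static float dist_sq(const float *a, const float *b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical flat scan: rows is a row-major count x dim matrix.
 * Returns the index of the row closest to query, or (size_t)-1 if
 * count is 0. Every row is visited exactly once. */
static size_t nearest(const float *rows, size_t count, size_t dim,
                      const float *query) {
    size_t best = (size_t)-1;
    float best_d = FLT_MAX;
    for (size_t i = 0; i < count; i++) {
        float d = dist_sq(rows + i * dim, query, dim);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```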

reply
andy99 23 hours ago
Does declaring a function as inline do anything for any modern compiler? I understood that this is basically ignored now and the compiler makes its own decisions based on what is fastest.
reply
wasmperson 22 hours ago
The idea that it does nothing is a persistent myth. Both GCC and Clang heed it although neither treats it as a mandate:

https://tartanllama.xyz/posts/inline-hints/

This library seems to have the annotation on every function, though, so it's possible the author is just following a convention of always using it for functions defined in header files (it'd be required if the functions weren't declared `static`).

reply
adrian_b 10 hours ago
"static inline" is not the same as "inline".

In the former case the compiler is allowed to always inline the function.

In the latter case, even when the compiler chooses to inline the function, it also emits code for an independent instance of the function, because the function is public and it may be called from another file.

So "static inline" in the worst case does nothing, but it suggests to the compiler that the function should be inlined everywhere, which it will probably do, unless it decides that the function is too long (or it uses some features forbidden in inlined functions, e.g. variadic arguments, setjmp, alloca, etc.), so the benefits of inlining it may be less than the disadvantages.

When the compiler refuses to follow the suggestion of inlining the function, it can be made to tell the reason, e.g. with "-Winline".

So the compiler does not ignore the suggestion, even if it may choose to not follow it.

reply
garaetjjte 5 hours ago
>In the latter case, even when the compiler chooses to inline the function, it also emits code for an independent instance of the function, because the function is public and it may be called from another file.

Not in standard C. An "inline" function provides an implementation for use iff the compiler decides to inline the call. If it decides not to inline, it will emit a call to an external symbol that needs to be defined in a different TU (otherwise you will get errors at link time).
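
A small sketch of the C99 rules described here, shown as one file for brevity. The names `sq` and `cube` and the split into "header" and ".c file" parts are illustrative assumptions only.

```c
/* ---- would normally live in a header ---- */

/* Internal linkage: if a call is not inlined, this translation unit
 * simply keeps its own private copy. Nothing is needed at link time. */
static inline int sq(int x) { return x * x; }

/* C99 "inline definition": the compiler may use it for inlining, but
 * on its own it does not provide an external symbol named cube. */
inline int cube(int x) { return x * x * x; }

/* ---- would normally live in exactly one .c file ---- */

/* Re-declaring with extern makes this translation unit emit the
 * external definition that non-inlined calls link against. Without
 * this line, an unoptimized build may fail with an undefined
 * reference to cube. */
extern inline int cube(int x);
```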

reply
adrian_b 5 hours ago
The meaning of "inline" differs between C and C++.

Quote from the gcc manual:

"GCC implements three different semantics of declaring a function inline. One is available with -std=gnu89 or -fgnu89-inline or when gnu_inline attribute is present on all inline declarations, another when -std=c99, -std=gnu99 or an option for a later C version is used (without -fgnu89-inline), and the third is used when compiling C++."

Nevertheless, "static inline" means the same thing in all 3 standards, unlike "inline" alone.

This can be a reason to always prefer "static inline", because then it does not matter whether the program is compiled as C or as C++.

reply
TheNewAndy 20 hours ago
One obvious benefit for a header only library is that it suppresses the warning you get when a static function isn't used.
reply
uecker 17 hours ago
It is not a benefit if you do not get warnings about unused functions. With any proper library, you would also not get warnings for functions that are part of the API that are not used, but you would get warnings about non-exported functions internal to a translation unit that are accidentally not used. This is a good thing.
reply
ddtaylor 23 hours ago
Kind of. At the end of the day the compiler can do almost anything it wants outside of undefined behavior, which isn't much of a guard rail.

In reality, header-only libraries allow for deep inlining: the compiler may optimize very specifically for your code and usage.

The situation is a bit more exaggerated with C++ because of templates, but there are some remaining gains to be had in C alone.

reply
kazinator 2 days ago
In the world of Kubernetes and languages where a one-liner brings in a graph of 1700 dependencies, and oceans of Yaml, it's suddenly important for a C thing to be one file rather than two.
reply
jasonpeacock 2 days ago
C libraries have advertised "header-only" for a long time. It's because there is no package manager/dependency management, so you're literally copying all your dependencies into your project.

This is also why everyone implements their own (buggy) linked-list implementations, etc.

And header-only is more efficient to include and build with than header+source.

reply
uecker 2 days ago
I never copied my dependencies into my C project, nor does it usually take more than a couple of seconds to add one.
reply
AlotOfReading 24 hours ago
There's a number of extremely shitty vendor toolchain/IDE combos out there that make adding and managing dependencies unnecessarily painful. Things like only allowing one project to be open at a time, or compiler flags needing to be manually copied to each target.

Now that I'm thinking about it, CMake also isn't particularly good at this the way most people use it.

reply
uecker 19 hours ago
They are certainly bad vendor toolchains, but I want to push back against the idea that this is a general C problem. Even for the worst toolchains I have seen, dropping in a pair of .c/.h files would not have been difficult. So it is still difficult to see how a header-only library makes a lot of sense.
reply
AlotOfReading 17 hours ago
One of the worst I've experienced had a bug where adding too many files would cause intermittent errors. The people affected resorted to header-izing things. Was an off-by-one in how it was constructing arguments to subshells, causing characters to occasionally drop.

But, more commonly I've seen that it's just easier to not need to add C files at all. Add a single include path and you can avoid the annoyances of vendoring dependencies, tracking upstream updates, handling separate linkage, object files, output paths, ABIs, and all the rest. Something like Cargo does all of this for you, which is why people prefer it to calling rustc directly.

reply
uecker 6 hours ago
People certainly sometimes create a horrible mess. I just do not see that this is a good reason to dumb everything down. With a proper .c/.h split there are many advantages, and in the worst case you could still design it in a way that makes it possible to "#include" the .c file.

I tried to use cargo in the past and found it very bad compared to apt / apt-get (even when ignoring that it is a supply-chain disaster), essentially the same mess as npm or pip. Some python packages certainly wasted far more time of my life than all dependencies for C projects I ever had to deal with combined.

reply
quotemstr 2 days ago
Writing new C code in 2026 is already an artisanal statement, so why not go all the way in making it?
reply
fonheponho 2 days ago
Exactly; I can't understand this obsession with header-only C "libraries".
reply
hendler 2 days ago
Useful for embedded devices? Crashes, disk updates not important for ephemeral process?
reply
whstl 24 hours ago
I feel like there's two kinds of developers. The ones who shit all over other people's preferences and turn everything into an almost religious discussion, and the ones who prefer to just build stuff.

Get over it. Some people like header only.

reply
gkhartman 24 hours ago
Agreed, once you've spent hrs fighting with C build tools under a deadline, it becomes very easy to see why this is beneficial.
reply
FranklinJabar 21 hours ago
No need to be an asshole; we can all discuss things civilly.
reply
johnisgood 13 hours ago
Only if people provide reasons for why they think it is bad, but with people along the lines of "Header-only? Eww. Sucks." you cannot.

To comment on this, I have a couple of header-only projects I have written. It makes sense in some scenarios. Sometimes I want no external dependencies and a single header file interface.

reply
ddtaylor 23 hours ago
Some people may not have known the difference and probably thought it was more akin to a naming convention.
reply
whstl 22 hours ago
I'm obviously not talking about the people asking "what is it".
reply
FranklinJabar 21 hours ago
It would be a lot better for the community if you directly replied to the objectionable content with a civil response.
reply
Mikhail_Edoshin 2 days ago
Why call it a header? Could be just a source file. Including sources is uncommon, but why not? Solid "amalgamation" builds are a thing too.
reply
Y_Y 19 hours ago
In the early days of CUDA it was pretty common to just #include all your sources, since linking was such a nightmare.
reply
bawolff 24 hours ago
As a non-C programmer, why would "header only" be a good thing?
reply
saidinesh5 19 hours ago
It's not.

It's a tradeoff people make between ease of integration - just download the .h file into your project folder and #include it in your source file instead of worrying about source build system vs target build system, cross compiling headaches etc...

And compilation times: any time you change any of your source files, your compiler also has to recompile your dependencies. (Assuming you haven't used precompiled headers).

reply
atiedebee 3 hours ago
Recompiling the dependencies should only really happen if you change the file with the implementation include (usually done by defining <library>_IMPLEMENTATION or something like that).
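
A rough sketch of that convention, with the header contents shown inline so the example is a single file. The library name `tinyadd` and the macro `TINYADD_IMPLEMENTATION` are invented for illustration.

```c
/* ---- in exactly one .c file of the project ---- */
#define TINYADD_IMPLEMENTATION
/* #include "tinyadd.h"   (contents of the hypothetical header follow) */

/* ---- tinyadd.h ---- */
#ifndef TINYADD_H
#define TINYADD_H

/* Declarations: cheap, and seen by every file that includes the header. */
int tinyadd(int a, int b);

#endif /* TINYADD_H */

#ifdef TINYADD_IMPLEMENTATION
/* Definitions: compiled only in the one file that defines the macro
 * before including the header. Other files see just the declarations,
 * so editing them does not recompile this code. */
int tinyadd(int a, int b) { return a + b; }
#endif /* TINYADD_IMPLEMENTATION */
```

Every other source file includes the header without defining the macro and links against the single compiled copy.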
reply
robotpepi 17 hours ago
I'm completely ignorant about this, but wouldn't it be possible to compile your project separately to improve compilation times? For instance, if you're using OP's vector library, which is self-contained, you could compile that first and just once?
reply
saidinesh5 16 hours ago
Let's say you need to use a function like this, defined in add.h:

    int add(int a, int b){
        // Long logic and then this
        return a+b;
    }

Let's say this is your main.c.

    #include "add.h"

    int main(void) {
      return add(5,6);
    }

The preprocessor just copies the contents of add.h into your main.c whenever you compile main.c, so the body of add gets recompiled every time main.c changes (let's ignore the concept of precompiled headers for now).

What you can do instead is put just the declaration of add in add.h, which tells the compiler that add takes two integers and returns an integer.

    int add(int a, int b);

You can then put the add function definition in add.c, compile that to an add.o and link it to your main.o at link time to get your final binary, without having to recompile add.o every time you change your main.c.

Precompiled headers: https://maskray.me/blog/2023-07-16-precompiled-headers

reply
yxhuvud 16 hours ago
Unless you have link time optimization you would lose out on optimization and performance.

The whole thing is essentially a workaround for lack of sufficiently good/easy ways to package code in the ways people want to use it.

reply
ddtaylor 23 hours ago
It often also means it was written more correctly. There is a bit of an art to designing a header only library and it can strike a different balance between code size and runtime speed optimization.

In strict terms when you place implementation in a .c file you probably want that code to be shared when different things call it, and the compiler will "link" to that same implementation.

When you have a header only library the compiler is free to optimize in more ways specific to your actual use case.

reply
c45y 24 hours ago
Extremely easy copy paste deployment into projects
reply
colonCapitalDee 22 hours ago
C's package management story is unfriendly to say the least. A header only library simplifies it dramatically, and makes it much more straightforward to integrate dependencies into your application.
reply
johnisgood 13 hours ago
Using your OS' package manager IS C's package management. Is it really that difficult to use apt, pacman, or BSD's "pkg"?
reply
1718627440 4 hours ago
This. Wish I could upvote this 10 times.
reply
kreco 8 hours ago
What if I'm using 10 different OS?

I can still push the file on git and it works everywhere else.

reply
johnisgood 8 hours ago
git is not a package manager. It does not handle many things a package manager does.
reply
whstl 8 hours ago
GP never said it was.

But it does successfully replace the need for one, with fewer problems in certain situations.

reply
ddtaylor 2 days ago
Would it work to replace the memory store with mmap?
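
As a rough idea of what that could involve on POSIX, the sketch below backs a float array with a file through `mmap`. The helper names `map_floats` and `unmap_floats` are made up for this example, and nothing here reflects how the library actually stores its data.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps count floats backed by the file at path,
 * creating the file if needed. Note that ftruncate resizes the file to
 * exactly count floats, which shrinks a larger existing file.
 * Returns NULL on failure (including count == 0). */
static float *map_floats(const char *path, size_t count) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;
    size_t bytes = count * sizeof(float);
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (float *)p;
}

/* Flushes dirty pages to the file and removes the mapping.
 * Returns 0 on success, -1 on failure. */
static int unmap_floats(float *p, size_t count) {
    size_t bytes = count * sizeof(float);
    if (msync(p, bytes, MS_SYNC) != 0) {
        munmap(p, bytes);
        return -1;
    }
    return munmap(p, bytes);
}
```

A mapping like this removes the explicit save and load steps, but it does not by itself make updates crash safe, since pages can reach the disk in any order.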
reply
newzino 2 days ago
Brute-force kNN gets a bad reputation, but below ~50K vectors the overhead of building and maintaining an HNSW index often costs more than it saves, especially for infrequent queries. I use sqlite-vec (also flat scan by default) in production with 10K vectors at 384 dimensions and search takes under 5ms.

The low-hanging fruit for this library would be SIMD. At 128d float32, each distance computation touches 512 bytes of data per vector. AVX2 processes 8 floats per instruction, NEON does 4. That's a 4-6x speedup on the hot path without changing the algorithm at all. For a header-only library where simplicity is the point, that seems like the right optimization to reach for before adding approximate indexing.
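
A hedged sketch of what such a hot path might look like: a squared-L2 distance that uses 256-bit intrinsics when the compiler targets AVX2 (for example with `-mavx2`) and falls back to a scalar loop otherwise. The function name `l2_sq` is an assumption, the float intrinsics used are the standard ones from `<immintrin.h>` (they only need AVX, which AVX2 implies), and no speedup figure is claimed for this code.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Squared Euclidean distance between two n-element float vectors. */
static inline float l2_sq(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
#if defined(__AVX2__)
    /* Process 8 floats per iteration, keeping 8 partial sums in acc. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 d = _mm256_sub_ps(_mm256_loadu_ps(a + i),
                                 _mm256_loadu_ps(b + i));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
    }
    /* Horizontal reduction of the 8 partial sums. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; k++) sum += lanes[k];
#endif
    /* Scalar tail; this is the whole loop when AVX2 is not enabled. */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Because the vector path sums in a different order than the scalar loop, results can differ in the last bits for non-integer inputs.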

One gotcha: metadata isn't persisted on save/load. The README mentions the binary format stores vectors and IDs but not metadata. Anyone attaching text chunks to their embeddings for RAG will lose them on reload.

reply