A type-safe, realtime collaborative Graph Database in a CRDT
29 points by phpnode 2 hours ago | 9 comments

2ndorderthought 27 minutes ago
Can anyone explain why it is a good idea to make a graph DB in TypeScript? This is not a language-flamewar question, more of an implementation-details question.

Though TypeScript is pretty fast and the language is flexible, we all know how demanding graph databases are, and how hard they are to shard, etc. It seems like this could be a performance trap. Are there successful RDBMS or NoSQL databases out there written in TypeScript?

Also, why is everything about LLMs now? Can't we discuss technologies at face value anymore? It's getting kind of old to me personally.

reply
phpnode 19 minutes ago
I needed it to be possible to run the graph in the browser and in Cloudflare Workers, so TS was a natural fit here. It was built as an experiment in end-to-end type safety - nothing to do with LLMs - but it ended up being useful in the product I'm building. It's not designed for large data sets.
reply
2ndorderthought 15 minutes ago
Makes sense, thanks for explaining the use case. The LLM question was only because of the comments at the time of the post.

The query syntax looks nice by the way.

reply
phpnode 13 minutes ago
thanks, it was as close to Gremlin[0] as I could get without losing type safety (Gremlin is untyped)

[0] https://tinkerpop.apache.org/
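Concretely, a typed Gremlin-style chain might look something like this. This is a hypothetical sketch with made-up names (`Traversal`, `hasLabel`, `has`, `values`), not the posted library's actual API; it only illustrates how chaining can keep type information at each step:

```typescript
// Hypothetical sketch of a type-safe, Gremlin-style fluent traversal.
// None of these names come from the actual library.
type Vertex<T> = { label: string; props: T };

class Traversal<T> {
  constructor(private items: Vertex<T>[]) {}

  // Narrow by label; the caller supplies the refined property type.
  hasLabel<U>(label: string): Traversal<U> {
    return new Traversal<U>(
      this.items.filter((v) => v.label === label) as unknown as Vertex<U>[]
    );
  }

  // Filter on a property; the key and value types are checked by TS.
  has<K extends keyof T>(key: K, value: T[K]): Traversal<T> {
    return new Traversal(this.items.filter((v) => v.props[key] === value));
  }

  // Project a typed property out of the matched vertices.
  values<K extends keyof T>(key: K): T[K][] {
    return this.items.map((v) => v.props[key]);
  }
}

interface Person { name: string; age: number }

const g = new Traversal<unknown>([
  { label: "person", props: { name: "alice", age: 30 } },
  { label: "person", props: { name: "bob", age: 25 } },
]);

// Each step refines the type, so typos in "age" or "name" fail to compile.
const names = g.hasLabel<Person>("person").has("age", 30).values("name");
```

In untyped Gremlin the equivalent `g.V().hasLabel("person").has("age", 30).values("name")` gives you no compile-time checking, which is the gap the parent is describing.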

reply
lo1tuma 16 minutes ago
15 years ago I was a big fan of this method-chaining pattern. These days I don’t like it anymore, especially when it comes to unit testing: implementing fake objects that have to set up the exact same interface becomes quite cumbersome.
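For example, a fake for a fluent builder has to stub every chainable method and return `this` from each one. The interface and names below are hypothetical, just to show the boilerplate being complained about:

```typescript
// Hypothetical fluent interface and the fake a test would need.
// Every chainable method must be stubbed and must return `this`.
interface QueryBuilder {
  from(label: string): QueryBuilder;
  where(field: string, value: unknown): QueryBuilder;
  limit(n: number): QueryBuilder;
  run(): string[];
}

class FakeQueryBuilder implements QueryBuilder {
  calls: string[] = [];
  from(label: string): QueryBuilder {
    this.calls.push(`from:${label}`);
    return this;
  }
  where(field: string, value: unknown): QueryBuilder {
    this.calls.push(`where:${field}`);
    return this;
  }
  limit(n: number): QueryBuilder {
    this.calls.push(`limit:${n}`);
    return this;
  }
  run(): string[] {
    this.calls.push("run");
    return [];
  }
}

const fake = new FakeQueryBuilder();
fake.from("person").where("age", 30).limit(10).run();
```

With a non-chained API each call could be faked independently; here the whole chain has to be mirrored or the test code won't even compile.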
reply
cyanydeez 57 minutes ago
Eventually someone will figure out how to use a graph database to let an agent efficiently build and cull context to achieve near-deterministic behavior. It seems like one needs a sufficiently powerful schema and a harness that properly builds the graph of agent knowledge - like how ants naturally figure out where sugar is, notice when that stockpile depletes, and shift to other sources.

This looks neat, but if you want it to be used for AI purposes, you might want to show a schema more complicated than a Twitter network.

reply
embedding-shape 38 minutes ago
I'd wager the problem is on the side of "LLMs can't value/rank information well enough" rather than "the graph database wasn't flexible/good enough", but I'd be happy to be shown counter-examples.

I'm sure once that problem has been solved, you can use the built-in map/object of whatever language and it'll be good enough. Add save/load to disk via JSON and you have long-term persistence too. But since LLMs still aren't clever enough, I don't think the underlying implementation matters too much.
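The "built-in map plus JSON" baseline really is tiny. A sketch of the idea (not anything from the posted project):

```typescript
import { writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Minimal sketch of the parent's point: an in-memory Map plus JSON
// save/load already gives you long-term persistence for small stores.
const memory = new Map<string, string>();
memory.set("sugar-location", "pantry shelf 2");

// Persist: a Map serializes cleanly via its entry pairs.
function save(path: string, m: Map<string, string>): void {
  writeFileSync(path, JSON.stringify([...m.entries()]));
}

// Restore: rebuild the Map from the saved entry pairs.
function load(path: string): Map<string, string> {
  return new Map(JSON.parse(readFileSync(path, "utf8")));
}

const path = join(tmpdir(), "memory-sketch.json");
save(path, memory);
const restored = load(path);
```

Everything beyond this - indexing, traversal, conflict resolution - is what a real graph DB adds, which is the part the parent argues doesn't matter yet.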

reply
lmeyerov 15 minutes ago
It's interesting to think of where the value comes from. Afaict there are 2 interesting areas:

A: One of the main lessons of the RAG era of LLMs was that reranked multi-retrieval is a great balance of test-time latency, compute, and quality, at the expense of maintaining a few costly index types. Graph ended up being a nice little lift when put next to text, vector, and relational indexing, by solving some n-hop use cases. I'm unsure if the juice is worth the squeeze, but it does make some sense as infra. Making and using these flows isn't that conceptually complicated, and most pieces have good, simple OSS around them.
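The multi-retrieval flow in A can be sketched roughly as: query each index type, pool and dedupe the candidates, then rerank the union with one scorer. The retrievers and the word-overlap scorer below are toy stand-ins (a real system would use actual text/vector/graph indexes and a cross-encoder or LLM reranker):

```typescript
// Rough sketch of reranked multi-retrieval: query several index types,
// pool the candidates, then rerank the union with a single scorer.
type Doc = { id: string; text: string };

// Stand-in for a text index: substring match.
function textSearch(q: string, docs: Doc[]): Doc[] {
  return docs.filter((d) => d.text.includes(q));
}

// Stand-in for embedding similarity: any shared query word.
function vectorSearch(q: string, docs: Doc[]): Doc[] {
  return docs.filter((d) => q.split(" ").some((w) => d.text.includes(w)));
}

// Toy reranker: score by query-word overlap, highest first.
function rerank(q: string, candidates: Doc[]): Doc[] {
  const score = (d: Doc) =>
    q.split(" ").filter((w) => d.text.includes(w)).length;
  return [...candidates].sort((a, b) => score(b) - score(a));
}

const docs: Doc[] = [
  { id: "a", text: "graph databases handle n-hop queries" },
  { id: "b", text: "vector indexes find similar queries" },
];

// Pool candidates from each retriever, dedupe by id, then rerank.
const pooled = new Map(
  [...textSearch("graph", docs), ...vectorSearch("graph queries", docs)].map(
    (d) => [d.id, d]
  )
);
const ranked = rerank("graph queries", [...pooled.values()]);
```

The graph index would slot in as one more retriever feeding the same pool, which is the "nice little lift" framing above.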

B: There is another universe of richer KG extraction with even heavier indexing work. I'm less clear on the relative ROI here in typical benchmarks. Imagine going full RDF, vs the simpler property-graph queries and ontologies here, and investing in heavy entity-resolution preprocessing during writes. I don't know how well these improve scores vs the regular multi-retrieval in A, or how easy they are to do at any reasonable scale. However, a lot of the work shifts out of the DB and out of the agent, and into a much fancier KG pipeline, so now there is a missing layer with a less clear proof/value burden.

--

Separately, we have been thinking about these internally. We have been building gfql, OSS GPU Cypher queries on dataframes etc. without needing a DB -- reusing existing storage tiers by moving into an embedded compute tier -- and powering our own LLM usage has been a primary use case for us. Our experiences have led us to prioritize case A as a next step for what the graph engine needs to support internally, and to view case B as something that should live outside of it in a separate library. This post does make me wonder if case B should move closer into the engine to help streamline things for typical users, akin to how Solr/Lucene/etc. helped make Elastic into something useful early on for search.

reply
phpnode 55 minutes ago
the airline graph is more complex; I can show the schema for that if you think it's useful?
reply