A DuckDB-based metabase alternative
159 points by wowi42 18 hours ago | 40 comments
piterrro 15 hours ago
To what extent is this a Metabase alternative? I'm a heavy Metabase user and there's really nothing to compare in this product.
mritchie712 9 hours ago
We've (https://www.definite.app/) replaced quite a few metabase accounts now and we have a built-in lakehouse using duckdb + ducklake, so I feel comfortable calling us a "duckdb-based metabase alternative".
When I see the title here, I think "BI with an embedded database", which is what we're building at Definite. A lot of people want dashboards / AI analysis without buying Snowflake, Fivetran, and a BI tool and stitching them all together.
jorin 13 hours ago
hi, dev building Shaper here. Both Shaper and Metabase can be used to build dashboards for business intelligence and embedded analytics. But the use cases are different: Metabase is feature-rich and has lots of self-serve functionality that lets non-technical users easily build their own dashboards and drill down as they please. With Shaper you define everything as code, in SQL. It's much more minimal in terms of what you can configure, but if you like the SQL-based approach it can be pretty productive to treat dashboards as code.
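For a sense of the workflow: a dashboard tile ultimately boils down to a DuckDB query. A minimal sketch (illustrative only, not Shaper's actual configuration syntax; table and column names are made up):

```sql
-- Illustrative DuckDB query that a dashboard-as-code tile might run.
-- Table and column names are hypothetical.
SELECT
  date_trunc('week', created_at) AS week,
  count(*)                       AS signups
FROM signups
GROUP BY week
ORDER BY week;
```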
rorylaitila 7 hours ago
Nice work! I met Jorin a couple years ago at a tech meetup and this was just an idea at the time. So cool to see the consistent progress and updates and to see this come across HN.
thanhnguyen2187 8 hours ago
Thanks for the cool tool! I think it's worth mentioning SQLPage, another tool in a similar vein that generates UI from SQL. From my POV:
- SQLPage: more focused on UI building; doesn't use DuckDB
- Shaper: more focused on analytics/dashboards, with PDF generation and such; uses DuckDB
cjonas 5 hours ago
Is there any way to run the query -> report generation standalone, in process? Like maybe just outputting the HTML (or using the React components in a project).
I was looking to add similar report generation to a VS Code extension I've been building[0]
frafra 15 hours ago
Metabase works great with DuckDB as well, thanks to the metabase_duckdb_driver by MotherDuck.
3abiton 13 hours ago
As someone who has used DuckDB but not Shaper, what is Shaper used for? The README is scarce on details.
jorin 13 hours ago
Hi, dev building Shaper here. Shaper lets you visualize data and build dashboards just by writing SQL. The SQL runs in DuckDB, so you can use all of DuckDB's features. It's for when you're looking for a minimal tool that lets you work entirely in code. You can use Shaper to build dashboards that you share internally, or customer-facing dashboards you want to embed into another application.
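Since it's plain DuckDB underneath, a dashboard query can, for example, read files in place with no load step. An illustrative sketch (the path and column names are made up):

```sql
-- Illustrative: DuckDB queries Parquet files directly.
-- Path and column names are hypothetical.
SELECT
  date_trunc('day', event_time) AS day,
  count(*)                      AS events
FROM read_parquet('events/*.parquet')
GROUP BY day
ORDER BY day;
```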
antman 11 hours ago
Will it expose a visual query builder like Metabase does?
ldnbln 10 hours ago
My company integrated Taleshape Shaper as our customer-facing Metabase dashboard alternative. Absolutely love its simplicity!
pdyc 16 hours ago
Interesting. I'm trying to build one too, but rejected DuckDB because of its large size; I guess I'll have to give in and use it at some point.
andrewstuart 16 hours ago
I wanted to love DuckDB but it was so crashy I had to give up.
jastr 2 hours ago
I had this too until I lowered its memory limit: in ~/.duckdbrc, `set max_memory='1GB';` or even less.
robowo 16 hours ago
I use it daily and it has never crashed. How long ago was this?
I'm a big fan of DuckDB. It plows through hundreds of GB of logs on a 5-year-old Linux laptop, no problem.
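As an illustration of the kind of thing that just works (file pattern and column name are made up):

```sql
-- Illustrative: scan gzipped CSV logs directly with a glob pattern.
-- DuckDB decompresses and parallelizes the scan automatically.
SELECT status, count(*) AS n
FROM read_csv('logs/*.csv.gz')
GROUP BY status
ORDER BY n DESC;
```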
pletnes 15 hours ago
Same here. I have, however, seen a few out-of-memory cases in the past when given large input files.
jastr 2 hours ago
By default, it tries to take 80% of your memory. I've found you need to set it to something much smaller in ~/.duckdbrc: `set max_memory='1GB';`
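A minimal ~/.duckdbrc sketch (the CLI runs these statements at startup; the values are just examples):

```sql
-- ~/.duckdbrc: executed by the DuckDB CLI on startup.
SET max_memory = '1GB';  -- cap RAM usage (the default is 80% of system memory)
SET threads = 4;         -- optionally limit parallelism as well
```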
skeeter2020 8 hours ago
It's not the focus, and not very performant, but you can have it spill to disk if you run out of memory. I wouldn't suggest building a solution around this approach, though; the sweet spot is data that fits in memory.
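Roughly, that looks like this (illustrative; the path is made up, and for in-memory databases you have to point DuckDB at a temp directory yourself):

```sql
-- Illustrative: let DuckDB spill intermediate data to disk
-- instead of failing when the memory limit is hit.
SET max_memory = '2GB';                    -- cap in-memory usage
SET temp_directory = '/tmp/duckdb_spill';  -- hypothetical spill location
```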
I feel very moronic making a dashboard for any product now. Enterprise customers prefer you integrate into their ERPs anyway.
I think we lost the plot as an industry. I've always advocated for making a read-only database connection available to your customers so they can build their own visualisations. This should have been the standard 10 years ago, and its case is only stronger in this age of LLMs.
We get so involved with our products that we forget our customers are humans too. Nobody wants another account to manage or remember. Analytics and alerts should be push-based: configurable reports should be auto-generated and sent to your inbox, alerts should be pushed via notifications or email, and customers should have the option to build their own dashboards with something like this.
Sane defaults make sense but location matters just as much.
Roughly three decades ago, that *was* the norm. One of the more popular tools for achieving that was Crystal Reports[1].
In the late 90s, it was almost routine for software vendors to bundle Crystal Reports with their software (very similar to how the MSSQL installer gets invoked by products), then configure an ODBC data source which connected to the appropriate database.
In my opinion, the primary stumbling block of this approach was the lack of a shared SQL query repository. So if you weren't intimately familiar with the data model you wanted to work with, you'd lose hours trying to figure it out on your own, or rely on colleagues sharing queries via sneakernet or email.
Crystal Reports has since been acquired by SAP, and I haven’t touched it since the early ‘00s so I don’t know what it looks or functions like today.
1: https://en.wikipedia.org/wiki/Crystal_Reports
I was a developer, albeit not professionally, and my boss gave me the opportunity to develop the integration between Agresso and Crystal Reports, my first professional development project, for which I am still grateful. It was a DLL written in C++, and I imagine they shipped it for quite a while after I left for greener pastures.
I was already a free software and Linux enthusiast, so I made a vain skunkworks attempt at getting Agresso to run on MySQL, which failed. But my Linux server in the office came in handy when I needed some extra software in the field: I asked a colleague to put a CD in the server so I could download it to the client site some 500 km away and deliver on the migration.
I was part of this and "saw the light". We had such great visibility into all the processes, it was unreal. It tremendously sped up cross-org initiatives.
Today, I guess, only agents get that privilege.
Customers need it to build custom reports, archive data into a warehouse, drive downstream systems (notifications, audits, compliance), and answer edge-case questions you didn’t anticipate.
Because of that, I generally prefer these patterns over a half-baked built-in analytics UI or an opinionated REST API:
Provide a read replica or CDC stream. Let sophisticated customers handle authz, modelling, and queries themselves (a minimal sketch of the read-only side is below). This gets harder with multi-tenant DBs.
Optionally offer a hosted Data API, using something like PostgREST, Hasura, or Microsoft DAB. You handle permissions and safety, but stay largely unopinionated about access patterns.
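For the read replica option, the grant setup in Postgres might look roughly like this (illustrative; the schema, role, and password are placeholders):

```sql
-- Illustrative: a read-only role a customer (or a PostgREST/Hasura/DAB
-- deployment) could connect with on the replica. All names are hypothetical.
CREATE ROLE customer_readonly LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA analytics TO customer_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO customer_readonly;
-- Cover tables created later, too:
ALTER DEFAULT PRIVILEGES IN SCHEMA analytics
  GRANT SELECT ON TABLES TO customer_readonly;
```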
Any built-in metrics or analytics layer will always miss edge cases.
With AI agents becoming first-class consumers of enterprise data, direct read access is going to be non-negotiable.
Also, I predict the days of charging customers to access their own goddamn data behind rate-limited, metered REST APIs are behind us.
The CDC stream option you flagged is more viable in my (admittedly biased) opinion. At my company (Prequel), our entire pitch is basically "you should give your customers a live replica of their data in whatever data platform they want it in" (and let us handle the cross-platform compatibility and multi-tenant DB challenges).
I think this problem could also be a killer use case for Open Table Formats, where the read-replica architecture can be mirrored but the cost of reader compute can be assumed by the data consumer.
To your point, this is only going to be more important with what will likely be a dramatic increase in AI agent data consumption.
I get your point, but generally with most enterprise-scale apps you really don't want your transactional DB doubling as your data warehouse. The "push-based" operation should be limited to moving data from your transactional environment to your analytical one.
Of course, if the “analytics” are limited to simple static reports, then a data warehouse is overkill.
A layer on top of the database to account for auth etc. would be necessary anyway. It could be achieved to some degree with views, but I'd prefer an approach where you choose the publicly exposed data explicitly.
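For instance, an explicit allowlist via a dedicated schema of views (illustrative Postgres sketch; all names are made up, and `customer_readonly` is the hypothetical role from the earlier sketch):

```sql
-- Illustrative: expose only explicitly chosen columns through a view
-- in a dedicated schema. All names are hypothetical.
CREATE SCHEMA customer_api;
CREATE VIEW customer_api.orders AS
  SELECT id, status, total_cents, created_at  -- no internal or PII columns
  FROM internal.orders;
GRANT USAGE ON SCHEMA customer_api TO customer_readonly;
GRANT SELECT ON customer_api.orders TO customer_readonly;
```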
GraphQL almost delivered on that dream. Something more opinionated would've been much better, though.