I built this last summer and thought it was fairly complete. It turns out I was naive: six months later, I have resolved ~100 issues reported by users.
Nice work, really happy to have found this today.
But anyway, from the SQLite example, I added "x integer not null," to the original table and was greeted with "-- Skipped: ALTER TABLE books DROP COLUMN x". Then I ticked "Enable DROP" and got the same, except with the line uncommented.
Such a shame, as this is one thing that would make a lot of difference. FWIW, the standard way of doing this is creating a temporary table with the new data, dropping the original table, recreating it with the new schema, copying the data back across, and deleting the temporary table. It's kind of a shame that it doesn't automate that, because this is one of those things that's incredibly easy to mess up if you're not careful, especially if there are constraints involved.
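A minimal sketch of that dance, assuming a hypothetical books(id, title, x) table from which x is being dropped (a real table also needs its indexes, triggers, and views recreated by hand):

```sql
PRAGMA foreign_keys = OFF;
BEGIN;
-- stash the rows we want to keep, minus the dropped column
CREATE TABLE books_tmp AS SELECT id, title FROM books;
DROP TABLE books;
-- recreate the table with the new schema
CREATE TABLE books (
  id    INTEGER PRIMARY KEY,
  title TEXT NOT NULL
);
INSERT INTO books (id, title) SELECT id, title FROM books_tmp;
DROP TABLE books_tmp;
COMMIT;
PRAGMA foreign_keys = ON;
```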
If it just does the easy stuff, you might as well just do it by hand.
See: https://atlasgo.io/concepts/declarative-vs-versioned#combini..., and https://github.com/ariga/atlas-action?tab=readme-ov-file#ari...
For example: add a temporarily-nullable column to a large table, deploy new code which starts writing to the new column, populate that column for existing rows in batches in the background, and finally alter the column to be mandatory (non-nullable).
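A rough sketch of that sequence in Postgres syntax, with a hypothetical orders.region column:

```sql
-- 1. Add the column as nullable first (cheap; no table rewrite)
ALTER TABLE orders ADD COLUMN region text;

-- 2. New code starts writing region; backfill old rows in batches,
--    re-running this until it reports 0 rows updated
UPDATE orders SET region = 'unknown'
WHERE id IN (SELECT id FROM orders WHERE region IS NULL LIMIT 1000);

-- 3. Only once the backfill is complete, enforce the constraint
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
```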
Another example of a non-trivial schema management case is making a schema change only after the rollout of a new version completes: a simple migration at container start can't do that.
It must be a solved problem, but I didn't see a good tool for it which would allow expressing these imperative changes in a declarative way that can be committed, reviewed, and tested along with the app code. It is always a bunch of ad-hoc ugly scripts on the side and some hand-wavy deployment instructions.
1. https://grate-devs.github.io/grate/
Pretty easy to set up and use in a dev environment as well... see docker-compose.yaml and the run/dbup script.
When using Postgres, I just try to keep my schema script idempotent and working with any version of the DB. I start with a CREATE TABLE IF NOT EXISTS; then if I add a new column, it goes in there and also into a separate ALTER. But at some point it gets cluttered, and I delete the ALTERs once things are stable. If something hits the fan and I have to restore an old backup from before the last purge, this tool helps me make it compatible quickly.
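Something along these lines, with a hypothetical users table:

```sql
CREATE TABLE IF NOT EXISTS users (
  id    bigint PRIMARY KEY,
  name  text NOT NULL,
  email text  -- new columns get added here...
);

-- ...and also as a separate idempotent ALTER for existing databases
ALTER TABLE users ADD COLUMN IF NOT EXISTS email text;
```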
I get the declarative nature of this, but schema migration is not purely declarative, particularly on large production databases. An ideal schema manager would understand the costs of particular migrations (perhaps by using table stats and EXPLAIN) and feed that back into the migration strategy so that downtime is minimized or eliminated.
Adding or removing columns or indexes can trigger major table scans and other problems, especially when partitioning conditions change.
If I split a FullName into FirstName and LastName, a diff will only tell half of the story. In EF Core, you adjust the generated Up and Down methods to make the change reversible, and you deal with the data transformation there.
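For concreteness, the data half of that change might look like this in raw SQL (hypothetical people table; naive split that assumes two space-separated parts):

```sql
ALTER TABLE people
  ADD COLUMN first_name text,
  ADD COLUMN last_name  text;

-- the part a schema diff can never generate for you
UPDATE people
SET first_name = split_part(full_name, ' ', 1),
    last_name  = split_part(full_name, ' ', 2);

ALTER TABLE people DROP COLUMN full_name;
```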
So I would love to know how people handle that without an explicit notion of migrations.
Also means I can stop with my hobby project that was supposed to do the same. Wasn't far along and haven't worked on it in months anyway.
So I'll spend my time on another hobby project instead, one that also solves something that's already been solved 100 times over, but where I don't like the existing solutions (simple log monitoring, including systemd, that sends error emails if something is found).
We used Sybase SQLAnywhere, and a complication there was that if you had materialized views against a table, you had to drop and recreate that view when adding or removing columns in the table. And when you recreate it, you of course have to remember to recreate the indexes on that materialized view.
Tracking this in case you have multiple materialized views touching multiple tables became a bit tricky, and you don't want to do the "dumb" thing of always dropping and recreating everything (or doing it per column), since some of them might take an hour or so to recreate and reindex.
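In Postgres syntax for illustration (SQLAnywhere's DDL differs), with hypothetical names, the dance is roughly:

```sql
-- the view has to go before the base table can change
DROP MATERIALIZED VIEW book_stats;

ALTER TABLE books ADD COLUMN rating integer;

CREATE MATERIALIZED VIEW book_stats AS
  SELECT author_id, count(*) AS n_books, avg(rating) AS avg_rating
  FROM books GROUP BY author_id;

-- the step that's easy to forget: indexes don't come back on their own
CREATE INDEX ON book_stats (author_id);
```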
The tool has some built-in safeties, like never dropping a column just because it's missing (you have to add an explicit element for that in the XML), and it only performs safe column definition changes, i.e. integer to varchar(50) is safe, integer to varchar(3) is not, etc.
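Expressed in Postgres syntax for illustration, that distinction is roughly:

```sql
-- safe: every existing integer fits in varchar(50)
ALTER TABLE books ALTER COLUMN code TYPE varchar(50) USING code::varchar(50);

-- unsafe: any value longer than 3 characters gets silently truncated
ALTER TABLE books ALTER COLUMN code TYPE varchar(3) USING code::varchar(3);
```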
It really made database migrations very painless for us, which is great since we have hundreds of on-premise installations. Just modify the XML and let the tool do its job.
And since it's our tool, we can add functionality when we need to.
Schema management is only a small part of the problem, and I don't think this tool handles data migrations. E.g. if I reshape a JSONB column into something more structured, I don't think it would be able to handle that. Or if I drop a column, the backwards migration it generates is ADD COLUMN … NOT NULL, which is obviously unusable if the table already has any data in it.
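That is, a generated down migration like this (hypothetical column) fails the moment the table has rows:

```sql
-- errors on a non-empty table: existing rows get no value for the
-- new column and there is no DEFAULT to fill them with
ALTER TABLE orders ADD COLUMN discount numeric NOT NULL;
```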
https://david.rothlis.net/declarative-schema-migration-for-s...
Found shmig and it’s been really fantastic.
Isn't it much quicker to write a one-line migration vs copying the DDL, adapting it to the desired state, and then getting the migrations from that tool? Or am I misunderstanding something?
In the big picture, declarative schema management has lots of advantages around avoiding/solving schema drift, either between environments (staging vs prod) or between shards in a sharded setup (among thousands of shards, one had a master failure at an inopportune time).
It's also much more readable to have the "end state" in your repo at all times, rather than a sequence of ALTERs.
There are a bunch of other advantages; I have an old post about this topic here: https://www.skeema.io/blog/2019/01/18/declarative/
It's also quite essential when maintaining nontrivial stored procedures. Doing that with imperative migrations is a gateway to hell. https://www.skeema.io/blog/2023/10/24/stored-proc-deployment...
I was using a location-related naming scheme in general at that time; similarly my automation library was called Go La Tengo because I was living in the town where the band Yo La Tengo was from.
With a declarative model, would you run the migration and follow immediately with a one-off script?
With row data migrations on large tables, there's also risk of long/slow transactions destroying prod DB performance due to MVCC impact (pile-up of old row versions). So at minimum you need to break up a large data change into smaller chunked transactions, and have application logic to account for these migrations being ongoing in the background in a non-atomic fashion.
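One common shape for the chunking, sketched in Postgres with a hypothetical events.processed backfill; each batch runs in its own short transaction so vacuum can reclaim old row versions between chunks:

```sql
-- repeat until it updates 0 rows; each run is one short transaction
WITH batch AS (
  SELECT id FROM events
  WHERE processed IS NOT TRUE
  ORDER BY id
  LIMIT 5000
  FOR UPDATE SKIP LOCKED  -- don't block concurrent application writes
)
UPDATE events SET processed = true
FROM batch
WHERE events.id = batch.id;
```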
That all said, to answer from a mechanical standpoint of "how do companies using declarative schema management also handle data migrations or renames":
At large scale, companies tend to implement custom/in-house data migration frameworks. Or for renames, they're often just outright banned, at least for any table with user-facing impact.
At smaller scale, yeah you can just pair a declarative tool for schema changes with an imperative migration tool for non-schema changes. They aren't really mutually exclusive. Some larger schema management systems handle both / multiple paradigms.
For MySQL/MariaDB with Skeema in particular, a few smaller-scale data migration approaches are discussed in a separate post, https://www.skeema.io/blog/2024/07/23/data-migrations-impera...
I like the following about it: 1. the database schema is regular code, 2. schema changes are made declaratively, 3. packaging (.dacpac) and deployment scripts.
Most open source tools seem to be after-the-fact tools that do diffs; an MS DB project handles the code from the start in a declarative, source-controlled way.
e.g. `CREATE TABLE books2 (LIKE book INCLUDING ALL);`
So I assume it is not PostgreSQL DDL it uses, but "something close".
sqldef is really cool for supporting many database dialects. I'm the author of Skeema [1] which includes a lot of functionality that sqldef lacks, but at the cost of being 100% MySQL/MariaDB-specific. Some other DB-specific options in this space include Stripe's pg-schema-diff [2], results [3], stb-tester's migrator for sqlite [4], among many others over the years.
The more comprehensive solutions from ByteBase, Atlas, Liquibase, etc tend to support multiple databases and multiple paradigms.
And then over in Typescript ORM world, the migrators in Prisma and Drizzle support a "db push" declarative concept. (fwiw, I originated that paradigm; Prisma directly copied several aspects of `skeema push`, and then Drizzle copied Prisma. But ironically, if I ever complete my early-stage next-gen tool, it uses a different deployment paradigm.)
[1] https://github.com/skeema/skeema/
[2] https://github.com/stripe/pg-schema-diff
[3] https://github.com/djrobstep/results
[4] https://david.rothlis.net/declarative-schema-migration-for-s...
Seems like the top contenders, at least for Postgres, are:
sqldef: https://news.ycombinator.com/item?id=46845239
pgschema: https://github.com/pgschema/pgschema
pgroll: https://github.com/xataio/pgroll
atlas: https://github.com/ariga/atlas
grate (minimal SQL-based migrations): https://grate-devs.github.io/grate/
pg-schema-diff: https://github.com/stripe/pg-schema-diff
results: https://github.com/djrobstep/results
stb-tester's sqlite migrator: https://david.rothlis.net/declarative-schema-migration-for-s...
1. DB is source of truth, generate TS from DB
2. TS to DB direct sync, no migration files
3. TS source, Drizzle generates and applies SQL
4. TS source, Drizzle generates SQL, runtime application
5. TS source, Drizzle generates SQL, manual application
6. TS source, Drizzle outputs SQL, Atlas application
Personally I've called it a mistake, since there's no way a tool can infer what happened based on that information.
That might sound like a major caveat, but many companies either ban renames or have a special "out-of-band" process for them anyway, once a table is being used in production. This is necessary because renames have substantial deploy-order complexity, i.e. you cannot make the schema change at the same exact instant as the corresponding application change, and the vast majority of ORMs don't provide anything to make this sane.
In any case, many thousands of companies use declarative schema management. Some of the largest companies on earth use it. It is known to work, and when engineered properly, it definitely improves development velocity.
For small things where you don't have a DBA or whatever, sure, use tooling like you would for auto-applied changes in local development.
Renames aren't compatible with that automation flow though, which is what I meant by "out-of-band". They rely on careful orchestration alongside code change deploys, which gets especially nasty when you have thousands of application servers and thousands of database shards. In some DBMSes, companies automate them using a careful dance of view-swapping, but that seems brittle performance-wise and operationally.
You tell it what’s being renamed with a special comment.
No tool can help you with that, simply because this kind of data migration depends on your particular business logic that the tool has no way of knowing about.
While SQL Server Data Tools has its warts, it has been immensely useful for us in making sure every little detail gets handled during migration. That doesn't usually mean that it can do the entire migration itself - we do the manual adjustments to the base tables that SSDT cannot do on its own, and then let it handle the rest, which in our case is mostly about indexes, views, functions and stored procedures.
After all that, SSDT can compare the resulting database with the "desired" database, and reliably flag any differences, preventing schema drift.
`ALTER TABLE books ADD CONSTRAINT fk_books_author FOREIGN KEY (author_id) REFERENCES authors (id)`
Which is not valid in SQLite (https://www.sqlite.org/lang_altertable.html)
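SQLite only honors foreign keys declared in CREATE TABLE, so adding one after the fact means the same table-rebuild dance described upthread; a rough sketch, with assumed columns:

```sql
-- recreate the table with the FK declared inline
CREATE TABLE books_new (
  id        INTEGER PRIMARY KEY,
  title     TEXT NOT NULL,
  author_id INTEGER REFERENCES authors (id)
);
INSERT INTO books_new SELECT id, title, author_id FROM books;
DROP TABLE books;
ALTER TABLE books_new RENAME TO books;
```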