Interest in a 0-downtime managed Postgres migration tool?
I recently discovered how much of a hassle it is to migrate off of Heroku Postgres. Both to keep the user from screwing themselves and as a vendor lock-in mechanism there's no possibility of running an external replica for your Heroku Postgres database. Unless I'm missing something (after a bunch of Googling) it seems like there is a market gap for a tool that allows for a seamless migration away from a Postgres DB that won't allow for replicas.
I imagine that, given some constraints on what queries you allow on your database, you could proxy connections to a managed Postgres database while feeding them simultaneously to a new database. Starting from a snapshot of the old database one could catch the new database up to a live state of the old database and then swap over all connections to the new one.
Does such a proxy already exist? I'd love to know as I could use it. If not it might be fun to build and a good side-project.
Bucardo might be what you’re looking for. There’s some nice info on real world usage as well as an overview of a few similar options (and why they aren’t viable) here: https://medium.com/preply-engineering/postgres-multimaster-34f2446d5e14 - that’s specifically talking about replication between separate RDS instances, but it should follow fairly similarly going from Heroku.
Amazon themselves also have a proprietary migration service, which is pretty cool in that it can translate from one DB engine to another, but I believe it requires deeper access to the source DB than Heroku will give you.
[Edit] Hmm, I can’t seem to find any confirmation that Heroku Postgres supports PL/Perl, even though there are one or two (literally, just one or two) references out there to using Bucardo on Heroku.
Interestingly, it looks like there's at least one example of Heroku providing WAL access to help with an outbound migration. I'm kind of surprised they'd do so at all, although it sounds like this was a sizeable customer who is likely still spending a decent amount on their other services, so a special case either way.
As a slight tangent for anyone reading this later, and speaking entirely from hindsight after hitting the same problem in the past (we thankfully had a relatively small DB when I had to do the same migration, so we just bit the bullet and took the late-night downtime), I'd say that using Heroku Postgres for anything more than dev and testing is not a good idea, specifically because of this lock-in. That's not to say the same about Heroku as a whole, at all, just to say that their database service specifically doesn't seem to offer anything much that RDS or Aurora don't do as well or better.
I know a lot of people dislike Heroku, but my experience is that replicating their services in-house is no small feat: it involves learning Terraform, learning the potentially thousands of platform-specific options to figure out which do and don't have defaults that match your use case, writing several hundred lines of config at minimum, setting up external monitoring and alerting services, and then praying that there wasn't an unknown-unknown somewhere in all of that that will blow up on you later.
The servers might cost twice what they would elsewhere, but if you're somewhere in the hobbyist to several hundred dollar per month bracket, that overhead can be more than worth the money. It's only once you start hitting four figures that taking on all the work yourself starts to become a bit more compelling; I'm actually surprised that they don't offer more aggressive price breaks for scaling.
The Postgres service, in contrast, is very similar to RDS. You can set both up with just a few clicks, RDS is fully compatible with (and in the same physical datacenter as) Heroku's web dynos, and you end up getting a more capable service for less money with almost no additional effort. No help if you're already on a Heroku DB, obviously, but if anyone is reading this because they're weighing up the options I'd say go for a Heroku web environment pointing at an RDS DB right from the start.
100% agree. Currently managing a Heroku Postgres and would recommend the same. For review apps use Heroku Postgres, but for staging and production use RDS from the start. Our db is 400GB and growing, so I’ve been thinking about the most elegant solution for a while. We can afford the downtime, but would absolutely love to find a minimal downtime alternative.
One thing I’m not sure of is if Aurora is a better option for me than RDS. I don’t want to use a vendor-specific database right as I’m jumping the ship from a vendor lock-in database. But if it wouldn’t be any harder to leave than RDS I’m curious if the extra features are useful to a web app that hopes to scale to pretty high.
In theory you can replicate from Aurora to plain RDS either using DMS or the built in logical replication in Aurora. It's not quite as point-and-click as the opposite direction, because Amazon definitely wants to encourage people onto Aurora, but it's comfortably possible - for now, at least.
My main concerns about Aurora were the horror stories about unexpected spikes in the IO-based billing, and concern about future lock-in even if there's an escape hatch at the moment. For our use case its extra features didn't provide anything especially compelling to offset those concerns, and even at baseline usage it was likely to come out more expensive, so that was decision made - but I can understand how it might be a decent option depending what you need from it.
Thanks. One day I might get the opportunity to use Cockroach DB. It looks like the next generation of Postgres. But for now I think more Postgres is the way.
In short, you're talking about rolling your own logical replica.
Lot of ways to skin that cat. And to quote the end of that link:
I seem to remember dealing with a tool quite a few years ago. It wasn't specific to pg, but it did the job. Sadly I can't remember exactly what it was called. It was a LAMP application that would do small, spaced queries so that you wouldn't overwhelm the servers. I want to say it was called BigSwap or something like that, but I can't seem to find anything under that name.
Would you mind if I ask you why you chose Heroku? Literally every time I have heard them being mentioned people have bad things to say.
Heroku is the best in its niche. It’s very low effort and can run a moderately sized business if you take care as you grow. Render isn’t ready for prime time IMO. Going with AWS or actually managing a kubernetes cluster yourself is way more work.
People complain about Heroku but they keep giving them money for a reason. The product does what it says.