Snapshot Restore versus pg_restore — RDS Disaster Recovery


The time difference between restoring an 11 GB PostgreSQL database from its 159 MB pg_dump file with pg_restore, versus restoring a snapshot of its Multi-AZ t2.medium RDS Instance:


Important note! The time values are subject to change, because they depend on the network, other running processes, the intermediary EC2 Instance type, etc.


I used a t2.medium SSM Bastion Instance to execute the PostgreSQL commands. As I’ve mentioned before, provisioning a more powerful EC2 Instance might increase performance drastically. You can find information about SSM Bastions on this page.

Imagine you have a PostgreSQL database named “XXX” with a size of 11 GB. Don’t get distracted by the name, I don’t know why I named it “XXX” in the first place, but here we go anyway :)

I take a dump of it with this pg_dump command (check the documentation here);

PGPASSWORD='${PGPASSWORD}' pg_dump -h "" -p 54321 -U XXX --format=c XXX > xxx.sql

And to see how big the file is, I execute;
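For example, with ls (the -h flag prints a human-readable size; the file name comes from the pg_dump step above):

```shell
# Human-readable size of the dump file produced by pg_dump
ls -lh xxx.sql
```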

You can see that the xxx.sql dump file is 159 MB.

The reason our dump file is smaller than our XXX database is that we take a logical backup of the database; all of those empty pages, old versions of rows, and indexes aren’t included in the dump file. It is also compressed by default, because we used the --format=c option. All of that greatly reduces the size.

Now, let’s restore our database from this dump file with this pg_restore command (check the documentation here);

PGPASSWORD='${PGPASSWORD}' pg_restore -v -h "" -p 5432 -U XXX -d postgres -j 4 -O -C "xxx.sql"

As you can see, I used the time command together with the actual command, so that we can see how long it took;

The “real” value here is what we should be looking at. Details over here. Not bad, huh? However, it is still 14 minutes of downtime, which I don’t recommend for your mental, and probably physical, health :)
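If you haven’t read time’s output before, here is a quick demo with a harmless command — the “real” line is wall-clock time, i.e. the actual downtime you would experience:

```shell
# "real" = wall-clock time, "user"/"sys" = CPU time spent.
# For downtime measurements, "real" is the number that matters.
time sleep 2
```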

Let’s note the value and proceed with Snapshot Restore operation.

Snapshot Restore

At this point, if you’ve decided to use the snapshot restore operation instead of pg_restore, you’re going to need to change your corrupted RDS Instance’s identifier, because AWS doesn’t allow duplicate RDS identifiers, obviously. I mean, they’re identifiers; that’s what they do, identify. You can change the identifier to, let’s say, “original-old”. Here is how long it takes to change the identifier of a Multi-AZ RDS Instance;
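For reference, the rename can also be done from the AWS CLI; a sketch, assuming the instance is called original and you’re renaming it to original-old (both names are made up here, and the command needs configured AWS credentials):

```shell
# Rename the corrupted instance to free up its identifier.
# --apply-immediately skips the maintenance window; the rename itself still takes minutes.
aws rds modify-db-instance \
  --db-instance-identifier original \
  --new-db-instance-identifier original-old \
  --apply-immediately
```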

Quite long. After changing our corrupted RDS Instance’s identifier, we can now restore our snapshot and name it with our original RDS identifier, so our applications can automatically connect to it. Let’s see how long it takes to restore a Multi-AZ PostgreSQL RDS Instance;
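The same step from the CLI; a sketch with a made-up snapshot identifier — note that Multi-AZ has to be requested again on restore, it is not inherited from the snapshot:

```shell
# Restore the snapshot under the original identifier so applications reconnect automatically
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier original \
  --db-snapshot-identifier my-latest-snapshot \
  --db-instance-class db.t2.medium \
  --multi-az

# Block until the new instance is actually available
aws rds wait db-instance-available --db-instance-identifier original
```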

WOW! This is not good at all. 20 minutes is enough time to have you looking for a new job! Add the identifier change time (2:21) to this and you get 22:04 minutes! You might as well take a damn nap while this completes.


After enabling the Multi-AZ feature, I highly recommend taking pg_dumps of your PostgreSQL databases daily and uploading them to S3. Even if you don’t use PostgreSQL, you can adapt this solution to MySQL etc.
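A minimal sketch of such a daily backup, assuming a cron-driven bastion with the AWS CLI configured; the host, bucket, user, and database names here are placeholders:

```shell
#!/bin/bash
# Daily logical backup of the XXX database to S3 (run from cron, e.g. "0 3 * * *")
set -euo pipefail

STAMP=$(date +%F)
DUMP="xxx-${STAMP}.dump"

# Custom-format dump, compressed by default
PGPASSWORD="${PGPASSWORD}" pg_dump -h "your-rds-endpoint" -p 5432 \
  -U XXX --format=c XXX > "${DUMP}"

# Ship it to S3 and clean up the local copy
aws s3 cp "${DUMP}" "s3://your-backup-bucket/postgres/${DUMP}"
rm -f "${DUMP}"
```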

I also highly recommend configuring read replicas (slaves) of the master RDS Instance, if you are swimming in gold and don’t feel lazy at all. In that case, you can promote one of the replicas to master and keep your job in some specific disaster scenarios. In addition to being a recovery solution for disaster cases, a well-designed replica setup also increases read performance drastically.
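Promoting a replica is a single call; a sketch with a made-up replica identifier (again, this needs AWS credentials to actually run):

```shell
# Detach the replica from replication and make it a standalone, writable instance
aws rds promote-read-replica --db-instance-identifier original-replica-1
```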

What is hell for Doomguy, is manual work for me (I also hit animal skin stretched over hollow wooden objects to create music 🥁)
