Friday, February 19, 2016

How to perform incremental / continuous backups of a ZFS pool?




How can zfs pools be continuously/incrementally backed up offsite?



I recognise that send/receive over ssh is one method; however, it involves managing snapshots manually.
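
For reference, the manual approach I mean looks roughly like this, assuming an existing base snapshot and a dataset, host and destination pool whose names are all illustrative:

# Take a new snapshot, then send only the delta since the previous one
zfs snapshot tank/data@backup-new
zfs send -i tank/data@backup-old tank/data@backup-new | \
    ssh backuphost zfs receive backuppool/data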



I have found some tools, but most are no longer maintained.



The one tool that looks promising is https://github.com/jimsalterjrs/sanoid; however, I am worried that a tool that is not widely known may do more harm than good, in that it may corrupt or delete data.



How are continuous/incremental zfs backups performed?


Answer




ZFS is an incredible filesystem and solves many of my local and shared data storage needs.



While I do like the idea of clustered ZFS wherever possible, sometimes it's not practical, or I need some geographical separation of storage nodes.



One of the use cases I have is high-performance replicated storage on Linux application servers. For example, I support a legacy software product that benefits from low-latency NVMe SSD drives for its data. The application has an application-level mirroring option that can replicate to a secondary server, but it's often inaccurate and offers only a 10-minute RPO.



I've solved this problem by having a secondary server (also running ZFS on similar or dissimilar hardware) that can be local, remote or both. By combining the three utilities detailed below, I've crafted a replication solution that gives me continuous replication, deep snapshot retention and flexible failover options.



zfs-auto-snapshot - https://github.com/zfsonlinux/zfs-auto-snapshot




Just a handy tool to enable periodic ZFS filesystem-level snapshots. I typically run with the following schedule on production volumes:



# /etc/cron.d/zfs-auto-snapshot
PATH="/usr/bin:/bin:/usr/sbin:/sbin"

# frequent: every 5 minutes, keep 24 (a rolling two hours)
*/5 * * * * root /sbin/zfs-auto-snapshot -q -g --label=frequent --keep=24 //
# hourly: keep 24 (a rolling day)
00 * * * * root /sbin/zfs-auto-snapshot -q -g --label=hourly --keep=24 //
# daily: keep 14; weekly: keep 4; monthly: keep 4
59 23 * * * root /sbin/zfs-auto-snapshot -q -g --label=daily --keep=14 //
59 23 * * 0 root /sbin/zfs-auto-snapshot -q -g --label=weekly --keep=4 //
00 00 1 * * root /sbin/zfs-auto-snapshot -q -g --label=monthly --keep=4 //
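
To verify the schedule is producing snapshots, listing them by creation time works well; zfs-auto-snapshot names them with the label and a timestamp (the dataset below is illustrative):

# List all snapshots under a dataset, oldest first
zfs list -t snapshot -o name,creation -s creation -r vol1/data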


Syncoid (Sanoid) - https://github.com/jimsalterjrs/sanoid



This program can run ad-hoc snap/replication of a ZFS filesystem to a secondary target. I only use the syncoid portion of the product.


Assuming servers named server1 and server2, here is a simple command run from server2 to pull data from server1:



#!/bin/bash
# Pull-mode replication: run on server2 to pull vol1/data from server1
# into the local vol2/data dataset, propagating syncoid's exit status
/usr/local/bin/syncoid root@server1:vol1/data vol2/data
exit $?
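
If the source filesystem has child datasets, syncoid can also replicate the whole tree recursively; a sketch, assuming a reasonably current syncoid build:

# Recursively replicate vol1/data and all of its child datasets
/usr/local/bin/syncoid -r root@server1:vol1/data vol2/data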


Monit - https://mmonit.com/monit/



Monit is an extremely flexible job scheduler and execution manager. By default, it works on a 30-second interval, but I modify the config to use a 15-second base time cycle.
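
That change is a single line in the main Monit configuration file (location may vary by distribution):

# /etc/monitrc (excerpt) - poll every 15 seconds instead of the default 30
set daemon 15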



An example config that runs the above replication script (saved as /usr/local/bin/run_storagesync.sh) every 15 seconds (1 cycle):




check program storagesync with path /usr/local/bin/run_storagesync.sh
every 1 cycles
if status != 0 then alert


This is simple to automate and add via configuration management. By wrapping the execution of the snapshot/replication in Monit, you get centralized status, job control and alerting (email, SNMP, custom script).
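
As a sketch of the alerting piece, a minimal Monit mail configuration might look like this (mail server and address are illustrative):

# /etc/monitrc (excerpt) - email alerts when a check fails
set mailserver localhost
set alert admin@example.com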







The result is servers with multiple months of monthly snapshots and many points of rollback and retention (see: https://pastebin.com/zuNzgi0G), plus a continuous rolling 15-second atomic replica:



# monit status



Program 'storagesync'
  status                       Status ok
  monitoring status            Monitored
  last started                 Wed, 05 Apr 2017 05:37:59
  last exit value              0
  data collected               Wed, 05 Apr 2017 05:37:59

.
.
.
Program 'storagesync'
  status                       Status ok
  monitoring status            Monitored
  last started                 Wed, 05 Apr 2017 05:38:59
  last exit value              0
  data collected               Wed, 05 Apr 2017 05:38:59
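
Restoring from any of the retained points is then a standard ZFS operation; for example (the snapshot name is illustrative):

# Roll a dataset back to its most recent snapshot; add -r to roll back
# past newer snapshots, destroying them in the process
zfs rollback vol2/data@zfs-auto-snap_frequent-2017-04-05-0535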

