I installed Antergos Linux with ZFS-on-root. The system is up and running, and I'm glad, because that alone is a success for me. Now, apparently ZFS is mainly for NAS, RAID, server configs etc. (and I totally see and understand why), but I still want to try it on my desktop for its supposed superiority to BTRFS in terms of stability (even more so regarding power outages [a major concern for me where I live]), performance and handling.
I'm completely new to CoW and snapshots, even in BTRFS, and I don't consider them easy to understand at all, even if there are only two commands to master in the case of ZFS.
A Samsung 850 Evo 250GB SSD configured by Antergos with ZFS
# lsblk
NAME MOUNTPOINT
sda
├─sda1 /boot/efi
├─sda2 /boot
└─sda3
# zfs list
NAME MOUNTPOINT
antergos_6g9t /
antergos_6g9t/swap
A WDC WD30EZRX 3TB HDD configured by myself
# lsblk
NAME FSTYPE MOUNTPOINT
sdb
├─sdb1 vfat
├─sdb2 ext4 /run/media/username/masstorage
├─sdb3 ext4 /home
└─sdb4 ext4 /run/media/username/snapshots
As you can see, I set up the bigger drive to hold one partition for lots of data (work, movies, music etc.), one for home and one for snapshots. sdb1 is intended as a possible ESP because:
- I want to incremental snapshot root (antergos_6g9t) to sdb4
- I want to be able to boot from those snapshots
- I want to be able to restore those snapshots to sda if I break my root
- What are the every-day commands I need to use to accomplish the above? (Almost any guide on the web is NAS and RAID related or clones via SSH)
- Is ext4 ok for /home to combine with ZFS root?
- Would I have to format sdb4 as ZFS?
- Maybe there's a completely different approach to this? (Take note that I want separate partitions for /home and mass storage with solid backwards compatibility; that's why I chose ext4 here)
Any help, comment or suggestion is highly appreciated.
Answer
What are the every-day commands I need to use to accomplish the above? (Almost any guide on the web is NAS and RAID related or clones via SSH)
There is no real difference between copying over the network and copying locally: instead of zfs send | ssh zfs recv you just have zfs send | zfs recv, or you can even store the backup in streamed form (without zfs recv, and with some downsides) and only expand it if you need it.
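A minimal sketch of the three variants, with placeholder names (data, backuppool, user@host and the backup path are assumptions, not taken from your setup):
# remote: stream the snapshot to another host over SSH
zfs send data@snap1 | ssh user@host zfs recv backuppool/data
# local: receive directly into another pool on the same machine
zfs send data@snap1 | zfs recv backuppool/data
# stored stream: keep the raw stream as a file and expand it only when needed
# (downsides: you cannot browse single files, and a corrupted stream is unusable)
zfs send data@snap1 > /path/to/backup/data_snap1.zfs
zfs recv backuppool/data < /path/to/backup/data_snap1.zfs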
Is ext4 ok for /home to combine with ZFS root?
Would I have to format sdb4 as ZFS?
What exactly is your goal with the ext4 partition? You can do it this way, but you will miss out on snapshots and integrity checks for your user data. In my eyes, any system can be brought back cheaply, but user data, once lost, is lost forever. If I had to choose, I would use ZFS for my user data and ext4 for a (worthless) system partition, not the other way around.
Maybe there's a completely different approach to this? (Take note I want a separate partition for /home and mass storage with a solid backwards compatibility, that's why I chose ext4 here)
- If you see backwards compatibility as "I want to test ZFS and then go back to ext4 without copying data", you will gain nothing: you will not see the benefits of ZFS and still have the drawbacks of ext4. Also, you have more work to do in creating a bootable ZFS system on Linux (although you already did that in this case) than in setting up a simple ZFS data partition for user data.
- If you see it as "I want to access it with other systems", I would suggest NFS, SMB, AFP or SSH (or all of them at the same time).
- If this is for dualboot with a system that is not ZFS-capable, this would be one of the few constellations where your layout makes perfect sense.
- If you don't trust ZFS to keep your data safe, or do not trust the Linux version, either use Solaris/illumos/*BSD, or keep your backups on ext4. This way, you lose the easy send/recv backup utility, but at least you know that you back up only good data.
From your current description, you could still reach all your goals, but it would be suboptimal and more difficult the way you have outlined it.
Instead, think about using ZFS on all your disks, maybe adding redundancy if possible (can also be done later on by adding a mirror disk), using ZFS file systems instead of partitions (to separate concerns), and backing up your snapshots regularly on different disks (to combat possible corruption from missing ECC memory).
Follow-up to your comment:
Would you mind elaborating on this situation in your answer and providing details (as in structure) on the snapshot management, and on how to achieve a boot into one of the snapshots or restore a snapshot to root? I'd partition the whole 3TB as ZFS and then, from my ZFS OS, add data pools and datasets for my storage, home and snapshot partitions, I suppose. But from there I'd be helpless.
Yes, pretty much that. I don't know your hardware options, but assuming you have the disks you described, I would do the following:
Hardware and pool layout
Normally, you would use the SSD as a read cache, but on a desktop you lose most of the advantages of an L2ARC cache, because it is emptied at every reboot (except on Solaris 11.3, where it is persistent and survives reboots).
So, you could either put everything on the HDD and use the SSD as an SLOG device (for sync writes only); or you could separate them and put your system data (root pool) on the SSD and the rest on the HDD.
Theoretically you could achieve better performance with the first solution, but in a desktop case I doubt your system stays online long enough to notice it. Therefore, the second solution is less hassle and your SSD will survive longer.
So you create two pools: one root pool (assume it is named rpool) on the SSD with 250 GB, and one data pool (data) on the HDD with 3000 GB. Both are non-redundant, because they only have one vdev each, but later on you can add an additional HDD or SSD to make them mirrors with zpool attach data /dev/ (so errors can be corrected automatically). This is optional, but recommended (if you can only add one disk, add the data mirror, because your data is more valuable than the system, which is cloned to data anyway).
You don't need any additional partitions (except maybe swap and/or boot partitions, but this will be done automatically at installation), because your ZFS file systems will fill this role.
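As a rough sketch, assuming the whole HDD appears as /dev/sdb and a later second disk as /dev/sdc (both device names are assumptions; the root pool is created by the installer anyway):
# create the non-redundant data pool on the HDD
zpool create data /dev/sdb
# optional, later: attach a second disk to turn the single vdev into a mirror
zpool attach data /dev/sdb /dev/sdc
# verify pool layout and health
zpool status data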
ZFS file system layout
Now you have two pools. rpool is already populated from your installation (I am sorry I cannot go into details here, as Linux is different from illumos/Solaris), so you don't need to change anything here. You can check with zfs mount if the file systems are mounted correctly.
data, on the other hand, is still empty, so you add some file systems:
# zfs create data/home
# zfs create data/home/alice
# zfs create data/sysbackup
# zfs create data/pictures
...
Check with zfs mount if they are mounted correctly; if not, mount them with zfs mount (and/or in fstab; again, this might differ on Linux). I find it easier if the directory structure is similar to the file system structure (but it is not necessary): /home/alice corresponds to data/home/alice.
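A small sketch of those checks, with the dataset names from the example above (the rest is generic ZFS usage):
# show what is mounted where
zfs mount
# set an explicit mountpoint; children inherit it (data/home/alice -> /home/alice)
zfs set mountpoint=/home data/home
zfs get -r mountpoint data
# mount a single file system or everything at once
zfs mount data/home/alice
zfs mount -a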
ACLs and network sharing
Now would be a good point to think about permissions and shares (because both are included in your future snapshots, as they are properties of the snapshotted file system at a specific point in time).
All your files and directories will have file ACLs (access control lists). Additionally, all your Windows network shares (SMB/CIFS) will have share ACLs. This is a broad topic, but for a desktop system you can keep it simple: set your file ACLs the way you want to give permissions (using only allow, no deny), and leave the share permissions on default (all have access). The share ACLs are then effectively ignored and you just have to manage one set of permissions, which works locally and for all network share protocols (AFP, SMB, NFS).
To show ACLs, use ls -Vd /home/alice for the directory itself and ls -V /home/alice for all files within. Depending on your system, ls might be the wrong version (GNU ls instead of Solaris ls), so you might need the full path.
To modify ACLs, use chmod (same caveat as with ls); good documentation is available here.
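As an illustration using the Solaris-style chmod syntax (on Linux, ACL handling differs and this exact form may not be available):
# show the ACL of the directory and of the files within
ls -Vd /home/alice
ls -V /home/alice
# add an entry allowing alice to read, write and traverse the directory
chmod A+user:alice:read_data/write_data/execute:allow /home/alice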
Also, you should set any ZFS properties on the file systems (zfs get and zfs set) if needed.
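For example (the chosen properties and values are only illustrative, not prescribed here):
# inspect all properties of a file system
zfs get all data/home/alice
# typical desktop tweaks: cheap compression (if your version supports lz4), no access time updates
zfs set compression=lz4 data
zfs set atime=off data
# properties are inherited by children unless overridden
zfs get -r compression data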
Background on snapshots
Each snapshot is just an atomic saved state of the given file system at the point of creation. It is like a time machine, where you can go back and see how it was a day or a year ago. They are read-only, so you cannot modify them (only delete them completely), and they take space only for changed blocks since their creation.
This means each snapshot starts out at (almost) zero bytes in size, and each changed, added, or removed block is recorded and retained, so the snapshot begins to grow (because of the Copy-on-Write property of ZFS).
If you imagine your data on a line from left to right (like a timeline), a new block gets written to the right of the last old block. If you set a snapshot after block 5, initially nothing changes. Block 6 is then added on the right, but the snapshot still has only blocks 0 to 5 referenced. If block 3 is deleted, no space is reclaimed until the snapshot is destroyed, because it still references blocks 0 to 5. Modifying block 4 is (CoW!) the same as adding block 6 and moving the references after the write operation - but again, no space is freed because the snapshot still demands to hold the original blocks 0 to 5. If we finally destroy the snapshot, we reclaim blocks 4 and 5, leading to a hole (fragmentation), which might later be filled with other blocks.
This is the block level; each file can consist of multiple blocks all over the disk. At the file level, you see the files at that single point in the past, as if nothing had changed. It may be helpful to play around a bit with snapshots and add/edit/delete simple text files, so you get a feeling for it. The idea is quite simple, but it works very efficiently.
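A small experiment along those lines, assuming the data/home/alice file system from above:
echo "version 1" > /home/alice/note.txt
zfs snapshot data/home/alice@before
echo "version 2" > /home/alice/note.txt
# the state at snapshot time is still readable under the hidden .zfs directory
cat /home/alice/.zfs/snapshot/before/note.txt
# see how much space the snapshot pins, then roll back the whole file system if desired
zfs list -t snapshot -r data/home/alice
zfs rollback data/home/alice@before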
Automatic snapshot rotation
IIRC, on Linux you can use zfs-auto-snapshot to do it automatically, or you can set up your own scripts called by cron at regular intervals (for creating snapshots and for destroying them); a minimal sketch of such a script follows the rotation list below.
As for a good rotation, it depends on your use patterns and needs. I would start with a generous amount, so you can reduce it as needed. Deleting snapshots is easy; creating them retroactively is impossible. If your performance tanks, reduce the intervals.
- System data: for the "rm -rf /" moments and unwanted/bad updates, also for bugs that surface later
- once per hour, retain 12
- once per day, retain 7
- once per month, retain 12
- Personal data: user directories and network shares, essentially the most valuable data
- every five minutes, retain 12
- every hour, retain 24
- every day, retain 30
- every month, retain 12
- every year, retain 10
- Scrap data: for private things that should not end up in the snapshots, for temporary data, and for working data sets that change a lot but are useless after reboots
- Sysbackup: you don't need snapshots here, because you already have them on rpool and they are just copied over
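As mentioned above, a minimal sketch of such a rotation script, assuming a GNU userland; the script name, labels and crontab line are invented examples, not part of any existing tool:
#!/bin/sh
# snap-rotate.sh <dataset> <label> <keep>
# creates <dataset>@<label>-<timestamp> and destroys the oldest
# snapshots carrying that label once more than <keep> exist
set -eu
DATASET=$1
LABEL=$2
KEEP=$3
zfs snapshot "${DATASET}@${LABEL}-$(date +%Y%m%d-%H%M%S)"
# list only this dataset's snapshots, oldest first, and keep the newest $KEEP
zfs list -H -t snapshot -o name -s creation -d 1 "${DATASET}" \
    | grep "@${LABEL}-" \
    | head -n -"${KEEP}" \
    | xargs -r -n 1 zfs destroy
Called from cron, e.g. */5 * * * * /usr/local/sbin/snap-rotate.sh data/home/alice 5min 12 for the five-minute tier (path and dataset name are assumptions).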
Backup and restore
Basically, your snapshots accumulate and provide a sliding view into your data over a period of time. As hardware can and will eventually fail, you need to back up this data to another disk or system. If you use zfs send and zfs recv, your intermediate snapshots, ACLs and properties are preserved, meaning a backup is simply a combination of a full recursive snapshot and a recursive send/recv, no matter the target (it can be another disk as an expanded file system, another ZFS system, ZFS-aware cloud storage, or even a tarball on any other system).
This rotation is different from your normal snapshots and should be named distinctly, for example with a prefix combined with a date or an increasing number, e.g. data@offsitebackup_217. The names do not matter internally, but if you script it, you need to quickly find the latest existing backup (or remember the name somewhere else), as you need the delta between the last transmitted snapshot and the newly created one:
# full initial send, destroy all filesystems on the destination
# which are not present on the source
zfs snapshot -r data@offsite_backup_1
zfs send -R data@offsite_backup_1 | ssh user@host zfs recv -Fdu data
# incremental send, destroy all filesystems on the destination
# which are not present on the source
zfs snapshot -r data@offsite_backup_2
zfs send -R -I data@offsite_backup_1 data@offsite_backup_2 | ssh user@host zfs recv -Fdu data
zfs destroy -r data@offsite_backup_1
As for the root pool, it is only slightly different: if you need to replace the disk, you must first create the boot/swap partitions and write the bootloader as usual, then restore the snapshots and optionally also adjust the mount points. I think beadm on Solaris does essentially the same. Of course, locally you leave out the ssh user@host portion. Again, first test with a small amount of data (real tests are necessary; the -n flag will not work here).
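For the local case from the question (backing up the root pool onto the data pool), a hedged sketch; rpool, data/sysbackup and the snapshot names are the assumed names from above:
# initial full, recursive backup of the root pool into data/sysbackup
zfs snapshot -r rpool@sysbackup_1
zfs send -R rpool@sysbackup_1 | zfs recv -Fdu data/sysbackup
# later runs only send the delta, then the old backup snapshot can go
zfs snapshot -r rpool@sysbackup_2
zfs send -R -I rpool@sysbackup_1 rpool@sysbackup_2 | zfs recv -Fdu data/sysbackup
zfs destroy -r rpool@sysbackup_1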
Now you can copy or move over all your data (the usual way, for example with cp or rsync).