NAS setup logbook
After two weeks, my NAS is finally complete (almost: I still need to set up NUT). You might think: two weeks to build a NAS?! But it was homemade with love; no TrueNAS/OMV hand-holding was used here. That, and I had to wait for hardware deliveries.
I wanted to write an autistic post about the long process and its minutiae, so here it is.
Hardware §

I already had a "home server" in the shape of an old repurposed workstation: an AMD R5 2600, an ASUS motherboard with 6 SATA ports and 8 GB of ECC DDR4, all in an old Fractal Design Define R4 (lots of drive bays). So I only had to buy new HDDs (5x 10 TB Seagate IronWolf Pro, since I don't trust WD after the SMR Red scandal) and SATA power cable extensions (to allow single-disk hot-unplugging). And a small Eaton UPS (3S 700VA).
First time using AliExpress: I got the exact cables I wanted for 0.80€/unit instead of 10€ (!) on (Sc)Amazon. I will shop there for that kind of small potatoes from now on.
PSU failure
In the end, even after doing everything listed below, I still had constant ATA link failures and even a disk ejected from the RAID. On a whim, I swapped the PSU for a brand new one I had lying around (I never paid for that 80+ Platinum unit, it was an order mistake, lucky bastard that I am) and everything got fixed! Safe to say this was probably the root issue, not my old drives dying, in defense of Toshiba's honour.
The old one (bottom right of the photograph) was a decent Seasonic Gold model, but well, it was 6 years old and had seen a few power surges… the replacement is still OEM'd by Seasonic, heh.
Topology and filesystem §
$ ssh user@server lsblk /dev/disk/by-id/ata-ST10000NT001-3LY101_WP027C'??'
NAME                        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda                           8:0    0  9.1T  0 disk
└─sda1                        8:1    0  9.1T  0 part
  └─dm-integrity-WP027C1N   253:0    0  9.1T  0 crypt
    └─md0                     9:0    0 18.2T  0 raid6 /home/user/data
sdb                           8:16   0  9.1T  0 disk
└─sdb1                        8:17   0  9.1T  0 part
  └─dm-integrity-WP027C9L   253:3    0  9.1T  0 crypt
    └─md0                     9:0    0 18.2T  0 raid6 /home/user/data
sdc                           8:32   0  9.1T  0 disk
└─sdc1                        8:33   0  9.1T  0 part
  └─dm-integrity-WP027C98   253:2    0  9.1T  0 crypt
    └─md0                     9:0    0 18.2T  0 raid6 /home/user/data
sdd                           8:48   0  9.1T  0 disk
└─sdd1                        8:49   0  9.1T  0 part
  └─dm-integrity-WP027C38   253:1    0  9.1T  0 crypt
    └─md0                     9:0    0 18.2T  0 raid6 /home/user/data
I decided on a RAID 6 because I don't play around and don't need that much space. Why the strange dm-integrity+mdraid+XFS choice? Simple: I don't care about compression (multimedia formats are already compressed) nor online dedup (I just want cp --reflink), only self-healing. The two other available choices were Btrfs, known for its unstable RAID 5/6 support, and ZFS.
NB: only use dm-integrity with a kernel ≥ 5.4 (reason).
Why not ZFS?
Because looking at OpenZFS with the eyes of a C/C++ dev (who knows how tricky those languages are, unlike most sysadmins), unaffected by the hype, tells me they're losing the fight against complexity, historical cruft and portability friction. In detail:
- They're still implementing complex and invasive features instead of focusing on stability.
- Related: they don't seem to have any concept of LTS. The 2.2 series was abandoned as soon as work started on 2.3 (some brave soul is spearheading a 2.2.8 effort). Doubly important since Linux isn't backward compatible with external modules (so you can't "just use an old version").
- It may be full of useful features (and by extension so many knobs to twiddle it'd give JVM tuning consultants a hard-on), but it's also a bug nest. And not small ones, critical bugs; seriously, just search "OOM" or "crash" in their bug tracker and peruse a bit.
- ZFS implements its own page cache (so no zero-copy sendfile(2)/splice(2)), its own scheduler (so no ionice(1)), its own mdadm/mdraid+LVM+dm+etc… it's basically the X.org of filesystems, for better and for worse.
- Fragmentation isn't a solved problem (compounded by the lack of a working fallocate(mode=0), supposedly a byproduct of CoW, though XFS doesn't seem to suffer as much, probably because it uses huge - thus dangerous - writeback buffers), and no defragmentation is possible (not even offline!) without "simply" rebuilding a pool and moving the old stuff to it.
- I've rarely seen numbers justifying the ARC hype over Linux's built-in LRU. I would love to see modern benchmarks.
So thanks but no thanks, my mother always told me not to play with radioactive fire.
Step by step §
First, I reused the OS that was on my server: a trusty Gentoo (like on my desktop).
Preliminary tasks
- Update the ancient BIOS just to be sure and inspect the settings (disabled PBO, which is stupid on an always-on server).
- Check the kernel config for the following (a quick way to verify them is sketched below):
  - CONFIG_CRYPTO_CRC32C_INTEL (CONFIG_CRC_OPTIMIZATIONS on kernels ≥ 6.14)
  - CONFIG_MD_RAID456
  - CONFIG_DM_INTEGRITY
  - CONFIG_NFSD and CONFIG_NFSD_V4
  - CONFIG_MD_AUTODETECT=n, useless since our RAID sits on top of dm.
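One quick way to check the running kernel (assuming CONFIG_IKCONFIG_PROC is enabled; otherwise grep the kernel sources' .config):

$ zgrep -E 'CONFIG_(CRYPTO_CRC32C_INTEL|CRC_OPTIMIZATIONS|MD_RAID456|DM_INTEGRITY|NFSD|NFSD_V4|MD_AUTODETECT)=' /proc/config.gz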
SMART testing
Always vet your new hard drives with an extended SMART test and RMA the ones with errors.
$ for dev in /dev/disk/by-id/ata-ST10000NT001*
do
	sudo smartctl -t long "$dev"
done
…
$ for dev in /dev/disk/by-id/ata-ST10000NT001*
do
	sudo smartctl -l selftest "$dev" | tail -n+6
done
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         13        -
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         13        -
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         13        -
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         13        -
Partitioning
You can use raw hard drives for RAID and dm, but you can only replace (or add) drives in your array with drives of equal or greater size, and manufacturers can (or at least used to) ship slightly different sector counts for the same advertised size. So we create partitions that leave a little space out.
$ for dev in /dev/disk/by-id/ata-ST10000NT001*
do
	echo size=19532800000 | sudo sfdisk -Xgpt "$dev"
done
Here I chose 19532800000 sectors because it's a multiple of the physical/logical sector size ratio of modern non-4K-native drives (8), and it leaves a reasonable ~35 MB of slack.
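A quick sanity check of that number (the drive path is just one of mine, adapt as needed):

$ echo $((19532800000 % 8))   # 0 → aligned to the 4 KiB physical sectors
0
$ sudo blockdev --getsz /dev/disk/by-id/ata-ST10000NT001-3LY101_WP027C1N   # drive size in 512-byte sectors, must be a bit larger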
Create dm-integrity devices
We create dm-integrity devices on our new partitions; these will work under the RAID and notify it of any broken block, which will then be corrected from the RAID's parity. That's the aforementioned self-healing.
$ tmux
# Split in 4
# Launch `sudo integritysetup format --integrity-bitmap-mode --sector-size 4096
#   --progress-frequency 30 /dev/sdX1` in each
# Detach

Any better idea for a less involved solution? I could probably write a simple script to spawn each format, filter/label their progress and multiplex it into a single stream, then nohup sudo it, hmmm.
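A minimal sketch of that idea (multiplexing onto stdout with a label per drive rather than a FIFO; untested, and meant to be run as nohup sudo ./format-all.sh so nothing prompts):

#!/bin/sh
# format-all.sh: format every data partition in parallel, prefixing each
# progress line with the partition name so the interleaved output stays readable.
for p in /dev/disk/by-id/ata-ST10000NT001*-part1; do
	integritysetup format --batch-mode --integrity-bitmap-mode --sector-size 4096 \
		--progress-frequency 30 "$p" 2>&1 | sed "s|^|${p##*/}: |" &
done
wait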
Create the RAID array
We open (mount) our new dm-integrity devices:
$ for p in /dev/disk/by-id/ata-ST10000NT001*-part1
do
	tmp=${p#*_}
	name=dm-integrity-${tmp%-part1}
	sudo integritysetup open --integrity-bitmap-mode "$p" "$name"
done
Build the array and wait patiently for the initial sync to finish:
$ sudo mdadm --create --verbose --level=6 --raid-devices=4 /dev/md0 /dev/mapper/*
…
$ cat /proc/mdstat
Personalities : [linear] [raid6] [raid5] [raid4]
md0 : active raid6 dm-3[3] dm-2[2] dm-1[1] dm-0[0]
      19513348096 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=====>...............]  resync = 29.1% (2842051012/9756674048) finish=825.8min speed=139545K/sec
      bitmap: 52/73 pages [208KB], 65536KB chunk

unused devices: <none>
Create the filesystem
$ sudo mkfs.xfs /dev/md0
meta-data=/dev/md0               isize=512    agcount=32, agsize=152448128 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0
data     =                       bsize=4096   blocks=4878337024, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Nothing more to do. The sunit and swidth parameters (the underlying RAID stripe unit/width, important for performance) are fortunately autodetected when using software RAID, these days.
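They line up with the array geometry; a quick back-of-the-envelope check:

$ # sunit  = chunk size / block size = 512 KiB / 4 KiB = 128 blocks
$ # swidth = data drives × sunit     = (4 - 2) × 128   = 256 blocks
$ sudo xfs_info /home/user/data | grep swidth   # once mounted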
Set up OpenRC
This complex layer cake isn't going to automagically mount/unmount itself in the right order at startup/shutdown, so we need to set up our service manager, OpenRC in my case, to do so.
Using the same order as during creation, we first need to open our dm-integrity devices. This is where it really wasn't fun, because OpenRC has no support for this! Whereas systemd has integritytab, here we need to write something to replace it…
First, we look at some existing services in /etc/init.d (e.g. the very similar dmcrypt), then we use our honed POSIX sh writing skills and the official guide to bake ourselves a beautiful service:
The service file (/etc/init.d/dm-integrity):
#!/sbin/openrc-run
# Devices configuration in /etc/conf.d/dm-integrity

echo() { printf '%s\n' "$*"; }
# Print the argument `delta` positions from the end of the given list
nth_arg_from_end() { local delta=$1; shift; eval echo "\${$(($#-delta))}"; }

depend() {
	use modules
	before checkfs fsck
	after dev-settle
}

start() {
	local status=0 dev= argline=

	# dev-settle is broken, we still need to manually wait for the /dev/disk symlinks
	ewaitfile 10 $(echo "$integritysetup_open_args" | awk '! /^$/ {print $(NF-1)}')

	while IFS= read -r argline; do
		dev=$(nth_arg_from_end 1 $argline)
		ebegin "integritysetup open $argline"
		if ! [ -b "$dev" ]; then
			eerror "$dev: device not found even after waiting 10 s"
			eend 1
			status=1
			continue
		fi
		integritysetup open --batch-mode $argline
		eend $? || status=1
	done <<EOF
$(echo "$integritysetup_open_args" | sed '/^$/d')
EOF

	if [ $status -eq 1 ]; then
		eerror "Failed opening some devices"
		return 1
	fi
}

stop() {
	local name=

	echo "$integritysetup_open_args" | awk '! /^$/ {print $NF}' |
	while IFS= read -r name; do
		ebegin "integritysetup close $name"
		integritysetup close "$name"
		eend $?
	done
}

status() {
	local name=

	# Here-doc instead of a pipeline so that `return 1` isn't swallowed by a subshell
	while IFS= read -r name; do
		[ -n "$name" ] || continue
		if ! [ -b /dev/mapper/"$name" ]; then
			eerror "$name not opened"
			return 1
		fi
	done <<EOF
$(echo "$integritysetup_open_args" | awk '! /^$/ {print $NF}')
EOF
	return 0
}
Then we just have to configure it (via the /etc/conf.d/dm-integrity file):
# /etc/conf.d/dm-integrity: config file for /etc/init.d/dm-integrity
# Each line contains the arguments for an `integritysetup open` call; empty lines are ignored
integritysetup_open_args='
--integrity-bitmap-mode /dev/disk/by-id/ata-ST10000NT001-3LY101_WP027C1N-part1 dm-integrity-WP027C1N
--integrity-bitmap-mode /dev/disk/by-id/ata-ST10000NT001-3LY101_WP027C38-part1 dm-integrity-WP027C38
--integrity-bitmap-mode /dev/disk/by-id/ata-ST10000NT001-3LY101_WP027C98-part1 dm-integrity-WP027C98
--integrity-bitmap-mode /dev/disk/by-id/ata-ST10000NT001-3LY101_WP027C9L-part1 dm-integrity-WP027C9L'
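Before trusting a reboot, the service can be exercised by hand (assuming the devices aren't already open from the manual steps above):

$ sudo rc-service dm-integrity start
$ sudo rc-service dm-integrity status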
NB: some of the complexity comes from the fact that I use the device symlinks set up by udev, since they're the only unambiguous way to refer to physical drives; one could use PARTUUIDs instead, but I don't like them very much.
Then enable it by adding it to the early boot runlevel:
$ sudo rc-update add boot dm-integrity
Now, for the RAID, we need to register the array so that mdadm knows what to re-assemble later:
$ sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
Then declare the new dependency between our two services:
$ echo 'rc_need=dm-integrity' | sudo tee -a /etc/conf.d/mdraid
And enable it as well:
$ sudo rc-update add boot mdraid
That's it!
Conclusion §
Except for the startling absence of an integritytab equivalent, OpenRC's dev-settle thing being broken, and the Gentoo wiki lacking documentation about RAID and dm-integrity (always look at the Arch wiki too), this wasn't too bad.
It's still painful for non-computer wizards compared to ZFS's all-in-one approach. This is something that Red Hat's Stratis may fix in the future… crossing my fingers.
Omake 1: NFS §
I don't need encryption within my LAN, sshfs is slow and dead, and Samba brings too much Windows brain damage along and can be a bit slow too. So NFS it is.
I had never used it before, but it wasn't too hard. Well, it wasn't because I explicitly used the much saner NFSv4 everywhere (I even disabled v2/3 at the kernel and USE flag level), which does away with many moving parts (rpc.statd, rpc.idmapd, rpcbind, sm-notify) and needs only one open TCP port in firewalls.
I had a few problems with OpenRC being decrepit in that area too, but after a few hours I had myself a squeaky clean and performant NFSv4.2 setup with server-side copy working.
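For illustration, the export side of an NFSv4-only setup can be as small as this (subnet and options are illustrative, not my exact config):

# /etc/exports (illustrative; modern nfs-utils builds the v4 pseudo-root automatically)
/exports/data  192.168.1.0/24(rw,sync,no_subtree_check)

Reloading it afterwards is just an exportfs -ra away.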
NFS and reflinking
Even more impressive: I got server-side reflinking (copy-on-write copies, for the cavemen) for free. Behold!
$ mkdir share; sudo mount -t nfs -o rw server:/exports/data/ share/
$ echo foo >share/a
$ cp --reflink=auto share/a share/b # --reflink=auto is the default since coreutils 9.0
$ ssh user@server is_reflink /exports/data/a /exports/data/b && echo '200% rad!'
200% rad!
PS: is_reflink is a small Linux-only script that can be found here. Apparently, anything that uses copy_file_range(2) Just Werks™ (in FreeBSD 13 too).
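For the curious, a rough approximation of such a check (this is not the actual is_reflink script): compare both files' physical extent maps as reported by FIEMAP through filefrag(8); reflinked files share the same physical blocks.

#!/bin/sh
# is_reflink-ish sketch: print "physical_offset length" for each extent of a file,
# then require the two maps to be identical (and non-empty).
map() { filefrag -v "$1" | awk '$1 ~ /^[0-9]+:$/ {print $4, $6}'; }
a=$(map "$1")
[ -n "$a" ] && [ "$a" = "$(map "$2")" ]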
Omake 2: lf §
Now that all my media files are accessed through NFS and a GbE link, using the poorly programmed ranger, with its UI blocking on I/O, quickly became torture. Despite its long years of loyal service, I switched to lf this weekend and am now thoroughly impressed by the gap in performance and design cleanliness.
One consequence of the UNIX-ier design is that I was forced to improve my xdg-open script to replace ranger's rifle/sxiv-rifle.
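For example, a bare-bones opener that just defers to the system handler looks like this in lfrc (a sketch, not my actual config):

# ~/.config/lf/lfrc (sketch)
cmd open ${{
	# Hand the current file to xdg-open, detached so lf's UI never blocks on it
	setsid -f xdg-open "$f" >/dev/null 2>&1
}}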