3FS isn't particularly fast in mdbench, though. Maybe our FDB tuning skill is what to blame, or FUSE, I don't know, but it doesn't really matter.
The truly amazing part for me is combining NVMe SSD + RDMA + supports reading a huge batch of random offsets from a few already opened huge files efficiently. This is how you get your training boxes consuming 20~30GiB/s (and roughly 4 million IOPS).
FUSE has traditionally been famously slow. I remember there were some changes that supposedly made it faster, but maybe that was just a certain fuse implementation.
The truly amazing part for me is combining NVMe SSD + RDMA + supports reading a huge batch of random offsets from a few already opened huge files efficiently. This is how you get your training boxes consuming 20~30GiB/s (and roughly 4 million IOPS).