Configuring an RDMA-based NFS Service

Environment

  • OS: Ubuntu 18.04 LTS
  • Kernel: 5.4.0-84-generic
  • NIC: Mellanox ConnectX-3 Pro (MT27520)

General System Configuration

  • Use the Aliyun APT mirror
# cat >/etc/apt/sources.list<<EOF
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
EOF

Update the APT package index from the new mirror:

# apt update -y
  • Kernel parameters
# cat >/etc/sysctl.d/99-common.conf<<EOF
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
EOF
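These settings raise the inotify limits (event queue length, instances and watches per user) and the maximum number of memory-map areas per process. This keeps services from failing when they exhaust these resources. Load them without rebooting:

# sysctl --system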
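The values only take effect after they are loaded, so it can help to confirm what the running kernel actually reports. Below is a minimal sketch built around a hypothetical helper, check_sysctl_file. It assumes the simple "key = value" format used above and reads each key from /proc/sys, with dots mapped to slashes.

```shell
#!/bin/bash
# Hypothetical helper: compare each "key = value" line of a sysctl.d file
# with the value the running kernel reports under /proc/sys.
# Assumes simple (e.g. numeric) values. Returns 0 if every key matches, 1 otherwise.
check_sysctl_file() {
    local file="$1" rc=0 key want have path
    # Split each line on the first '='; also handle a last line without a newline.
    while IFS='=' read -r key want || [ -n "$key" ]; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        case "$key" in ''|'#'*|';'*) continue ;; esac   # skip blank lines and comments
        want=$(printf '%s' "$want" | xargs)               # trim surrounding whitespace
        path="/proc/sys/${key//.//}"                      # fs.inotify.x -> fs/inotify/x
        if [ ! -r "$path" ]; then
            echo "MISSING $key"
            rc=1
            continue
        fi
        have=$(xargs < "$path")                           # collapse tabs/spaces in the live value
        if [ "$have" = "$want" ]; then
            echo "OK      $key = $have"
        else
            echo "DIFFER  $key: file=$want live=$have"
            rc=1
        fi
    done < "$file"
    return "$rc"
}
```

For example, once the settings are loaded, check_sysctl_file /etc/sysctl.d/99-common.conf should print one OK line per key.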

NFS Server

Service Configuration

Install the packages:

# apt -y install nfs-kernel-server nfs-common

Increase the number of NFS server threads. Edit /etc/default/nfs-kernel-server, find the RPCNFSDCOUNT entry and change it, for example to 16:

RPCNFSDCOUNT=16
...<some lines omitted>...
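You can also script the edit instead of making it by hand. The sketch below uses a hypothetical helper name, set_rpcnfsdcount. It rewrites any existing RPCNFSDCOUNT line, including a commented-out one, and appends a new line only when none exists.

```shell
#!/bin/bash
# Hypothetical helper: set RPCNFSDCOUNT=<n> in a defaults file such as
# /etc/default/nfs-kernel-server. Every existing (even commented-out)
# RPCNFSDCOUNT= line is rewritten; if there is none, one is appended.
set_rpcnfsdcount() {
    local file="$1" count="$2"
    case "$count" in
        ''|*[!0-9]*|0) echo "RPCNFSDCOUNT must be a positive integer" >&2; return 1 ;;
    esac
    if grep -qE '^[[:space:]]*#?[[:space:]]*RPCNFSDCOUNT=' "$file"; then
        sed -i -E "s/^[[:space:]]*#?[[:space:]]*RPCNFSDCOUNT=.*/RPCNFSDCOUNT=${count}/" "$file"
    else
        echo "RPCNFSDCOUNT=${count}" >> "$file"
    fi
}
```

For example, run set_rpcnfsdcount /etc/default/nfs-kernel-server 16. After restarting nfs-server, cat /proc/fs/nfsd/threads should report the new thread count.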

Configure the export, for example to share the /data directory:

# cat >/etc/exports<<EOF
/data  *(rw,async,crossmnt,insecure,fsid=0,no_auth_nlm,no_subtree_check,no_root_squash,no_all_squash)
EOF

# exportfs -av
exporting *:/data

Restart the service and enable it at boot:

# systemctl restart nfs-server
# systemctl enable nfs-server

Enable the RDMA Transport

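Load the RPC-over-RDMA kernel modules. svcrdma is the server-side transport and xprtrdma is the client-side one:

# modprobe svcrdma     # server side
# modprobe xprtrdma    # client side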

Tell nfsd to listen for RDMA connections on port 20049, the standard NFS/RDMA port:

# echo 'rdma 20049' | tee /proc/fs/nfsd/portlist
rdma 20049
# cat /proc/fs/nfsd/portlist
rdma 20049
udp 2049
tcp 2049
udp 2049
tcp 2049
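If this step is part of a script that may run more than once, it helps to add the listener only when it is missing, since writing the same entry again may fail. The sketch below uses a hypothetical helper, enable_nfsd_rdma. It defaults to /proc/fs/nfsd/portlist and port 20049, and both can be overridden as arguments.

```shell
#!/bin/bash
# Hypothetical helper: add an RDMA listener to nfsd only if portlist does not
# already list it. Writing "rdma <port>" to /proc/fs/nfsd/portlist adds a listener.
enable_nfsd_rdma() {
    local portlist="${1:-/proc/fs/nfsd/portlist}" port="${2:-20049}"
    if grep -qx "rdma ${port}" "$portlist"; then
        echo "rdma ${port} already configured"
        return 0
    fi
    echo "rdma ${port}" > "$portlist"
}
```

On the server, run enable_nfsd_rdma as root with no arguments.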

Integrate RDMA into the NFS systemd unit by editing /lib/systemd/system/nfs-server.service:

[Unit]
Description=NFS server and services
DefaultDependencies=no
Requires=network.target proc-fs-nfsd.mount
Requires=nfs-mountd.service
Wants=rpcbind.socket
Wants=nfs-idmapd.service

After=local-fs.target
After=network.target proc-fs-nfsd.mount rpcbind.socket nfs-mountd.service
After=nfs-idmapd.service rpc-statd.service
Before=rpc-statd-notify.service

# GSS services dependencies and ordering
Wants=auth-rpcgss-module.service
After=rpc-gssd.service rpc-svcgssd.service

# start/stop server before/after client
Before=remote-fs-pre.target

Wants=nfs-config.service
After=nfs-config.service

[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils

Type=oneshot
RemainAfterExit=yes
ExecStartPre=/sbin/modprobe xprtrdma
ExecStartPre=/sbin/modprobe svcrdma
ExecStartPre=/usr/sbin/exportfs -r
ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
ExecStartPost=/bin/bash -c "sleep 3 && echo 'rdma 20049' | tee /proc/fs/nfsd/portlist"
ExecStop=/usr/sbin/rpc.nfsd 0
ExecStopPost=/usr/sbin/exportfs -au
ExecStopPost=/usr/sbin/exportfs -f

ExecReload=/usr/sbin/exportfs -r

[Install]
WantedBy=multi-user.target
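Tip: compared with the stock unit, only three lines were added. These are the two ExecStartPre=/sbin/modprobe lines and the ExecStartPost= line that writes 'rdma 20049' to /proc/fs/nfsd/portlist (lines 29, 30 and 33 of the file). With them in place, the kernel modules are loaded and the RDMA port is configured automatically every time the service starts, so no manual steps are needed. Run systemctl daemon-reload after editing the unit so that systemd picks up the change.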
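Files under /lib/systemd/system belong to the package that ships them, so direct edits are likely to be overwritten on upgrade. A systemd drop-in under /etc/systemd/system/nfs-server.service.d/ is a common alternative. In a drop-in, ExecStartPre= and ExecStartPost= lines are appended to those of the stock unit. The sketch below assumes a hypothetical helper name, write_nfs_rdma_dropin, and a hypothetical file name, rdma.conf. It carries over the same three directives.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that adds the RDMA steps to the stock
# nfs-server.service instead of editing the packaged unit under /lib.
# ExecStartPre=/ExecStartPost= lines in a drop-in are appended to the unit's own.
write_nfs_rdma_dropin() {
    local dir="${1:-/etc/systemd/system/nfs-server.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/rdma.conf" <<'EOF'
[Service]
ExecStartPre=/sbin/modprobe xprtrdma
ExecStartPre=/sbin/modprobe svcrdma
ExecStartPost=/bin/bash -c "sleep 3 && echo 'rdma 20049' > /proc/fs/nfsd/portlist"
EOF
}
```

Run it as root, then run systemctl daemon-reload and systemctl restart nfs-server. systemctl cat nfs-server shows the stock unit together with the drop-in.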

NFS Client

Service Configuration

Install the packages:

# apt -y install nfs-common

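Configure kernel parameters:

# echo 'sunrpc.tcp_slot_table_entries = 128' >/etc/sysctl.d/99-sunrpc.conf
# sysctl --system

Note that this parameter sizes the RPC slot table of TCP transports. It likely has little or no effect on mounts that use proto=rdma.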

Load the client-side RPC-over-RDMA module:

# modprobe xprtrdma    # client side

Mount manually:

# mount -t nfs 192.168.200.35:/ /mnt/ -o vers=4.1,_netdev,rdma,port=20049,hard,intr,noatime,nodiratime,async,nolock,noacl,sec=sys,noresvport
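To make the mount persistent, you can add an equivalent entry to /etc/fstab. The sketch below uses a hypothetical helper, add_nfs_rdma_fstab, with a trimmed subset of the options from the manual mount above. It appends the entry only if no uncommented entry for the same mount point exists yet.

```shell
#!/bin/bash
# Hypothetical helper: append an NFS-over-RDMA entry to an fstab-format file
# unless a non-comment entry for the same mount point already exists.
add_nfs_rdma_fstab() {
    local fstab="$1" server="$2" mnt="$3"
    # A subset of the options used in the manual mount command.
    local opts="vers=4.1,proto=rdma,port=20049,_netdev,hard,noatime,nodiratime,sec=sys,noresvport"
    if [ -f "$fstab" ] && awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        echo "an entry for ${mnt} already exists in ${fstab}" >&2
        return 0
    fi
    printf '%s:/ %s nfs %s 0 0\n' "$server" "$mnt" "$opts" >> "$fstab"
}
```

For example, run add_nfs_rdma_fstab /etc/fstab 192.168.200.35 /mnt as root. mount /mnt (or mount -a) then mounts the share using the options from fstab.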

Check the mount:

# mount |grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
192.168.200.35:/ on /mnt type nfs4 (rw,noatime,nodiratime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=rdma,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.200.35,local_lock=none,addr=192.168.200.35,_netdev)
proto=rdma,port=20049 in the mount options confirms that the client and server are communicating over RDMA.
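The same check can be scripted, for example in a health check. The sketch below uses a hypothetical helper, is_rdma_mount. It reads /proc/mounts by default and succeeds only if the given mount point is an NFS mount whose options include proto=rdma.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given mount point is an NFS mount
# whose options include proto=rdma. Reads /proc/mounts unless another file is given.
is_rdma_mount() {
    local mnt="$1" mounts="${2:-/proc/mounts}"
    awk -v m="$mnt" '
        $2 == m && $3 ~ /^nfs/ && $4 ~ /(^|,)proto=rdma(,|$)/ { ok = 1 }
        END { exit !ok }
    ' "$mounts"
}
```

For example: is_rdma_mount /mnt && echo "NFS over RDMA". nfsstat -m also lists the options of each NFS mount.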

FIO Benchmark

A quick random-read test with fio:

# fio --rw=randread --bs=64k --numjobs=4 --iodepth=8 --runtime=30 --time_based --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --exitall --name task1 --filename=/mnt/1.txt --size=10000000
task1: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=8
...
fio-3.1
Starting 4 processes
task1: Laying out IO file (1 file / 9MiB)
fio: native_fallocate call failed: Operation not supported
Jobs: 4 (f=4): [r(4)][100.0%][r=4082MiB/s,w=0KiB/s][r=65.3k,w=0 IOPS][eta 00m:00s]
task1: (groupid=0, jobs=1): err= 0: pid=19205: Sun Nov 28 17:55:08 2021
   read: IOPS=15.9k, BW=995MiB/s (1043MB/s)(29.2GiB/30001msec)
    slat (usec): min=4, max=17475, avg=22.44, stdev=104.78
    clat (usec): min=3, max=24386, avg=477.53, stdev=656.88
     lat (usec): min=90, max=24421, avg=500.32, stdev=670.66
    clat percentiles (usec):
     |  1.00th=[  143],  5.00th=[  192], 10.00th=[  217], 20.00th=[  260],
     | 30.00th=[  297], 40.00th=[  334], 50.00th=[  367], 60.00th=[  404],
     | 70.00th=[  453], 80.00th=[  506], 90.00th=[  627], 95.00th=[  898],
     | 99.00th=[ 3392], 99.50th=[ 4883], 99.90th=[ 8455], 99.95th=[10552],
     | 99.99th=[16909]
   bw (  KiB/s): min=616064, max=1526272, per=23.83%, avg=1019563.92, stdev=227595.64, samples=60
   iops        : min= 9626, max=23848, avg=15930.52, stdev=3556.21, samples=60
  lat (usec)   : 4=0.01%, 50=0.01%, 100=0.06%, 250=17.95%, 500=60.88%
  lat (usec)   : 750=14.40%, 1000=2.41%
  lat (msec)   : 2=2.37%, 4=1.19%, 10=0.68%, 20=0.06%, 50=0.01%
  cpu          : usr=3.84%, sys=25.77%, ctx=384889, majf=0, minf=995
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=477658,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
task1: (groupid=0, jobs=1): err= 0: pid=19206: Sun Nov 28 17:55:08 2021
   read: IOPS=15.3k, BW=957MiB/s (1004MB/s)(28.0GiB/30001msec)
    slat (usec): min=4, max=12737, avg=22.72, stdev=108.28
    clat (usec): min=39, max=27395, avg=496.88, stdev=694.83
     lat (usec): min=102, max=27421, avg=519.95, stdev=709.23
    clat percentiles (usec):
     |  1.00th=[  149],  5.00th=[  196], 10.00th=[  221], 20.00th=[  265],
     | 30.00th=[  306], 40.00th=[  343], 50.00th=[  375], 60.00th=[  416],
     | 70.00th=[  465], 80.00th=[  519], 90.00th=[  652], 95.00th=[  971],
     | 99.00th=[ 3523], 99.50th=[ 4883], 99.90th=[ 9241], 99.95th=[11338],
     | 99.99th=[16450]
   bw (  KiB/s): min=600832, max=1629184, per=22.82%, avg=976299.95, stdev=228428.99, samples=59
   iops        : min= 9388, max=25456, avg=15254.64, stdev=3569.24, samples=59
  lat (usec)   : 50=0.01%, 100=0.07%, 250=16.68%, 500=60.13%, 750=15.62%
  lat (usec)   : 1000=2.66%
  lat (msec)   : 2=2.63%, 4=1.45%, 10=0.67%, 20=0.08%, 50=0.01%
  cpu          : usr=3.76%, sys=24.45%, ctx=386910, majf=0, minf=573
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=459455,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
task1: (groupid=0, jobs=1): err= 0: pid=19207: Sun Nov 28 17:55:08 2021
   read: IOPS=20.4k, BW=1274MiB/s (1335MB/s)(37.3GiB/30001msec)
    slat (usec): min=4, max=11251, avg=19.39, stdev=59.26
    clat (nsec): min=1970, max=26154k, avg=371278.12, stdev=417542.18
     lat (usec): min=91, max=26165, avg=390.93, stdev=425.22
    clat percentiles (usec):
     |  1.00th=[  126],  5.00th=[  165], 10.00th=[  192], 20.00th=[  225],
     | 30.00th=[  253], 40.00th=[  289], 50.00th=[  318], 60.00th=[  355],
     | 70.00th=[  388], 80.00th=[  441], 90.00th=[  519], 95.00th=[  611],
     | 99.00th=[ 1434], 99.50th=[ 2606], 99.90th=[ 6390], 99.95th=[ 8225],
     | 99.99th=[12780]
   bw (  MiB/s): min=  765, max= 1805, per=30.48%, avg=1273.51, stdev=272.90, samples=60
   iops        : min=12244, max=28892, avg=20376.03, stdev=4366.52, samples=60
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 50=0.01%, 100=0.04%
  lat (usec)   : 250=29.19%, 500=58.89%, 750=9.12%, 1000=1.13%
  lat (msec)   : 2=0.93%, 4=0.43%, 10=0.23%, 20=0.03%, 50=0.01%
  cpu          : usr=4.73%, sys=35.67%, ctx=425780, majf=0, minf=1561
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=611362,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
task1: (groupid=0, jobs=1): err= 0: pid=19208: Sun Nov 28 17:55:08 2021
   read: IOPS=15.2k, BW=952MiB/s (999MB/s)(27.9GiB/30000msec)
    slat (usec): min=3, max=16154, avg=22.48, stdev=100.97
    clat (nsec): min=1120, max=39989k, avg=500028.41, stdev=697938.47
     lat (usec): min=97, max=39996, avg=522.87, stdev=710.57
    clat percentiles (usec):
     |  1.00th=[  155],  5.00th=[  202], 10.00th=[  229], 20.00th=[  273],
     | 30.00th=[  310], 40.00th=[  343], 50.00th=[  375], 60.00th=[  416],
     | 70.00th=[  465], 80.00th=[  519], 90.00th=[  652], 95.00th=[  963],
     | 99.00th=[ 3687], 99.50th=[ 5276], 99.90th=[ 8455], 99.95th=[11076],
     | 99.99th=[16188]
   bw (  KiB/s): min=621787, max=1551647, per=22.82%, avg=976172.42, stdev=181730.02, samples=60
   iops        : min= 9715, max=24244, avg=15252.50, stdev=2839.44, samples=60
  lat (usec)   : 2=0.01%, 50=0.01%, 100=0.06%, 250=15.14%, 500=61.80%
  lat (usec)   : 750=15.53%, 1000=2.70%
  lat (msec)   : 2=2.54%, 4=1.37%, 10=0.79%, 20=0.07%, 50=0.01%
  cpu          : usr=3.66%, sys=24.38%, ctx=375132, majf=0, minf=995
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=457125,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: bw=4178MiB/s (4381MB/s), 952MiB/s-1274MiB/s (999MB/s-1335MB/s), io=122GiB (131GB), run=30000-30001msec
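As a sanity check, the per-job figures above are consistent with each other. Each job keeps iodepth=8 requests in flight, so by Little's law its IOPS should be roughly the queue depth divided by the mean total latency ("lat avg"). The four per-job bandwidths should also add up to the group total. Note that the test file is only about 9.5 MiB (--size=10000000), so the reads were likely served from the server's page cache. These numbers therefore probably reflect the RDMA transport more than the storage behind the export.

```latex
\begin{aligned}
\text{IOPS}_{1} &\approx \frac{8}{500.32\ \mu\text{s}} \approx 16.0\text{k} && (\text{reported } 15.9\text{k}) \\
\text{IOPS}_{3} &\approx \frac{8}{390.93\ \mu\text{s}} \approx 20.5\text{k} && (\text{reported } 20.4\text{k}) \\
\text{BW}_{\text{total}} &= 995 + 957 + 1274 + 952 = 4178\ \text{MiB/s} \\
\text{IOPS}_{\text{total}} &= \frac{4178\ \text{MiB/s}}{64\ \text{KiB}} = 4178 \times 16 = 66\,848 \approx 66.8\text{k} && (15.9 + 15.3 + 20.4 + 15.2 = 66.8\text{k}) \\
4381\ \text{MB/s} \times 8 &\approx 35.0\ \text{Gbit/s}
\end{aligned}
```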