Veritas CFS Media Server Workloads
Sequential read I/O throughput test
Date: 18th September 2015
Colin Eldridge
Shrinivas Chandukar
What is the purpose of this document?
This initial document is designed to help you set up a CFS environment for use as a media server solution.
The idea is to repeat the testing we have performed in this document using your own h/w environment.
This report is specific to sequential read I/O and includes best practices and configuration recommendations.
This testing will identify the I/O bottlenecks in your h/w environment.
The testing will identify the maximum read I/O throughput that can be achieved from one node and the maximum read I/O throughput from all nodes combined, using your h/w environment.
This testing will identify the best stripe-width and number of columns for your VxVM volume.
This testing will identify the best file system read_ahead tuning for a sequential read I/O workload.
In summary:
This document attempts to explain how to set up a media server solution, including:
how to perform the tests
how to measure the I/O throughput
how to choose the correct VxVM volume configuration and achieve balanced I/O
how to identify the bottlenecks in the I/O path using your h/w environment
how to tune the file system read_ahead to balance the read I/O throughput across processes
You should then understand the capabilities of your h/w environment, including:
the maximum read I/O throughput that will be possible in the environment
the mechanism of balancing the I/O across the LUNs
the mechanism of balancing the read I/O throughput across active processes/threads
< 1. Hardware, DMP paths and volume configuration >
HOST side
Each node has a dual port HBA card (so 2 active DMP paths to each LUN), and each HBA port is connected to a different FC switch.
The theoretical maximum throughput per FC port on the HBA is 8Gbits/sec.
The theoretical maximum throughput per node (two FC ports) is 16Gbits/sec.
The theoretical maximum throughput for two nodes is therefore 32Gbits/sec.
In practice, the maximum throughput we could reach from one node during our testing was approximately 12Gbits/sec.
In our 1-node testing the dual port HBA therefore bottlenecked at approximately 12Gbits/sec (1.5 Gbytes/sec), so this is our approximate maximum throughput from one node.
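The host-side arithmetic above can be sketched as a short calculation. This is only an illustration of the figures already quoted; the constant and function names are hypothetical and are not part of any Veritas tool.

```python
# Theoretical host-side FC bandwidth, restating the figures quoted above.
PORT_GBITS = 8        # theoretical Gbits/sec per HBA FC port
PORTS_PER_NODE = 2    # dual port HBA
NODES = 2

def gbits_to_gbytes(gbits):
    """Convert Gbits/sec to Gbytes/sec (8 bits per byte)."""
    return gbits / 8

per_node = PORT_GBITS * PORTS_PER_NODE   # theoretical Gbits/sec for one node
both_nodes = per_node * NODES            # theoretical Gbits/sec for two nodes
measured_one_node = 12                   # approximate Gbits/sec seen in our 1-node test

print("theoretical per node:", per_node, "Gbits/sec")
print("theoretical both nodes:", both_nodes, "Gbits/sec")
print("measured per node:", gbits_to_gbytes(measured_one_node), "Gbytes/sec")
print("fraction of theoretical reached:", measured_one_node / per_node)
```

Substituting the port speed and port count of your own HBAs gives the ceiling against which your single-node results can be compared.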
FC Switch
Each switch is capable of 32Gbits/sec. There are two switches, so the total theoretical maximum throughput for both switches is 64Gbits/sec.
Each individual switch port is capable of 8Gbits/sec.
We are using 4 switch ports connected to HBA FC ports on the host nodes, which limits the maximum throughput at the switch to 32Gbits/sec (through both switches).
We are using 12 switch ports connected to the modular storage arrays.
Storage Array
We have 6 modular storage arrays.
We are using 2 ports from each storage array; each port has a theoretical maximum throughput of 4Gbits/sec.
We therefore have a total of 12 storage array connections to the two FC switches (6 connections to each switch).
The theoretical maximum throughput is therefore 48Gbits/sec for the storage arrays.
In our 2-node testing the combination of 6 storage arrays bottlenecked at approximately 20Gbits/sec (2.5 Gbytes/sec), so this is our approximate maximum throughput from both nodes.
# vxdmpadm listenclosure
ENCLR_NAME       ENCLR_TYPE      ENCLR_SNO          STATUS      ARRAY_TYPE      LUN_COUNT   FIRMWARE
=======================================================================================================
storagearray-0   STORAGEARRAY-   21000022a1035118   CONNECTED   A/A-A-STORAGE   4           1
storagearray-1   STORAGEARRAY-   21000022a1035119   CONNECTED   A/A-A-STORAGE   4           1
storagearray-2   STORAGEARRAY-   21000022a1035116   CONNECTED   A/A-A-STORAGE   4           1
storagearray-3   STORAGEARRAY-   21000022a1035117   CONNECTED   A/A-A-STORAGE   4           1
storagearray-4   STORAGEARRAY-   21000022a106c70a   CONNECTED   A/A-A-STORAGE   4           1
storagearray-5   STORAGEARRAY-   21000022a106c705   CONNECTED   A/A-A-STORAGE   4           1
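One way to reason about the hardware limits described in this section is to list the theoretical ceiling of each stage in the I/O path and compare the narrowest one with the measured result. The sketch below uses only the figures given above; the dictionary keys and the helper name `narrowest_stage` are hypothetical.

```python
# Theoretical ceilings (Gbits/sec) for each stage of the I/O path, from the text above.
stages = {
    "host HBA ports (2 nodes x 2 ports x 8)": 2 * 2 * 8,
    "switch ports facing the hosts (4 x 8)": 4 * 8,
    "storage array ports (12 x 4)": 12 * 4,
}

def narrowest_stage(limits):
    """Return the name of the stage with the lowest theoretical ceiling."""
    return min(limits, key=limits.get)

measured_two_nodes = 20  # approximate Gbits/sec observed in our 2-node test

name = narrowest_stage(stages)
print("narrowest theoretical stage:", name, stages[name], "Gbits/sec")
print("measured with two nodes:", measured_two_nodes, "Gbits/sec")
# The measured figure is below every port-level ceiling, which is why the
# 2-node limit is attributed to the storage arrays rather than to any FC port.
print("below all port ceilings:", all(measured_two_nodes < v for v in stages.values()))
```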
LUNs
Each modular array has 4 enclosures with 12 disks each; only 11 disks in each enclosure are used for a RAID-0 LUN.
Each LUN comprises 11 disks (11-way stripe) with a 64KB stripe width (one disk is kept as a spare in case of failure).
There are 4 LUNs per modular array, therefore we have a total of 24 LUNs.
Each LUN is approximately 3TB.
All 24 LUNs can be displayed using the “vxdisk list” command:
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
storagearray-0_16 auto:cdsdisk storagearray-0_16 testdg online shared
storagearray-0_17 auto:cdsdisk storagearray-0_17 testdg online shared
storagearray-0_18 auto:cdsdisk storagearray-0_18 testdg online shared
storagearray-0_20 auto:cdsdisk storagearray-0_20 testdg online shared
storagearray-1_6 auto:cdsdisk storagearray-1_6 testdg online shared
storagearray-1_7 auto:cdsdisk storagearray-1_7 testdg online shared
storagearray-1_8 auto:cdsdisk storagearray-1_8 testdg online shared
storagearray-1_9 auto:cdsdisk storagearray-1_9 testdg online shared
storagearray-2_5 auto:cdsdisk storagearray-2_5 testdg online shared
storagearray-2_6 auto:cdsdisk storagearray-2_6 testdg online shared
storagearray-2_7 auto:cdsdisk storagearray-2_7 testdg online shared
storagearray-2_8 auto:cdsdisk storagearray-2_8 testdg online shared
storagearray-3_4 auto:cdsdisk storagearray-3_4 testdg online shared
storagearray-3_6 auto:cdsdisk storagearray-3_6 testdg online shared
storagearray-3_7 auto:cdsdisk storagearray-3_7 testdg online shared
storagearray-3_8 auto:cdsdisk storagearray-3_8 testdg online shared
storagearray-4_8 auto:cdsdisk storagearray-4_8 testdg online shared
storagearray-4_9 auto:cdsdisk storagearray-4_9 testdg online shared
storagearray-4_10 auto:cdsdisk storagearray-4_10 testdg online shared
storagearray-4_11 auto:cdsdisk storagearray-4_11 testdg online shared
storagearray-5_8 auto:cdsdisk storagearray-5_8 testdg online shared
storagearray-5_9 auto:cdsdisk storagearray-5_9 testdg online shared
storagearray-5_10 auto:cdsdisk storagearray-5_10 testdg online shared
storagearray-5_11 auto:cdsdisk storagearray-5_11 testdg online shared
DMP paths
There are 2 paths per LUN (on each node).
Both paths are active, therefore there are 48 active paths in total (on each node).
All 48 paths can be displayed using the “vxdisk path” command:
# vxdisk path
SUBPATH DANAME DMNAME GROUP STATE
sdad storagearray-0_16 storagearray-0_16 testdg ENABLED
sdo storagearray-0_16 storagearray-0_16 testdg ENABLED
sdab storagearray-0_17 storagearray-0_17 testdg ENABLED
sdm storagearray-0_17 storagearray-0_17 testdg ENABLED
sdae storagearray-0_18 storagearray-0_18 testdg ENABLED
sdp storagearray-0_18 storagearray-0_18 testdg ENABLED
sdac storagearray-0_20 storagearray-0_20 testdg ENABLED
sdn storagearray-0_20 storagearray-0_20 testdg ENABLED
sdx storagearray-1_6 storagearray-1_6 testdg ENABLED
sdan storagearray-1_6 storagearray-1_6 testdg ENABLED
sdaa storagearray-1_7 storagearray-1_7 testdg ENABLED
sdaq storagearray-1_7 storagearray-1_7 testdg ENABLED
sdz storagearray-1_8 storagearray-1_8 testdg ENABLED
sdap storagearray-1_8 storagearray-1_8 testdg ENABLED
sdy storagearray-1_9 storagearray-1_9 testdg ENABLED
sdao storagearray-1_9 storagearray-1_9 testdg ENABLED
sdat storagearray-2_5 storagearray-2_5 testdg ENABLED
sdw storagearray-2_5 storagearray-2_5 testdg ENABLED
sdar storagearray-2_6 storagearray-2_6 testdg ENABLED
sdu storagearray-2_6 storagearray-2_6 testdg ENABLED
sdas storagearray-2_7 storagearray-2_7 testdg ENABLED
sdv storagearray-2_7 storagearray-2_7 testdg ENABLED
sdaz storagearray-2_8 storagearray-2_8 testdg ENABLED
sday storagearray-2_8 storagearray-2_8 testdg ENABLED
sdq storagearray-3_4 storagearray-3_4 testdg ENABLED
sdau storagearray-3_4 storagearray-3_4 testdg ENABLED
sds storagearray-3_6 storagearray-3_6 testdg ENABLED
sdaw storagearray-3_6 storagearray-3_6 testdg ENABLED
sdav storagearray-3_7 storagearray-3_7 testdg ENABLED
sdr storagearray-3_7 storagearray-3_7 testdg ENABLED
sdax storagearray-3_8 storagearray-3_8 testdg ENABLED
sdt storagearray-3_8 storagearray-3_8 testdg ENABLED
sdaf storagearray-4_8 storagearray-4_8 testdg ENABLED
sdi storagearray-4_8 storagearray-4_8 testdg ENABLED
sdag storagearray-4_9 storagearray-4_9 testdg ENABLED
sdj storagearray-4_9 storagearray-4_9 testdg ENABLED
sdl storagearray-4_10 storagearray-4_10 testdg ENABLED
sdai storagearray-4_10 storagearray-4_10 testdg ENABLED
sdk storagearray-4_11 storagearray-4_11 testdg ENABLED
sdah storagearray-4_11 storagearray-4_11 testdg ENABLED
sdh storagearray-5_8 storagearray-5_8 testdg ENABLED
sdam storagearray-5_8 storagearray-5_8 testdg ENABLED
sdg storagearray-5_9 storagearray-5_9 testdg ENABLED
sdal storagearray-5_9 storagearray-5_9 testdg ENABLED
sde storagearray-5_10 storagearray-5_10 testdg ENABLED
sdaj storagearray-5_10 storagearray-5_10 testdg ENABLED
sdf storagearray-5_11 storagearray-5_11 testdg ENABLED
sdak storagearray-5_11 storagearray-5_11 testdg ENABLED
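When repeating this check in your own environment, it can help to confirm programmatically that every LUN has the expected number of enabled paths. The sketch below is a hypothetical helper (not a Veritas utility) that assumes the five-column "vxdisk path" layout shown above and is run here against a few sample rows taken from that output.

```python
from collections import Counter

def paths_per_lun(vxdisk_path_output):
    """Count ENABLED subpaths per DANAME in 'vxdisk path' style text."""
    counts = Counter()
    for line in vxdisk_path_output.strip().splitlines():
        fields = line.split()
        # Skip the header and anything that is not a five-column data row.
        if len(fields) != 5 or fields[0] == "SUBPATH":
            continue
        subpath, daname, dmname, group, state = fields
        if state == "ENABLED":
            counts[daname] += 1
    return counts

# A few rows copied from the output above; a real run would pass the full text.
sample = """\
SUBPATH DANAME DMNAME GROUP STATE
sdad storagearray-0_16 storagearray-0_16 testdg ENABLED
sdo storagearray-0_16 storagearray-0_16 testdg ENABLED
sdab storagearray-0_17 storagearray-0_17 testdg ENABLED
sdm storagearray-0_17 storagearray-0_17 testdg ENABLED
"""

counts = paths_per_lun(sample)
for lun, n in sorted(counts.items()):
    print(lun, n)
print("all LUNs have 2 paths:", all(n == 2 for n in counts.values()))
```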
VxVM volume
The idea is to achieve balanced I/O across all the LUNs, and to maximise the h/w I/O bandwidth.
As we have 24 LUNs available we created our VxVM volume with 24 columns to obtain the maximum possible throughput.
We then tested using three different VxVM stripe unit widths: 64KB, 512KB and 1024KB.
The “stripewidth” argument to the vxassist command is in units of 512-byte sectors.
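Because the stripe width is given in 512-byte sectors, each stripe unit size in KB has to be converted before it is passed to vxassist. A minimal sketch of that conversion follows; the function name is hypothetical.

```python
SECTOR_BYTES = 512  # size of one sector, as stated above

def stripewidth_sectors(stripe_kb):
    """Convert a stripe unit size in KB to a count of 512-byte sectors."""
    return stripe_kb * 1024 // SECTOR_BYTES

for kb in (64, 512, 1024):
    print(f"{kb}KB stripe unit -> stripewidth={stripewidth_sectors(kb)}")
```

These are the values used in the three vxassist commands in this section (128, 1024 and 2048).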
Volume configuration using 64k stripe width volume, 24 columns:
# vxassist -g testdg make vol1 50T layout=striped stripewidth=128 `vxdisk list|grep storage|awk '{print $1}'`
v vol1 - ENABLED ACTIVE 107374182400 SELECT vol1-01 fsgen
pl vol1-01 vol1 ENABLED ACTIVE 107374184448 STRIPE 24/128 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924352 0/0 storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924352 1/0 storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924352 2/0 storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924352 3/0 storagearray-0_20 ENA
sd storagearray-1_6-01 vol1-01 storagearray-1_6 0 4473924352 4/0 storagearray-1_6 ENA
sd storagearray-1_7-01 vol1-01 storagearray-1_7 0 4473924352 5/0 storagearray-1_7 ENA
sd storagearray-1_8-01 vol1-01 storagearray-1_8 0 4473924352 6/0 storagearray-1_8 ENA
sd storagearray-1_9-01 vol1-01 storagearray-1_9 0 4473924352 7/0 storagearray-1_9 ENA
sd storagearray-2_5-01 vol1-01 storagearray-2_5 0 4473924352 8/0 storagearray-2_5 ENA
sd storagearray-2_6-01 vol1-01 storagearray-2_6 0 4473924352 9/0 storagearray-2_6 ENA
sd storagearray-2_7-01 vol1-01 storagearray-2_7 0 4473924352 10/0 storagearray-2_7 ENA
sd storagearray-2_8-01 vol1-01 storagearray-2_8 0 4473924352 11/0 storagearray-2_8 ENA
sd storagearray-3_4-01 vol1-01 storagearray-3_4 0 4473924352 12/0 storagearray-3_4 ENA
sd storagearray-3_6-01 vol1-01 storagearray-3_6 0 4473924352 13/0 storagearray-3_6 ENA
sd storagearray-3_7-01 vol1-01 storagearray-3_7 0 4473924352 14/0 storagearray-3_7 ENA
sd storagearray-3_8-01 vol1-01 storagearray-3_8 0 4473924352 15/0 storagearray-3_8 ENA
sd storagearray-4_8-01 vol1-01 storagearray-4_8 0 4473924352 16/0 storagearray-4_8 ENA
sd storagearray-4_9-01 vol1-01 storagearray-4_9 0 4473924352 17/0 storagearray-4_9 ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924352 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924352 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01 vol1-01 storagearray-5_8 0 4473924352 20/0 storagearray-5_8 ENA
sd storagearray-5_9-01 vol1-01 storagearray-5_9 0 4473924352 21/0 storagearray-5_9 ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924352 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924352 23/0 storagearray-5_11 ENA
Volume configuration using 512k stripe width volume, 24 columns:
# vxassist -g testdg make vol1 50T layout=striped stripewidth=1024 `vxdisk list|grep storage|awk '{print $1}'`
v vol1 - ENABLED ACTIVE 107374182400 SELECT vol1-01 fsgen
pl vol1-01 vol1 ENABLED ACTIVE 107374190592 STRIPE 24/1024 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924608 0/0 storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924608 1/0 storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924608 2/0 storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924608 3/0 storagearray-0_20 ENA
sd storagearray-1_6-01 vol1-01 storagearray-1_6 0 4473924608 4/0 storagearray-1_6 ENA
sd storagearray-1_7-01 vol1-01 storagearray-1_7 0 4473924608 5/0 storagearray-1_7 ENA
sd storagearray-1_8-01 vol1-01 storagearray-1_8 0 4473924608 6/0 storagearray-1_8 ENA
sd storagearray-1_9-01 vol1-01 storagearray-1_9 0 4473924608 7/0 storagearray-1_9 ENA
sd storagearray-2_5-01 vol1-01 storagearray-2_5 0 4473924608 8/0 storagearray-2_5 ENA
sd storagearray-2_6-01 vol1-01 storagearray-2_6 0 4473924608 9/0 storagearray-2_6 ENA
sd storagearray-2_7-01 vol1-01 storagearray-2_7 0 4473924608 10/0 storagearray-2_7 ENA
sd storagearray-2_8-01 vol1-01 storagearray-2_8 0 4473924608 11/0 storagearray-2_8 ENA
sd storagearray-3_4-01 vol1-01 storagearray-3_4 0 4473924608 12/0 storagearray-3_4 ENA
sd storagearray-3_6-01 vol1-01 storagearray-3_6 0 4473924608 13/0 storagearray-3_6 ENA
sd storagearray-3_7-01 vol1-01 storagearray-3_7 0 4473924608 14/0 storagearray-3_7 ENA
sd storagearray-3_8-01 vol1-01 storagearray-3_8 0 4473924608 15/0 storagearray-3_8 ENA
sd storagearray-4_8-01 vol1-01 storagearray-4_8 0 4473924608 16/0 storagearray-4_8 ENA
sd storagearray-4_9-01 vol1-01 storagearray-4_9 0 4473924608 17/0 storagearray-4_9 ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924608 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924608 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01 vol1-01 storagearray-5_8 0 4473924608 20/0 storagearray-5_8 ENA
sd storagearray-5_9-01 vol1-01 storagearray-5_9 0 4473924608 21/0 storagearray-5_9 ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924608 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924608 23/0 storagearray-5_11 ENA
Volume configuration using 1024k stripe width volume, 24 columns:
# vxassist -g testdg make vol1 50T layout=striped stripewidth=2048 `vxdisk list|grep storage|awk '{print $1}'`
v vol1 - ENABLED ACTIVE 107374182400 SELECT vol1-01 fsgen
pl vol1-01 vol1 ENABLED ACTIVE 107374215168 STRIPE 24/2048 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473925632 0/0 storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473925632 1/0 storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473925632 2/0 storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473925632 3/0 storagearray-0_20 ENA
sd storagearray-1_6-01 vol1-01 storagearray-1_6 0 4473925632 4/0 storagearray-1_6 ENA
sd storagearray-1_7-01 vol1-01 storagearray-1_7 0 4473925632 5/0 storagearray-1_7 ENA
sd storagearray-1_8-01 vol1-01 storagearray-1_8 0 4473925632 6/0 storagearray-1_8 ENA
sd storagearray-1_9-01 vol1-01 storagearray-1_9 0 4473925632 7/0 storagearray-1_9 ENA
sd storagearray-2_5-01 vol1-01 storagearray-2_5 0 4473925632 8/0 storagearray-2_5 ENA
sd storagearray-2_6-01 vol1-01 storagearray-2_6 0 4473925632 9/0 storagearray-2_6 ENA
sd storagearray-2_7-01 vol1-01 storagearray-2_7 0 4473925632 10/0 storagearray-2_7 ENA
sd storagearray-2_8-01 vol1-01 storagearray-2_8 0 4473925632 11/0 storagearray-2_8 ENA
sd storagearray-3_4-01 vol1-01 storagearray-3_4 0 4473925632 12/0 storagearray-3_4 ENA
sd storagearray-3_6-01 vol1-01 storagearray-3_6 0 4473925632 13/0 storagearray-3_6 ENA
sd storagearray-3_7-01 vol1-01 storagearray-3_7 0 4473925632 14/0 storagearray-3_7 ENA
sd storagearray-3_8-01 vol1-01 storagearray-3_8 0 4473925632 15/0 storagearray-3_8 ENA
sd storagearray-4_8-01 vol1-01 storagearray-4_8 0 4473925632 16/0 storagearray-4_8 ENA
sd storagearray-4_9-01 vol1-01 storagearray-4_9 0 4473925632 17/0 storagearray-4_9 ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473925632 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473925632 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01 vol1-01 storagearray-5_8 0 4473925632 20/0 storagearray-5_8 ENA
sd storagearray-5_9-01 vol1-01 storagearray-5_9 0 4473925632 21/0 storagearray-5_9 ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473925632 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473925632 23/0 storagearray-5_11 ENA
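The subdisk and plex lengths in the three volume listings can be cross-checked with a little arithmetic. The sketch below assumes that each column is rounded up to a whole number of stripe units; this is an inference from the numbers shown, not a documented VxVM rule, and the function name is hypothetical.

```python
SECTOR_BYTES = 512
VOLUME_SECTORS = 50 * 2**40 // SECTOR_BYTES  # the 50T volume, in 512-byte sectors
COLUMNS = 24

def column_length(volume_sectors, columns, stripe_sectors):
    """Per-column length if the volume is rounded up to whole full stripes."""
    full_stripe = columns * stripe_sectors
    stripes = -(-volume_sectors // full_stripe)  # ceiling division
    return stripes * stripe_sectors

print("volume length:", VOLUME_SECTORS)
for stripe in (128, 1024, 2048):
    sd_len = column_length(VOLUME_SECTORS, COLUMNS, stripe)
    print(f"stripewidth={stripe}: subdisk={sd_len} plex={sd_len * COLUMNS}")
```

Under that assumption the computed values agree with the subdisk and plex lengths in all three listings, which is a quick way to confirm that a volume really was built with the intended column count and stripe width.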
< 2. VxVM maximum disk I/O: Read throughput test execution >
Raw disk device read I/O test execution and collection of throughput results:
vxbench sequential read test execution method and result collection
An example of the vxbench command that we run on each node is below.
This test executes 64 parallel processes; each process reads from the same raw volume device using a block size of 1MB.
The output of the vxbench command provides the combined total throughput of all 64 parallel processes; we capture this information in our result table.
The result in this example test was 1577033.29 KBytes/second.
Therefore the result of this test was 1.504 GBytes/second.
Test: vxbench
IO : sequential read of raw volume
IOsize=1024K
VxVM volume stripe width 512KB, 24 columns
Processes: 64
$ ./vxbench -w read -i iosize=1024k,iotime=300,maxfilesize=40T \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 \
  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1
user 1: 300.015 sec 24625.94 KB/s cpu: 0.75 sys 0.00 user
user 2: 300.024 sec 24498.95 KB/s cpu: 0.75 sys 0.00 user
user 3: 300.004 sec 24667.86 KB/s cpu: 0.75 sys 0.00 user
user 4: 300.020 sec 24417.35 KB/s cpu: 0.75 sys 0.00 user
user 5: 300.016 sec 24574.65 KB/s cpu: 0.74 sys 0.01 user
user 6: 300.012 sec 24615.97 KB/s cpu: 0.74 sys 0.01 user
user 7: 300.029 sec 24689.68 KB/s cpu: 0.76 sys 0.00 user
user 8: 300.023 sec 24587.75 KB/s cpu: 0.75 sys 0.00 user
user 9: 300.032 sec 24668.98 KB/s cpu: 0.76 sys 0.00 user
user 10: 300.024 sec 24795.84 KB/s cpu: 0.76 sys 0.00 user
user 11: 300.033 sec 24546.01 KB/s cpu: 0.75 sys 0.00 user
user 12: 300.024 sec 24761.75 KB/s cpu: 0.76 sys 0.00 user
user 13: 300.028 sec 24543.02 KB/s cpu: 0.76 sys 0.00 user
user 14: 300.014 sec 24591.96 KB/s cpu: 0.75 sys 0.01 user
user 15: 300.013 sec 24568.13 KB/s cpu: 0.75 sys 0.00 user
user 16: 300.037 sec 24624.20 KB/s cpu: 0.75 sys 0.00 user
user 17: 300.018 sec 24734.97 KB/s cpu: 0.76 sys 0.00 user
user 18: 300.003 sec 24596.26 KB/s cpu: 0.76 sys 0.00 user
user 19: 300.004 sec 24886.31 KB/s cpu: 0.77 sys 0.00 user
user 20: 300.007 sec 24879.24 KB/s cpu: 0.76 sys 0.00 user
user 21: 300.017 sec 24434.71 KB/s cpu: 0.75 sys 0.00 user
user 22: 300.027 sec 24437.31 KB/s cpu: 0.76 sys 0.00 user
user 23: 300.019 sec 24635.87 KB/s cpu: 0.75 sys 0.00 user
user 24: 300.028 sec 24665.88 KB/s cpu: 0.76 sys 0.00 user
user 25: 300.021 sec 24519.64 KB/s cpu: 0.75 sys 0.00 user
user 26: 300.022 sec 24587.85 KB/s cpu: 0.76 sys 0.00 user
user 27: 300.006 sec 24647.22 KB/s cpu: 0.77 sys 0.00 user
user 28: 300.019 sec 24666.62 KB/s cpu: 0.76 sys 0.00 user
user 29: 300.006 sec 24544.82 KB/s cpu: 0.76 sys 0.00 user
user 30: 300.022 sec 24625.35 KB/s cpu: 0.75 sys 0.00 user
user 31: 300.021 sec 24649.38 KB/s cpu: 0.75 sys 0.00 user
user 32: 300.016 sec 24701.01 KB/s cpu: 0.76 sys 0.00 user
user 33: 300.018 sec 24683.74 KB/s cpu: 0.75 sys 0.00 user
user 34: 300.018 sec 24738.38 KB/s cpu: 0.77 sys 0.00 user
user 35: 300.001 sec 24599.78 KB/s cpu: 0.75 sys 0.00 user
user 36: 300.008 sec 24674.30 KB/s cpu: 0.76 sys 0.00 user
user 37: 300.024 sec 24580.86 KB/s cpu: 0.75 sys 0.00 user
user 38: 300.023 sec 24628.71 KB/s cpu: 0.75 sys 0.00 user
user 39: 300.007 sec 24701.75 KB/s cpu: 0.77 sys 0.00 user
user 40: 300.026 sec 24765.01 KB/s cpu: 0.76 sys 0.00 user
user 41: 300.007 sec 24824.63 KB/s cpu: 0.76 sys 0.00 user
user 42: 300.015 sec 24707.90 KB/s cpu: 0.78 sys 0.00 user
user 43: 300.032 sec 24587.01 KB/s cpu: 0.76 sys 0.00 user
user 44: 300.027 sec 24700.06 KB/s cpu: 0.78 sys 0.00 user
user 45: 300.019 sec 24584.70 KB/s cpu: 0.77 sys 0.00 user
user 46: 300.013 sec 24745.56 KB/s cpu: 0.78 sys 0.00 user
user 47: 300.033 sec 24556.21 KB/s cpu: 0.77 sys 0.00 user
user 48: 300.012 sec 24728.58 KB/s cpu: 0.77 sys 0.01 user
user 49: 300.010 sec 24489.82 KB/s cpu: 0.76 sys 0.00 user
user 50: 300.020 sec 24751.83 KB/s cpu: 0.76 sys 0.01 user
user 51: 300.035 sec 24846.13 KB/s cpu: 0.77 sys 0.00 user
user 52: 300.012 sec 24639.83 KB/s cpu: 0.75 sys 0.00 user
user 53: 300.010 sec 24691.24 KB/s cpu: 0.77 sys 0.00 user
user 54: 300.029 sec 24686.29 KB/s cpu: 0.77 sys 0.00 user
user 55: 300.021 sec 24608.41 KB/s cpu: 0.77 sys 0.00 user
user 56: 300.027 sec 24440.67 KB/s cpu: 0.77 sys 0.00 user
user 57: 300.017 sec 24700.92 KB/s cpu: 0.77 sys 0.00 user
user 58: 300.026 sec 24645.57 KB/s cpu: 0.77 sys 0.00 user
user 59: 300.004 sec 24442.54 KB/s cpu: 0.76 sys 0.00 user
user 60: 300.011 sec 24749.21 KB/s cpu: 0.77 sys 0.00 user
user 61: 300.006 sec 24865.61 KB/s cpu: 0.77 sys 0.00 user
user 62: 300.023 sec 24468.29 KB/s cpu: 0.75 sys 0.00 user
user 63: 300.023 sec 24662.87 KB/s cpu: 0.77 sys 0.00 user
user 64: 300.017 sec 24646.26 KB/s cpu: 0.76 sys 0.00 user
total: 300.037 sec 1577033.29 KB/s cpu: 48.63 sys 0.05 user
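The conversion from the vxbench "total" line to the figures used in our result tables can be sketched as follows. The parsing helper is hypothetical and assumes the output layout shown above; it is not part of vxbench itself.

```python
import re

def total_kb_per_sec(vxbench_output):
    """Extract the KB/s value from the 'total:' line of vxbench style output."""
    for line in vxbench_output.splitlines():
        match = re.match(r"\s*total:\s+[\d.]+\s+sec\s+([\d.]+)\s+KB/s", line)
        if match:
            return float(match.group(1))
    raise ValueError("no 'total:' line found")

# The total line from the example run above.
sample = "total: 300.037 sec 1577033.29 KB/s cpu: 48.63 sys 0.05 user"

kb_per_sec = total_kb_per_sec(sample)
gbytes_per_sec = kb_per_sec / 1024 / 1024   # KB/s -> GB/s
gbits_per_sec = gbytes_per_sec * 8          # GB/s -> Gbits/s

print(round(gbytes_per_sec, 3), "GBytes/sec")
print(round(gbits_per_sec, 3), "Gbits/sec")
```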
iostat throughput data method and result collection
An example of the iostat command that we run on each node is below.
The sector size is 512 bytes.
Note that the average request size (avgrq-sz) is 1024 sectors, which is 1024 sectors * 512 bytes = 512KB read I/O size.
The result in this example test is 3155251.20 sectors/second.
Therefore the result of the test is approximately 1.505 GBytes/second (3155251.20 * 512 / 1024^3 = 1.5045).
$ iostat -x 20
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 4.30 0.00 1.25 0.00 44.40 35.52 0.00 0.12 0.04 0.01
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdp 0.00 0.00 61.95 0.00 63436.80 0.00 1024.00 1.87 30.22 12.73 78.84
sdo 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 1.82 27.99 12.35 80.21
sdn 0.00 0.00 64.90 0.00 66457.60 0.00 1024.00 2.03 31.32 12.92 83.84
sds 0.00 0.00 64.90 0.00 66457.60 0.00 1024.00 2.05 31.60 12.76 82.79
sdt 0.00 0.00 64.75 0.00 66304.00 0.00 1024.00 2.16 33.38 12.47 80.75
sdq 0.00 0.00 63.20 0.00 64716.80 0.00 1024.00 1.88 29.71 12.37 78.20
sdr 0.00 0.00 64.75 0.00 66304.00 0.00 1024.00 2.05 31.67 12.28 79.48
sdx 0.00 0.00 62.40 0.00 63897.60 0.00 1024.00 2.05 32.81 13.16 82.10
sdz 0.00 0.00 66.60 0.00 68198.40 0.00 1024.00 2.37 35.60 12.27 81.70
sdy 0.00 0.00 64.65 0.00 66201.60 0.00 1024.00 2.37 36.69 12.89 83.35
sdaa 0.00 0.00 62.15 0.00 63641.60 0.00 1024.00 2.17 34.92 13.56 84.25
sdm 0.00 0.00 63.90 0.00 65433.60 0.00 1024.00 1.96 30.63 13.05 83.36
sdv 0.00 0.00 62.75 0.00 64256.00 0.00 1024.00 2.26 36.05 12.86 80.72
sdu 0.00 0.00 64.15 0.00 65689.60 0.00 1024.00 2.32 36.25 13.08 83.93
sdw 0.00 0.00 66.45 0.00 68044.80 0.00 1024.00 2.25 33.89 12.47 82.86
sdg 0.00 0.00 62.80 0.00 64307.20 0.00 1024.00 2.39 38.07 13.43 84.37
sdj 0.00 0.00 64.45 0.00 65996.80 0.00 1024.00 2.14 33.04 13.31 85.81
sdi 0.00 0.00 65.45 0.00 67020.80 0.00 1024.00 2.02 30.76 12.47 81.65
sdl 0.00 0.00 64.05 0.00 65587.20 0.00 1024.00 2.10 32.67 12.64 80.97
sdk 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.28 34.90 13.01 84.53
sdf 0.00 0.00 65.35 0.00 66918.40 0.00 1024.00 2.59 39.63 12.90 84.31
sde 0.00 0.00 62.45 0.00 63948.80 0.00 1024.00 2.38 38.16 12.90 80.53
sdh 0.00 0.00 65.25 0.00 66816.00 0.00 1024.00 2.43 37.21 12.73 83.05
sdab 0.00 0.00 64.30 0.00 65843.20 0.00 1024.00 2.15 33.46 13.61 87.53
sdac 0.00 0.00 63.55 0.00 65075.20 0.00 1024.00 2.23 35.19 13.48 85.69
sdad 0.00 0.00 63.20 0.00 64716.80 0.00 1024.00 2.01 31.87 13.19 83.39
sdae 0.00 0.00 66.40 0.00 67993.60 0.00 1024.00 2.25 33.89 12.58 83.50
sdaf 0.00 0.00 62.90 0.00 64409.60 0.00 1024.00 1.89 30.10 12.73 80.09
sdag 0.00 0.00 63.95 0.00 65484.80 0.00 1024.00 2.08 32.68 13.12 83.93
sdah 0.00 0.00 63.25 0.00 64768.00 0.00 1024.00 2.18 34.56 13.00 82.22
sdai 0.00 0.00 64.20 0.00 65740.80 0.00 1024.00 2.07 32.21 12.40 79.60
sdaj 0.00 0.00 65.70 0.00 67276.80 0.00 1024.00 2.44 37.07 12.33 81.03
sdak 0.00 0.00 62.75 0.00 64256.00 0.00 1024.00 2.44 38.80 13.33 83.64
sdal 0.00 0.00 65.40 0.00 66969.60 0.00 1024.00 2.41 36.79 13.08 85.56
sdam 0.00 0.00 62.90 0.00 64409.60 0.00 1024.00 2.26 35.92 12.97 81.59
sdan 0.00 0.00 66.00 0.00 67584.00 0.00 1024.00 2.13 32.23 12.36 81.59
sdao 0.00 0.00 64.05 0.00 65587.20 0.00 1024.00 2.28 35.60 13.04 83.54
sdap 0.00 0.00 62.05 0.00 63539.20 0.00 1024.00 2.15 34.65 12.77 79.26
sdaq 0.00 0.00 66.30 0.00 67891.20 0.00 1024.00 2.27 34.21 12.95 85.87
sdar 0.00 0.00 64.40 0.00 65945.60 0.00 1024.00 2.17 33.60 13.43 86.51
sdas 0.00 0.00 65.80 0.00 67379.20 0.00 1024.00 2.24 33.90 12.42 81.74
sdat 0.00 0.00 62.30 0.00 63795.20 0.00 1024.00 1.98 31.74 13.24 82.46
sdau 0.00 0.00 65.25 0.00 66816.00 0.00 1024.00 2.00 30.52 12.21 79.66
sdav 0.00 0.00 63.75 0.00 65280.00 0.00 1024.00 2.05 32.12 12.43 79.27
sdaw 0.00 0.00 63.55 0.00 65075.20 0.00 1024.00 2.11 33.08 13.10 83.22
sdax 0.00 0.00 63.80 0.00 65331.20 0.00 1024.00 2.21 34.69 12.86 82.07
sday 0.00 0.00 63.55 0.00 65075.20 0.00 1024.00 2.39 37.70 13.13 83.41
sdaz 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.35 36.08 13.09 85.01
VxVM59000 0.00 0.00 3081.30 0.00 3155251.20 0.00 1024.00 104.68 33.97 0.32 100.00
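The iostat figures can be converted in the same way. The sketch below parses the VxVM volume row from the output above and derives the throughput and average request size; the helper name and the returned dictionary keys are hypothetical, and the column positions assume the "iostat -x" layout shown in this document.

```python
SECTOR_BYTES = 512

def parse_iostat_row(row):
    """Pick out the device, rsec/s and avgrq-sz columns from an 'iostat -x' row."""
    fields = row.split()
    return {
        "device": fields[0],
        "rsec_per_s": float(fields[5]),   # sectors read per second
        "avgrq_sz": float(fields[7]),     # average request size in sectors
    }

# The VxVM volume row from the example output above.
row = "VxVM59000 0.00 0.00 3081.30 0.00 3155251.20 0.00 1024.00 104.68 33.97 0.32 100.00"

stats = parse_iostat_row(row)
gbytes_per_sec = stats["rsec_per_s"] * SECTOR_BYTES / 2**30
avg_request_kb = stats["avgrq_sz"] * SECTOR_BYTES / 1024

print(stats["device"], round(gbytes_per_sec, 4), "GBytes/sec")
print("average request size:", avg_request_kb, "KB")
```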
vxstat throughput data collection method and result collection
An example of the vxstat command that we run on each node is below.
The blocks in the vxstat output are in units of sectors, so the block size is 512 bytes.
Note that 'blocks read / operations read' gives the average I/O size in sectors, and dividing by 2 converts 512-byte sectors to KB:
2624512 blocks read / 2563 read operations / 2 = 512KB average read I/O size
The result in this example test is 63109120 blocks (512-byte sectors) read every 20 seconds.
Therefore the result of the test is approximately 1.504 GBytes/second.
$ vxstat -g testdg -vd -i 20
                         OPERATIONS        BLOCKS           AVG TIME(ms)
TYP NAME                 READ    WRITE     READ      WRITE  READ    WRITE
Fri 27 Feb 2015 12:49:49 PM IST
dm  storagearray-0_16    2563    0         2624512   0      29.86   0.00
dm  storagearray-0_17    2564    0         2625536   0      32.01   0.00
dm  storagearray-0_18    2568    0         2629632   0      32.08   0.00
dm  storagearray-0_20    2568    0         2629632   0      33.20   0.00
dm  storagearray-1_6     2570    0         2631680   0      32.47   0.00
dm  storagearray-1_7     2569    0         2630656   0      34.50   0.00
dm  storagearray-1_8     2572    0         2633728   0      35.12   0.00
dm  storagearray-1_9     2573    0         2634752   0      36.11   0.00
dm  storagearray-2_5     2576    0         2637824   0      32.81   0.00
dm  storagearray-2_6     2572    0         2633728   0      34.88   0.00
dm  storagearray-2_7     2570    0         2631680   0      34.93   0.00
dm  storagearray-2_8     2569    0         2630656   0      36.84   0.00
dm  storagearray-3_4     2570    0         2631680   0      30.09   0.00
dm  storagearray-3_6     2568    0         2629632   0      32.30   0.00
dm  storagearray-3_7     2570    0         2631680   0      31.84   0.00
dm  storagearray-3_8     2572    0         2633728   0      33.96   0.00
dm  storagearray-4_8     2567    0         2628608   0      30.40   0.00
dm  storagearray-4_9     2567    0         2628608   0      32.82   0.00
dm  storagearray-4_10    2566    0         2627584   0      32.41   0.00
dm  storagearray-4_11    2564    0         2625536   0      34.69   0.00
dm  storagearray-5_8     2563    0         2624512   0      36.54   0.00
dm  storagearray-5_9     2563    0         2624512   0      37.37   0.00
dm  storagearray-5_10    2563    0         2624512   0      37.57   0.00
dm  storagearray-5_11    2563    0         2624512   0      39.20   0.00
vol vol1                 61630   0         63109120  0      33.92   0.00
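The vxstat volume line can be turned into a throughput figure and an average I/O size with the arithmetic described above. The sketch below uses the numbers from the "vol vol1" row and the 20 second interval; the function names are hypothetical.

```python
SECTOR_BYTES = 512

def gbytes_per_sec(blocks_read, interval_sec):
    """Throughput in GBytes/sec from 512-byte blocks read during one interval."""
    return blocks_read * SECTOR_BYTES / interval_sec / 2**30

def avg_io_kb(blocks_read, read_ops):
    """Average read I/O size in KB (blocks per operation, 2 blocks per KB)."""
    return blocks_read / read_ops * SECTOR_BYTES / 1024

vol_blocks = 63109120   # BLOCKS READ on the 'vol vol1' line
vol_ops = 61630         # OPERATIONS READ on the 'vol vol1' line
interval = 20           # seconds between vxstat samples

throughput = gbytes_per_sec(vol_blocks, interval)
io_size = avg_io_kb(vol_blocks, vol_ops)

print(round(throughput, 4), "GBytes/sec")
print(io_size, "KB average read I/O size")
```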
portperfshow – FC switch port throughput data collection method and result collection
An example of the command used to collect the throughput at the switch port is below.
The portperfshow command reports the throughput for one switch, so two ‘portperfshow’ commands are executed, one for each FC switch.
The ‘portperfshow’ total is of no use here, as we only want to collect the data for the specific ports that are connected to the host HBA FC ports.
In our test case this is port3 and port7. The other six ports are connected to the six modular storage arrays.
FC_switch1:admin> portperfshow
     0       1       2       3       4       5       6       7    8    9   10   11   12   13  ...  Total
=========================================================================================================
 234.4m  237.3m  238.5m  704.4m  231.4m  242.6m  239.0m  717.7m    0    0    0    0    0    0  ...   2.8g

FC_switch2:admin> portperfshow
     0       1       2       3       4       5       6       7    8    9   10   11   12   13  ...  Total
=========================================================================================================
 236.7m  236.0m  237.8m  708.0m  231.5m  238.2m  232.8m  715.5m    0    0    0    0    0    0  ...   2.8g
Therefore we have to add:
Switch1 port3 704.4 + port7 717.7 = 1422.1 MB/sec = 1.388769 Gbytes/sec
Switch2 port3 708.0 + port7 715.5 = 1423.5 MB/sec = 1.390137 Gbytes/sec
Total: = 2.778906 Gbytes/sec
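The addition of the host-facing switch ports can be sketched as below. The port numbers (3 and 7) and readings are those from our example; the dictionary layout and function name are hypothetical, and your own host-facing port numbers will differ.

```python
# MB/sec readings for the host-facing ports (port3 and port7) on each switch.
host_ports = {
    "switch1": {3: 704.4, 7: 717.7},
    "switch2": {3: 708.0, 7: 715.5},
}

def switch_total_mb(readings):
    """Sum the MB/sec readings of the host-facing ports of one switch."""
    return sum(readings.values())

total_mb = 0.0
for switch, readings in host_ports.items():
    mb = switch_total_mb(readings)
    total_mb += mb
    print(switch, round(mb, 1), "MB/sec =", round(mb / 1024, 6), "Gbytes/sec")

total_gb = total_mb / 1024
print("total:", round(total_gb, 6), "Gbytes/sec")
```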
NOTE:
Measuring the throughput at the switch port always shows a higher reading than the throughput measured by vxbench/vxstat/iostat.
The measurement at the switch port is higher due to 8b/10b encoding overhead.
The I/O throughput reading is therefore best measured by vxbench/vxstat/iostat and not at the FC switch port.
According to the “Fibre channel roadmap v1.8” table at http://fibrechannel.org/fibre-channel-roadmaps.html, the 8GFC throughput is 1600MB/sec for full duplex, so the net throughput in each direction is 800MB/sec.
As the HBA is a dual port card, the maximum theoretical throughput in each direction would be 1600MB/sec.
However, http://en.wikipedia.org/wiki/Fibre_Channel shows that 8GFC is actually 797MB/sec in each direction.
Therefore, using our dual port card, the maximum theoretical throughput in each direction is 1594MB/sec (1.5566 GB/sec).
Per the specification, the maximum theoretical throughput in our environment is therefore 1.5566 GB/sec per node.
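The per-node ceiling derived above, and how close our measured single-node result comes to it, can be sketched as follows. The 797MB/sec figure is the one quoted above, the measured value is the 1-node vxbench result from this document, and the variable names are hypothetical.

```python
PER_PORT_MB = 797      # 8GFC MB/sec in each direction, as quoted above
PORTS_PER_NODE = 2     # dual port HBA

ceiling_mb = PER_PORT_MB * PORTS_PER_NODE   # MB/sec per node, one direction
ceiling_gb = ceiling_mb / 1024              # GB/sec per node

measured_gb = 1.504    # GB/sec, 1-node vxbench result with a 512KB stripe width
fraction = measured_gb / ceiling_gb

print(ceiling_mb, "MB/sec =", round(ceiling_gb, 4), "GB/sec per node")
print("measured result reaches", round(fraction * 100, 1), "% of the ceiling")
```

A measured figure this close to the ceiling is what supports the conclusion that the HBA, rather than the storage, limits single-node throughput.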
< 3. VxVM maximum disk I/O: Test results and conclusions >
Raw volume device disk I/O throughput test results summary in Gbits per second:
Test program: vxbench
IO : sequential read of raw volume
IOsize=1024K
VxVM volume stripe widths 64KB, 512KB and 1024KB
VxVM volume 24 columns
Processes: 64
Summary of raw volume throughput (Gbits/sec)
Stripe width | Nodes | vxbench | iostat | Summary Gbits/sec | Recommended |
64k | 1 | 11.429 | 11.485 | 11.5 | |
64k | 2 | 19.457 | 19.543 | 19.5 | |
512k | 1 | 12.032 | 12.040 | 12.0 | YES |
512k | 2 | 20.552 | 20.557 | 20.5 | YES |
1024k | 1 | 12.029 | 12.037 | 12.0 | |
1024k | 2 | 20.341 | 20.331 | 20.3 | |
Raw volume device disk I/O throughput detailed test results in GBytes per second:
Stripe width | Nodes | vxbench 1st Node | vxbench 2nd Node | vxbench Total | iostat 1st Node | iostat 2nd Node | iostat Total | vxstat 1st Node | vxstat 2nd Node | vxstat Total | FC 1st Switch | FC 2nd Switch | FC Switch Total |
64k | 1 | 1.429 | | 1.429 | 1.436 | | 1.436 | 1.428 | | 1.428 | 0.782 | 0.795 | 1.577 |
64k | 2 | 1.215 | 1.217 | 2.432 | 1.218 | 1.225 | 2.443 | 1.214 | 1.220 | 2.434 | 1.306 | 1.304 | 2.610 |
512k | 1 | 1.504 | | 1.504 | 1.505 | | 1.505 | 1.504 | | 1.504 | 0.803 | 0.829 | 1.632 |
512k | 2 | 1.285 | 1.284 | 2.569 | 1.284 | 1.286 | 2.570 | 1.286 | 1.287 | 2.573 | 1.372 | 1.370 | 2.741 |
1024k | 1 | 1.504 | | 1.504 | 1.505 | | 1.505 | 1.505 | | 1.505 | 0.817 | 0.813 | 1.629 |
1024k | 2 | 1.272 | 1.271 | 2.543 | 1.269 | 1.273 | 2.541 | 1.272 | 1.273 | 2.545 | 1.357 | 1.359 | 2.716 |
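The relationship between the GBytes/sec figures in the detailed table and the Gbits/sec figures in the summary table is a factor of 8. The sketch below is illustrative only: it uses the vxbench totals from the detailed table, the names are hypothetical, and small differences from the summary table are expected because the summary was presumably derived from unrounded measurements.

```python
def gbytes_to_gbits(gbytes_per_sec):
    """Convert a throughput in GBytes/sec to Gbits/sec."""
    return gbytes_per_sec * 8


# vxbench totals in GB/sec, keyed by (stripe width, number of nodes).
VXBENCH_TOTAL_GB = {
    ("64k", 1): 1.429, ("64k", 2): 2.432,
    ("512k", 1): 1.504, ("512k", 2): 2.569,
    ("1024k", 1): 1.504, ("1024k", 2): 2.543,
}

for (stripe, nodes), gb in VXBENCH_TOTAL_GB.items():
    print(f"{stripe:>5} {nodes} node(s): {gb:.3f} GB/s = {gbytes_to_gbits(gb):.3f} Gbit/s")

# Configuration with the highest combined throughput.
best = max(VXBENCH_TOTAL_GB, key=VXBENCH_TOTAL_GB.get)
print("highest total:", best)
```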
Conclusions and recommendations so far:
Maximum I/O size setting (RHEL6.5)
The default operating system maximum I/O size is 512KB. There is no need to change the operating system’s default maximum I/O size tunable values.
VxVM stripe width setting
The optimal VxVM stripe width for media server solutions is also 512KB. Veritas therefore recommend using a VxVM stripe width of 512KB.
VxVM stripe columns setting
The hardware was configured to achieve maximum throughput when accessing all the available LUNs.
The number of LUNs available using our storage configuration was 24.
We therefore used all 24 LUNs in our VxVM volume to maximize the storage I/O bandwidth.
Balanced I/O
Using a VxVM stripe width of 512KB and 24 columns and utilizing all paths, we were able to achieve balanced I/O across all the LUNs (see the iostat output).
This then allowed us to easily identify the HBA bottleneck (using a single node) and storage bottlenecks (using both nodes).
Maximum achievable read I/O throughput using our hardware configuration
12Gbits/sec (1.5Gbytes/sec) performing I/O from one node:
using our hardware configuration, we identified that the dual FC port HBA had a throughput bottleneck of 12Gbits/sec (1.5Gbytes/sec). This is the maximum throughput we can achieve from each node.
20Gbits/sec (2.5Gbytes/sec) performing I/O from two nodes:
using our hardware configuration, we identified a storage bottleneck of 20Gbits/sec (2.5Gbytes/sec).
Conclusion: From this point onwards we know the maximum throughput achievable using our hardware configuration.
< 4. VxFS direct I/O maximum disk I/O>
Read throughput test execution
This VxFS direct I/O test mimics the VxVM raw disk test by performing direct I/O to one file that contains a single contiguous extent.
Thereby, all the vxbench processes begin reading from the same device offset.
This VxFS direct I/O test is therefore equivalent to the VxVM raw device test, only the starting offset into the device is different.
Here are the details of the file we created for this test:
# ls -li file1
4 -rw-r--r-- 1 root root 34359738368 Mar 3 14:25 file1
# ls -lhi file1
4 -rw-r--r-- 1 root root 32G Mar 3 14:25 file1
# du -h file1
32G file1
One file with a single contiguous extent of size 32GB:
# fsmap -HA ./file1
Volume Extent Type File Offset Dev Offset Extent Size Inode#
vol1 Data 0 Bytes 34359738368 32.00 GB 4
Here is how we created this file and performed this test. Note that we strongly recommend a file system block size of 8192 bytes:
// mkfs
$ mkfs -t vxfs /dev/vx/rdsk/testdg/vol1
version 10 layout
107374182400 sectors, 6710886400 blocks of size 8192, log size 32768 blocks
rcq size 8192 blocks
largefiles supported
maxlink supported
Note that for optimal read performance we recommend using the mount option of “noatime”.
The ‘noatime’ mount option prevents the inode access time being updated for every read operation.
// mount
$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
// create a file with a single 32GB extent and write to it
$ touch /data1/file1
$ /opt/VRTS/bin/setext -r 4194304 -f contig /data1/file1
$ dd if=/dev/zero of=/data1/file1 bs=128k count=262144
262144+0 records in
262144+0 records out
34359738368 bytes (34 GB) copied, 24.0118 s, 1.4 GB/s
$ /opt/VRTS/bin/fsmap -A /data1/file1
Volume Extent Type File Offset Dev Offset Extent Size Inode#
vol1 Data 0 34359738368 34359738368 4
$ ls -lh /data1/file1
-rw-r--r-- 1 root root 32G Mar 3 14:12 /data1/file1
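The reservation size passed to setext and the count passed to dd both follow from the target file size. The sketch below shows that arithmetic under the assumption, taken from the values used above, that the setext reservation is expressed in file system blocks of 8192 bytes and that dd writes 128KB records. The helper names are hypothetical.

```python
FS_BLOCK_SIZE = 8192        # file system block size in bytes, as recommended above
DD_BLOCK_SIZE = 128 * 1024  # dd bs=128k


def reservation_blocks(file_bytes, block_size=FS_BLOCK_SIZE):
    """Number of file system blocks to reserve for a file of file_bytes."""
    if file_bytes % block_size:
        raise ValueError("file size must be a multiple of the block size")
    return file_bytes // block_size


def dd_count(file_bytes, dd_bs=DD_BLOCK_SIZE):
    """Number of dd records needed to write file_bytes."""
    if file_bytes % dd_bs:
        raise ValueError("file size must be a multiple of the dd block size")
    return file_bytes // dd_bs


size_32g = 32 * 1024 ** 3
print(size_32g, reservation_blocks(size_32g), dd_count(size_32g))
```

The same helpers give the values for a 16GB file by passing `16 * 1024 ** 3`.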
// umount the file system to clear the file data from memory
$ umount /data1
// mount the file system from both nodes
$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
// vxbench command execution, 64 processes reading from the same file using direct I/O
$./vxbench -w read -c direct -i iosize=1024k,iotime=300,maxfilesize=32G /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1
< 5. VxFS direct I/O maximum disk I/O: Test results>
As expected the results are the same as the VxVM raw disk read throughput test (all results in GBytes/second)
VxFS direct IO (GB/s)
Stripe width | Nodes | vxbench 1st Node | vxbench 2nd Node | vxbench Total | iostat 1st Node | iostat 2nd Node | iostat Total | vxstat 1st Node | vxstat 2nd Node | vxstat Total | FC 1st Switch | FC 2nd Switch | FC Switch Total |
64k | 1 | 1.423 | | 1.423 | 1.428 | | 1.428 | 1.428 | | 1.428 | 0.769 | 0.768 | 1.537 |
64k | 2 | 1.213 | 1.209 | 2.423 | 1.217 | 1.208 | 2.425 | 1.217 | 1.208 | 2.425 | 1.294 | 1.302 | 2.596 |
512k | 1 | 1.502 | | 1.502 | 1.504 | | 1.504 | 1.504 | | 1.504 | 0.801 | 0.802 | 1.603 |
512k | 2 | 1.282 | 1.281 | 2.563 | 1.283 | 1.283 | 2.566 | 1.283 | 1.283 | 2.566 | 1.370 | 1.364 | 2.734 |
1024k | 1 | 1.502 | | 1.502 | 1.502 | | 1.502 | 1.502 | | 1.502 | 0.802 | 0.802 | 1.604 |
1024k | 2 | 1.271 | 1.268 | 2.539 | 1.271 | 1.271 | 2.541 | 1.271 | 1.271 | 2.541 | 1.352 | 1.361 | 2.713 |
Using a stripe-width of 512KB is recommended by VERITAS for media server workloads.
The I/O is evenly balanced across all 24 LUNs. Below is the iostat output showing all 48 paths (1-node test):
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sde 0.00 0.00 65.25 0.00 66816.00 0.00 1024.00 2.26 34.55 12.69 82.80
sdf 0.00 0.00 63.85 0.00 65382.40 0.00 1024.00 2.32 36.19 13.34 85.18
sdg 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.23 34.36 13.30 86.39
sdh 0.00 0.00 63.45 0.00 64972.80 0.00 1024.00 2.09 32.92 12.82 81.34
sdi 0.00 0.00 66.50 0.00 68096.00 0.00 1024.00 2.13 32.10 12.76 84.83
sdj 0.00 0.00 62.90 0.00 64409.60 0.00 1024.00 2.19 34.79 13.83 86.98
sdk 0.00 0.00 64.15 0.00 65689.60 0.00 1024.00 2.27 35.41 13.41 86.04
sdl 0.00 0.00 65.00 0.00 66560.00 0.00 1024.00 2.23 34.35 12.72 82.65
sdm 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.12 32.69 13.03 84.61
sdn 0.00 0.00 66.60 0.00 68198.40 0.00 1024.00 2.21 33.34 12.66 84.32
sdo 0.00 0.00 62.95 0.00 64460.80 0.00 1024.00 1.94 30.84 12.92 81.36
sdp 0.00 0.00 61.20 0.00 62668.80 0.00 1024.00 1.93 31.63 13.12 80.28
sdq 0.00 0.00 62.95 0.00 64460.80 0.00 1024.00 1.98 31.52 13.15 82.80
sdr 0.00 0.00 65.70 0.00 67276.80 0.00 1024.00 2.23 33.94 12.65 83.10
sds 0.00 0.00 66.40 0.00 67993.60 0.00 1024.00 2.25 33.97 13.16 87.41
sdt 0.00 0.00 63.85 0.00 65382.40 0.00 1024.00 2.20 34.40 13.47 85.97
sdu 0.00 0.00 62.60 0.00 64102.40 0.00 1024.00 2.32 37.00 13.60 85.17
sdv 0.00 0.00 65.00 0.00 66560.00 0.00 1024.00 2.36 36.23 12.79 83.13
sdw 0.00 0.00 62.65 0.00 64153.60 0.00 1024.00 2.20 35.17 13.05 81.75
sdx 0.00 0.00 64.85 0.00 66406.40 0.00 1024.00 2.50 38.48 13.12 85.11
sdy 0.00 0.00 62.80 0.00 64307.20 0.00 1024.00 1.84 29.32 12.63 79.30
sdz 0.00 0.00 64.75 0.00 66304.00 0.00 1024.00 2.13 32.81 12.73 82.45
sdaa 0.00 0.00 62.20 0.00 63692.80 0.00 1024.00 1.95 31.29 12.58 78.24
sdab 0.00 0.00 63.85 0.00 65382.40 0.00 1024.00 2.00 31.35 13.22 84.43
sdac 0.00 0.00 61.75 0.00 63232.00 0.00 1024.00 1.98 31.93 13.00 80.28
sdad 0.00 0.00 65.25 0.00 66816.00 0.00 1024.00 2.18 33.30 13.15 85.79
sdae 0.00 0.00 64.10 0.00 65638.40 0.00 1024.00 2.22 34.64 13.11 84.05
sdaf 0.00 0.00 63.25 0.00 64768.00 0.00 1024.00 2.08 32.85 12.67 80.12
sdag 0.00 0.00 62.95 0.00 64460.80 0.00 1024.00 2.19 34.82 12.93 81.40
sdah 0.00 0.00 64.30 0.00 65843.20 0.00 1024.00 2.31 36.00 13.21 84.95
sdai 0.00 0.00 63.30 0.00 64819.20 0.00 1024.00 2.22 35.10 13.36 84.54
sdak 0.00 0.00 65.35 0.00 66918.40 0.00 1024.00 2.13 32.52 12.42 81.17
sdal 0.00 0.00 62.55 0.00 64051.20 0.00 1024.00 2.17 34.68 12.67 79.22
sdaj 0.00 0.00 64.80 0.00 66355.20 0.00 1024.00 2.15 33.23 12.53 81.22
sdam 0.00 0.00 61.90 0.00 63385.60 0.00 1024.00 2.19 35.28 13.44 83.18
sdan 0.00 0.00 64.40 0.00 65945.60 0.00 1024.00 2.31 35.80 12.98 83.58
sdaq 0.00 0.00 66.10 0.00 67686.40 0.00 1024.00 2.46 37.17 12.69 83.87
sdao 0.00 0.00 65.55 0.00 67123.20 0.00 1024.00 2.28 34.79 12.84 84.16
sdap 0.00 0.00 63.50 0.00 65024.00 0.00 1024.00 2.44 38.47 13.48 85.58
sdar 0.00 0.00 64.55 0.00 66099.20 0.00 1024.00 2.40 37.14 13.59 87.73
sdas 0.00 0.00 65.80 0.00 67379.20 0.00 1024.00 2.10 32.03 12.92 85.01
sdat 0.00 0.00 63.25 0.00 64768.00 0.00 1024.00 2.01 31.74 12.76 80.72
sdau 0.00 0.00 65.75 0.00 67328.00 0.00 1024.00 1.98 30.15 12.60 82.86
sdav 0.00 0.00 63.25 0.00 64768.00 0.00 1024.00 2.10 33.10 13.19 83.44
sdaw 0.00 0.00 63.30 0.00 64819.20 0.00 1024.00 1.97 31.20 13.32 84.32
sdax 0.00 0.00 61.85 0.00 63334.40 0.00 1024.00 2.01 32.52 13.27 82.08
sday 0.00 0.00 65.35 0.00 66918.40 0.00 1024.00 1.92 29.36 12.36 80.74
sdaz 0.00 0.00 67.15 0.00 68761.60 0.00 1024.00 2.07 30.88 12.08 81.15
VxVM40000 0.00 0.00 3078.65 0.00 3152537.60 0.00 1024.00 103.77 33.71 0.32 100.00
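The iostat figures can be turned into a request size and a throughput figure with a little arithmetic. The sketch below is illustrative only and assumes that rsec/s and avgrq-sz are reported in 512-byte sectors. It uses three path rows (including the lowest and highest r/s values) and the volume row copied from the iostat output above; the function names are hypothetical.

```python
SECTOR_BYTES = 512  # assumption: iostat sector size for rsec/s and avgrq-sz

# (device, r/s, rsec/s, avgrq-sz) copied from the iostat output.
SAMPLE_PATHS = [
    ("sde", 65.25, 66816.00, 1024.00),
    ("sdp", 61.20, 62668.80, 1024.00),   # lowest r/s of the 48 paths
    ("sdaz", 67.15, 68761.60, 1024.00),  # highest r/s of the 48 paths
]
VOLUME = ("VxVM40000", 3078.65, 3152537.60, 1024.00)


def request_size_kb(avgrq_sectors):
    """Average request size in KB."""
    return avgrq_sectors * SECTOR_BYTES / 1024


def throughput_gb(rsec_per_sec):
    """Read throughput in GB/sec (1 GB = 1024**3 bytes)."""
    return rsec_per_sec * SECTOR_BYTES / 1024 ** 3


def spread_percent(values):
    """Difference between max and min as a percentage of the mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100


volume_gb = throughput_gb(VOLUME[2])
path_spread = spread_percent([row[1] for row in SAMPLE_PATHS])
print(f"request size: {request_size_kb(VOLUME[3]):.0f} KB")
print(f"volume throughput: {volume_gb:.3f} GB/s")
print(f"r/s spread across sample paths: {path_spread:.1f}%")
```

A small spread in r/s between the busiest and the least busy path is what "balanced I/O" means in practice.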
< 6. VxVM raw disk and VxFS direct I/O>
Results comparison and conclusions
Raw volume device disk I/O throughput test results in Gbytes/sec :
VxVM RAW IO (GB/s)
Stripe width | Nodes | vxbench 1st Node | vxbench 2nd Node | vxbench Total | iostat 1st Node | iostat 2nd Node | iostat Total | vxstat 1st Node | vxstat 2nd Node | vxstat Total | FC 1st Switch | FC 2nd Switch | FC Switch Total |
64k | 1 | 1.429 | | 1.429 | 1.436 | | 1.436 | 1.428 | | 1.428 | 0.782 | 0.795 | 1.577 |
64k | 2 | 1.215 | 1.217 | 2.432 | 1.218 | 1.225 | 2.443 | 1.214 | 1.220 | 2.434 | 1.306 | 1.304 | 2.610 |
512k | 1 | 1.504 | | 1.504 | 1.505 | | 1.505 | 1.504 | | 1.504 | 0.803 | 0.829 | 1.632 |
512k | 2 | 1.285 | 1.284 | 2.569 | 1.284 | 1.286 | 2.570 | 1.286 | 1.287 | 2.573 | 1.372 | 1.370 | 2.741 |
1024k | 1 | 1.504 | | 1.504 | 1.505 | | 1.505 | 1.505 | | 1.505 | 0.817 | 0.813 | 1.629 |
1024k | 2 | 1.272 | 1.271 | 2.543 | 1.269 | 1.273 | 2.541 | 1.272 | 1.273 | 2.545 | 1.357 | 1.359 | 2.716 |
VxFS direct I/O disk I/O throughput test results in Gbytes/sec :
VxFS direct IO (GB/s)
Stripe width | Nodes | vxbench 1st Node | vxbench 2nd Node | vxbench Total | iostat 1st Node | iostat 2nd Node | iostat Total | vxstat 1st Node | vxstat 2nd Node | vxstat Total | FC 1st Switch | FC 2nd Switch | FC Switch Total |
64k | 1 | 1.423 | | 1.423 | 1.428 | | 1.428 | 1.428 | | 1.428 | 0.769 | 0.768 | 1.537 |
64k | 2 | 1.213 | 1.209 | 2.423 | 1.217 | 1.208 | 2.425 | 1.217 | 1.208 | 2.425 | 1.294 | 1.302 | 2.596 |
512k | 1 | 1.502 | | 1.502 | 1.504 | | 1.504 | 1.504 | | 1.504 | 0.801 | 0.802 | 1.603 |
512k | 2 | 1.282 | 1.281 | 2.563 | 1.283 | 1.283 | 2.566 | 1.283 | 1.283 | 2.566 | 1.370 | 1.364 | 2.734 |
1024k | 1 | 1.502 | | 1.502 | 1.502 | | 1.502 | 1.502 | | 1.502 | 0.802 | 0.802 | 1.604 |
1024k | 2 | 1.271 | 1.268 | 2.539 | 1.271 | 1.271 | 2.541 | 1.271 | 1.271 | 2.541 | 1.352 | 1.361 | 2.713 |
Conclusion: The test results show that VxFS direct I/O does not degrade sequential read I/O throughput performance compared to raw disk.
By creating a file system and creating a file with a single contiguous extent, we could emulate the raw disk read throughput using VxFS direct I/O.
Each direct I/O read will fetch data from disk, so no buffering is being performed using either direct I/O or raw disk I/O.
Using VxFS direct I/O and running an identical vxbench test, we hit the same maximum achievable read I/O throughput.
Therefore the sequential read throughput was not impacted using VxFS direct I/O compared to reading from VxVM raw disk.
< 7. VxFS buffered I/O maximum disk I/O throughput test>
Test execution
This VxFS buffered I/O test is different. For the buffered read I/O throughput test, each process needs to read from a different file.
To prepare the files for this test we pre-allocate 16GB of file system space to each file, then write to the files to increase their file size to 16GB.
To pre-create the 64 files for this test the following script can be used. The script assumes an 8192 byte file system block size is being used.
mkdir /data1/primary
mkdir /data1/secondary
# For each file: reserve 16GB of contiguous space (2097152 blocks of 8192 bytes),
# then write 16GB of zeros to the file in the background.
for n in `seq 1 64`
do
 touch /data1/primary/file${n}; /opt/VRTS/bin/setext -r 2097152 -f contig /data1/primary/file${n}; dd if=/dev/zero of=/data1/primary/file${n} bs=128k count=131072 &
 touch /data1/secondary/file${n}; /opt/VRTS/bin/setext -r 2097152 -f contig /data1/secondary/file${n}; dd if=/dev/zero of=/data1/secondary/file${n} bs=128k count=131072 &
done
# Wait for all the background dd processes to finish.
wait
When this script has finished, some of the file data will remain in memory. Before we run our buffered I/O test we need to remove the file data from memory.
Note that for improved read performance you can also use the “noatime” mount option.
The ‘noatime’ mount option prevents the inode access time being updated for every read operation.
We did not use the “noatime” mount option in our test.
To remove the file data from memory the file system can be unmounted and mounted again.
Alternatively, a simple trick can be used to remove the file data from memory before each test run by using the “remount” mount option, as follows:
// mount
$ mount -t vxfs -o remount,noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
Again we are using vxbench to perform our test. This time however we need to explicitly stipulate the path to each separate file on the vxbench command line, as shown below.
Note also that the iosize argument has changed. We are no longer reading using a 1024KB block size; in our VxFS buffered I/O test we read using a 32KB block size, because a smaller read(2) I/O size will be used in the media server solution implementation.
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64
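Typing 64 file paths by hand is error prone, so the argument list can be generated instead. The sketch below is a hypothetical helper, not part of vxbench; it only assembles the same argument list as the command shown above, using the options and paths that appear in this document.

```python
def vxbench_read_command(nfiles=64, directory="/data1/primary",
                         iosize="32k", iotime=300, maxfilesize="16G"):
    """Build the vxbench argument list for a buffered sequential read test.

    One file path is passed per process, matching the command shown above.
    """
    files = [f"{directory}/file{n}" for n in range(1, nfiles + 1)]
    options = f"iosize={iosize},iotime={iotime},maxfilesize={maxfilesize}"
    return ["./vxbench", "-w", "read", "-i", options, *files]


cmd = vxbench_read_command()
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd)` on a node where vxbench is available in the current directory.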
< 8. VxFS buffered I/O max disk I/O throughput test >
Tests and test results and individual test conclusions
All the tests in this entire report read from disk using sequential read I/O.
VxFS readahead is required
The greatest impact to the performance of sequential reads from disk when using VxFS/CFS buffered I/O is readahead.
File system readahead uses the file system page cache to asynchronously pre-fetch file data into memory, which benefits sequential read I/O performance.
Our buffered I/O sequential read performance tests demonstrate the impact of readahead and highlight how tuning readahead can avoid a potential imbalance in throughput between processes.
Readahead is tunable using the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables.
The VxVM volume configuration will impact readahead
Our earlier testing determined that the optimal VxVM stripe-width to maximize the I/O throughput in our test is 512KB.
In our storage configuration we created 24 LUNs across 6 modular arrays. By striping across all 24 LUNs we can balance the I/O across the LUNs and maximize the overall storage bandwidth.
Using this optimal volume configuration we could easily identify two bottlenecks, one due to the FC HBA ports (a per-node bottleneck) and the other bottleneck in the storage itself.
However the volume stripe width and the number of columns (LUNs) in the volume are also used to auto-tune the values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables.
VxFS readahead tunables – default values
When mounting a VxFS file system it will auto-tune values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables. These two tunables are used to tune VxFS readahead.
The value for read_pref_io will be set to the VxVM volume stripe width – therefore the default auto-tuned value is read_pref_io=524288 in our test.
The value for read_nstream will be the number of columns (LUNs) in the volume – therefore the default auto-tuned value is read_nstream=24 in our test.
VxFS picks the default values for these tunables from the VxVM volume configuration.
This means read_pref_io=524288 and read_nstream=24 will be set by default by VxFS at mount time using our volume configuration.
VxFS readahead tunables – maximum amount of file data that will be pre-fetched
The maximum amount of file data that is pre-fetched from disk using read_ahead is determined by read_pref_io*read_nstream.
Therefore, by default, the maximum amount of read_ahead will be “512KB * 24 = 12MB” using our volume configuration.
As we will see during the buffered I/O testing, pre-fetching 12MB of file data is too much readahead. We found this caused an imbalance in read I/O throughput between processes.
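The relationship between the volume layout and the maximum readahead can be sketched as below. This only restates the formula read_pref_io*read_nstream from the text; the function name is hypothetical, and the values of read_nstream other than 24 and 1 are shown purely for comparison.

```python
def max_readahead_bytes(read_pref_io, read_nstream):
    """Maximum amount of file data pre-fetched by readahead, in bytes."""
    return read_pref_io * read_nstream


STRIPE_WIDTH = 512 * 1024  # auto-tuned read_pref_io for a 512KB stripe width
COLUMNS = 24               # auto-tuned read_nstream for a 24 column volume

default_max = max_readahead_bytes(STRIPE_WIDTH, COLUMNS)
print(f"default: {default_max} bytes = {default_max // 1024 ** 2} MB")

# Effect of reducing read_nstream while leaving read_pref_io unchanged.
for nstream in (24, 12, 6, 2, 1):
    kb = max_readahead_bytes(STRIPE_WIDTH, nstream) // 1024
    print(f"read_nstream={nstream:>2}: {kb} KB maximum readahead")
```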
VxFS readahead tunable – read_pref_io
The VxFS read_pref_io tunable is set to the VxVM volume stripe-width by default. The tunable means the “preferred read I/O size”.
VxFS readahead is triggered by two sequential read I/Os. The amount of file data to pre-fetch from disk increases as more sequential I/Os are performed.
As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream.
However the maximum I/O request size submitted by VxFS to VxVM will be ‘read_pref_io’. Therefore read_pref_io is the maximum read I/O request size submitted to VxVM.
What does it mean if read_pref_io is set to 512KB?
If (for example) we read a file using the ‘dd’ command with a dd block size of 8KB, then VxFS readahead will pre-fetch the file data using 512KB I/O requests to VxVM.
Readahead can therefore result in a smaller number of I/Os and a larger I/O request size, thus improving read I/O performance.
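As a rough illustration of the reduction in I/O count, the sketch below computes how many application reads are covered by one read_pref_io sized request. It assumes purely sequential reads that are fully satisfied by readahead, which is a simplification; the function name is hypothetical.

```python
def app_reads_per_disk_request(read_pref_io, app_io_size):
    """Number of application reads served by one read_pref_io sized disk request."""
    if read_pref_io % app_io_size:
        raise ValueError("read_pref_io must be a multiple of the application I/O size")
    return read_pref_io // app_io_size


READ_PREF_IO = 512 * 1024

# dd with an 8KB block size, as in the example above.
print(app_reads_per_disk_request(READ_PREF_IO, 8 * 1024))
# vxbench with a 32KB I/O size, as used in the buffered I/O tests.
print(app_reads_per_disk_request(READ_PREF_IO, 32 * 1024))
```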
Veritas do not recommend tuning ‘read_pref_io’ from its default auto-tuned value.
If a different value (other than the default value) for ‘read_pref_io’ is desired, then Veritas recommend changing the volume stripe width instead.
VxFS readahead tunable – read_nstream
The read_nstream value defaults to the number of columns in the VxVM volume.
As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream
To reduce the maximum amount of readahead, simply reduce the value of read_nstream. Please see the results of our tests using different values for read_nstream below.
The best practice for tuning readahead is as follows:
Do not change the auto-tuned value for read_pref_io, if you want to change read_pref_io change the VxVM volume stripe-width instead.
Reduce read_nstream to reduce the amount of readahead
You could disable readahead if necessary, but this will usually be a disadvantage (see test4).
Use /etc/tunefstab to set read_nstream, this means the value will persist across a reboot.
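As an illustration of the last point, the sketch below formats a candidate /etc/tunefstab entry. It assumes the entry layout is the block device followed by comma-separated tunable=value pairs; that layout is an assumption here and should be checked against the tunefstab(4) manual page for your release. The helper name is hypothetical and the device path is the one used in this document.

```python
def tunefstab_line(device, **tunables):
    """Format one tunefstab entry (assumed layout: device, then name=value pairs)."""
    options = ",".join(f"{name}={value}" for name, value in tunables.items())
    return f"{device} {options}"


line = tunefstab_line("/dev/vx/dsk/testdg/vol1", read_nstream=1)
print(line)
```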
Summary:
By performing sequential reads using VxFS buffered I/O and performing readahead, the application I/O size is effectively converted to read_pref_io sized requests to VxVM.
So there are two performance benefits of readahead: one is to pre-fetch file data from disk, the other is to increase the I/O size of the read request from disk (so reducing the number of I/Os).
These buffered I/O throughput tests will therefore help you decide what stripe-width, number of columns and readahead tuning is best for your solution implementation.
Also, these buffered I/O throughput tests will help you to determine how many running processes you will want to be reading from disk at the same time.
Buffered I/O tests:
We have chosen a volume configuration that was best for disk I/O performance, however this volume configuration also results in very aggressive read_ahead (12MB at maximum).
With a stripe_width of 512KB and 24 LUNs (columns) the default maximum read_ahead is therefore too aggressive.
TEST1: Use the default auto-tuned settings, using one node: <this is the baseline test>
Baseline vxbench test – 64 files/64 processes/32KB block size
Default auto-tuning – read_ahead enabled/read_nstream=24/read_pref_io=524288
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 24
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64
user 1: 300.062 sec 48868.77 KB/s cpu: 9.78 sys 0.08 user
user 2: 300.102 sec 48370.93 KB/s cpu: 9.78 sys 0.06 user
user 3: 300.042 sec 48094.01 KB/s cpu: 9.86 sys 0.08 user
user 4: 300.176 sec 4461.92 KB/s cpu: 1.01 sys 0.00 user
user 5: 300.105 sec 4584.10 KB/s cpu: 1.12 sys 0.00 user
user 6: 300.102 sec 48125.32 KB/s cpu: 9.85 sys 0.08 user
user 7: 300.031 sec 48341.50 KB/s cpu: 9.79 sys 0.07 user
user 8: 300.201 sec 4583.81 KB/s cpu: 1.12 sys 0.01 user
user 9: 300.194 sec 4582.32 KB/s cpu: 1.14 sys 0.00 user
user 10: 300.203 sec 4755.40 KB/s cpu: 1.19 sys 0.00 user
user 11: 300.126 sec 48121.38 KB/s cpu: 9.74 sys 0.08 user
user 12: 300.220 sec 4500.70 KB/s cpu: 1.01 sys 0.00 user
user 13: 300.201 sec 4665.25 KB/s cpu: 1.11 sys 0.00 user
user 14: 300.086 sec 48291.58 KB/s cpu: 9.74 sys 0.07 user
user 15: 300.165 sec 4501.41 KB/s cpu: 1.01 sys 0.01 user
user 16: 300.203 sec 4633.57 KB/s cpu: 1.16 sys 0.00 user
user 17: 300.147 sec 48159.06 KB/s cpu: 9.64 sys 0.08 user
user 18: 300.035 sec 48504.56 KB/s cpu: 9.41 sys 0.08 user
user 19: 300.078 sec 48497.65 KB/s cpu: 9.73 sys 0.07 user
user 20: 300.161 sec 48238.58 KB/s cpu: 9.66 sys 0.08 user
user 21: 300.136 sec 48201.71 KB/s cpu: 9.74 sys 0.08 user
user 22: 300.193 sec 4705.78 KB/s cpu: 1.21 sys 0.00 user
user 23: 300.086 sec 48045.94 KB/s cpu: 9.86 sys 0.07 user
user 24: 300.062 sec 47926.93 KB/s cpu: 9.69 sys 0.08 user
user 25: 300.198 sec 4460.09 KB/s cpu: 1.11 sys 0.01 user
user 26: 300.207 sec 4623.79 KB/s cpu: 1.09 sys 0.00 user
user 27: 300.215 sec 4582.00 KB/s cpu: 1.01 sys 0.00 user
user 28: 300.125 sec 48203.53 KB/s cpu: 9.70 sys 0.08 user
user 29: 300.141 sec 48323.77 KB/s cpu: 9.65 sys 0.07 user
user 30: 300.212 sec 4705.48 KB/s cpu: 1.20 sys 0.00 user
user 31: 300.153 sec 48485.59 KB/s cpu: 9.72 sys 0.07 user
user 32: 300.163 sec 48033.68 KB/s cpu: 9.66 sys 0.07 user
user 33: 300.160 sec 48525.35 KB/s cpu: 9.82 sys 0.07 user
user 34: 300.144 sec 4624.56 KB/s cpu: 1.09 sys 0.01 user
user 35: 300.102 sec 48002.47 KB/s cpu: 9.60 sys 0.07 user
user 36: 300.203 sec 4821.38 KB/s cpu: 1.18 sys 0.01 user
user 37: 300.006 sec 48072.18 KB/s cpu: 9.64 sys 0.07 user
user 38: 300.219 sec 4746.29 KB/s cpu: 1.15 sys 0.00 user
user 39: 300.213 sec 4701.73 KB/s cpu: 1.18 sys 0.00 user
user 40: 300.176 sec 4460.00 KB/s cpu: 1.13 sys 0.00 user
user 41: 300.207 sec 4583.50 KB/s cpu: 1.05 sys 0.00 user
user 42: 300.213 sec 4624.56 KB/s cpu: 1.03 sys 0.00 user
user 43: 300.049 sec 48789.10 KB/s cpu: 9.87 sys 0.08 user
user 44: 300.207 sec 4708.85 KB/s cpu: 1.18 sys 0.00 user
user 45: 300.077 sec 48129.27 KB/s cpu: 9.59 sys 0.07 user
user 46: 300.079 sec 48374.66 KB/s cpu: 9.74 sys 0.07 user
user 47: 300.099 sec 48494.28 KB/s cpu: 9.64 sys 0.09 user
user 48: 300.064 sec 48581.86 KB/s cpu: 9.47 sys 0.08 user
user 49: 300.199 sec 4705.78 KB/s cpu: 1.10 sys 0.00 user
user 50: 300.204 sec 4788.64 KB/s cpu: 1.20 sys 0.01 user
user 51: 300.032 sec 9044.38 KB/s cpu: 1.94 sys 0.02 user
user 52: 300.120 sec 47917.67 KB/s cpu: 9.69 sys 0.06 user
user 53: 300.128 sec 48407.76 KB/s cpu: 9.56 sys 0.07 user
user 54: 300.203 sec 4746.24 KB/s cpu: 1.07 sys 0.00 user
user 55: 300.201 sec 4460.37 KB/s cpu: 1.02 sys 0.01 user
user 56: 300.206 sec 4623.49 KB/s cpu: 1.11 sys 0.00 user
user 57: 300.212 sec 4664.43 KB/s cpu: 1.09 sys 0.00 user
user 58: 300.212 sec 4664.76 KB/s cpu: 1.06 sys 0.00 user
user 59: 300.211 sec 4623.52 KB/s cpu: 1.04 sys 0.01 user
user 60: 300.206 sec 4623.80 KB/s cpu: 1.08 sys 0.01 user
user 61: 300.133 sec 12111.95 KB/s cpu: 2.64 sys 0.02 user
user 62: 300.035 sec 9945.29 KB/s cpu: 2.15 sys 0.01 user
user 63: 300.195 sec 4583.47 KB/s cpu: 1.13 sys 0.00 user
user 64: 300.047 sec 48093.15 KB/s cpu: 9.80 sys 0.09 user
total: 300.220 sec 1578817.49 KB/s cpu: 323.53 sys 2.31 user
Conclusion to TEST1: <this is our baseline test, using the default auto-tuned values, read_nstream is therefore set to its default value of 24>
This test ran for 300.220 seconds and read from disk at an average rate of 1578817.49 KB/sec, vxbench therefore read 452 GB of data from disk.
The throughput per process is very imbalanced: some processes achieved ~49000 KB/sec, other processes only achieved ~4800 KB/sec.
However, the maximum possible read I/O throughput from one node is still being achieved: 1578817.49 KB/sec = 1.506 GB/sec.
The problem is not the total throughput; the problem is that the maximum readahead per process is 12MB at a time.
12MB of readahead (read_pref_io*read_nstream) is too aggressive and is causing an imbalance of throughput between processes.
This readahead configuration is therefore a failure.
We do not want to change the value of read_pref_io because we want to request large I/O sizes for better performance.
By default the VxFS read_pref_io tunable is set to the VxVM volume stripe-width, in our test this value is 512KB.
By default the VxFS read_nstream tunable is set to the number of columns in the VxVM volume, in our test this value is 24 (we have 24 LUNs).
Next, we therefore want to experiment with smaller values of read_nstream, and also test with read_ahead disabled.
Our goal is to maintain the maximum amount of total throughput (approx. 1.5Gbytes/sec) whilst also spreading this throughput evenly between all the active processes reading from disk.
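The per-process imbalance and the totals quoted in this conclusion can be checked by parsing the vxbench output. The sketch below is a minimal example that assumes the "user N:" and "total:" line layout shown in the TEST1 output above; the parser is hypothetical and only a handful of the 64 lines are included as sample data.

```python
import re

# A few lines copied from the TEST1 vxbench output.
SAMPLE_OUTPUT = """\
user 1: 300.062 sec 48868.77 KB/s cpu: 9.78 sys 0.08 user
user 4: 300.176 sec 4461.92 KB/s cpu: 1.01 sys 0.00 user
user 51: 300.032 sec 9044.38 KB/s cpu: 1.94 sys 0.02 user
user 61: 300.133 sec 12111.95 KB/s cpu: 2.64 sys 0.02 user
total: 300.220 sec 1578817.49 KB/s cpu: 323.53 sys 2.31 user
"""

USER_RE = re.compile(r"^user\s+(\d+):\s+([\d.]+) sec\s+([\d.]+) KB/s")
TOTAL_RE = re.compile(r"^total:\s+([\d.]+) sec\s+([\d.]+) KB/s")


def parse_vxbench(text):
    """Return ({process number: KB/s}, (total seconds, total KB/s))."""
    per_process, total = {}, None
    for line in text.splitlines():
        user = USER_RE.match(line)
        if user:
            per_process[int(user.group(1))] = float(user.group(3))
            continue
        tot = TOTAL_RE.match(line)
        if tot:
            total = (float(tot.group(1)), float(tot.group(2)))
    return per_process, total


rates, (seconds, total_kb) = parse_vxbench(SAMPLE_OUTPUT)
imbalance = max(rates.values()) / min(rates.values())
total_gb_per_sec = total_kb / 1024 ** 2
data_read_gb = total_kb * seconds / 1024 ** 2
print(f"fastest/slowest process ratio: {imbalance:.1f}")
print(f"total: {total_gb_per_sec:.3f} GB/s, {data_read_gb:.0f} GB read")
```

Running the same parser over the output of a later test makes it easy to see whether a change to read_nstream has evened out the per-process rates.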
TEST2: change read_nstream to 1, keep everything else the same as the baseline test.
vxbench – 64 files/64 processes/32KB block size
Tuning – read_ahead enabled/read_nstream=1/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=1
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 1
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64
user 1: 300.013 sec 24639.76 KB/s cpu: 5.35 sys 0.05 user
user 2: 300.044 sec 24748.27 KB/s cpu: 5.41 sys 0.06 user
user 3: 300.010 sec 24706.66 KB/s cpu: 5.52 sys 0.06 user
user 4: 300.021 sec 24872.94 KB/s cpu: 5.46 sys 0.05 user
user 5: 300.023 sec 24724.40 KB/s cpu: 5.58 sys 0.05 user
user 6: 300.060 sec 24683.79 KB/s cpu: 5.58 sys 0.06 user
user 7: 300.021 sec 24744.96 KB/s cpu: 5.66 sys 0.06 user
user 8: 300.016 sec 24680.46 KB/s cpu: 5.49 sys 0.06 user
user 9: 300.017 sec 24784.51 KB/s cpu: 5.55 sys 0.06 user
user 10: 300.021 sec 24744.97 KB/s cpu: 5.54 sys 0.05 user
user 11: 300.015 sec 24747.12 KB/s cpu: 5.54 sys 0.06 user
user 12: 300.017 sec 24830.60 KB/s cpu: 5.46 sys 0.05 user
user 13: 300.013 sec 24824.11 KB/s cpu: 5.61 sys 0.05 user
user 14: 300.028 sec 24729.11 KB/s cpu: 5.57 sys 0.05 user
user 15: 300.017 sec 24752.09 KB/s cpu: 5.42 sys 0.06 user
user 16: 300.028 sec 24655.71 KB/s cpu: 5.53 sys 0.06 user
user 17: 300.013 sec 24834.38 KB/s cpu: 5.68 sys 0.05 user
user 18: 300.048 sec 24773.52 KB/s cpu: 5.52 sys 0.07 user
user 19: 300.024 sec 24697.01 KB/s cpu: 5.50 sys 0.07 user
user 20: 300.012 sec 24938.48 KB/s cpu: 5.61 sys 0.06 user
user 21: 300.016 sec 24646.33 KB/s cpu: 5.54 sys 0.06 user
user 22: 300.016 sec 24689.11 KB/s cpu: 5.57 sys 0.05 user
user 23: 300.019 sec 24695.60 KB/s cpu: 5.50 sys 0.06 user
user 24: 300.023 sec 24719.31 KB/s cpu: 5.59 sys 0.05 user
user 25: 300.015 sec 24755.66 KB/s cpu: 5.58 sys 0.05 user
user 26: 300.018 sec 24596.75 KB/s cpu: 5.59 sys 0.07 user
user 27: 300.049 sec 24717.11 KB/s cpu: 5.54 sys 0.08 user
user 28: 300.019 sec 24753.74 KB/s cpu: 5.59 sys 0.06 user
user 29: 300.021 sec 24214.23 KB/s cpu: 5.44 sys 0.06 user
user 30: 300.021 sec 24772.27 KB/s cpu: 5.61 sys 0.05 user
user 31: 300.019 sec 24908.96 KB/s cpu: 5.68 sys 0.05 user
user 32: 300.045 sec 24637.23 KB/s cpu: 5.53 sys 0.06 user
user 33: 300.053 sec 24677.55 KB/s cpu: 5.59 sys 0.05 user
user 34: 300.017 sec 24692.39 KB/s cpu: 5.60 sys 0.07 user
user 35: 300.018 sec 24787.86 KB/s cpu: 5.55 sys 0.06 user
user 36: 300.019 sec 24741.70 KB/s cpu: 5.57 sys 0.07 user
user 37: 300.015 sec 24813.68 KB/s cpu: 5.52 sys 0.06 user
user 38: 300.014 sec 24808.66 KB/s cpu: 5.40 sys 0.06 user
user 39: 300.013 sec 24716.57 KB/s cpu: 5.53 sys 0.06 user
user 40: 300.024 sec 24705.55 KB/s cpu: 5.54 sys 0.06 user
user 41: 300.039 sec 24796.47 KB/s cpu: 5.50 sys 0.05 user
user 42: 300.044 sec 24852.33 KB/s cpu: 5.60 sys 0.05 user
user 43: 300.044 sec 24836.97 KB/s cpu: 5.59 sys 0.06 user
user 44: 300.028 sec 24735.94 KB/s cpu: 5.54 sys 0.05 user
user 45: 300.060 sec 24803.28 KB/s cpu: 5.71 sys 0.05 user
user 46: 300.019 sec 24830.57 KB/s cpu: 5.54 sys 0.07 user
user 47: 300.052 sec 24587.20 KB/s cpu: 5.57 sys 0.05 user
user 48: 300.020 sec 24750.25 KB/s cpu: 5.54 sys 0.06 user
user 49: 300.016 sec 24675.38 KB/s cpu: 5.53 sys 0.04 user
user 50: 300.020 sec 24704.09 KB/s cpu: 5.52 sys 0.06 user
user 51: 300.035 sec 24716.59 KB/s cpu: 5.37 sys 0.06 user
user 52: 300.049 sec 24700.04 KB/s cpu: 5.54 sys 0.05 user
user 53: 300.022 sec 24818.32 KB/s cpu: 5.40 sys 0.06 user
user 54: 300.014 sec 24725.01 KB/s cpu: 5.50 sys 0.06 user
user 55: 300.026 sec 24683.17 KB/s cpu: 5.57 sys 0.05 user
user 56: 300.058 sec 24786.37 KB/s cpu: 5.65 sys 0.04 user
user 57: 300.022 sec 24850.79 KB/s cpu: 5.61 sys 0.06 user
user 58: 300.021 sec 24702.35 KB/s cpu: 5.38 sys 0.06 user
user 59: 300.015 sec 24735.30 KB/s cpu: 5.59 sys 0.06 user
user 60: 300.027 sec 24840.10 KB/s cpu: 5.58 sys 0.05 user
user 61: 300.021 sec 24687.03 KB/s cpu: 5.58 sys 0.06 user
user 62: 300.021 sec 24799.55 KB/s cpu: 5.61 sys 0.06 user
user 63: 300.011 sec 24744.08 KB/s cpu: 5.42 sys 0.05 user
user 64: 300.015 sec 24655.08 KB/s cpu: 5.54 sys 0.06 user
total: 300.061 sec 1582992.52 KB/s cpu: 354.62 sys 3.65 user
Conclusion to TEST2: <read_nstream set to 1>
Using read_nstream=1 produces a perfect balance in throughput per process (~24700 KB/sec), so all the processes now have the same consistent throughput during the test:
The maximum total throughput from one node is still being achieved (1582992.52 KB/s), approx. 1.5 GB/sec
The total throughput is now divided evenly across all 64 processes and remains consistent throughout the test.
The average read I/O size is still 512KB (avgrq-sz = 1024.00 sectors) because read_pref_io is set to 512KB.
The I/O is evenly balanced across all 24 LUNs (see r/s and rsec/s in the iostat output below).
Most importantly the I/O throughput is now evenly balanced across all 64 processes, yet the total throughput remains the same.
The maximum readahead per process is now 512KB
The throughput per process is now therefore balanced: all 64 processes are now consistently performing approx. 24700 KB/s – perfect!!
Please note:
If the throughput per process had not been evenly distributed using read_nstream=1, then we would recommend reducing the stripe-width to 256KB or 128KB
Reducing the stripe-width will reduce the default value of “read_pref_io”.
We do not advise tuning read_pref_io to override its default value; we recommend tuning the VxVM volume stripe-width instead.
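The readahead figures quoted in these conclusions (512KB for read_nstream=1, 3MB for read_nstream=6) are consistent with the maximum readahead per process being read_nstream multiplied by read_pref_io. The sketch below only illustrates that arithmetic under this assumption; the function name is hypothetical and is not part of any VxFS tool.

```python
# Sketch only: assumes max readahead per process = read_nstream * read_pref_io,
# which matches the 512KB and 3MB figures quoted in the test conclusions.
READ_PREF_IO = 524288  # bytes, as reported by vxtunefs in these tests


def max_readahead_bytes(read_nstream, read_pref_io=READ_PREF_IO):
    """Return the assumed maximum readahead per process, in bytes."""
    return read_nstream * read_pref_io


if __name__ == "__main__":
    # read_nstream values used in the tests (24 is the baseline default here)
    for nstream in (1, 6, 12, 24):
        kb = max_readahead_bytes(nstream) // 1024
        print(f"read_nstream={nstream:2d} -> {kb} KB max readahead per process")
```

With 64 concurrent readers, a larger readahead window per process means more data is requested in one go on behalf of a single process, which is the imbalance mechanism the tests describe.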
# iostat -x 20
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sde 0.00 0.00 63.90 0.00 65433.60 0.00 1024.00 2.88 45.11 15.37 98.20
sdf 0.00 0.00 64.35 0.00 65894.40 0.00 1024.00 2.88 44.76 15.08 97.06
sdg 0.00 0.00 64.75 0.00 66304.00 0.00 1024.00 2.91 44.88 15.17 98.19
sdh 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.90 44.61 15.01 97.48
sdi 0.00 0.00 64.80 0.00 66355.20 0.00 1024.00 2.64 40.74 14.97 97.00
sdj 0.00 0.00 65.40 0.00 66969.60 0.00 1024.00 2.70 41.32 14.85 97.11
sdk 0.00 0.00 65.05 0.00 66611.20 0.00 1024.00 2.73 42.02 14.87 96.71
sdl 0.00 0.00 63.30 0.00 64819.20 0.00 1024.00 2.61 41.25 15.39 97.39
sdm 0.00 0.00 64.50 0.00 66048.00 0.00 1024.00 2.77 42.91 15.14 97.67
sdn 0.00 0.00 64.85 0.00 66406.40 0.00 1024.00 2.79 43.06 14.89 96.58
sdo 0.00 0.00 63.10 0.00 64614.40 0.00 1024.00 2.66 42.17 15.23 96.12
sdp 0.00 0.00 65.45 0.00 67020.80 0.00 1024.00 2.80 42.74 14.97 97.97
sdq 0.00 0.00 64.00 0.00 65536.00 0.00 1024.00 2.66 41.63 15.10 96.61
sdr 0.00 0.00 64.55 0.00 66099.20 0.00 1024.00 2.72 42.14 15.14 97.70
sds 0.00 0.00 64.20 0.00 65740.80 0.00 1024.00 2.65 41.27 15.09 96.91
sdt 0.00 0.00 64.85 0.00 66406.40 0.00 1024.00 2.75 42.45 14.94 96.85
sdu 0.00 0.00 64.65 0.00 66201.60 0.00 1024.00 2.66 41.25 15.05 97.30
sdv 0.00 0.00 63.85 0.00 65382.40 0.00 1024.00 2.64 41.33 15.18 96.90
sdw 0.00 0.00 64.25 0.00 65792.00 0.00 1024.00 2.63 41.00 15.12 97.17
sdx 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.68 41.32 14.87 96.55
sdy 0.00 0.00 64.00 0.00 65536.00 0.00 1024.00 2.71 42.18 15.04 96.26
sdz 0.00 0.00 63.85 0.00 65382.40 0.00 1024.00 2.70 42.16 15.21 97.14
sdaa 0.00 0.00 64.80 0.00 66355.20 0.00 1024.00 2.68 41.35 15.08 97.71
sdab 0.00 0.00 65.05 0.00 66611.20 0.00 1024.00 2.70 41.53 15.03 97.80
sdac 0.00 0.00 64.15 0.00 65689.60 0.00 1024.00 2.57 40.17 15.02 96.34
sdad 0.00 0.00 63.50 0.00 65024.00 0.00 1024.00 2.56 40.34 15.23 96.69
sdae 0.00 0.00 64.00 0.00 65536.00 0.00 1024.00 2.57 40.21 15.08 96.51
sdaf 0.00 0.00 65.65 0.00 67225.60 0.00 1024.00 2.65 40.35 14.77 96.97
sdag 0.00 0.00 65.25 0.00 66816.00 0.00 1024.00 2.95 45.29 15.03 98.04
sdah 0.00 0.00 64.50 0.00 66048.00 0.00 1024.00 2.91 45.07 15.16 97.75
sdai 0.00 0.00 64.25 0.00 65792.00 0.00 1024.00 2.88 44.77 15.22 97.79
sdak 0.00 0.00 64.85 0.00 66406.40 0.00 1024.00 2.64 40.69 14.89 96.56
sdal 0.00 0.00 64.50 0.00 66048.00 0.00 1024.00 2.66 41.21 15.16 97.80
sdaj 0.00 0.00 64.05 0.00 65587.20 0.00 1024.00 2.90 45.20 15.21 97.45
sdam 0.00 0.00 64.75 0.00 66304.00 0.00 1024.00 2.64 40.75 15.04 97.39
sdan 0.00 0.00 64.05 0.00 65587.20 0.00 1024.00 2.63 41.15 15.09 96.68
sdaq 0.00 0.00 64.15 0.00 65689.60 0.00 1024.00 2.74 42.72 15.23 97.68
sdao 0.00 0.00 64.75 0.00 66304.00 0.00 1024.00 2.74 42.36 15.02 97.26
sdap 0.00 0.00 64.95 0.00 66508.80 0.00 1024.00 2.81 43.19 14.91 96.87
sdar 0.00 0.00 63.85 0.00 65382.40 0.00 1024.00 2.74 42.83 15.30 97.67
sdas 0.00 0.00 64.20 0.00 65740.80 0.00 1024.00 2.67 41.61 15.19 97.53
sdat 0.00 0.00 65.05 0.00 66611.20 0.00 1024.00 2.69 41.29 14.99 97.53
sdau 0.00 0.00 64.85 0.00 66406.40 0.00 1024.00 2.66 41.04 14.87 96.41
sdav 0.00 0.00 64.00 0.00 65536.00 0.00 1024.00 2.65 41.37 15.03 96.17
sdaw 0.00 0.00 64.55 0.00 66099.20 0.00 1024.00 2.82 43.67 15.25 98.45
sdax 0.00 0.00 64.05 0.00 65587.20 0.00 1024.00 2.82 44.11 15.14 96.96
sday 0.00 0.00 65.85 0.00 67430.40 0.00 1024.00 2.88 43.75 14.85 97.81
sdaz 0.00 0.00 63.65 0.00 65177.60 0.00 1024.00 2.80 43.95 15.34 97.66
VxVM56000 0.00 0.00 3094.90 0.00 3169177.60 0.00 1024.00 131.05 42.35 0.32 100.00
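The iostat columns rsec/s and avgrq-sz are reported in 512-byte sectors, so they need converting before they can be compared with the vxbench figures in KB/s. The following is a minimal sketch of that conversion, using the values from the VxVM56000 row above; the helper name is hypothetical.

```python
# Sketch only: converts iostat sector-based columns to KB.
SECTOR_BYTES = 512  # iostat -x reports rsec/s and avgrq-sz in 512-byte sectors


def sectors_to_kb(sectors):
    """Convert a sector count (or sectors/sec) to KB (or KB/sec)."""
    return sectors * SECTOR_BYTES / 1024


if __name__ == "__main__":
    avg_request_kb = sectors_to_kb(1024.00)        # avgrq-sz on every path
    volume_kb_per_sec = sectors_to_kb(3169177.60)  # rsec/s on the VxVM56000 row
    print(f"average request size: {avg_request_kb:.0f} KB")
    print(f"volume read rate:     {volume_kb_per_sec:.1f} KB/s")
```

The converted volume read rate is close to the vxbench total of 1582992.52 KB/s; the small difference is expected because iostat samples a 20-second interval while vxbench averages over the whole run.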
TEST3: change read_nstream to 1, read from 16 files using 16 processes, keep everything else the same as the baseline test.
vxbench – 16files/16processes/32KB block size
Tuning – read_ahead enabled/read_nstream=1/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=1
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 1
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=120,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16
user 1: 120.030 sec 97417.75 KB/s cpu: 7.43 sys 0.07 user
user 2: 120.037 sec 98452.82 KB/s cpu: 7.68 sys 0.09 user
user 3: 120.033 sec 98302.87 KB/s cpu: 7.55 sys 0.08 user
user 4: 120.031 sec 98227.29 KB/s cpu: 7.37 sys 0.08 user
user 5: 120.030 sec 98381.88 KB/s cpu: 7.89 sys 0.05 user
user 6: 120.033 sec 98272.61 KB/s cpu: 7.42 sys 0.07 user
user 7: 120.032 sec 97744.74 KB/s cpu: 7.70 sys 0.08 user
user 8: 120.037 sec 98069.12 KB/s cpu: 7.74 sys 0.10 user
user 9: 120.030 sec 98603.74 KB/s cpu: 7.79 sys 0.06 user
user 10: 120.036 sec 98756.87 KB/s cpu: 7.82 sys 0.07 user
user 11: 120.037 sec 98513.11 KB/s cpu: 7.78 sys 0.10 user
user 12: 120.040 sec 98360.81 KB/s cpu: 7.80 sys 0.08 user
user 13: 120.030 sec 98488.47 KB/s cpu: 7.48 sys 0.09 user
user 14: 120.030 sec 98241.64 KB/s cpu: 7.50 sys 0.09 user
user 15: 120.039 sec 97824.57 KB/s cpu: 7.76 sys 0.09 user
user 16: 120.032 sec 98700.71 KB/s cpu: 7.42 sys 0.09 user
total: 120.041 sec 1572267.32 KB/s cpu: 122.13 sys 1.29 user
Conclusion to TEST3: <read_nstream to 1, read from 16 files using 16 processes>
Using read_nstream=1 produces a perfect balance in throughput per process (approx. 98000 KB/sec), so all processes still have an equal amount of throughput:
The maximum total throughput from one node is still being achieved (1572267.32 KB/s) with 16 processes; this is approx. 1.5 GB/sec.
The total throughput is now divided evenly across all 16 processes, so the throughput per process is higher using fewer processes.
Most importantly the I/O throughput is now evenly balanced across all 16 processes, yet the total throughput remains the same.
The maximum readahead per process is still 512KB, this amount of readahead provides perfectly balanced throughput per process in our test.
The throughput per process is now therefore balanced, all 16 processes are now performing approx. 98000 KB/s – perfect!!
Please note:
The throughput per process is now much higher using 16 processes rather than 64 processes.
The number of processes reduced by a factor of 4 in test3, so the throughput per process increased by a factor of 4 in test3, but the total throughput is unchanged.
It is therefore very important to consider the number of running processes that will be reading from disk at the same time, as the available throughput will be evenly distributed between these processes.
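The relationship between process count and per-process throughput can be checked with simple arithmetic. The sketch below assumes the total throughput is shared evenly between the readers, as observed with read_nstream=1; the totals are the vxbench totals from TEST2 and TEST3, and the function name is hypothetical.

```python
# Sketch only: even share of a node's total read throughput between readers.
def per_process_kb_s(total_kb_s, processes):
    """Return the throughput each process gets if the total is shared evenly."""
    return total_kb_s / processes


if __name__ == "__main__":
    test2 = per_process_kb_s(1582992.52, 64)  # TEST2: 64 processes
    test3 = per_process_kb_s(1572267.32, 16)  # TEST3: 16 processes
    print(f"TEST2 (64 processes): {test2:.0f} KB/s per process")
    print(f"TEST3 (16 processes): {test3:.0f} KB/s per process")
    print(f"ratio TEST3/TEST2:    {test3 / test2:.2f}")
```

The same calculation can be run in reverse when sizing a media server: divide the maximum node throughput measured in your environment by the throughput each reader requires to estimate how many concurrent readers one node can sustain.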
TEST4: disable readahead, keep everything else the same as the baseline test.
vxbench – 64files/64procs/32KB block size
Tuning – read_ahead disabled/read_nstream=24/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=24,read_ahead=0
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 24
read_ahead = 0
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64
user 1: 300.011 sec 12246.06 KB/s cpu: 7.68 sys 0.07 user
user 2: 300.009 sec 11192.53 KB/s cpu: 6.96 sys 0.07 user
user 3: 300.010 sec 11619.25 KB/s cpu: 7.35 sys 0.06 user
user 4: 300.014 sec 11551.35 KB/s cpu: 7.30 sys 0.07 user
user 5: 300.015 sec 11563.46 KB/s cpu: 7.19 sys 0.08 user
user 6: 300.007 sec 12257.53 KB/s cpu: 7.65 sys 0.10 user
user 7: 300.008 sec 11638.53 KB/s cpu: 7.34 sys 0.09 user
user 8: 300.007 sec 11449.44 KB/s cpu: 7.26 sys 0.09 user
user 9: 300.014 sec 12062.17 KB/s cpu: 7.50 sys 0.08 user
user 10: 300.008 sec 11544.21 KB/s cpu: 7.18 sys 0.08 user
user 11: 300.012 sec 11442.10 KB/s cpu: 7.22 sys 0.10 user
user 12: 300.012 sec 11666.33 KB/s cpu: 7.34 sys 0.07 user
user 13: 300.007 sec 11740.63 KB/s cpu: 7.38 sys 0.07 user
user 14: 300.015 sec 11528.29 KB/s cpu: 7.32 sys 0.07 user
user 15: 300.009 sec 11616.83 KB/s cpu: 7.31 sys 0.08 user
user 16: 300.008 sec 12253.34 KB/s cpu: 7.54 sys 0.07 user
user 17: 300.013 sec 11727.19 KB/s cpu: 7.36 sys 0.07 user
user 18: 300.009 sec 11700.54 KB/s cpu: 7.36 sys 0.07 user
user 19: 300.008 sec 12245.63 KB/s cpu: 7.70 sys 0.09 user
user 20: 300.007 sec 11757.38 KB/s cpu: 7.42 sys 0.08 user
user 21: 300.007 sec 11242.93 KB/s cpu: 7.10 sys 0.06 user
user 22: 300.012 sec 11589.92 KB/s cpu: 7.23 sys 0.08 user
user 23: 300.008 sec 12262.93 KB/s cpu: 7.56 sys 0.09 user
user 24: 300.007 sec 11756.85 KB/s cpu: 7.41 sys 0.08 user
user 25: 300.014 sec 12086.92 KB/s cpu: 7.48 sys 0.08 user
user 26: 300.011 sec 12001.58 KB/s cpu: 7.54 sys 0.07 user
user 27: 300.012 sec 12096.78 KB/s cpu: 7.60 sys 0.10 user
user 28: 300.017 sec 11550.08 KB/s cpu: 7.27 sys 0.08 user
user 29: 300.011 sec 11734.28 KB/s cpu: 7.24 sys 0.09 user
user 30: 300.011 sec 11962.11 KB/s cpu: 7.51 sys 0.08 user
user 31: 300.014 sec 12128.16 KB/s cpu: 7.50 sys 0.08 user
user 32: 300.011 sec 11725.32 KB/s cpu: 7.38 sys 0.10 user
user 33: 300.009 sec 11371.62 KB/s cpu: 7.18 sys 0.06 user
user 34: 300.009 sec 12041.25 KB/s cpu: 7.62 sys 0.07 user
user 35: 300.008 sec 11980.36 KB/s cpu: 7.48 sys 0.08 user
user 36: 300.015 sec 11908.75 KB/s cpu: 7.51 sys 0.07 user
user 37: 300.010 sec 11432.46 KB/s cpu: 7.12 sys 0.08 user
user 38: 300.014 sec 11796.37 KB/s cpu: 7.48 sys 0.06 user
user 39: 300.008 sec 11824.77 KB/s cpu: 7.43 sys 0.08 user
user 40: 300.014 sec 12077.29 KB/s cpu: 7.57 sys 0.07 user
user 41: 300.012 sec 11564.45 KB/s cpu: 7.29 sys 0.08 user
user 42: 300.015 sec 11583.94 KB/s cpu: 7.28 sys 0.05 user
user 43: 300.015 sec 11874.83 KB/s cpu: 7.45 sys 0.08 user
user 44: 300.010 sec 12142.53 KB/s cpu: 7.54 sys 0.08 user
user 45: 300.015 sec 11335.74 KB/s cpu: 7.05 sys 0.09 user
user 46: 300.011 sec 11915.63 KB/s cpu: 7.43 sys 0.08 user
user 47: 300.014 sec 12259.67 KB/s cpu: 7.56 sys 0.10 user
user 48: 300.010 sec 11405.71 KB/s cpu: 7.12 sys 0.08 user
user 49: 300.010 sec 11862.76 KB/s cpu: 7.34 sys 0.07 user
user 50: 300.014 sec 11556.89 KB/s cpu: 7.28 sys 0.07 user
user 51: 300.010 sec 12149.05 KB/s cpu: 7.49 sys 0.08 user
user 52: 300.010 sec 11384.38 KB/s cpu: 7.11 sys 0.10 user
user 53: 300.008 sec 11414.31 KB/s cpu: 7.09 sys 0.07 user
user 54: 300.016 sec 11336.45 KB/s cpu: 7.09 sys 0.07 user
user 55: 300.017 sec 12173.06 KB/s cpu: 7.57 sys 0.08 user
user 56: 300.011 sec 11808.63 KB/s cpu: 7.33 sys 0.06 user
user 57: 300.011 sec 12277.61 KB/s cpu: 7.55 sys 0.08 user
user 58: 300.008 sec 11529.39 KB/s cpu: 7.14 sys 0.07 user
user 59: 300.010 sec 12021.34 KB/s cpu: 7.44 sys 0.07 user
user 60: 300.011 sec 11499.74 KB/s cpu: 7.18 sys 0.07 user
user 61: 300.010 sec 12001.73 KB/s cpu: 7.55 sys 0.06 user
user 62: 300.011 sec 11978.65 KB/s cpu: 7.52 sys 0.08 user
user 63: 300.008 sec 11540.61 KB/s cpu: 7.20 sys 0.09 user
user 64: 300.009 sec 11221.21 KB/s cpu: 7.06 sys 0.07 user
total: 300.017 sec 753196.79 KB/s cpu: 471.23 sys 4.95 user
iostat
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sde 0.00 0.00 504.35 0.00 32278.40 0.00 64.00 0.51 1.02 0.79 39.97
sdf 0.00 0.00 501.55 0.00 32099.20 0.00 64.00 0.44 0.88 0.70 35.15
sdg 0.00 0.00 507.70 0.00 32492.80 0.00 64.00 0.52 1.02 0.79 40.20
sdh 0.00 0.00 496.70 0.00 31788.80 0.00 64.00 0.55 1.10 0.81 40.10
sdi 0.00 0.00 502.40 0.00 32153.60 0.00 64.00 0.47 0.95 0.76 38.42
sdj 0.00 0.00 499.70 0.00 31980.80 0.00 64.00 0.62 1.24 0.91 45.48
sdk 0.00 0.00 502.80 0.00 32179.20 0.00 64.00 0.46 0.91 0.72 36.22
sdl 0.00 0.00 503.90 0.00 32249.60 0.00 64.00 0.47 0.93 0.76 38.05
sdm 0.00 0.00 501.25 0.00 32080.00 0.00 64.00 0.49 0.99 0.78 39.16
sdn 0.00 0.00 504.10 0.00 32262.40 0.00 64.00 0.91 1.80 1.11 55.75
sdo 0.00 0.00 497.20 0.00 31820.80 0.00 64.00 3.51 7.07 1.96 97.30
sdp 0.00 0.00 496.50 0.00 31776.00 0.00 64.00 0.44 0.90 0.73 36.06
sdq 0.00 0.00 505.40 0.00 32345.60 0.00 64.00 0.67 1.32 0.93 47.04
sdr 0.00 0.00 503.40 0.00 32217.60 0.00 64.00 0.60 1.19 0.85 42.86
sds 0.00 0.00 502.65 0.00 32169.60 0.00 64.00 3.46 6.88 1.92 96.49
sdt 0.00 0.00 501.35 0.00 32086.40 0.00 64.00 0.46 0.92 0.75 37.40
sdu 0.00 0.00 511.30 0.00 32723.20 0.00 64.00 0.60 1.18 0.85 43.55
sdv 0.00 0.00 502.70 0.00 32172.80 0.00 64.00 0.76 1.52 1.07 53.56
sdw 0.00 0.00 502.45 0.00 32156.80 0.00 64.00 3.73 7.42 1.96 98.52
sdx 0.00 0.00 503.10 0.00 32198.40 0.00 64.00 3.70 7.36 1.92 96.67
sdy 0.00 0.00 501.15 0.00 32073.60 0.00 64.00 0.47 0.93 0.73 36.65
sdz 0.00 0.00 506.15 0.00 32393.60 0.00 64.00 3.47 6.85 1.92 97.28
sdaa 0.00 0.00 505.45 0.00 32348.80 0.00 64.00 3.53 6.99 1.95 98.45
sdab 0.00 0.00 507.10 0.00 32454.40 0.00 64.00 0.51 1.00 0.78 39.43
sdac 0.00 0.00 504.25 0.00 32272.00 0.00 64.00 0.46 0.92 0.74 37.47
sdad 0.00 0.00 506.30 0.00 32403.20 0.00 64.00 0.61 1.21 0.89 45.24
sdae 0.00 0.00 500.80 0.00 32051.20 0.00 64.00 0.48 0.97 0.75 37.80
sdaf 0.00 0.00 501.70 0.00 32108.80 0.00 64.00 0.51 1.01 0.81 40.70
sdag 0.00 0.00 497.90 0.00 31865.60 0.00 64.00 0.49 0.98 0.76 37.72
sdah 0.00 0.00 499.20 0.00 31948.80 0.00 64.00 0.47 0.95 0.75 37.24
sdai 0.00 0.00 493.50 0.00 31584.00 0.00 64.00 0.54 1.09 0.83 40.93
sdak 0.00 0.00 505.10 0.00 32326.40 0.00 64.00 0.66 1.30 0.93 47.05
sdal 0.00 0.00 504.60 0.00 32294.40 0.00 64.00 0.61 1.21 0.86 43.54
sdaj 0.00 0.00 504.60 0.00 32294.40 0.00 64.00 0.53 1.06 0.79 40.05
sdam 0.00 0.00 505.85 0.00 32374.40 0.00 64.00 3.46 6.85 1.91 96.67
sdan 0.00 0.00 506.60 0.00 32422.40 0.00 64.00 0.47 0.92 0.73 36.80
sdaq 0.00 0.00 497.65 0.00 31849.60 0.00 64.00 3.54 7.11 1.97 97.96
sdao 0.00 0.00 500.25 0.00 32016.00 0.00 64.00 0.43 0.87 0.70 34.90
sdap 0.00 0.00 497.00 0.00 31808.00 0.00 64.00 3.43 6.91 1.96 97.41
sdar 0.00 0.00 494.80 0.00 31667.20 0.00 64.00 0.52 1.05 0.81 40.03
sdas 0.00 0.00 497.45 0.00 31836.80 0.00 64.00 0.61 1.23 0.90 44.75
sdat 0.00 0.00 507.25 0.00 32464.00 0.00 64.00 0.75 1.49 1.03 52.22
sdau 0.00 0.00 503.20 0.00 32204.80 0.00 64.00 3.74 7.44 1.96 98.61
sdav 0.00 0.00 506.55 0.00 32419.20 0.00 64.00 3.69 7.28 1.92 97.06
sdaw 0.00 0.00 498.65 0.00 31913.60 0.00 64.00 0.46 0.93 0.75 37.21
sdax 0.00 0.00 497.60 0.00 31846.40 0.00 64.00 0.90 1.80 1.14 56.59
sday 0.00 0.00 503.15 0.00 32201.60 0.00 64.00 3.53 7.02 1.94 97.52
sdaz 0.00 0.00 504.45 0.00 32284.80 0.00 64.00 0.44 0.88 0.72 36.44
VxVM56000 0.00 0.00 24109.05 0.00 1542979.20 0.00 64.00 62.84 2.61 0.04 100.00
Conclusion to TEST4: <read_ahead disabled>
The maximum read I/O throughput from one node is NOT being achieved; the total is only approx. 0.72 GBytes/sec.
The throughput for all 64 processes is balanced but is now much lower per process; each process is now only performing approx. 12000 KB/s.
By disabling readahead the total throughput has halved (753196.79 KB/s, compared with approx. 1.5 GBytes/sec with readahead enabled).
All the read I/O is synchronous read I/O using a 32KB I/O request size.
The iostat output above shows 64 sectors (32KB) as the average I/O size for all LUN paths (avgrq-sz = 64.00).
Because readahead is disabled we are no longer submitting read_pref_io sized requests.
Instead we are submitting a 32KB read request size, because this is the I/O size that vxbench is using.
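The TEST4 figures can be cross-checked with two small calculations. The sketch below multiplies the read rate by the request size from the VxVM56000 iostat row, and then estimates the per-process ceiling for synchronous reads, assuming each process has only one 32KB read outstanding at a time and using the volume await value (2.61 ms) as a rough per-read latency. Both function names are hypothetical.

```python
# Sketch only: sanity checks for synchronous 32KB reads with readahead disabled.
def throughput_kb_s(reads_per_sec, request_kb):
    """Throughput implied by a read rate and a fixed request size."""
    return reads_per_sec * request_kb


def sync_reader_kb_s(request_kb, latency_ms):
    """Ceiling for one reader that issues one read at a time (assumption)."""
    return request_kb / (latency_ms / 1000.0)


if __name__ == "__main__":
    volume = throughput_kb_s(24109.05, 32)  # r/s and 32KB from the iostat row
    print(f"volume throughput from iostat: {volume:.1f} KB/s")
    print(f"vxbench total:                 753196.79 KB/s")
    per_proc = sync_reader_kb_s(32, 2.61)   # await from the VxVM56000 row
    print(f"estimated per-process ceiling: {per_proc:.0f} KB/s")
```

The estimated ceiling is in the same range as the approx. 12000 KB/s per process measured by vxbench, which is consistent with each process being limited by the latency of its own synchronous reads rather than by the storage bandwidth.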
TEST5: change read_nstream to 6, keep everything else the same as the baseline test.
vxbench – 64files/64procs/32KB block size
Tuning – read_ahead enabled/read_nstream=6/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=6
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 6
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64
user 1: 300.008 sec 26677.91 KB/s cpu: 5.16 sys 0.05 user
user 2: 300.107 sec 26689.61 KB/s cpu: 5.25 sys 0.04 user
user 3: 300.116 sec 26596.61 KB/s cpu: 4.97 sys 0.04 user
user 4: 300.031 sec 26716.80 KB/s cpu: 4.98 sys 0.05 user
user 5: 300.089 sec 26680.92 KB/s cpu: 5.19 sys 0.05 user
user 6: 300.072 sec 26631.30 KB/s cpu: 5.01 sys 0.04 user
user 7: 300.099 sec 26843.86 KB/s cpu: 5.21 sys 0.04 user
user 8: 300.091 sec 26762.65 KB/s cpu: 5.20 sys 0.04 user
user 9: 300.074 sec 26784.68 KB/s cpu: 5.17 sys 0.04 user
user 10: 300.076 sec 26774.26 KB/s cpu: 5.07 sys 0.05 user
user 11: 300.062 sec 26785.71 KB/s cpu: 4.97 sys 0.04 user
user 12: 300.027 sec 14609.45 KB/s cpu: 2.90 sys 0.02 user
user 13: 300.035 sec 26675.62 KB/s cpu: 5.21 sys 0.05 user
user 14: 300.101 sec 9641.12 KB/s cpu: 1.95 sys 0.01 user
user 15: 300.066 sec 26897.99 KB/s cpu: 4.93 sys 0.04 user
user 16: 300.027 sec 26645.46 KB/s cpu: 5.09 sys 0.04 user
user 17: 300.016 sec 26677.21 KB/s cpu: 5.19 sys 0.04 user
user 18: 300.020 sec 26636.02 KB/s cpu: 5.25 sys 0.05 user
user 19: 300.012 sec 26728.77 KB/s cpu: 4.98 sys 0.05 user
user 20: 300.081 sec 18732.43 KB/s cpu: 3.46 sys 0.04 user
user 21: 300.008 sec 26729.13 KB/s cpu: 5.22 sys 0.04 user
user 22: 300.087 sec 26701.62 KB/s cpu: 5.16 sys 0.04 user
user 23: 300.083 sec 14616.98 KB/s cpu: 2.86 sys 0.01 user
user 24: 300.085 sec 26926.99 KB/s cpu: 5.02 sys 0.03 user
user 25: 300.031 sec 26542.74 KB/s cpu: 5.16 sys 0.05 user
user 26: 300.101 sec 26608.19 KB/s cpu: 5.02 sys 0.06 user
user 27: 300.112 sec 26760.74 KB/s cpu: 5.28 sys 0.03 user
user 28: 300.050 sec 26674.13 KB/s cpu: 5.20 sys 0.04 user
user 29: 300.058 sec 19430.05 KB/s cpu: 3.79 sys 0.03 user
user 30: 300.062 sec 26703.79 KB/s cpu: 5.24 sys 0.04 user
user 31: 300.079 sec 26692.03 KB/s cpu: 5.18 sys 0.05 user
user 32: 300.078 sec 19572.11 KB/s cpu: 3.75 sys 0.03 user
user 33: 300.014 sec 26872.00 KB/s cpu: 5.25 sys 0.05 user
user 34: 300.035 sec 26593.60 KB/s cpu: 4.86 sys 0.04 user
user 35: 300.011 sec 26554.73 KB/s cpu: 5.17 sys 0.04 user
user 36: 300.065 sec 26713.74 KB/s cpu: 5.23 sys 0.05 user
user 37: 300.011 sec 26687.96 KB/s cpu: 5.18 sys 0.04 user
user 38: 300.034 sec 26696.03 KB/s cpu: 5.30 sys 0.04 user
user 39: 300.046 sec 18888.19 KB/s cpu: 3.62 sys 0.03 user
user 40: 300.019 sec 26656.42 KB/s cpu: 5.18 sys 0.04 user
user 41: 300.039 sec 26685.39 KB/s cpu: 5.08 sys 0.05 user
user 42: 300.041 sec 14332.34 KB/s cpu: 2.85 sys 0.02 user
user 43: 300.112 sec 26863.12 KB/s cpu: 5.27 sys 0.04 user
user 44: 300.008 sec 26667.66 KB/s cpu: 5.07 sys 0.05 user
user 45: 300.060 sec 26949.71 KB/s cpu: 5.05 sys 0.04 user
user 46: 300.021 sec 26635.77 KB/s cpu: 5.07 sys 0.05 user
user 47: 300.052 sec 26817.32 KB/s cpu: 5.26 sys 0.03 user
user 48: 300.110 sec 26760.94 KB/s cpu: 5.19 sys 0.04 user
user 49: 300.096 sec 7747.49 KB/s cpu: 1.56 sys 0.00 user
user 50: 300.116 sec 14676.80 KB/s cpu: 2.93 sys 0.03 user
user 51: 300.026 sec 26737.82 KB/s cpu: 5.13 sys 0.06 user
user 52: 300.027 sec 26737.80 KB/s cpu: 5.05 sys 0.05 user
user 53: 300.044 sec 26777.10 KB/s cpu: 4.96 sys 0.04 user
user 54: 300.017 sec 26769.30 KB/s cpu: 5.13 sys 0.04 user
user 55: 300.024 sec 26799.31 KB/s cpu: 5.33 sys 0.05 user
user 56: 300.102 sec 26720.72 KB/s cpu: 5.17 sys 0.05 user
user 57: 300.043 sec 26807.85 KB/s cpu: 5.04 sys 0.03 user
user 58: 300.055 sec 26868.24 KB/s cpu: 4.91 sys 0.04 user
user 59: 300.047 sec 26879.17 KB/s cpu: 5.24 sys 0.05 user
user 60: 300.070 sec 26907.83 KB/s cpu: 5.32 sys 0.04 user
user 61: 300.055 sec 26786.37 KB/s cpu: 5.02 sys 0.05 user
user 62: 300.097 sec 16684.09 KB/s cpu: 3.18 sys 0.03 user
user 63: 300.093 sec 14063.68 KB/s cpu: 2.79 sys 0.01 user
user 64: 300.024 sec 26635.57 KB/s cpu: 4.90 sys 0.04 user
total: 300.117 sec 1572798.88 KB/s cpu: 302.31 sys 2.53 user
Conclusion to TEST5: <read_nstream set to 6>
The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/s
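The throughput per process is still imbalanced: most processes achieve approx. 26700 KB/s, but several achieve far less (for example, 7747.49 KB/s for user 49).
The maximum amount of readahead per process is 3MB (6 x 512KB), which is too aggressive (i.e. too much readahead is causing a throughput imbalance between the reading processes).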
TEST6: change read_nstream to 12, keep everything else the same as the baseline test.
vxbench – 64files/64procs/32KB block size
Tuning – read_ahead enabled/read_nstream=12/read_pref_io=524288
# vxtunefs /data1 -o read_nstream=12
UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1
# vxtunefs /data1
Filesystem I/O parameters for /data1
read_pref_io = 524288
read_nstream = 12
read_ahead = 1
# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1
# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64
user 1: 300.152 sec 4957.28 KB/s cpu: 0.94 sys 0.00 user
user 2: 300.133 sec 4896.28 KB/s cpu: 0.92 sys 0.00 user
user 3: 300.068 sec 32767.44 KB/s cpu: 6.28 sys 0.05 user
user 4: 300.090 sec 32949.22 KB/s cpu: 6.12 sys 0.05 user
user 5: 300.067 sec 5041.22 KB/s cpu: 0.95 sys 0.01 user
user 6: 300.139 sec 4855.57 KB/s cpu: 0.89 sys 0.00 user
user 7: 300.048 sec 32872.02 KB/s cpu: 6.31 sys 0.05 user
user 8: 300.129 sec 4610.50 KB/s cpu: 0.89 sys 0.00 user
user 9: 300.146 sec 4855.02 KB/s cpu: 0.92 sys 0.00 user
user 10: 300.015 sec 32855.14 KB/s cpu: 6.17 sys 0.05 user
user 11: 300.040 sec 32811.44 KB/s cpu: 6.20 sys 0.06 user
user 12: 300.069 sec 4897.65 KB/s cpu: 0.92 sys 0.00 user
user 13: 300.013 sec 32793.85 KB/s cpu: 6.30 sys 0.06 user
user 14: 300.082 sec 32806.79 KB/s cpu: 6.31 sys 0.04 user
user 15: 300.033 sec 32914.55 KB/s cpu: 6.36 sys 0.04 user
user 16: 300.067 sec 32726.51 KB/s cpu: 6.33 sys 0.05 user
user 17: 300.057 sec 32604.74 KB/s cpu: 6.30 sys 0.05 user
user 18: 300.140 sec 4753.19 KB/s cpu: 0.93 sys 0.00 user
user 19: 300.090 sec 32703.57 KB/s cpu: 6.29 sys 0.06 user
user 20: 300.030 sec 32914.89 KB/s cpu: 6.30 sys 0.06 user
user 21: 300.005 sec 32835.71 KB/s cpu: 6.37 sys 0.05 user
user 22: 300.103 sec 32845.42 KB/s cpu: 6.27 sys 0.05 user
user 23: 300.061 sec 32993.42 KB/s cpu: 6.30 sys 0.06 user
user 24: 300.152 sec 4732.43 KB/s cpu: 0.89 sys 0.01 user
user 25: 300.067 sec 32501.34 KB/s cpu: 6.34 sys 0.05 user
user 26: 300.162 sec 4794.20 KB/s cpu: 0.91 sys 0.00 user
user 27: 300.006 sec 32651.33 KB/s cpu: 6.36 sys 0.05 user
user 28: 300.067 sec 32767.47 KB/s cpu: 6.38 sys 0.05 user
user 29: 300.147 sec 4791.47 KB/s cpu: 0.93 sys 0.01 user
user 30: 300.020 sec 32711.16 KB/s cpu: 6.31 sys 0.04 user
user 31: 300.151 sec 5113.69 KB/s cpu: 0.92 sys 0.00 user
user 32: 300.017 sec 14987.11 KB/s cpu: 2.89 sys 0.02 user
user 33: 300.028 sec 32689.88 KB/s cpu: 6.38 sys 0.06 user
user 34: 300.136 sec 4856.04 KB/s cpu: 0.91 sys 0.00 user
user 35: 300.146 sec 4794.78 KB/s cpu: 0.91 sys 0.00 user
user 36: 300.005 sec 32712.86 KB/s cpu: 6.19 sys 0.05 user
user 37: 300.100 sec 32927.68 KB/s cpu: 6.37 sys 0.04 user
user 38: 300.048 sec 32994.80 KB/s cpu: 6.34 sys 0.04 user
user 39: 300.010 sec 32630.41 KB/s cpu: 6.27 sys 0.04 user
user 40: 300.054 sec 32768.91 KB/s cpu: 6.32 sys 0.05 user
user 41: 300.019 sec 33100.39 KB/s cpu: 6.17 sys 0.04 user
user 42: 300.066 sec 32726.68 KB/s cpu: 6.38 sys 0.05 user
user 43: 300.035 sec 33221.54 KB/s cpu: 6.34 sys 0.06 user
user 44: 300.008 sec 32692.06 KB/s cpu: 6.32 sys 0.06 user
user 45: 300.025 sec 33181.65 KB/s cpu: 6.38 sys 0.05 user
user 46: 300.146 sec 4773.57 KB/s cpu: 0.90 sys 0.00 user
user 47: 300.108 sec 33172.52 KB/s cpu: 6.27 sys 0.04 user
user 48: 300.073 sec 32766.86 KB/s cpu: 6.30 sys 0.06 user
user 49: 300.007 sec 32814.98 KB/s cpu: 6.36 sys 0.06 user
user 50: 300.050 sec 32933.22 KB/s cpu: 6.35 sys 0.05 user
user 51: 300.026 sec 33038.23 KB/s cpu: 6.35 sys 0.05 user
user 52: 300.087 sec 32970.03 KB/s cpu: 6.38 sys 0.07 user
user 53: 300.022 sec 32833.90 KB/s cpu: 6.09 sys 0.05 user
user 54: 300.091 sec 32990.10 KB/s cpu: 6.32 sys 0.04 user
user 55: 300.075 sec 32991.84 KB/s cpu: 6.34 sys 0.06 user
user 56: 300.075 sec 32909.93 KB/s cpu: 6.37 sys 0.04 user
user 57: 300.059 sec 4778.89 KB/s cpu: 0.90 sys 0.00 user
user 58: 300.062 sec 32993.34 KB/s cpu: 6.12 sys 0.06 user
user 59: 300.064 sec 33156.83 KB/s cpu: 6.38 sys 0.05 user
user 60: 300.079 sec 33011.92 KB/s cpu: 6.38 sys 0.05 user
user 61: 300.044 sec 33097.62 KB/s cpu: 6.36 sys 0.05 user
user 62: 300.049 sec 4774.69 KB/s cpu: 0.90 sys 0.00 user
user 63: 300.142 sec 4897.10 KB/s cpu: 0.92 sys 0.00 user
user 64: 300.039 sec 33221.04 KB/s cpu: 6.33 sys 0.04 user
total: 300.163 sec 1581364.62 KB/s cpu: 302.20 sys 2.33 user
Conclusion to TEST6: <read_nstream set to 12>
The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/s
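The throughput per process is imbalanced: most processes achieve approx. 32800 KB/s, while many others achieve less than 5200 KB/s.
The maximum amount of readahead per process is 6MB (12 x 512KB), which is too aggressive (i.e. too much readahead is causing a throughput imbalance between the reading processes).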
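To judge whether the throughput is balanced without reading every line, the per-process rates can be extracted from the vxbench output and summarised. The following is a minimal sketch that assumes the "user N: ... sec ... KB/s" line format shown in this document; the function names are hypothetical, and the sample holds only the first four lines of the TEST6 output.

```python
# Sketch only: summarise per-process throughput from vxbench-style output.
import re

# Matches "user 3: 300.068 sec 32767.44 KB/s ..." and captures (3, 32767.44).
LINE = re.compile(r"user\s+(\d+):\s+[\d.]+ sec\s+([\d.]+) KB/s")


def per_process_rates(text):
    """Return {process number: KB/s}; the 'total:' line is not matched."""
    return {int(n): float(rate) for n, rate in LINE.findall(text)}


def imbalance(rates):
    """Return the slowest rate, the fastest rate and their ratio."""
    values = list(rates.values())
    return min(values), max(values), max(values) / min(values)


SAMPLE = """
user 1: 300.152 sec 4957.28 KB/s cpu: 0.94 sys 0.00 user
user 2: 300.133 sec 4896.28 KB/s cpu: 0.92 sys 0.00 user
user 3: 300.068 sec 32767.44 KB/s cpu: 6.28 sys 0.05 user
user 4: 300.090 sec 32949.22 KB/s cpu: 6.12 sys 0.05 user
"""

if __name__ == "__main__":
    slowest, fastest, ratio = imbalance(per_process_rates(SAMPLE))
    print(f"slowest {slowest} KB/s, fastest {fastest} KB/s, ratio {ratio:.1f}")
```

A ratio close to 1 indicates balanced throughput, as seen with read_nstream=1; a ratio well above 1 indicates that some processes are being starved by the readahead of others.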
Graphics for buffered I/O tests
The graphs below show the results of the tests running 64 processes (only Test3, which runs 16 processes, is excluded from the graphs). The second graph simply joins the dots for each process; each test uses a different colour. The graphs clearly show that only read_nstream=1 (Test2) and read_ahead off (Test4) provide an evenly balanced throughput across all 64 processes. However when read_ahead is disabled the throughput is much lower.
Therefore, in our test, read_nstream=1 (dark blue in the graphics) is clearly the correct value because the throughput is evenly balanced across all 64 processes and the maximum throughput is still achieved.
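The balance across processes can also be quantified directly from the per-process output rather than read off the graphs. The following is a minimal sketch, using six lines copied from the Test6 output as sample data. The helper names and the "less than half the median" threshold for a slow process are our own illustrative choices, not part of the test tool.

```python
import re
import statistics

# A few per-process lines copied from the Test6 output in this report.
SAMPLE = """\
user 44: 300.008 sec 32692.06 KB/s cpu: 6.32 sys 0.06 user
user 45: 300.025 sec 33181.65 KB/s cpu: 6.38 sys 0.05 user
user 46: 300.146 sec 4773.57 KB/s cpu: 0.90 sys 0.00 user
user 47: 300.108 sec 33172.52 KB/s cpu: 6.27 sys 0.04 user
user 57: 300.059 sec 4778.89 KB/s cpu: 0.90 sys 0.00 user
user 64: 300.039 sec 33221.04 KB/s cpu: 6.33 sys 0.04 user
"""

# Matches "user <n>: <elapsed> sec <rate> KB/s" and captures n and rate.
LINE_RE = re.compile(r"user\s+(\d+):\s+[\d.]+\s+sec\s+([\d.]+)\s+KB/s")


def per_process_rates(text):
    """Return {process number: KB/s} parsed from the tool output."""
    return {int(n): float(rate) for n, rate in LINE_RE.findall(text)}


def balance_summary(rates, slow_fraction=0.5):
    """Summarise balance; slow_fraction is a hypothetical threshold."""
    values = list(rates.values())
    median = statistics.median(values)
    slow = sorted(n for n, v in rates.items() if v < slow_fraction * median)
    return {
        "min": min(values),
        "max": max(values),
        "max_over_min": max(values) / min(values),
        "slow": slow,
    }


rates = per_process_rates(SAMPLE)
summary = balance_summary(rates)
print(summary)
```

A max_over_min ratio close to 1 with an empty slow list indicates evenly balanced throughput; a large ratio indicates that some processes are being starved by the readahead of others.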
< 9. Final conclusions and best practices for optimizing sequential read I/O workloads >
To maximize the sequential read I/O throughput, maintain evenly balanced I/O across all the LUNs and balance the throughput across the active reading processes, we identified the following configuration for our test environment:
512KB VxVM stripe width (for the optimum I/O size reading from disk)
24 LUNs and 24 columns in our VxVM volume (to use maximum storage bandwidth)
Leave read_pref_io set to the default value of 524288 (max I/O size using readahead)
Reduce read_nstream from a default value of 24 to a value of 1 (to reduce the maximum amount of data to pre-fetch in one go using readahead)
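To illustrate why a striped volume balances sequential I/O across the LUNs, the sketch below maps volume offsets to columns using simple round-robin striping arithmetic. It is a simplified model using the 512KB stripe width and 24 columns from our configuration, not VxVM code, and the function names are hypothetical.

```python
from collections import Counter

STRIPE_UNIT = 512 * 1024   # 512KB stripe width, as in our configuration
NCOLS = 24                 # 24 columns, one per LUN


def column_for_offset(offset, stripe_unit=STRIPE_UNIT, ncols=NCOLS):
    """Column holding the given volume byte offset in a round-robin stripe."""
    return (offset // stripe_unit) % ncols


def columns_touched(start, length, io_size=STRIPE_UNIT):
    """Count I/Os per column for a sequential read issued in io_size requests."""
    counts = Counter()
    for offset in range(start, start + length, io_size):
        counts[column_for_offset(offset)] += 1
    return counts


# A sequential read of 48 stripe units (24 MB) starting at offset 0.
counts = columns_touched(0, 48 * STRIPE_UNIT)
print(sorted(counts.items()))
```

In this model each 512KB request lands on the next column in turn, so a long sequential read visits every column equally often. This is the behaviour we rely on to keep the I/O evenly balanced across all 24 LUNs.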
The best practices for sequential read media server solution configurations are as follows:
Set up your hardware so that the maximum I/O bandwidth can be achieved.
We did not change the operating system maximum I/O size; we kept the default of 512KB.
Ensure that your I/O is balanced evenly across all your LUNs by using VxVM striped volumes.
We found a VxVM stripe-width of 512KB to be optimal. Different stripe-widths can be tested, but a stripe-width greater than 1024KB is not required.
We created 24 LUNs to maximize access to the storage arrays, and therefore created our VxVM volume with 24 columns to maximize the bandwidth to the storage arrays.
During this process, identify any bottlenecks in your HBA cards and storage, beginning with a single node. The bottlenecks will give you the maximum throughput you can achieve in your environment.
If VxVM mirroring had been required in our configuration then 12 LUNs would be used in each mirror.
As reads can come from either mirror, the read I/O throughput should not be impacted by mirroring because we are still reading from all 24 LUNs. Writes, however, will be impacted.
The value of read_pref_io is the read I/O request size that VxFS readahead will submit to VxVM. We want a larger I/O size for performance (read_pref_io is set to the stripe-width).
Do not change the auto-tuned value for read_pref_io. If you want to change read_pref_io, change the VxVM volume stripe-width instead.
Using higher read_nstream values produced an imbalance in throughput between the different processes performing disk read I/O. This is due to overly aggressive read_ahead.
No matter what value of read_nstream we used, we always hit the FC HBA Card throughput bottleneck of approximately 1.5GBytes/sec
The larger the value of read_nstream the more aggressive read_ahead becomes, and the greater the imbalance in read throughput between the different processes
Reduce read_nstream to reduce the amount of readahead. We found read_nstream=1 provided a perfect balance in throughput between processes.
Do not disable readahead unless absolutely necessary as sequential read performance will be impacted.
Use /etc/vx/tunefstab to set read_nstream so that the value persists across a reboot.
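The effect of read_nstream on the amount of readahead can be sketched with simple arithmetic. The vxtunefs documentation describes the read ahead size as the product of read_nstream and read_pref_io. The sketch below assumes that relationship and uses the read_pref_io value of 524288 from our configuration. The function name is hypothetical, and the result should be read as a relative measure of how aggressive readahead becomes, not as the exact maximum readahead per process.

```python
READ_PREF_IO = 524288  # bytes, the auto-tuned value in our configuration


def readahead_unit_bytes(read_nstream, read_pref_io=READ_PREF_IO):
    """Product of read_nstream and read_pref_io, in bytes (assumed model)."""
    return read_nstream * read_pref_io


# Values of read_nstream discussed in this report: 1, 12 and the default of 24.
for nstream in (1, 12, 24):
    mb = readahead_unit_bytes(nstream) / (1024 * 1024)
    print(f"read_nstream={nstream:2d}: {mb:.1f} MB")
```

Because the product scales linearly with read_nstream, reducing read_nstream from 24 to 1 reduces this quantity by a factor of 24 while leaving the I/O request size submitted to VxVM unchanged.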
If this information is considered useful we will provide a second report for media server workload testing that explains sequential write I/O and some more best practices for balancing the throughput across processes performing a combination of read and write I/O.
Best regards
Veritas IA Engineering team
Server h/w configuration information: <2 nodes>
System
# dmidecode -q -t 1|head -5
System Information
Manufacturer: HP
Product Name: ProLiant DL380p Gen8
CPU
# dmidecode -q -t 4|grep -e Processor -e Socket -e Manufacturer -e Version -e "Current Speed" -e Core -e Thread|grep -v Upgrade
Processor Information
Socket Designation: Proc 1
Type: Central Processor
Manufacturer: Intel
Version: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Current Speed: 2200 MHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Processor Information
Socket Designation: Proc 2
Type: Central Processor
Manufacturer: Intel
Version: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Current Speed: 2200 MHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Memory
# dmidecode -q -t 17|grep Size|grep -v "No Module Installed"|awk 'BEGIN{memsize=0}{memsize=memsize+$2}END{print memsize, $3}'
98304 MB
# dmidecode -q -t 17|grep -e Speed -e Type|grep -v Detail|sort|uniq|grep -v Unknown
Configured Clock Speed: 1600 MHz
Speed: 1600 MHz
Type: DDR3