I would suggest some explicit socket buffer size settings. For netperf
it would be something along the lines of:
for s in 32 64 128 256
do
  netperf -H -- -m 64K -s ${s}K -S ${s}K
done
Perhaps a lot of this will not apply to your situation, but some will.
Here are some of my checklist items for when you are presented with
assertions of poor network performance, in no particular order:
- Is *any one* CPU on either end of the transfer at or close to 100%
utilization? A given TCP connection cannot really make use of more than
a single core in the system, so a low average CPU utilization does not
by itself mean things are OK.
- Are there TCP retransmissions registered in the netstat statistics on
the sending system? Take a snapshot of netstat -s -t from just before
the transfer and another from just after, and run the two through
beforeafter from ftp://ftp.cup.hp.com/dist/networking/tools:
netstat -s -t > before
# run the transfer, or wait 60 or so seconds if it is already running
netstat -s -t > after
beforeafter before after > delta
- Are there packet drops registered in the ethtool -S statistics on
either side of the transfer? Take before and after snapshots the same
way as with netstat.
- Are there packet drops registered in the stats for the switch(es) being
traversed by the transfer? These would be retrieved via switch-specific
means.
- What is the latency between the two end points? Install netperf
on both sides, start netserver on one side and on the other side run:
netperf -t TCP_RR -l 30 -H
and invert the transactions/s rate to get the RTT latency. There are
caveats involving NIC interrupt coalescing settings, which default to
favoring throughput and CPU utilization over latency:
ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt
but when the connection is over a WAN, latency matters a great deal and
is less likely to be clouded by NIC settings.
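For the per-CPU check in the first checklist item, mpstat -P ALL 1 from the sysstat package shows each core separately where it is installed. Failing that, here is a rough shell sketch, assuming a Linux system with /proc/stat. The function name percpu_busy is made up for illustration, and it counts iowait as idle time.

```shell
# Hypothetical helper: percpu_busy BEFORE AFTER
# BEFORE and AFTER are copies of Linux /proc/stat taken some seconds apart.
# Prints "cpuN busy%" for each CPU so that one saturated core stands out
# even when the system-wide average looks low.
percpu_busy() {
    awk '
        # Sum user..steal (fields 2-9); guest time is already counted in user.
        function total(   t, i) {
            t = 0
            for (i = 2; i <= NF && i <= 9; i++) t += $i
            return t
        }
        # First file: remember total and idle (idle + iowait) jiffies per CPU.
        FNR == NR && $1 ~ /^cpu[0-9]+$/ {
            tot[$1] = total(); idl[$1] = $5 + $6
            next
        }
        # Second file: busy% is the share of elapsed jiffies not spent idle.
        FNR != NR && ($1 in tot) {
            dt = total() - tot[$1]; di = ($5 + $6) - idl[$1]
            if (dt > 0) printf "%s %.1f\n", $1, 100 * (dt - di) / dt
        }
    ' "$1" "$2"
}

# Usage:
#   cat /proc/stat > /tmp/stat.before; sleep 30; cat /proc/stat > /tmp/stat.after
#   percpu_busy /tmp/stat.before /tmp/stat.after | sort -k2 -rn | head -4
```

Sorting on the second column puts the busiest cores first, which is the number that matters for a single TCP connection.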
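For the netstat retransmission check, beforeafter is the tool named above. If it is not handy, a shell function along these lines can stand in for it. This is a hypothetical sketch, not beforeafter itself: it assumes each counter line carries a single number (the first run of digits on the line), matches lines between the two snapshots by their wording, and prints only the counters that changed plus the section headers.

```shell
# Hypothetical stand-in for beforeafter: snapdiff BEFORE AFTER
# Assumes each counter line holds one number (the first run of digits).
# Lines are matched by their text with that number masked out; counters
# that did not change are suppressed, and header lines pass through.
snapdiff() {
    awk '
        # Key = line text with its first number replaced by "#", plus an
        # occurrence count so identically worded lines stay distinct.
        function mkkey(line, file,   k) {
            k = line
            sub(/[0-9]+/, "#", k)
            seen[file, k]++
            return k SUBSEP seen[file, k]
        }
        # First file: remember the number on each counter line.
        FNR == NR {
            k = mkkey($0, "before")
            if (match($0, /[0-9]+/)) old[k] = substr($0, RSTART, RLENGTH)
            next
        }
        # Second file: print headers as-is and counters as after - before.
        {
            k = mkkey($0, "after")
            if (!match($0, /[0-9]+/)) { print; next }   # e.g. "Tcp:"
            d = substr($0, RSTART, RLENGTH) - ((k in old) ? old[k] : 0)
            if (d != 0) {
                pre = substr($0, 1, RSTART - 1)
                post = substr($0, RSTART + RLENGTH)
                print pre sprintf("%.0f", d) post
            }
        }
    ' "$1" "$2"
}

# Usage:
#   netstat -s -t > before
#   # run the transfer, or wait 60 or so seconds
#   netstat -s -t > after
#   snapdiff before after | grep -i retrans
```

Matching by wording rather than by line position is a deliberate choice: some netstat implementations omit counters that are still zero, so the two snapshots need not line up one for one.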
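For the ethtool -S check, counter names vary from driver to driver, and some have digits in the name (per-queue counters, for example). Keying the diff on "name: value" pairs is therefore safer there. A hypothetical sketch that reports only drop- and error-like counters that increased; the name pattern is a guess and will likely need adjusting for the NIC at hand:

```shell
# Hypothetical helper: ethdrops BEFORE AFTER
# BEFORE and AFTER are saved outputs of "ethtool -S IFACE" (IFACE being
# whatever interface carries the transfer).
# Prints "name: delta" for counters that went up and whose names look
# drop- or error-related. The name pattern is a guess; drivers differ.
ethdrops() {
    awk -F': *' '
        # Keep only "name: number" lines; skips the "NIC statistics:" header.
        NF != 2 || $2 !~ /^[0-9]+$/ { next }
        { name = $1; sub(/^[ \t]+/, "", name) }
        # First file: remember each counter value by name.
        FNR == NR { old[name] = $2; next }
        # Second file: report matching counters that increased.
        tolower(name) ~ /drop|discard|miss|error/ && $2 - old[name] > 0 {
            printf "%s: %.0f\n", name, $2 - old[name]
        }
    ' "$1" "$2"
}

# Usage, on each end of the transfer:
#   ethtool -S IFACE > eth.before
#   # run the transfer
#   ethtool -S IFACE > eth.after
#   ethdrops eth.before eth.after
```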
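The inversion in the latency item is simple arithmetic. With netperf's default TCP_RR settings (1-byte request and response, one transaction outstanding at a time), each transaction is one round trip, so the mean RTT is 1 / (transactions per second). For example, 4000 transactions/s works out to 250 microseconds. Note that this figure also includes request and response processing time on both hosts. A small helper, with a hypothetical name:

```shell
# Hypothetical helper: rr_to_rtt_usec RATE
# RATE is the transactions/s figure reported by a netperf TCP_RR test.
# Prints the implied mean round-trip time in microseconds.
rr_to_rtt_usec() {
    awk -v rate="$1" 'BEGIN {
        if (rate + 0 <= 0) exit 1        # reject zero or non-numeric rates
        printf "%.1f\n", 1e6 / rate
    }'
}

# Usage:
#   rr_to_rtt_usec 4000     prints 250.0
```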
This all leads into: