Monday, March 24, 2014

Hi,

First, great job on Mininet. It's a wonderful tool, and I think
replicating the results of CS papers is extremely important.

Second, I've been having trouble replicating the dctcp experiments, as
described in
http://reproducingnetworkresearch.wordpress.com/2012/06/09/dctcp-2/

With 3.2 and 3.2.54 kernels, it doesn't appear as if the ECN marking
is occurring. The bottleneck queue length has the same oscillation
between 200 and 425 packets for reno, reno+ecn, and dctcp. I added an
execution of 'tc -s qdisc' on the switch at the end of dctcp.py and it
confirms that no packets are being marked.
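
In case it's useful, this is roughly the check I appended (a minimal
sketch; the interface name 's1-eth1' is my assumption about the
bottleneck port, and Mininet's default switches sit in the root
namespace, so plain tc can see their ports):

    from subprocess import check_output

    def red_stats(intf='s1-eth1'):
        """Dump 'tc -s qdisc' for the bottleneck port; the RED qdisc
        reports a 'marked N' counter when ECN marking is happening."""
        return check_output(['tc', '-s', 'qdisc', 'show', 'dev', intf])

    if __name__ == '__main__':
        print(red_stats())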

The behavior improves somewhat with a 3.6 kernel (the patch required
little modification up to that point in the series). At this point I
see reno+ecn working to keep the bottleneck queue length below 30
packets. But dctcp still doesn't appear to work even though stats show
the switch is marking packets.  I have also uncommented the printks
marking the transition between CE=0 and CE=1 states in the ACK
generation state machine, but see nothing in dmesg.
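
For what it's worth, here is roughly how I sanity-check which mode is
actually active on the hosts before starting the flows (the
tcp_dctcp_enable sysctl name is what I remember from the out-of-tree
patch and may differ in your tree):

    def show_tcp_mode(host):
        """Print the ECN and congestion-control settings on a Mininet host."""
        print(host.cmd('sysctl net.ipv4.tcp_ecn'))
        print(host.cmd('sysctl net.ipv4.tcp_congestion_control'))
        # DCTCP enable knob from the patch (name assumed; adjust to your kernel)
        print(host.cmd('sysctl net.ipv4.tcp_dctcp_enable'))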

Do you have any insights into what might be going wrong?

Sometimes I worry that my laptop isn't fast enough, but see below.

Thank you for any information you might share,

Andrew Shewmaker

My laptop specs are:

2GHz 8-core i7 laptop w/ 8GB RAM
Fedora 18
custom 3.2, 3.2.54, and 3.6 kernels + dctcp patch
openvswitch 2.0
mininet from git

'sudo mn --test=iperf' yields:
550+ Mbps on 3.2.x
750+ Mbps on 3.6
10+ Gbps on 3.10+ (tso on)
1+ Gbps on 3.10+ (tso off, gso on)
550+ Mbps on 3.10+ (tso/gso off)

'sudo mn --link=tc,bw=100 --test=iperf' yields:
78+ Mbps on 3.2.x
90+ Mbps on 3.10+

And those rates decrease again for the dctcp.py experiment:
7-16 Mbps on 3.2.x
20-30 Mbps on 3.6+
These don't seem fast enough to cause congestion, but the bottleneck
queue length zigzags between 200 and 425 packets, and I see the
regular reno cwnd response.
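
The zigzag comes from queue samples I take roughly like this (a rough
sketch of my own monitor; the interface name and sampling interval are
assumptions, not the original experiment's script):

    import re
    import time
    from subprocess import check_output

    def backlog_packets(intf='s1-eth1'):
        """Parse the packet backlog out of 'tc -s qdisc show dev <intf>'."""
        out = check_output(['tc', '-s', 'qdisc', 'show', 'dev', intf])
        m = re.search(r'backlog\s+\S+\s+(\d+)p', out.decode())
        return int(m.group(1)) if m else 0

    def monitor_queue(intf='s1-eth1', duration=30.0, interval=0.1):
        """Sample (elapsed time, queue length in packets) over a run."""
        samples, t0 = [], time.time()
        while time.time() - t0 < duration:
            samples.append((time.time() - t0, backlog_packets(intf)))
            time.sleep(interval)
        return samples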

-- Andrew Shewmaker

3 comments:

  1. Hi Andrew,

    The README (https://bitbucket.org/nikhilh/mininet_tests/src/ad08368cf347/dctcp/README) suggests using a 3.2.18 kernel. Have you tried the same version? You may find these deb packages useful: http://www.scs.stanford.edu/~jvimal/kernels. You can try the "alien" utility to convert the debs to rpms. IIRC, these were the kernel binaries we gave Stanford students for DCTCP-related experiments.

    The TCP code in the kernel changes very often, and DCTCP hasn't been merged into mainline, so it would take a lot of time to debug the issue. Try the above kernel and let us know.

    The 78 Mb/s average rate on the 3.2.x kernel with --bw=100 sounds troubling. At 100 Mb/s, the shaper needs a timer firing roughly every 120us to pace out 1500-byte packets for good rate limiting. I would run any experiment that requires good performance fidelity on a server.
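
    For reference, the 120us is just the serialization time of one
    full-sized packet at the shaped rate:

        packet_bits = 1500 * 8            # one MTU-sized packet
        link_rate = 100e6                 # 100 Mb/s
        print(packet_bits / link_rate)    # 1.2e-04 s = 120 microseconds

    so the shaping timer has to fire about that often to pace the link smoothly.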

    --
    Vimal

  2. Thanks Vimal,

    I had forgotten the README suggests that specific kernel. I've tried
    building that same version on multiple hosts, including a server that
    has a good timer.

    The server has significantly better fidelity, but I don't see the RED
    queue on the switch marking any packets with the 3.2.18 kernel. With
    3.6 I see marked packets and TCP with standard ECN working.

    I'll try your deb packages next.

    Thanks again.

    Andrew

  3. Vimal,

    I've used your kernel deb packages on Ubuntu 12.04.1. The results from
    dctcp look like those from tcpecn. The bottleneck queue length
    approaches 200 packets instead of oscillating around 20 packets as it
    should. I've attached the results, minus the large tcp_probe.txt file.

    The server I'm testing with is a 1.6GHz Opteron, and the timer
    precision is well below 120us.
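
    As a rough user-space proxy for that claim, I time short sleeps and
    look at the worst-case overshoot (the tc shaper uses kernel timers,
    so this only approximates the precision that matters):

        import time

        def worst_sleep_overshoot(target=100e-6, trials=200):
            """Worst-case extra delay beyond a ~100us sleep."""
            worst = 0.0
            for _ in range(trials):
                t0 = time.time()
                time.sleep(target)
                worst = max(worst, time.time() - t0 - target)
            return worst

        print('worst overshoot: %.1f us' % (worst_sleep_overshoot() * 1e6))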

    Any ideas for what I should look at next?

    Thanks!
