Saturday, March 29, 2014

BufferBloat



INTRODUCTION

In this exercise we will study the dynamics of TCP in home networks. Take a look at the figure below which shows a “typical” home network with a Home Router connected to an end host. The Home Router is connected via Cable or DSL to a Headend router at the Internet access provider’s office. We are going to study what happens when we download data from a remote server to the End Host in this home network.
[Figure: Bufferbloat home network topology]
In a real network it’s hard to measure cwnd (because it’s private to the Server) and the buffer occupancy (because it’s private to the router). To make our measurement job easier, we are going to emulate the network in Mininet (See Environment Setup for setting up the environment).
The goals of the exercise are to:
  • Learn first-hand the dynamics of cwnd and buffer occupancy in a “real” network.
  • Learn why large router buffers can lead to poor performance in home networks. This problem is often called “Buffer Bloat.”
  • Learn how to use Mininet so you can repeat or extend the experiments in your own time.

PART 1: GET MININET UP AND RUNNING

[Figure: Mininet topology for Bufferbloat]

Get the Bufferbloat Topology

The bufferbloat topology is in svn under the hw8 directory.

Run the Mininet Emulator

> cd cs144_bufferbloat/
> sudo ./run.sh 

Measure the Delay Between the Two Hosts

After Mininet is running, you can measure the delay from H1 to H2 with:
mininet> h1 ping -c 10 h2

PART 2: WEB PAGE DOWNLOAD - SKETCH THE TCP CWND

Measure how long it takes to download a web page from H1

mininet> h2 wget http://10.0.0.1
Answer: _____________ seconds

Sketch how you think cwnd evolves over time at H1. Mark multiples of RTT on the x-axis.

[Blank axes provided for your cwnd sketch]

PART 3: “STREAMING VIDEO” - SKETCH THE TCP CWND AND BUFFER OCCUPANCY.

Create the Video Flow

To see how the dynamics of a long flow (which enters the AIMD phase) differ from those of a short flow (which never leaves slow-start), we are going to repeat Part 2 with a "streaming video flow". Instead of actually watching videos on your machine, we will set up a long-lived, high-speed TCP connection to emulate a long-lived video flow. You can generate long flows using the iperf command, and we have wrapped it in a script which you can run as follows:
mininet> h1 ./iperf.sh  
You can see the throughput of the TCP flow from H1 to H2 by running:
mininet> h2 tail -f ./iperf-recv.txt 
You can quit viewing throughput by pressing CTRL-C.

The TCP CWND of the Video Flow

Sketch how you think cwnd evolves over time at H1. You might find it useful to use ping to measure how the delay evolves over time after iperf has started:
mininet> h1 ping -c 100 h2
[Blank axes provided for your cwnd sketch]

The Impact on the Short Flow

To see how our long-lived iperf flow affects the web page download, download the web page again while iperf is running, and write down how long it takes.
mininet> h2 wget http://10.0.0.1
Answer: _____________ seconds

Why does the web page take so much longer to download?

Please write your explanation below. Answer:

PART 4: MEASURING THE REAL CWND AND BUFFER OCCUPANCY VALUES.

It turns out that Mininet lets you measure cwnd and buffer occupancy values. A script is provided to dump the values of cwnd and buffer occupancy into files. We’re going to re-run a couple of the experiments and plot the real values.

Restart Mininet

Stop and restart Mininet and the monitor script, then re-run the above experiment as follows.
mininet> exit
bash# sudo ./run.sh

Monitor TCP CWND and Buffer Occupancy in Mininet

In another bash terminal, go to the cs144_bufferbloat directory and run the following, giving a name for your experiment:
bash# ./monitor.sh <EXP_NAME>
Don't worry if you see "ERROR: Module tcp_probe does not exist in /proc/modules"; it just means the module was not previously loaded.
mininet> h1 ./iperf.sh 
(wait for 70 seconds …)
mininet> h2 wget http://10.0.0.1
Wait for the wget to complete, then stop the Python monitor script by following the instructions on the screen. The cwnd values are saved in <EXP_NAME>_tcpprobe.txt and the buffer occupancy in <EXP_NAME>_sw0-qlen.txt.
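Before plotting, you can sanity-check the raw output. As a rough sketch (assuming the standard tcp_probe text format, where the 7th column is snd_cwnd in segments, and a two-column time/queue-length format for the switch queue file; the exact layout produced by monitor.sh may differ):
bash# awk '{print $7}' <EXP_NAME>_tcpprobe.txt | sort -n | tail -1    # largest cwnd observed
bash# awk '{print $2}' <EXP_NAME>_sw0-qlen.txt | sort -n | tail -1    # peak queue occupancy (packets)
If the peak queue occupancy sits near 100 packets for long stretches, the buffer is staying full, which is exactly the bufferbloat symptom we are looking for.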

Plot CWND and Queue Occupancy

Plot the TCP cwnd and queue occupancy from the output file
bash# ./plot_figures.sh <EXP_NAME>
Adjust command line parameters to generate the figure you want.
The script also starts a web server on the machine; if the machine is remote with a public IP, you can use the URL the script prints to view your figures. If you cannot see the cwnd, make sure you ran wget after starting the monitor.sh script.
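If the script's web server is not reachable (for example, you are working in a local VM), one generic fallback, which is not part of the provided scripts, is to serve the figures yourself from the directory that contains them and browse to port 8000:
bash# cd <directory containing the generated figures>
bash# python -m SimpleHTTPServer 8000    # on Python 3: python3 -m http.server 8000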
By now you will have realized that the buffer in the Headend router is so large that when it fills up with iperf packets, it delays the short wget flow. Next we’ll look at two ways to reduce the problem.
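To see roughly how much delay a full buffer adds, a back-of-the-envelope calculation helps. Assuming 100 full-sized packets of about 1500 bytes each and a 1.5 Mb/s bottleneck link (the rate actually configured by run.sh may differ), the queueing delay behind a full buffer is:
bash# echo "scale=2; 100 * 1500 * 8 / 1500000" | bc    # ~0.80 seconds of queueing delay
Every packet of the short wget flow has to wait behind that backlog, so each of its round trips grows by close to a second.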

PART 5: MAKE THE ROUTER BUFFER SMALLER. REDUCE IT FROM 100 PACKETS TO 20 PACKETS.

Restart Mininet with small buffer

Stop any running Mininet and start Mininet again, but this time we will make the buffers 20 packets long instead:
prompt> sudo ./run-minq.sh 
Let’s also run the monitor script on the side:
prompt> sudo ./monitor.sh <EXP_NAME>

Repeat the steps in Parts 2 and 3:

mininet> h2 wget http://10.0.0.1
mininet> h1 ping -c 10 h2
mininet> h1 ./iperf.sh
mininet> h1 ping -c 30 h2
mininet> h2 wget http://10.0.0.1

What do you think the cwnd and queue occupancy will be like in this case?

[Blank axes provided for your sketch]

Plot CWND and Queue Occupancy

Plot the figure for cwnd and queue occupancy, this time using the script “./plot_figures_minq.sh”
prompt> ./plot_figures_minq.sh
Again, use the URL to view your figures.

Why does reducing the queue size reduce the download time for wget?

Please put your explanation below. Answer:

DIFFERENT QUEUES

The problem seems to be that packets from the short flow are stuck behind a lot of packets from the long flow. What if we maintain a separate queue for each flow and then put iperf and wget traffic into different queues?
For this experiment, we put the iperf and wget/ping packets into separate queues in the Headend router. The scheduler implements fair queueing so that when both queues are busy, each flow receives half of the bottleneck link rate.
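run-diff.sh sets this up for you, but as a rough sketch of the idea (not the script's actual configuration; the interface name s1-eth1 and the 1.5 Mb/s rate are assumptions), per-flow fair queueing can be attached with tc along these lines:
bash# tc qdisc add dev s1-eth1 root handle 1: htb default 1
bash# tc class add dev s1-eth1 parent 1: classid 1:1 htb rate 1.5mbit
bash# tc qdisc add dev s1-eth1 parent 1:1 handle 10: sfq perturb 10
sfq hashes packets into per-flow queues and serves them round-robin, which approximates the fair queueing behaviour described above.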

Restart Mininet

Start Mininet again, but this time we will create two queues, one for each type of traffic.
prompt> sudo ./run-diff.sh

Repeat the steps in Parts 2 and 3

mininet> h2 wget http://10.0.0.1
mininet> h1 ping -c 10 h2
mininet> h1 ./iperf.sh
mininet> h1 ping -c 30 h2
mininet> h2 wget http://10.0.0.1
You should see that the ping delay (and the wget download time) does not change much before and after iperf starts.


Tuesday, March 25, 2014


-Others while we debug.

That is very weird.  Could you modify the DCTCP script to spawn the CLI just before the experiment starts, so you can check whether the "red" qdisc is actually getting installed?

thanks,
--
Vimal

************************************************************************************************
The red queue on the switch, right? The one on h1 looked similar. Both
showed 0 for dropped, overlimits, and marked. Shouldn't it be non-zero
afterwards?

before

qdisc red 6: dev s1-eth1 parent 5:1 limit 1000000b min 30000b max 35000b ecn
 Sent 1446 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  marked 0 early 0 pdrop 0 other 0

after

qdisc red 6: dev s1-eth1 parent 5:1 limit 1000000b min 30000b max 35000b ecn
 Sent 1594 bytes 21 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  marked 0 early 0 pdrop 0 other 0




Full output:

shewa@h0:~/src/mininet_tests/dctcp$ ./run-dctcp.sh
net.ipv4.tcp_dctcp_enable = 0
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_dctcp_enable = 1
net.ipv4.tcp_ecn = 1
~~~~~~~~~~~~~~~~~> BW = 100.0
*** Creating network
*** Adding controller
*** Adding hosts:
h1 h2 h3
*** Adding switches:
s1
*** Adding links:
(100.00Mbit ECN) (100.00Mbit ECN) (h1, s1) (100.00Mbit 0.075ms  0.05ms
distribution normal   delay) (100.00Mbit 0.075ms  0.05ms distribution
normal   delay) (h2, s1) (100.00Mbit 0.075ms  0.05ms distribution
normal   delay) (100.00Mbit 0.075ms  0.05ms distribution normal
delay) (h3, s1)
*** Configuring hosts
h1 (cfs -1/100000us) h2 (cfs -1/100000us) h3 (cfs -1/100000us)
*** Starting controller
*** Starting 1 switches
s1 (100.00Mbit ECN) (100.00Mbit 0.075ms  0.05ms distribution normal
delay) (100.00Mbit 0.075ms  0.05ms distribution normal   delay)
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=2.40 ms
64 bytes from 10.0.0.2: icmp_req=2 ttl=64 time=1.07 ms

--- 10.0.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.075/1.741/2.408/0.667 ms

*** Starting CLI:
mininet> s1 tc -s qdisc
qdisc mq 0: dev eth0 root
 Sent 830689151 bytes 682384 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth1 root
 Sent 670 bytes 5 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth1 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1446 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc red 6: dev s1-eth1 parent 5:1 limit 1000000b min 30000b max 35000b ecn
 Sent 1446 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  marked 0 early 0 pdrop 0 other 0
qdisc netem 10: dev s1-eth1 parent 6: limit 425
 Sent 1446 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth2 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1458 bytes 19 pkt (dropped 0, overlimits 17 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth2 parent 5:1 limit 1000000 delay 74us  49us
 Sent 1458 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth3 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1154 bytes 15 pkt (dropped 0, overlimits 14 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth3 parent 5:1 limit 1000000 delay 74us  49us
 Sent 1154 bytes 15 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
mininet> quit
     1 seconds left

*** Starting CLI:
mininet> s1 tc -s qdisc
qdisc mq 0: dev eth0 root
 Sent 850160497 bytes 700768 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth1 root
 Sent 670 bytes 5 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth1 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1594 bytes 21 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc red 6: dev s1-eth1 parent 5:1 limit 1000000b min 30000b max 35000b ecn
 Sent 1594 bytes 21 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  marked 0 early 0 pdrop 0 other 0
qdisc netem 10: dev s1-eth1 parent 6: limit 425
 Sent 1594 bytes 21 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth2 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1512 bytes 20 pkt (dropped 0, overlimits 18 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth2 parent 5:1 limit 1000000 delay 74us  49us
 Sent 1512 bytes 20 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth3 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1282 bytes 17 pkt (dropped 0, overlimits 16 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth3 parent 5:1 limit 1000000 delay 74us  49us
 Sent 1282 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
************************************************************************************************
Forget dropped/overlimits -- from your output before/after the experiment it seems like most of the bytes were sent out of eth0.  Are you sure TCP is even sending data on the right interfaces?  Can you check the throughput during the experiment?

ethstats is a nice tool (sudo apt-get install ethstats; ethstats -n1)

--
Vimal
************************************************************************************************
Yes.

The bwm-ng monitoring graph shows 100Mbps over s1-eth1. I turned
bwm-ng off and monitored with ethstats (apparently they can't both run
simultaneously) and it agrees.

$ ethstats -n1 &> /tmp/ethstats.txt
$ cat /tmp/ethstats.txt | grep s1-eth1 | sort | uniq -c
      1   s1-eth1:    0.00 Mb/s In     0.00 Mb/s Out - 4293979006.0
p/s In  4293977900.0 p/s Out
      1   s1-eth1:    0.00 Mb/s In     0.00 Mb/s Out - 4293987295.0
p/s In  4293986192.0 p/s Out
      1   s1-eth1:    0.00 Mb/s In     0.00 Mb/s Out - 4294959007.0
p/s In  4294959004.0 p/s Out
      1   s1-eth1:    0.00 Mb/s In     0.00 Mb/s Out - 980001.0 p/s In
 981104.0 p/s Out
      1   s1-eth1:    4.86 Mb/s In   100.29 Mb/s Out -   8241.0 p/s In
   8287.0 p/s Out
      1   s1-eth1:    4.86 Mb/s In   100.30 Mb/s Out -   8261.0 p/s In
   8297.0 p/s Out
...

Since bwm-ng and ethstats conflicted, I assume that so does 'tc -s
qdisc' ... so I reran with just it.
Now it shows the traffic going through s1-eth1 and the red queue. But
it still doesn't show any
marked packets. HTB shows overlimits for each switch interface, but I
don't see anything else.

before:

qdisc mq 0: dev eth0 root
 Sent 365727120 bytes 290930 pkt (dropped 0, overlimits 0 requeues 1)
 backlog 0b 0p requeues 1
qdisc mq 0: dev eth1 root
 Sent 1012 bytes 6 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth1 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 702 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc red 6: dev s1-eth1 parent 5:1 limit 1000000b min 30000b max 35000b ecn
 Sent 702 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  marked 0 early 0 pdrop 0 other 0
qdisc netem 10: dev s1-eth1 parent 6: limit 425
 Sent 702 bytes 9 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth2 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 620 bytes 8 pkt (dropped 0, overlimits 7 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth2 parent 5:1 limit 1000000 delay 74us  49us
 Sent 620 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth3 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 316 bytes 4 pkt (dropped 0, overlimits 4 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth3 parent 5:1 limit 1000000 delay 74us  49us
 Sent 316 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

after:

qdisc mq 0: dev eth0 root
 Sent 483556337 bytes 377972 pkt (dropped 0, overlimits 0 requeues 1)
 backlog 0b 0p requeues 1
qdisc mq 0: dev eth1 root
 Sent 1012 bytes 6 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth1 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 1502786766 bytes 993579 pkt (dropped 0, overlimits 1953808 requeues 0)
 backlog 0b 0p requeues 0
qdisc red 6: dev s1-eth1 parent 5:1 limit 1000000b min 30000b max 35000b ecn
 Sent 1502786766 bytes 993579 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  marked 0 early 0 pdrop 0 other 0
qdisc netem 10: dev s1-eth1 parent 6: limit 425
 Sent 1502786766 bytes 993579 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth2 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 36634038 bytes 492727 pkt (dropped 0, overlimits 562072 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth2 parent 5:1 limit 1000000 delay 74us  49us
 Sent 36634038 bytes 492727 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc htb 5: dev s1-eth3 root refcnt 2 r2q 10 default 1 direct_packets_stat 0
 Sent 37156074 bytes 499835 pkt (dropped 0, overlimits 568936 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 10: dev s1-eth3 parent 5:1 limit 1000000 delay 74us  49us
 Sent 37156074 bytes 499835 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0



-- Andrew Shewmaker 
************************************************************************************************
Thanks!   Looking at the queue sizes and the cwnd plots, it seems like DCTCP is working.  The graph with TCP would look very different with pronounced sawtooths.

Can you confirm that the RED queue does ECN marking by doing a tcpdump at s1-eth1 and checking for ECN-marked packets (tcpdump -ni s1-eth1 'ip && (ip[1]&0x3==0x3)')?   I am guessing the bug is that RED isn't reporting stats correctly....

bwm-ng and ethstats read /proc/net/dev to report stats.  tc doesn't use that interface, so you can run it with the other tools.  Though I wonder why bwm-ng and ethstats lock /proc/net/dev....

--
Vimal
************************************************************************************************
I ran the tcpdump outside of mininet and didn't see anything.

I added the above tcpdump to dctcp.py to run alongside the others
already built into the script for the hosts, and I'm getting an empty
pcap file for the switch and none for the hosts. I checked stderr in
their Popen objects and they're all getting:

setns: Bad address

I didn't see that error (or anything useful) when I executed by hand in mininet:

mininet> s1 tcpdump -ni s1-eth1 'ip && (ip[1]&0x3==0x3)'

I'm still looking into that error message.



-- Andrew Shewmaker 
************************************************************************************************
Hmm, could you see what tcpdump -ni s1-eth1 'ip && (ip[1]&0x3==0x2)' outputs?  This checks for ECN-capable packets that are not ECN marked.  You needn't run it inside Mininet -- it should work in the root namespace as well.

--
Vimal


************************************************************************************************
It looks like everything is ECN capable, but no ECN markings.

shewa@h0:~/src/mininet_tests/dctcp$ uname -r; sudo tcpdump -ni s1-eth1
-s0 -w /tmp/s1-eth1.pcap
3.2.18dctcp3
tcpdump: WARNING: s1-eth1: no IPv4 address assigned
tcpdump: listening on s1-eth1, link-type EN10MB (Ethernet), capture
size 65535 bytes
tcpdump: pcap_loop: The interface went down
1733228 packets captured
1969960 packets received by filter
236683 packets dropped by kernel

shewa@h0:~/src/mininet_tests/dctcp$ sudo tcpdump -n -r
/tmp/s1-eth1.pcap 'ip && (ip[1]&0x3==0x3)'
reading from file /tmp/s1-eth1.pcap, link-type EN10MB (Ethernet)

shewa@h0:~/src/mininet_tests/dctcp$ sudo tcpdump -n -r
/tmp/s1-eth1.pcap 'ip && (ip[1]&0x3==0x2)'
...
many, many lines
...
10:27:03.377120 IP 10.0.0.2.60928 > 10.0.0.1.5001: Flags [.], seq
594378104:594379552, ack 1, win 29, options [nop,nop,TS val 804230 ecr
804229], length 1448
10:27:03.377148 IP 10.0.0.1.5001 > 10.0.0.2.60928: Flags [.], ack
594382448, win 1076, options [nop,nop,TS val 804268 ecr 804230],
length 0
10:27:03.377241 IP 10.0.0.2.60928 > 10.0.0.1.5001: Flags [.], seq
594382448:594383896, ack 1, win 29, options [nop,nop,TS val 804230 ecr
804229], length 1448
10:27:03.377285 IP 10.0.0.1.5001 > 10.0.0.2.60928: Flags [.], ack
594383896, win 1076, options [nop,nop,TS val 804268 ecr 804230],
length 0
10:27:03.377362 IP 10.0.0.2.60928 > 10.0.0.1.5001: Flags [P.], seq
594383896:594384008, ack 1, win 29, options [nop,nop,TS val 804230 ecr
804229], length 112
10:27:03.377371 IP 10.0.0.2.60928 > 10.0.0.1.5001: Flags [F.], seq
594384008, ack 1, win 29, options [nop,nop,TS val 804232 ecr 804231],
length 0
10:27:03.383468 IP 10.0.0.1.5001 > 10.0.0.2.60928: Flags [F.], seq 1,
ack 594384009, win 1076, options [nop,nop,TS val 804275 ecr 804232],
length 0
10:27:03.383506 IP 10.0.0.1.5001 > 10.0.0.3.43140: Flags [F.], seq 0,
ack 838574850, win 1726, options [nop,nop,TS val 804275 ecr 804225],
length 0
10:27:03.383633 IP 10.0.0.2.60928 > 10.0.0.1.5001: Flags [.], ack 2,
win 29, options [nop,nop,TS val 804275 ecr 804275], length 0
10:27:03.383680 IP 10.0.0.3.43140 > 10.0.0.1.5001: Flags [.], ack 1,
win 29, options [nop,nop,TS val 804275 ecr 804275], length 0


************************************************************************************************
Hmm, that is very weird.

OK, let's try something else.  Here is a modified htb module that does ECN marking:
http://stanford.edu/~jvimal/htb-ecn/

You might have to:
* edit the experiment script to remove RED qdisc from being added
* rmmod sch_htb before running any experiment
* download the two files from the link above and type "make" to create a new sch_htb.ko module
* insmod /path/to/new/sch_htb.ko ecn_thresh_packets=30
* make sure the expt script doesn't do "rmmod sch_htb"...  if so, comment it out
* rerun the experiments, checking for ECN marks both on both s1-eth1 as well as the host that receives the two flows (mininet> hN tcpdump ... hN-eth0 'ip &&...' )

And see what happens.  Since the queue occupancy output suggests that it is greater than 300, packets must be marked....

--
Vimal

************************************************************************************************

That worked. Thanks!

My tcpdump on s1-eth1 sees ECN marked packets and the bottleneck queue
length is mostly between 23 and 32 packets. 50 Mb/s per flow.

I wonder why red isn't working on any of the systems I tested, but I
suppose the main thing is that your htb module does.

Again, thank you so much.

Andrew

************************************************************************************************
+Others.

Glad it worked.

@all: It appears the DCTCP bug was due to the RED module
not ECN marking packets.  A custom htb module that did
ECN marking seems to have resolved the issue.

Andrew: Here is a pointer if you want to further debug
this:

1. Look at the red_enqueue function: http://lxr.free-electrons.com/source/net/sched/sch_red.c?v=3.2#L57

2. Line 74 does the ECN marking but it's conditioned on a few cases.  It's worth checking if those conditions are triggered.

And, here is a patch to modify sch_htb.c from any recent
kernel.  It's worth keeping it safe somewhere since my
webpage might not be around forever.

@@ -39,6 +39,7 @@
 #include <linux/slab.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/inet_ecn.h>

 /* HTB algorithm.
     Author: devik@cdi.cz
@@ -64,6 +65,10 @@
 module_param    (htb_hysteresis, int, 0640);
 MODULE_PARM_DESC(htb_hysteresis, "Hysteresis mode, less CPU load, less accurate");

+static int ecn_thresh_packets __read_mostly = 0; /* 0 disables it */
+module_param(ecn_thresh_packets, int, 0640);
+MODULE_PARM_DESC(ecn_thresh_packets, "ECN marking threshold in packets");
+
 /* used internaly to keep status of single class */
 enum htb_cmode {
  HTB_CANT_SEND,  /* class can't send and can't borrow */
@@ -552,6 +557,9 @@
  struct htb_sched *q = qdisc_priv(sch);
  struct htb_class *cl = htb_classify(skb, sch, &ret);

+ if (ecn_thresh_packets && (sch->q.qlen >= ecn_thresh_packets))
+  INET_ECN_set_ce(skb);
+
  if (cl == HTB_DIRECT) {
   /* enqueue to helper queue */
   if (q->direct_queue.qlen < q->direct_qlen) {

--
Vimal
************************************************************************************************
Thanks, Vimal! I'll let you know if I fix a bug.

-- Andrew Shewmaker 
************************************************************************************************


Monday, March 24, 2014


Hi,

First, great job on Mininet. It's a wonderful tool, and I think
replicating the results of CS papers is extremely important.

Second, I've been having trouble replicating the dctcp experiments, as
described in
http://reproducingnetworkresearch.wordpress.com/2012/06/09/dctcp-2/

With 3.2 and 3.2.54 kernels, it doesn't appear as if the ECN marking
is occurring. The bottleneck queue length has the same oscillation
between 200 and 425 packets for reno, reno+ecn, and dctcp. I added an
execution of 'tc -s qdisc' on the switch at the end of dctcp.py and it
confirms that no packets are being marked.

The behavior improves somewhat with a 3.6 kernel (the patch required
little modification up to that point in the series). At this point I
see reno+ecn working to keep the bottleneck queue length below 30
packets. But dctcp still doesn't appear to work even though stats show
the switch is marking packets.  I have also uncommented the printks
marking the transition between CE=0 and CE=1 states in the ACK
generation state machine, but see nothing in dmesg.

Do you have any insights into what might be going wrong?

Sometimes I worry that my laptop isn't fast enough, but see below.

Thank you for any information you might share,

Andrew Shewmaker

My laptop specs are:

2GHz 8-core i7 laptop w/ 8GB RAM
Fedora 18
custom 3.2, 3.2.54, and 3.6 kernels + dctcp patch
openvswitch 2.0
mininet from git

'sudo mn --test=iperf' yields:
550+ Mbps on 3.2.x
750+ Mbps on 3.6
10+ Gbps on 3.10+ (tso on)
1+ Gbps on 3.10+ (tso off, gso on)
550+ Mbps on 3.10+ (tso/gso off)

'sudo mn --link=tc,bw=100 --test=iperf' yields
78+ Mbps on 3.2.x
90+ Mbps on 3.10+

And those rates decrease again for the dctcp.py experiment:
7-16 Mbps on 3.2.x
20-30 Mbps on 3.6+
These don't seem fast enough to cause congestion, but the bottleneck
queue length zigzags between 200 and 425 packets, and I see the
regular reno cwnd response.

-- Andrew Shewmaker

Monday, March 3, 2014

nohup Execute Commands After You Exit From a Shell Prompt

 
Most of the time you log in to a remote server via ssh. If you start a shell script or command and then exit (abort the remote connection), the process/command will get killed. Sometimes a job or command takes a long time. If you are not sure when the job will finish, it is better to leave it running in the background. But if you log out of the system, the job will be stopped and terminated by your shell. What do you do to keep a job running in the background when the process gets SIGHUP?

Say hello to nohup command

The answer is simple: use the nohup command-line utility, which allows a command, process, or shell script to continue running in the background after you log out of the shell.

nohup command syntax:

The syntax is as follows:
nohup command-name &
OR
nohup /path/to/command-name arg1 arg2 &
Source: http://www.cyberciti.biz/tips/nohup-execute-commands-after-you-exit-from-a-shell-prompt.html
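For example, to keep a long-running job going after you disconnect (the script name and log file below are just placeholders):
nohup ./long-job.sh > long-job.log 2>&1 &
tail -f long-job.log
Without the redirection, nohup appends the command's output to nohup.out in the current directory.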

How do I unzip a tar gz archive to a specific destination?

You have two choices:
cd /root/Desktop/folder
tar zxf /root/Documents/file.tar.gz
or
tar zxf file.tar.gz -C /root/Desktop/folder
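To check what the archive contains before extracting, or to create the destination directory in the same step (paths here are just examples):
tar ztf file.tar.gz | head
mkdir -p /root/Desktop/folder && tar zxf file.tar.gz -C /root/Desktop/folder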