Measuring Latency Variation in the Internet
This is the companion web site to the paper “Measuring Latency Variation in the Internet”, which is currently under submission.
Data and scripts
The file below contains the scripts, BQL queries and intermediate data files used to produce the graphs in the paper:
- measuring-latency-variation.tar.gz (284MiB; sha1sum c963339efc5bdc9579e441a2f05480c2945f7c50)
Identifying queueing latency in the M-Labs dataset
This is a longer explanation of the algorithm to identify self-induced queueing than that given in the paper. For context, see the paper. The algorithm is implemented in analyse.py in the file linked above.
Our analysis is based on the observation that, for some flows, the sample RTT increases over the lifetime of the flow until a congestion event [1], then sharply decreases. Figure 1 shows an example of this pattern. We believe it is reasonable to assume that, when this pattern occurs, the drop in RTT happens because the queueing delay induced by the flow dissipates as the flow slows down. Given this close correspondence between a congestion event and a subsequent drop in RTT, we can measure the magnitude of the drop and use it as a (lower-bound) measure of queueing delay.
We limit the analysis to flows that have exactly one congestion event and that spend the larger part of their lifetime limited by the congestion window (and not by other factors such as the receiver window or sender processing). Additionally, we filter out flows that run for less than 9 seconds or transfer less than 0.2 MB of data. For the remaining flows, we identify the pattern described above with the following algorithm:
- Find three values:
  - first_rtt, the first non-zero RTT sample;
  - cong_rtt, the RTT sample at the congestion event; and
  - cong_rtt_next, the first RTT sample after the congestion event that is different from cong_rtt.
- Compute the differences between first_rtt and cong_rtt, and between cong_rtt and cong_rtt_next. If both of these values are above 40 ms [2], return the difference between cong_rtt and cong_rtt_next.
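As an illustration, the flow filter and the basic detection steps could be sketched in Python as follows. This is a minimal sketch and not the code in analyse.py: the FlowSummary fields, the function names, and the cong_idx parameter are hypothetical. It also assumes that "the larger part of its lifetime" means more than half, that 1 MB is 10^6 bytes, and that the two differences are the rise from first_rtt to cong_rtt and the drop from cong_rtt to cong_rtt_next.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_DURATION_S = 9.0       # flows shorter than this are filtered out
MIN_BYTES = 0.2 * 10**6    # 0.2 MB, assuming 1 MB = 10^6 bytes
THRESHOLD_MS = 40.0        # both RTT differences must exceed this


@dataclass
class FlowSummary:
    """Hypothetical per-flow summary; the field names are illustrative."""
    duration_s: float
    bytes_transferred: int
    congestion_events: int
    cwnd_limited_s: float  # time spent limited by the congestion window


def is_candidate(flow: FlowSummary) -> bool:
    """Apply the flow filters described in the text."""
    if flow.congestion_events != 1:
        return False
    if flow.duration_s < MIN_DURATION_S:
        return False
    if flow.bytes_transferred < MIN_BYTES:
        return False
    # Assumption: "larger part of its lifetime" means more than half.
    return flow.cwnd_limited_s > flow.duration_s / 2


def basic_queueing_delay(rtts: Sequence[float],
                         cong_idx: int) -> Optional[float]:
    """Return the RTT drop (ms) after the congestion event, or None.

    rtts holds the RTT samples in ms, in time order; cong_idx is the
    index of the sample taken at the congestion event.
    """
    # first_rtt: the first non-zero RTT sample.
    first_rtt = next((r for r in rtts if r > 0), None)
    if first_rtt is None:
        return None
    cong_rtt = rtts[cong_idx]
    # cong_rtt_next: first later sample that differs from cong_rtt.
    cong_rtt_next = next(
        (r for r in rtts[cong_idx + 1:] if r != cong_rtt), None)
    if cong_rtt_next is None:
        return None
    rise = cong_rtt - first_rtt
    drop = cong_rtt - cong_rtt_next
    if rise > THRESHOLD_MS and drop > THRESHOLD_MS:
        return drop
    return None
```

For example, with the samples [0, 20, 40, 80, 120, 120, 30, 32] and the congestion event at index 4, the rise is 100 ms and the drop is 90 ms, so the sketch returns 90.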
This basic algorithm gives good results by itself. However, we found that the following refinements improve it further:
- Instead of cong_rtt, use the median of cong_rtt and the two previous RTT samples. This weeds out tests where only a single RTT sample (coinciding with the congestion event) is higher than the baseline.
- Instead of cong_rtt_next, use the minimum of the five measurements immediately following the first RTT sample that is different from cong_rtt. This ensures that we include cases where the decrease after the congestion event is not instant, but happens over a couple of RTT samples.
- Compute the maximum span between the largest and smallest RTT sample in a sliding window of 10 data samples over the time period following the point of cong_rtt_next. If this span is higher than the drop in RTT after the congestion event, filter out the flow.
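The refinements can be sketched in the same way. Again, this is a hedged illustration rather than the implementation in analyse.py; the function names and the cong_idx parameter are hypothetical. Several details are assumptions where the description leaves room for interpretation: the five measurements are taken strictly after the first differing sample, the sliding window starts after the sample chosen as the minimum, and a tail shorter than 10 samples is treated as a single window.

```python
from statistics import median
from typing import Optional, Sequence

THRESHOLD_MS = 40.0  # both RTT differences must exceed this
WINDOW = 10          # sliding-window length, in samples


def max_window_span(samples: Sequence[float], window: int = WINDOW) -> float:
    """Largest (max - min) over all sliding windows of `window` samples."""
    if not samples:
        return 0.0
    if len(samples) <= window:
        # Assumption: a short tail counts as one window.
        return max(samples) - min(samples)
    return max(
        max(samples[i:i + window]) - min(samples[i:i + window])
        for i in range(len(samples) - window + 1)
    )


def refined_queueing_delay(rtts: Sequence[float],
                           cong_idx: int) -> Optional[float]:
    """Return the RTT drop (ms) using the three refinements, or None."""
    rtts = list(rtts)
    first_rtt = next((r for r in rtts if r > 0), None)
    if first_rtt is None:
        return None
    # Refinement 1: median of cong_rtt and the two previous samples.
    cong_level = median(rtts[max(0, cong_idx - 2):cong_idx + 1])
    # Index of the first later sample that differs from cong_rtt.
    next_idx = next(
        (i for i in range(cong_idx + 1, len(rtts))
         if rtts[i] != rtts[cong_idx]),
        None,
    )
    if next_idx is None:
        return None
    # Refinement 2: minimum of the five samples following that sample.
    following = rtts[next_idx + 1:next_idx + 6]
    if not following:
        return None
    cong_rtt_next = min(following)
    min_idx = next_idx + 1 + following.index(cong_rtt_next)
    rise = cong_level - first_rtt
    drop = cong_level - cong_rtt_next
    if rise <= THRESHOLD_MS or drop <= THRESHOLD_MS:
        return None
    # Refinement 3: reject flows whose later RTT variation exceeds the drop.
    if max_window_span(rtts[min_idx + 1:]) > drop:
        return None
    return drop
```

Using the median means that a single outlier sample at the congestion event no longer counts as a rise. In the samples [20, 21, 20, 22, 150, 21, 20, 22, 21, 20, 21] with the event at index 4, the median of 20, 22 and 150 is 22, so the rise is only 2 ms and the flow is rejected.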