From lachlan.andrew at gmail.com Wed Feb 6 21:22:59 2008
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 6 Feb 2008 21:22:59 -0800
Subject: [Tmrg] Mix of RTTs
Message-ID:

Greetings Sally,

I have a question about the connection between the traffic model and
RTTs to use in TCP analysis.

When the "better models" paper compares the simulated and measured RTT
distributions, it mentions that most packets come from the short-RTT
flows. That will clearly be the case if all flows are long-lived, or
if the traffic model is "closed loop" in the sense that it consists
only of alternating "think times" and fixed-time files.

If the traffic consists instead of Poisson arrivals of "sessions",
each carrying a fixed amount of traffic (possibly in several
think/send bursts), then the amount of data sent at each RTT is
determined by the traffic model, independent of the actual RTTs.

At the round table, we agreed to have a traffic model of the second
kind. Will that change the RTTs that we should use in the test suite?

As I recall, you wanted to revise that section before the final
submission anyway.

Cheers,
Lachlan

--
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820  Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan

From sallyfloyd at mac.com Wed Feb 13 17:21:31 2008
From: sallyfloyd at mac.com (Sally Floyd)
Date: Wed, 13 Feb 2008 17:21:31 -0800
Subject: [Tmrg] Towards a Common TCP Evaluation Suite
Message-ID: <2FE95FF6-9F61-45D1-9D2E-9BE2D1637B2E@mac.com>

Some of us (nine co-authors) have submitted a draft paper to
PFLDnet 2008 on "Towards a Common TCP Evaluation Suite", and the
draft paper has been accepted. This paper grew out of a workshop
organized by Lachlan Andrew at CalTech last November.

The draft paper is available from
"http://www.icir.org/floyd/papers/pfldnet2008-draft.pdf".
We are revising the paper now, and the final version is due
on February 22.
Any feedback would be welcome.

- Sally (one of the nine co-authors)
http://www.icir.org/floyd/

From sallyfloyd at mac.com Mon Feb 18 18:17:50 2008
From: sallyfloyd at mac.com (Sally Floyd)
Date: Mon, 18 Feb 2008 18:17:50 -0800
Subject: [Tmrg] (limited) measurement of file size vs congestion level
In-Reply-To: <478345FF.6050306@ftw.at>
References: <478345FF.6050306@ftw.at>
Message-ID: <1FC26B49-CC2D-40C7-A034-1C4541050C86@mac.com>

Fabio -

Many thanks for the report.

- Sally

...
> 3. However, based on our experience (we have seen many severe
> congestion events in this network), I can report the following
> qualitative observations
>
> A. the user abandoning process seems to be "with threshold" :
> if you consider the frequency of TCP RST as a gross indicator of
> user (or server) impatience, we saw that for mild congestion (right
> before the peak hour, on a congested link) the RST rate stays at a
> physiological level (pretty low), while it sharply jumps to
> abnormally high values when the congestion becomes severe (during
> the peak hour)
>
> B. if you look at the distribution of the number of packets
> downloaded by each user in fixed timebins (e.g. 1 min), you see
> that after a capacity upgrade that removes a congestion point,
> such distribution changes, with more users downloading more packets
> (as expected).
>
>
> 4. My expectation is that the users regulate the duration of the
> *session* and the total download rate (often across multiple
> parallel TCP connections) based on the experienced response time;
> therefore it is the *session* attributes (duration, rate), rather
> than the *file* ones, that are dependent on the congestion level.
> At the TCP level, this might mean that it's the connection arrival
> process that is mostly impacted, rather than the size (the latter
> is probably affected only in the tail of long files, which are
> probably truncated upon congestion).
> Furthermore, after a certain threshold (severe congestion), users
> or servers suddenly get crazy and start to reclick/reset the
> downloads, and eventually give up the session.
...
> [1] "User patience and the Web: a hands-on investigation", by
> Rossi, Casetti, Mellia, @ Globecom 2003.
> [2] F. Ricciato, F. Vacirca, P. Svoboda, Diagnosis of Capacity
> Bottlenecks via Passive Monitoring in 3G Networks: an Empirical
> Analysis, Computer Networks, vol. 51, n.4, pp. 1205-1231, March 2007
> [3] http://userver.ftw.at/~ricciato/darwin/

- Sally
http://www.icir.org/floyd/

From sallyfloyd at mac.com Mon Feb 18 19:35:55 2008
From: sallyfloyd at mac.com (Sally Floyd)
Date: Mon, 18 Feb 2008 19:35:55 -0800
Subject: [Tmrg] Mix of RTTs
In-Reply-To:
References:
Message-ID:

Lachlan -

(Getting to old email...)

> I have a question about the connection between the traffic model and
> RTTs to use in TCP analysis.
>
> When the "better models" paper compares the simulated and measured RTT
> distributions, it mentions that most packets come from the short-RTT
> flows. That will clearly be the case if all flows are long-lived, or
> if the traffic model is "closed loop" in the sense that it consists
> only of alternating "think times" and fixed-time files.
>
> If the traffic consists instead of Poisson arrivals of "sessions",
> each carrying a fixed amount of traffic (possibly in several
> think/send bursts), then the amount of data sent at each RTT is
> determined by the traffic model, independent of the actual RTTs.

I don't understand this. Assume Poisson arrivals of sessions, each
carrying a fixed amount of traffic. The amount of data sent in
each RTT is determined by the end-to-end congestion control. For
TCP, where in congestion avoidance a flow increases its sending
rate by one packet per RTT, short-RTT flows send at a much higher
sending rate *in packets per second* than do long-RTT flows, given
the same packet drop rates for the two flows.
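
As a rough illustration of the RTT dependence described above, the
sketch below uses the widely quoted square-root approximation for
steady-state TCP throughput, rate ~ sqrt(3/(2p)) / RTT packets per
second. That formula is an assumption brought in for illustration; it
is not stated anywhere in this thread, and it ignores timeouts, which
matter at drop rates of a few percent. The function name is
hypothetical, and the example values (3% drops, RTTs of 20 ms and
460 ms) are simply figures mentioned elsewhere in the thread.

```python
import math


def tcp_rate_pps(rtt_s: float, drop_rate: float) -> float:
    """Approximate steady-state TCP rate in packets per second.

    Uses the simplified square-root model sqrt(3 / (2 p)) / RTT, which
    assumes periodic loss and no timeouts (an illustrative assumption).
    """
    return math.sqrt(1.5 / drop_rate) / rtt_s


# Same drop rate for both flows, as in the argument above.
p = 0.03
short_rtt_rate = tcp_rate_pps(0.020, p)  # 20 ms flow
long_rtt_rate = tcp_rate_pps(0.460, p)   # 460 ms flow

# Under this model the rate ratio is just the inverse RTT ratio.
ratio = short_rtt_rate / long_rtt_rate
print(f"short-RTT flow sends about {ratio:.0f}x the packets/s of the long-RTT flow")
```

Under this model the drop rate cancels out of the ratio, so the
per-flow rates differ only by the inverse ratio of the two RTTs.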

I agree that we want a traffic model of Poisson arrivals of sessions,
each carrying a fixed amount of traffic (from a heavy-tailed
distribution).

> At the round table, we agreed to have a traffic model of the second
> kind. Will that change the RTTs that we should use in the test suite?

Figure 5 from the Internet Research Needs Better Models paper has
most of the traffic of the second kind above (from the traffic
generator in ns-2, with Poisson arrivals of sessions, and heavy-tailed
distributions of file sizes, along with other parameters), though
there are a few long-lived flows in Figure 5 of that paper.
The simulation was run for 100 seconds of simulation time, with
packet drop rates over the second half of the simulation of roughly 3%.
I would assume that at the end of the 100 seconds, the long-RTT
flows had more unfilled demand than the short-RTT flows.

> As I recall, you wanted to revise that section before the final
> submission anyway.

The Internet Research Needs Better Models paper used RTTs that
were uniformly distributed between 20 and 460 ms, in the
absence of queueing delay. Table 1 of the PFLDnet paper
now gives RTTs in the range of 4 to 200 ms, in the absence
of queueing delay.

It has varied over time - for revision 1.13, Table 1 had a range
of RTTs from 0 to 100 ms. In revision 1.14, Table 1 was changed
to have a range of RTTs from 0 to 200 ms. In revision 1.15, this
was changed (perhaps by me) to have a range of RTTs from 4 to 400 ms.
In revision 1.18, this was changed back to a range of RTTs from
4 to 200 ms.

The range of RTTs up to 400 ms seems the most realistic to me,
for the default scenario, but I could live with a range up to
200 ms, for the first pass at the scenarios. Perhaps it was changed
back because 200 ms is easier for testbeds than 400 ms? I don't
remember, and it is impossible to tell from the logs who made
which change.

- Sally
http://www.icir.org/floyd/

From lachlan.andrew at gmail.com Mon Feb 18 20:17:38 2008
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Mon, 18 Feb 2008 20:17:38 -0800
Subject: [Tmrg] Mix of RTTs
In-Reply-To:
References:
Message-ID:

Greetings Sally,

On 18/02/2008, Sally Floyd wrote:
> > If the traffic consists instead of Poisson arrivals of "sessions",
> > each carrying a fixed amount of traffic (possibly in several
> > think/send bursts), then the amount of data sent at each RTT is
> > determined by the traffic model, independent of the actual RTTs.
>
> I don't understand this. Assume Poisson arrivals of sessions, each
> carrying a fixed amount of traffic. The amount of data sent in
> each RTT is determined by the end-to-end congestion control.

Yes. My wording was confusing. When I said "data sent *at* each
RTT", I meant "data eventually sent by flows having a particular RTT",
not "data in a particular interval of duration one RTT".

Each individual long-RTT flow will transmit slower, but as a result,
there will be more of them in the system. The total data (eventually)
sent by these flows equals the sum of the file sizes which arrive,
regardless of how slowly they are sent. (Of course, this only applies
exactly if the time scale of the simulation is long compared to one
flow transfer time, but that is the way the real world is.)

> For
> TCP, where in congestion avoidance a flow increases its sending
> rate by one packet per RTT, short-RTT flows send at a much higher
> sending rate *in packets per second* than do long-RTT flows, given
> the same packet drop rates for the two flows.

Agreed. The rate that an individual long-RTT flow sends will be
lower, but this is balanced by the fact that it keeps sending for
longer.

> > At the round table, we agreed to have a traffic model of the second
> > kind. Will that change the RTTs that we should use in the test suite?
>
> Figure 5 from the Internet Research Needs Better Models paper
...
> I would assume that at the end of the 100 seconds, the long-RTT
> flows had more unfilled demand than the short-RTT flows.

Yes, the long flows will have more unfilled demand. However, if the
simulation has been run long enough, the unfilled demand of the
remaining flows will be a small fraction of the total data.

This gets back to our discussion about whether it is meaningful to
study time-average properties of systems which haven't yet reached
equilibrium. The reason for taking measurements only over the second
half is to avoid non-equilibrium effects, isn't it? If so, it would
be consistent to wait until the system actually has reached
equilibrium. Otherwise, time averages are misleading quantities. Do
you agree?

At a random point in the real world, each long-RTT flow will be
about half-finished, just like each short-RTT flow will be. If the
long-RTT flows have more unsent data in the real world, it is because
there are more of them.

> > As I recall, you wanted to revise that section before the final
> > submission anyway.
>
> The Internet Research Needs Better Models paper used RTTs that
> were uniformly distributed between 20 and 460 ms, in the
> absence of queueing delay. Table 1 of the PFLDnet paper
> now gives RTTs in the range of 4 to 200 ms, in the absence
> of queueing delay.

Your email below suggests that it was a little more complicated than
uniform [20,460]. I interpreted it as saying that most of the traffic
was [0,220], but I could have misunderstood. If you're happy with the
current text, we'll just go with it.

Cheers,
Lachlan

On 11/12/2007, Sally Floyd wrote:
> > You're right that the delays don't match those in the paper very well.
> > Our reference was the link you sent in November. As I commented on 3
> > December, there seems to be a discrepancy between the paper and the
> > scripts on the web which purport to have produced those graphs.
> > Since the paper didn't have values and we didn't hear your
> > response to my query, we used the values in the scripts we were
> > pointed to.
> >
> > Are you sure that the scripts on the web are the ones used?
>
> Yep. But it turns out that the topology in the scripts is more
> complicated than I remembered.
>
> In the scripts, there are two sets of access links.
>
> The access links for the long-lived traffic are as follows:
>   $ns duplex-link $node_(s$i) $node_(r1) 100Mb [expr $delay2]ms DropTail
>   $ns duplex-link $node_(k$i) $node_(r2) 100Mb [expr $delay2a]ms DropTail
> for
>   set delay2 [expr 2*$opt(secondDelay)*((($i+3)%10)/9.0)]
>   set delay2a [expr 2*$opt(secondDelay)*((($i+3)%10)/9.0)]
> and secondDelay set to 55 ms.
>
> This gives one-way propagation delays for each of the access links
> for the long-lived traffic of [0,110] ms, giving RTTs for the
> long-lived traffic,
> in the absence of queueing delay and the small delay for the central
> link,
> equally distributed in [0, 440] ms.
>
> The access delays for the web traffic are as follows:
>   $ns duplex-link $s_($i) $node_(r1) 2000Mb $x DropTail
>   $ns duplex-link $r_($i) $node_(r2) 2000Mb $y DropTail
> for
>   set x [expr $bdel*((($i+3)%10)/9.0)]ms
>   set y [expr $bdel*((($i+3)%10)/9.0)]ms
> and bdel set to 55 ms.
>
> This gives one-way propagation delays for each of the access links
> for the web traffic of [0,55] ms, giving RTTs, in the absence of
> queueing
> delay etc., equally distributed in [0, 220] ms.
>
> I don't think that this difference between the RTTs for the long-lived
> traffic and the web traffic was on purpose.
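
The delay arithmetic quoted above can be checked with a short sketch.
It evaluates the expression scale*(((i+3)%10)/9.0) from the quoted
Tcl for both sets of access links and derives the RTT range as twice
the sum of the two access delays, ignoring queueing and the central
link as the email does. This is a Python re-expression rather than
the ns-2 script itself; the function names and the index range
i = 0..9 are assumptions for illustration.

```python
def access_delays_ms(scale_ms: float, n_links: int = 10) -> list[float]:
    """One-way access-link delays from scale * (((i + 3) % 10) / 9.0).

    n_links = 10 is an assumed index range; any run of 10 or more
    consecutive indices yields the same set of ten distinct values.
    """
    return sorted(scale_ms * (((i + 3) % 10) / 9.0) for i in range(n_links))


def rtt_range_ms(delays: list[float]) -> tuple[float, float]:
    """RTT range: 2 * (left access delay + right access delay).

    Queueing delay and the central-link delay are ignored.
    """
    return 2 * (min(delays) + min(delays)), 2 * (max(delays) + max(delays))


second_delay_ms = 55.0  # opt(secondDelay) in the quoted script
bdel_ms = 55.0          # bdel in the quoted script

long_lived = access_delays_ms(2 * second_delay_ms)  # delay2 = 2*secondDelay*...
web = access_delays_ms(bdel_ms)                     # x = bdel*...

print("long-lived RTT range (ms):", rtt_range_ms(long_lived))
print("web RTT range (ms):", rtt_range_ms(web))
```

Each set contains only ten discrete delay values, so the RTTs are
spread over a discretized grid rather than a continuous range.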
>
> Figure 5 was run with a range of web traffic and long-lived traffic,
> but dominated by web traffic:
>   ./ns sims.tcl -flows 18 -web 400 -rtts 1 -title two > two.data
>
> I just reran the simulations, one with RTTs for the web traffic
> equally distributed in [0, 220] ms., as used for Figure 5 in the paper,
> and the other with RTTs for the web traffic equally distributed
> in [0, 440] ms. This first one matched the experimental data
> better, so I will change the one-way propagation delays for the
> access links in the paper to give RTTs of [0, 220] ms.
>
> (I am assuming that everything in this first draft is subject to
> change as we learn more from measurements and from running
> the simulations and experiments....)
>
>
> > On 03/12/2007, Lachlan Andrew wrote:
> >> Greetings Sally,
> >>
> >> On 26/11/2007, Sally Floyd wrote:
> >>> Cesar wrote
> >>>> 1) the RTTs of the access links for the dumbbell scenarios
> >>>> In this topic, I read the paper by Sally and Kohler about
> >>>> "Internet Research Needs Better Models".
> >>>> "http://www.icir.org/models/hotnetsFinal.pdf"
> >>>
> >>> For the scenario in that paper, the flows are distributed evenly
> >>> over all of the nine link pairs (that is, those pairs that have
> >>> one link on the left, and one link on the right). The simulation
> >>> scripts are available from "http://www.icir.org/models/sims.html".
> >>
> >> I've tried reading the scripts at
> >> , and really can't
> >> see what RTTs the links used. (I'm not very fluent at TCL.)
> >>
> >> It seems to me that the number of web nodes created in
> >> add_web_traffic is numWeb=10, and the RTTs seem to be drawn from
> >> a discretized uniform distribution [generated by
> >> $bdel*((($i+3)%10)/9.0)] with a maximum value of
> >> opt(secondDelay)=55ms. That doesn't mesh with the maximum RTT of
> >> 460ms in the paper.
> >>
> >> As I recall, at the meeting you offered to find the RTTs which would
> >> match a measured distribution.
> >> As a short-cut for the PFLDnet
> >> abstract, could you please let us know what link delays were used in
> >> the "better models" paper?
> >>
> >> Thanks,
> >> Lachlan
>
> - Sally
> http://www.icir.org/floyd/
>

--
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820  Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan

From sallyfloyd at mac.com Thu Feb 21 16:42:10 2008
From: sallyfloyd at mac.com (Sally Floyd)
Date: Thu, 21 Feb 2008 16:42:10 -0800
Subject: [Tmrg] Mix of RTTs
In-Reply-To:
References:
Message-ID:

Lachlan -

On Feb 18, 2008, at 8:17 PM, Lachlan Andrew wrote:

> Greetings Sally,
>
> On 18/02/2008, Sally Floyd wrote:
>>> If the traffic consists instead of Poisson arrivals of "sessions",
>>> each carrying a fixed amount of traffic (possibly in several
>>> think/send bursts), then the amount of data sent at each RTT is
>>> determined by the traffic model, independent of the actual RTTs.
>>
>> I don't understand this. Assume Poisson arrivals of sessions, each
>> carrying a fixed amount of traffic. The amount of data sent in
>> each RTT is determined by the end-to-end congestion control.
>
> Yes. My wording was confusing. When I said "data sent *at* each
> RTT", I meant "data eventually sent by flows having a particular RTT",
> not "data in a particular interval of duration one RTT".
>
> Each individual long-RTT flow will transmit slower, but as a result,
> there will be more of them in the system. The total data (eventually)
> sent by these flows equals the sum of the file sizes which arrive,
> regardless of how slowly they are sent. (Of course, this only applies
> exactly if the time scale of the simulation is long compared to one
> flow transfer time, but that is the way the real world is.)

Actually, the *real world* contains users whose behavior is a function
of congestion and download times experienced so far.
And in the real world (with current TCP), users over connections with
longer RTTs have much slower download times than users over
connections with shorter RTTs. And therefore will download less.

But since our simulations and experiments don't yet have user
behavior sensitive to past congestion and to past download times,
this doesn't happen in our simulations and experiments...

>> For
>> TCP, where in congestion avoidance a flow increases its sending
>> rate by one packet per RTT, short-RTT flows send at a much higher
>> sending rate *in packets per second* than do long-RTT flows, given
>> the same packet drop rates for the two flows.
>
> Agreed. The rate that an individual long-RTT flow sends will be
> lower, but this is balanced by the fact that it keeps sending for
> longer.
>
>>> At the round table, we agreed to have a traffic model of the second
>>> kind. Will that change the RTTs that we should use in the test
>>> suite?
>>
>> Figure 5 from the Internet Research Needs Better Models paper
> ...
>> I would assume that at the end of the 100 seconds, the long-RTT
>> flows had more unfilled demand than the short-RTT flows.
>
> Yes, the long flows will have more unfilled demand. However, if the
> simulation has been run long enough, the unfilled demand of the
> remaining flows will be a small fraction of the total data.

Yep, if the average load is less than 100%. If the average load is
greater than 100%, then the unfilled demand increases and
increases, the longer we run the simulation, with a lot
of the unfilled demand from the longer-RTT flows.

> This gets back to our discussion about whether it is meaningful to
> study time-average properties of systems which haven't yet reached
> equilibrium. The reason for taking measurements only over the second
> half is to avoid non-equilibrium effects, isn't it? If so, it would
> be consistent to wait until the system actually has reached
> equilibrium. Otherwise, time averages are misleading quantities. Do
> you agree?

For me, the reason to take measurements over the second half
of the experiment is to avoid the odd and atypical period in the
beginning of the simulation when all flows are slow-starting at
the same time. But personally, I am perfectly happy to run
simulations for finite, specified time periods when the average
load is greater than 100%, and there is no equilibrium.
(In fact, I think it is probably quite necessary, if one wants scenarios
with higher levels of congestion over the lifetime of the simulation.)

> At a random point in the real world, the each long RTT flow will be
> about half-finished, just like each short RTT flow will be. If the
> long RTT flows have more unsent data in the real world, it is because
> there are more of them.
>
>>> As I recall, you wanted to revise that section before the final
>>> submission anyway.
>>
>> The Internet Research Needs Better Models paper used RTTs that
>> were uniformly distributed between 20 and 460 ms, in the
>> absence of queueing delay. Table 1 of the PFLDnet paper
>> now gives RTTs in the range of 4 to 200 ms, in the absence
>> of queueing delay.
>
> Your email below suggests that it was a little more complicated than
> uniform [20,460]. I interpreted it as saying that most of the traffic
> was [0,220], but I could have misunderstood. If you're happy with the
> current text, we'll just go with it.

Ah dear, I had forgotten about that email.
Yep, I am happy with the current text.

Take care,
- Sally
http://www.icir.org/floyd/

From lachlan.andrew at gmail.com Thu Feb 21 17:17:46 2008
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Thu, 21 Feb 2008 17:17:46 -0800
Subject: [Tmrg] Mix of RTTs
In-Reply-To:
References:
Message-ID:

Greetings Sally,

Thanks for your reply.

On 21/02/2008, Sally Floyd wrote:
>
> Actually, the *real world* contains users whose behavior is a function
> of congestion and download times experienced so far. And in the real
> world (with current TCP), users over connections with longer RTTs
> have much slower download times than users over connections with
> shorter RTTs. And therefore will download less.
>
> But since our simulations and experiments don't yet have user
> behavior sensitive to past congestion and to past download times,
> this doesn't happen in our simulations and experiments...

True. However, we can easily model "users with long RTTs choose to
download less" in a way which doesn't need their behaviour to reflect
actual experience. We can just choose the load at each RTT.

> > Yes, the long flows will have more unfilled demand. However, if the
> > simulation has been run long enough, the unfilled demand of the
> > remaining flows will be a small fraction of the total data.
>
> Yep, if the average load is less than 100%. If the average load is
> greater than 100%, then the unfilled demand increases and
> increases, the longer we run the simulation, with a lot
> of the unfilled demand from the longer-RTT flows.

True. In the "better models" paper, were the RTT comparison tests run
at over 100% load? I would have thought that comparing the RTT
distribution at a load which lets all the traffic through would be the
natural setting.

> For me, the reason to take measurements over the second half
> of the experiment is to avoid the odd and atypical period in the
> beginning of the simulation when all flows are slow-starting at
> the same time.

Yes, that is certainly the biggest artefact to avoid.

> But personally, I am perfectly happy to run
> simulations for finite, specified time periods when the average
> load is greater than 100%, and there is no equilibrium.
> (In fact, I think it is probably quite necessary, if one wants scenarios
> with higher levels of congestion over the lifetime of the simulation.)

OK. I'll get back to you on this when/if I try some simulations which
start in equilibrium...

> Yep, I am happy with the current text.

Great.

Cheers,
Lachlan

--
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820  Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan

From sallyfloyd at mac.com Thu Feb 21 18:05:43 2008
From: sallyfloyd at mac.com (Sally Floyd)
Date: Thu, 21 Feb 2008 18:05:43 -0800
Subject: [Tmrg] Mix of RTTs
In-Reply-To:
References:
Message-ID:

Lachlan -

>>> Yes, the long flows will have more unfilled demand. However, if the
>>> simulation has been run long enough, the unfilled demand of the
>>> remaining flows will be a small fraction of the total data.
>>
>> Yep, if the average load is less than 100%. If the average load is
>> greater than 100%, then the unfilled demand increases and
>> increases, the longer we run the simulation, with a lot
>> of the unfilled demand from the longer-RTT flows.
>
> True. In the "better models" paper, were the RTT comparison tests run
> at over 100% load? I would have thought that comparing the RTT
> distribution at a load which lets all the traffic through would be the
> natural setting.

For the "better models" paper, I didn't calculate the load.
(The simulation scripts are on-line at
"http://www.icir.org/models/sims.html", but there is a mix of
long-lived traffic and traffic from the traffic generator.
And the traffic from the traffic generator is specified by
specifying the session arrival rate, the average connection size
in packets, etc.)

You are of course right that the distribution of RTTs shown in
Figure 5 of the "better models" paper would be a function of the
level of load...

- Sally
http://www.icir.org/floyd/
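
For readers who want to estimate the load that the message above says
was not calculated, here is a minimal sketch of the usual offered-load
arithmetic for session-based traffic: session arrival rate times
connections per session times mean connection size, divided by link
capacity. Every number and parameter name below is hypothetical, none
is taken from the ns-2 scripts, and the long-lived flows are left out
because they have no fixed demand.

```python
def offered_load(session_rate_per_s: float,
                 conns_per_session: float,
                 mean_conn_pkts: float,
                 pkt_bytes: int,
                 link_bps: float) -> float:
    """Offered load of the generated traffic as a fraction of capacity.

    All arguments are illustrative inputs, not values from the scripts.
    """
    offered_bps = (session_rate_per_s * conns_per_session
                   * mean_conn_pkts * pkt_bytes * 8)
    return offered_bps / link_bps


# Hypothetical example: 10 sessions/s, 5 connections per session,
# 25 packets per connection, 1000-byte packets, 10 Mb/s bottleneck.
load = offered_load(10, 5, 25, 1000, 10e6)
print(f"offered load = {load:.0%}")

# Above 100% load the backlog of unfilled demand grows roughly linearly
# with run time; below it, the backlog stays bounded on average.
backlog_bits_after_100s = max(0.0, load - 1.0) * 10e6 * 100
```

With a heavy-tailed size distribution the measured load over a
100-second run can differ noticeably from this long-run average.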