From: rr-fs-bounces@caida.org on behalf of Dmitri Krioukov [dima@caida.org]
Sent: Tuesday, August 23, 2005 4:03 AM
To: routopia-wg@caida.org
Cc: rr-fs@caida.org; rrg@psg.com
Subject: [rr-fs] recent progress in topology research

Below is my review of few recent results in topology research. I select the papers to discuss based mostly on my current interests. I split the review into the following two parts: papers from networking and from statistical physics.

  1. Networking
    1. The best SIGCOMM paper award winning http://www.sigcomm.org/sigcomm2004/papers/p599-li.pdf is definitely worth reading. The authors employ degree-preserving rewiring to show that that HOT-like router-level topologies are not the maximally random graphs in the set of simple connected graphs having a given degree sequence. Furthermore, graphs in this set can have different properties, so that a given degree distribution is not a constraint restrictive enough. Note that the first experiments of this kind for AS-level topologies were performed in http://arxiv.org/abs/cond-mat/0205379 (but see also II.B and II.C below).
    2. The Netdimes project http://www.netdimes.org/ implementing the idea of traceroute@home has produced the first public results in http://arxiv.org/abs/cs.NI/0506099. The idea is to find 'hidden tangential links', i.e. peering links between small ISPs, by having sources of traceroute measurements placed at the periphery of the network -- namely, at the PCs of the project participants: anyone can download the agent and become a participant. These tangential links can hardly be discovered by any other means. The first results seem to confirm findings in II.A.1 below of estimates of (in)accuracies of resource-limited traceroute-like explorations of different network topologies.
    3. http://arxiv.org/abs/cs.NI/0502085 delivers the best available generator of simple connected random graphs with power-law degree sequences. It improves on http://www.cc.gatech.edu/~gantsich/TopologyGenerators/TheMCSimulationMethodForGeneratingConnectedPLRG.pdf
  2. Statistical physics
    1. Sampling biases of traceroute-like topology measurements
      1. http://arxiv.org/abs/cs.NI/0412007 (there are also two journal versions: for Theoretical Computer Science and Physical Review E) essentially dismisses fears http://www.ieee-infocom.org/2003/papers/09_01.PDF that the real Internet topology might be qualitatively different than the measured one. The authors employ the 'mean field' approximation (supported with simulations) to analyze as to what degrees a wide range of the most commonly used statistical characteristics of traceroute-measured topologies can differ from the same characteristics of the real topologies under probe. Roughly, the main finding is that these differences cannot be large. In principle, the Internet has a chance to be a classical Erdos-Renyi (ER) random graph in reality, but then its average node degree must be of the order of the measured maximum degree. The latter means that the overwhelming majority of nodes in the real topology must have degrees very close to the observed maximum degree since the node degree distribution in ER graphs is Poissonian, narrowly centered around its average. We know for sure that 90% of ASs in the Internet do not have degree of around 2000, therefore the real Internet can only be quantitatively, not qualitatively, different from the measured one. The authors also show then that the more fat-tailed the betweenness distribution -- the betweenness distribution in the Internet is fat-tailed http://www.caida.org/analysis/topology/as_topo_comparisons/ -- the higher the quantitative accuracy of traceroute-like probing. Those who are interested in quick reading should go directly to the conclusions section of the paper containing a nice summary of implications. One of them, mentioned at the very end, questions a potential value of distributed traceroute@home-like explorations with large numbers of sources (cf. I.2 above).
      2. A more rigorous approach to the same problem yields a smaller number of useful results: http://arxiv.org/abs/cond-mat/0503087 derives only the degree distributions seen by traceroute-like measurements with one source only. The paper confirms that an ER graph can produce a power-law distribution.
    2. Equilibrium vs. non-equilibrium network models (aka top-down vs. bottom-up)
      There are two approaches to modeling complex networks. The equilibrium (top-down) approach is to construct an ensemble of static random graphs reproducing certain properties of observed networks and then to derive their other properties by the standard methods. The classic example of this approach is PLRG. The non-equilibrium approach (bottom-up) tries to mimic the actual dynamics of network growth: if this dynamics is accurately captured, then the modeling algorithm, when let to run to produce a network of the required size, will output the topology coinciding with the observations. The classic examples of this approach are the Barabasi-Albert (BA) and HOT models. In reality, the ultimate truth is clearly behind the more ambitious non-equilibrium approach. In practice however, the non-equilibrium approach is much more involved, both theoretically and methodologically, and as such it has so far generated a much smaller number of practically useful results.
      1. Equilibrium ensemble construction procedures and their analysis
        1. http://arxiv.org/abs/cond-mat/0405566 might look, for a networking community reader, too involved mathematically, but one is encouraged to check at least Sections I and II. At the very beginning, the authors point out that the bottom-up approach in complex network studies is akin to kinetic theory of gases in physics: while trying to be more realistic, it quickly becomes over-complicated, intractable, and eventually unproductive. The authors then discuss how the standard tools of equilibrium statistical mechanics can be directly applied to analysis of random graph ensembles.
        2. Note that the first consistent set of coherent definitions followed by in-depth analysis appeared much earlier in http://arxiv.org/abs/cond-mat/0204111
        3. However, the 'Polish group' has accumulated probably the largest number of results in this area. A couple of recent highlights include the papers analyzing properties of both equilibrium, http://arxiv.org/abs/cond-mat/0502124, as well as non-equilibrium, http://arxiv.org/abs/cond-mat/0503548, networks (the authors call them 'homogeneous' and 'causal', respectively). Their work resulted also in a public release of the graph generator software producing ensembles of random graphs with prescribed expected degree sequences http://arxiv.org/abs/cond-mat/0506330 Interestingly, all the last papers, starting 2001, of Andre Krzywicki, who is very distinguished http://qcd.th.u-psud.fr/page_perso/Krzywicki/, are dedicated to the subject of equilibrium statistical mechanics of complex networks http://arxiv.org/find/cond-mat/1/au:+Krzywicki_A/0/1/0/all/0/1.
      2. HOT FKP revisited
        1. http://arxiv.org/abs/cond-mat/0407643 analytically solves a slight modification of the HOT FKP model, shows lack of 'clean' power-laws in it because of unavoidable degree distribution degeneracy---condensation of "stubs on hubs", which can be easily reproduced in experiments. The authors go further to show that, in the large network limit, power laws are necessarily absent in any HOT/FKP model modification involving intervertex distance optimization. The power laws can partially appear though but only in a form of pre-asymptotic finite-size effects. That is, the situation is almost exactly the same as with the PFP model http://arxiv.org/abs/cs.NI/0402011 most accurately reproducing Internet topology properties among all the non-equilibrium models.
        2. The subset of the same findings (i.e., degree distribution degeneracy) had been previously proven in http://research.microsoft.com/~borgs/Papers/FKPdgrees.pdf
      3. Hidden variables http://arxiv.org/abs/cond-mat/0306072 contribution highlights:
        1. develops the recently introduced formalism of hidden variables, which is an alternative to the BA preferential attachment: every node is assigned a tag drawn from some distribution and connection between node pairs are formed according to some probability function of tag values at the nodes. These hidden variables can be related to expected node degrees, or it can something completely different, e.g. ISP brand name...
        2. proposes an algorithm to construct maximally random graphs with the prescribed expected degree correlations (i.e., the joint degree distribution). http://arxiv.org/abs/cond-mat/0308336v1 independently proposes the same algorithm.
        3. demonstrates that non-equilibrium growing network models can be formally mapped to equilibrium models with one of the hidden variables being... time!
    3. Correlations
      All real networks, including the Internet, are growing, and all growing networks necessarily exhibit various types of correlations which significantly complicate topology analysis. One of the central problem in this area is to try to differentiate between 'forced' and 'natural' correlations: forced correlations are those due to constraints imposed by a combination of other network properties (e.g., degree correlations due to network's having a given degree sequence, being simple and connected), while natural correlations are those appearing in addition to forced ones, 'inside' the forced constraints. Thus, ability to differentiate between forced and natural correlations is critical for these two purposes: 1) only the natural correlations can tell us how 'far away' a real network is from its maximally random equilibrium counterpart respecting the given set of forced constraints (cf. II.B and I.1 above); and 2) only the natural correlation can provide some additional information about the actual laws of the network evolution.
      1. http://arxiv.org/abs/cond-mat/0408110 and somewhat earlier http://arxiv.org/abs/cond-mat/0303327 address the problem of to which extent certain forms of degree distribution force degree correlations. The first paper even suggests an algorithm producing uncorrelated networks even if the original degree distribution necessarily forces degree correlations. Of course, the resulting degree distribution is slightly different from the original, but this difference is minimized.
      2. http://arxiv.org/abs/cond-mat/0409686 suggests another definition of local clustering decoupled from node degree correlations. The definition seems to be useful, although it is not commonly accepted.
      3. http://arxiv.org/abs/cond-mat/0308444 addresses the problem of to which extent degree correlations force clustering. Surprisingly, we have recently found the formulas from this paper, even though derived for pseudographs, are in an extremely good agreement with observation of clustering in maximally random simple connected graphs with the joint degree distributions observed in the AS-level Internet topologies.
    4. Distance distributions
      The performance parameters (such as stretch) of compact routing algorithms depend mostly on (if not only on) the specifics of the graph's distance distribution. Therefore, knowledge of analytical properties of distance distributions in scale-free networks are critical for moving forward with providing any analytic estimates of the performance of compact routing applied to these networks.
      1. http://arxiv.org/abs/cond-mat/0411160 empirically discovers that the average distance between nodes of degree k and k' is ~ a - b log(kk') in a very wide class of graphs. The authors also try to provide some preliminary explanation of the discovered effect.
      2. http://arxiv.org/abs/cond-mat/0407098 uses random walks to calculate distance distribution in uncorrelated networks. The results closely match simulations.
      3. http://arxiv.org/abs/cond-mat/0411526 is the first paper to analytically derive the average distance from a node of degree k in correlated networks. The results are obtained only for deterministic network models though.

dima.
http://www.caida.org/~dima/