From the distributed computation theory, it is known that the core of scalability concerns associated with routing on dynamic graphs can be rigorously reformulated in the form of the triangle of trade-offs between the routing table size, adaptation costs, and stretch. The term "trade-off" used here means that, for example, routing table size decrease comes with the price of increase of stretch.
Note that in distributed computing, the routing table size is sometimes referred to as the memory space, meaning the amount of memory required by a routing algorithm (also called scheme) either per node (so called, local space) or for the whole network (the total memory space).
Adaptation costs usually refer to the amount of messages generated by a routing scheme per topology change, or to the sizes of those messages, or to some similar parameters (like those measuring routing stability, etc.). For example, thanks to works of Tim Griffin et al., and to experimental observations, we know that the adaptation costs of today's BGP routing are not upper-bounded at all since there are scenarios when the routing path vector algorithm implemented by BGP does not converge but oscillates persistently.
It is widely known that the routing table size and adaptation costs are the real scalability problems with BGP routing in the Internet today. But what is the stretch?
The stretch is a multiplicative path length increase factor associated with a given routing scheme. To clarify, take a graph and a pair of nodes in it, and then calculate shortest path length between them, then take the routing scheme, run it on the graph and calculate the distance between the same pair of nodes but along the path produced by the routing scheme. This second path length can be greater than the shortest distance, obviously. Find the ratio between this greater path length and the shortest path length. The stretch factor is the maximum of this ratio over all node pairs over all the graphs to which the routing scheme applies, while the average stretch is obviously the corresponding average. Given this definition, stretch-1 routing is synonymous to shortest path routing.
Are there any concerns associated with the stretch in practice (as opposed to in theory) today? The answer is fortunately no, since BGP implements a stretch-1 routing algorithm (although there is some "external" stretch associated with policy routing).
Clearly, what makes the theoretical scalability trade-offs outlined above the real problems is the properties of the real network, that is, the Internet. The Internet is a massive dynamic network. More importantly, the Internet topology is scale-free, meaning that the node degree distribution of the Internet inter-AS graph (as well as the router-level graph) is a power law with a negative exponent and, hence, it lacks any characteristic scale as opposed to Poisson node degree distributions in classical Erdos-Reny random graphs. A particularly important fact is that scale-free networks possess the so called small-world property: they are so densely connected that the average shortest path length (or, simply, the average distance) is very low, and so is the width of the distance distribution, so that one can say that the small-world graphs have very small first two moments of the distance distribution. Recall, that the average AS-hop distance in the Internet is approximately 3.5 and the dispersion of its distance distribution is around 1. Approximately 86% of AS pairs are at the distance of 3 or 4 AS hops. This small-world property affects drastically the traditional views, approaches, and even the methodology of network-related research. Scale-free networks possess rather intricate properties that are quite opposite to the properties of many other network classes that have been traditionally considered, very unfortunately, as more or less perfect models for realistic networks. As known today, those models have very little to do with reality.
The immediate causes of the first major practical problem (from the two mentioned above) with BGP routing today (the table size growth) are well studied. These causes are multihoming, increased topology density because of more peering (the "smaller-world" effect), inbound traffic engineering, address allocation policies, etc. There are several efforts, proposals, and even new routing architectures that are aimed at addressing those immediate causes. The most notorious efforts are the work of the MULTI6 working group, Atomized routing, and ISLAY routing architecture. However, for any of those proposals, it is easy to see that, if deployed, they would provide only a more or less short-term solution just because they address not the root problem but its consequences.
What is the root problem then? The answer is obvious. The problem is ultimately caused by the fundamental inefficiencies of algorithms underlying today's BGP (recall that BGP was originally constructed without any references to theoretical considerations of its performance on realistic network topologies), and the path to a potential solution must go through finding routing algorithms with better theoretical memory space mean values and upper bounds on scale-free graphs.
In the networking community, the single idea on how to make the routing table size scale well seems to be aggregation, multiple levels of hierarchical network partitioning in the Kleinrock-Kamoun (KK) style (L. Kleinrock and F. Kamoun, "Hierarchical Routing for Large Networks: Performance Evaluation and Optimization", Computer Networks, 1977, 1:155-174), where the idea of hierarchical network partitioning was introduced and the trade-off between memory space and stretch was analyzed for the first time. Unfortunately, the scheme introduced there did not provide any algorithm for construction of the required partitioning. Furthermore, assuming that a required partitioning exists, the authors showed that the stretch produced by their scheme would be reasonable low only for very sparsely connected networks, while realistic scale-free networks are, on the contrary, extremely densely connected. As was recently shown (D. Krioukov, K. Fall, and X. Yang, "Compact Routing on Internet-Like Graphs", INFOCOM 2004) (KFY), the stretch produced by the KK scheme on the Internet topology would be extremely high. This is another reason why the routing architectures based on the idea of hierarchical network partitioning (Nimrod, ISLAY, etc.) are not realistic (the primary reason being the lack of any algorithm to construct required partitioning for a given network topology).
On the other hand, it would be quite hard to expect that no progress has been made for routing in distributed computing since 1977. So that, one can turn his attention to more modern, compact routing schemes that guarantee very small routing table sizes but with smaller (than in the KK case) increase of stretch. Note that the stretch has to be allowed to increase since generic (applicable to all graphs) shortest path routing was shown to be incompressible (the shortest path routing table size lower bound cannot be made reasonably small for some graphs).
However, it is straightforward to see that ability to do stretch-1 routing is an essential requirement for any realistic Internet routing architecture. So that, the primary concern with those compact routing schemes is how far their average stretch on scale-free graphs is from 1. In other words, the very specific first question on the path toward scalable Internet routing is this: is it possible to construct a routing scheme that would be characterized by simultaneously low memory space and stretch on the scale-free graphs with Internet-like topologies? This question has been answered positively in KFY. In other words, one of the three trade-offs mentioned above has been shown to have a positive resolution, assuming the adaptation costs are unbounded.
2.1.1 Other Schemes
The analysis similar to KFY can be done for some other static compact routing schemes. The average stretch of the scheme considered in KFY on the Internet interdomain graph has still been found to be somewhat high (~1.1). There are routing schemes with slightly higher memory space upper bounds, but those schemes are expected to result in much lower stretch averages on scale-free graphs. Thus, such schemes might be much more attractive from the practical standpoint.
2.1.2 Additive Stretch
Nothing constrains one to considering only multiplicative stretch discussed above. The additive stretch and constructs utilizing it have been also defined and analyzed to some degree. The additive stretch might be more applicable to small-world graphs since the multiplicative stretch seems to be too coarse for graphs where short distances prevail.
2.1.3 Shortest Path Routing
Nothing is know regarding stretch-1 bounds on scale-free graphs. On one hand, stretch-1 routing is a requirement for Internet routing. On the other hand, stretch-1 routing is incompressible on generic graphs. However, if the class of graphs is narrowed to the scale-free graphs, can any nicer lower bounds be obtained then, and can any stretch-1 routing schemes with nicer upper bounds be then constructed?
2.1.4 Stretch>1 Path Bounds
A closely related question is about the bounds for the amount of stretch>1 paths produced by stretch>1 routing schemes on scale-free graphs. If these bounds turn out to be not greater than the memory space upper bounds, than routing information for non-shortest paths can be augmented with the corresponding shortest path routing information without increasing the memory space upper bounds.
2.1.5 The Dynamic Case
Routing on dynamic graphs is, of course, a much more complicated problem. First of all, it admits several theoretical formulations, some of which are irrelevant. The progress on the problems that are relevant for Internet routing has been very slow. There are very few results obtained for dynamic routing in distributed computing over the past 20 years. One can easily see that the problem is hard at the very fundamental level. The idea of representing a highly dynamic object, a real network, by an intrinsically static construct, a graph, seems to be a root cause of these difficulties. This apprehension is indirectly confirmed by several rather pessimistic lower bounds that are already known. For example, for any generic dynamic stretch-s routing scheme with unbounded memory space and message sizes, not less than n/s messages are required per some topology changes on some n-node graphs. The corresponding bounds for scale-free networks are unknown.
The first step towards the dynamic case is obviously the analysis of the name-independent routing schemes. The scheme considered in KFY is not name-independent: as soon as some node or link failure occurs, there are no bounds for the amount of nodes that need to be relabeled as a result of such topology change. Some significant progress has been recently made on the front of construction of low-stretch name-independent (but still static) compact routing schemes.
In general, the ultimate purpose is to construct a dynamic low-stretch compact routing scheme that would work well on dynamic scale-free graphs, but the analysis of the other two trade-offs (adaptations costs vs. memory space and adaptation costs vs. stretch) narrowed to scale-free graphs has not begun yet at all. This is a virtually untouched research area.
2.1.6 Routing Paradigm Shifts
It is quite plausible that the real cause of slow progress for dynamic routing in the framework of distributed computing is that the problem is fundamentally hard (or even unsolvable) there. So that, it may be reasonable to approach the problem from some other directions. One can see that this idea of horizon expansion in network research becomes more and more popular lately. A good example is few papers presented at SIGCOMM 2003.
As far as routing is concerned, what seems to be the most reasonable first step is to formulate the routing problem in a truly dynamic context. There are quite rigorous considerations (as opposed to just bare enthusiasm) showing that possibility of doing so looks fairly plausible (cf. the Midnight Sun Routing Workshop, Lulea, Sweden, 2002).
2.2.1 Topology
Construction of efficient graph-theoretic routing for realistic networks is clearly impossible without thorough understanding of their topology and evolution. Unfortunately, very little is known about the properties of realistic scale-free networks from the graph-theoretic perspective. Much greater number of results on realistic networks have recently come from statistical mechanics than from computer science. For example, all known results on distance distributions in uncorrelated scale-free graphs were obtained by physicists, and there are no analytical results for distance distribution in strongly correlated scale-free graphs such as the Internet (all growing networks are strongly correlated).
2.2.2 Evolution
Since the discovery of power-laws in the Internet, the number of models trying to explain them has been growing very rapidly. However, there is still no Internet evolution model that would be satisfactory from both the physical and networking standpoints. As a result, the laws governing the Internet evolution remain unclear, which is very unfortunate since knowing such laws would potentially allow one to influence the Internet topology evolution to result in graphs that would be "nicer" with respect to routing.
The Barabasi-Albert (BA) model and its derivatives, popular among physicists, have seen a lot of criticism from the networking community for being too general, not incorporating any domain specifics, and, hence, failing to predict correctly many characteristics of the Internet topology and evolution. The same type of argument has been actively used against the BA model by biologists.
On the other hand, the models proposed by the networking community try to incorporate Internet evolution specifics by introducing a number of non-physical parameters allowing one to easily fit the output of a model to the observed data. It is easy to see that any model with sufficient number of external parameters can be forced to produce any required output by parameter manipulations. A model can be of some theoretical value only when all its parameters can be expressed via physical variables.
Therefore, the lack of appropriate and commonly accepted methodology is the central hindrance on the path towards construction of an Internet evolution model that would be both sufficiently detailed and having physical meaning. While paying special attention to Internet business realities seems to be the right starting point on this path, using the right methodology is the key.
3.1.1 Analysis (similar to KFY) of the latest name-independent compact routing schemes.
3.1.2 Lower bounds for dynamic routing schemes on scale-free graphs.
3.1.3 Shortest path routing bounds on scale-free graphs.
3.1.4 If the answers to points in 3.1.2-3 are "negative", then either:
3.1.4.1 The paradigm-shifting research (physical routing, etc.); or
3.1.4.2 "Delay tolerant networking!" (for the future Internet composed of routable islands with just periodic connectivity between them (i.e. no global "routability")).
3.2.1 Internet topology measurements.
3.2.2 Comparative analysis of Internet AS-level topologies obtained via WHOIS, BGP, and traceroutes.
3.2.3 Economy-based Internet topology evolution models.
3.2.4 Explanation of effects discovered in KFY.
3.2.5 After finding the laws of the Internet topology evolution, can we influence them to evolve to topologies that would be "nicer" with respect to routing?