Issue 50 in ogf-nsi-project: Pathfinding functionality review
Status: New
Owner: guy.robe...@gmail.com
Labels: Type-Review Priority-Medium FoundInVersion-1.0 FixedInVersion-1.1

New issue 50 by guy.robe...@gmail.com: Pathfinding functionality review
http://code.google.com/p/ogf-nsi-project/issues/detail?id=50

Description of Issue:
Pathfinding. Currently the 'network-internal' path-finding is delegated to the network - this is not currently exported - how should this be done?
TK: Static vs. dynamic information - topology is more static, connections and vlans are more dynamic. To scale well these should be handled in different ways.
JS: Jeroen: Also an issue with vlan blocking found at Netherlight... this also needs to be solved. 'Label' (vlan) swapping needed based on negotiation between neighbouring networks.
Jerry: negotiation in tree mode happens at the RA when it has decided which connection point is needed.
Jeroen: This means that we need some vlan information in the NSI request.
Jeroen: one STP per vlan does not scale.
Jeroen: we need to have a way in the topology to express which vlans are available and which are not.

Discussion of Issue:
Resolution of Issue:
Updates:
Owner: jeroen.v...@gmail.com
Labels: -FoundInVersion-1.0 -FixedInVersion-1.1 FoundInVersion-1.sc FixedInVersion-2.0

Comment #1 on issue 50 by guy.robe...@gmail.com: Pathfinding functionality review
http://code.google.com/p/ogf-nsi-project/issues/detail?id=50

Guy Roberts: This issue has been assigned to Jeroen van der Ham to review. We need a way of making the STPs scale, i.e. something better than one STP per vlan. I suggest we investigate the possibility of assigning technology specific attributes and vlan ranges to connection requests.
Comment #2 on issue 50 by jmacau...@gmail.com: Pathfinding functionality review
http://code.google.com/p/ogf-nsi-project/issues/detail?id=50

As the NSI protocol currently stands we would have to fully qualify a reservation request at the root PA. There is no mechanism in place for space based parameters to be negotiated down the tree, since each child PA may select a different set. Something to think about when designing the vlan capabilities. The root PA will need to be able to select the target vlan.
Hi everyone - this is long (sorry) but this is a long held response to the so called "label" issue. We need to engage on this...

Our topology model works just fine for path finding. Period. As is. It is critical for people to understand this. It works for flat VLANs, it works for swapping, it works for essentially any network connection service. Try to get away from conventional signaling ways of thinking and look at how NSI is positioned to do this and the power it brings. NSI is about *connections* - not VLANs, not waves, not LSPs... The abstraction it presents to the user is a connection model that works regardless of underlying technologies.

We want to optimize it so that it is more efficient, but *it does work now* - make no mistake on this. And if we want to optimize it, we need to make sure we understand what is happening and that we are optimizing where it will help. Not simply posing one technology specific nuance.

Given our present NSI topology, VLAN selection is not a problem. Any application agent (uRA) that wishes to initiate a connection between two endpoints can issue a ReservationRequest specifying the STPs it wants as the endpoints. Since that uRA is making the initial request - we *must* assume it knows the endpoints it wishes to connect. There is no way for a PA to presume to know where the RA wants to connect except by the RA specifying the specific endpoints in the connection request. If the uRA has some internal flexibility and is able to use any endpoint from a custom set of usable endpoints, then the uRA can/should just choose one itself and ask for that endpoint in the connection request. We must presume that the requesting agent is also aware of which endpoints are available for its purposes if it knows which are usable. And why would any requesting agent say "oh - just give me a connection from anywhere to anywhere...I don't care..." ??? That's just odd...

So do we need to represent every possible label as an endpoint? As an STP?
Maybe not... We don't necessarily define a host name for every IP address in our /8 - but those IP addresses nevertheless exist. But we do define host names when we want to disassociate the name of the host from the particular IP address it may currently be using. There are probably more efficient ways to represent the endpoint STPs than full enumeration, but *the abstraction is critically important: connections need specific endpoints.*

So we can either specify a symbolic name (STP) to represent the specifics of the topological point it represents, or we need to find another generic way to represent topological points that are [un-enumerated] STPs. Perhaps an arbitrarily long tuple that describes the topological address: network/switch/blade/port/color/label/label/label/label...etc.? (This is similar to using the IP address itself rather than a hostname, right?) In this scenario, we can enumerate named STPs, or we could simply use the raw tuple itself in the ReservationReq (e.g. <network><switch><port><label>) to identify the desired endpoint. This implies some added semantics are necessary in the topology relations, and at least a minor change to the CS protocol to accept either form of an endpoint specification... but it does not break the high level connection abstractions. This is very important. Thoughts?

Another key aspect we must now be explicit about is how we do pathfinding. Path selection is a process of selecting a sequence of hops through networks, nodes, devices etc. In order for a pathfinder to decide if a "hop" will work for a path request, the pathfinder must be able to "model" that hop, or predict its function - match a set of service constraints against the parameters that describe that object. For example: in order to decide if a SONET circuit can be crossconnected through a device, one must be able to understand the functioning of that device - is it an Ethernet switch? SDH? InfiniBand? MPLS?
Likewise, if one wants to know if an NSI Network will form a valid component of a "connection", one must know something about how that NSI Network object functions. We are no longer in a monolithic network where every switching device is implicitly the same. So each topological element must have a Transit Function (TF) that describes it. A la "network", "GOLE", "node", a "switch", etc. Without this, explicitly or implicitly, pathfinding cannot work. End of story.

This TF may be a type code such as F10E300 or C7609, or a generic function: Node, Multiplexor, Demux, Encapsulator, De-encapsulator, Switch, etc. Those codepoints must be explicitly defined so that *all* pathfinders can interpret a TF consistently when they find it in a topology. Is DToX ontology something that could function in this manner? Could we tweak it to do so?

We can possibly define a set of basic "well known" Transfer Functions, but I doubt we will be able to define *all possible* transfer functions, and at some point there will be some device or region which we will not know how to model...an opaque region of the topology. In these cases, the only alternative is to have an agent written that models that particular device and ask that agent to model it for us and just give us a pass/fail for our constraints. So while we may be able to model some generic topological functions, and maybe even some common complex devices, we won't be able to practically model in detail complex regions or the exact details of low level devices...so we need to identify an oracle or an agent associated with each such topological object that we can query to see if the object can/will work, and to reserve the resources for this request.

So for now, all we have in the NSI topology are "NSI Networks". These *DO* have a Transfer Function. It's implicit, and may be the default, but it is the following: "Any [ingress] port can be crossconnected to any [egress] port - if resources are available."
The "any to any" is important, but the real rub is the "if resources are available". So in our current model, the TF is implicit but clear nonetheless, and the "oracle" is the NSA. As we stand now, all NSI Networks by default are assumed to have the "any-to-any" TF. We use that fact to select a candidate path, and then we check resource availability with a ReservationRequest. Since some of our networks cannot actually do "any-to-any" (i.e. they were advertising untruths about their capabilities), those NSAs simply reject the requests they cannot perform as if they had just run out of resources. A robust remote pathfinder will try alternate paths until all possibilities are exhausted.

Indeed, to indicate limited VLAN crossconnectivity we could in fact advertise separate networks - one for vlan 1780, one for vlan 1781, etc. This works. Quite well in fact. And it short circuits the exhaustive search issue...as all crossconnects *can* actually be made if resources are available, and a remote PF simply has to choose the right egress from the preceding network. IF they are advertising correctly, then the exhaustive VLAN search problem is solved.

For example: NL > SL; if NL can swap and indicates so by offering all SDPs in one network, and SL splits into four networks SL80, SL81, SL82, and SL83 to indicate which crossconnects can be made by each service region - each terminating the right link from NL - then it all works...no changes necessary except to the topology description. This may suck as a general solution due to the many VLAN planes required, but it's really a flaw with the Ethernet switching capabilities at SL - it cannot swap VLANs...no wishing will change that. Further, such VLAN planes may not represent VLAN planes at all - they may represent other types of limitations...so simply because they rub the fur the wrong way doesn't mean they are inappropriate or an aberration that will never really be seen.
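The NL > SL example can be sketched as a tiny graph search, assuming Python purely for illustration. Only NL and SL80..SL83 come from the message; the endpoint names (`A`, `B-80`, ...) and the adjacency layout are hypothetical. Because each SL plane is advertised as its own network, every hop the search returns is one the network can actually make, so no trial-and-error over VLANs is needed.

```python
from collections import deque


def build_topology():
    """Hypothetical topology: NL swaps VLANs (one network), SL does not (four planes)."""
    topo = {"A": ["NL"], "NL": ["A"]}
    for plane in ("SL80", "SL81", "SL82", "SL83"):
        endpoint = "B-" + plane[2:]        # e.g. "B-80", reachable only via its own plane
        topo["NL"].append(plane)
        topo[plane] = ["NL", endpoint]
        topo[endpoint] = [plane]
    return topo


def find_path(topo, src, dst):
    """Plain breadth-first search over network adjacencies."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in topo.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


TOPOLOGY = build_topology()
print(find_path(TOPOLOGY, "A", "B-82"))
```

Going from one SL plane to another is only possible by passing through NL, which is how the description expresses that NL can translate VLANs and SL cannot.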
It's not the topology abstractions or the protocol abstractions at fault. But admittedly, searching the space is slow - which is one problem, but not a fault of the architecture or the protocol. We need to understand why such simple ReservationRequests with known available but limited resources, like in the demos, take more than a couple of milliseconds... (!) Even latency across oceans cannot be held blameless here - as a more efficient MTL might require far fewer handshakes... The NSI protocol itself is not chatty.

The false assumption we have is that by knowing topology we will magically be able to speed everything up. There is no free lunch - labels do not fundamentally remove the complexity of the processing and so will only speed things up in certain cases. It would be a tragedy to throw away powerful abstractions for such a small gain. And global topology won't speed things up either; indeed it is far more likely to overwhelm with a deluge of detail. What we *might* be able to do is provide a *better* solution...i.e. even though the complexity is high, we may be able to produce a more exacting path - one that is better in certain measurable ways over conventional hop-by-hop PF.

So let me reiterate: Enumerating the STPs we wish to use to form a resource pool is *NOT* a scaling problem. The size of the XML data representation is not the problem - if it were, we would not insist on using XML based representations. Is it the size of the database that concerns us by representing each possible STP for each VLAN??...nah - we would have to do that anyway if each VLAN were actually used for circuits. So that doesn't fly as a scaling issue either. And in fact, the enumeration of every VLAN does not add computational complexity in itself - it's linear with the number of VLANs available no matter how many networks there are.
And frankly, there is *no* requirement that the XML (OWL) representation must be retained internal to an NSA...we only stipulate that it be in RDF/OWL format for common exchange and common interpretation. An NSA can store the info internally however it deems most efficient. And any decent RDB can index millions of VLAN entries and/or connections and/or semantic relations in a DB effortlessly. So just enumerating the VLANs (labels) as STPs is not really an issue...it's just not.

Where we have a possible concern is the search space: an exhaustive search over a large space can in fact be huge. So we want to tread carefully and do so efficiently. But this complexity problem will not be solved with labels - the search space is still the same whether you pose labels as attributes or as enumerated items. And since all you can minimally assume will be available in the topology are the STPs and SDP adjacencies, you might have to do the search anyway, like it or not. The real constraint is not choosing a label, it's checking the availability of selected resources. The only way to optimize the constraint based search problem is to choose the order of your constraint checking so as to prune the search space effectively. VLAN id is probably not an efficient primary constraint to accomplish that.

Further, how you represent the topology internal to an NSA is an implementation issue. If a remote pathfinder has to search by trial and error to find a successful VLAN...you are ignoring all the other constraints that could either themselves make the search space enormous or could significantly prune the search space a priori if chosen wisely. We cannot dis an exhaustive search for a usable VLAN unless we also look at all the other constraints for their complexity as well. Pathfinding *IS* hard... but you must optimize the entire process - label representation and selection is both a non-issue and the least of our concerns.
And having selected a path, there is *still* the reservation process - even if you had neon lights advertising the available VLAN ids at every boundary, you still need to access the oracle (NSA) to confirm any candidate path. And so you could very easily find yourself in an exhaustive search looking for a VLAN that has the available capacity or adequate buffering or...etc. Labels do not solve the search problem.

The only way to solve it is to provide a full detailed topology to every pathfinder (not likely in the real world or even the R&E world) or ask the local pathfinder with access to the local topology details and state to select a path for you (a much more viable approach). There is a middle ground also, where the remote pathfinder may have some partial (high level) topology view and can select some high level path, and then contact each object (network) along that path and ask them to confirm their portion of it with their internal detailed information. This latter middle ground is IMO the best we can hope for. But even our worst case on minimal topology knowledge works, as we saw last week in Seattle.

Opaque topology regions are not evil. They can be highly useful. They are how the NSI Framework scales. And they are essential to enable each network to assert its own autonomy and to hide the minutiae. Asserting that technology specific nuances are necessary for NSI remote pathfinders to work is simply not true. These past demos are proof.

You can't tell me that it's a protocol scaling issue if the absolute worst case exhaustive search of a VLAN space consisting of roughly 1000 combinations would take more than a few seconds ... if it does, on a network with 12 nodes, we have some very very poor implementations. And unfortunately, labels do not solve this problem. The problem is the fact that you want to do tree segmentation (i.e.
remotely choose the endpoints across each network) and at the same time want the target network to choose the endpoints for you so you don't have to try lots of reservations...duh! What you are implicitly saying is "I really want the local pathfinder to do the hard stuff because it is so much faster dealing with its own topology and has so much more state knowledge"...hmmm....

Finally, I am unable to see a requirement where intermediate VLAN IDs that form a peering between two transit networks are of issue to the original connection request. If the transit network can swap vlans, then you can choose any intermediate vlan you want - i.e. any STP regardless of which VLAN it represents... And if it doesn't do vlan translation but implied it could because it advertised the functionality in the topology and now you are stuck doing an exhaustive search...whose fault is that? It's not an NSI problem if a network misrepresents what it can do or runs out of resources. If a PF simply cannot progress the connection - for VLAN reasons or other resources - it's not going to be solved by labels. As discussed above, a reservation can be rejected for lots of reasons, leading to an exhaustive search for a viable STP. In any case, picking an intermediate point because of its VLAN tag is IMO superfluous. It's what old signaling protocols had to do because of the way they were designed. NSI accomplishes this but in a highly structured manner.

We have an extremely novel and powerful architecture in NSI. Don't trip over the conventional thinking of conventional protocols...if we want conventional - just use GMPLS, or Q2931, or Q2764... or IDCP (:-)

Best regards all - Thanks for reading...
Jerry
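The idea raised in Jerry's message, that a ReservationRequest could accept either a symbolic STP name or a raw topological tuple, might look roughly like the Python sketch below. Everything here is an assumption for illustration: the slash-separated syntax, the class names `NamedStp` and `TupleStp`, and the `parse_endpoint` helper are hypothetical and are not part of the CS protocol.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NamedStp:
    """Symbolic endpoint name, analogous to a hostname."""
    name: str


@dataclass(frozen=True)
class TupleStp:
    """Raw topological address, analogous to an IP address."""
    network: str
    switch: str
    port: str
    labels: Tuple[str, ...]   # zero or more stacked labels


def parse_endpoint(spec: str) -> Union[NamedStp, TupleStp]:
    """Accept either form of endpoint specification (assumed syntax)."""
    if "/" not in spec:
        return NamedStp(spec)
    parts = spec.split("/")
    if len(parts) < 3 or any(not part for part in parts):
        raise ValueError(f"malformed endpoint tuple: {spec!r}")
    network, switch, port, *labels = parts
    return TupleStp(network, switch, port, tuple(labels))


print(parse_endpoint("stp-a"))
print(parse_endpoint("netA/sw1/port3/1780"))
```

Either form still resolves to one specific endpoint, so the point-to-point connection abstraction is unchanged; only the naming differs.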
_______________________________________________ nsi-wg mailing list nsi-wg@ogf.org http://www.ogf.org/mailman/listinfo/nsi-wg
Hello, On 29 Nov 2011, at 03:47, Jerry Sobieski wrote:
Hi everyone - this is long (sorry) but this is a long held response to the so called "label" issue. We need to engage on this...
Our topology model works just fine for path finding. Period. As is. It is critical for people to understand this. It works for flat VLANs, it works for swapping, it works for essentially any network connection service. Try to get away from conventional signaling ways of thinking and look at how NSI is positioned to do this and the power it brings. NSI is about *connections* - not VLANs, not waves, not LSPs... The abstraction it presents to the user is a connection model that works regardless of underlying technologies.
Our current topology model works only if a 1 in 4 chance of getting the right VLAN across a network is acceptable. However, we're still using only 4 VLANs; once we go to 4096, we get to a 1 in 4096 chance.

Pathfinding currently is done by the Aggregator NSA handling the request. He looks at the current topology, sees thousands of parallel SDPs, and for crossing several domain boundaries he just has to pick one randomly. I don't know about you, but I don't consider that "working just fine".

In the demonstration at SC we relied on the human to make requests from one endpoint to another endpoint, using the same VLAN. I have not seen any requests made using different VLAN labels. Also, I have seen and heard that NSA implementations used the last part of the ID to figure out the correct label and use that in their pathfinding algorithms. I do not think that that is a desirable solution.

Let me reiterate: The current NSI implementation is completely unaware of labels. This makes it near impossible to make informed decisions about paths crossing several domains. For each domain a path crosses, the chance of finding the right path decreases exponentially.

The only way to make label-unaware pathfinding work is by making 4096 versions of each of the different domains in the global network. The connections between those different networks will then depend on the label-swapping capabilities of those networks. Even with that solution, it is still hard, due to availability and correlations between paths (if I use 10 Gb on one label, I can't use it again on a different label).

Note also that the number of domain descriptions will increase exponentially as soon as we start considering multi-layer networks.

Jeroen.
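Jeroen's "decreases exponentially" claim can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is not stated in the message: at each domain boundary exactly one of N parallel SDPs carries the right label and the label-unaware pathfinder picks uniformly at random, independently per boundary. The function name is hypothetical.

```python
def chance_of_right_labels(labels_per_boundary: int, boundaries: int) -> float:
    """Probability that blind random picks hit the right label at every boundary."""
    if labels_per_boundary < 1 or boundaries < 0:
        raise ValueError("need at least one label and a non-negative boundary count")
    return (1.0 / labels_per_boundary) ** boundaries


for n in (4, 4096):
    for k in (1, 2, 3):
        print(f"{n} labels, {k} boundaries: {chance_of_right_labels(n, k):.3g}")
```

With 4 labels this reproduces the 1 in 4 figure for a single boundary, and with 4096 labels the 1 in 4096 figure; each additional boundary multiplies the odds by the same factor.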
Hello,

I think it's most important to identify the requirements that we have for the topology, and work from there:
- We need to have some distribution method for topology
- This method must be maintainable for changes in the network, so it should allow updates.
- It must be possible to request a connection from port A with VLAN X to port B with VLAN X.
- It must be possible to request a connection from port A with VLAN X to port B with VLAN Y.

Nice to haves:
- Dynamic availability information for both links and labels.

Do we also want to include a connection from port A to port B where you don't care about the label?

After we have the requirements, we can look at solutions, and how these solutions solve those requirements, and what the implications of those solutions are.

Jeroen.
Rebuttals in line:-) On 11/30/11 5:30 AM, Jeroen van der Ham wrote:
Hello,
I think it's most important to identify the requirements that we have for the topology, and work from there:
- We need to have some distribution method for topology
- This method must be maintainable for changes in the network, so it should allow updates.
- It must be possible to request a connection from port A with VLAN X to port B with VLAN X.
- It must be possible to request a connection from port A with VLAN X to port B with VLAN Y.

The first two I am in agreement with. The second two I will argue are not real requirements of topology - they reflect some conventional notions of traditional signaling protocols and assume specific technology. Try to remember that the objective of NSI is to build *connections* - not VLANs per se. In NSI we have an abstracted model of a "connection" as a conduit for transporting payload data between two endpoints. These connections simply ride atop the infrastructure, whatever it is. So the VLAN itself is not critical to NSI.
I assert the following:
- In an *ethernet* environment and traditional protocols you might expect this to be necessary, but it's not a broad based requirement for NSI. We need to generalize the sentiment in order to keep the abstractions.
- We *can* reserve a connection from a specific port and VLAN by associating the switch, port, vlan etc. with an STP. As long as the RA somehow maps the VLAN to the STP, the RA places that STP as the endpoint. Simple. The CS protocol can do this.

Further, since NSI addresses the inter-domain problem - where external agents do not pry into local affairs but ask politely for services to be provided - we have specified NSA/NRMs to deal with local pathfinding and resource allocation. The remote pathfinder may not have sufficient information available to make a VLAN selection...indeed probably will not (VLAN selection is not simply finding an available VLAN id). I would argue that the RA either knows a priori which STPs are associated with the VLANs it wants, or it doesn't care. But the PA doesn't care.

The question is: does a remote pathfinder have access to the technology specific details and the current state information from the local network? I.e. you must know both in order to short circuit the exhaustive search issue. If it does, the remote pathfinder discerns the STP it needs from that topology information and makes a reservation request using that STP. It is a vlan specific request. But it does not make the CS protocol vlan sensitive. Even if the remote [RA] pathfinder knows which VLAN it wants and knows the associated STP but has no internal state information about the STPs, then it must still guess.

So the real issue is not whether we can request a specific VLAN, but knowing which STP represents a specific VLAN, on a particular port, on a particular switch, in a particular network *and its state*. This is starting to be a lot of detailed information that is all internal to a foreign network.
Nice to haves: - Dynamic availability information for both links and labels.
As stated above, unless you have this "availability" information for your labels (or *any* termination point), you will end up guessing at their availability - which means you still have not solved the exhaustive search problem.

But more generally, availability is in fact "state" information. This is a *real* scaling challenge as state is myriad and changes often. The different state values associated with a topological object as simple as a VLAN might be:
a) is it operationally up or down?
b) is it allocated or available?
c) how much of the resource is available?

Flooding this information for 4000 different vlans on every port is impractical, let alone a whole network making this information available to any/every other curious agent. I think we can find a middle ground and say we want to update *some* minimal state such as "operational availability", at some topological aggregation level, but even this is non-trivial given the related aspects across labels, ports, and/or other groupings. And since this is proprietary information, you must be prepared to not have *any* such state associated with the topology you know about.

In one respect, learning/knowing about topology is itself a state update, e.g. do you flood/broadcast/publish topology updates when a link goes up/down? Or just when it is permanently added or removed from the infrastructure? Topology and state distribution are two heads of the same detail+coherency monster and pose a multitude of serious scaling challenges. (!!)

So (IMO) this is useful to explore and we should consider this, but with an eye to the significant scaling issues in a large global multi-domain network.
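To give a feel for the volume of per-label state discussed above, here is a rough Python sketch. The record layout mirrors the three state values (a), (b), (c) listed in the reply; the field names, the `available_mbps` unit, the 48-port figure in the usage line, and the "any label up means port up" aggregation rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LabelState:
    operational: bool      # (a) operationally up or down
    allocated: bool        # (b) allocated or available
    available_mbps: int    # (c) how much of the resource is available


def records_to_flood(ports: int, labels_per_port: int) -> int:
    """Number of state records if every label on every port is advertised."""
    return ports * labels_per_port


def port_is_operational(labels: Iterable[LabelState]) -> bool:
    """One possible aggregate: a single up/down flag per port."""
    return any(state.operational for state in labels)


print(records_to_flood(ports=48, labels_per_port=4000))
```

Aggregating to one flag per port cuts the record count by the number of labels, at the cost of hiding exactly the per-label availability a remote pathfinder would need to avoid guessing.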
Do we also want to include a connection from port A to port B where you don't care about the label?
This suggestion is not really a topology issue but a CS Protocol issue - it breaks the "specific endpoint" (point to point) semantics of the Connection request. Do you want a raw unlabeled connection? Or a labeled connection but where you don't care which label? Will a stacked label be acceptable (e.g. QinQ)? What if the port is not a basic ethernet port, e.g. what if the port is a WDM port carrying many different colors each with different framing? What would you specify as the "endpoint" for a "Connection" request?

If you don't care, then why can't the RA take the responsibility to simply select one regardless of the underlying labeling? And thus you leave it to the local NRM to engineer the connection internally to its network between the two *specific* endpoints requested by the RA. The easy answer here is for the RA to not use tree segmentation at all but to specify a downstream endpoint and a chain request and let the local PF decide the local egress point.

If we are going to break the basic pt-to-pt "Connection" abstraction that requires specific endpoints, such that the Termination points are no longer fixed but a set of acceptable [constrained] components of a connection, perhaps we should generalize it to treat such sets as *constraints* on the connection rather than fundamental components of a connection. This could actually work. (This is called *anycast* in the literature, or pt-to-anypt.) We would need to review the CS protocol, but this model would still pose an abstracted "connection", though the abstraction gets a bit weirder: it results in an ordered set of resources whose only requirement is adjacency. This is an interesting prospect, but I would place it as a potential feature of version 3+, possibly along with pt-to-mp, negotiation, and volume requests.

What I think is required in the immediate term is the following:
- A *concise* means of expressing STPs that does not change their semantics.
They are still "Service Termination Points" and still map to particular internal topological constructs, but we find a more efficient representation that also integrates well with the adjacency advertisements (SDPs). The recursive nature of various framing technologies and label stacking would tend to make labels (where they exist) good candidates for children nodes in a topology tree, but perhaps there is a mechanism that would allow us to create "summarized" children (?) in the topology representation that is a more efficient means of expressing the semantics but leaves the abstraction alone. (For instance, if we expand STP resources to describe a set of canonically [sequential] tags, we can maintain the current elegance of the model, while at the same time providing a label functionality.) Thoughts?
- We also need a means of expressing basic abstract topological objects (e.g. nodes, ports, links, agents) that allows for recursive relational descriptions and maintains the strict technology agnostic abstractions the NSI Framework requires, yet allows us to complement this basic ontology with enhanced detail - under local advertisement control of course. Remote PF implementations can leverage whatever detail is available to optimize the PF/Reservation process. Thoughts?
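One way to read the "summarized children" suggestion is an STP that carries a range of sequential tags and can be expanded to individual STPs on demand. The Python sketch below assumes a `base:first-last` text syntax and the names `RangedStp` and `parse_ranged_stp`; none of these come from the NSI topology schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RangedStp:
    base: str     # hypothetical network-local port identifier
    first: int    # first tag in the sequential range
    last: int     # last tag in the range, inclusive

    def __contains__(self, tag: int) -> bool:
        return self.first <= tag <= self.last

    def expand(self) -> List[str]:
        """Enumerate the individual STP names the summary stands for."""
        return [f"{self.base}:{tag}" for tag in range(self.first, self.last + 1)]


def parse_ranged_stp(text: str) -> RangedStp:
    """Parse 'base:first-last' or 'base:tag' (assumed syntax)."""
    base, _, span = text.rpartition(":")
    first_text, _, last_text = span.partition("-")
    first = int(first_text)
    last = int(last_text) if last_text else first
    if not base or first > last:
        raise ValueError(f"malformed ranged STP: {text!r}")
    return RangedStp(base, first, last)


stp = parse_ranged_stp("netA:port3:1780-1783")
print(1781 in stp, stp.expand())
```

The summary stays a single topology entry, while each expanded name is still a specific endpoint, so a request can continue to name exactly one STP.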
On 30 Nov 2011, at 11:01, Jeroen van der Ham wrote:
Hello,
On 29 Nov 2011, at 03:47, Jerry Sobieski wrote:
Hi everyone - this is long (sorry) but this is a long held response to the so called "label" issue. We need to engage on this...
Our topology model works just fine for path finding. Period. As is. It is critical for people to understand this. It works for flat VLANs, it works for swapping, it works for essentially any network connection service. Try to get away from conventional signaling ways of thinking and look at how NSI is positioned to do this and the power it brings. NSI is about *connections* - not VLANs, not waves, not LSPs... The abstraction it presents to the user is a connection model that works regardless of underlying technologies.

Our current topology model works in that a 1 in 4 chance of getting the right VLAN across a network is acceptable. However, we're still using only 4 VLANs; once we go to 4096, we get to a 1 in 4096 chance.

In most cases you could look at this as actually more likely to work. In SC topo if we had one VLAN in use (25% utilization), we had a 75% chance of a successful second choice. In the scenario above if we have one VLAN in use, we have a 99.92% chance of a correct hit for the second VLAN (!). And we would still have better chances for the first 1000 VLANs we randomly choose!!!! And if we have 1000 vlans in service (25% utilization) we still have a 75% chance of a successful choice.
Pathfinding currently is done by the Aggregator NSA handling the request. He looks at the current topology, sees thousands of parallel SDPs and for crossing several domain boundaries, he just has to pick one randomly. I don't know about you, but I don't consider that "working just fine".
Sure it does. Why not? What would you do? The (implicit) Transit Function of the networks in the path was "We can connect any port to any other port if resources are available." That is a very powerful statement. If that network cannot actually do that, then it's not a failure of the topology or the protocol.
A remote pathfinder will never have a crystal ball and ultimately still must consult the local authoritative Resource Manager... So the remote PF (the aggregator in your example above) will *always* be subject to the local NRM rejecting the request no matter how specific or well informed you are. Labels don't solve this basic problem that the remote PF is not authoritative - it *must* consult local agents and is dependent upon them confirming the path...or it fails.
In the demonstration at SC we relied on the human to make requests from one endpoint to another endpoint, using the same VLAN. I have not seen any requests made using different VLAN labels. Also, I have seen and heard that NSA implementations used the last part of the ID to figure out the correct label and use that in their pathfinding algorithms. I do not think that that is a desirable solution.
While I am afraid and _/literally appalled (!)/_ that some NSAs may have indeed parsed the STP name for a vlan hint, this was incorrect and is easily broken. It makes totally incorrect use of the topo information and is a really REALLY BAD assumption. (I put that vlan info in the STP tag to make it easier for developers to debug things - not as a shortcut for anything...rest assured the next topo file will have no such human readability.) This is like parsing an IP Hostname (www.google.com) to recover its IP address...it doesn't work. I can easily create a topology that describes the same SC layout that breaks those implementations. Would you trust other networks to be so exacting? STPs are symbolic references - they do not contain any technology specific information themselves.
We relied on humans to simply *optimize* the selection order - to "choose wisely". If any arbitrary pair of STPs is requested, the PA should either reserve it or reject it. And if the path finder in the remote NSA "chooses poorly" and is not robust enough to try another possible path, then that is a very weak implementation - not a flaw in either the NSI CS protocol or the Topology we used for SC. Further, even the human end point educated guess could fail in many cases. A reasonable pathfinder *must* be prepared to try alternate paths in the case of blocking conditions or take responsibility for not doing so...it's not the standard's responsibility to make sure resources are available in every network. Indeed, we could have redefined the topology slightly to reflect the separate VLAN planes at, say, StarLight - by defining separate NSI Networks for each VLAN. This would have made explicit in the topology the constraint that certain STPs cannot be cross-connected to other STPs. Just as I get grief about the STPs enumeration, I also got seriously flamed for this approach as well. BUT BOTH WORK! All you need is a fundamentally simple pathfinder. And this latter separated vlan planes approach works better than what we had at SC because it expresses more topological constraints than the topology we actually used - and I bet most of the pathfinders would have eaten it up just fine. So I don't want to hear that seriously flawed implementations and weak pathfinders are the excuse for changing the topology model or the abstractions of the architecture.
Let me reiterate: The current NSI implementation is completely unaware of labels. This makes it near impossible to make informed decisions about paths crossing several domains. For each domain a path crosses the chance of finding the right path decreases exponentially.
What do you mean by an "informed" decision? Even if you knew all about the labels there is no guarantee that the other constraints on the connection are available, i.e. the endpoint (labeled or otherwise) is just one constraint that must be met for success.
The chance of finding a successful path is a function of the number of labels, the diameter of the network, *AND* the availability of those labels, *AND* the algorithm for selecting the trial order by the RA, *AND* most importantly the availability of the other transit resources. Yes the worst case is exponential...but the *likelihood* of the worst case is of equal importance. The easiest way to reduce the likelihood of a worst case exhaustive search is to provide *MORE* STPs and do a random trial order. This would make the likelihood of a hit comparatively much higher. Of course a better solution would be to have access to all topology state...but that poses equally exponentially complex issues and is not going to happen either.
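The "random trial order" and "try alternate paths" behaviour argued for above could look roughly like the sketch below. It is an assumption-laden illustration, not how any NSA actually works: the function name, the `confirm` callback standing in for the local NRM's accept/reject answer, and the STP names are all hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence

Path = Sequence[str]  # an ordered list of STP identifiers


def reserve_with_retries(
    candidates: Iterable[Path],
    confirm: Callable[[Path], bool],
    max_attempts: int,
    rng: Optional[random.Random] = None,
) -> Optional[Path]:
    """Try candidate paths in random order until one is confirmed.

    `confirm` stands in for asking the authoritative local agents to
    reserve the path; the remote pathfinder cannot know the answer
    in advance, so it has to try.
    """
    rng = rng or random.Random()
    trial_order: List[Path] = list(candidates)
    rng.shuffle(trial_order)  # random trial order, as suggested above
    for path in trial_order[:max_attempts]:
        if confirm(path):
            return path
    return None  # every attempted path was blocked


# Illustrative data: only the second candidate is actually free.
candidates = [
    ["A1", "B22", "B78", "C6"],
    ["A2", "B23", "B79", "C7"],
    ["A3", "B24", "B80", "C8"],
]
free = {("A2", "B23", "B79", "C7")}
chosen = reserve_with_retries(
    candidates, lambda p: tuple(p) in free, max_attempts=3,
    rng=random.Random(0),
)
```

Bounding the loop with `max_attempts` keeps the retry cost explicit, which matters once the candidate set grows from a handful of paths to thousands.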
The only way to make label unaware pathfinding work is by making 4096 versions of each of the different domains in the global network.
While this would work, it's not the *only* way to work. Proof: It worked for SC.
The connections between those different networks will then depend on the label-swapping capabilities of those networks.

Sigh. Let's face it: The reason VLANs pose a problem is that they block easily. The better networks will implement label swapping switching technologies. Flat vlans just don't scale well on a global basis. Particularly with existing conventional ethernet hardware. For instance: Even if you knew VLAN 1780 was available between StarLight and NetherLight and also available between StarLight and ESnet, if 1780 was in use on the port facing JGNX it would be unavailable to any other crossconnect. It would be blocked for your use between NL and ESnet. Which means you would have to select a different egress VLAN at NL *and* at ESnet. So just knowing which VLANs are available on one port does not tell you if it is available internally or the likelihood that it might be. It's a crap shoot. A guess. A shot in the dark. Conventional Ethernet sucks for global provisioning. Accept this my child and enlightenment will open your eyes. (:-)
Seriously, Label Swapping was designed to avoid this issue. LS makes all labels "link local." The label assignment in an LS network is *not* based upon label availability within the network but on the link alone. Label swapping can be performed extremely fast and scales well - thus the success of MPLS. If we had ethernet hardware that did VLAN swapping *and* per-port VLAN scoping, we would have label swapping. Flat VLANs become just a bad dream. 802.1ah (PBB) addresses this issue and others.
Even with that solution, it is still hard, due to availability, and correlations between paths (if I use 10gb on one label, I can't use it again on a different label).

Exactly. At some point you will realize that pathfinding is not deterministic in an active network - you can optimize the process, but you cannot predict it or find an optimal path unless the global network is static and you know *ALL* the state details. Anything short of this omniscience means that we have to accept the fact that we may encounter blocking in the network for any number of reasons and all we can do is try another path, or fail. Path reservation is a two pass process: A high level candidate path selection followed by a low level confirmation pass...unless the confirmation process completes you cannot use it. And there is no practical way to know a priori which paths will work. You have to try. This is pathfinding.
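The difference between the flat-VLAN blocking case (VLAN 1780 at StarLight) and link-local labels can be sketched as a toy model. The port names and the per-port "in use" sets below are made up for illustration, and the two functions are a simplification under those assumptions rather than a description of any real switch.

```python
from typing import Dict, Optional, Set, Tuple

InUse = Dict[str, Set[int]]  # port name -> labels already in use on that port


def flat_vlan_ok(in_use: InUse, vlan: int) -> bool:
    """Flat VLANs are switch-wide: use on any port blocks that VLAN ID."""
    return all(vlan not in used for used in in_use.values())


def swap_labels(in_use: InUse, ingress: str, egress: str,
                label_space: range) -> Optional[Tuple[int, int]]:
    """With label swapping each link picks its own free label independently."""
    def first_free(port: str) -> Optional[int]:
        return next((lbl for lbl in label_space if lbl not in in_use[port]), None)

    a, b = first_free(ingress), first_free(egress)
    return None if a is None or b is None else (a, b)


# Hypothetical state: 1780 is free toward NetherLight and ESnet,
# but already in use on the port facing JGNX.
starlight: InUse = {
    "to-NetherLight": set(),
    "to-ESnet": set(),
    "to-JGNX": {1780},
}

flat_result = flat_vlan_ok(starlight, 1780)
swap_result = swap_labels(starlight, "to-NetherLight", "to-ESnet",
                          range(1780, 1784))
```

In the flat model the request is blocked even though both ports it touches are free; in the link-local model only the two links involved matter.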
Note also that the number of domain descriptions will increase exponentially as soon as we start considering multi-layer networks.

I am not sure I agree with this. Topology hiding and transfer functions make this a far simpler problem. The overall complexity is not reduced, but we delegate responsibility to agents who have the detailed information and authority to allocate the resources. So the more topology and state you try to express the harder the problem becomes. At some point we have to accept that summarization is the only way we can hope to make this scale and that pathfinding will be a non-deterministic process - based on probabilities of success, but guesses nonetheless. We want to always "choose wisely" but understand that we won't always be so lucky.
Thanks for your dedication to this issue, Jeroen. I appreciate your intensity. Best regards Jerry
Jeroen. _______________________________________________ nsi-wg mailing list nsi-wg@ogf.org http://www.ogf.org/mailman/listinfo/nsi-wg
Hi, On to the fine points of topology handling: On 1 Dec 2011, at 00:44, Jerry Sobieski wrote:
Our current topology model works in that a 1 in 4 chance of getting the right VLAN across a network is acceptable. However, we're still using only 4 VLANs; once we go to 4096, we get to a 1 in 4096 chance.

In most cases you could look at this as actually more likely to work. In SC topo if we had one VLAN in use (25% utilization), we had a 75% chance of a successful second choice. In the scenario above if we have one VLAN in use, we have a 99.92% chance of a correct hit for the second VLAN (!). And we would still have better chances for the first 1000 VLANs we randomly choose!!!! And if we have 1000 vlans in service (25% utilization) we still have a 75% chance of a successful choice.
I'm not talking about availability, I'm talking about compatibility. As it is right now, the inter-domain pathfinding is done by the Aggregator NSA agent. This aggregator agent has a view of the inter-domain topology, where STPs are mapped to ports with VLANs. So, we assume there are 4096 different STPs for one port. The actual value of the VLAN label is not available to the Aggregator agent when it is doing inter-domain pathfinding.

A path planned through an inter-domain object currently consists of a consecutive list of STPs. For example (domains are identified using the first letter):

[source, A1, B22, B78, C6, C42, D09, destination]

We do not know the value of the underlying VLAN labels. The inter-domain segments in this case are (A1,B22), (B78,C6) and (C42,D09). Since we have 4096 labels, there are 4096 different options for each of those segments. How do we know that we use the same VLAN label on all three of those segments? It could very well be that this is an empty network, all of them are available, yet we still only have a 1 in 4096^2 = 16,777,216 chance of getting the right option.
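The arithmetic above can be checked with a short sketch. It assumes, as the example does, that a label-unaware pathfinder picks one STP per inter-domain segment uniformly and independently, and that the path only works if every segment lands on the same label. The function name is just for illustration.

```python
from fractions import Fraction


def same_label_chance(labels_per_segment: int, segments: int) -> Fraction:
    """Chance that independent uniform picks on every segment coincide.

    The first segment may land on any label; each further segment must
    match it, which contributes a factor of 1/labels_per_segment.
    """
    return Fraction(1, labels_per_segment ** (segments - 1))


# Three inter-domain segments: (A1,B22), (B78,C6), (C42,D09).
full_scale = same_label_chance(4096, 3)
sc_demo = same_label_chance(4, 3)

# Mean number of blind guesses (with replacement) before a match.
expected_guesses = 1 / full_scale
```

With the 4 VLANs used at SC the same formula gives 1 in 16, which shows how quickly the odds change once the label space grows.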
In the demonstration at SC we relied on the human to make requests from one endpoint to another endpoint, using the same VLAN. I have not seen any requests made using different VLAN labels. Also, I have seen and heard that NSA implementations used the last part of the ID to figure out the correct label and use that in their pathfinding algorithms. I do not think that that is a desirable solution. While I am afraid and _/literally appalled (!)/_ that some NSAs may have indeed parsed the STP name for a vlan hint, this was incorrect and is easily broken. It makes totally incorrect use of the topo information and is a really REALLY BAD assumption. (I put that vlan info in the STP tag to make it easier for developers to debug things - not as a shortcut for anything...rest assured the next topo file will have no such human readability.) This is like parsing an IP Hostname (www.google.com) to recover its IP address...it doesn't work. I can easily create a topology that describes the same SC layout that breaks those implementations. Would you trust other networks to be so exacting? STPs are symbolic references - they do not contain any technology specific information themselves.
We are all aware of that. However, with the current topology in the demonstration we did not have any other option.
So I don't want to hear that seriously flawed implementations and weak pathfinders are the excuse for changing the topology model or the abstractions of the architecture.
I am not putting this up as an excuse, I'm observing reality. Pathfinders will indeed have to become more robust and do retries on failed paths. Given my above calculations, that retry should be really robust. We will also have to update our timeout values to use a timescale of days, since trying 16 million different paths is going to take a long time.
Let me reiterate: The current NSI implementation is completely unaware of labels. This makes it near impossible to make informed decisions about paths crossing several domains. For each domain a path crosses the chance of finding the right path decreases exponentially.

What do you mean by an "informed" decision? Even if you knew all about the labels there is no guarantee that the other constraints on the connection are available, i.e. the endpoint (labeled or otherwise) is just one constraint that must be met for success.
The chance of finding a successful path is a function of the number of labels, the diameter of the network, *AND* the availability of those labels, *AND* the algorithm for selecting the trial order by the RA, *AND* most importantly the availability of the other transit resources. Yes the worst case is exponential...but the *likelihood* of the worst case is of equal importance. The easiest way to reduce the likelihood of a worst case exhaustive search is to provide *MORE* STPs and do a random trial order. This would make the likelihood of a hit comparatively much higher. Of course a better solution would be to have access to all topology state...but that poses equally exponentially complex issues and is not going to happen either.
The only way to make label unaware pathfinding work is by making 4096 versions of each of the different domains in the global network.
While this would work, it's not the *only* way to work. Proof: It worked for SC.
I do not want to hear that it worked for SC. We had a flawed implementation on a toy-scale model, where humans were imposing constraints on which paths were requested.
The connections between those different networks will then depend on the label-swapping capabilities of those networks.

Sigh. Let's face it: The reason VLANs pose a problem is that they block easily. The better networks will implement label swapping switching technologies. Flat vlans just don't scale well on a global basis. Particularly with existing conventional ethernet hardware. For instance: Even if you knew VLAN 1780 was available between StarLight and NetherLight and also available between StarLight and ESnet, if 1780 was in use on the port facing JGNX it would be unavailable to any other crossconnect. It would be blocked for your use between NL and ESnet. Which means you would have to select a different egress VLAN at NL *and* at ESnet. So just knowing which VLANs are available on one port does not tell you if it is available internally or the likelihood that it might be. It's a crap shoot. A guess. A shot in the dark. Conventional Ethernet sucks for global provisioning. Accept this my child and enlightenment will open your eyes. (:-)
Conventional Ethernet sucks balls. But we're pretty much stuck with it for a good while yet. Over the next few years label swapping will become easier to do, but on the other hand we will also see a rise in wavelength requests. Wavelength switching is possible, but costly; it is much easier to use the same wavelength through a whole path.
Note also that the number of domain descriptions will increase exponentially as soon as we start considering multi-layer networks.

I am not sure I agree with this. Topology hiding and transfer functions make this a far simpler problem. The overall complexity is not reduced, but we delegate responsibility to agents who have the detailed information and authority to allocate the resources. So the more topology and state you try to express the harder the problem becomes. At some point we have to accept that summarization is the only way we can hope to make this scale and that pathfinding will be a non-deterministic process - based on probabilities of success, but guesses nonetheless. We want to always "choose wisely" but understand that we won't always be so lucky.
Thanks for your dedication to this issue, Jeroen. I appreciate your intensity.
Likewise! I appreciate the discussion. But please try to keep this concise. Jeroen.
Comment #3 on issue 50 by guy.robe...@gmail.com: Pathfinding functionality review http://code.google.com/p/ogf-nsi-project/issues/detail?id=50
* 2 camps: Jerry/Kudoh-san and Chin/Jeroen
* Chin/Jeroen camp would like to see explicit labels, i.e. allow vlans to be explicitly identified – (Explicit)
* Jerry/Kudoh-san team would like to allow abstraction and aggregation to be
* The down-side with Abstracted is that it will then need some kind of local lookup to find the actual vlan. Pathfinding is also more difficult
* The down-side with Explicit is that it means that vlans are advertised – also aggregation is more difficult.
This topic will be discussed on the mailing list
Comment #4 on issue 50 by guy.robe...@gmail.com: Pathfinding functionality review http://code.google.com/p/ogf-nsi-project/issues/detail?id=50 For a more detailed discussion on this issue see the following Google document: https://docs.google.com/document/d/1xXxRmH9LUuZoq-Ce-1clvLPDBUI9iiJEkmmw2bej...
participants (3): Jeroen van der Ham, Jerry Sobieski, ogf-nsi-project@googlecode.com