Rebuttals inline :-)

On 11/30/11 5:30 AM, Jeroen van der Ham wrote:

Hello,

I think it's most important to identify the requirements that we have for the topology, and work from there:

- We need to have some distribution method for topology
- This method must be maintainable for changes in the network, so it should allow updates.
- It must be possible to request a connection from port A with VLAN X to port B with VLAN X.
- It must be possible to request a connection from port A with VLAN X to port B with VLAN Y.

The first two I agree with. The last two, I will argue, are not real requirements of topology - they reflect some conventional notions of traditional signaling protocols and assume a specific technology. Try to remember that the objective of NSI is to build *connections*, not VLANs per se. In NSI we have an abstracted model of a "connection" as a conduit for transporting payload data between two endpoints. These connections simply ride atop the infrastructure, whatever it is. So the VLAN itself is not critical to NSI.

I assert the following:
- In an *ethernet* environment with traditional protocols you might expect this to be necessary, but it's not a broad-based requirement for NSI. We need to generalize the sentiment in order to keep the abstractions.
- We *can* reserve a connection from a specific port and VLAN by associating the switch, port, VLAN, etc. with an STP. As long as the RA somehow maps the VLAN to the STP, the RA places that STP as the endpoint. Simple. The CS protocol can do this.
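A minimal sketch of that mapping, assuming a network publishes a table from opaque STP identifiers to its own (switch, port, VLAN) bindings. The URNs, the `LocalBinding` type, and `stp_for` are all hypothetical; the point is only that the RA finds the STP by lookup in published information, never by parsing the identifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class LocalBinding:
    """Local attributes a network chooses to publish for one STP (hypothetical)."""
    switch: str
    port: str
    vlan: int

# Hypothetical published map. STP ids are opaque symbols: nothing in the id
# itself is meant to be interpreted, so the binding must come from this table.
STP_MAP: Dict[str, LocalBinding] = {
    "urn:example:net-a:stp-0001": LocalBinding("sw1", "eth1", 1780),
    "urn:example:net-a:stp-0002": LocalBinding("sw1", "eth1", 1781),
    "urn:example:net-a:stp-0003": LocalBinding("sw1", "eth2", 1780),
}

def stp_for(switch: str, port: str, vlan: int) -> Optional[str]:
    """Return the STP id bound to (switch, port, vlan), or None if none is published."""
    wanted = LocalBinding(switch, port, vlan)
    for stp_id, binding in STP_MAP.items():
        if binding == wanted:
            return stp_id
    return None
```

If the network publishes no binding for a given VLAN, the RA gets `None` and is back to "it doesn't care" or guessing.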

Further, since NSI addresses the inter-domain problem - where external agents do not pry into local affairs but ask politely for services to be provided - we have specified NSA/NRMs to deal with local pathfinding and resource allocation. The remote pathfinder may not have sufficient information available to make a VLAN selection... indeed it probably will not (VLAN selection is not simply finding an available VLAN id). I would argue that the RA either knows a priori which STPs are associated with the VLANs it wants, or it doesn't care. But the PA doesn't care.

The question is: does a remote pathfinder have access to the technology-specific details and the current state information of the local network? I.e. you must know both in order to short-circuit the exhaustive search issue. If it does, the remote pathfinder discerns the STP it needs from that topology information and makes a reservation request using that STP. It is a VLAN-specific request, but it does not make the CS protocol VLAN-sensitive.

Even if the remote [RA] pathfinder knows which VLAN it wants and knows the associated STP, if it has no internal state information about the STPs it must still guess.

So the real issue is not whether we can request a specific VLAN, but knowing which STP represents a specific VLAN, on a particular port, on a particular switch, in a particular network and its state. This is starting to be a lot of detailed information that is all internal to a foreign network.


Nice to haves:
- Dynamic availability information for both links and labels.

As stated above, unless you have this "availability" information for your labels (or *any* termination point), you will end up guessing at their availability - which means you still have not solved the exhaustive search problem.

But more generally, availability is in fact "state" information. This is a *real* scaling challenge as state is myriad and changes often. The different state values associated with a topological object as simple as a VLAN might be: a) is it operationally up or down? b) is it allocated or available? c) how much of the resource is available? Flooding this information for 4000 different vlans on every port is impractical, let alone a whole network making this information available to any/every other curious agent.

I think we can find a middle ground and say we want to update *some* minimal state such as "operational availability", at some topological aggregation level, but even this is non-trivial given the related aspects across labels, ports, and/or other groupings. And since this is proprietary information, you must be prepared to not have *any* such state associated with the topology you know about.

In one respect, learning/knowing about topology is itself a state update, e.g. do you flood/broadcast/publish topology updates when a link goes up/down? Or just when it is permanently added to or removed from the infrastructure? Topology and state distribution are two heads of the same detail+coherency monster and pose a multitude of serious scaling challenges (!!). So (IMO) this is useful to explore and we should consider it, but with an eye to the significant scaling issues in a large global multi-domain network.


Do we also want to include a connection from port A to port B where you don't care about the label?

This suggestion is not really a topology issue but a CS Protocol issue - it breaks the "specific endpoint" (point-to-point) semantics of the Connection request. Do you want a raw unlabeled connection? Or a labeled connection where you don't care which label? Would a stacked label be acceptable (e.g. QinQ)? What if the port is not a basic Ethernet port, e.g. a WDM port carrying many different colors, each with different framing? What would you specify as the "endpoint" for a "Connection" request? If you don't care, then why can't the RA take responsibility for simply selecting one, regardless of the underlying labeling? That leaves it to the local NRM to engineer the connection internally to its network between the two *specific* endpoints requested by the RA. The easy answer here is for the RA not to use tree segmentation at all, but to specify a downstream endpoint and a chain request and let the local PF decide the local egress point.

If we are going to break the basic pt-to-pt "Connection" abstraction that requires specific endpoints, such that the termination points are no longer fixed but a set of acceptable [constrained] components of a connection, perhaps we should generalize it to treat such sets as constraints on the connection rather than as fundamental components of a connection. This could actually work. (This is called anycast, or pt-to-anypt, in the literature.) We would need to review the CS protocol, but this model would still present an abstracted "connection"; the abstraction just gets a bit weirder: it results in an ordered set of resources whose only requirement is adjacency. This is an interesting prospect, but I would place it as a potential feature of version 3+, possibly along with pt-to-mp, negotiation, and volume requests.
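To make the "sets as constraints" idea concrete, here is a hypothetical sketch in which an endpoint is either one specific STP or a set of acceptable STPs, and a constrained request expands into the concrete point-to-point pairs a pathfinder could try. None of these types exist in the CS protocol; `Specific`, `AnyOf`, and the STP names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, Tuple, Union

@dataclass(frozen=True)
class Specific:
    """A fixed Service Termination Point, as in today's point-to-point request."""
    stp: str

@dataclass(frozen=True)
class AnyOf:
    """A constraint: any one of these STPs is an acceptable termination point."""
    stps: FrozenSet[str]

Endpoint = Union[Specific, AnyOf]

def members(ep: Endpoint) -> FrozenSet[str]:
    """The set of concrete STPs an endpoint allows."""
    return frozenset({ep.stp}) if isinstance(ep, Specific) else ep.stps

def satisfies(ep: Endpoint, stp: str) -> bool:
    """True if a concrete STP chosen by the provider meets the endpoint constraint."""
    return stp in members(ep)

def candidate_pairs(src: Endpoint, dst: Endpoint) -> Iterator[Tuple[str, str]]:
    """Expand a constrained request into concrete point-to-point STP pairs."""
    for a in sorted(members(src)):
        for b in sorted(members(dst)):
            yield a, b

# Usage: a fixed source STP and two acceptable destination STPs.
request = (Specific("stp-A"), AnyOf(frozenset({"stp-B1", "stp-B2"})))
```

A specific endpoint is just the one-element case, so today's pt-to-pt semantics survive unchanged inside the generalized model.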

What I think is required in the immediate term is the following:

- A *concise* means of expressing STPs that does not change their semantics. They are still "Service Termination Points" and still map to particular internal topological constructs, but we find a more efficient representation that also integrates well with the adjacency advertisements (SDPs). The recursive nature of various framing technologies and label stacking would tend to make labels (where they exist) good candidates for child nodes in a topology tree, but perhaps there is a mechanism that would allow us to create "summarized" children (?) in the topology representation: a more efficient means of expressing the same semantics that leaves the abstraction alone. (For instance, if we expand the STP resource description to cover a set of canonically [sequential] tags, we can maintain the current elegance of the model while at the same time providing label functionality.) Thoughts?

- We also need a means of expressing basic abstract topological objects (e.g. nodes, ports, links, agents) that allows for recursive relational descriptions and maintains the strict technology-agnostic abstractions the NSI Framework requires, yet allows us to complement this basic ontology with enhanced detail - under local advertisement control, of course. Remote PF implementations can leverage whatever detail is available to optimize the PF/Reservation process. Thoughts?
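One hypothetical way to get both properties: a recursive tree of generic topology objects where a port carries summarized label ranges instead of thousands of enumerated STP children. The class names, the `kind` strings, and the path format below are illustrative assumptions, not a proposed schema. The expansion yields (object path, label) pairs rather than STP names, since STP identifiers should stay opaque.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class LabelRange:
    """Hypothetical summarized child: a run of sequential labels, inclusive."""
    first: int
    last: int

    def __len__(self) -> int:
        return self.last - self.first + 1

@dataclass
class TopoNode:
    """Technology-agnostic topology object (network, node, port, ...)."""
    name: str
    kind: str
    children: List["TopoNode"] = field(default_factory=list)
    labels: List[LabelRange] = field(default_factory=list)

def count_terminations(node: TopoNode) -> int:
    """Number of individual termination points the summarized tree stands for."""
    return sum(len(r) for r in node.labels) + sum(
        count_terminations(c) for c in node.children
    )

def expand(node: TopoNode, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Expand summaries into (object path, label) pairs only when a consumer needs them."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    for rng in node.labels:
        for label in range(rng.first, rng.last + 1):
            yield path, label
    for child in node.children:
        yield from expand(child, path)

# A single port advertising VLANs 1-4094 as one summarized child.
port = TopoNode("eth1", "port", labels=[LabelRange(1, 4094)])
net = TopoNode("netA", "network", children=[TopoNode("sw1", "node", children=[port])])
```

The advertisement stays one range object per port, while a pathfinder that wants individual terminations can still enumerate them lazily.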


On 30 Nov 2011, at 11:01, Jeroen van der Ham wrote:

Hello,

On 29 Nov 2011, at 03:47, Jerry Sobieski wrote:

Hi everyone - this is long (sorry), but it is a long-held response to the so-called "label" issue. We need to engage on this...

Our topology model works just fine for path finding.  Period.  As is.   It is critical for people to understand this. It works for flat VLANs, it works for swapping, it works for essentially any network connection service.  Try to get away from conventional signaling ways of thinking and look at how NSI is positioned to do this and the power it brings.   NSI is about *connections* - not VLANs, not waves, not LSPs...  The abstraction it presents to the user is a connection model that works regardless of underlying technologies.

Our current topology model "works" only in the sense that a 1 in 4 chance of getting the right VLAN across a network is acceptable. However, we're only using 4 VLANs; once we go to 4096, it becomes a 1 in 4096 chance.

In most cases you could actually look at this as more likely to work. In the SC topology, with one VLAN in use (25% utilization), we had a 75% chance of a successful second choice. In the scenario above, with one VLAN in use, we have a 4095/4096 (about 99.98%) chance of a correct hit for the second VLAN (!). And we would still have better odds than at SC for the first 1000 or so VLANs we randomly choose!!!! And with 1000 VLANs in service (roughly 25% utilization) we still have about a 75% chance of a successful choice.
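The arithmetic behind these figures, as a quick sketch that assumes a single link and one uniformly random label choice; `hit_probability` is just an illustrative helper name.

```python
from fractions import Fraction

def hit_probability(total_labels: int, in_use: int) -> Fraction:
    """Chance that one uniformly random label choice lands on a free label."""
    if not 0 <= in_use <= total_labels:
        raise ValueError("in_use must be between 0 and total_labels")
    return Fraction(total_labels - in_use, total_labels)

for total, used in [(4, 1), (4096, 1), (4096, 1000), (4096, 1024)]:
    p = hit_probability(total, used)
    print(f"{used} of {total} in use: {float(p):.2%} chance of a free label")
```

With 4096 labels the chance stays at or above 75% until 1024 of them are in use, which is the sense in which the first thousand or so random choices fare better than the four-VLAN SC topology.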


Pathfinding is currently done by the Aggregator NSA handling the request. He looks at the current topology, sees thousands of parallel SDPs, and when crossing several domain boundaries he just has to pick one randomly. I don't know about you, but I don't consider that "working just fine".

Sure it does. Why not? What would you do? The (implicit) Transit Function of the networks in the path was "We can connect any port to any other port if resources are available." That is a very powerful statement. If that network cannot actually do that, then it's not a failure of the topology or the protocol.

A remote pathfinder will never have a crystal ball and ultimately must still consult the local authoritative Resource Manager... So the remote PF (the aggregator in your example above) will *always* be subject to the local NRM rejecting the request, no matter how specific or well informed it is. Labels don't solve this basic problem that the remote PF is not authoritative - it *must* consult local agents and is dependent upon them confirming the path... or it fails.


In the demonstration at SC we relied on the human to make requests from one endpoint to another endpoint, using the same VLAN. I have not seen any requests made using different VLAN labels.
Also, I have seen and heard that NSA implementations used the last part of the ID to figure out the correct label and use that in their pathfinding algorithms.
I do not think that that is a desirable solution.

While I am afraid and literally appalled (!) that some NSAs may have indeed parsed the STP name for a vlan hint, this was incorrect and is easily broken. It makes totally incorrect use of the topo information and is a really REALLY BAD assumption. (I put that vlan info in the STP tag to make it easier for developers to debug things - not as a shortcut for anything...rest assured the next topo file will have no such human readability.) This is like parsing an IP Hostname (www.google.com) to recover its IP address...it doesn't work. I can easily create a topology that describes the same SC layout that breaks those implementations. Would you trust other networks to be so exacting? STPs are symbolic references - they do not contain any technology specific information themselves.

We relied on humans simply to *optimize* the selection order - to "choose wisely". If any arbitrary pair of STPs is requested, the PA should either reserve it or reject it. And if the pathfinder in the remote NSA "chooses poorly" and is not robust enough to try another possible path, then that is a very weak implementation, not a flaw in either the NSI CS protocol or the topology we used for SC. Further, even the humans' educated guesses about endpoints could fail in many cases. A reasonable pathfinder *must* be prepared to try alternate paths in the case of blocking conditions, or take responsibility for not doing so... it's not the standard's responsibility to make sure resources are available in every network.
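A minimal sketch of that robustness, assuming the pathfinder already holds a list of candidate STP pairs and some callable that stands in for the authoritative local NRM (`try_reserve` is hypothetical, not a CS protocol primitive). Candidates are tried in random order, a rejection simply moves on to the next one, and only exhausting the list is reported as failure.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Candidate = Tuple[str, str]  # (source STP, destination STP)

def reserve_with_retry(
    candidates: Sequence[Candidate],
    try_reserve: Callable[[Candidate], bool],
    rng: Optional[random.Random] = None,
) -> Optional[Candidate]:
    """Try candidate STP pairs in random order until the local NRM accepts one.

    `try_reserve` returns True if the reservation is confirmed, False if rejected.
    """
    rng = rng or random.Random()
    order = list(candidates)
    rng.shuffle(order)
    for cand in order:
        if try_reserve(cand):
            return cand
    return None  # every candidate was blocked: report failure to the requester
```

Passing a seeded `random.Random` makes the trial order reproducible for testing without changing the strategy.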

Indeed, we could have redefined the topology slightly to reflect the separate VLAN planes at, say, StarLight, by defining separate NSI Networks for each VLAN. This would have made explicit in the topology the constraint that certain STPs cannot be cross-connected to other STPs. Just as I get grief about the STP enumeration, I also got seriously flamed for this approach. BUT BOTH WORK! All you need is a fundamentally simple pathfinder. And this separated-VLAN-planes approach works better than what we had at SC, because it expresses more topological constraints than the topology we actually used - and I bet most of the pathfinders would have eaten it up just fine.
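A sketch of why the separated-planes topology needs only a simple pathfinder, assuming for brevity that every network in a toy three-site chain is split into one NSI Network per VLAN (the site names come from this thread; the link list and helper names are made up). Because no edge connects two planes, a plain breadth-first search can never propose a cross-connect between STPs in different VLANs.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[str, int]  # (site, VLAN plane), e.g. ("StarLight", 1780)

def build_planes(links: List[Tuple[str, str]], vlans: List[int]) -> Dict[Node, Set[Node]]:
    """One copy of the site graph per VLAN; no edge joins two different planes."""
    graph: Dict[Node, Set[Node]] = {}
    for vlan in vlans:
        for a, b in links:
            graph.setdefault((a, vlan), set()).add((b, vlan))
            graph.setdefault((b, vlan), set()).add((a, vlan))
    return graph

def find_path(graph: Dict[Node, Set[Node]], src: Node, dst: Node) -> Optional[List[Node]]:
    """Plain breadth-first search; it cannot cross planes because no edges do."""
    prev: Dict[Node, Optional[Node]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[Node] = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

links = [("NetherLight", "StarLight"), ("StarLight", "ESnet")]
planes = build_planes(links, [1780, 1781])
```

The constraint lives entirely in the topology, so the pathfinder itself stays technology-agnostic.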

So I don't want to hear seriously flawed implementations and weak pathfinders used as the excuse for changing the topology model or the abstractions of the architecture.


Let me reiterate:
The current NSI implementation is completely unaware of labels. This makes it near impossible to make informed decisions about paths crossing several domains. For each domain a path crosses the chance of finding the right path decreases exponentially.

What do you mean by an "informed" decision? Even if you knew all about the labels, there is no guarantee that the other constraints on the connection can be met, i.e. the endpoint (labeled or otherwise) is just one constraint that must be satisfied for success.

The chance of finding a successful path is a function of the number of labels, the diameter of the network, *AND* the availability of those labels, *AND* the algorithm the RA uses to select the trial order, *AND*, most importantly, the availability of the other transit resources. Yes, the worst case is exponential... but the *likelihood* of the worst case is of equal importance. The easiest way to reduce the likelihood of a worst-case exhaustive search is to provide *MORE* STPs and use a random trial order. This would make the likelihood of a hit comparatively much higher. Of course a better solution would be to have access to all topology state... but that poses equally exponentially complex issues and is not going to happen either.
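To separate the worst case from its likelihood, here is a toy model under loudly simplifying assumptions: each domain independently accepts a random choice with the same probability (ignoring correlations between labels and transit resources), and a pathfinder tries candidates in random order without repeating any. The function names are illustrative only.

```python
from fractions import Fraction

def single_try_success(per_domain: Fraction, domains: int) -> Fraction:
    """One random attempt across `domains` domains, assuming independent domains."""
    return per_domain ** domains

def worst_case_tries(candidates: int, good: int) -> int:
    """Exhaustive search: every bad candidate is tried before the first good one."""
    return candidates - good + 1

def expected_tries_random_order(candidates: int, good: int) -> Fraction:
    """Mean position of the first good candidate in a uniformly random trial order.

    With `good` successes among `candidates`, tried without repetition,
    this is (candidates + 1) / (good + 1).
    """
    if not 1 <= good <= candidates:
        raise ValueError("need 1 <= good <= candidates")
    return Fraction(candidates + 1, good + 1)
```

With 4096 candidates of which 3096 are usable, the worst case needs 1001 attempts, yet the expected number in random order is 4097/3097, about 1.3.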


The only way to make label unaware pathfinding work is by making 4096 versions of each of the different domains in the global network.

While this would work, it's not the *only* way to make it work. Proof: it worked for SC.

 The connections between those different networks will then depend on the label-swapping capabilities of those networks.

Sigh. Let's face it: the reason VLANs pose a problem is that they block easily. The better networks will implement label-swapping switching technologies. Flat VLANs just don't scale well on a global basis, particularly with existing conventional Ethernet hardware. For instance: even if you knew VLAN 1780 was available between StarLight and NetherLight and also available between StarLight and ESnet, if 1780 was in use on the port facing JGNX it would be unavailable to any other cross-connect. It would be blocked for your use between NL and ESnet, which means you would have to select a different egress VLAN at NL *and* at ESnet. So just knowing which VLANs are available on one port does not tell you whether a VLAN is available internally, or how likely it is to be. It's a crapshoot. A guess. A shot in the dark. Conventional Ethernet sucks for global provisioning. Accept this, my child, and enlightenment will open your eyes. (:-)

Seriously, label swapping was designed to avoid this issue. LS makes all labels "link local." Label assignment in an LS network is *not* based upon label availability within the network but on the link alone. Label swapping can be performed extremely fast and scales well - thus the success of MPLS. If we had Ethernet hardware that did VLAN swapping *and* per-port VLAN scoping, we would have label swapping, and flat VLANs would become just a bad dream. 802.1ah (PBB) addresses this issue and others.
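The blocking difference reduces to two availability checks. This is a deliberately simplified model, assuming a single switch whose ports each record the labels in use (the port names and function names are hypothetical): with flat VLANs an id is free only if no port on the switch uses it, while with label swapping only the ingress and egress ports of the cross-connect matter.

```python
from typing import Dict, Set

InUse = Dict[str, Set[int]]  # port name -> labels currently in use on that port

def flat_cross_connect_ok(in_use: InUse, vlan: int) -> bool:
    """Flat VLANs: the id is switch-wide, so use on ANY port blocks it everywhere."""
    return all(vlan not in labels for labels in in_use.values())

def swapped_cross_connect_ok(in_use: InUse, in_port: str, in_label: int,
                             out_port: str, out_label: int) -> bool:
    """Label swapping: labels are link-local, so only the two ports involved matter."""
    return (in_label not in in_use.get(in_port, set())
            and out_label not in in_use.get(out_port, set()))

# StarLight example: VLAN 1780 is already in use on the port facing JGNX.
starlight: InUse = {"to-jgnx": {1780}, "to-netherlight": set(), "to-esnet": set()}
```

In this model the NL-to-ESnet cross-connect on 1780 is blocked under flat VLANs but allowed under swapping, even though neither of those two ports uses 1780.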

Even with that solution, it is still hard, due to availability, and correlations between paths (if I use 10gb on one label, I can't use it again on a different label).

Exactly. At some point you will realize that pathfinding is not deterministic in an active network - you can optimize the process, but you cannot predict it or find an optimal path unless the global network is static and you know *ALL* the state details. Anything short of this omniscience means that we have to accept the fact that we may encounter blocking in the network for any number of reasons and all we can do is try another path, or fail.

Path reservation is a two-pass process: a high-level candidate path selection followed by a low-level confirmation pass... unless the confirmation process completes, you cannot use the path. And there is no practical way to know a priori which paths will work. You have to try. This is pathfinding.
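The two passes can be sketched as follows, assuming a candidate path is simply a list of per-domain segments and that `confirm` and `release` stand in for asking each local agent to hold or free its segment (all hypothetical names, not CS protocol messages). A refusal anywhere releases what was already held and moves on to the next candidate.

```python
from typing import Callable, List, Optional, Sequence

Segment = str  # hypothetical: one per-domain segment of a candidate path

def two_pass_reserve(
    candidate_paths: Sequence[List[Segment]],
    confirm: Callable[[Segment], bool],
    release: Callable[[Segment], None],
) -> Optional[List[Segment]]:
    """Pass 1 supplies candidate paths; pass 2 asks each local agent to confirm.

    If any segment is refused, segments already confirmed on that candidate are
    released (most recent first) and the next candidate path is tried.
    """
    for path in candidate_paths:
        held: List[Segment] = []
        for seg in path:
            if confirm(seg):
                held.append(seg)
            else:
                for h in reversed(held):
                    release(h)
                break
        else:  # no break: every segment confirmed
            return path
    return None  # all candidates blocked
```

The candidate ordering from pass 1 only affects how quickly a confirmed path is found; the confirmation pass is what makes a path usable.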

Note also that the number of domain descriptions will increase exponentially as soon as we start considering multi-layer networks.

I am not sure I agree with this. Topology hiding and transfer functions make this a far simpler problem. The overall complexity is not reduced, but we delegate responsibility to agents who have the detailed information and the authority to allocate the resources. So the more topology and state you try to express, the harder the problem becomes. At some point we have to accept that summarization is the only way we can hope to make this scale, and that pathfinding will be a non-deterministic process - based on probabilities of success, but guesses nonetheless. We always want to "choose wisely", but understand that we won't always be so lucky.

Thanks for your dedication to this issue, Jeroen. I appreciate your intensity.

Best regards
Jerry


Jeroen.
_______________________________________________
nsi-wg mailing list
nsi-wg@ogf.org
http://www.ogf.org/mailman/listinfo/nsi-wg