I took a long bike ride today, and during my long periods of oxygen deprivation I thought about how I could make this "shadow" proposal work.  It occurred to me that we might fix it using a similar feature we implemented in OpenDRAC for diverse route path protection.  In fact, we might be able to extend it to be a more general solution to handle a true bridge and roll type of function, but I think this is beyond the scope of release 2.0.

First off, we will need to stop hiding the shadow connection behind the same connectionId as the existing reservation, and instead use a new connectionId.  This is where the existing proposal falls over, since there is no way to directly address or manipulate the new modified reservation while the existing reservation is still in service.

Secondly, we will need to modify the existing reserve command to take a new type that will identify if the request is a new reservation, a modify of an existing reservation, or perhaps a new reservation that has some dependency on an existing reservation (say for diverse routing).  In addition, if we are doing anything other than a new reservation we will need to provide the connectionId of the existing reservation to provide context for this new reservation.

New reservation - reserve(type="new", connectionId="ABCD");
Modify reservation - reserve(type="modify", connectionId="WXYZ", refConnectionId="ABCD");
New reservation diverse to existing reservation - reserve(type="diverse", connectionId="HIJK", refConnectionId="ABCD");
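
A minimal sketch of how an NSA might check these three request shapes.  The names here (ReserveRequest, validate_reserve, and the "known" set of existing connectionIds) are hypothetical and are not part of the current WSDL:

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical request types from this proposal; not in the existing WSDL.
VALID_TYPES = {"new", "modify", "diverse"}


@dataclass(frozen=True)
class ReserveRequest:
    type: str
    connection_id: str
    ref_connection_id: Optional[str] = None


def validate_reserve(req: ReserveRequest, known: Set[str]) -> None:
    """Raise ValueError if the request is inconsistent with the proposal."""
    if req.type not in VALID_TYPES:
        raise ValueError(f"unknown reserve type: {req.type}")
    # Every reservation, including a modify, carries its own fresh connectionId.
    if req.connection_id in known:
        raise ValueError(f"duplicate connectionId: {req.connection_id}")
    if req.type == "new":
        if req.ref_connection_id is not None:
            raise ValueError("a new reservation takes no refConnectionId")
        return
    # "modify" and "diverse" need an existing reservation for context.
    if req.ref_connection_id is None:
        raise ValueError(f"{req.type} requires refConnectionId")
    if req.ref_connection_id not in known:
        raise ValueError(f"unknown refConnectionId: {req.ref_connection_id}")
```

With known = {"ABCD"}, the modify and diverse requests above pass, while a request that reuses "ABCD" as its own connectionId is still rejected as a duplicate.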

Now we can still independently control both the original reservation and the new reservation.  For example, if I get a partial reserveFailed I can use terminate(connectionId="WXYZ") to clean up without impacting the original reservation "ABCD".  I can also query both reservations independently and see that there is another reservation related through a pending modify operation.
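
To illustrate the independence, here is a toy sketch.  The Registry class, the pendingModifies field in the query result, and the use of "Reserved" for the partially reserved "WXYZ" are all assumptions made for the example, not agreed behaviour:

```python
from typing import Dict, Optional


class Registry:
    """Toy reservation store keyed by connectionId."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.ref: Dict[str, Optional[str]] = {}

    def add(self, connection_id: str, state: str,
            ref: Optional[str] = None) -> None:
        self.state[connection_id] = state
        self.ref[connection_id] = ref

    def terminate(self, connection_id: str) -> None:
        # Touches only the addressed reservation.
        self.state[connection_id] = "Terminated"

    def query(self, connection_id: str) -> Dict[str, object]:
        # Pending modifies are live reservations that reference this one.
        pending = sorted(
            cid for cid, r in self.ref.items()
            if r == connection_id and self.state[cid] != "Terminated"
        )
        return {"state": self.state[connection_id], "pendingModifies": pending}
```

Querying "ABCD" lists "WXYZ" as a pending modify; after terminate("WXYZ") the list is empty and "ABCD" is still Activated.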

This also fixes the provision issue in the current shadow proposal.  In the current proposal I would need to Release and then Provision so that we do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions resulting in a traffic hit.  With a separate connectionId I can kick the provision off on the new reservation connectionId="WXYZ" while the existing reservation connectionId="ABCD" is still active.

provision(connectionId="WXYZ") -> Reserved->Provisioning->Provisioned->Activated 

As the new reservation connectionId="WXYZ" transitions to Activated, the existing reservation connectionId="ABCD" is transitioned from Activated to Terminated.  We can give a terminated reason of "Modify successful" to track the change.  The termination of the original reservation will be entirely handled by the uPA involved in the modification of the reservation.  However, since we are doing the final coordination on the children NSA (i.e. moving from connectionId="ABCD" to connectionId="WXYZ"), these NSA will need to generate a forced_end, or perhaps a new modify_end event, up the tree to notify the parents of the modify and termination of connectionId="ABCD".
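
A rough sketch of that cutover on a single uPA.  The Upa class, its "modifies" map, and the choice of modify_end rather than forced_end are assumptions for illustration, and only the transitions named in this thread are modelled:

```python
from typing import Dict, List, Tuple

# Only the states named in this thread; the real state machine has more.
PROVISION_PATH = ["Reserved", "Provisioning", "Provisioned", "Activated"]


class Upa:
    """Toy uPA holding per-connection state and a termination reason."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.reason: Dict[str, str] = {}
        self.modifies: Dict[str, str] = {}  # new connectionId -> original
        self.events: List[Tuple[str, str]] = []  # events sent up the tree

    def provision(self, connection_id: str) -> None:
        if self.state.get(connection_id) != PROVISION_PATH[0]:
            raise ValueError(f"{connection_id} is not Reserved")
        # Walk the new reservation through to Activated first.
        for nxt in PROVISION_PATH[1:]:
            self.state[connection_id] = nxt
        # Only once the new segment is Activated do we retire the original.
        old = self.modifies.get(connection_id)
        if old is not None and self.state.get(old) == "Activated":
            self.state[old] = "Terminated"
            self.reason[old] = "Modify successful"
            self.events.append(("modify_end", old))
```

The original reservation stays Activated for the whole time the new one is provisioning, which is what avoids the Release/Provision traffic hit.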

There is an interesting side-effect of this model.  The new reservation does not need to follow the same path as the original reservation.  We may want to force an "in-place" option to stop additional new domains getting involved in the reservation, but even if they did it should work correctly.  For example, let's say the original path traversed domain B, but when the aggregator NSA did path computation it determined the new reservation should traverse domain C instead of domain B, with all other domains remaining the same.  As the provision(connectionId="WXYZ") propagates down the tree, the uPA involved in the new reservation activate their resources and terminate any resources associated with connectionId="ABCD" (obviously, domain B does not get the provision since it has no new resources, and domain C has no old resources).  All other domains with original reservation segments generate a forced_end/modify_end event up the tree, which will eventually result in a terminate(connectionId="ABCD") being sent to domain B for its lone segment.
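
The domain B / domain C example can be sketched as a simple set comparison.  The function name plan_cutover, the result keys, and the extra domains A and D are made up for the illustration:

```python
from typing import Dict, List, Set


def plan_cutover(old_path: Set[str], new_path: Set[str]) -> Dict[str, List[str]]:
    """Classify domains by what they see when the new reservation is provisioned."""
    return {
        # In both paths: activate new resources, release old ones, notify parent.
        "provision_and_modify_end": sorted(old_path & new_path),
        # Only in the new path: plain provision, nothing old to clean up.
        "provision_only": sorted(new_path - old_path),
        # Only in the old path: never sees the provision, so the parent has
        # to send an explicit terminate for the original connectionId.
        "terminate_old": sorted(old_path - new_path),
    }
```

For an original path {A, B, D} and a new path {A, C, D}, domains A and D provision and raise the end event, C only provisions, and B is left to receive terminate(connectionId="ABCD").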

Very interesting solution to the problem; however, there will be some impact on the existing state machine.  I guess we need to decide if overloading the existing reserve command is worth not introducing the new Modify command set and its associated state machine.

John.

On 2012-07-04, at 7:26 PM, John MacAuley wrote:

Jerry,

I like your thinking.  Chin and I went down a similar path trying to use existing operation primitives in conjunction with a new Modify primitive.  We ruled out the idea because we needed to add additional conditional logic to the existing primitive in order to handle the modify specific behaviours.  In addition, it just didn't work since there were situations where you couldn't properly provide the needed modify behaviours (specifically around error handling and backing out a modify failure in a subtree).  It began to feel unnatural.  The bastard child of an unholy wedlock. ;-)

I have provided detailed comments below.  I would like you to give deep thought to whether you think trying to overload the existing primitives to support modify and the added complexity is going to really save us anything over a separate modify command set that has an independent state machine from the existing unmodified state machine.

John.

On 2012-07-03, at 3:09 PM, Jerry Sobieski wrote:

Hi everyone-

The connection modification capability for version 2.0 was initially presented as a simple enhancement to extend the scheduled end time, or perhaps to increase the bandwidth, on an existing reservation.  This was supposed to be a very limited functional tweak for v2.0.

Yes, I am still proposing this is the only capability we implement in 2.0.


But then we decided "hitless" was a requirement, and then we added "path preservation" as a requirement.  It was *assumed* that we needed a unique Modify() primitive to do this...  probably because prior tools have them...  Suddenly, we are re-defining the entire state machine (yet again), and making it still more complex, in order to make this "simple" enhancement.

Sigh (throwing hands up into the air then smashing head against table)... a new state machine modelling the Modify lifecycle with no changes to the existing state machine.  Two-phase reserve is a separate issue from Modify.


This increasing complexity is actually counter to what we were trying to do in Oxford: to simplify the state machine.  And in general, counter to good protocol design.

Okay, I will not write an extensive dissertation on how this statement is inaccurate.  The original intent of the Oxford state machine was to fix the delayed provisionConfirm message.  Henrick, Chin, and Tomohiro did a great job trying to rationalize the existing state machine and simplify it where possible.  What we landed on was two separate state machines to more simply describe each NSA role.  Out of Oxford we actually ended up with more state machines and states than we went into Oxford with.  Why is this, you ask?  It is because we are designing a complex distributed reservation and provisioning system.  This is not a simple task given the behavioural constraints we have placed on the team.  People need to realize that sometimes correctness is complex and not as simple as first thought.  I think this NSI project is a perfect example of this.

I think the existing state machine has been thoroughly vetted and is adequate for the protocol, and that we should consider functions like "Modify" as higher layer constructs that should be implemented using the existing atomic primitives we already have.   Things like protection circuits, and diversity attributes, and the like will all pose similar challenges - and we can't keep changing the state machine every time someone has a "simple" feature they can't live without...


You are making an assumption that I believe is the key flaw in your argument - you are assuming that the existing vetted state machine will not change if you reuse the existing operation primitives.  I am not convinced it would not change.  It all comes down to whether we decide to model the "shadow connection" conditional states within the machine.

Given the developing complexity, we should step back and re-evaluate  a) the urgency for Modify(),   b) the means/scope of implementing it,   and c) the timeline it will require to "do it properly". 

At the rate this working group makes decisions and closes on actions we could still be debating this next year.  We have time to agree and prototype before closing on the NSI 2.0 specification.  If I may point out, the only things we have actually closed on are changes I made to the WSDL that fixed deficiencies in release 1.1.  So far we have no new features in 2.0 fully agreed and committed.  To be honest, I still do not understand the process we follow.

I would like to also propose an alternative "shadow" approach to provide a modify capability in version 2.0:

In a shadow approach, we build a simple second "shadow" connection reservation, and then perform a Release()-Provision() sequence to cut over to the modified service instance when ready.  This shadow approach uses only existing protocol primitives and existing state machine.    (This is similar to John's talk about "bridge and roll"... but without a bridge:-)

Currently, a separate circuit approach like this would require separate STPs as endpoints for the modified connection reservation.  However, given virtual STPs (e.g. VLANs), a shadow connection would not *really* need to terminate at the same source or destination STP to be useful - i.e. the A and Z endpoints of a modified connection could be different VLANs without imposing any detectable performance hit on end-to-end data flow (!) - the sending system simply begins using a new tag when the shadow provisioning is completed.   (This requires the end system agents to know this will occur, but, strictly speaking, this is entirely feasible.)   The shadow path would likely even be along the same geographic route - i.e. the packets would transit all the same network infrastructure, just with different tags.  Given this situation, the need to "modify" an *existing* connection, particularly with Ethernet-based STPs, seems somewhat unnecessary if you can simply request another connection with the desired new attributes along the same path and start using it whenever you please...

This breaks down completely with anything other than VLAN circuits.  If I have an existing EPL circuit that is encapsulating the entire contents of an Ethernet port end-to-end I have no ability to use another Ethernet port as an STP in this operation.  I must use the existing STP since this could be the only port dropped at my location.  Adding the requirement for an additional set of STPs just to modify the endTime of an existing reservation seems to add unnecessary complexity not only to the NSI implementation, but the end user consuming the service.

Being pragmatic though, there are many applications that will not be able to change their termination point, thus the source/destination STPs should be simultaneously acceptable for both the shadow connection as well as the working connection.  Likewise, other resources (say bandwidth) may not be sufficient to reserve a completely separate upgraded Connection, and so the path finders ought to be able to "double-book" resources assigned to the working connection to be used by the shadow connection.  Since the working connection and the shadow connection should never both be active, this double booking will never cause a conflict.  This ability for shadows to double-book resources of their working counterpart provides the functionality we initially wanted: simply upgrading the existing path.
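
The double-booking rule could be sketched as a per-link capacity check.  The names can_reserve, holds, and shadow_of are illustrative assumptions, and the bandwidth units are arbitrary:

```python
from typing import Dict, Optional


def can_reserve(capacity: int, holds: Dict[str, int], requested: int,
                shadow_of: Optional[str] = None) -> bool:
    """Return True if `requested` bandwidth fits on a link.

    `holds` maps connectionId -> bandwidth already booked on the link.
    A shadow may count its working connection's booking as available,
    since the two are never active at the same time.
    """
    booked = sum(holds.values())
    if shadow_of is not None:
        # Double-book: ignore what the working connection already holds.
        booked -= holds.get(shadow_of, 0)
    return booked + requested <= capacity
```

On a link of capacity 10 where the working connection holds 6 and another connection holds 3, a shadow asking for 7 fits, while the same request as an unrelated new reservation does not.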

Looks like we agree on the application requirement.  Woo hoo!


We can easily indicate when we wish to create a shadow Reservation within the existing protocol:
We simply specify an existing ConnectionID in a Reservation Request.

How do I distinguish between an RA wanting to modify an existing reservation and a naughty NSA sending down a duplicate connectionId that we currently reject? I guess we will now always assume it is a modification...

If the ReservationRequest specifies an existing Reservation rather than a new Reservation, then a [new] shadow Reservation/Connection is to be created and linked to the original "working" reservation.

So can I assume that the STP are the same?  What else in the reservation needs to be maintained?  Is anything up for change?

Thus, an otherwise normal Connection is identified as a "shadow" connection solely by the link to a working Connection.  
When a reservation is confirmed, if it links to a working connection, the RA will immediately replace the working with the shadow and Terminate the working reservation.

When you say "working connection" do you mean provisioned and active in the network?  Just to argue your point, we will need to modify the existing state machine to allow reserve requests to arrive on an existing connection which could be in any of the defined states.  The action an RA takes here on confirm is the following Release?

In the one case where the working connection is Active, the shadow will remain in its Reserved state as if it had passed the start time and was awaiting a provision request. 
When a Release occurs for the working connection, a check is made to see if a shadow is linked to it.   If so, the shadow will then replace the working, and the working connection is Terminated.

Are you saying that the Release operation will trigger a Release of the existing working path, and automatically provision the new modified working path, or are you saying I would need to do another Provision?  I think you mean Release and then Provision so that you do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions.  Anything else will require a complete change of the state machine.

The issue with requiring the Release and then Provision again is that your service will take a traffic hit.  We are not talking about a short blip either.  We are talking about a considerable period as the operation messages filter down and up the tree multiple times.  I would definitely dismiss this mechanism based solely on this deficiency.  I need something not so intrusive, especially if it is just an endTime extension.

The one key piece missing from this strategy is how do I back out a shadow connection?  Once again, the key point of the two phase commit is to handle a failure to reserve the additional resources across the entire connection path.  How do you handle when part of your shadow reservation fails?  I can't send down a terminate since this will terminate the entire connection.  I can't overload the terminate operation since part of the tree has failed the reservation modification, and therefore, no longer has record of it resulting in a termination of the reservation for those NSA upon receiving the Terminate request.  Even if I force the RA to send down another Reserve to force the shadow connection back to the original pre-modify path there is nothing saying the original resources are even available any longer as they may have been consumed for a new reservation.

Also, when I query the connectionId during this shadow reservation do I see the existing in service reservation, or do I see the modified values?


This process does not change the NSI-CS protocol or the state machine.  It incurs [minor] code additions to the existing primitives, but does not change the event driven state transitions.  Pathfinders should also be enhanced to double-book shadow resources.

I think what you mean to say is that you do not require any new operation primitives.  You have changed the behaviour of the NSI-CS protocol by overloading the existing operation primitives.  I am still not convinced the existing state machine does not need to change, and you have some other issues to address as well.  What I did is call a spade a spade and define a new operation set to do effectively the same thing as what you are doing.  I guess the big question is whether we believe adding the additional complexity to existing operations is worth it to avoid introducing a new set of operations better named for the activity at hand.


This "shadow" approach has this major advantage:  Since it is essentially just building a second reservation, it does not require changing the fundamental NSI-CS protocol or the state machine.   All the "modification" processing is implemented using existing primitives and state transitions.  The cost to the user is minimal: a single *potential* brief hit as the A and Z endpoints are switched to the [new/modified] connection.  And since the user initiated the modify() in the first place, and will need to adjust the behaviour of the application to take advantage of the new characteristics, it does not seem unreasonable to expect the user to be able to deal with a hiccup - if it occurs.

I disagree on the hiccup.  Why take the hit when there is no need to?



Finally, as a general recommendation:  Modifying the existing primitives and the associated state machine should be a last resort.  Any new feature should have a very strong case for modifying the NSI-CS state machine, and alternatives that do not do so should be strongly encouraged.   We should only modify the NSI core protocols in order to simplify them, delivering additional features through higher level service constructs wherever possible.

Thoughts?
Jerry



On 7/2/12 11:06 PM, John MacAuley wrote:
Peoples,

Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure.  Please study it and prepare questions for the Wednesday call.  We would like to close on this action ASAP.

Thank you,
John.


_______________________________________________
nsi-wg mailing list
nsi-wg@ogf.org
https://www.ogf.org/mailman/listinfo/nsi-wg
