I like your thinking. Chin and I went down a similar path, trying to use existing operation primitives in conjunction with a new Modify primitive. We ruled out the idea because we needed to add conditional logic to the existing primitives in order to handle the modify-specific behaviours. In addition, it just didn't work, since there were situations where you couldn't properly provide the needed modify behaviours (specifically around error handling and backing out a modify failure in a subtree). It began to feel unnatural. The bastard child of an unholy wedlock. ;-)
I have provided detailed comments below. Please give deep thought to whether overloading the existing primitives to support modify, with the added complexity, is really going to save us anything over a separate modify command set with its own state machine, independent of the existing, unmodified state machine.
John.
On 2012-07-03, at 3:09 PM, Jerry Sobieski wrote:
Hi everyone-
The connection modification capability for version 2.0 was initially
presented as a simple enhancement to extend the scheduled end time.
Or perhaps to increase the bandwidth, on an existing reservation.
This was supposed to be a very limited functional tweak for v2.0.
Yes, I am still proposing this is the only capability we implement in 2.0.
But then we decided "hitless" was a requirement, and then we added
"path preservation" as a requirement. It was *assumed* that we
needed a unique Modify() primitive to do this... probably because
prior tools have them... Suddenly, we are re-defining
the entire state machine (yet again), and making it
still more complex, in order to make this "simple" enhancement.
Sigh (throwing hands up into the air then smashing head against table)... a new state machine modelling the Modify lifecycle with no changes to the existing state machine. Two-phase reserve is a separate issue from Modify.
This increasing complexity is actually counter to what we were
trying to do in Oxford: to simplify the state machine. And
in general, counter to good protocol design.
Okay, I will not write an extensive dissertation on how this statement is inaccurate. The original intent of the Oxford state machine was to fix the delayed provisionConfirm message. Henrick, Chin, and Tomohiro did a great job trying to rationalize the existing state machine and simplify it where possible. What we landed on was two separate state machines to more simply describe each NSA role. Out of Oxford we actually ended up with more state machines and states than we went into Oxford with. Why is this, you ask? It is because we are designing a complex distributed reservation and provisioning system. This is not a simple task given the behavioural constraints we have placed on the team. People need to realize that sometimes correctness is complex and not as simple as first thought. I think this NSI project is a perfect example of this.
I think the existing
state machine has been thoroughly vetted and is adequate for the
protocol, and that we should consider functions like "Modify" as
higher layer constructs that should be implemented using the
existing atomic primitives we already have. Things like protection
circuits, and diversity attributes, and the like will all pose
similar challenges - and we can't keep changing the state machine
every time someone has a "simple" feature they can't live without...
You are making an assumption that I believe is the key flaw in your argument - you are assuming that the existing vetted state machine will not change if you reuse the existing operation primitives. I am not convinced it would not change. It all comes down to whether we decide to model the "shadow connection" conditional states within the machine.
Given the developing complexity, we should step back and
re-evaluate a) the urgency for Modify(), b) the means/scope of
implementing it, and c) the timeline it will require to "do it
properly".
At the rate this working group makes decisions and closes on actions we could still be debating this next year. We have time to agree and prototype before closing on the NSI 2.0 specification. If I may point out, the only things we have actually closed on are changes I made to the WSDL that fixed deficiencies in release 1.1. So far we have no new features in 2.0 fully agreed and committed. To be honest, I still do not understand the process we follow.
I would like to also propose an alternative "shadow" approach to
provide a modify capability in version 2.0:
In a shadow approach, we build a simple second "shadow" connection
reservation, and then perform a Release()-Provision() sequence to
cut over to the modified service instance when ready. This shadow
approach uses only existing protocol primitives and existing state
machine. (This is similar to John's talk about "bridge and
roll"... but without a bridge:-)
Currently, a separate circuit approach like this would require
separate STPs as endpoints for the modified connection reservation.
However, given virtual STPs (e.g. VLANs), a shadow connection would
not *really* need to terminate at the same source or destination STP
to be useful - i.e. the A and Z endpoints of a modified connection
could be different VLANs without imposing any detectable performance
hit on end-to-end data flow (!) - the sending system simply begins
using a new tag when the shadow provisioning is completed. (This
requires the end systems agents to know this will occur, but,
strictly speaking, this is entirely feasible.) The shadow path
would likely even be along the same geographic route - i.e. the
packets would transit all the same network infrastructure, just with
different tags. Given this situation, the need to "modify" an
*existing* connection, particularly with ethernet based STPs, seems
somewhat unnecessary if you can simply request another connection
with the desired new attributes along the same path and start using
it whenever you please...
This breaks down completely with anything other than VLAN circuits. If I have an existing EPL circuit that is encapsulating the entire contents of an Ethernet port end-to-end I have no ability to use another Ethernet port as an STP in this operation. I must use the existing STP since this could be the only port dropped at my location. Adding the requirement for an additional set of STPs just to modify the endTime of an existing reservation seems to add unnecessary complexity not only to the NSI implementation, but the end user consuming the service.
Being pragmatic though, there are many applications that will not be
able to change their termination point, thus the source/destination
STPs should be simultaneously acceptable for both the shadow
connection as well as the working connection. Likewise, other
resources (say bandwidth) may not be sufficient to reserve a
completely separate upgraded Connection, and so the path finders
ought to be able to "double-book" resources assigned to the working
connection to be used by the shadow connection. Since the working
connection and the shadow connection should never both be active,
this double booking will never cause a conflict. This ability for
shadows to double-book resources of their working counterpart
provides the functionality we initially wanted: simply upgrading the
existing path.
Looks like we agree on the application requirement. Woo hoo!
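The double-booking rule described above could be sketched roughly as follows. This is only an illustration, not part of the NSI specification: the `Reservation` record, the `shadow_of` link, and the single-link capacity check are all assumptions made for the example. The idea is that a shadow and its working connection are never active together, so the pair only needs the larger of the two bandwidths.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    connection_id: str
    bandwidth_mbps: int
    # Hypothetical link from a shadow to its working connection.
    shadow_of: Optional[str] = None


def committed_mbps(reservations: List[Reservation]) -> int:
    """Bandwidth committed on one link, counting a working/shadow pair once."""
    peak = {}
    for r in reservations:
        # A shadow is grouped with the working connection it links to.
        key = r.shadow_of or r.connection_id
        peak[key] = max(peak.get(key, 0), r.bandwidth_mbps)
    return sum(peak.values())


def can_reserve(existing: List[Reservation], request: Reservation,
                capacity_mbps: int) -> bool:
    """True if the request fits on the link under the double-booking rule."""
    return committed_mbps(existing + [request]) <= capacity_mbps


existing = [Reservation("working", 600), Reservation("other", 300)]

# An independent 700 Mb/s connection does not fit on a 1000 Mb/s link...
independent_ok = can_reserve(existing, Reservation("new", 700), 1000)
# ...but the same request as a shadow of "working" does.
shadow_ok = can_reserve(existing, Reservation("new", 700, shadow_of="working"), 1000)
```

Without the `shadow_of` link the upgrade is refused even though the link could carry it once the working connection is released, which is the case the double-booking proposal is meant to cover.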
We can easily indicate when we wish to create a shadow Reservation
within the existing protocol:
We simply specify an existing
ConnectionID in a Reservation Request.
How do I distinguish between an RA wanting to modify an existing reservation and a naughty NSA sending down a duplicate connectionId that we currently reject? I guess we will now always assume it is a modification...
If the ReservationRequest
specifies an existing Reservation rather than a new Reservation,
then a [new] shadow Reservation/Connection is to be created and
linked to the original "working" reservation.
So can I assume that the STPs are the same? What else in the reservation needs to be maintained? Is anything up for change?
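The duplicate-connectionId question can be made concrete with a small sketch. The handler names and the `Outcome` values below are hypothetical and do not come from the NSI WSDL; the only behaviour taken from this thread is that a duplicate connectionId is currently rejected, and that the shadow proposal would treat the same request as a modification.

```python
from enum import Enum
from typing import Set


class Outcome(Enum):
    CREATED = "created"
    REJECTED_DUPLICATE = "rejected_duplicate"
    SHADOW_CREATED = "shadow_created"


def reserve_current(known_ids: Set[str], connection_id: str) -> Outcome:
    """Current behaviour as described in the thread: duplicates are rejected."""
    if connection_id in known_ids:
        return Outcome.REJECTED_DUPLICATE
    known_ids.add(connection_id)
    return Outcome.CREATED


def reserve_with_shadow(known_ids: Set[str], shadows: Set[str],
                        connection_id: str) -> Outcome:
    """Shadow proposal: an existing connectionId means 'create a shadow'."""
    if connection_id in known_ids:
        # The request carries nothing that separates a deliberate modify
        # from an accidental duplicate, so both end up here.
        shadows.add(connection_id)
        return Outcome.SHADOW_CREATED
    known_ids.add(connection_id)
    return Outcome.CREATED


ids_a = {"conn-1"}
ids_b = {"conn-1"}
before = reserve_current(ids_a, "conn-1")
after = reserve_with_shadow(ids_b, set(), "conn-1")
```

The same request yields a rejection under one rule and a new shadow under the other, so the duplicate check is lost unless the request carries some additional marker of intent.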
Thus, an otherwise
normal Connection is identified as a "shadow" connection solely by
the link to a working Connection.
When a reservation is
confirmed, if it links to a working connection, the RA will
immediately replace the working with the shadow and Terminate the
working reservation.
When you say "working connection" do you mean provisioned and active in the network? Just to argue your point, we will need to modify the existing state machine to allow reserve requests to arrive on an existing connection, which could be in any of the defined states. Is the action an RA takes here on confirm the following Release?
In the one case where the working connection
is Active, the shadow will remain in its Reserved state as if it had
passed the start time and was awaiting a provision request.
When a
Release occurs for the working connection, a check is made to see if
a shadow is linked to it. If so, the shadow will then replace the
working, and the working connection is Terminated.
Are you saying that the Release operation will trigger a Release of the existing working path, and automatically provision the new modified working path, or are you saying I would need to do another Provision? I think you mean Release and then Provision so that you do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions. Anything else will require a complete change of the state machine.
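The Release-then-Provision sequence can be written out as a transition table. The state names are the ones used in this message; the event names and the table itself are assumptions made for illustration and are not the agreed NSI CS state machine.

```python
from typing import Dict, List, Tuple

# (state, event) -> next state. Event names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Activated", "release_request"): "Releasing",
    ("Releasing", "release_confirmed"): "Reserved",
    ("Reserved", "provision_request"): "Provisioning",
    ("Provisioning", "provision_confirmed"): "Provisioned",
    ("Provisioned", "activated"): "Activated",
}

CUTOVER_EVENTS: List[str] = [
    "release_request",
    "release_confirmed",
    "provision_request",
    "provision_confirmed",
    "activated",
]


def run(start: str, events: List[str]) -> List[str]:
    """Return the list of states visited, or raise on an undefined transition."""
    path = [start]
    state = start
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {event!r} in state {state!r}")
        state = TRANSITIONS[key]
        path.append(state)
    return path


cutover_path = run("Activated", CUTOVER_EVENTS)
```

In this sketch the connection leaves Activated at the first event and returns only at the last one, and any event not in the table (for example a reserve request arriving while Activated) raises an error until a transition is added for it.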
The issue with requiring the Release and then Provision again is that your service will take a traffic hit. We are not talking about a short blip either. We are talking about a considerable period as the operation messages filter down and up the tree multiple times. I would definitely dismiss this mechanism based solely on this deficiency. I need something not so intrusive, especially if it is just an endTime extension.
The one key piece missing from this strategy is how do I back out a shadow connection? Once again, the key point of the two-phase commit is to handle a failure to reserve the additional resources across the entire connection path. How do you handle it when part of your shadow reservation fails? I can't send down a Terminate since this will terminate the entire connection. I can't overload the Terminate operation either: the part of the tree that failed the reservation modification no longer has a record of it, so those NSAs would terminate the reservation itself upon receiving the Terminate request. Even if I force the RA to send down another Reserve to force the shadow connection back to the original pre-modify path, there is nothing saying the original resources are still available, as they may have been consumed by a new reservation.
Also, when I query the connectionId during this shadow reservation do I see the existing in-service reservation, or do I see the modified values?
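The back-out problem can be shown with a toy model of two child NSAs. Everything here is hypothetical (the `ChildNSA` class, the record kinds, and the assumption that Terminate is sent to every child and removes the newest record held for the connectionId); it only illustrates the failure mode described above.

```python
from typing import Dict, List, Optional


class ChildNSA:
    """Toy child NSA that keeps a list of records per connectionId."""

    def __init__(self, name: str, accepts_shadow: bool) -> None:
        self.name = name
        self.accepts_shadow = accepts_shadow
        self.records: Dict[str, List[str]] = {}

    def reserve(self, connection_id: str) -> bool:
        held = self.records.setdefault(connection_id, [])
        if not held:
            held.append("working")   # first reserve creates the working record
            return True
        if self.accepts_shadow:
            held.append("shadow")    # repeat reserve creates the shadow
            return True
        return False                 # shadow failed; only "working" is held

    def terminate(self, connection_id: str) -> Optional[str]:
        # Overloaded Terminate: removes the newest record for this id.
        held = self.records.get(connection_id, [])
        return held.pop() if held else None


a = ChildNSA("A", accepts_shadow=True)
b = ChildNSA("B", accepts_shadow=False)
for child in (a, b):
    child.reserve("conn-1")                                  # working reservation everywhere

shadow_results = [child.reserve("conn-1") for child in (a, b)]  # A succeeds, B fails
removed = [child.terminate("conn-1") for child in (a, b)]       # attempted back-out
```

After the attempted back-out, A has correctly dropped its shadow, but B, which never held one, has dropped the in-service working reservation instead.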
This process does not change the NSI-CS protocol or the state
machine.  It incurs [minor] code additions to the existing
primitives, but does not change the event driven state transitions.
Pathfinders should also be enhanced to double-book shadow
resources.
I think what you mean to say is that you do not require any new operation primitives. You have changed the behaviour of the NSI-CS protocol by overloading the existing operation primitives. I am still not convinced the existing state machine does not need to change, and you have some other issues to address as well. What I did is call a spade a spade and define a new operation set to do effectively the same thing as what you are doing. I guess the big question is whether we believe adding complexity to the existing operations is worth it to avoid introducing a new set of operations better named for the activity at hand.
This "shadow" approach has this major advantage: Since it is
essentially just building a second reservation, it does not require
changing the fundamental NSI-CS protocol or the state machine. All
the "modification" processing is implemented using existing
primitives and state transitions. The cost to the user is minimal:
a single *potential* brief hit as the A and Z endpoints are switched
to the [new/modified] connection. And since the user initiated the
modify() in the first place, and will need to adjust the behaviour
of the application to take advantage of the new characteristics, it
does not seem unreasonable to expect the user to be able to deal
with a hiccup - if it occurs.
I disagree on the hiccup. Why take the hit when there is no need to?
Finally, as a general recommendation: Modifying the existing
primitives and the associated state machine should be a last
resort.  Any new feature should have a very strong case for
modifying the NSI-CS state machine, and alternatives that do not do
so should be strongly encouraged. We should only modify the NSI
core protocols in order to simplify them, delivering additional
features through higher level service constructs wherever possible.
Thoughts?
Jerry
On 7/2/12 11:06 PM, John MacAuley wrote:
Peoples,
Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure. Please study it and prepare questions for the Wednesday call. We would like to close on this action ASAP.
Thank you,
John.
_______________________________________________
nsi-wg mailing list
nsi-wg@ogf.org
https://www.ogf.org/mailman/listinfo/nsi-wg