New state machine with two phase reserve and modify
Peoples,
Here is the new and improved NSI CS state machine, fresh off the presses and ready for your viewing pleasure. Please study it and prepare questions for the Wednesday call. We would like to close on this action ASAP.
Thank you, John.
Hi everyone-
The connection modification capability for version 2.0 was initially presented as a simple enhancement to extend the scheduled end time, or perhaps to increase the bandwidth, on an existing reservation. This was supposed to be a very limited functional tweak for v2.0.
But then we decided "hitless" was a requirement, and then we added "path preservation" as a requirement. It was *assumed* that we needed a unique Modify() primitive to do this... probably because prior tools have them... Suddenly, we are re-defining the entire state machine (yet again), and making it still more complex, in order to make this "simple" enhancement.
This increasing complexity is actually counter to what we were trying to do in Oxford: to simplify the state machine. And in general, it is counter to good protocol design. I think the existing state machine has been thoroughly vetted and is adequate for the protocol, and that we should consider functions like "Modify" as higher layer constructs that should be implemented using the existing atomic primitives we already have. Things like protection circuits, diversity attributes, and the like will all pose similar challenges - and we can't keep changing the state machine every time someone has a "simple" feature they can't live without...
Given the developing complexity, we should step back and re-evaluate a) the urgency for Modify(), b) the means/scope of implementing it, and c) the timeline it will require to "do it properly".
I would also like to propose an alternative "shadow" approach to provide a modify capability in version 2.0:
In a shadow approach, we build a simple second "shadow" connection reservation, and then perform a Release()-Provision() sequence to cut over to the modified service instance when ready. This shadow approach uses only existing protocol primitives and the existing state machine. (This is similar to John's talk about "bridge and roll"... but without a bridge:-)
Currently, a separate circuit approach like this would require separate STPs as endpoints for the modified connection reservation. However, given virtual STPs (e.g. VLANs), a shadow connection would not *really* need to terminate at the same source or destination STP to be useful - i.e. the A and Z endpoints of a modified connection could be different VLANs without imposing any detectable performance hit on end-to-end data flow (!) - the sending system simply begins using a new tag when the shadow provisioning is completed. (This requires the end-system agents to know this will occur, but, strictly speaking, this is entirely feasible.) The shadow path would likely even be along the same geographic route - i.e. the packets would transit all the same network infrastructure, just with different tags. Given this situation, the need to "modify" an *existing* connection, particularly with Ethernet-based STPs, seems somewhat unnecessary if you can simply request another connection with the desired new attributes along the same path and start using it whenever you please...
Being pragmatic though, there are many applications that will not be able to change their termination point, thus the source/destination STPs should be simultaneously acceptable for both the shadow connection and the working connection. Likewise, other resources (say bandwidth) may not be sufficient to reserve a completely separate upgraded Connection, and so the path finders ought to be able to "double-book" resources assigned to the working connection to be used by the shadow connection. Since the working connection and the shadow connection should never both be active, this double booking will never cause a conflict. This ability for shadows to double-book resources of their working counterpart provides the functionality we initially wanted: simply upgrading the existing path.
We can easily indicate when we wish to create a shadow Reservation within the existing protocol: we simply specify an existing ConnectionID in a Reservation Request. If the ReservationRequest specifies an existing Reservation rather than a new Reservation, then a [new] shadow Reservation/Connection is to be created and linked to the original "working" reservation. Thus, an otherwise normal Connection is identified as a "shadow" connection solely by the link to a working Connection. When a reservation is confirmed, if it links to a working connection, the RA will immediately replace the working with the shadow and Terminate the working reservation. In the one case where the working connection is Active, the shadow will remain in its Reserved state as if it had passed the start time and was awaiting a provision request. When a Release occurs for the working connection, a check is made to see if a shadow is linked to it. If so, the shadow will then replace the working, and the working connection is Terminated.
This process does not change the NSI-CS protocol or the state machine. It incurs [minor] code additions to the existing primitives, but does not change the event-driven state transitions. Pathfinders should also be enhanced to double-book shadow resources.
This "shadow" approach has one major advantage: since it is essentially just building a second reservation, it does not require changing the fundamental NSI-CS protocol or the state machine. All the "modification" processing is implemented using existing primitives and state transitions. The cost to the user is minimal: a single *potential* brief hit as the A and Z endpoints are switched to the [new/modified] connection. And since the user initiated the modify() in the first place, and will need to adjust the behaviour of the application to take advantage of the new characteristics, it does not seem unreasonable to expect the user to be able to deal with a hiccup - if it occurs.
Finally, as a general recommendation: modifying the existing primitives and the associated state machine should be a last resort. Any new feature should have a very strong case for modifying the NSI-CS state machine, and alternatives that do not do so should be strongly encouraged. We should only modify the NSI core protocols in order to simplify them, delivering additional features through higher level service constructs wherever possible.
Thoughts?
Jerry
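The shadow bookkeeping described in Jerry's message can be sketched roughly as follows. This is only an illustration under stated assumptions: the `RequesterAgent` class, its method names, and the three-state `State` enum are hypothetical and are not part of the NSI-CS protocol or any NSI implementation. It models just the rules given in the text: a reservation that names an existing ConnectionID becomes a shadow, it replaces a non-Active working connection as soon as it is confirmed, and it waits for a Release when the working connection is Active.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    RESERVED = "Reserved"
    ACTIVE = "Active"
    TERMINATED = "Terminated"


@dataclass
class Connection:
    connection_id: str
    bandwidth_mbps: int
    state: State = State.RESERVED
    working_id: Optional[str] = None  # set only while this is a waiting shadow


class RequesterAgent:
    """Hypothetical RA-side bookkeeping; not an NSI-CS API."""

    def __init__(self) -> None:
        self.connections: Dict[str, Connection] = {}
        self.shadow_of: Dict[str, str] = {}  # working id -> waiting shadow id

    def reserve_confirmed(self, new_id: str, bandwidth_mbps: int,
                          existing_id: Optional[str] = None) -> Connection:
        """Record a confirmed reservation; link it as a shadow if requested."""
        conn = Connection(new_id, bandwidth_mbps)
        self.connections[new_id] = conn
        if existing_id is None:
            return conn  # ordinary reservation, nothing else to do
        working = self.connections[existing_id]
        if working.state is State.ACTIVE:
            # Working is in service: the shadow stays Reserved until Release.
            conn.working_id = existing_id
            self.shadow_of[existing_id] = new_id
        else:
            # Working is not in service: replace it immediately.
            working.state = State.TERMINATED
        return conn

    def provision(self, connection_id: str) -> None:
        self.connections[connection_id].state = State.ACTIVE

    def release(self, connection_id: str) -> Optional[str]:
        """Release a connection; return the shadow id that replaces it, if any."""
        conn = self.connections[connection_id]
        conn.state = State.RESERVED
        shadow_id = self.shadow_of.pop(connection_id, None)
        if shadow_id is None:
            return None
        conn.state = State.TERMINATED
        self.connections[shadow_id].working_id = None  # shadow is now the working
        return shadow_id
```

The caller would then issue a Provision for the returned shadow id to complete the Release()-Provision() cut-over. No new states appear in the enum, which is the point the proposal is making; whether the linking logic counts as a state machine change is what the rest of the thread debates.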
_______________________________________________ nsi-wg mailing list nsi-wg@ogf.org https://www.ogf.org/mailman/listinfo/nsi-wg
Four main points I do not want to lose track of:
1. The two phase Modify has nothing to do with the type of modification. I am still hoping to only support bandwidth and endTime modifications for release 2.0.
The reason we proposed the two phase was because of the complexity of a distributed system and securing resources using the tree model. If we did a single phase then a modify request would change reservations on the way down the tree, and if a modify failure occurred on a branch, then the only way to restore the schedule to a correct state would be to send down another modify with the original schedule parameters and hope it can be restored. The only solution to this problem is to do the pre-allocation of resources, and once all participants confirm they can meet the need, then we commit the change.
If we squint our eyes and stand back a bit, we can map two phase modify operations to the existing reservation state machine:
modifyRequest = reserveRequest
modifyCommitRequest = provisionRequest
modifyCancelConfirm = terminateRequest
We had looked at overloading the existing reservation operations but it was quickly dismissed as not feasible.
2. The Modify state machine is separate from the existing reservation and provision lifecycle state machine, and therefore stands on its own while providing no additional complexity to the existing machine.
3. I was asked to propose a two phase reserve to fix the original release 1.0 deficiency we introduced for simplicity and to "get something out there". We were lucky with the existing single phase reservation because we get a pseudo commit by having to provision the circuit. We all agreed to revisit the two phase in release 2.0 so I had no objection to attempt a state machine. I would rather fix it in 2.0 than try to do it in 3.0 and need to support yet another state machine.
It should also be noted that the improved and simplified provisioning state machine is still there as is. We just expanded the single reserve state into a two phase commit reservation which can stand on its own. Very elegant if you ask me :-)
4. Lastly, we need to avoid the pitfall of dropping things just so we can build a demo of release 2.0 for October. If we can accommodate these features in the specification for release 2.0, then we should, and worry about lining up implementations afterwards.
We have done the detailed legwork, and unless there is a fundamental flaw in the concept, I think we need to get it into the specification so we have something useful to put into production.
Of course, if we remove the tree model and use a chain model you get the implicit commit for all your operations.
John.
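The failure case John describes in point 1, and the pre-allocate-then-commit fix, can be illustrated with a small sketch. It assumes a toy tree of agents that each track one bandwidth figure; the `Node` class, its fields, and `two_phase_modify` are hypothetical names chosen for the example, loosely echoing the modifyRequest / modifyCommitRequest naming above, and do not represent the proposed NSI messages or state machine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical agent in the request tree, tracking one bandwidth value."""
    name: str
    capacity_mbps: int
    committed_mbps: int
    children: List["Node"] = field(default_factory=list)
    pending_mbps: Optional[int] = None  # tentative hold made in phase 1

    def modify_request(self, new_mbps: int) -> bool:
        """Phase 1: hold the new value tentatively, then ask the children."""
        if new_mbps > self.capacity_mbps:
            return False
        self.pending_mbps = new_mbps
        return all(child.modify_request(new_mbps) for child in self.children)

    def modify_commit(self) -> None:
        """Phase 2 (success): make the tentative value the committed one."""
        if self.pending_mbps is not None:
            self.committed_mbps = self.pending_mbps
            self.pending_mbps = None
        for child in self.children:
            child.modify_commit()

    def modify_cancel(self) -> None:
        """Phase 2 (failure): drop tentative holds; committed values are untouched."""
        self.pending_mbps = None
        for child in self.children:
            child.modify_cancel()


def two_phase_modify(root: Node, new_mbps: int) -> bool:
    """Commit only if every participant in the tree accepted the request."""
    if root.modify_request(new_mbps):
        root.modify_commit()
        return True
    root.modify_cancel()
    return False
```

Because the committed value is never touched during phase 1, a failure on one branch needs only a cancel, not a second modify carrying the original schedule parameters. Whether this requires new primitives or can ride on the existing reserve/confirm exchange is the open question in the thread.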
Hey John- see notes inline... (thanks for the quick response)
On 7/3/12 4:13 PM, John MacAuley wrote:
Four main points I do not want to lose track of:
1. The two phase Modify has nothing to do with the type of modification. I am still hoping to only support bandwidth and endTime modifications for release 2.0.
Right. I agree.
The reason we proposed the two phase was because of the complexity of a distributed system and securing resources using the tree model. If we did a single phase then a modify request would change reservations on the way down the tree, and if a modify failure occurred on a branch, then the only way to restore the schedule to a correct state would be to send down another modify with the original schedule parameters and hope it can be restored.
Ah. This assumes you do a "modify" that permanently (i.e. with no ability to roll back) changes each child. This was not a requirement - this was an assumption/design decision you made. A pathfinder that selects resources based on a relationship to the original connection, but does not immediately put them into service, is simple. Our reserve-provision process does this. All we need to do is be able to construct a new Connection that incorporates the existing resources. This is far easier than modifying an already complex state machine.
The only solution to this problem is to do the pre-allocation of resources, and once all participants confirm they can meet the need, then we commit the change.
Bingo. This is what we do now with the ReserveReq/Confirm process - reserve down the tree, confirm up the tree - done. Followed by the provision. No new state is missing or required.
If we squint our eyes and stand back a bit, we can map two phase modify operations to the existing reservation state machine:
modifyRequest = reserveRequest
modifyCommitRequest = provisionRequest
modifyCancelConfirm = terminateRequest
We had looked at overloading the existing reservation operations but it was quickly dismissed as not feasible.
"Quickly dismissed as not feasible"? It is entirely feasible, and I offer my Shadow proposal as proof that it is feasible. Indeed, much more feasible than redoing the entire state machine.
2. The Modify state machine is separate from the existing reservation and provision lifecycle state machine, and therefore, stands on its own while providing no additional complexity to the existing machine.
Ok. But our intention is to simplify the state machine - not complexify it. We simply do not need to have two state machines to do this.
3. I was asked to propose a two phase reserve to fix the original release 1.0 deficiency we introduced for simplicity and to "get something out there".
What deficiency?
We were lucky with the existing single phase reservation because we get a pseudo commit by having to provision the circuit.
No - we get a real commit by returning a ReservationConfirm()... nothing pseudo about it.
We all agreed to revisit the two phase in release 2.0 so I had no objection to attempt a state machine. I would rather fix it in 2.0 than try to do it in 3.0 and need to support yet another state machine.
I too think we can do this in v2.0, but far more simply than by introducing these state machine modifications.
It should also be noted that the improved and simplified provisioning state machine is still there as is. We just expanded the single reserve state into a two phase commit reservation which can stand on its own. Very elegant if you ask me :-)
We "expanded it"... unnecessarily. We do things because they are necessary - not just because they are elegant. If we need something, then we do indeed want an elegant solution. But a complex and elegant state machine does not trump a simple state machine in protocol terms.
4. Lastly, we need to avoid the pitfall of dropping things just so we can build a demo of release 2.0 for October. If we can accommodate these features in the specification for release 2.0, then we should and worry about lining up implementations afterwards.
In general I agree. But the manner in which we choose to implement them can dictate whether we spend 3 months arguing and rehashing the state machine (yet again), or we find a simpler means of doing the same thing. When version 2.0 is ready is a function of what we choose to put into it. And if we decide we need certain features in order to move toward production, and other features are less critical, we can re-prioritize things to stay on an anticipated schedule. If "Modify()" causes an explosion of code due to re-designing the state machine, then we ought to take it out of v2.0.
We have done the detailed legwork, and unless there is a fundamental flaw in the concept, I think we need to get it into the specification so we have something useful to put into production.
The fundamental flaw is the complexity. You are requiring every user to interact using these new primitives when the existing set is sufficient. Just because you worked on the issue (which I appreciate) does not mean we accept it. This is a significantly complex proposal, and we should not rush it any more than we should rush other features to make a deadline.
Of course, if we remove the tree model and use a chain model you get the implicit commit for all your operations.
The tree/chain issue again? Really? This modify is not a state machine issue... it's a pathfinding and resource allocation issue. We DO NOT NEED to modify the state machine to do this. If the group wants to, fine, but it will impose significant unnecessary complexity on all concerned.
John.
Best regards - and I do appreciate the amount of work that went into the slide deck. I just don't agree with it.
Jerry
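Jerry's claim that modify is "a pathfinding and resource allocation issue" rests on the double-booking rule from his shadow proposal earlier in the thread. A minimal sketch of that rule for a single link follows; `Booking`, `can_reserve`, and the flat per-link capacity model are assumptions made for illustration (real pathfinders would also consider time windows and multi-hop paths), not anything defined by NSI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Booking:
    connection_id: str
    mbps: int


def can_reserve(link_capacity_mbps: int, bookings: List[Booking],
                requested_mbps: int, working_id: Optional[str] = None) -> bool:
    """Admission check for one link.

    If working_id is given, the request is a shadow of that connection, so
    the working connection's own booking is not counted against it: the two
    are never active at the same time.
    """
    in_use = sum(b.mbps for b in bookings if b.connection_id != working_id)
    return in_use + requested_mbps <= link_capacity_mbps
```

For example, on a 1000 Mb/s link carrying c1 at 600 and c9 at 300, an independent 700 Mb/s request is refused, while a 700 Mb/s shadow of c1 fits because only c9's 300 is counted against it.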
Jerry,
I like your thinking. Chin and I went down a similar path trying to use existing operation primitives in conjunction with a new Modify primitive. We ruled out the idea because we needed to add additional conditional logic to the existing primitives in order to handle the modify-specific behaviours. In addition, it just didn't work, since there were situations where you couldn't properly provide the needed modify behaviours (specifically around error handling and backing out a modify failure in a subtree). It began to feel unnatural. The bastard child of an unholy wedlock. ;-)
I have provided detailed comments below. I would like you to give deep thought to whether trying to overload the existing primitives to support modify, with the added complexity, is really going to save us anything over a separate modify command set that has a state machine independent of the existing, unmodified state machine.
John.
On 2012-07-03, at 3:09 PM, Jerry Sobieski wrote:
Hi everyone-
The connection modification capability for version 2.0 was initially presented as a simple enhancement to extend the scheduled end time, or perhaps to increase the bandwidth, on an existing reservation. This was supposed to be a very limited functional tweak for v2.0.
Yes, I am still proposing this is the only capability we implement in 2.0.
But then we decided "hitless" was a requirement, and then we added "path preservation" as a requirement. It was *assumed* that we needed a unique Modify() primitive to do this... probably because prior tools have them... Suddenly, we are re-defining the entire state machine (yet again), and making it still more complex, in order to make this "simple" enhancement.
Sigh (throwing hands up into the air then smashing head against table)... a new state machine modelling the Modify lifecycle with no changes to the existing state machine. Two-phase reserve is a separate issue from Modify.
This increasing complexity is actually counter to what we were trying to do in Oxford: to simplify the state machine. And in general, counter to good protocol design.
Okay, I will not write an extensive dissertation on how this statement is inaccurate. The original intent of the Oxford state machine was to fix the delayed provisionConfirm message. Henrick, Chin, and Tomohiro did a great job trying to rationalize the existing state machine and simplify it where possible. What we landed on was two separate state machines to more simply describe each NSA role. Out of Oxford we actually ended up with more state machines and states than we went into Oxford with. Why is this, you ask? It is because we are designing a complex distributed reservation and provisioning system. This is not a simple task given the behavioural constraints we have placed on the team. People need to realize that sometimes correctness is complex and not as simple as first thought. I think this NSI project is a perfect example of this.
I think the existing state machine has been thoroughly vetted and is adequate for the protocol, and that we should consider functions like "Modify" as higher layer constructs that should be implemented using the existing atomic primitives we already have. Things like protection circuits, diversity attributes, and the like will all pose similar challenges - and we can't keep changing the state machine every time someone has a "simple" feature they can't live without...
You are making an assumption that I believe is the key flaw in your argument - you are assuming that the existing vetted state machine will not change if you reuse the existing operation primitives. I am not convinced it would not change. It all comes down to whether we decide to model the "shadow connection" conditional states within the machine.
Given the developing complexity, we should step back and re-evaluate a) the urgency for Modify(), b) the means/scope of implementing it, and c) the timeline it will require to "do it properly".
At the rate this working group makes decisions and closes on actions we could still be debating this next year. We have time to agree and prototype before closing on the NSI 2.0 specification. If I may point out, the only things we have actually closed on are changes I made to the WSDL that fixed deficiencies in release 1.1. So far we have no new features in 2.0 fully agreed and committed. To be honest, I still do not understand the process we follow.
I would like to also propose an alternative "shadow" approach to provide a modify capability in version 2.0:
In a shadow approach, we build a simple second "shadow" connection reservation, and then perform a Release()-Provision() sequence to cut over to the modified service instance when ready. This shadow approach uses only existing protocol primitives and existing state machine. (This is similar to John's talk about "bridge and roll"... but without a bridge:-)
Currently, a separate circuit approach like this would require separate STPs as endpoints for the modified connection reservation. However, given virtual STPs (e.g. VLANs), a shadow connection would not *really* need to terminate at the same source or destination STP to be useful - i.e. the A and Z endpoints of a modified connection could be different VLANs without imposing any detectable performance hit on end-to-end data flow (!) - the sending system simply begins using a new tag when the shadow provisioning is completed. (This requires the end systems agents to know this will occur, but, strictly speaking, this is entirely feasible.) The shadow path would likely even be along the same geographic route - i.e. the packets would transit all the same network infrastructure, just with different tags. Given this situation, the need to "modify" an *existing* connection, particularly with ethernet based STPs, seems somewhat unnecessary if you can simply request another connection with the desired new attributes along the same path and start using it whenever you please...
This breaks down completely with anything other than VLAN circuits. If I have an existing EPL circuit that is encapsulating the entire contents of an Ethernet port end-to-end I have no ability to use another Ethernet port as an STP in this operation. I must use the existing STP since this could be the only port dropped at my location. Adding the requirement for an additional set of STPs just to modify the endTime of an existing reservation seems to add unnecessary complexity not only to the NSI implementation, but also for the end user consuming the service.
Being pragmatic though, there are many applications that will not be able to change their termination point, thus the source/destination STPs should be simultaneously acceptable for both the shadow connection as well as the working connection. Likewise, other resources (say bandwidth) may not be sufficient to reserve a completely separate upgraded Connection, and so the path finders ought to be able to "double-book" resources assigned to the working connection to be used by the shadow connection. Since the working connection and the shadow connection should never both be active, this double booking will never cause a conflict. This ability for shadows to double-book resources of their working counterpart provides the functionality we initially wanted: simply upgrading the existing path.
Looks like we agree on the application requirement. Woo hoo!
We can easily indicate when we wish to create a shadow Reservation within the existing protocol: We simply specify an existing ConnectionID in a Reservation Request.
How do I distinguish between an RA wanting to modify an existing reservation and a naughty NSA sending down a duplicate connectionId that we currently reject? I guess we will now always assume it is a modification...
If the ReservationRequest specifies an existing Reservation rather than a new Reservation, then a [new] shadow Reservation/Connection is to be created and linked to the original "working" reservation.
So can I assume that the STPs are the same? What else in the reservation needs to be maintained? Is anything up for change?
Thus, an otherwise normal Connection is identified as a "shadow" connection solely by the link to a working Connection.
When a reservation is confirmed, if it links to a working connection, the RA will immediately replace the working with the shadow and Terminate the working reservation.
When you say "working connection" do you mean provisioned and active in the network? Just to argue your point, we will need to modify the existing state machine to allow reserve requests to arrive on an existing connection which could be in any of the defined states. And is the action an RA takes here on confirm the Release that follows?
In the one case where the working connection is Active, the shadow will remain in its Reserved state as if it had passed the start time and was awaiting a provision request. When a Release occurs for the working connection, a check is made to see if a shadow is linked to it. If so, the shadow will then replace the working, and the working connection is Terminated.
Are you saying that the Release operation will trigger a Release of the existing working path, and automatically provision the new modified working path, or are you saying I would need to do another Provision? I think you mean Release and then Provision so that you do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions. Anything else will require a complete change of the state machine.
The issue with requiring the Release and then Provision again is that your service will take a traffic hit. We are not talking about a short blip either. We are talking about a considerable period as the operation messages filter down and up the tree multiple times. I would definitely dismiss this mechanism based solely on this deficiency. I need something not so intrusive, especially if it is just an endTime extension.
The one key piece missing from this strategy is how do I back out a shadow connection? Once again, the key point of the two phase commit is to handle a failure to reserve the additional resources across the entire connection path. How do you handle it when part of your shadow reservation fails? I can't send down a terminate since this will terminate the entire connection. I can't overload the terminate operation either: the part of the tree that failed the reservation modification no longer has a record of it, so those NSA would terminate the whole reservation upon receiving the Terminate request. Even if I force the RA to send down another Reserve to force the shadow connection back to the original pre-modify path, there is nothing saying the original resources are even available any longer, as they may have been consumed for a new reservation.
Also, when I query the connectionId during this shadow reservation do I see the existing in service reservation, or do I see the modified values?
This process does not change the NSI-CS protocol or the state machine. It incurs [minor] code additions to the existing primitives, but does not change the event driven state transitions. Pathfinders should also be enhanced to double-book shadow resources.
I think what you mean to say is that you do not require any new operation primitives. You have changed the behaviour of the NSI-CS protocol by overloading the existing operation primitives. I am still not convinced the existing state machine does not need to change, and you have some other issues to address as well. What I did is call a spade a spade and define a new operation set to do effectively the same thing as what you are doing. I guess the big question is whether we believe adding the additional complexity to existing operations is worth it to save having to introduce a new set of operations better named for the activity at hand.
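To make the Release-then-Provision objection concrete, here is a minimal sketch that walks the transition sequence quoted above. It assumes a transition table containing only the states named in that sequence; the event names (`release`, `release_confirmed`, and so on) are invented for illustration, and the real NSI CS state machine has more states and events than this.

```python
# Hypothetical transition table: it holds only the states named in the
# Release-then-Provision sequence. The event names are invented for this sketch.
TRANSITIONS = {
    ("Activated", "release"): "Releasing",
    ("Releasing", "release_confirmed"): "Reserved",
    ("Reserved", "provision"): "Provisioning",
    ("Provisioning", "provision_confirmed"): "Provisioned",
    ("Provisioned", "activate"): "Activated",
}


def run(state, events):
    """Apply events in order and return every state visited, start included."""
    trace = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state!r} on {event!r}")
        state = TRANSITIONS[key]
        trace.append(state)
    return trace


CUTOVER_EVENTS = [
    "release",
    "release_confirmed",
    "provision",
    "provision_confirmed",
    "activate",
]

trace = run("Activated", CUTOVER_EVENTS)
# Every state other than Activated lies between leaving and re-entering service.
intermediate = [s for s in trace if s != "Activated"]

print(" -> ".join(trace))
print(f"{len(intermediate)} states visited between leaving and re-entering Activated")
```

Each of those intermediate transitions needs its own round of messages down and up the NSA tree, which is where the concern about a prolonged traffic hit comes from.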
This "shadow" approach has this major advantage: Since it is essentially just building a second reservation, it does not require changing the fundamental NSI-CS protocol or the state machine. All the "modification" processing is implemented using existing primitives and state transitions. The cost to the user is minimal: a single *potential* brief hit as the A and Z endpoints are switched to the [new/modified] connection. And since the user initiated the modify() in the first place, and will need to adjust the behaviour of the application to take advantage of the new characteristics, it does not seem unreasonable to expect the user to be able to deal with a hiccup - if it occurs.
I disagree on the hiccup. Why take the hit when there is no need to?
Finally, as a general recommendation: Modifying the existing primitives and the associated state machine should be a last resort. Any new feature should have a very strong case for modifying the NSI-CS state machine, and alternatives that do not do so should be strongly encouraged. We should only modify the NSI core protocols in order to simplify them, delivering additional features through higher level service constructs wherever possible.
Thoughts? Jerry
On 7/2/12 11:06 PM, John MacAuley wrote:
Peoples,
Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure. Please study it and prepare questions for the Wednesday call. We would like to close on this action ASAP.
Thank you, John.
_______________________________________________ nsi-wg mailing list nsi-wg@ogf.org https://www.ogf.org/mailman/listinfo/nsi-wg
I took a long bike ride today, and during my long periods of oxygen deprivation I thought about how I could make this "shadow" proposal work. It occurred to me that we might fix it using a feature similar to one we implemented in OpenDRAC for diverse route path protection. In fact, we might be able to extend it into a more general solution that handles a true bridge-and-roll type of function, but I think this is beyond the scope of release 2.0.
First off, we will need to stop hiding the shadow connection behind the same connectionId as the existing reservation, and instead use a new connectionId. This is where the existing proposal falls over, since there is no way to directly address or manipulate the new modified reservation while the existing reservation is still in service.
Secondly, we will need to modify the existing reserve command to take a new type that identifies whether the request is a new reservation, a modify of an existing reservation, or perhaps a new reservation that has some dependency on an existing reservation (say for diverse routing). In addition, if we are doing anything other than a new reservation we will need to provide the connectionId of the existing reservation to provide context for this new reservation.
New reservation - reserve(type="new", connectionId="ABCD");
Modify reservation - reserve(type="modify", connectionId="WXYZ", refConnectionId="ABCD");
New reservation diverse to existing reservation - reserve(type="diverse", connectionId="HIJK", refConnectionId="ABCD");
Now we can still independently control both the original reservation and the new reservation. For example, if I get a partial reserveFailed I can use terminate(connectionId="WXYZ") to clean up while not impacting original reservation "ABCD". I can also query both reservations independently and see that there is another reservation related through a pending modify operation.
This also fixes the provision issue in the current shadow proposal.
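The typed reserve described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `ReservationStore`, `Reservation`, and the `related` helper are hypothetical names, not part of the NSI CS interface, and the only states modelled are "Reserved" and "Terminated".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reservation:
    connection_id: str
    kind: str                               # "new", "modify", or "diverse"
    ref_connection_id: Optional[str] = None
    state: str = "Reserved"


class ReservationStore:
    """Hypothetical per-NSA record of reservations, keyed by connectionId."""

    def __init__(self) -> None:
        self.reservations: Dict[str, Reservation] = {}

    def reserve(self, kind: str, connection_id: str,
                ref_connection_id: Optional[str] = None) -> Reservation:
        # A duplicate connectionId is still rejected, as it is today.
        if connection_id in self.reservations:
            raise ValueError(f"duplicate connectionId {connection_id}")
        if kind == "new":
            if ref_connection_id is not None:
                raise ValueError("a new reservation takes no refConnectionId")
        elif kind in ("modify", "diverse"):
            # Anything other than "new" needs an existing reservation as context.
            if ref_connection_id not in self.reservations:
                raise ValueError(f"unknown refConnectionId {ref_connection_id}")
        else:
            raise ValueError(f"unknown reserve type {kind}")
        reservation = Reservation(connection_id, kind, ref_connection_id)
        self.reservations[connection_id] = reservation
        return reservation

    def terminate(self, connection_id: str) -> None:
        # Only the addressed reservation is affected.
        self.reservations[connection_id].state = "Terminated"

    def related(self, connection_id: str) -> List[str]:
        """connectionIds of reservations that reference the given one."""
        return [r.connection_id for r in self.reservations.values()
                if r.ref_connection_id == connection_id]


store = ReservationStore()
store.reserve("new", "ABCD")
store.reserve("modify", "WXYZ", ref_connection_id="ABCD")
print(store.related("ABCD"))        # the pending modify is visible from "ABCD"

# Back out a failed modify without touching the original reservation.
store.terminate("WXYZ")
print(store.reservations["ABCD"].state, store.reservations["WXYZ"].state)
```

Because the modify has its own connectionId, a duplicate connectionId can keep being rejected, and a terminate aimed at "WXYZ" cannot be mistaken for a terminate of "ABCD".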
In the current proposal I would need to Release and then Provision, so that we do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions, resulting in a traffic hit. With a separate connectionId I can kick the provision off on the new reservation connectionId="WXYZ" while the existing reservation connectionId="ABCD" is still active.
provision(connectionId="WXYZ") -> Reserved->Provisioning->Provisioned->Activated
As the new reservation connectionId="WXYZ" transitions to Activated, the existing reservation connectionId="ABCD" is transitioned from Activated to Terminated. We can give a terminated reason of "Modify successful" to track the change. The termination of the original reservation will be entirely handled by the uPA involved in the modification of the reservation; however, since we are doing the final coordination on the children NSA (i.e. moving from connectionId="ABCD" to connectionId="WXYZ"), these NSA will need to generate a forced_end, or perhaps a new modify_end, event up the tree to notify the parents of the modify and termination of connectionId="ABCD".
There is an interesting side-effect of this model. The new reservation does not need to follow the same path as the original reservation. We may want to force an "in-place" option to stop additional new domains getting involved in the reservation, but even if they did it should work correctly. For example, let's say the original path traversed domain B, but when the aggregator NSA did path computation it determined the new reservation should traverse domain C instead of domain B, while all other domains remained the same. As the provision(connectionId="WXYZ") propagates down the tree, the uPA involved in the new reservation activate their resources, and terminate any resources associated with connectionId="ABCD" (obviously, domain B does not get the provision since it has no new resources, and domain C has no old resources). All other domains with original reservation segments generate a forced_end/modify_end event up the tree, which will eventually result in a terminate(connectionId="ABCD") being sent to domain B for its lone segment.
Very interesting solution to the problem; however, there will be some impact on the existing state machine. I guess we need to decide whether overloading the existing reserve command is worth it to avoid introducing the new Modify command set and its associated state machine.
John.
On 2012-07-04, at 7:26 PM, John MacAuley wrote:
Hey John - some good thoughts below. See comments inline. J
On 7/5/12 12:59 PM, John MacAuley wrote:
> First off we will need to stop hiding the shadow connection behind the same connectionId as the existing reservation, and instead use a new connectionId. This is where the existing proposal falls over since there is no way to directly address or manipulate the new modified reservation while the existing reservation is still in service.
We could define a "version" number... XYZ(v0) -> modify takes it to XYZ(v1) -> next modify takes it to XYZ(v2)... etc. If you want the ability to roll back, you *must* uniquely identify each version of the connection somehow and associate the resources with each. What if yet another Modify() request arrives before the first modify is fully completed? I agree we need to differentiate "before" and "after" connections somehow... But this is true of the separate Modify command as well....
>
> Secondly, we will need to modify the existing reserve command to take a new type that will identify if the request is a new reservation, a modify of an existing reservation, or perhaps a new reservation that has some dependency on an existing reservation (say for diverse routing).
True. See below.
> In addition, if we are doing anything other than a new reservation we will need to provide the connectionId of the existing reservation to provide context for this new reservation.
EXACTLY. This is my point! You want to find the common aspects of all of these related requests... The common thread is the set of resources that are available to the path finder:
For the normal Reserve(), any resources that are marked "available" can be considered.
For the modify Reserve(), it's got to use a) resources assigned to the "before" connection and/or b) "available" resources.
For the diverse Reserve(), the path finder can consider only resources that are "available" *and* "greater than X distance" from the referenced connection(s).
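The three cases above differ only in which resources the path finder may consider, which can be sketched as a single filter. This is a minimal illustration, not a pathfinding algorithm: the `Resource` record, the `distance_from_ref` metric, and the function name are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Resource:
    name: str
    owner: Optional[str]        # connectionId holding the resource, None if "available"
    distance_from_ref: float    # hypothetical distance to the referenced connection


def candidate_resources(kind: str, resources: Iterable[Resource],
                        ref_connection_id: Optional[str] = None,
                        min_distance: float = 0.0) -> Set[str]:
    """Names of the resources the path finder may consider for a request type."""
    if kind == "new":
        # Normal Reserve(): only resources marked available.
        return {r.name for r in resources if r.owner is None}
    if kind == "modify":
        # Modify Reserve(): available resources plus those of the "before" connection.
        return {r.name for r in resources
                if r.owner is None or r.owner == ref_connection_id}
    if kind == "diverse":
        # Diverse Reserve(): available *and* farther than min_distance from the reference.
        return {r.name for r in resources
                if r.owner is None and r.distance_from_ref > min_distance}
    raise ValueError(f"unknown reserve type {kind}")


RESOURCES = [
    Resource("link-1", owner="ABCD", distance_from_ref=0.0),
    Resource("link-2", owner=None, distance_from_ref=1.0),
    Resource("link-3", owner=None, distance_from_ref=25.0),
    Resource("link-4", owner="HIJK", distance_from_ref=30.0),
]

print(sorted(candidate_resources("new", RESOURCES)))
print(sorted(candidate_resources("modify", RESOURCES, ref_connection_id="ABCD")))
print(sorted(candidate_resources("diverse", RESOURCES, ref_connection_id="ABCD",
                                 min_distance=10.0)))
```

The request type changes only the filter, not the sequence of protocol messages, which is the sense in which these are path finding constraints.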
These are all path finding constraints, i.e. policy imposed by the RA(!) - not really a different protocol process. If we can see this - and see how it could help us down the road to do things like diverse path or parallel/reverse path, etc. - it helps us avoid a potentially very costly SM change that will not scale to the other types of requests...
> New reservation - reserve(type="new", connectionId="ABCD");
> Modify reservation - reserve(type="modify", connectionId="WXYZ", refConnectionId="ABCD");
Or the <new connectionID> == <existing connection ID>?
> New reservation diverse to existing reservation - reserve(type="diverse", connectionId="HIJK", refConnectionId="ABCD");
Sure. Again - your "refConnectionID" is actually just a means of identifying certain resource constraints to be applied to the PF process.
>
> Now we can still independently control both the original reservation and the new reservation.
YES! No new SM required. We do want to consider how we ascribe characteristics like "these two connections cannot be active at the same time"... or we could allow both to be active - and it's the user's responsibility to only use one at a time. But this could possibly be part of the higher layer composite functionality...
> For example, if I get a partial reserveFailed I can use terminate(connectionId="WXYZ") to clean up while not impacting original reservation "ABCD". I can also query both reservations independently and see that there is another reservation related through a pending modify operation.
>
> This also fixes the provision issue in the current shadow proposal. In the current proposal I would need to Release and then Provision so that we do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions resulting in a traffic hit.
hmm?
My shadow proposal just suggested the user Releases the working Connection, and then Provisions the modified Connection (or re-Provisions the same connection ID, but the Provision function uses the new version). It could be simpler: once the shadow's reservation is confirmed and it is waiting for a manual start, a Provision Request on the shadow recognizes that it is linked to the working connection and can either a) try to directly Provision(new) the new version (thus possibly avoiding the hit), or b) Release(old)-Provision(new), creating a brief hit.
IMO - a modify is not a network outage. I.e. it does not need to be "hitless". It is a user RA initiated request. The PA can impose a potential hit without feeling guilty. And even for just a schedule extension, unless there is a requirement that schedule extensions be treated in a special manner (special hard coded cases... not generally a great idea), even a simple extension may incur a path modification.
> With a separate connectionId I can kick the provision off on the new reservation connectionId="WXYZ" while the existing reservation connectionId="ABCD" is still active.
>
> provision(connectionId="WXYZ") -> Reserved->Provisioning->Provisioned->Activated
>
> As the new reservation connectionId="WXYZ" transitions to Activated the existing reservation connectionId="ABCD" is transitioned from Activated to Terminated. We can give a terminated reason of "Modify successful" to track the change. The termination of the original reservation will be entirely handled by the uPA involved in the modification of the reservation, however, since we are doing the final coordination on the children NSA (i.e. moving from connectionId="ABCD" to connectionId="WXYZ"), these NSA will need to generate a forced_end, or perhaps a new modify_end event up the tree to notify the parents of the modify and termination of connectionId="ABCD".
>
> There is an interesting side-effect of this model.
The new > reservation does not need to follow the same path as the original > reservation. We may want to force an "in-place" option to stop > additional new domains getting involved in the reservation, but even > if they did it should work correctly. Exactly. If the resource constraints say "PF *must* first use existing resources of old version for new version" then the path will not change - entirely (one could see an VCAT/LCAS split path occuring.) However, if we say "PF *should* use pre-existing resources in new version" it allows the pathfinding to allocate a different path if that will allow the modify to complete. I think a more powerful resource constraint specification in the Reserve message would also be highly beneficial to remote pathfinders that simply want a segment to do something specific - like exit one domain into a specific next hop domain but otherwise do not care which path is used... > For example, lets say the original path traversed domain B, but when > the aggregator NSA did path computation it determined the new > reservation should traverse domain C instead of domain B, but all > other domains remained the same. As the > provision(connectionId="WXYZ") propagates down the tree the uPA > involved in the new reservation activate their resources, and > terminate any resources associated with connectionId="ABCD" > (obviously, domain B does not get the provision since it has no new > resources, and domain C has no old resources). All other domains > generate with original reservation segments generate > a forced_end/modify_end event up the tree, which will eventually > result in a terminate(connectionId="ABCD") being sent to domain B for > its lone segment. This is why I think for a modify functionality we cannot escape there being a transitionary period as we reconfigure from the old to the new. Even just a schedule extension will cause a period of time when some of the segments will have the new end time, and some will have the old end time. 
This transition may not always require a hit in the data plane (such a schedule extension might or might not take a hit- depending on the path resources used.) > > Very interesting solution to the problem, however, there will be some > impact on the existing state machine. I guess we need to decide if > overloading the existing reserve command is worth not introducing the > new Modify command set and it's associated state machine. Hmm... No. If we do this right, the existing state machine (transitions) remains the same, but some of the states themselves do more intelligent things. I really suggest if we think about the modify as creating a new connection with certain resource constraints that we can safely avoid a substantial increase in protocol complexity. We need to also remember that few users will actually take advantage of the modify capability, and we need to make sure we do not make the SM so complex that users cannot understand it...which we are close to doing. > > John. > BEst regards Jerry
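The exchange above treats "modify" and "diverse" as ordinary reservations that carry a reference to an existing connection, with the reference turned into a path-finding constraint. A minimal Python sketch of that reading follows. Every class, function, and field name here is hypothetical (only the reserve/terminate operations, the type values, and the connection IDs come from the messages), and the "must"/"should" distinction is modelled as a simple flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Constraint:
    kind: str               # "reuse" (modify) or "avoid" (diverse); hypothetical labels
    ref_connection_id: str  # the existing connection the constraint refers to
    strict: bool            # True = PF "must" honour it, False = PF "should"


@dataclass
class Reservation:
    connection_id: str
    state: str = "Reserved"
    constraints: List[Constraint] = field(default_factory=list)


def constraints_for(req_type: str, ref: Optional[str], strict: bool) -> List[Constraint]:
    """Translate the request type into path-finding constraints."""
    if req_type == "new":
        return []
    if ref is None:
        raise ValueError(f"type={req_type!r} needs a refConnectionId")
    if req_type == "modify":
        return [Constraint("reuse", ref, strict)]
    if req_type == "diverse":
        return [Constraint("avoid", ref, strict)]
    raise ValueError(f"unknown reservation type {req_type!r}")


class Agent:
    """Toy agent: one reserve() entry point, no separate modify state machine."""

    def __init__(self) -> None:
        self.reservations = {}

    def reserve(self, connection_id, req_type="new", ref_connection_id=None, strict=True):
        if ref_connection_id is not None and ref_connection_id not in self.reservations:
            raise KeyError(f"unknown refConnectionId {ref_connection_id!r}")
        res = Reservation(connection_id,
                          constraints=constraints_for(req_type, ref_connection_id, strict))
        self.reservations[connection_id] = res
        return res

    def terminate(self, connection_id):
        # Only the named reservation changes state; related ones are untouched.
        self.reservations[connection_id].state = "Terminated"


agent = Agent()
agent.reserve("ABCD")
shadow = agent.reserve("WXYZ", req_type="modify", ref_connection_id="ABCD")
agent.terminate("WXYZ")  # clean up a failed modify without touching "ABCD"
```

Because the two reservations have separate connection IDs, cleaning up the shadow leaves the original in its previous state, which is the property both authors point to.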
Hi John-

Regarding this state machine- We really need to consider how this Modify can be handled more like ballet than hockey. :-)

--- First, what are we trying to solve with 2PC? The multi-domain pass/fail scenario where some succeed and some fail, causing the successful modifications to be rolled back, is a resource management issue - not a protocol state issue. If the successful domains will still have to be rolled back - we don't escape this fact. It's just that the RA is not responsible for doing it. You are placing the task on the PA. And forcing the RA to now issue two messages to make sure he wants what he just asked for.

A better approach would be to think of this differently - think of the new modified connection as a separate new connection ... what is it about the new connection that you want? The same path as the old connection? Ok. We do not want to release the old resources into the wild... may lose them. OK. We want to minimize the impact to the data flow of the resource modification. OK.

So make a constraint for the path finder: "You must use resources that are assigned to ( "available" | "Connection X") " The protocol runs normally... The reservation is made. Resources are already multi-booked to different connections for scheduling purposes, so double booking resources to two connections simultaneously should not be technically difficult. Just need to make sure we can switch them over appropriately - but fortunately this isn't a state machine issue, it's a configuration issue and resource database issue.

Please see my alternative "shadow" proposal that does not require wholesale replacement of the state machine.

--- This proposal seems to require 2PC for normal Reservations as well as Modifications. Is this right? This is a major (and unnecessary IMO) departure from the existing State Machine we have worked on so hard for over two years. We have had so many Very Long discussions on this in early 2010 in which we decided the simple single Reserve command with an active Terminate was preferable to the 2PC (the 2PC requiring a top down confirm... doubling the number of messages required, and introducing race conditions with timeouts that were not easily resolved). Why, after so long and demonstrated success in v1.0, do we suddenly need to change horses to a two phase commit for the entire model? I fear this will cause extensive re-writes ... for what feature again? To extend a schedule?

2PC is an optimization strategy where you /assume a fail/ is likely to occur down tree, and the RA does not wish to be responsible for cleaning up unneeded reservations. If you assume reservations will mostly be successful, then a 2PC imposes twice the messaging, and imposes this coding complexity on all applications so that some very few can use this complex approach to extend their schedule? This is way too complex an approach.

--- The 10 minute timeout for backing out a reservation implies that those resources are blocked until they are freed - which poses all sorts of security and resource management problems. If an agent is sophisticated enough to manage a Modify function, it should be sophisticated enough to be responsible for retracting unused reservations. Indeed, the 2PC should charge based upon request, rather than confirm - since a user could arbitrarily request many big connections and just never confirm them.

--- Other protocols recognize long delays in provisioning. Most of these have a "Continuing" message that represents a heartbeat of sorts... If a protocol agent cannot complete its processing in a set time - but is still trying - it sends a "Continuing" message to the requester to indicate it is still working. As long as CONT messages are being passed between nodes downstream, the upstream nodes will continue issuing CONT messages. If someone along the line gets fed up, they issue Terminates appropriately and everything is torn down. NSI could do this as well.

--- In slide 6 (?) you describe a bunch of NRM functional requirements. I think I understand what you mean to do, but we should not phrase these as NRM functions. NSI has nothing to do with NRM functions. It's inconsistent with the spec and can be very misleading and confusing. -> As far as NSI is concerned, all events are NSI protocol events. For example - given an NSA "John" presiding over a local domain "Ottawa", and peering with, say, two domains, Guy/Cambridge and Inder/Berkeley, all interactions between John and Guy or John and Inder are handled by the NSI CS protocol. The local Ottawa domain - which John considers his "local" domain - is managed by John's NRM consisting of a dozen minions with hockey sticks. From an NSI standpoint, that local Ottawa domain could be placed under control of another [more sophisticated] NSA Jerry, and John would then interact with Jerry/Ottawa using the NSI protocol instead of body checks. (:-) As far as NSI is concerned, the interaction John had with Ottawa using his traditional ways is functionally and equivalently replaced by NSA Jerry running NSI. Thus, the *only* events that NSI needs to stipulate are those NSI events. :-) (The NSA _/implementation/_ is responsible for translating any local events into their appropriate NSI protocol events, but the spec does not care anything about this translation.)

--- In general, it seems to me this whole proposal is trying to dance around the issue of adding or removing resources to a connection in service using our existing inflexible path finding and constraints specifications. All this means is the Path Finder should be able to compute a new path using the existing allocated resources - albeit allocated to a specific connection. It seems that if we were able to indicate this simple change to path finders, then all the funky modify state becomes unnecessary. We just reserve a connection that can do double booking of resources, and provision it when it is confirmed. If the new connection overlays the existing connection, all the better. Right? And we can use existing primitives.

Best regards
Jerry

On 7/2/12 11:06 PM, John MacAuley wrote:
Peoples,
Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure. Please study it and prepare questions for the Wednesday call. We would like to close on this action ASAP.
Thank you, John.
_______________________________________________ nsi-wg mailing list nsi-wg@ogf.org https://www.ogf.org/mailman/listinfo/nsi-wg
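The path-finder constraint proposed in the message above ("use resources that are assigned to available | Connection X") can be sketched as a filter over a booking calendar. This is only an illustration under stated assumptions: the `Booking` record, the integer time scale, the link names, and the rule that a resource carries one connection at a time are all hypothetical and not taken from the NSI specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Booking:
    connection_id: str
    start: int  # inclusive, arbitrary time units
    end: int    # exclusive


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def usable(bookings: List[Booking], start: int, end: int,
           reuse_from: Optional[str] = None) -> bool:
    """A resource is usable if every conflicting booking belongs to reuse_from."""
    return all(
        b.connection_id == reuse_from or not overlaps(b.start, b.end, start, end)
        for b in bookings
    )


def eligible_resources(calendar: Dict[str, List[Booking]], start: int, end: int,
                       reuse_from: Optional[str] = None) -> List[str]:
    """Names of resources the path finder may consider for [start, end)."""
    return sorted(name for name, bookings in calendar.items()
                  if usable(bookings, start, end, reuse_from))


calendar = {
    "link-1": [Booking("ABCD", 0, 100)],    # held by the connection being modified
    "link-2": [Booking("OTHER", 50, 150)],  # held by an unrelated connection
    "link-3": [],                           # free
}

# Extending "ABCD" to end at t=120: its own resources count as available.
with_reuse = eligible_resources(calendar, 0, 120, reuse_from="ABCD")
# An unrelated new request over the same window sees only the free link.
without_reuse = eligible_resources(calendar, 0, 120)
```

With the reuse constraint the path finder may double-book `link-1` to the new connection, while an unconstrained request cannot, which is the behaviour the message asks of the resource database rather than of the state machine.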
On 2012-07-03, at 5:23 PM, Jerry Sobieski wrote:
Hi John-
Regarding this state machine- We really need to consider how this Modify can be handled more like ballet than hockey. :-)
--- First, what are we trying to solve with 2PC? The multi-domain pass/fail scenario where some succeed and some fail, causing the successful modifications to be rolled back, is a resource management issue - not a protocol state issue. If the successful domains will still have to be rolled back - we don't escape this fact. It's just that the RA is not responsible for doing it. You are placing the task on the PA. And forcing the RA to now issue two messages to make sure he wants what he just asked for.
A better approach would be to think of this differently - think of the new modified connection as a separate new connection ... what is it about the new connection that you want? The same path as the old connection? Ok. We do not want to release the old resources into the wild... may lose them. OK. We want to minimize the impact to the data flow of the resource modification. OK.
So far this sounds the same as the modify command, except for the "separate new connection" part.
So make a constraint for the path finder: "You must use resources that are assigned to ( "available" | "Connection X") " The protocol runs normally... The reservation is made. Resources are already multi-booked to different connections for scheduling purposes, so double booking resources to two connections simultaneously should not be technically difficult. Just need to make sure we can switch them over appropriately - but fortunately this isn't a state machine issue, it's a configuration issue and resource database issue.
Okay, still exact same as the modify. Path finder is using the exact same set of resources and just seeing if it can extend the time, add additional bandwidth, etc.
Please see my alternative "shadow" proposal that does not require wholesale replacement of the state machine.
I still do not understand this comment. The existing state machine stays as is, with the modify state machine being an entirely different overlay machine. It has no impact on the existing state machine.
--- This proposal seems to require 2PC for normal Reservations as well as Modifications. Is this right?
No, the 2PC on reservations was an action I took out of Delft and it has nothing to do with the Modify, except someone said "If we are doing a 2PC on Modify why don't we fix the Reserve at the same time?"
This is a major (and unnecessary IMO) departure from the existing State Machine we have worked on so hard for over two years. We have had so many Very Long discussions on this in early 2010 in which we decided the simple single Reserve command with an active Terminate was preferable to the 2PC (the 2PC requiring a top down confirm... doubling the number of messages required, and introducing race conditions with timeouts that were not easily resolved). Why, after so long and demonstrated success in v1.0, do we suddenly need to change horses to a two phase commit for the entire model? I fear this will cause extensive re-writes ... for what feature again? To extend a schedule?
As I said - I have no issue leaving Reserve as is, but we did agree back in release 1.0 to revisit it in release 2.0 at Tomohiro's request.
2PC is an optimization strategy where you assume a fail is likely to occur down tree, and the RA does not wish to be responsible for cleaning up unneeded reservations. If you assume reservations will mostly be successful, then a 2PC imposes twice the messaging, and imposes this coding complexity on all applications so that some very few can use this complex approach to extend their schedule? This is way too complex an approach.
Given our current topology and pathfinding model I think it is highly likely that the majority of reservations will fail on first attempt. This will happen even more often if we have successful take-up of the protocol and the network (a scarce resource) becomes more congested.
--- The 10 minute timeout for backing out a reservation implies that those resources are blocked until they are freed - which poses all sorts of security and resource management problems. If an agent is sophisticated enough to manage a Modify function, it should be sophisticated enough to be responsible for retracting unused reservations. Indeed, the 2PC should charge based upon request, rather than confirm - since a user could arbitrarily request many big connections and just never confirm them.
10 minutes, 5 minutes, the time is not important. What you are suggesting is that we do not need a timer to protect the network from dangling resources associated with incomplete reservations or modification operations. I am okay with this just so long as there is a way for the RA to clean up afterwards, and any resources it strands are counted against the user's network cap.
--- Other protocols recognize long delays in provisioning. Most of these have a "Continuing" message that represents a heartbeat of sorts... If a protocol agent cannot complete its processing in a set time - but is still trying, it sends a "Continuing" message to the requester to indicate it is still working. As long as CONT messages are being passed between nodes downstream, the upstream nodes will continue issuing CONT messages. If someone along the line gets fed up, they issue Terminates appropriately and everything is torn down. NSI could do this as well.
I have no issue with this as an alternative design, other than it generates unneeded noise within the tree to just hold onto resources temporarily. I would need to see a detailed message and state machine to understand the proposal.
--- In slide 6 (?) you describe a bunch of NRM functional requirements. I think I understand what you mean to do, but we should not phrase these as NRM functions. NSI has nothing to do with NRM functions. It's inconsistent with the spec and can be very misleading and confusing. -> As far as NSI is concerned, all events are NSI protocol events. For example - given an NSA "John" presiding over a local domain "Ottawa", and peering with, say, two domains, Guy/Cambridge and Inder/Berkeley, all interactions between John and Guy or John and Inder are handled by the NSI CS protocol. The local Ottawa domain - which John considers his "local" domain - is managed by John's NRM consisting of a dozen minions with hockey sticks. From an NSI standpoint, that local Ottawa domain could be placed under control of another [more sophisticated] NSA Jerry, and John would then interact with Jerry/Ottawa using the NSI protocol instead of body checks. (:-) As far as NSI is concerned, the interaction John had with Ottawa using his traditional ways is functionally and equivalently replaced by NSA Jerry running NSI. Thus, the *only* events that NSI needs to stipulate are those NSI events. :-) (The NSA implementation is responsible for translating any local events into their appropriate NSI protocol events, but the spec does not care anything about this translation.)
These are the standard NRM operations and events associated with a reservation, which have existed since version 1 of the state machine. I just added the equivalent ones for the modify operation. Are you saying you no longer believe we need the uPA state machine? Do you just notice these now?
--- In general, it seems to me this whole proposal is trying to dance around the issue of adding or removing resources to a connection in service using our existing inflexible path finding and constraints specifications. All this means is the Path Finder should be able to compute a new path using the existing allocated resources - albeit allocated to a specific connection. It seems that if we were able to indicate this simple change to path finders, then all the funky modify state becomes unnecessary. We just reserve a connection that can do double booking of resources, and provision it when it is confirmed. If the new connection overlays the existing connection, all the better. Right? And we can use existing primitives
Okay, just to be 100% clear - I am not "dancing around the issue". I have taken the time (on numerous occasions now) to propose a viable solution to the Modify requirements. I used a two-phase commit model to perform controlled securing of network resources against an existing reservation, then the committing and activation of these new resources within the network. I presented the new set of operations and a funky new state machine. At its heart it even does with the path finder what you are looking to do, however, it does not overload existing NSI operations, but instead defines a clear set of new operations in support of the reservation modification. I will now look at your shadow proposal.
Best regards Jerry
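The "Continuing" heartbeat discussed in this exchange was not specified in detail, and John asks for a message and state machine description. As a rough aid to that discussion, the Python sketch below shows one possible requester-side bookkeeping: each pending request records when it was last heard from, a Continuing message refreshes that time, and requests that stay silent longer than a chosen patience become candidates for Terminate. The class name, method names, and the patience value are all assumptions made for illustration.

```python
from typing import Dict, List


class PendingRequests:
    """Requester-side view of outstanding requests and their last sign of life."""

    def __init__(self, patience: int) -> None:
        self.patience = patience              # how long silence is tolerated
        self.last_heard: Dict[str, int] = {}  # connection_id -> timestamp

    def sent(self, connection_id: str, now: int) -> None:
        """Record that a request was issued at time `now`."""
        self.last_heard[connection_id] = now

    def on_continuing(self, connection_id: str, now: int) -> None:
        """A Continuing message means the provider is still working; reset the clock."""
        if connection_id in self.last_heard:
            self.last_heard[connection_id] = now

    def on_confirmed(self, connection_id: str) -> None:
        """The request completed; stop tracking it."""
        self.last_heard.pop(connection_id, None)

    def overdue(self, now: int) -> List[str]:
        """Requests silent for longer than `patience`; the caller may Terminate them."""
        return sorted(cid for cid, heard in self.last_heard.items()
                      if now - heard > self.patience)


pending = PendingRequests(patience=30)
pending.sent("ABCD", now=0)
pending.sent("WXYZ", now=0)
pending.on_continuing("ABCD", now=25)  # provider for "ABCD" is still working
stale = pending.overdue(now=40)        # only "WXYZ" has been silent for more than 30
```

The clock is passed in explicitly so the logic stays deterministic; a real agent would also forward its own Continuing messages upstream while any downstream request is still alive, which this sketch does not attempt.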
Hi On Mon, 2 Jul 2012, John MacAuley wrote:
Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure. Please study it and prepare questions for the Wednesday call.
I missed the call last week (because I love sitting in airplanes), but could you please explain why we need two phase reserve? (and modify as well, though that one makes more sense).
IDCP has two phase ... everything (more or less), and the main result of this was that the implementations became extremely complex and error handling even worse (often lacking).
Two phase looks really nice from a distance. Until you are in the middle of it and there are crying babies and bleeding people everywhere :-).
Best regards, Henrik
Henrik Thostrup Jensen <htj at nordu.net> Software Developer, NORDUnet
The general consensus was that if we are doing a two phase modify then we might as well do a two phase reserve. I am not hung up on the two phase reserve since the existing model does do a pseudo two phase. I absolutely do believe we need a two phase modify though, otherwise things can get extremely pooched. Perhaps some of the others who requested a two phase reserve could jump in and provide feedback? John.
That is correct, the general trend of thought was consistency. Because reservation and modify are reasonably similar, they should have similar workflow processes. At the meeting in Delft, there were folks on both sides of this argument (have a similar workflow process for both reservation and modify). - Chin
Hi Henrik, Actually the IDCP has a single phase reservation workflow. That is the reason why it explicitly only supports the chain model. The provisioning workflow however is more like a two-phase commit. It was initially a single phase commit, but was redesigned to better support the auto-timer triggered provisioning. I'm just sayin' :) - Chin
----- Original Message -----
From: "Chin Guok" <chin@es.net> To: "Henrik Thostrup Jensen" <htj@nordu.net> Cc: "NSI Working Group" <nsi-wg@ogf.org> Sent: Wednesday, July 4, 2012 10:31:38 AM Subject: Re: [Nsi-wg] New state machine with two phase reserve and modify
Hi Henrik,
Actually the IDCP has a single phase reservation workflow. That is the reason why it explicitly only supports the chain model. The provisioning workflow however is more like a two-phase commit. It was initially a single phase commit, but was redesigned to better support the auto-timer triggered provisioning.
I'm just say'in :)
- Chin
On Wed, 4 Jul 2012, Chin Guok wrote:
Actually the IDCP has a single phase reservation workflow.
I stand corrected.
That is the reason why it explicitly only supports the chain model.
(going off topic here)
Mmm... I do not see why that should restrict it to chain model. Sure, tree will be somewhat opportunistic, but as long as one can cancel the connections I do not see the problem. Could you elaborate?
Anyway, my point was that we should be careful about trading a simpler model for a more complex one in order to gain potentially better fault handling. 2PC has problems as well, the main ones being coordinator failures and added implementation complexity.
There is a middle ground with opportunistic methods (e.g., 3PC without the precommit - essentially pre-check + change without any blocking), which provide almost the same functionality but at a much lower implementation cost.
It feels a lot like we are using 2PC for the sake of using 2PC.
Best regards, Henrik
Henrik Thostrup Jensen <htj at nordu.net> Software Developer, NORDUnet
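One way to read the "pre-check + change without any blocking" idea in the message above is sketched here in Python. It is an interpretation, not a proposal from the thread: the `Participant` class, its capacity model, and the compensation by `cancel` are all assumptions. The pre-check holds nothing, so no resources are blocked between the two steps; the price is that a change can still fail after a positive check and must then be undone.

```python
from typing import Dict, List


class Participant:
    """Toy domain with a single pool of capacity (hypothetical model)."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.free = capacity
        self.held: Dict[str, int] = {}

    def can_apply(self, amount: int) -> bool:
        """Advisory pre-check: answers a question, reserves nothing."""
        return amount <= self.free

    def apply(self, connection_id: str, amount: int) -> bool:
        """The actual change; may still fail if capacity vanished since the check."""
        if amount > self.free:
            return False
        self.free -= amount
        self.held[connection_id] = amount
        return True

    def cancel(self, connection_id: str) -> None:
        """Undo a previously applied change."""
        self.free += self.held.pop(connection_id, 0)


def optimistic_change(participants: List[Participant], connection_id: str,
                      amount: int) -> bool:
    # Step 1: ask everyone first; a single "no" stops before anything is touched.
    if not all(p.can_apply(amount) for p in participants):
        return False
    # Step 2: apply; on a late failure, cancel whatever was already applied.
    applied: List[Participant] = []
    for p in participants:
        if not p.apply(connection_id, amount):
            for done in applied:
                done.cancel(connection_id)
            return False
        applied.append(p)
    return True


a, b = Participant("A", 10), Participant("B", 5)
too_big = optimistic_change([a, b], "WXYZ", 8)  # B fails the pre-check
fits = optimistic_change([a, b], "WXYZ", 4)     # both accept
```

Compared with 2PC there is no prepared state to time out or to recover after a coordinator failure, which is the implementation saving the message refers to.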
Hi Henrik, On 7/5/12 2:08 AM, Henrik Thostrup Jensen wrote:
On Wed, 4 Jul 2012, Chin Guok wrote:
Actually the IDCP has a single phase reservation workflow.
I stand corrected.
That is the reason why it explicitly only supports the chain model.
(going off topic here)
Mmm... I do not see why that should restrict it to chain model. Sure, tree will be somewhat opportunistic, but as long as one can cancel the connections I do not see the problem. Could you elaborate?
Yes, the cancel operation can accomplish this for a tree, however we chose to go the easier route and model it after RSVP.
Anyway, my point was that we should be careful about trading a simpler model for a more complex one in order to gain potentially better fault handling. 2PC has problems as well, the main ones being coordinator failures and added implementation complexity.
There is a middle ground with opportunistic methods (e.g., 3PC without the precommit - essentially pre-check + change without any blocking), which provide almost the same functionality but at a much lower implementation cost.
I can see some benefits to what you are proposing. Do you think you would like to expand on this idea and propose a state machine?
Thanks again for the comments. - Chin
It feels a lot like we are using 2PC for the sake of using 2PC.
Best regards, Henrik
Henrik Thostrup Jensen <htj at nordu.net> Software Developer, NORDUnet
_______________________________________________ nsi-wg mailing list nsi-wg@ogf.org https://www.ogf.org/mailman/listinfo/nsi-wg
All, I have modified my slide pack to remove the 2PC reserve operation since it seemed to cause some confusion on my modify proposal. I also removed the timeout as people suggested it was not required. Thanks, John.
Hi Tomohiro, all
Tomohiro, thanks for the proposal. All, we currently have 2 formal proposals with the modify capability (John's "2-Phase", and Tomohiro's "Pseudo 2-Phase"). We also have a description for a 3rd state machine (Jerry's "Bridge-&-Roll"). Jerry, if you want to formally propose this, please submit a state machine diagram showing how you see this working. If other folks have specific ideas or modifications, please send them to the Skype chat (email Tomohiro or myself if you want to be invited), and work with the corresponding author to have your changes considered and implemented. If you have a totally different state machine in mind, please draw it up and send it to the mailing list. Thanks! - Chin On 7/6/12 6:04 AM, Tomohiro Kudoh wrote:
Hi John, Chin and all,
Here is my proposal of modify state machine.
Tomohiro
_______________________________________________ nsi-wg mailing list nsi-wg@ogf.org https://www.ogf.org/mailman/listinfo/nsi-wg
On 7/6/12 1:23 AM, Chin Guok wrote:
Hi Henrik,
On 7/5/12 2:08 AM, Henrik Thostrup Jensen wrote:
On Wed, 4 Jul 2012, Chin Guok wrote:
Actually the IDCP has a single phase reservation workflow.
I stand corrected.
That is the reason why it explicitly only supports the chain model.
(going off topic here)
Mmm... I do not see why that should restrict it to chain model. Sure, tree will be somewhat opportunistic, but as long as one can cancel the connections I do not see the problem. Could you elaborate?
Yes, the cancel operation can accomplish this for a tree, however we chose to go the easier route and model it after RSVP.
Remember: The Tree model does not imply "parallel" Reserve() requests in the time domain. It simply means that an RA is able to make reservation requests directly to downstream PAs. Such a tree RA can still reserve each segment of the path sequentially, proceeding from source to destination and issuing a Reservation request to each domain. Once each domain returns a confirmation, the RA moves to the next domain. This "sequential tree" process resolves dependencies at the reservation stage, but is unnecessary for the Provision process since the resources have already been confirmed. At the Provision stage, parallel-in-time provisioning can easily be performed. This subtle distinction sometimes creates confusion. A sequential Tree process is in effect a Chain path finding process, but allows the RA to control the inter-domain path and the PAs to authenticate the RAs more directly.
Regards
Jerry
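The "sequential tree" distinction described above can be sketched in Python as two separate loops: reservations are requested one domain at a time, stopping and cleaning up on the first refusal, while provisioning is fanned out concurrently once every segment is confirmed. The `Domain` class, its `accepts` flag, and the "Initial" state name are hypothetical stand-ins for a provider agent; threads are used only to illustrate "parallel in time".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


class Domain:
    """Toy provider agent for one segment of the path (hypothetical model)."""

    def __init__(self, name: str, accepts: bool = True) -> None:
        self.name = name
        self.accepts = accepts
        self.state = "Initial"

    def reserve(self, connection_id: str) -> bool:
        if self.accepts:
            self.state = "Reserved"
        return self.accepts

    def provision(self, connection_id: str) -> str:
        self.state = "Provisioned"
        return self.name

    def terminate(self, connection_id: str) -> None:
        self.state = "Terminated"


def reserve_sequentially(path: List[Domain], connection_id: str) -> bool:
    """Reserve source to destination, one confirmation at a time."""
    confirmed: List[Domain] = []
    for domain in path:
        if not domain.reserve(connection_id):
            # The RA cleans up the segments it already holds.
            for done in confirmed:
                done.terminate(connection_id)
            return False
        confirmed.append(domain)
    return True


def provision_in_parallel(path: List[Domain], connection_id: str) -> List[str]:
    """All segments are confirmed, so provisioning has no ordering dependency."""
    with ThreadPoolExecutor(max_workers=len(path)) as pool:
        return list(pool.map(lambda d: d.provision(connection_id), path))


path = [Domain("A"), Domain("B"), Domain("C")]
if reserve_sequentially(path, "ABCD"):
    provisioned = provision_in_parallel(path, "ABCD")
```

The RA talks to every domain directly in both loops, which is what makes this a tree rather than a chain even though the reservation order matches a chain.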
participants (5)
- Chin Guok
- Henrik Thostrup Jensen
- Jerry Sobieski
- John MacAuley
- Tomohiro Kudoh