Peoples,
Had someone show up in my office so I missed the conversation over
"Resource change from available to not available." I thought I would
provide some input on the topic based on my DRAC experiences.
I think there are three types of events that can initiate a topology
change that should be understood when defining the error handling. Two
of these are actually not errors but normal operating procedures within
a network:
1. Physical network failure resulting in a topology change - typically
the temporary removal of a link from topology with no knowledge of when
it will be restored.
2. The permanent removal of a link from the topology by a network
administrator. Actually, this one should include the reconfiguration
of the network where an entire node could be removed.
3. The temporary removal of a link by a network administrator for
maintenance purposes. This will typically have a defined start and
end time based on the maintenance window.
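To make the three cases concrete, here is a minimal Python sketch of how they might be modelled. The names (ChangeKind, TopologyChange, is_error, end_is_known) are invented for illustration and do not come from the NSI draft or from DRAC.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ChangeKind(Enum):
    FAILURE = 1            # case 1: unplanned, restoration time unknown
    PERMANENT_REMOVAL = 2  # case 2: planned by an administrator, never restored
    MAINTENANCE = 3        # case 3: planned, defined start and end time


@dataclass
class TopologyChange:
    kind: ChangeKind
    link: str                        # hypothetical link identifier
    start: datetime
    end: Optional[datetime] = None   # only expected for MAINTENANCE

    def is_error(self) -> bool:
        # Only the physical failure is an error; the other two cases
        # are normal operating procedures.
        return self.kind is ChangeKind.FAILURE

    def end_is_known(self) -> bool:
        # Only a maintenance window has a defined end time.
        return self.kind is ChangeKind.MAINTENANCE and self.end is not None
```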
#1 is interesting in that it impacts existing schedules in an
in-service state, reserved schedules not yet in service, and any new
reservation requests.
a) Those schedules in-service using the links impacted by the topology
change may undergo some type of restoration. If this was a protected
circuit then the underlying transport will restore the service and we may
not want to do anything about it. If this was an unprotected service
then perhaps re-dial could be initiated by the NRM in an attempt to
achieve a lazy restore.
b) Depending on the estimated length of the temporary topology change
we may need to recompute the paths of those schedules reserved but not
yet provisioned. We should not recompute the paths from the point of
failure to the end of time, but only for some predefined floating window:
optimistic enough to give the failure time to recover, and short enough
to reduce the number of schedules that would be recomputed. For example,
a floating one
hour window would mean all reservations up to an hour in the future
that could be impacted by the failure can be recomputed. If the
failure is cleared and the topology is restored then the link has a one
hour window from which reservations have already been moved. The
interesting side-effect is that we now have a window of time to make sure
the link remains trouble free. The question is: have we blocked that link
from use, or can a new schedule use the remaining hour if it comes in
after the trouble has cleared?
c) If a new reservation request for a future point in time arrives
while a failure has taken the link out of topology, do we remove the
link from computation, or do we add an optimistic guard time after
which we can assume the link will be restored?
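The handling in (a) through (c) could be sketched roughly as below. This is only an illustration under assumptions: the Schedule fields, the function names, and the one hour values are hypothetical, and the guard time is measured from the moment of failure, which is just one possible reading of (c).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set

RECOMPUTE_WINDOW = timedelta(hours=1)  # assumed floating window from (b)
GUARD_TIME = timedelta(hours=1)        # assumed optimistic guard time from (c)


@dataclass
class Schedule:
    name: str
    links: Set[str]
    start: datetime
    end: datetime
    protected: bool = False


def restoration_action(s: Schedule) -> str:
    """(a) What to do with an in-service schedule on a failed link."""
    # A protected circuit is restored by the underlying transport, so
    # nothing is done here; an unprotected one is re-dialed.
    return "none" if s.protected else "re-dial"


def to_recompute(reserved: List[Schedule], link: str,
                 now: datetime) -> List[Schedule]:
    """(b) Reserved, not yet provisioned schedules that use the failed
    link and start before the floating window closes."""
    horizon = now + RECOMPUTE_WINDOW
    return [s for s in reserved if link in s.links and s.start < horizon]


def link_usable_for(request_start: datetime, failed_at: datetime) -> bool:
    """(c) Optimistic option: assume the link is back after the guard time."""
    return request_start >= failed_at + GUARD_TIME
```

Because the window floats with the current time, to_recompute would have to be re-run periodically for as long as the failure persists.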
#2 is different from a fault condition in that an administrator has
removed the link from topology. We can model this gracefully if we can
have a high priority (preemptive) administration reservation that can
block the bandwidth on a link from the point in time the link will be
removed through until infinity. Any schedules this preemptive schedule
impacts will need to be recomputed as discussed in the previous
example or, if already provisioned, switched to protection or re-dialed
to restore. At some point on or after the start of the preemptive
schedule the link can be permanently removed from topology and the
reservation blocking that link cleared.
#3 is similar to #2 except there is a defined end time for the
preemptive schedule blocking the link. Only reservations overlapping
with the maintenance window would need to be recomputed. Obviously,
any provisioned schedules would need to be switched to protection or
re-dialed to restore.
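A rough sketch of how #2 and #3 could share one mechanism: a preemptive administrative reservation whose end time is either a real timestamp (maintenance) or a stand-in for infinity (permanent removal). The Reservation fields, admin_block, impacted, and the use of datetime.max as "infinity" are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

FOREVER = datetime.max  # stands in for "until infinity" in case #2


@dataclass
class Reservation:
    name: str
    link: str
    start: datetime
    end: datetime
    provisioned: bool = False


def admin_block(link: str, start: datetime,
                end: datetime = FOREVER) -> Reservation:
    """Preemptive administrative reservation blocking a link.
    No end time means permanent removal (#2); an end time means a
    maintenance window (#3)."""
    return Reservation("admin-block", link, start, end)


def impacted(block: Reservation,
             existing: List[Reservation]) -> Dict[str, List[str]]:
    """Split reservations that overlap the block into those to recompute
    (not yet provisioned) and those to switch to protection or re-dial."""
    hit = [r for r in existing
           if r.link == block.link
           and r.start < block.end and block.start < r.end]
    return {
        "recompute": [r.name for r in hit if not r.provisioned],
        "restore": [r.name for r in hit if r.provisioned],
    }
```

With a maintenance window only the overlapping reservations are returned, while a permanent block also catches everything reserved on the link after its start time.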
John.
On 10-04-28 2:14 AM, Inder Monga wrote:
Hi All,
An updated draft based on comments. We attached a table in the front to
summarize and use it for discussions. Look forward to discussing this
tomorrow.
Thanks,
Inder
On Apr 20, 2010, at 10:49 PM, Chin Guok wrote:
Hi all,
I've attached a draft of the error handling section that Inder and I
came up with for the NSI Architecture document.
This is a rough first draft, and there are some obvious portions
missing, but it gives an idea of where we are heading.
Comments are most welcome.
Thanks.
- Chin
<NSI Error Handling Chin_Inder v2.docx>
_______________________________________________
nsi-wg mailing list
nsi-wg@ogf.org
http://www.ogf.org/mailman/listinfo/nsi-wg