On Apr 22, 2010, at 7:11 PM, Inder Monga wrote:

John,

If I may add my $0.02.

On Apr 22, 2010, at 2:03 PM, John Vollbrecht wrote:

This is very nice.

A couple of comments/suggestions/questions:

1) I think all the actions you suggest for the transport plane failure  
are actually taken in the NRM or Service plane.  I may be wrong, but  
that is what it seems to me.  If so, then I think it would be helpful  
to describe the transport device/plane signalling failure to the NRM  
at different times.  Something like this would have made it easier for  
me to follow.

I agree that transport plane failure actions are handled either by the Service Plane aka NSA (for example "reserve alternative local resources") or in the transport plane (for example switch to backup). I do not understand what you mean by "describe the transport device/plane signalling failure to the NRM" - can you please elaborate?

I think there should be a statement something like "Transport plane failures are communicated to the NRM.  The NRM deals with these based on the state of the NRM at the time it learns of the failure".  The idea is to make it clear that this section is about how to handle transport failures reported to the service plane.  One might use NSA instead of NRM - I am not sure which would be more appropriate.  I may not have explained this well - please ask questions if it is not clear.
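
To make the "depends on the state at the time it learns of the failure" idea concrete, here is a minimal sketch. The state names, the function name and the mapping from state to action are all assumptions made for illustration; none of them come from the draft. Only the action "reserve alternative local resources" and the idea of notifying the RA are taken from this thread.

```python
from enum import Enum


class ConnState(Enum):
    # Hypothetical connection states; the draft may define different ones.
    RESERVED = "reserved"
    IN_SERVICE = "in_service"


def on_transport_failure(state: ConnState, alternative_available: bool) -> str:
    """Pick an action when a transport plane failure is reported (illustrative only)."""
    if alternative_available:
        # Handled locally, so no notification to the RA is needed.
        return "reserve alternative local resources"
    if state is ConnState.RESERVED:
        # Nothing is carrying traffic yet; the reservation itself is at risk.
        return "notify RA: reservation cannot be honored"
    # The connection was in service and no local recovery is possible.
    return "notify RA: connection failed"
```

The point of the sketch is only that the same failure report leads to different outcomes depending on the stored state and on whether local recovery is possible.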

The intention of this section was to indicate the error cases which would result in notification to the RA and possible cancelation of a connection. There are cases highlighted where the errors are handled completely by the Service Plane or the Transport plane with no need for notification to the user/RA. 

I note that the RA and PA are both in the service plane.  Presumably when an NSA with RA receives a fail message from the PA, the Segment/aggregate section of NSA also has state, and how the NSA deals with the message depends on the state of the NSA.


2) I don't understand the local and remote distinction in the Service  
Plane failure discussion.  Perhaps local means NRM and remote  
means reachable through NSI?

Local implies failure of own domain's RA or PA. Remote means failure of the remote RA or PA. The two cases are diagrammatically the same - the difference is in the context.

This is still confusing to me.  If the session between the RA and PA fails, then isn't everything a local failure - whichever side you are on?  If a PA tries to send a message and it doesn't make it, how does it know whether the message got there or not?  If it notices the session failure through a timeout, the two sides again seem equivalent.


3) I am wondering how service plane failures are discovered?  Is it some  
sort of session failure?

There are a couple of assumptions here:
1. There is reliable messaging between RA and PA
2. There is a timeout if responses are not received from the RA/PA (could be after multiple tries). This timeout could be due to a management network failure between RA and PA. 
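
As a rough illustration of assumption 2, the sketch below treats the peer as failed only after several tries have each timed out. The names request_with_retries and PeerUnreachable, the default values, and the send callable are hypothetical and are not part of any NSI specification.

```python
from typing import Callable, Optional


class PeerUnreachable(Exception):
    """No response from the peer RA/PA after all tries (hypothetical name)."""


def request_with_retries(send: Callable[[float], Optional[str]],
                         timeout_s: float = 5.0,
                         max_tries: int = 3) -> str:
    """Call send(timeout_s) up to max_tries times.

    send is assumed to return the response, or None if the timeout expired.
    """
    for _ in range(max_tries):
        response = send(timeout_s)
        if response is not None:
            return response
    # The caller cannot tell here whether the peer or the management
    # network between RA and PA is the part that failed.
    raise PeerUnreachable(f"no response after {max_tries} tries")
```

Note that the exception looks the same whether the remote agent is down or only the management network is, which is the ambiguity raised in point 2 above.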

These seem like they could have different consequences.

I agree that the service plane failures are less well defined so far.  It is good you are thinking about them and starting discussion.  
The issue of whether NSAs (or only NRMs) keep state after a connection is reserved is another issue that impacts this.

John



Hope this helps - thanks for your feedback.

Inder



John

On Apr 21, 2010, at 1:49 AM, Chin Guok wrote:

Hi all,

I've attached a draft of the error handling section that Inder and I  
came up with for the NSI Architecture document.

This is a rough first draft, and there are some obvious portions  
missing, but it gives an idea of where we are heading.

Comments are most welcome.

Thanks.

- Chin

<NSI Error Handling Chin_Inder v2.docx>
_______________________________________________
nsi-wg mailing list
nsi-wg@ogf.org
http://www.ogf.org/mailman/listinfo/nsi-wg

---
Inder Monga http://100gbs.lbl.gov
imonga@es.net
http://www.es.net
(510) 499 8065 (c)

(510) 486 6531 (o)