ServiceException needs further details
Resend.... I have added some new errors I implemented as part of CS11. Please comment and add to the list. Guy would like to add an appendix in the CS protocol document.
John.
Peoples,
I took an action to start the error handling discussion so that we, as a group, can document the error messages and behaviors. I would like to start it off with when an NSIServiceException is returned as a SOAP fault to a request, and when it is returned in a specific failed response message.
OpenDRAC used the strategy of validating a “key” set of attributes in the request before returning an ACK response or SOAP fault. This gives OpenDRAC the opportunity to validate parameters needed to accept the message for processing, and possibly return the asynchronous failed message at a later time.
Here is the list of parameters I validate in a reservation message; I return a SOAP fault if any of them is incorrect or missing.
HTTP authentication - if you don’t have valid credentials you are rejected with an HTTP 40x message.
correlationId - needed for any acknowledgment, confirmation, or failed message. Must be unique within the context of the providerNSA, otherwise the request cannot be accepted.
replyTo - we will send the confirmation or failed message back to this location. We do not validate the contents of the endpoint, just that it exists.
Reservation - if the reservation parameters are not present then we reject.
requesterNSA and providerNSA - must be present and resolve to an NsNetwork in topology. Also, the providerNSA must be the NsNetwork OpenDRAC is managing or we reject the message.
connectionId - this is used as the primary reference attribute for reservation state machines and must be present.
If any of these fields are missing or invalid OpenDRAC will return a SOAP fault containing the NSIServiceException set to an appropriate error message.
I typically fill in MISSING_PARAMETER - "SVC0001", "Invalid or missing parameter" for this generic case and specify attributes identifying the parameter in question.
Here is a list of error messages currently implemented in OpenDRAC. The list continues to expand. I have kept the text generic with the specific error values being returned in the associated attribute list. We will also need to agree on the format of the message/errorId.
MISSING_PARAMETER, "SVC0001", "Invalid or missing parameter"
UNSUPPORTED_OPTION, "SVC0002", "Parameter provided contains an unsupported value which MUST be processed"
ALREADY_EXISTS, "SVC0003", "Schedule already exists for connectionId"
DOES_NOT_EXIST, "SVC0004", "Schedule does not exists for connectionId"
MISSING_SECURITY, "SVC0005", "Invalid or missing user credentials"
TOPOLOGY_RESOLUTION_STP, "SVC0006", "Could not resolve STP in Topology database"
TOPOLOGY_RESOLUTION_STP_NSA, "SVC0007", "Could not resolve STP to managing NSA"
PATH_COMPUTATION_NO_PATH, "SVC0008", "Path computation failed to resolve route for reservation"
INVALID_STATE, "SVC0009", "Connection state machine is in invalid state for received message"
INTERNAL_ERROR, "SVC0010", "An internal error has caused a message processing failure"
INTERNAL_NRM_ERROR, "SVC0011", "An internal NRM error has caused a message processing failure"
STP_ALREADY_IN_USE, "SVC0012", "Specified STP already in use"
BANDWIDTH_NOT_AVAILABLE, "SVC0013", "Insufficent bandwidth available for reservation"
Would people like to add to the list? Comments?
John.
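The pre-ACK validation described in this message could be sketched roughly as follows. This is only an illustration in Python, not OpenDRAC code: the `ServiceException` class, the `validate_reservation` function, the dictionary-shaped request, and the `known_nsas` set standing in for the topology lookup are all assumptions made for the example. HTTP authentication is left out because it is handled at the transport level.

```python
from dataclasses import dataclass, field
from typing import Optional

# Error ID and text taken from the list in the message.
MISSING_PARAMETER = ("SVC0001", "Invalid or missing parameter")

# Hypothetical request keys for the "key" attributes checked before the ACK.
REQUIRED_FIELDS = ("correlationId", "replyTo", "reservation",
                   "requesterNSA", "providerNSA", "connectionId")


@dataclass
class ServiceException:
    """Illustrative stand-in for NSIServiceException (not the real schema)."""
    error_id: str
    text: str
    variables: dict = field(default_factory=dict)


def validate_reservation(request: dict, seen_correlation_ids: set,
                         known_nsas: set, managed_nsa: str) -> Optional[ServiceException]:
    """Return the exception to put in a SOAP fault, or None if an ACK can be sent."""
    # Every key attribute must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not request.get(name):
            return ServiceException(*MISSING_PARAMETER, {"parameter": name})
    # correlationId must be unique within the context of the provider NSA.
    if request["correlationId"] in seen_correlation_ids:
        return ServiceException(*MISSING_PARAMETER, {"parameter": "correlationId"})
    # Both NSAs must resolve in topology (modelled here as a plain set).
    for name in ("requesterNSA", "providerNSA"):
        if request[name] not in known_nsas:
            return ServiceException(*MISSING_PARAMETER, {"parameter": name})
    # The provider NSA must be the network this agent manages.
    if request["providerNSA"] != managed_nsa:
        return ServiceException(*MISSING_PARAMETER, {"parameter": "providerNSA"})
    return None
```

A `None` result corresponds to sending the ACK and continuing asynchronously; any later failure would then travel in the failed response message rather than in a SOAP fault.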
On Thu, 15 Dec 2011, John MacAuley wrote:
I took an action to start the error handling discussion so that we, as a group, can document the error messages and behaviors. I would like to start it off with when an NSIServiceException is returned as a SOAP fault to a request, and when it is returned in a specific failed response message.
OpenDRAC ...
[snip]
So a lot of these are more policy than mechanism, and could be subject to change. Let's focus on the error codes. What does the SVC prefix stand for? (And why a prefix at all, and why not "ERR" or "ERROR", which would be somewhat more intuitive?)
Here is a list of error messages currently implemented in OpenDRAC. The list continues to expand. I have kept the text generic with the specific error values being returned in the associated attribute list. We will also need to agree on the format of the message/errorId.
I think we also need a plan with the error codes and their classification. Do we provide the errors in order to tell a user what went wrong (in which case a string will suffice), or do we provide error codes so a client can intelligently handle some cases, or both?
The answer is probably the latter, with some semantics for errorId, which can enable the client to automatically classify and potentially recover from the error. The distinction between text and variables is somewhat artificial and only makes sense for missing or invalid parameters. If we assume that the error string is for humans only, the distinction between:
text: Missing parameters: Start time, Dest STP.
and
text: Missing parameters
variables: ["Start time", "Dest STP"]
is just unneeded complexity. If a client is missing a parameter, it probably won't be able to change the request and fill it out automatically by looking at the error response. The client should just be fixed to send the parameter in the first place.
What could make sense is that the NSI agent replies that it understands the request, but could for some reason not fulfill it, e.g., a path could not be found. In this case the client could retry the request elsewhere. However, the client should not care about why the request could not be fulfilled, as it is highly unlikely that it would be useful anyway (it would however be useful to provide back to the user if a second or third request fails consecutively).
MISSING_PARAMETER, "SVC0001", "Invalid or missing parameter"
UNSUPPORTED_OPTION, "SVC0002", "Parameter provided contains an unsupported value which MUST be processed"
How is this different from "invalid" in the previous? Is this an "I know this should be supported, but it isn't"? Both of these would probably be equivalent to HTTP 400 (BAD_REQUEST), in which case a request should not be retried without being modified. While I can see the distinction between missing, invalid, and unsupported, the end result is the same - human intervention is needed. In the case where the service knows the semantics, but hasn't implemented it, the HTTP 501 (NOT_IMPLEMENTED) would be suitable.
ALREADY_EXISTS, "SVC0003", "Schedule already exists for connectionId"
Maybe "CONNECTION_EXISTS" or "CONNECTION_CONFLICT" as name. This would be equivalent to HTTP 409 (CONFLICT).
DOES_NOT_EXIST, "SVC0004", "Schedule does not exists for connectionId"
Maybe "CONNECTION_NONEXISTENT". This would be equivalent to HTTP 404 (NOT_FOUND)
MISSING_SECURITY, "SVC0005", "Invalid or missing user credentials"
The term "Missing security" is highly misleading. I strongly suggest something else, perhaps "UNAUTHORIZED". Would be equivalent to HTTP 401 (UNAUTHORIZED).
TOPOLOGY_RESOLUTION_STP, "SVC0006", "Could not resolve STP in Topology database"
TOPOLOGY_RESOLUTION_STP_NSA, "SVC0007", "Could not resolve STP to managing NSA"
3 or 4 consecutive nouns following each other make a rather poor error name IMHO. Also topology and NSI are so interwoven that we don't really need the topology word. How about "UNKNOWN_STP"? Do we expect the latter error to ever come up? (We know the STP, but not the NSA for it - I would call this a topology description error.) These would correspond to HTTP 422 (UNPROCESSABLE_ENTITY).
PATH_COMPUTATION_NO_PATH, "SVC0008", "Path computation failed to resolve route for reservation"
Do we really need to say path twice? How about "NO_PATH_FOUND"? For HTTP this would probably also be 422, though this one does not have a clear fit.
INVALID_STATE, "SVC0009", "Connection state machine is in invalid state for received message"
Invalid state has a bad ring to it. How about "INVALID_TRANSITION"? For HTTP this would be 422, though 405/406 could be misused for them.
INTERNAL_ERROR, "SVC0010", "An internal error has caused a message processing failure"
Would correspond to 500.
INTERNAL_NRM_ERROR, "SVC0011", "An internal NRM error has caused a message processing failure"
The distinction between NSA and NRM is an artificial one, and in some cases they are the same (e.g., OpenNSA can speak directly to JunOS boxes). For the client, the result is the same: "The thing at the other end didn't work". For humans/operators the distinction is important, but I would say the error code is for clients, and the error string for humans.
STP_ALREADY_IN_USE, "SVC0012", "Specified STP already in use"
I would call this "STP_UNAVAILABLE", as we are dealing with a time span for the reservation. The "In use" reflects a current situation, which is rarely the case for us. The message should be something like "Specified STP not available in specified time span". In HTTP this one is a bit tricky, but 422 is probably the best fit.
BANDWIDTH_NOT_AVAILABLE, "SVC0013", "Insufficent bandwidth available for reservation"
Would people like to add to the list?
Maybe something for a connection which used to exist, but is now terminated or no longer available. I know this could fall under "DOES_NOT_EXIST" or "INVALID_STATE", but neither of these actually captures what happened. This would be equivalent to 410 (GONE).
Furthermore, something stating that the resource is not available for the specified user could be appropriate (corresponding to 401 (UNAUTHORIZED) in HTTP).
I've given mappings from the error codes to HTTP status codes. Most mappings are straightforward, but a couple are edge cases and can be discussed. While not perfect, HTTP codes are well understood by many developers, have clear semantics for request retry and modification, and have been well tested over a significant amount of time. Why do we need to invent our own? Of course we would only adopt the 400/500 class codes, as the other classes do not make sense with our current protocol model.
Best regards, Henrik
Henrik Thostrup Jensen <htj at ndgf.org> NORDUnet / Nordic Data Grid Facility.
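For reference, the HTTP mappings proposed in this message can be collected in one table. The sketch below is only a summary aid in Python, keyed by the OpenDRAC constant names; the `PROPOSED_HTTP_STATUS` and `status_class` names are invented for the example, and errors for which the message gives no explicit status (INTERNAL_NRM_ERROR, BANDWIDTH_NOT_AVAILABLE) are deliberately left out rather than guessed.

```python
# HTTP status codes proposed in the message, keyed by the OpenDRAC constant
# names. Errors with no explicit mapping in the message are omitted.
PROPOSED_HTTP_STATUS = {
    "MISSING_PARAMETER": 400,
    "UNSUPPORTED_OPTION": 400,  # 501 was suggested when semantics are known but unimplemented
    "ALREADY_EXISTS": 409,
    "DOES_NOT_EXIST": 404,
    "MISSING_SECURITY": 401,
    "TOPOLOGY_RESOLUTION_STP": 422,
    "TOPOLOGY_RESOLUTION_STP_NSA": 422,
    "PATH_COMPUTATION_NO_PATH": 422,
    "INVALID_STATE": 422,
    "INTERNAL_ERROR": 500,
    "STP_ALREADY_IN_USE": 422,
}


def status_class(error_name: str) -> str:
    """Classify an error as 'client' (4xx), 'server' (5xx), or 'unmapped'."""
    status = PROPOSED_HTTP_STATUS.get(error_name)
    if status is None:
        return "unmapped"
    return "client" if 400 <= status < 500 else "server"
```

Under this reading, a "client" result means the request has to be changed before it is sent again, while a "server" result points at the provider side.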
On 2011-12-21, at 5:44 AM, Henrik Thostrup Jensen wrote:
What does the SVC prefix stand for? (And why a prefix at all, and why not "ERR" or "ERROR", which would be somewhat more intuitive?)
I had hoped that SVC would stand for service errors, POL for policy errors, etc. This is how it was done in the 3GPP ParlayX web services specification. The first three characters provide the error classification, and the remainder the error number within that classification. Of course, we could just enumerate all of the errors linearly without a specific classification.
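The prefix-plus-number layout described here can be split mechanically. A minimal Python sketch, assuming IDs are exactly three uppercase letters followed by digits (as in "SVC0001"); the `ErrorId` type and `parse_error_id` function are hypothetical names for this example, not part of any NSI schema:

```python
import re
from typing import NamedTuple


class ErrorId(NamedTuple):
    category: str  # e.g. "SVC" for service errors, "POL" for policy errors
    number: int    # error number within that category


# Assumed format: three uppercase letters, then one or more digits.
_ERROR_ID = re.compile(r"^([A-Z]{3})(\d+)$")


def parse_error_id(error_id: str) -> ErrorId:
    """Split an ID such as 'SVC0013' into its category and number."""
    match = _ERROR_ID.match(error_id)
    if match is None:
        raise ValueError(f"not a category-prefixed error ID: {error_id!r}")
    return ErrorId(match.group(1), int(match.group(2)))
```

A client could then branch on `category` first and only look at `number` for the cases it knows how to handle.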
I think we also need a plan with the error codes and their classification. Do we provide the errors in order to tell a user what went wrong (in which case a string will suffice), or do we provide error codes so a client can intelligently handle some cases, or both?
The answer is probably the latter, with some semantics for errorId, which can enable the client to automatically classify and potentially recover from the error. The distinction between text and variables is somewhat artificial and only makes sense for missing or invalid parameters. If we assume that the error string is for humans only, the distinction between:
text: Missing parameters: Start time, Dest STP.
and
text: Missing parameters
variables: ["Start time", "Dest STP"]
is just unneeded complexity. If a client is missing a parameter, it probably won't be able to change the request and fill it out automatically by looking at the error response. The client should just be fixed to send the parameter in the first place.
What could make sense is that the NSI agent replies that it understands the request, but could for some reason not fulfill it, e.g., a path could not be found. In this case the client could retry the request elsewhere. However, the client should not care about why the request could not be fulfilled, as it is highly unlikely that it would be useful anyway (it would however be useful to provide back to the user if a second or third request fails consecutively).
We need to consider two primary uses for the error information we are providing.
The first is debugging of message content errors, messages out of order, timing, etc. A programmer should be able to integrate against our protocol implementations and have the high runner errors covered by error return codes, and not the classic "have a look in the debug file on the remote system." This is not scalable from a support perspective, and providing generalized access to debug files violates Jerry's number one rule about privacy :-)
The second use is for runtime recovery from errors. I would not expect to be able to dynamically recover from a syntactic error like missing startTime, but I should be able to recover from runtime type errors like "bandwidth unavailable."
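The second use, runtime recovery, might look something like the Python sketch below. It is an assumption-laden illustration: the thread does not settle which errors are recoverable, so the `RETRYABLE` set (bandwidth unavailable and no path found) is a guess based on the examples mentioned, and `reserve_with_fallback` with its callable "providers" is a hypothetical interface.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Assumption: errors a client might recover from by trying elsewhere.
# SVC0008 = no path found, SVC0013 = bandwidth not available.
RETRYABLE = {"SVC0008", "SVC0013"}


def reserve_with_fallback(
    attempts: Iterable[Callable[[], Optional[str]]],
) -> Tuple[bool, List[str]]:
    """Try each attempt in turn.

    Each attempt returns None on success or an error ID on failure.
    Stops early on a non-retryable error such as SVC0001, since resending
    the same malformed request elsewhere would fail the same way.
    """
    errors: List[str] = []
    for attempt in attempts:
        error_id = attempt()
        if error_id is None:
            return True, errors
        errors.append(error_id)
        if error_id not in RETRYABLE:
            break
    return False, errors
```

The collected error IDs are returned even on success so that they can still be shown to a user or logged for debugging, which is the first use described above.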
MISSING_PARAMETER, "SVC0001", "Invalid or missing parameter"
UNSUPPORTED_OPTION, "SVC0002", "Parameter provided contains an unsupported value which MUST be processed"
How is this different from "invalid" in the previous? Is this an "I know this should be supported, but it isn't"?
The unsupported option is meant to identify things that can be part of the protocol but not supported by this specific implementation. For example, if someone passes UNIDIRECTIONAL connection requests when you only support BIDIRECTIONAL, PROTECTED when you only support UNPROTECTED, or perhaps an unsupported TechnologySpecificAttributesType, etc.
Both of these would probably be equivalent to HTTP 400 (BAD_REQUEST), in which case a request should not be retried without being modified. While I can see the distinction between missing, invalid, and unsupported, the end result is the same - human intervention is needed.
Obviously, using HTTP error codes would make sense if this was a RESTful interface implementation and we were using HTTP errors to respond. However, we do not need to restrict ourselves to the limited HTTP error codes within our NSI implementation.
In the case where the service knows the semantics, but hasn't implemented it, the HTTP 501 (NOT_IMPLEMENTED) would be suitable.
NOT_IMPLEMENTED is a good suggestion.
ALREADY_EXISTS, "SVC0003", "Schedule already exists for connectionId"
Maybe "CONNECTION_EXISTS" or "CONNECTION_CONFLICT" as name.
Strings like "MISSING_PARAMETER" or "UNSUPPORTED_OPTION" are the internal constant names within my implementation and I did not expect them to be used outside of my code base. Really, only the SVC/POL string and error string were to be used. If we want to replace the SVC/POL error strings with strings such as "CONNECTION_EXISTS" I am okay with this.
This would be equivalent to HTTP 409 (CONFLICT).
DOES_NOT_EXIST, "SVC0004", "Schedule does not exists for connectionId"
Maybe "CONNECTION_NONEXISTENT".
This would be equivalent to HTTP 404 (NOT_FOUND)
MISSING_SECURITY, "SVC0005", "Invalid or missing user credentials"
The term "Missing security" is highly misleading. I strongly suggest something else, perhaps "UNAUTHORIZED".
Good suggestion.
Would be equivalent to HTTP 401 (UNAUTHORIZED).
TOPOLOGY_RESOLUTION_STP, "SVC0006", "Could not resolve STP in Topology database"
TOPOLOGY_RESOLUTION_STP_NSA, "SVC0007", "Could not resolve STP to managing NSA"
3 or 4 consecutive nouns following each other make a rather poor error name IMHO. Also topology and NSI are so interwoven, that we don't really need the topology word. How about "UNKNOWN_STP"?
Sounds good.
Do we expect the latter error to ever come up? (We know the STP, but not the NSA for it - I would call this a topology description error.)
Having implemented a number of protocols in production I have become a paranoid defensive programmer. So yes, it will come up ;-)
These would correspond to HTTP 422 (UNPROCESSABLE_ENTITY).
PATH_COMPUTATION_NO_PATH, "SVC0008", "Path computation failed to resolve route for reservation"
Do we really need to say path twice? How about "NO_PATH_FOUND"?
I want to make sure it sinks in! Actually, I was intending to use "PATH_COMPUTATION_" as a prefix to more path computation related errors.
For HTTP this would probably also be 422, though this one does not have a clear fit.
INVALID_STATE, "SVC0009", "Connection state machine is in invalid state for received message"
Invalid state has a bad ring to it. How about "INVALID_TRANSITION"?
For HTTP this would be 422, though 405/406 could be misused for them.
INTERNAL_ERROR, "SVC0010", "An internal error has caused a message processing failure"
Would correspond to 500.
INTERNAL_NRM_ERROR, "SVC0011", "An internal NRM error has caused a message processing failure"
The distinction between NSA and NRM is an artificial one, and in some cases they are the same (e.g., OpenNSA can speak directly to JunOS boxes). For the client, the result is the same: "The thing at the other end didn't work". For humans/operators the distinction is important, but I would say the error code is for clients, and the error string for humans.
STP_ALREADY_IN_USE, "SVC0012", "Specified STP already in use"
I would call this "STP_UNAVAILABLE", as we are dealing with a time span for the reservation. The "In use" reflects a current situation, which is rarely the case for us. The message should be something like "Specified STP not available in specified time span".
In HTTP this one is a bit tricky, but 422 is probably the best fit.
BANDWIDTH_NOT_AVAILABLE, "SVC0013", "Insufficent bandwidth available for reservation"
Would people like to add to the list?
Maybe something for a connection which used to exist, but is now terminated or no longer available. I know this could fall under "DOES_NOT_EXIST", or "INVALID_STATE", but none of these actually capture what happened.
This would be equivalent to 410 (GONE).
Furthermore, something stating that the resource is not available for the specified user could be appropriate (corresponding to 401 (UNAUTHORIZED) in HTTP).
I've given mappings from the error codes to HTTP status codes. Most mappings are straightforward, but a couple are edge cases and can be discussed. While not perfect, HTTP codes are well understood by many developers, have clear semantics for request retry and modification, and have been well tested over a significant amount of time. Why do we need to invent our own? Of course we would only adopt the 400/500 class codes, as the other classes do not make sense with our current protocol model.
Best regards, Henrik
Henrik Thostrup Jensen <htj at ndgf.org> NORDUnet / Nordic Data Grid Facility.
On 1/3/12 9:15 AM, John MacAuley wrote:
On 2011-12-21, at 5:44 AM, Henrik Thostrup Jensen wrote:
I think we also need a plan with the error codes and their classification. Do we provide the errors in order to tell a user what went wrong (in which case a string will suffice), or do we provide error codes so a client can intelligently handle some cases, or both?
The answer is probably the latter, with some semantics for errorId, which can enable the client to automatically classify and potentially recover from the error. The distinction between text and variables is somewhat artificial and only makes sense for missing or invalid parameters. If we assume that the error string is for humans only, the distinction between:
text: Missing parameters: Start time, Dest STP.
and
text: Missing parameters
variables: ["Start time", "Dest STP"]
is just unneeded complexity. If a client is missing a parameter, it probably won't be able to change the request and fill it out automatically by looking at the error response. The client should just be fixed to send the parameter in the first place.
It seems there are some key "classes" of errors with some relation to the stage at which each is generated:
a) primitive formatting errors (a request primitive is ill-formed),
b) protocol errors (confuses the state machine/flow chart),
c) service request processing errors (e.g. Aggregator issues between NSAs),
d) service request resource errors (the uPA/NRM cannot fill the request), and
e) service instance failure errors (an otherwise good service breaks).
Are there other classifications? (I don't recall the details of prior discussions on this.)
These classes are listed in order of detection ...(sort of I think..:-) So a "service request resource error" implicitly means the request was well formed and conformed to protocol, and the message was processed to reach an appropriate NSA, but the resources required were incorrect or unavailable. (Note: the order above loops - as each message flows from NSA to NSA the gauntlet of checks repeats.)
The order for detecting more detailed error conditions may be less than obvious: if a service resource request is rejected because resources were not available for that user... is this a resource unavailable? Or unauthorized? It depends on how you prioritize your constraints. If you prune the constraint based search by authorization policy first, and resources remain that are authorized for that user, but then the resources needed to reach his particular destination are already in use, then this is a "resource unavailable" error - regardless of whether other [unauthorized] resources are available and would have met the technical requirements for the connection. Likewise, if the technical requirements are used to prune the resources first, and then the user fails on authorization of the sole path to his destination, then this would generate an "unauthorized" error.
And in either case, if the requested time window is blocked, but the path finder failed on "resource unavailable" or "unauthorized", then the path finder will terminate and the user will not know that his schedule was flawed as well. The request actually had several problems. (And if you tell a path finder to "continue searching" despite missed constraints, your search space explodes, and the results (or errors) may not even be relevant.) So we can re-order constraints, but we cannot ignore them.
In general - the universe of possible primitive formulations is overwhelmingly "erroneous" in some fashion - only a very few intelligently constructed primitives are actually going to pass the gauntlet and be successful. So our approach should not be to enumerate all possible errors we may encounter, but to classify the errors in some broad fashion and communicate that class only. We can refine some error classes to define more specific classes where they may be (a) [relatively] frequent/common errors, or (b) expose a key or "critical" condition (for the RA or the PA).
just some thoughts...
Jerry
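The point about constraint ordering can be made concrete with a toy example. The Python sketch below is purely illustrative: resources are modelled as dictionaries with hypothetical `authorized` and `free` flags, and `first_failed_constraint` is an invented helper that prunes candidates constraint by constraint and reports the first one that empties the candidate set.

```python
from typing import Callable, Dict, List, Tuple

Resource = Dict[str, bool]
# A constraint is (error reported if it empties the set, predicate to keep a resource).
Constraint = Tuple[str, Callable[[Resource], bool]]


def first_failed_constraint(resources: List[Resource],
                            constraints: List[Constraint]) -> str:
    """Prune resources in constraint order; return the first error hit, or 'OK'."""
    candidates = list(resources)
    for error, keep in constraints:
        candidates = [r for r in candidates if keep(r)]
        if not candidates:
            # The search stops here, so later constraints are never evaluated.
            return error
    return "OK"


AUTHORIZED: Constraint = ("UNAUTHORIZED", lambda r: r["authorized"])
AVAILABLE: Constraint = ("RESOURCE_UNAVAILABLE", lambda r: r["free"])

# One resource the user may use but which is busy, one free but off limits.
resources = [
    {"authorized": True, "free": False},
    {"authorized": False, "free": True},
]

policy_first = first_failed_constraint(resources, [AUTHORIZED, AVAILABLE])
technical_first = first_failed_constraint(resources, [AVAILABLE, AUTHORIZED])
```

With the same two resources, `policy_first` comes out as "RESOURCE_UNAVAILABLE" and `technical_first` as "UNAUTHORIZED": the request is identical, and only the pruning order changes the error the user sees.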
participants (3)
- Henrik Thostrup Jensen
- Jerry Sobieski
- John MacAuley