On Thu, 15 Dec 2011, John MacAuley wrote:
I took an action to start the error handling discussion so that we, as a group, can document the error messages and behaviors. I would like to start it off with when an NSIServiceException is returned as a SOAP fault to a request, and when it is returned in a specific failed response message.
OpenDRAC ...
[snip] So a lot of these are more policy than mechanism, and could be subject to change. Let's focus on the error codes. What does the SVC prefix stand for? (And why a prefix at all, and why not "ERR" or "ERROR", which would be somewhat more intuitive?)
Here is a list of error messages currently implemented in OpenDRAC. The list continues to expand. I have kept the text generic with the specific error values being returned in the associated attribute list. We will also need to agree on the format of the message/errorId.
I think we also need a plan for the error codes and their classification. Do we provide the errors in order to tell a user what went wrong (in which case a string will suffice), or do we provide error codes so a client can intelligently handle some cases, or both? The answer is probably the latter, with some semantics for errorId that enable the client to automatically classify and potentially recover from the error.

The distinction between text and variables is somewhat artificial and only makes sense for missing or invalid parameters. If we assume that the error string is for humans only, the distinction between:

  text: Missing parameters: Start time, Dest STP.

and

  text: Missing parameters
  variables: ["Start time", "Dest STP"]

is just unneeded complexity. If a client is missing a parameter, it probably won't be able to change the request and fill it out automatically by looking at the error response. The client should just be fixed to send the parameter in the first place.

What could make sense is that the NSI agent replies that it understands the request, but could for some reason not fulfill it, e.g., a path could not be found. In this case the client could retry the request elsewhere. However, the client should not care about why the request could not be fulfilled, as it is highly unlikely that it would be useful anyway (it would, however, be useful to provide back to the user if a second or third request fails consecutively).
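To make the client-side behaviour described above concrete, here is a minimal sketch in Python. All names in it (ErrorClass, NsiError, reserve_with_fallback, and the send_request callback) are hypothetical and not part of NSI or any existing implementation. It assumes an errorId can be reduced to two coarse classes: the request itself is wrong, or the request was understood but could not be fulfilled.

```python
from enum import Enum


class ErrorClass(Enum):
    """Hypothetical coarse classes a client could derive from an errorId."""
    BAD_REQUEST = "bad_request"      # request is wrong; fix it, do not retry as-is
    UNFULFILLABLE = "unfulfillable"  # request understood, but could not be satisfied


class NsiError(Exception):
    """Hypothetical error carrying a machine-readable class and a human-readable text."""

    def __init__(self, error_class, text):
        super().__init__(text)
        self.error_class = error_class
        self.text = text  # for humans only; the client never parses it


def reserve_with_fallback(providers, send_request):
    """Try each provider in turn and return (result, failure_texts).

    send_request is an assumed callback that takes a provider and either
    returns a result or raises NsiError.
    """
    failures = []
    for provider in providers:
        try:
            return send_request(provider), failures
        except NsiError as err:
            if err.error_class is ErrorClass.BAD_REQUEST:
                # Sending the same broken request elsewhere is pointless.
                raise
            # The client does not care why it failed; it only keeps the
            # text so it can be shown to the user later.
            failures.append(f"{provider}: {err.text}")
    # Every provider failed: now the collected texts are worth reporting.
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```

The client branches only on the class; the text is passed through untouched, which is why a separate variables list adds nothing for it.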
MISSING_PARAMETER, "SVC0001", "Invalid or missing parameter"
UNSUPPORTED_OPTION, "SVC0002", "Parameter provided contains an unsupported value which MUST be processed"
How is this different from "invalid" in the previous one? Is this an "I know this should be supported, but it isn't"? Both of these would probably be equivalent to HTTP 400 (BAD_REQUEST), in which case a request should not be retried without being modified. While I can see the distinction between a missing, invalid, or unsupported parameter, the end result is the same: human intervention is needed. In the case where the service knows the semantics but hasn't implemented them, HTTP 501 (NOT_IMPLEMENTED) would be suitable.
ALREADY_EXISTS, "SVC0003", "Schedule already exists for connectionId"
Maybe "CONNECTION_EXISTS" or "CONNECTION_CONFLICT" as name. This would be equivalent to HTTP 409 (CONFLICT).
DOES_NOT_EXIST, "SVC0004", "Schedule does not exists for connectionId"
Maybe "CONNECTION_NONEXISTENT". This would be equivalent to HTTP 404 (NOT_FOUND).
MISSING_SECURITY, "SVC0005", "Invalid or missing user credentials"
The term "Missing security" is highly misleading. I strongly suggest something else, perhaps "UNAUTHORIZED". This would be equivalent to HTTP 401 (UNAUTHORIZED).
TOPOLOGY_RESOLUTION_STP, "SVC0006", "Could not resolve STP in Topology database"
TOPOLOGY_RESOLUTION_STP_NSA, "SVC0007", "Could not resolve STP to managing NSA"
Three or four consecutive nouns make a rather poor error name IMHO. Also, topology and NSI are so interwoven that we don't really need the word "topology". How about "UNKNOWN_STP"? Do we expect the latter error to ever come up? (We know the STP, but not the NSA for it; I would call this a topology description error.) These would correspond to HTTP 422 (UNPROCESSABLE_ENTITY).
PATH_COMPUTATION_NO_PATH, "SVC0008", "Path computation failed to resolve route for reservation"
Do we really need to say path twice? How about "NO_PATH_FOUND"? For HTTP this would probably also be 422, though this one does not have a clear fit.
INVALID_STATE, "SVC0009", "Connection state machine is in invalid state for received message"
Invalid state has a bad ring to it. How about "INVALID_TRANSITION"? For HTTP this would be 422, though 405/406 could be misused for it.
INTERNAL_ERROR, "SVC0010", "An internal error has caused a message processing failure"
Would correspond to 500.
INTERNAL_NRM_ERROR, "SVC0011", "An internal NRM error has caused a message processing failure"
The distinction between NSA and NRM is an artificial one, and in some cases they are the same (e.g., OpenNSA can speak directly to JunOS boxes). For the client, the result is the same: "The thing in the other end didn't work". For humans/operators the distinction is important, but I would say the error code is for clients, and the error string for humans.
STP_ALREADY_IN_USE, "SVC0012", "Specified STP already in use"
I would call this "STP_UNAVAILABLE", as we are dealing with a time span for the reservation. "In use" reflects a current situation, which is rarely the case for us. The message should be something like "Specified STP not available in specified time span". In HTTP this one is a bit tricky, but 422 is probably the best fit.
BANDWIDTH_NOT_AVAILABLE, "SVC0013", "Insufficent bandwidth available for reservation"
Would people like to add to the list?
Maybe something for a connection which used to exist, but is now terminated or no longer available. I know this could fall under "DOES_NOT_EXIST" or "INVALID_STATE", but neither of these actually captures what happened. This would be equivalent to 410 (GONE). Furthermore, something stating that the resource is not available for the specified user could be appropriate (corresponding to 401 (UNAUTHORIZED) in HTTP).

I've given mappings from the error codes to HTTP status codes. Most mappings are straightforward, but a couple are edge cases and can be discussed. While not perfect, HTTP codes are well understood by many developers, have clear semantics for request retry and modification, and have been well tested over a significant amount of time. Why do we need to invent our own? Of course we would only adopt the 400/500 class codes, as the other classes do not make sense with our current protocol model.

Best regards, Henrik

Henrik Thostrup Jensen <htj at ndgf.org>
NORDUnet / Nordic Data Grid Facility.
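
For reference, the mappings suggested in this message can be collected into a small table. The Python below is only a sketch for discussion: the dictionary name and the helper function are made up for illustration, and SVC0011 and SVC0013 are left out because no status code was proposed for them above.

```python
# OpenDRAC errorId -> HTTP status code suggested in this message.
# Trailing comments give the current OpenDRAC name for each id.
HTTP_EQUIVALENT = {
    "SVC0001": 400,  # MISSING_PARAMETER
    "SVC0002": 400,  # UNSUPPORTED_OPTION (501 if known but not implemented)
    "SVC0003": 409,  # ALREADY_EXISTS
    "SVC0004": 404,  # DOES_NOT_EXIST
    "SVC0005": 401,  # MISSING_SECURITY
    "SVC0006": 422,  # TOPOLOGY_RESOLUTION_STP
    "SVC0007": 422,  # TOPOLOGY_RESOLUTION_STP_NSA
    "SVC0008": 422,  # PATH_COMPUTATION_NO_PATH
    "SVC0009": 422,  # INVALID_STATE
    "SVC0010": 500,  # INTERNAL_ERROR
    "SVC0012": 422,  # STP_ALREADY_IN_USE
}


def needs_modified_request(error_id):
    """Return True if the mapped status is in the 4xx class.

    Under HTTP semantics a 4xx response means the same request should not
    be repeated without modification; 5xx points at the server side.
    Hypothetical helper; raises KeyError for ids with no proposed mapping.
    """
    status = HTTP_EQUIVALENT[error_id]
    return 400 <= status < 500
```

With this table a client only has to look at the class of the code to decide whether resending an unchanged request makes any sense.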