On 2011-12-21, at 5:44 AM, Henrik Thostrup Jensen wrote:

What does the SVC prefix stand for? (And why a prefix at all, and why not "ERR" or "ERROR", which would be somewhat more intuitive?)

I had hoped that SVC would stand for service errors, POL for policy errors, etc.  This is how it was done in the 3GPP ParlayX web services specification.  The first three characters provide the error classification, and the remainder the error number within that classification.  Of course, we could just enumerate all of the errors linearly without a specific classification.
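
As a minimal sketch of that scheme, an identifier such as "SVC0001" can be split into its three-character class and its number within the class. The names `parse_error_id`, `ErrorId` and `ERROR_CLASSES` are assumptions made for illustration; only the SVC and POL prefixes come from this thread.

```python
import re
from typing import NamedTuple

# Assumption: only the two classes mentioned in this thread are listed.
ERROR_CLASSES = {"SVC": "service error", "POL": "policy error"}

# Three uppercase letters (the class) followed by four digits (the number).
_ERROR_ID = re.compile(r"([A-Z]{3})(\d{4})")


class ErrorId(NamedTuple):
    prefix: str   # classification, e.g. "SVC"
    number: int   # error number within that classification


def parse_error_id(error_id: str) -> ErrorId:
    """Split an identifier like 'SVC0001' into (prefix, number)."""
    match = _ERROR_ID.fullmatch(error_id)
    if match is None:
        raise ValueError(f"malformed error id: {error_id!r}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix not in ERROR_CLASSES:
        raise ValueError(f"unknown error class: {prefix!r}")
    return ErrorId(prefix, number)
```

A client could then branch on `parse_error_id(error_id).prefix` without knowing every individual error number.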


I think we also need a plan for the error codes and their classification. Do we provide the errors in order to tell a user what went wrong (in which case a string will suffice), or do we provide error codes so a client can intelligently handle some cases, or both?

The answer is probably the latter, with some semantics for errorId, which can enable the client to automatically classify and potentially recover from the error. The distinction between text and variables is somewhat artificial and only makes sense for missing or invalid parameters. If we assume that the error string is for humans only, the distinction between:

text: Missing parameters: Start time, Dest STP.

and

text: Missing parameters
variables: ["Start time", "Dest STP"]

is just unneeded complexity. If a client is missing a parameter, it probably won't be able to change the request and fill it in automatically by looking at the error response. The client should just be fixed to send the parameter in the first place.
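
To illustrate why the two forms carry the same information for a human reader, here is a small sketch that flattens the text/variables form into the single-string form. The function name `render_error` and its signature are hypothetical, not part of any NSI schema.

```python
from typing import Sequence


def render_error(text: str, variables: Sequence[str] = ()) -> str:
    """Flatten a (text, variables) pair into one human-readable string."""
    if not variables:
        return text
    return f"{text}: {', '.join(variables)}."


# The structured form from the example above collapses to the flat form.
print(render_error("Missing parameters", ["Start time", "Dest STP"]))
```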

What could make sense is that the NSI agent replies that it understands the request but could for some reason not fulfill it, e.g., a path could not be found. In this case the client could retry the request elsewhere. However, the client should not care about why the request could not be fulfilled, as it is highly unlikely that this would be useful anyway (it would, however, be useful to report back to the user if a second or third request fails consecutively).


We need to consider two primary uses for the error information we are providing.  The first is debugging of message content errors, messages out of order, timing, etc.  A programmer should be able to integrate against our protocol implementations and have the high-runner errors covered by error return codes, and not the classic "have a look in the debug file on the remote system."  This is not scalable from a support perspective, and providing generalized access to debug files violates Jerry's number one rule about privacy :-)

The second use is for runtime recovery from errors.  I would not expect to be able to dynamically recover from a syntactic error like missing startTime, but I should be able to recover from runtime type errors like "bandwidth unavailable."
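
A rough sketch of how a client might act on that split: retry elsewhere on runtime errors, and give up immediately on syntactic ones. Which error ids count as recoverable is an assumption chosen for illustration, and `NsiError`, `AllProvidersFailed` and `reserve_with_fallback` are hypothetical names, not part of any existing NSI library.

```python
from typing import Callable, List, Sequence

# Assumption: no path, STP in use and bandwidth unavailable are treated as
# runtime errors worth retrying against another provider.
RECOVERABLE_IDS = {"SVC0008", "SVC0012", "SVC0013"}


class NsiError(Exception):
    """Hypothetical client-side exception carrying errorId and text."""

    def __init__(self, error_id: str, text: str) -> None:
        super().__init__(f"{error_id}: {text}")
        self.error_id = error_id
        self.text = text


class AllProvidersFailed(Exception):
    """Raised when every provider returned a recoverable error."""

    def __init__(self, failures: List[NsiError]) -> None:
        super().__init__("; ".join(str(f) for f in failures))
        self.failures = failures


def reserve_with_fallback(
    providers: Sequence[Callable[[str], str]], request: str
) -> str:
    """Try each provider in turn; stop at once on a non-recoverable error."""
    failures: List[NsiError] = []
    for reserve in providers:
        try:
            return reserve(request)
        except NsiError as err:
            if err.error_id not in RECOVERABLE_IDS:
                raise  # e.g. a missing parameter: retrying will not help
            failures.append(err)
    # Keep the collected errors so they can be reported back to the user.
    raise AllProvidersFailed(failures)
```

The collected `failures` list is what would be shown to the user after a second or third consecutive failure.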


MISSING_PARAMETER, "SVC0001", "Invalid or missing parameter"
UNSUPPORTED_OPTION, "SVC0002", "Parameter provided contains an unsupported value which MUST be processed"

How is this different from "invalid" in the previous one? Is this an "I know this should be supported, but it isn't"?

The unsupported option is meant to identify things that can be part of the protocol but not supported by this specific implementation.  For example, if someone passes UNIDIRECTIONAL connection requests when you only support BIDIRECTIONAL, PROTECTED when you only support UNPROTECTED, or perhaps an unsupported TechnologySpecificAttributesType, etc.

Both of these would probably be equivalent to HTTP 400 (BAD_REQUEST), in which case a request should not be retried without being modified. While I can see the distinction between a missing, invalid, or unsupported parameter, the end result is the same - human intervention is needed.

Obviously, using HTTP error codes would make sense if this were a RESTful interface implementation and we were using HTTP errors to respond; however, we do not need to restrict ourselves to the limited HTTP error codes within our NSI implementation.


In the case where the service knows the semantics, but hasn't implemented it, the HTTP 501 (NOT_IMPLEMENTED) would be suitable.

NOT_IMPLEMENTED is a good suggestion.


ALREADY_EXISTS, "SVC0003", "Schedule already exists for connectionId"

Maybe "CONNECTION_EXISTS" or "CONNECTION_CONFLICT" as name.

Strings like "MISSING_PARAMETER" or "UNSUPPORTED_OPTION" are the internal constant names within my implementation and I did not expect them to be used outside of my code base.  Really, only the SVC/POL string and error string were to be used.  If we want to replace the SVC/POL error strings with strings such as "CONNECTION_EXISTS" I am okay with this.


This would be equivalent to HTTP 409 (CONFLICT).

DOES_NOT_EXIST, "SVC0004", "Schedule does not exist for connectionId"

Maybe "CONNECTION_NONEXISTENT".

This would be equivalent to HTTP 404 (NOT_FOUND)

MISSING_SECURITY, "SVC0005", "Invalid or missing user credentials"

The term "Missing security" is highly misleading. I strongly suggest something else, perhaps "UNAUTHORIZED".

Good suggestion.


Would be equivalent to HTTP 401 (UNAUTHORIZED).

TOPOLOGY_RESOLUTION_STP, "SVC0006", "Could not resolve STP in Topology database"
TOPOLOGY_RESOLUTION_STP_NSA, "SVC0007", "Could not resolve STP to managing NSA"

3 or 4 consecutive nouns following each other make a rather poor error name IMHO. Also topology and NSI are so interwoven, that we don't really need the topology word. How about "UNKNOWN_STP"?

Sounds good.

Do we expect the latter error to ever come up (we know the STP, but not the NSA for it - I would call this a topology description error)?

Having implemented a number of protocols in production I have become a paranoid defensive programmer.  So yes, it will come up ;-)


These would correspond to HTTP 422 (UNPROCESSABLE_ENTITY).

PATH_COMPUTATION_NO_PATH, "SVC0008", "Path computation failed to resolve route for reservation"

Do we really need to say path twice? How about "NO_PATH_FOUND".

I want to make sure it sinks in!  Actually, I was intending to use "PATH_COMPUTATION_" as a prefix to more path computation related errors. 
 

For http this would probably also be 422, though this one does not have a clear fit.

INVALID_STATE, "SVC0009", "Connection state machine is in invalid state for received message"

Invalid state has a bad ring to it. How about "INVALID_TRANSITION"?

For HTTP this would be 422, though 405/406 could be misused for them.

INTERNAL_ERROR, "SVC0010", "An internal error has caused a message processing failure"

Would correspond to 500.

INTERNAL_NRM_ERROR, "SVC0011", "An internal NRM error has caused a message processing failure"

The distinction between NSA and NRM is an artificial one, and in some cases they are the same (e.g., OpenNSA can speak directly to JunOS boxes). For the client, the result is the same: "The thing at the other end didn't work". For humans/operators the distinction is important, but I would say the error code is for clients, and the error string for humans.

STP_ALREADY_IN_USE, "SVC0012", "Specified STP already in use"

I would call this "STP_UNAVAILABLE", as we are dealing with a time span for the reservation. "In use" reflects a current situation, which is rarely the case for us. The message should be something like "Specified STP not available in specified time span".

In HTTP this one is a bit tricky, but 422 is probably the best fitting.

BANDWIDTH_NOT_AVAILABLE, "SVC0013", "Insufficient bandwidth available for reservation"

Would people like to add to the list?

Maybe something for a connection which used to exist, but is now terminated or no longer available. I know this could fall under "DOES_NOT_EXIST", or "INVALID_STATE", but none of these actually capture what happened.

This would be equivalent to 410 (GONE).

Furthermore, something stating that the resource is not available for the specified user could be appropriate (corresponding to 401 (UNAUTHORIZED) in HTTP).


I've given mappings from the error codes to HTTP status codes. Most mappings are straightforward, but a couple are edge cases and can be discussed. While not perfect, HTTP codes are well understood by many developers, have clear semantics for request retry and modification, and have been well tested over a significant amount of time. Why do we need to invent our own? Of course we would only adopt the 400/500 class codes, as the other classes do not make sense with our current protocol model.
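
For reference, the mappings suggested in this message can be collected into one table. This is only a sketch of the proposal: the function name `http_status_for` is hypothetical, SVC0002 is listed under 400 although 501 was suggested for the known-but-not-implemented case, and SVC0011 and SVC0013 are left out because no mapping was given for them above.

```python
from typing import Dict, Optional

# Suggested errorId -> HTTP status mappings, as discussed in this message.
SUGGESTED_HTTP_STATUS: Dict[str, int] = {
    "SVC0001": 400,  # MISSING_PARAMETER           -> BAD_REQUEST
    "SVC0002": 400,  # UNSUPPORTED_OPTION          -> BAD_REQUEST (or 501)
    "SVC0003": 409,  # ALREADY_EXISTS              -> CONFLICT
    "SVC0004": 404,  # DOES_NOT_EXIST              -> NOT_FOUND
    "SVC0005": 401,  # MISSING_SECURITY            -> UNAUTHORIZED
    "SVC0006": 422,  # TOPOLOGY_RESOLUTION_STP     -> UNPROCESSABLE_ENTITY
    "SVC0007": 422,  # TOPOLOGY_RESOLUTION_STP_NSA -> UNPROCESSABLE_ENTITY
    "SVC0008": 422,  # PATH_COMPUTATION_NO_PATH    -> no clear fit
    "SVC0009": 422,  # INVALID_STATE               -> UNPROCESSABLE_ENTITY
    "SVC0010": 500,  # INTERNAL_ERROR              -> INTERNAL_SERVER_ERROR
    "SVC0012": 422,  # STP_ALREADY_IN_USE          -> best available fit
}


def http_status_for(error_id: str) -> Optional[int]:
    """Return the suggested HTTP status, or None if none was proposed."""
    return SUGGESTED_HTTP_STATUS.get(error_id)


def is_server_side(error_id: str) -> bool:
    """True for 500-class mappings, where the request itself was fine."""
    status = http_status_for(error_id)
    return status is not None and 500 <= status < 600
```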


   Best regards, Henrik

Henrik Thostrup Jensen <htj at ndgf.org>
NORDUnet / Nordic Data Grid Facility.