Post-Rio dissemination of all things NSI
Hi everyone

(long email, get coffee)

With the demo in Rio done, it is time to reflect a bit on the current protocol and state before moving on. I've collected a list of issues over the last month or so, but have intentionally not communicated them, as it was more important to have things ready for Rio.

Protocol:

The schedule has a start time, end time and duration.
- AFAICT we do not need duration. Can anyone explain what we need it for?

Is there a rationale for the minimum and maximum bandwidth? (I entered the NSI community a bit late, so humor me).

The callback model makes it very non-obvious how to handle failures.
- In general, dealing with network problems has not been thought through.
- This became fairly clear when we had problems establishing connections. There is nothing specified on how to continue from there (i.e., who carries the responsibility for propagating state updates).

The TechnologySpecificAttributes does not seem to have any usage except "future compatibility", and there are no examples or use cases for them.
- I suggest removal, and then adding specific fields if we need them later.

The reservationConfirmed message includes all the reservation details. Is this really necessary? Couldn't it just be a simple acknowledgement?

I would consider adding a state to indicate that the connection failed for some reason. The terminated state is a bit broad in what it describes.

Having two ways of querying seems like one too many. Remember that both have to be implemented. Is there a reason we cannot just have one?

The term "NsiExceptionType" seems to have gotten into the spec. It should be called serviceException.

The messageId in the serviceException should either have a number of options or not be there.
- It should probably be called errorId as well (it doesn't identify a message).
- A possibility could be to adopt the HTTP error codes.

The text and variables solution in serviceException seems like overkill. Why not just have the text there?

Quite often it is not clear which fields are required in a message and which are not.

WSDL:

XML Schema has a value "Terminateing" for ConnectionState.

xsd:dateTime allows values without a timezone, which is problematic.
- I suggest that the protocol dictates that all protocol timestamps should be in zulu time (which is really the only sensible thing to send over a wire IMHO)

WSDL specifies a reservation.reservation, which is somewhat unfortunate. I suggest reservation.reservationInfo.

connectionId is enforced as a UUID, which is not in tune with the protocol spec, which specifies that the connectionId only has to be unique within the requester NSA scope.

ServiceException should probably be called "serviceException" to follow the naming convention.

I've also compiled a list of issues which have been confusing people. The purpose of this list is simply to have a spec which is easy(er) to implement, which IMHO is a very important quality of a standard (and one we are far from).

- The URN prefixes
- Requester / provider role fields
- replyTo / addressing in general
- Distributed development
- Reordering of messages from "logical order"
- Bad error messages from SOAP/WSDL stacks (and probably other things as well)
- XML/WSDL namespaces
- Security

I have some comments/suggestions as well:

The URN prefixes are just prefixes. They do not add any value, and have been a source of confusion. I suggest we remove them.

I'm not sure the requester/provider role fields are really necessary. It should be clear from the security context (I'll get back to that) who it is one is communicating with.

The replyTo fields seem to me like a surrogate for the lack of addressing caused by not having topology done. On the other hand they also make it a lot easier to make clients, as a client does not have to exist in the topology to be able to communicate with an NSA. I suggest we think a bit over how we want this to work, and how we want to support potentially short-lived clients creating connections (because something needs to initiate connection creation).

The distributed collaboration with developing NSI agents was initially a bit fuzzy and hindered by some barriers. The skype room improved this a lot by bringing down the latency in developer communication.

Reordering of messages from "logical order". I still think the protocol design is a bit clumsy, especially when combined with the lack of how to handle network errors (unavailable hosts, etc.) and short-lived clients. I've been thinking a bit about it, but haven't really come up with anything substantial.

Some people (me included, if not especially) have struggled quite a bit with their SOAP/WSDL stacks, and the lack of checks and bugs in there. However several people have also been puzzled by the error messages from them, e.g., getting an integer parsing error when an element was simply missing.

Somewhat similar, I had an issue where my SOAP stack used the wrong namespace on an element. The issue here was not so much the bug in the SOAP stack (it happens), but that I could not figure out what was right and wrong by looking at the WSDLs.

Security. Let me be very clear here. HTTP basic and some SAML attributes have nothing to do with security. They did not provide integrity, confidentiality, or assurance. I am also puzzled by the choice of SAML, as SAML is intended for communication between identity providers and services, but there are no identity providers in NSI. Also, I don't think that anyone in this group actually understands SAML (I don't). My suggestion for security is to use TLS with certificates (from a recognized CA) in each end, and nothing more. It is not the most trivial thing in the world (but isn't really difficult either), but it is fairly well understood and has widespread support.

Lastly, I think the group is suffering heavily from having thought too much and having constructed too little. This has somewhat changed after Rio, but I fear that we are now so far into the process of writing the standard that it is difficult to have any big changes done, as people do not want to change what has already been made. The project group also still operates in a very ad-hoc fashion. This can work great to a certain extent, but I think that limit was crossed some time ago. We need to get better organized, but this is not really my area of expertise, so I won't suggest anything.

I'm on vacation until monday, so take your time replying :-)

Best regards, Henrik

Henrik Thostrup Jensen <htj at ndgf.org> NORDUnet / Nordic Data Grid Facility.
Comments in line. Before I start I do want to make a general comment about the complexity of the NSI protocol. Going through the implementation these last couple of weeks really opened my eyes. I had been grumbling about it when writing the reference WSDL, but I would like to have a discussion at the weekly NSI call (or maybe dial into OGF next week) so we can make sure some of the requirements driving the complexity are really not just nice-to-haves. I am not talking about namespaces or topology; I am referring to requirements that have forced us into long-duration operations and my currently very least favourite provision/release operations.
Protocol:
The schedule has a start time, end time and duration. - AFAICT we do not need duration. Can anyone explain what we need it for?
We had it in Fenius and many NRMs support it as well. Programmatically it can be achieved with endTime unless we decide it has a specific behaviour for reservations that start now, when now may take a while to set up. This would be extremely complicated to coordinate, so I think we should probably take it out.
Is there a rationale for the minimum and maximum bandwidth? (I entered the NSI community a bit late, so humor me).
Some services being offered by networks have flexible bandwidth capabilities such as bursting capabilities to higher bandwidth when other circuits are idle. Providing a minimum would put a floor (committed) on the reservation bandwidth.
The callback model makes it very non-obvious how to handle failures. - In general, dealing with network problems has not been thought through. - This became fairly clear when we had problems establishing connections. There is nothing specified on how to continue from there (i.e., who carries the responsibility for propagating state updates).
Yes, the callback model is much more complex but was required based on the input requirements. We have discussed a number of strategies to handle these failure scenarios. First off we need to have retries on sending for requests, confirmed, and failed messages. Obviously, if we are having authentication failures no number of retries will fix the problem. If your retry timers run out, then you have no option but to toss the message. In this situation the requestingNSA may reissue the request, or do a query operation to determine the state of the state machine on the providerNSA. Similarly, the providerNSA may query the requesterNSA to see if their states match for the connection in question.
The TechnologySpecificAttributes does not seem to have any usage except "future compatibility", and there are no examples or use cases for them. - I suggest removal, and then adding specific fields if we need them later.
They are there specifically to let service providers add service-specific attributes such as frame size, QoS, SLRG, etc. into a request. The protocol would not need to change, but the underlying implementation would, to support the parameters. We did not want to revise the protocol every time a new service parameter was defined.
The reservationConfirmed message includes all the reservation details. Is this really necessary? Couldn't it just be a simple acknowledgement?
The reason we included the reservation details in the message is that the original reservation request is a "space" that can be more fully qualified by the providerNSA when satisfying the request. For example, if I specify a number of mandatory and desired parameters in the reservation request, the reservationConfirmed will hold the parameters satisfied by the reservation. It is more work but is needed.
I would consider adding a state to indicate that the connection failed for some reason. The terminated state is a bit broad in what it describes.
Yes, I had originally proposed that as well. I would also like to formalize the minimum length of time a reservation will remain in the NSA when in the terminated state so that it can be queried.
Having two ways of querying seems like one too many. Remember that both have to be implemented. Is there a reason we cannot just have one?
There is only a single query message, however, it can go from requester to provider, or provider to requester. This was needed for some of the error handling cases. Is this what you mean?
The term "NsiExceptionType" seems to have gotten into the spec. It should be called serviceException.
Service is a really overloaded term. Do you mean an NSI protocol exception or a service reservation exception?
The messageId in the serviceException should either have a number of options or not be there.
The agreement was that we would enumerate through the implementation but not force a set into the XSD so we have flexibility.
- It should probably be called errorId as well (it doesn't identify a message).
Good suggestion.
- A possibility could be to adopt the HTTP error codes.
That is a possibility.
The text and variables solution in serviceException seems like overkill. Why not just have the text there?
Here is an example that shows the additional flexibility. One generic "Invalid or missing parameter" error message and the parameters causing the issue in the variables.

<messageId>SVC0001</messageId>
<text>Invalid or missing parameter</text>
<variables>
  <Attribute Name="replyTo" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
    <AttributeValue xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><null></AttributeValue>
  </Attribute>
</variables>
Quite often it is not clear what fields are required in a message and which are not.
Yes, we definitely need more documentation round these.
WSDL:
XML Schema has a value "Terminateing" for ConnectionState.
Thank you - I will fix that.
xsd:dateTime allows values without a timezone, which is problematic. - I suggest that the protocol dictates that all protocol timestamps should be in zulu time (which is really the only sensible thing to send over a wire IMHO)
I did look up the dateTime specification and am sorry I missed this. I always use the Java XMLGregorianCalendarImpl class which puts the timezone on by default. We will put this on the list of items to resolve.
WSDL specifies a reservation.reservation, which is somewhat unfortunate. I suggest reservation.reservationInfo
Yes. This will be done.
connectionId is enforced as a UUID, which is not in tune with the protocol spec, which specifies that the connectionId only has to be unique within the requester NSA scope.
Yes, we will change this. Local uniqueness only causes implementation issues with zero upside value. You then need to maintain tuples for uniqueness. NSI is complicated enough without this added complexity.
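A small Python sketch of the bookkeeping difference being described here (the NSA names and the dictionary layout are made up for illustration, they are not from the spec or the WSDL): with requester-scoped ids the provider has to key its reservations on the (requester NSA, connectionId) pair, whereas globally unique ids can serve as keys on their own.

```python
import uuid

# Requester-scoped ids: two requesters may pick the same connectionId,
# so the provider must key on the (requester NSA, connectionId) tuple.
scoped = {}
scoped[("urn:example:nsa:A", "conn-1")] = "reservation made by A"
scoped[("urn:example:nsa:B", "conn-1")] = "reservation made by B"

# Globally unique ids (e.g. UUIDs): the connectionId alone is the key.
flat = {}
flat[str(uuid.uuid4())] = "reservation made by A"
flat[str(uuid.uuid4())] = "reservation made by B"

print(len(scoped), len(flat))
```

Both tables end up with two entries; the difference is only in what the provider has to carry around as the lookup key.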
ServiceException should probably be called "serviceException" to follow the naming convention.
I bounced back and forth on this during definition. I need to remember why.
I've also compiled a list of issues which have been confusing people. The purpose of this list is simply to have a spec which is easy(er) to implement, which IMHO is a very important quality of a standard (and one we are far from).
- The URN prefixes
We need to use namespaces to allow for flexibility when other namespaces are needed to be used. We must remember that a good protocol can be used flexibly.
- Requester / provider role fields
I was concerned with these originally as well. If we remove the replyTo and place it in topology definition as the csRequesterEndpoint then we will at least need the requesterNSA attribute. Based on the current spec we should rename these fields if they do not hold an NSA URN.
- replyTo / addressing in general
As above. I think we need to maintain the flexibility to support both one and two endpoints.
- Distributed development - Reordering of messages from "logical order"
Does logical order refer to protocol order or the order in which operations were issued? In any distributed, highly parallel system, message ordering is hard to maintain when multi-threaded processing is involved. There are queuing strategies to handle some of this but the best mechanism is the requester serializing :-)
- Bad error messages from SOAP/WSDL stacks (and probably other things as well)
Please provide some examples.
- XML/WSDL namespaces
You need to get your stack fixed :-)
- Security
Definitely. I locked a lot of people out.
I have some comments/suggestions as well:
The URN prefixes are just prefixes. They do not add any value, and have been a source of confusion. I suggest we remove them.
Namespaces are needed for flexibility. If we remove them then NSI can only work with the naming structures we define. I really want to avoid this if possible. You should not even be looking into them anyways. String match only. I proposed a label for display name in the topology file so we can have both uniqueness and something for people to display in GUIs.
I'm not sure the requester/provider role fields are really necessary. It should be clear from the security context (I'll get back to that), who it is one is communicating with.
This does need discussion.
The replyTo fields seem to me like a surrogate for the lack of addressing caused by not having topology done. On the other hand they also make it a lot easier to make clients, as a client does not have to exist in the topology to be able to communicate with an NSA. I suggest we think a bit over how we want this to work, and how we want to support potentially short-lived clients creating connections (because something needs to initiate connection creation).
It was mirroring the capabilities of WS-Addressing, but I am okay if we address it through NSA topology.
The distributed collaboration with developing NSI agents was initially a bit fuzzy and hindered by some barriers. The skype room improved this a lot by bringing down the latency in developer communication.
The only better way is to get everyone in a single room with beer.
Reordering of messages from "logical order". I still think the protocol design is a bit clumsy, especially when combined with the lack of how to handle network errors (unavailable hosts, etc.) and short-lived clients. I've been thinking a bit about it, but haven't really come up with anything substantial.
Please clarify "logical ordering" and we can discuss.
Some people (me included, if not especially) have struggled quite a bit with their SOAP/WSDL stacks, and the lack of checks and bugs in there. However several people have also been puzzled by the error messages from them, e.g., getting an integer parsing error when an element was simply missing.
Might be an issue with a mandatory field not being provided and the error handling of your stack being coded by a bunch of monkeys :-)
Somewhat similar, I had an issue where my SOAP stack used the wrong namespace on an element. The issue here was not so much the bug in the SOAP stack (it happens), but that I could not figure out what was right and wrong by looking at the WSDLs.
Takes an experienced eye for these things. Sorry I couldn't help earlier to identify the problem.
Security. Let me be very clear here. HTTP basic and some SAML attributes have nothing to do with security. They did not provide integrity, confidentiality, or assurance. I am also puzzled by the choice of SAML, as SAML is intended for communication between identity providers and services, but there are no identity providers in NSI. Also, I don't think that anyone in this group actually understands SAML (I don't). My suggestion for security is to use TLS with certificates (from a recognized CA) in each end, and nothing more. It is not the most trivial thing in the world (but isn't really difficult either), but it is fairly well understood and has widespread support.
There are two solutions required here. The first is NSA-to-NSA security and the second is end user "session" security. Each has different requirements and solutions. The proposed security solution we have discussed and agreed upon for NSA-to-NSA security is:

1. TLS with mutual authentication for encryption and confidentiality (and transport authentication).
2. HTTP Basic authentication for authentication. Yes, this seems like double the effort, but BASIC is supported in software stacks via JAAS. TLS certificates are not typically supported for application-level security; however, there are ways to get access to them.
3. SOAP digital signatures for message integrity.

The sessionSecurityAttr holds authentication and authorization information for the end user. There is a security proposal document that describes the use of this element, the types of roles supported, and how certificates could be passed for integration into existing security solutions for the end user. We do not use these fields for NSA authentication.
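For item 1, a rough sketch using Python's standard ssl module of a server-side context that insists on a client certificate (mutual authentication). The function name and the file-path parameters are placeholders, and this says nothing about how the NSI implementations at the demo were actually configured.

```python
import ssl


def make_mutual_tls_context(certfile=None, keyfile=None, cafile=None):
    """Build a server-side TLS context that requires a client certificate.

    Hypothetical helper; all paths are supplied by the caller.
    """
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=cafile)
    # CERT_REQUIRED makes the handshake fail unless the peer presents a
    # certificate that chains to one of the trusted CAs.
    context.verify_mode = ssl.CERT_REQUIRED
    if cafile is None:
        # No explicit CA bundle given: fall back to the system's default CAs.
        context.load_default_certs(ssl.Purpose.CLIENT_AUTH)
    if certfile is not None:
        # The server's own certificate and private key.
        context.load_cert_chain(certfile, keyfile)
    return context
```

The requester side would do the mirror image: a client context that verifies the provider's certificate and also loads its own certificate chain, so both ends are authenticated at the transport level.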
Lastly, I think the group is suffering heavily from having thought too much and having constructed too little. This has somewhat changed after Rio, but I fear that we are now so far into the process of writing the standard that it is difficult to have any big changes done, as people do not want to change what has already been made. The project group also still operates in a very ad-hoc fashion. This can work great to a certain extent, but I think that limit was crossed some time ago. We need to get better organized, but this is not really my area of expertise, so I won't suggest anything.
Well, some of the people in the NSI working group are a bit overworked, and this was before implementations. I had offered previously to coordinate development efforts, and I think we need to get a focused implementors group together with dedicated mailing lists and resources. The GLIF had been looking to form a working group to do this, but I think we can run something very informal. I will offer it up again.

It is never too late to change the protocol. This effort was a proof point for the protocol so we could see what needs improvement before official publication of an endorsed protocol. People know the current version is not final, and I have no issue changing it if needed.

There was a lot of stress in the last day, so there were some complaints flying, but I think everyone did a great job. One issue I heard pop up was the excessive chatter on the skype IM session from people, many of whom were not actually testing implementations. People were excited and wanted to help however they could, so I can't fault anyone, but it forced a few of us to break off into individual chat sessions to focus on the task at hand. We will need to be careful of this in the future.

We need to push on to SuperComputing with a better coordinated effort. I know many people are taking a deep breath and will spend the next week catching up on work they ignored over the last couple of weeks. I will try to kick off the more organized effort next week.
I'm on vacation until monday, so take your time replying :-)
Lucky you.
Best regards, Henrik
Henrik Thostrup Jensen <htj at ndgf.org> NORDUnet / Nordic Data Grid Facility. _______________________________________________ nsi-wg mailing list nsi-wg@ogf.org http://www.ogf.org/mailman/listinfo/nsi-wg
Hi, thanks for answering. Many/most of these replies here are for the group as a whole, I am not targeting John :-). On Wed, 14 Sep 2011, John MacAuley wrote:
Before I start I do want to make a general comment about the complexity of the NSI protocol. Going through the implementation these last couple of weeks really opened my eyes. I had been grumbling about it when writing the reference WSDL, but I would like to have a discussion at the weekly NSI call (or maybe dial into OGF next week) so we can make sure some of the requirements driving the complexity are really not just nice-to-haves. I am not talking about namespaces or topology; I am referring to requirements that have forced us into long-duration operations and my currently very least favourite provision/release operations.
Well, sure. But I think we have to differentiate between complexity from the problem area (provisioning of network connections), and "accidental" complexity arising from the protocol we create to solve the problem. The latter should be kept to a minimum :-).
Protocol:
The schedule has a start time, end time and duration. - AFAICT we do not need duration. Can anyone explain what we need it for?
We had it in Fenius and many NRMs support it as well. Programmatically it can be achieved with endTime unless we decide it has a specific behaviour for reservations that start now, when now may take a while to set up. This would be extremely complicated to coordinate, so I think we should probably take it out.
I think taking it out is the right thing. We don't want ambiguity in how to define when something should start and end.
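To make the ambiguity concrete, here is a minimal Python sketch (not from the NSI spec; the function name normalize_schedule and its arguments are hypothetical) of what a receiver has to do when a schedule may carry both an end time and a duration: derive the end time from the duration, and reject requests where the two disagree.

```python
from datetime import datetime, timedelta, timezone


def normalize_schedule(start, end=None, duration=None):
    """Return (start, end), deriving end from duration when end is absent.

    Hypothetical helper: raises ValueError when neither end nor duration
    is given, or when both are given and contradict each other.
    """
    if end is None and duration is None:
        raise ValueError("schedule needs an end time or a duration")
    derived_end = start + duration if duration is not None else None
    if end is not None and derived_end is not None and end != derived_end:
        raise ValueError("end time and duration disagree")
    return start, (end if end is not None else derived_end)


start = datetime(2011, 9, 14, 12, 0, tzinfo=timezone.utc)
print(normalize_schedule(start, duration=timedelta(hours=2)))
```

With only a start time and an end time in the schedule, the whole consistency check disappears, which is the simplification argued for above.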
Is there a rationale for the minimum and maximum bandwidth? (I entered the NSI community a bit late, so humor me).
Some services being offered by networks have flexible bandwidth capabilities such as bursting capabilities to higher bandwidth when other circuits are idle. Providing a minimum would put a floor (committed) on the reservation bandwidth.
OK, I can see how it can provide a more flexible approach to delivering bandwidth. I must admit I doubt it will be used, but I am not a network expert.
The callback model makes it very non-obvious how to handle failures. - In general, dealing with network problems has not been thought through. - This became fairly clear when we had problems establishing connections. There is nothing specified on how to continue from there (i.e., who carries the responsibility for propagating state updates).
Yes, the callback model is much more complex but was required based on the input requirements. We have discussed a number of strategies to handle these failure scenarios. First off we need to have retries on sending for requests, confirmed, and failed messages. Obviously, if we are having authentication failures no number of retries will fix the problem. If your retry timers run out, then you have no option but to toss the message. In this situation the requestingNSA may reissue the request, or do a query operation to determine the state of the state machine on the providerNSA. Similarly, the providerNSA may query the requesterNSA to see if their states match for the connection in question.
Ultimately it is the requester which is interested in the information, so I think it would make sense to have the main retry logic there. I am not sure the message retry would accomplish a lot except implementation complexity. I am essentially suggesting a fallback to polling, but almost every message-based system (which we are somewhat emulating with the callback) falls back to that for error recovery. I'm just not overly thrilled with the prospect of having multiple ways for state propagation.
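The two recovery strategies discussed here can be sketched roughly as follows in Python. This is only an illustration under assumptions: send and query stand in for whatever transport calls an implementation uses, ConnectionError stands in for "host unavailable", and the state names in the usage note are examples, not taken from the spec.

```python
import time


def send_with_retry(send, attempts=3, delay=1.0):
    """Sender-side retry: call send() up to `attempts` times.

    Returns True on the first success, False once the retries are used up
    (at which point the message is dropped).
    """
    for attempt in range(attempts):
        try:
            send()
            return True
        except ConnectionError:
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False


def poll_for_state(query, wanted, polls=5, interval=1.0):
    """Requester-side fallback: poll the provider until it reports a wanted state.

    Returns the state found, or None if it never showed up within `polls` tries.
    """
    for _ in range(polls):
        state = query()
        if state in wanted:
            return state
        time.sleep(interval)
    return None
```

A requester that never received a confirmation callback could, for example, call poll_for_state(query, {"Reserved", "Terminated"}) instead of waiting indefinitely. Keeping this loop on the requester side means the provider does not need its own retry queue for callbacks.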
The TechnologySpecificAttributes does not seem to have any usage except "future compatibility", and there are no examples or use cases for them. - I suggest removal, and then adding specific fields if we need them later.
They are there specifically to let service providers add service-specific attributes such as frame size, QoS, SLRG, etc. into a request. The protocol would not need to change, but the underlying implementation would, to support the parameters. We did not want to revise the protocol every time a new service parameter was defined.
OK. Are the parameters then optional to understand, or must they all be understood? The latter would make sense, but the semantics are still unclear (and some examples in the spec would be nice).
The reservationConfirmed message includes all the reservation details. Is this really necessary? Couldn't it just be a simple acknowledgement?
The reason we included the reservation details in the message is that the original reservation request is a "space" that can be more fully qualified by the providerNSA when satisfying the request. For example, if I specify a number of mandatory and desired parameters in the reservation request, the reservationConfirmed will hold the parameters satisfied by the reservation. It is more work but is needed.
I am not convinced. I think Jerry hit the nail on the head with the argument that we are trying to create something more high-level. When creating a connection a number of requirements are filled out. Either these requirements can be fulfilled or the reservation cannot be completed. E.g., when reserving a hotel room, the room number is not returned to you (sure, you get it later for practical reasons), but you are interested in the service of having a place to sleep, not the number on the door.
I would consider adding a state to indicate that the connection failed for some reason. The terminated state is a bit broad in what it describes.
Yes, I had originally proposed that as well. I would also like to formalize the minimum length of time a reservation will remain in the NSA when in the terminated state so that it can be queried.
I agree, having that information available for some time is necessary for this to make sense.
Having two ways of querying seems like one too many. Remember that both have to be implemented. Is there a reason we cannot just have one?
There is only a single query message, however, it can go from requester to provider, or provider to requester. This was needed for some of the error handling cases. Is this what you mean?
No, I was referring to the Details/Summary distinction, which seems superfluous to me.
The term "NsiExceptionType" seems to have gotten into the spec. It should be called serviceException.
Service is a really overloaded term. Do you mean an NSI protocol exception or a service reservation exception?
In the NSI document, the term "NsiExceptionType" is mentioned, which sounds an awful lot like something from the WSDL. In other places in the document it is referred to as serviceException.
The messageId in the serviceException should either have a number of options or not be there.
The agreement was that we would enumerate through the implementation but not force a set into the XSD so we have flexibility.
OK, I can see the sense in not having it in the WSDL, but it should be in the spec at least.
- A possibility could be to adopt the HTTP error codes.
That is a possibility.
Or at least use them as a base. We might want some indicators specific to connection provisioning.
The text and variables solution in serviceException seems like overkill. Why not just have the text there?
Here is an example that shows the additional flexibility. One generic "Invalid or missing parameter" error message and the parameters causing the issue in the variables.
<messageId>SVC0001</messageId>
<text>Invalid or missing parameter</text>
<variables>
  <Attribute Name="replyTo" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
    <AttributeValue xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><null></AttributeValue>
  </Attribute>
</variables>
OK, but wouldn't the messageId just be the code for "Invalid or missing parameter", making one of those two fields redundant?
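The redundancy argument can be illustrated with a short Python sketch. It assumes, hypothetically, that the spec carried a catalogue of message ids; only SVC0001 and its text come from the example above, while the table and function name are invented for illustration. With such a catalogue the text is derivable from the id, and only the variables carry request-specific information.

```python
# Hypothetical catalogue: the id SVC0001 and its text are from the example
# in this thread; the idea of a spec-level table is an assumption.
MESSAGES = {
    "SVC0001": "Invalid or missing parameter",
}


def render_exception(message_id, variables):
    """Build a human-readable error from a message id and its variables."""
    text = MESSAGES[message_id]
    details = ", ".join(
        "{}={!r}".format(name, value) for name, value in sorted(variables.items())
    )
    if details:
        return "{}: {} ({})".format(message_id, text, details)
    return "{}: {}".format(message_id, text)


print(render_exception("SVC0001", {"replyTo": None}))
```

Whether the wire format should then carry the id, the text, or both is exactly the open question; the sketch only shows that one can be computed from the other once the ids are enumerated somewhere.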
WSDL:
xsd:dateTime allows values without a timezone, which is problematic. - I suggest that the protocol dictates that all protocol timestamps should be in zulu time (which is really the only sensible thing to send over a wire IMHO)
I did look up the dateTime specification and am sorry I missed this. I always use the Java XMLGregorianCalendarImpl class which puts the timezone on by default. We will put this on the list of items to resolve.
Yes, it is certainly not ideal for something that is supposed to be sent over the wire. I would suggest we do not allow time zones either, and just deal with everything in UTC / zulu time over the wire.
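A minimal sketch of what such a rule could look like at the edges of an implementation, in Python. The function names are hypothetical and the policy shown (reject timestamps without an offset, always emit "Z") is one reading of the suggestion, not agreed protocol behaviour.

```python
from datetime import datetime, timezone


def parse_wire_time(text):
    """Parse an ISO 8601 timestamp and return it as an aware UTC datetime.

    Timestamps without any timezone information are rejected, since the
    receiver cannot know which zone the sender meant.
    """
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so map it to an explicit zero offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        raise ValueError("timestamp has no timezone: " + text)
    return value.astimezone(timezone.utc)


def format_wire_time(value):
    """Format an aware datetime as UTC with a trailing 'Z'."""
    if value.tzinfo is None:
        raise ValueError("refusing to send a timestamp without a timezone")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

A stricter variant of parse_wire_time could also reject any non-zero offset, which would match "do not allow time zones either" more literally.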
connectionId is enforced as a UUID, which is not in tune with the protocol spec, which specifies that the connectionId only has to be unique within the requester NSA scope.
Yes, we will change this. Local uniqueness only causes implementation issues with zero upside value. You then need to maintain tuples for uniqueness. NSI is complicated enough without this added complexity.
Sorry, change what to what? Change the WSDL to match the spec, or vice versa?
I've also compiled a list of issues which have been confusing people. The purpose of this list is simply to have a spec which is easy(er) to implement, which IMHO is a very important quality of a standard (and one we are far from).
- The URN prefixes
We need to use namespaces to allow for flexibility when other namespaces are needed to be used. We must remember that a good protocol can be used flexibly.
Maybe we can just agree on allowing direct SSH access to everyone's network equipment and we won't even have to do the protocol :-). It will be very, very flexible :-). OK, more seriously. URNs are typically used to denote a "what" instead of a "where" (URLs), and are typically used either to decouple location (URN -> URL resolving) or to make it possible to mix different types of resources. However, we are usually very clear when denoting resources, e.g.: <stpId>urn:ogf:network:stp:Martinique:M1</stpId> Do we plan on having anything other than an STP in the stpId element? I hope not.
- Requester / provider role fields
I was concerned with these originally as well. If we remove the replyTo and place it in topology definition as the csRequesterEndpoint then we will at least need the requesterNSA attribute.
Based on the current spec we should rename these fields if they do not hold an NSA URN.
- replyTo / addressing in general
As above. I think we need to maintain the flexibility to support both one and two endpoints.
Yeah, this pretty much falls into the "think about addressing" category. Though I am not really sure we need the two endpoint flexibility.
- Reordering of messages from "logical order"
Does logical order refer to protocol order or to the order in which operations were issued? In any highly parallel distributed system, message ordering is hard to maintain when multi-threaded processing is involved. There are queuing strategies to handle some of this, but the best mechanism is the requester serializing :-)
Yeah, sure. But some abstractions are easier to work with than others :-)
- Bad error messages from SOAP/WSDL stacks (and probably other things as well)
Please provide some examples.
I sent a reserve request to the AutoBAHN implementation (I think), and they got an "Error parsing BigInteger" or similar. It turned out that a bandwidth parameter was missing.
- XML/WSDL namespaces
You need to get your stack fixed :-)
Well, yes. The problem is that once one moves outside the Java world, good SOAP/WSDL stacks become very, very sparse (perhaps C# should be included). In fact, many languages do not have one. I think this is a big problem. SOAP and WSDL are certainly better than a lot of the alternatives (especially custom binary protocols), and the merits can be discussed endlessly (one sure does get a lot almost for free when it is working, but the position is the opposite when it doesn't). It does, however, set the entry bar for NSI at an awkward position.
I have some comments/suggestions as well:
The URN prefixes are just prefixes. They do not add any value, and have been a source of confusion. I suggest we remove them.
Namespaces are needed for flexibility. If we remove them then NSI can only work with the naming structures we define. I really want to avoid this if possible. You should not even be looking into them anyways. String match only. I proposed a label for display name in the topology file so we can have both uniqueness and something for people to display in GUIs.
See comment further up. I still don't see how they add value (now or later).
I'm not sure the requester/provider role fields are really necessary. It should be clear from the security context (I'll get back to that), who it is one is communicating with.
This does need discussion.
An interesting remark: A lot of people confused these with networks instead of NSA agents. I think the fields could be replaced with the networks. But this raises the question of whether they are even needed. What is needed is a way to identify the entity calling you, but this should really be a security thing. If someone/something manages to contact you but intended something else, chances are they won't get the provider field correct either :-).
Reordering of messages from "logical order". I still think the protocol design is a bit clumsy, especially when combined with the lack of guidance on how to handle network errors (unavailable hosts, etc.) and short-lived clients. I've been thinking a bit about it, but haven't really come up with anything substantial.
Please clarify "logical ordering" and we can discuss.
I am referring to the situation where a reservationConfirmed is received before the reservation ACK is received, and similar situations.
Some people (me included, if not especially) have struggled quite a bit with their SOAP/WSDL stacks, and the lack of checks and the bugs in there. However, several people have also been puzzled by the error messages from them, e.g., getting an integer parsing error when an element was simply missing.
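One way a requester could cope with this reordering is to make its local bookkeeping independent of arrival order. The Python sketch below is only an illustration under that assumption: the class, method, and state names are hypothetical and are not taken from the NSI state machine.

```python
class ReservationTracker:
    """Tracks one reservation on the requester side, tolerating reordering."""

    def __init__(self):
        self.acked = False      # the reserve request was acknowledged
        self.confirmed = False  # a reservationConfirmed was received

    def on_reserve_ack(self):
        self.acked = True

    def on_reservation_confirmed(self):
        # A confirmation implies the provider accepted the request,
        # so an ACK that arrives later (or never) is not treated as an error.
        self.confirmed = True

    @property
    def state(self):
        if self.confirmed:
            return "reserved"
        if self.acked:
            return "reserving"
        return "requested"


tracker = ReservationTracker()
tracker.on_reservation_confirmed()  # arrives first
tracker.on_reserve_ack()            # arrives late
print(tracker.state)                # reserved
```

The result is the same whichever message arrives first, which is the property the callback model currently leaves to each implementation.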
Might be an issue with a mandatory field not being provided and the error handling of your stack being coded by a bunch of monkeys :-)
Well, that would certainly explain a lot of things. The error message in question came from one of the common stacks in Java (the one the AutoBAHN people are using), not the one I was using. I am not just bringing up situations which have troubled me (some of these haven't at all); in general, these issues are things I've observed.
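For illustration, the kind of check that would have produced a clearer message is sketched below in Python: test for the required element before converting its content, so a missing element is reported as missing rather than as a number parsing failure. The element names reserve and bandwidth are simplified and hypothetical (no namespaces); this is not the actual NSI message layout.

```python
import xml.etree.ElementTree as ET


def read_bandwidth(xml_text: str) -> int:
    """Return the bandwidth value, with distinct errors for each failure."""
    root = ET.fromstring(xml_text)
    node = root.find("bandwidth")
    # Report a missing or empty element explicitly instead of letting
    # the integer conversion fail on None or an empty string.
    if node is None or node.text is None or not node.text.strip():
        raise ValueError("required element <bandwidth> is missing or empty")
    try:
        return int(node.text)
    except ValueError:
        raise ValueError(f"<bandwidth> is not an integer: {node.text!r}") from None


print(read_bandwidth("<reserve><bandwidth>100</bandwidth></reserve>"))  # 100
```

Calling read_bandwidth("<reserve/>") raises an error that names the missing element, which is what the sender needs to see.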
Somewhat similarly, I had an issue where my SOAP stack used the wrong namespace on an element. The issue here was not so much the bug in the SOAP stack (it happens), but that I could not figure out what was right and wrong by looking at the WSDLs.
Takes an experienced eye for these things. Sorry I couldn't help earlier to identify the problem.
It is not really your fault, but more the combination of WSDL complexity mixed with too few experts. However, it could have been remedied very easily if examples of the message payloads or similar had been available.
Security. Let me be very clear here. HTTP Basic and some SAML attributes have nothing to do with security. They do not provide integrity, confidentiality, or assurance. I am also puzzled by the choice of SAML, as SAML is intended for communication between identity providers and services, but there are no identity providers in NSI. Also, I don't think that anyone in this group actually understands SAML (I don't). My suggestion for security is to use TLS with certificates (from a recognized CA) in each end, and nothing more. It is not the most trivial thing in the world (but isn't really difficult either), but it is fairly well understood and has widespread support.
There are two solutions required here. The first is NSA-to-NSA security and the second is end user "session" security. Each has different requirements and solutions. The proposed security solution we have discussed and agreed upon for NSA-to-NSA security is:
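As a rough sketch of "TLS with certificates in each end", the Python below builds a server-side TLS context that refuses clients that do not present a certificate, using the standard ssl module. The function name and the file path parameters are hypothetical; which CA bundle to trust is a deployment decision that this sketch does not answer.

```python
import ssl


def make_server_context(certfile=None, keyfile=None, cafile=None):
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Require the peer (the calling NSA) to present a certificate
    # that chains to a trusted CA; the handshake fails otherwise.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        # This side's own certificate and private key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if cafile:
        # The CA certificates accepted for client certificates.
        ctx.load_verify_locations(cafile=cafile)
    return ctx


ctx = make_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

The requesting side would do the mirror image: verify the server certificate as usual and load its own certificate and key with load_cert_chain.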
1. TLS with mutual authentication for encryption and confidentiality (and transport authentication).
Yes please.
2. HTTP Basic authentication for authentication. Yes this seems like double the effort, but BASIC is supported in software stacks via JAAS. TLS certificates are not typically supported for application level security, however, there are ways to get access to it.
HTTP Basic will not really provide adequate security for anything. It does not protect against tampering or replay and does not provide confidentiality. Once a single request with the Authorization header has been issued, it provides exactly as much "security" as if it wasn't there at all.
3. SOAP digital signatures for message integrity.
Kill me now :-).
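Returning to the HTTP Basic point (item 2): the Python sketch below shows why the header on its own offers no confidentiality. The credentials are only base64-encoded, so anyone who sees one request can decode them and reuse them. The helper names and the example credentials are hypothetical.

```python
import base64


def basic_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_header(value: str):
    """Recover the credentials from an observed header value."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    # base64 is an encoding, not encryption: no key is needed to reverse it.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_header("nsa-a", "secret")
print(decode_basic_header(header))  # ('nsa-a', 'secret')
```

Any protection for these credentials therefore has to come from the transport underneath, not from Basic itself.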
The sessionSecurityAttr holds authentication and authorization information for the end user. There is a security proposal document that describes the use of this element, the types of roles supported, and how certificates could be passed for integration into existing security solutions for the end user. We do not use these fields for NSA authentication.
OK. But is more than the user identity needed, and is it needed at all? The important question is whether we trust another NSA to make a reservation or not. Sure, we can include some metadata, but could we call it something without the security word :-).
We need to push on to SuperComputing with a better coordinated effort. I know many people are taking a deep breath and will spend the next week catching up on work they ignored over the last couple of weeks. I will try to kick off the more organized effort next week.
Sounds good.
Best regards, Henrik
Henrik Thostrup Jensen <htj at ndgf.org>
NORDUnet / Nordic Data Grid Facility.
Participants (2): Henrik Thostrup Jensen, John MacAuley