Hi

I've accumulated a list of issues that I've found with the NSI2 CS protocol and NML topology while working with them over the last years.

* Provision/Release/Terminate

At some point we removed the provisionFailed, releaseFailed, and terminateFailed messages. I think the reasoning was that these actions weren't really allowed to fail. This is mostly true, with the exception that requests can fail security checks. There is no way to signal this back to the client AFAICT.

* State machine

Due to the above issue, aggregators can get stuck in "Provisioning" and "Releasing", with agents further down refusing to move into provisioning due to missing credentials. I have solved this by adding loop transitions, so that the following are possible:

Provisioning -> Provisioning, on provision request
Releasing -> Releasing, on release request

A second issue is that on abort/timeout it is possible for the state machine to enter a state where it is not possible to see whether resources are allocated or not (rather interesting, as this should really be the main purpose of the state machine - I think we focused too much on messages, and not on resource lifecycle). I solved this by adding an internal flag to indicate whether resources are allocated.

* Security/Identity flaws

The NSA identities are not tied to subject names in X.509 certificates, and there is no way to authenticate them. Inventing new ways of identifying services/hosts that aren't strictly tied to a CA or DNS is a bad idea. The whole user/group/organization SAML thing we have is pretty much useless, as there is no good way to authenticate them. The only proper security mechanisms I can see are X.509 certificates and tokens a la OAuth2.

* No notification mechanism

Notifications are only provided to the requester. All others have to issue query calls. This makes it impossible to get continuous updates (long poll / callbacks). This is very relevant for administration tools / portals.

* Security headers

Currently an NSI agent has to keep the body of the messages for future inspection (query). Messages (mainly the HTTP header or SOAP header) can contain credentials such as tokens, which should not be kept after they have been verified. We need a statement saying that the HTTP and SOAP headers should be discarded. Ideally we just have a log of events, when they occurred, and who did them, instead of keeping full message bodies.

* Single-label source/destination (and STPs)

The idea that source and destination should be modelled as STPs is probably wrong. There is some impedance mismatch at least. If a flow comes in with ethernet+vlan+mpls it should be modelled as such, not just as having a VLAN xor MPLS. This also makes adaptation quite tricky.

* Static Port List

The idea that we can list all ports in a network (NML or not) isn't realistic. In GTS/GVS, VMs are spawned dynamically, creating a new interface on the host machine. This is a new STP. Sure, it will probably get a VLAN on the circuit of the VM host, but the endpoint is a logical interface and can have labels on it, e.g. Q-in-Q where VLANs of the VM are connected to different places. This means that topology should not try to model all possible endpoints (because it can't), but only nodes/domains and the connectivity between these.

* Unidirectional Circuits

I've helped a lot of sites get OpenNSA up and running, and have typically helped out with the configuration. The most recurring issue is the topology configuration. In particular, getting the unidirectional ports right is the thing that almost always eats the most time. This, combined with the fact that no one uses unidirectional circuits, means that we have a lot of complexity with zero benefit.

* Callbacks

This isn't really a design issue, but having SOAP and callbacks adds a huge amount of complexity for protocol and state keeping.
It is immensely error-prone, and makes client implementations quite complicated, which hinders usage tremendously. I've implemented a REST interface in OpenNSA. It doesn't have 100% feature parity, but it implements additional functionality like long-polling for notifications and auto provision/commit. The protocol stack is roughly 1/10 of the lines of code of the SOAP interface.

Best regards, Henrik

Henrik Thostrup Jensen <htj at nordu.net>
Software Developer, NORDUnet
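The event-log idea from the "Security headers" point could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `Event`, `EventLog`, `handle_request`, and the `verify` callback are hypothetical and are not taken from NSI or OpenNSA. The point it shows is that the headers are used once for verification and then dropped, so only "what, when, who" is stored and later returned by a query.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """What happened, when, and who did it. No headers, no message body."""
    connection_id: str
    action: str          # e.g. "provision", "release", "terminate"
    actor: str           # identity established during verification
    timestamp: datetime


class EventLog:
    """In-memory event log (hypothetical; a real agent would persist this)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, connection_id: str, action: str, actor: str) -> Event:
        event = Event(connection_id, action, actor, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def query(self, connection_id: str) -> List[Event]:
        return [e for e in self._events if e.connection_id == connection_id]


def handle_request(
    log: EventLog,
    headers: Dict[str, str],
    connection_id: str,
    action: str,
    verify: Callable[[Dict[str, str]], str],
) -> Event:
    """Verify credentials, then store only the resulting event.

    `verify` is an assumed callback that checks the HTTP/SOAP headers and
    returns the authenticated identity (or raises). The headers themselves
    are never written to the log.
    """
    actor = verify(headers)
    return log.record(connection_id, action, actor)
```

With this shape, a query is answered from the event log alone, so tokens never need to outlive the request that carried them.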
Hello Team,

Do we need a call to go through these points?

Guy

-----Original Message-----
From: nsi-wg [mailto:nsi-wg-bounces@ogf.org] On Behalf Of Henrik Thostrup Jensen
Sent: 02 February 2017 13:12
To: NSI Working Group <nsi-wg@ogf.org>
Subject: [Nsi-wg] List of NSI2 Issues
Hi

Just to keep this list updated:

On Thu, 2 Feb 2017, Henrik Thostrup Jensen wrote:
> * Provision/Release/Terminate
>
> At some point we removed the provisionFailed, releaseFailed, and terminateFailed messages. I think the reasoning was that these actions weren't really allowed to fail. This is mostly true, with the exception that requests can fail security checks. There is no way to signal this back to the client AFAICT.
>
> * State machine
>
> Due to the above issue, aggregators can get stuck in "Provisioning" and "Releasing", with agents further down refusing to move into provisioning due to missing credentials. I have solved this by adding loop transitions, so that the following are possible:
>
> Provisioning -> Provisioning, on provision request
> Releasing -> Releasing, on release request
Terminating -> Terminating is also needed, so that terminate() can be re-issued in case of failures. It is the exact same issue as with provisioning/releasing. I have a user who has been seeing these failures occasionally, and we got them solved today by allowing this transition, so that terminate() could be re-issued. Much easier than mucking around in the database and cleaning up routers manually.

Best regards, Henrik

Henrik Thostrup Jensen <htj at nordu.net>
Software Developer, NORDUnet
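The three loop transitions discussed in this thread, together with the internal "resources allocated" flag, can be sketched as a small transition table. This is a minimal illustration and not OpenNSA's actual code: the state set is deliberately simplified, the event names and the `Connection` class are hypothetical, and the rule for when the flag flips (set on provision confirmation, cleared on release or terminate confirmation) is an assumption.

```python
from enum import Enum


class State(Enum):
    RELEASED = "Released"
    PROVISIONING = "Provisioning"
    PROVISIONED = "Provisioned"
    RELEASING = "Releasing"
    TERMINATING = "Terminating"
    TERMINATED = "Terminated"


# (current state, event) -> next state. Simplified; not the full NSI tables.
TRANSITIONS = {
    (State.RELEASED, "provision"): State.PROVISIONING,
    (State.PROVISIONING, "provision"): State.PROVISIONING,   # loop: re-issue
    (State.PROVISIONING, "provisionConfirmed"): State.PROVISIONED,
    (State.PROVISIONED, "release"): State.RELEASING,
    (State.RELEASING, "release"): State.RELEASING,           # loop: re-issue
    (State.RELEASING, "releaseConfirmed"): State.RELEASED,
    (State.RELEASED, "terminate"): State.TERMINATING,
    (State.PROVISIONED, "terminate"): State.TERMINATING,
    (State.TERMINATING, "terminate"): State.TERMINATING,     # loop: re-issue
    (State.TERMINATING, "terminateConfirmed"): State.TERMINATED,
}

# Assumed rule for the internal flag; the thread only says the flag exists.
ALLOCATING_EVENTS = {"provisionConfirmed"}
FREEING_EVENTS = {"releaseConfirmed", "terminateConfirmed"}


class InvalidTransition(Exception):
    pass


class Connection:
    def __init__(self) -> None:
        self.state = State.RELEASED
        self.resources_allocated = False  # tracked independently of the state

    def handle(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(f"{event!r} not allowed in {self.state.value}")
        self.state = TRANSITIONS[key]
        if event in ALLOCATING_EVENTS:
            self.resources_allocated = True
        elif event in FREEING_EVENTS:
            self.resources_allocated = False
        return self.state
```

Because the loop entries map a state onto itself, a client can re-send provision, release, or terminate after a failure further down the chain without the aggregator rejecting the request, and the flag still tells an operator whether anything is held on the network.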