Topology & Path Finding for SURFnet & NORDUnet BOD Production Service

* Overview

The topology core is still NML, but a few things are added to the NSI-NML
extension and a number of restrictions are introduced. This builds up a system
where path finding/connection construction is done in a way similar to how
OSPF/BGP work. The core path finding algorithm relies on chaining and mutual
trust between NSAs to set up connections.

The examples shown do not necessarily map to the real world, but are kept
relatively close for illustrative purposes. Schema compatibility of the
examples has not yet been checked.


* Peering

Path finding must, for obvious reasons, follow the data plane. Trust follows
the control plane, again for obvious reasons. The model presented here asserts
that data and control plane peering go hand in hand. While two networks that
are not adjacent could trust each other, the model has no need for it. The
model does not expose peerings through topology, but instead advertises
reachability of topology. The core idea is that reachability of ports and
topology is what matters, not specific control plane peerings.


* Ports & Aggregation

A rather interesting issue that came out of driving NML into NSI is the way
ports are addressed. NSI uses the STP concept, which is a combination of a
network id (which is globally unique) and a local id (which is scoped inside
the network id).

NML, on the other hand, assigns a global id to a port. Ports are part of a
topology, which is a grouping of ports. There is no way to infer the topology
id from the port, other than acquiring and parsing the topology representation.
This becomes problematic as NML describes port-to-port relationships for
demarcation. Hence the path finder must know the topology that contains the
remote port. Furthermore, unlisted ports become problematic as they cannot be
associated with any topology. As there can be a very high number of ports,
listing them all can be impractical; we cannot use a model that assumes
knowledge of all ports. Hence we want a topology model that does not require
announcement of individual ports, and can work without knowing the global set
of ports.

To deal with these issues, we restrict the naming of port and topology
identifiers in such a way that a topology name can be matched against the port
name: the topology id must be a prefix of the port id. Like this:

<nml:Topology id="urn:ogf:network:nordu.net:topology">
  <nml:PortGroup id="urn:ogf:network:nordu.net:topology:ps-in" />
</nml:Topology>

And not like this (which has been the norm so far):

<nml:Topology id="urn:ogf:network:nordu.net:topology">
  <nml:PortGroup id="urn:ogf:network:nordu.net:ps-in" />
</nml:Topology>

It is possible to announce multiple topologies if required:

<nml:Topology id="urn:ogf:network:nordu.net:europe">
  <nml:PortGroup id="urn:ogf:network:nordu.net:europe:sunet-fre" />
</nml:Topology>
<nml:Topology id="urn:ogf:network:nordu.net:usa">
  <nml:PortGroup id="urn:ogf:network:nordu.net:usa:starlight" />
</nml:Topology>

This strategy only shifts the problem, though. Instead of knowing the complete
port -> topology mapping, the NSA must know the topology of the port, and which
of its peer NSAs can provide a connection to that topology. Note that the
topology id cannot be inferred from a port id alone; the NSA must know the
topology id (this issue is handled in the next section).
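
As a minimal sketch of the matching step (not taken from any existing NSA
implementation), an NSA could look up the topology of a port by testing the
port id against the topology ids it knows. The function name find_topology
and the choice to require the ':' separator after the prefix are assumptions
made for illustration.

```python
def find_topology(port_id, known_topologies):
    """Return the known topology id that is a prefix of port_id, or None."""
    for topology_id in known_topologies:
        # Assumption: require the ':' separator, so that "...:topology"
        # does not accidentally match a port under "...:topology2".
        if port_id.startswith(topology_id + ":"):
            return topology_id
    return None


known = {
    "urn:ogf:network:nordu.net:europe",
    "urn:ogf:network:nordu.net:usa",
}
print(find_topology("urn:ogf:network:nordu.net:usa:starlight", known))
print(find_topology("urn:ogf:network:sunet.se:topology:ps", known))
```

The second lookup returns None, which is the case where the NSA either
rejects the request or hands it to a default provider.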

If an NSA does not know the topology in question, it can either reject the
request, or set up a circuit towards a default provider, and let the NSA of
the default provider set up the connection. For example, JIVE might use SURFnet
as default provider, and SUNET might use NORDUnet. This concept is similar to
that of a default route.

I suggest the following further restrictions:

Prefix matching is limited to one level. Topology ids should be constructed in
a way that they will not overlap, such that a topology id does not become a
prefix of another topology id. The main argument for this is simplicity.

Use the URI naming restriction suggested in the "Validating URNs in NML" email,
sent to the nsi-wg list on 28 Oct 2013, and further remove the year/date
constraint, as there is no practical way to enforce it.
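
The one-level restriction on prefix matching can be checked mechanically. The
sketch below is only an illustration under the assumption that ids are
compared as ':'-delimited strings; the helper name overlapping_pairs is
hypothetical.

```python
from itertools import permutations


def overlapping_pairs(topology_ids):
    """Return (prefix, longer) pairs where one topology id prefixes another."""
    return [
        (a, b)
        for a, b in permutations(topology_ids, 2)
        if b.startswith(a + ":")
    ]


ok = ["urn:ogf:network:nordu.net:europe", "urn:ogf:network:nordu.net:usa"]
bad = ["urn:ogf:network:nordu.net", "urn:ogf:network:nordu.net:usa"]
print(overlapping_pairs(ok))   # no overlaps
print(overlapping_pairs(bad))  # the first id is a prefix of the second
```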


* Reachability & Cost

Having created a way to match port ids against topology ids, we create a
mechanism for signalling topology reachability. Further, the notion of cost is
introduced as a way of choosing the "cheapest" path when multiple paths are
available. A lower cost value should be preferred over a higher one. Negative
and zero values are not allowed; this prevents negative-cost and same-cost
cycles in the routing graph. A cost value is an integer, and always greater
than zero. Cost is not necessarily linked to any monetary cost, but is intended
as a mechanism to indicate which links should be chosen over others. Also note
that cost is not necessarily symmetric.

Reachability and cost are exposed in the topology document, like this:

<nsi:NSA id="urn:ogf:network:nordu.net:nsa" ... >
    ...
    <snt:TopologyReachability>
        <nml:Topology id="urn:ogf:network:sunet.se:topology"  snt:cost="5" />
        <nml:Topology id="urn:ogf:network:deic.dk:topology"   snt:cost="10" />
    </snt:TopologyReachability>
    ...
    <nml:Topology id="urn:ogf:network:nordu.net:topology">
        ...
    </nml:Topology>
    ...
</nsi:NSA>

The NORDUnet and SURFnet NSAs peer, which means they have a demarcation link,
and that they allow each other's NSAs to create circuits (specific ports are
still subject to authZ policy). The SURFnet NSA retrieves the NORDUnet NSI
topology, and in turn announces reachability, adding a cost of 5:

<nsi:NSA id="urn:ogf:network:surfnet.nl:nsa" ... >
    ...
    <snt:TopologyReachability>
        <nml:Topology id="urn:ogf:network:nordu.net:topology"  snt:cost="5" />
        <nml:Topology id="urn:ogf:network:sunet.se:topology"   snt:cost="10" />
        <nml:Topology id="urn:ogf:network:deic.dk:topology"    snt:cost="15" />
    </snt:TopologyReachability>
    ...
</nsi:NSA>

This allows NSAs peering with SURFnet to discover reachability of the DeIC,
SUNET, and NORDUnet topologies. Note that cost is not necessarily symmetric,
as the NORDUnet NSA could use a different value for SURFnet, but it might be a
good idea to agree on a value when setting up a peering. If an NSA can reach a
topology through multiple paths, it should advertise the lowest-cost option,
and should prefer that path when setting up a connection.
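
The way the SURFnet announcement is derived from the NORDUnet one can be
sketched as follows. This assumes, as the example values suggest, that an NSA
adds its own peering cost to each cost its peer advertises and keeps the
lowest total per topology. The function name and the dictionary layout are
hypothetical and not part of the NSI-NML extension.

```python
def advertised_reachability(own_topology, link_costs, peer_adverts):
    """Build the {topology id: cost} table an NSA would advertise.

    link_costs:   {peer topology id: locally assigned cost of that peering}
    peer_adverts: {peer topology id: {topology id: cost advertised by peer}}
    """
    def check(cost):
        # Costs must be integers greater than zero.
        if not isinstance(cost, int) or cost <= 0:
            raise ValueError("cost must be a positive integer: %r" % (cost,))
        return cost

    table = {}
    for peer, link_cost in link_costs.items():
        check(link_cost)
        # The peer itself, plus everything the peer says it can reach.
        offers = [(peer, link_cost)]
        offers += [(topo, link_cost + check(cost))
                   for topo, cost in peer_adverts.get(peer, {}).items()]
        for topo, cost in offers:
            if topo == own_topology:
                continue  # do not advertise a route back to ourselves
            if cost < table.get(topo, float("inf")):
                table[topo] = cost  # keep only the lowest-cost option
    return table


NORDUNET = "urn:ogf:network:nordu.net:topology"
SUNET = "urn:ogf:network:sunet.se:topology"
DEIC = "urn:ogf:network:deic.dk:topology"

surfnet_table = advertised_reachability(
    "urn:ogf:network:surfnet.nl:topology",
    link_costs={NORDUNET: 5},
    peer_adverts={NORDUNET: {SUNET: 5, DEIC: 10}},
)
print(surfnet_table)
```

With these inputs the table contains NORDUnet at 5, SUNET at 10, and DeIC at
15, matching the SURFnet announcement above.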


* Path Finding Example

Consider the following topology (add proper figure sometime):

   SUNET
     |
  NORDUnet \
     |     GEANT
  SURFnet  /
     |
    JIVE

A user at JIVE wants to set up a link from SUNET to JIVE, and makes a request
to the local NSA at JIVE. The NSA at JIVE is (in this example) a relatively
simple one that does not concern itself with outside topology, and hence does
not know a path to SUNET. However, JIVE is only connected through SURFnet, so
the NSA reserves a local connection and forwards the request to the SURFnet NSA
(with a new port). This corresponds to a default route.

The SURFnet NSA peers with NORDUnet and GEANT, and can reach SUNET through both
of them. The connection with NORDUnet is a private link whereas GEANT is a
transit network. Hence SURFnet has most likely assigned a higher cost to the
GEANT topology, and will therefore choose the path through NORDUnet, as it has
the lower cost. The SURFnet NSA reserves a connection from the JIVE demarcation
port to a NORDUnet demarcation port, and forwards the request to the NORDUnet
NSA.

The NORDUnet NSA only has a single route to SUNET. It sets up the connection
from the SURFnet demarcation port to a SUNET demarcation port, and forwards the
request to the SUNET NSA. Finally, the SUNET NSA connects the NORDUnet
demarcation port to the endpoint.
After the completed reservation is relayed back to the JIVE NSA, the connection
reservation can be committed.
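
The hop-by-hop forwarding described above can be sketched as a small
simulation. Everything in the table below is an assumption made for
illustration: the JIVE topology id, the port name "ps", the cost of 30 for the
GEANT path, and the data layout itself. GEANT's own table is left out because
it is never selected in this example.

```python
SUNET_TOPO = "urn:ogf:network:sunet.se:topology"

# Hypothetical per-NSA state: own topology id, candidate (cost, next NSA)
# routes per reachable topology id, and an optional default provider.
NSAS = {
    "JIVE": {"local": "urn:ogf:network:jive.nl:topology",
             "routes": {}, "default": "SURFnet"},
    "SURFnet": {"local": "urn:ogf:network:surfnet.nl:topology",
                "routes": {SUNET_TOPO: [(10, "NORDUnet"), (30, "GEANT")]},
                "default": None},
    "NORDUnet": {"local": "urn:ogf:network:nordu.net:topology",
                 "routes": {SUNET_TOPO: [(5, "SUNET")]}, "default": None},
    "SUNET": {"local": SUNET_TOPO, "routes": {}, "default": None},
}


def chain(start, dest_port):
    """Return the list of NSAs a request for dest_port visits, in order."""
    path = [start]
    while True:
        nsa = NSAS[path[-1]]
        if dest_port.startswith(nsa["local"] + ":"):
            return path  # the port belongs to this NSA's own topology
        options = [hops for topo, hops in nsa["routes"].items()
                   if dest_port.startswith(topo + ":")]
        if options:
            _, nxt = min(options[0])  # lowest cost wins
        elif nsa["default"]:
            nxt = nsa["default"]      # behaves like a default route
        else:
            raise LookupError("no route from %s" % path[-1])
        if nxt in path:
            raise LookupError("forwarding loop via %s" % nxt)
        path.append(nxt)


print(chain("JIVE", SUNET_TOPO + ":ps"))
# ['JIVE', 'SURFnet', 'NORDUnet', 'SUNET']
```

A real NSA would also reserve the local segment at each hop and relay the
result back along the chain; the sketch only shows the forwarding decisions.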

Note that only the SUNET NSA knows whether the endpoint actually exists. The
other NSAs use topology reachability information to forward the request. Also
note that access to the SUNET endpoint is likely to require certain credentials
in the NSI header. These credentials are forwarded during path finding. In
general, a network will allow transit from a peer without any specific
credentials. This is covered in detail in the AAI document.

A potential downside of chaining is that the structure becomes less
transparent. Hence it is important that NSAs relay error information and its
source, so a user can figure out what went wrong in case of an error.


* Summary of Changes to the NSI-NML extension:

- Restrict topology ids and port ids, such that the topology id is always a
  prefix of the port id.

- Add topology reachability and cost elements.

- As a side issue, the NSA peering entry is now redundant and can be removed.

Further suggestions:

- Limit prefix matching to one level.

- Restrict URI naming as suggested in "Validating URNs in NML" email.
  Remove the year/date constraint in ORGID for URNs.


* Outstanding Issues (NML & Usability)

Note: Terminology is different for NSI and NML:

     NSI          NML
  network id = topology id
  local id   = port id

We did not touch this at the meeting, so we have no consensus about this, but
I'll try to present the problem. OpenNSA is used by several other sites, and
hence I have had the pleasure of explaining to them how to specify STPs, which
is currently significantly more complicated than it needs to be. For an STP,
one has to specify topology and port ids, which can be somewhat long, e.g.:

network id : urn:ogf:network:nordu.net:2013:topology
local id   : urn:ogf:network:nordu.net:2013:ps

The "urn:ogf:network:" prefix can be added automatically, so we are down to:

network id : nordu.net:2013:topology
local id   : nordu.net:2013:ps

With the prefix requirement it would be something like:

network id : nordu.net:2013:topology
local id   : nordu.net:2013:topology:ps

With prefix matching, the network id becomes more or less redundant; however,
it cannot be inferred from the local id as it stands. If we restrict the last
component of the local id to not contain ':', the network id can be inferred
from the local id (by splitting on the last ':'). This enables the user to
specify only "nordu.net:2013:topology:ps-in" and have the tool fill out the
network id. Further, the year can be dropped, as suggested earlier and on the
NML and NSI mailing lists. This reduces the user input to
"nordu.net:topology:ps".
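
A tool could perform this expansion roughly as sketched below. The function
name expand_stp is hypothetical, and the sketch assumes the restriction
proposed above, namely that the last component of the id contains no ':'.

```python
URN_PREFIX = "urn:ogf:network:"


def expand_stp(short_port):
    """Expand e.g. 'nordu.net:topology:ps' into (network id, local id) URNs."""
    # Split on the last ':'; everything before it is the topology part.
    network, sep, port = short_port.rpartition(":")
    if not sep or not network or not port:
        raise ValueError("expected '<topology>:<port>', got %r" % short_port)
    return URN_PREFIX + network, URN_PREFIX + short_port


network_id, local_id = expand_stp("nordu.net:topology:ps")
print(network_id)  # urn:ogf:network:nordu.net:topology
print(local_id)    # urn:ogf:network:nordu.net:topology:ps
```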