
On Sep 22, 2008, at 9:58 AM, Jeroen van der Ham wrote:
Hi Aaron,
Aaron Brown wrote:
Martin, Jeroen, Aaron, John, Victor, others: what do you think? For the class definitions themselves, I think it makes sense to use URIs a la namespaces so we could put some documentation at the specified URL.
Just to make this absolutely clear: by "URIs a la namespaces" you mean to use a URL, abbreviated as a namespace? (So "nml:Node" where the namespace nml has been defined previously). It's a little bit confusing because both URNs and URLs are URIs...
Correct, the URL style. I mistyped when I wrote it.
For the identifiers for individual instances, I think the URNs make more sense since it doesn't imply a specific method of access to get information about the element.
Now this I do not agree with, because it would mean that the OGF starts administrating its urn:ogf namespace, and handing out specific subsets to domains, with all associated registration and possible squatting problems. Domains already have a domain name, so why not use that?
We do use the domain name. The URN identifiers take the form:
urn:ogf:network:domain=glif.is:[domain-specific-chunk]
This does minor standardization to allow domain scoping while giving flexibility to the domain to define their own identifiers.
The instance identifiers are probably all out of scope for the NML. The choice of an identifier scheme is really about the lookup and distribution of the topologies. Since we're focused on just describing the topologies themselves, and not specifically concerned about distribution, we can probably just leave it at "use URIs for identifiers", and let the higher-level groups like the NMC or the DICE-WG or whoever, decide on the actual structure of the identifiers.
Cheers, Aaron

Aaron Brown wrote:
urn:ogf:network:domain=glif.is:[domain-specific-chunk]
This is an identifier of an *instance*, I was talking about identifier of *classes*. E.g. in NDL: "http://www.science.uva.nl/research/sne/ndl/#Device" is an identifier of a class, while "http://uva.netherlight.nl/#Lighthouse" is an identifier of an instance (of a Domain in this case).
Again, for class identifier, I don't have a preference for either URL or URN in some OGF namespace (urn:ogf:network, or http://ogf.org/ns/network, or similar).
As for instance identifiers, I have two major objections against "urn:ogf:network:domain=glif.is:3267"
1. This looks like a *query* to me, not an *identifier*. I associate "domain=glif.is:3267" with a SELECT clause. This confuses me. I really like queries though, but don't see how that will fit here. Imagine I want to query for this identifier. Should I write "WHERE identifier=urn:ogf:network:domain\=glif.is:3267"? Better is: "urn:ogf:network:domain:glif.is:3267"
2. I don't see the need for the prefix "urn:ogf:network". The actual identifying part is "glif.is:3267". The "urn:ogf:network:domain:" is only there to set the type. However, the type (network in this case) is probably clear from the context. Imagine that all DNS identifiers had to be prefixed with "dns:". So, we would have to type "protocol:http://dns:www.google.com.path:/" (or worse: "urn.ietf.url:protocol=http:dns=www.google.com.:path=/"). Better: remove the type information: "glif.is:3267" (see the resemblance with Internet2's GRI identifiers here?)
So the best choice in my view is to use "glif.is:3267" as an identifier. It is short. It is unique. It is transparent to a program (a unique string). It is easy to query ('WHERE domain STARTS WITH "glif.is:"'). It is human readable -- in short, it is all an identifier has to be. Why make it more complex?
Regards, Freek
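A minimal sketch of the "STARTS WITH" query on short identifiers of the form owner:local-part. The list of identifiers and the helper name owned_by are assumptions made for illustration; only "glif.is:3267" appears in the message above.

```python
# Hypothetical identifiers in the "owner:local-part" form.
identifiers = [
    "glif.is:3267",
    "glif.is:2678",
    "glif.island.example:1",   # shares the characters "glif.is" but has a different owner
    "internet2.edu:3267",
]


def owned_by(owner: str, ids: list[str]) -> list[str]:
    """Return the identifiers whose prefix is exactly `owner` followed by ':'."""
    prefix = owner + ":"       # the colon keeps "glif.island.example" from matching
    return [i for i in ids if i.startswith(prefix)]


glif_ids = owned_by("glif.is", identifiers)
print(glif_ids)
```

Including the trailing colon in the prefix is what makes the match unambiguous; a bare "glif.is" prefix would also select the third entry.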

On Sep 23, 2008, at 4:41 PM, Freek Dijkstra wrote:
Aaron Brown wrote:
urn:ogf:network:domain=glif.is:[domain-specific-chunk]
This is an identifier of an *instance*, I was talking about identifier of *classes*.
Right. I thought Jeroen was talking about instances in what I was responding to.
Again, for class identifier, I don't have a preference for either URL or URN in some OGF namespace (urn:ogf:network, or http://ogf.org/ns/network, or similar).
Nor I.
As for instance identifiers, I have two major objections against "urn:ogf:network:domain=glif.is:3267"
1. This looks like a *query* to me, not an *identifier*. I associate "domain=glif.is:3267" with a SELECT clause. This confuses me. I really like queries though, but don't see how that will fit here. Imagine I want to query for this identifier. Should I write "WHERE identifier=urn:ogf:network:domain\=glif.is:3267"? Better is: "urn:ogf:network:domain:glif.is:3267"
2. I don't see the need for the prefix "urn:ogf:network". The actual identifying part is "glif.is:3267". The "urn:ogf:network:domain:" is only there to set the type. However, the type (network in this case) is probably clear from the context. Imagine that all DNS identifiers had to be prefixed with "dns:". So, we would have to type "protocol:http://dns:www.google.com.path:/" (or worse: "urn.ietf.url:protocol=http:dns=www.google.com.:path=/"). Better: remove the type information: "glif.is:3267" (see the resemblance with Internet2's GRI identifiers here?)
So the best choice in my view is to use "glif.is:3267" as an identifier. It is short. It is unique. It is transparent to a program (a unique string). It is easy to query ('WHERE domain STARTS WITH "glif.is:"'). It is human readable -- in short, it is all an identifier has to be. Why make it more complex?
There are reasons for a bulkier, less context-sensitive identifier scheme, but I'm not sure the NML list is the right place to hash this out since the identifier schemes are relevant more for lookup and distribution than basic description. For the sake of NML, I'd prefer to leave it at "identifiers are globally unique strings".
Cheers, Aaron

Hi all,
Again, for class identifier, I don't have a preference for either URL or URN in some OGF namespace (urn:ogf:network, or http://ogf.org/ns/network, or similar).
I agree with URLs to identify classes. As Aaron said, we'd only been thinking of the URN style as instance identifiers.
As for instance identifiers, I have two major objections against "urn:ogf:network:domain=glif.is:3267" [...] So the best choice in my view is to use "glif.is:3267" as an identifier. It is short. It is unique. It is transparent to a program (a unique string). It is easy to query ('WHERE domain STARTS WITH "glif.is:"'). It is human readable -- in short, it is all an identifier has to be. Why make it more complex?
The real reason to type it is so you can refer to different types. In the GLIF proposal, the thing after the domain (which you didn't include in your example, but which is there if I'm reading it right) is a circuit identifier, and only that. With types, an edge can refer to a distant node using as much information as it has. One side might only know the domain of the other side, or it might know the node and port IDs. That was one of the use cases that motivated the attribute/value style in the URN.
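A rough sketch of that use case: a reference that carries only some attributes can still be matched against a fully specified identifier. The colon-separated key=value layout follows the "urn:ogf:network:domain=glif.is:..." form quoted in this thread; the attribute names node and port, the value eth0, and both function names are assumptions for illustration, and values are assumed to contain neither ':' nor '='.

```python
PREFIX = "urn:ogf:network:"


def parse_urn(urn: str) -> dict[str, str]:
    """Split an attribute/value URN into a dict of its key=value parts."""
    if not urn.startswith(PREFIX):
        raise ValueError(f"not a urn:ogf:network identifier: {urn!r}")
    fields: dict[str, str] = {}
    for part in urn[len(PREFIX):].split(":"):
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {part!r}")
        fields[key] = value
    return fields


def refers_to(partial: str, full: str) -> bool:
    """True if every attribute in `partial` appears in `full` with the same value."""
    wanted, have = parse_urn(partial), parse_urn(full)
    return all(have.get(key) == value for key, value in wanted.items())


port = "urn:ogf:network:domain=glif.is:node=rembrandt:port=eth0"
print(refers_to("urn:ogf:network:domain=glif.is", port))
print(refers_to("urn:ogf:network:domain=glif.is:node=rembrandt", port))
print(refers_to("urn:ogf:network:domain=glif.is:node=vermeer", port))
```

The first two references match because they state less than the full identifier but nothing that contradicts it; the third names a different node and does not match.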
There are reasons for a bulkier, less context-sensitive identifier scheme, but I'm not sure the NML list is the right place to hash this out since the identifier schemes are relevant more for lookup and distribution than basic description. For the sake of NML, I'd prefer to leave it at "identifiers are globally unique strings".
I think that it is in scope for NML. We might end up with more than one way to do it, but I think that leaving it as just a string limits what we can do. best, martin

On Sep 23, 2008, at 4:11 PM, Martin Swany wrote:
There are reasons for a bulkier, less context-sensitive identifier scheme, but I'm not sure the NML list is the right place to hash this out since the identifier schemes are relevant more for lookup and distribution than basic description. For the sake of NML, I'd prefer to leave it at "identifiers are globally unique strings".
I think that it is in scope for NML. We might end up with more than one way to do it, but I think that leaving it as just a string limits what we can do.
First, I'm against coming up with multiple ways to do it, unless the multiple ways are based on completely different uses of the schema that will never interact. (Or unless we absolutely can't agree - which would seem a shame.) Let's not make interoperability any harder than we have to.
Second, I agree that it is in-scope for NML. However, it is just as in-scope for the other groups Aaron mentioned. At a bare minimum the other groups Aaron mentioned (and probably others) should be invited in on the discussion.
jeff

On Sep 23, 2008, at 6:47 PM, Jeff W. Boote wrote:
First, I'm against coming up with multiple ways to do it, unless the multiple ways are based on completely different uses of the schema that will never interact. (Or unless we absolutely can't agree - which would seem a shame.) Let's not make interoperability any harder than we have to.
You're presuming a lot regarding our ability to agree. If we need to have one way to do it for NDL and GLIF compatibility and one way for perfSONAR and DCN, and there are ways to tell them apart and to translate between them, if necessary, then I don't see the harm. Agreeing on a small set is better than no agreement at all.
Second, I agree that it is in-scope for NML. However, it is just as in-scope for the other groups Aaron mentioned.
At a bare minimum the other groups Aaron mentioned (and probably others) should be invited in on the discussion.
They've been invited to the discussion as far as I know! If they haven't been, then let's invite them! martin

Aaron Brown wrote:
There are reasons for a bulkier, less context-sensitive identifier scheme, but I'm not sure the NML list is the right place to hash this out since the identifier schemes are relevant more for lookup and distribution than basic description.
What do you mean by distribution in this case?
For the sake of NML, I'd prefer to leave it at "identifiers are globally unique strings".
I completely agree with that. I've tried to hammer this point down at a GLIF meeting last year as well. I am all for leaving the form of the identifiers up to the people creating them. There are lots and lots of ways to create a globally unique identifier and everybody has their preference. I really don't care what they choose as long as it is *globally unique*. But, I do have to add one restriction to that clause. The globally unique string should be just an identifier, nothing more. That means no implicit type information, no implicit location information, no implicit source information, nothing. The globally unique string must identify a resource, about which more things can be stated using NML. Jeroen. -- My email address has changed to <vdham@uva.nl> (The science has disappeared from my address, but I'm still doing it)

On Sep 24, 2008, at 2:08 AM, Jeroen van der Ham wrote:
Aaron Brown wrote:
There are reasons for a bulkier, less context-sensitive identifier scheme, but I'm not sure the NML list is the right place to hash this out since the identifier schemes are relevant more for lookup and distribution than basic description.
What do you mean by distribution in this case?
If this is going to be used in a global context, and not just from a local one - then the information that it 'exists' may need to be propagated to other places. (Consider path finding.)
For the sake of NML, I'd prefer to leave it at "identifiers are globally unique strings".
I completely agree with that. I've tried to hammer this point down at a GLIF meeting last year as well. I am all for leaving the form of the identifiers up to the people creating them. There are lots and lots of ways to create a globally unique identifier and everybody has their preference. I really don't care what they choose as long as it is *globally unique*.
They become much more useful if everyone is using the same algorithms to produce them.
But, I do have to add one restriction to that clause. The globally unique string should be just an identifier, nothing more. That means no implicit type information, no implicit location information, no implicit source information, nothing. The globally unique string must identify a resource, about which more things can be stated using NML.
Here I disagree 100%. That is like saying that FQDN's should not have any structure or implicit type information etc... If a circuit exists, but no one can find it, is it useful? Obviously you could come up with a central repository of all circuit id's and relate them to location information. But, that has pretty annoying scaling issues. Especially, when we have a fairly clear topological model available for how to distribute this information. jeff

Jeff W. Boote wrote:
Here I disagree 100%. That is like saying that FQDN's should not have any structure or implicit type information etc...
Ah! So what actually underlies this whole discussion is the issue of naming and addressing. Please, take the time to read: http://ana-3.lcs.mit.edu/~jnc/tech/ien/ien19.txt The important point there is:
The 'name' of a resource indicates *what* we seek,
an 'address' indicates *where* it is, and
a 'route' tells us *how to get there*.
But to come back to FQDNs, yes, they do have a very slight amount of implicit information. They contain exactly enough information so that, given a root server, you can resolve the *address* that is associated with that. Using the address, you can then figure out a route to it, and actually get some useful information. Now, with FQDNs there is no choice, the lookup information has to be contained within that same term. I argue that in NML there is no need for this. The lookup part, or any other kind of implicit information, can be given using a separate property. The thing is, you are not going to see NML identifiers being handled out of context. Computer programs are going to need the context to make sense of it all. Jeroen.
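One way to picture this position is an opaque identifier with the lookup location carried as an ordinary property next to it. This is only a sketch: the Resource class, the property names, and the use of uuid4 to mint the string are assumptions, not anything NML defines; the two URLs are the ones used as examples elsewhere in this thread.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Resource:
    identifier: str                                   # opaque: no type, owner, or location inside
    properties: dict[str, str] = field(default_factory=dict)


def new_resource(**properties: str) -> Resource:
    # uuid4 is just one of many ways to mint a practically unique string.
    return Resource(identifier=uuid.uuid4().hex, properties=dict(properties))


path = new_resource(
    type="path",
    owner="glif.is",
    lookup="https://idc.internet2.edu/ws/status.cgi",
)

# Moving the lookup service changes a property; the identifier is untouched.
old_identifier = path.identifier
path.properties["lookup"] = "http://glif.is/ws/status.cgi"
print(path.identifier == old_identifier)
```

Everything a program needs to interpret the identifier travels in the properties, which is the "context" the message refers to.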

On Sep 24, 2008, at 11:00 AM, Jeroen van der Ham wrote:
Jeff W. Boote wrote:
Here I disagree 100%. That is like saying that FQDN's should not have any structure or implicit type information etc...
Ah! So what actually underlies this whole discussion is the issue of naming and addressing. Please, take the time to read: http://ana-3.lcs.mit.edu/~jnc/tech/ien/ien19.txt
Not at all. I think you misunderstand my point. I do not for a moment want to combine names and addresses. That is a useful and needed layer of indirection that I want to make use of. From example #1 in that document - the first step is to look up the name in the phone book. My point is that I don't want a single phone book because that does not scale well and does not allow each 'publisher' the ability to control who they share addresses with. Therefore, the name needs to have enough information to direct you to the correct phone book. This is not about finding the address (yet), it is about finding the correct phone book.
The important point there is:
The 'name' of a resource indicates *what* we seek, an 'address' indicates *where* it is, and a 'route' tells us *how to get there*.
But to come back to FQDNs, yes, they do have a very slight amount of implicit information. They contain exactly enough information so that, given a root server, you can resolve the *address* that is associated with that. Using the address, you can then figure out a route to it, and actually get some useful information.
Now, with FQDNs there is no choice, the lookup information has to be contained within that same term. I argue that in NML there is no need for this. The lookup part, or any other kind of implicit information, can be given using a separate property. The thing is, you are not going to see NML identifiers being handled out of context. Computer programs are going to need the context to make sense of it all.
With FQDNs, the important thing to find out is the authoritative-NS. That is what the structure of the names gives you. Then you can query to find the address. (I ignore recursive queries in this because what is really important is who controls the information, not specifically where the client actually queries.) The kind of structure I'm looking for in the names (circuit identifiers) is the ability to find the correct publisher of the information. The ability to know *where* to look up the address.*
We can of course create a global directory (in fact, we are). However, without some structure in the names, you lose the ability to allow specific institutions to more tightly control the flow of information. For example, with no structure in the names you either need a DHT or a single global directory. This means if you want to publicize the name using this structure, you have no control over *where* that information flows to. However, if you add structure to the name, you can say: this name was allocated by entity X and you have to go to the directory that holds the name<->address mappings for entity X. This allows entity X to implement policy controls on queries for that information (name/address).
Another possibility here is of course to add *another* layer of indirection. Basically use a DHT to publish the phone book for each identifier. But, that is adding a few too many RTTs and infrastructure for my liking.
At the end of the day, this should all be about coming up with the best set of engineering trade-offs.
jeff
* We will also need to make reverse mapping queries as well: given the full definition of a circuit, come up with the identifier. Because of this, it is likely that some of the same structure that defines the circuit can be useful in defining the identifier. i.e. We will want to point queries at the same phone book for forward queries as reverse ones since the publisher is the same entity. Just because some of the structure looks similar doesn't mean we are not maintaining the distinction between name and address.
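A small sketch of the argument above: if the allocating entity can be read from the name, the query can be sent to that entity's own directory, which is then free to apply its own policy. The Directory class, the allow-list policy, the "owner:local-part" name form, and the address value are all assumptions chosen to keep the example short.

```python
class Directory:
    """A publisher-run phone book with a simple allow-list policy (hypothetical)."""

    def __init__(self, entries: dict[str, str], allowed: set[str]):
        self._entries = entries      # name -> address
        self._allowed = allowed      # requesters this publisher is willing to answer

    def lookup(self, name: str, requester: str) -> str:
        if requester not in self._allowed:
            raise PermissionError(f"{requester} may not query this directory")
        return self._entries[name]


# One directory per allocating entity, instead of a single global phone book.
directories = {
    "glif.is": Directory({"glif.is:2678": "address-A"}, allowed={"internet2.edu"}),
}


def resolve(name: str, requester: str) -> str:
    owner, _, _ = name.partition(":")    # the structure in the name selects the phone book
    return directories[owner].lookup(name, requester)


print(resolve("glif.is:2678", "internet2.edu"))
```

The name still does not contain the address; it only says which publisher to ask, and that publisher decides who gets an answer.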

Jeff W. Boote wrote:
On Sep 24, 2008, at 11:00 AM, Jeroen van der Ham wrote:
Jeff W. Boote wrote:
Here I disagree 100%. That is like saying that FQDN's should not have any structure or implicit type information etc...
Ah! So what actually underlies this whole discussion is the issue of naming and addressing. Please, take the time to read: http://ana-3.lcs.mit.edu/~jnc/tech/ien/ien19.txt
Not at all. I think you misunderstand my point. I do not for a moment want to combine names and addresses. That is a useful and needed layer of indirection that I want to make use of.
From example #1 in that document - the first step is to look up the name in the phone book. My point is that I don't want a single phone book because that does not scale well and does not allow each 'publisher' the ability to control who they share addresses with. Therefore, the name needs to have enough information to direct you to the correct phone book. This is not about finding the address (yet), it is about finding the correct phone book.
Okay, I agree that finding the correct phone book is the next step once you have an identifier. However, unlike FQDNs, we can give the location of the phonebook along with the identifier that we're sending. There is no necessity to encode that information in the identifier itself. Note that this also allows you to have two completely different identifiers, each defined by their own domain, yet pointing to the same thing (and they'll probably have an equality relation to each other as well). Jeroen.

On Sep 24, 2008, at 3:45 PM, Jeroen van der Ham wrote:
Jeff W. Boote wrote:
On Sep 24, 2008, at 11:00 AM, Jeroen van der Ham wrote:
Jeff W. Boote wrote:
Here I disagree 100%. That is like saying that FQDN's should not have any structure or implicit type information etc...
Ah! So what actually underlies this whole discussion is the issue of naming and addressing. Please, take the time to read: http://ana-3.lcs.mit.edu/~jnc/tech/ien/ien19.txt
Not at all. I think you misunderstand my point. I do not for a moment want to combine names and addresses. That is a useful and needed layer of indirection that I want to make use of.
From example #1 in that document - the first step is to look up the name in the phone book. My point is that I don't want a single phone book because that does not scale well and does not allow each 'publisher' the ability to control who they share addresses with. Therefore, the name needs to have enough information to direct you to the correct phone book. This is not about finding the address (yet), it is about finding the correct phone book.
Okay, I agree that finding the correct phone book is the next step once you have an identifier. However, unlike FQDNs, we can give the location of the phonebook along with the identifier that we're sending. There is no necessity to encode that information in the identifier itself.
Note that this also allows you to have two completely different identifiers, each defined by their own domain, yet pointing to the same thing (and they'll probably have an equality relation to each other as well).
This would in effect be adding that additional level of indirection I was trying to avoid. (Where do you find these equality relationships?) I agree this can work. But, I'm curious what functionality are you trying to preserve by 'not' including some structure in the identifier? Just the ability of individual entities to define the identifiers in their own way? Can you tell me why this is important? As I said in the previous message, I think this is about making appropriate engineering trade-offs. For a global distributed directory, I think performance would be better if locally defined names were mapped to a global name with this 'context' before using it in the distributed system. This method would also preserve local names since you only need to convert to the global identifier when interacting with the 'global' infrastructure. But, I recognize that there may be other constraints on these identifiers in other spaces. So, I ask - what are those *engineering* constraints? jeff

Jeff W. Boote wrote:
But, I recognize that there may be other constraints on these identifiers in other spaces. So, I ask - what are those *engineering* constraints?
If the owner is part of the identifier (as I just suggested in my previous mail; e.g. identifier=glif.is:2678), and there is another mapping from owner (glif.is) to phonebook (e.g. https://idc.internet2.edu/ws/status.cgi), then each domain can only have one phone book. In Jeroen's proposal, each identifier is just a string, and there is a direct mapping from identifier to each phone book, so without the intermediate mapping of the identifier to owner to phonebook. The advantage is that a domain can have two phonebooks. E.g. the phonebook for glif.is:2678 may be https://idc.internet2.edu/ws/status.cgi, while the phonebook for glif.is:2679 may be http://glif.is/ws/status.cgi. So Jeroen's proposal is even more flexible. (Jeroen, correct me if I'm wrong) Regards, Freek
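The two mappings compared above can be sketched as two lookup tables. The identifiers and URLs are the ones from this message; the table and function names are assumptions for illustration.

```python
# Owner in the identifier, then owner -> phonebook: one phonebook per domain.
owner_phonebook = {
    "glif.is": "https://idc.internet2.edu/ws/status.cgi",
}


def phonebook_via_owner(identifier: str) -> str:
    owner = identifier.split(":", 1)[0]
    return owner_phonebook[owner]


# Identifier -> phonebook directly: a domain may spread over several phonebooks.
identifier_phonebook = {
    "glif.is:2678": "https://idc.internet2.edu/ws/status.cgi",
    "glif.is:2679": "http://glif.is/ws/status.cgi",
}


def phonebook_direct(identifier: str) -> str:
    return identifier_phonebook[identifier]


for ident in ("glif.is:2678", "glif.is:2679"):
    print(ident, phonebook_via_owner(ident), phonebook_direct(ident))
```

The direct table is more flexible but needs one entry per identifier, whereas the owner table needs one entry per domain; that is the trade-off the rest of the thread argues over.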

Hi Freek,
If the owner is part of the identifier (as I just suggested in my previous mail; e.g. identifier=glif.is:2678), and there is another mapping from owner (glif.is) to phonebook (e.g. https://idc.internet2.edu/ws/status.cgi), then each domain can only have one phone book.
I think that in both cases (domain + opaque local ID and attribute/value pairs including domain=) the owner is part of the ID. That seems to be necessary to avoid another level of indirection.
In Jeroen's proposal, each identifier is just a string, and there is a direct mapping from identifier to each phone book, so without the intermediate mapping of the identifier to owner to phonebook. The advantage is that a domain can have two phonebooks. E.g. the phonebook for glif.is:2678 may be https://idc.internet2.edu/ws/status.cgi, while the phonebook for glif.is:2679 may be http://glif.is/ws/status.cgi.
Something has to know that information for all of glif.is, right?
So Jeroen's proposal is even more flexible.
I don't agree that untyped, flat identifiers are more flexible. In our scheme, the opaque version could be:
domain=glif.is, id=2678
But one could also have
domain=glif.is, gole=foo
domain=glif.is, node=rembrandt
domain=glif, subdomain=something
The attributed syntax allows you more flexibility to encode semantic info.
martin

Martin Swany wrote:
I don't agree that untyped, flat identifiers are more flexible. In our scheme, the opaque version could be:
domain=glif.is, id=2678
But one could also have
domain=glif.is, gole=foo
domain=glif.is, node=rembrandt
domain=glif, subdomain=something
The attributed syntax allows you more flexibility to encode semantic info.
No, it is less flexible. Consider:
path identifier glif.is:2678
glif.is:2678 node rembrandt
(Two statements)
And:
path identifier "domain=glif.is, node=rembrandt"
In the second case, if the node changes (while the path remains), the identifier has to change as well. In the first case, the identifier can remain the same, while a property of the path changes. Therefore, the first solution is more flexible.
Regards, Freek
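The "two statements" case can be sketched as a tiny set of (subject, predicate, value) statements, where changing the node replaces one statement and leaves the identifier alone. The tuple layout, the helper name set_property, and the replacement node name vermeer are assumptions for illustration; this shows only the argument made in this message, not how path identifiers are built in the attribute/value scheme.

```python
# Statements about the path; the identifier itself carries no node information.
statements = {
    ("glif.is:2678", "type", "path"),
    ("glif.is:2678", "node", "rembrandt"),
}


def set_property(stmts: set, subject: str, predicate: str, value: str) -> set:
    """Return a copy where (subject, predicate, *) is replaced by the new value."""
    kept = {s for s in stmts if (s[0], s[1]) != (subject, predicate)}
    kept.add((subject, predicate, value))
    return kept


updated = set_property(statements, "glif.is:2678", "node", "vermeer")
print(sorted(updated))

# By contrast, an identifier that embeds the node is a different string afterwards.
embedded_before = "domain=glif.is, node=rembrandt"
embedded_after = "domain=glif.is, node=vermeer"
print(embedded_before == embedded_after)
```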

On Sep 25, 2008, at 9:14 AM, Freek Dijkstra wrote:
Consider:
path identifier glif.is:2678
glif.is:2678 node rembrandt
(Two statements)
And:
path identifier "domain=glif.is, node=rembrandt"
In the second case, if the node changes (while the path remains), the identifier has to change as well. In the first case, the identifier can remain the same, while a property of the path changes.
Therefore, the first solution is more flexible.
A path doesn't actually have node= in its ID in the current scheme we're using. It would just be domain=X, path=Y and it would not need to change. martin

This is a good discussion.
Jeff W. Boote wrote:
My point is that I don't want a single phone book because that does not scale well and does not allow each 'publisher' the ability to control who they share addresses with.
Fascinating that we both start from the same premise (do not create a global phonebook), yet end up with different rules for identifiers.
Therefore, the name needs to have enough information to direct you to the correct phone book. This is not about finding the address (yet), it is about finding the correct phone book.
So besides the unique identifier itself, we need information about where to get more information about what the identifier means, who owns it, and where some sort of status information about this thing can be retrieved. Which of this additional information should be part of the identifier, and which can be sent along with that information (in the same message, but in a different field, and not formally part of the identifier, as Jeroen suggested)?
1) type information (is it a link, or a path, or a domain?)
2) owner name? (the domain prefix)
3) publisher name? (is that the same as the owner?) (*)
4) location of the phonebook (http://glif.is/webservice/status.cgi?)
5) type of interface for the phonebook (webservice?)
Surely, we want to avoid an identifier which includes all of these, like
"urn:ogf:network:identifier:type=path:domain=glif.is:sequence=2678:statustype=webservice:statusurl=https://idc.internet2.edu/ws/status.cgi"
If I would follow the (otherwise excellent) article Jeroen sent, I would have an identifier like "JtsGuE5NBhHmlj6LuhC4" which requires messages like
<message><path><identifier>JtsGuE5NBhHmlj6LuhC4</identifier><domain>glif.is</domain><status><type>webservice</type><url>https://idc.internet2.edu/ws/status.cgi</url></status></path></message>
or at least
<message><path><identifier>JtsGuE5NBhHmlj6LuhC4</identifier><domain>glif.is</domain></path></message>
In my view, a good compromise would be to include the owner, but not the location of the phone book in an identifier. And nothing else either. Thus the identifier would be "glif.is:2678", which signifies "glif.is" as the owner. Then there is an indirection saying that glif.is has a lookup service of type webservice, located at https://idc.internet2.edu/ws/status.cgi. I really like this indirection to be explicit (not derived from the syntax of the owner).
This additional information can either be given along with the identifier (eliminating an additional RTT) or it can be retrieved once and cached. The important thing is that both are possible without changing the anatomy of the identifier.
Regards, Freek
(*) I always like to make a distinction between whoever "owns" a policy, the policy maker, and the "operator", the policy implementer. I'm not sure if it is relevant here, but I can imagine that the owner delegates the information publishing service to someone else.
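A sketch of that last point: the lookup-service location can arrive as a hint next to the identifier, or be fetched once per owner and cached, and the identifier has the same shape either way. The ServiceResolver class, its method names, and the in-memory registry standing in for a network lookup are all assumptions; whether a supplied hint should be trusted is left out of scope here.

```python
from typing import Callable, Optional


class ServiceResolver:
    """Maps an owner (the part before ':') to its lookup-service URL."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch                  # stands in for a remote query (one RTT)
        self._cache: dict[str, str] = {}
        self.fetches = 0                     # counts how often the remote query ran

    def service_for(self, identifier: str, hint: Optional[str] = None) -> str:
        owner = identifier.split(":", 1)[0]
        if hint is not None:
            self._cache[owner] = hint        # location sent along with the identifier
        elif owner not in self._cache:
            self.fetches += 1
            self._cache[owner] = self._fetch(owner)
        return self._cache[owner]


# Hypothetical registry of owner -> lookup service, using the URL from the thread.
registry = {"glif.is": "https://idc.internet2.edu/ws/status.cgi"}
resolver = ServiceResolver(registry.__getitem__)

print(resolver.service_for("glif.is:2678"))   # fetched once
print(resolver.service_for("glif.is:2679"))   # served from the cache
print(resolver.fetches)
```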
participants (5): Aaron Brown, Freek Dijkstra, Jeff W. Boote, Jeroen van der Ham, Martin Swany