Re: [Nml-wg] Identifiers

8 Nov 2010

      Freek/All;

On 11/7/10 2:16 PM, Freek Dijkstra wrote:
...
Hi all,
After writing our ideas on identifiers down, I still have five smaller
questions.
Quotes are from the meeting notes
(http://forge.gridforum.org/sf/docman/do/downloadDocument/projects.nml-wg/doc...):
...
Rough consensus on:
- http://schemas.ogf.org/nml/base/2013/10/ (Jason's proposal)
Question 1. Should the schema end with a / or #?
a) http://schemas.ogf.org/nml/base/2013/10   (common for XML)
b) http://schemas.ogf.org/nml/base/2013/10/  (current proposal)
c) http://schemas.ogf.org/nml/base/2013/10#  (common for RDF)
For XML I don't think it makes any difference; for RDF, I think it
should be b or c. (We may decide on a different namespace for XML and
RDF, but I propose not to do that unless there are compelling reasons to
do so).
b) is what we use currently in the perfSONAR/NMC world.  Ex:

https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-LookupService/...
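
Why the trailing character matters for RDF can be sketched in a few lines. RDF tools usually form a term URI by plain concatenation of the namespace and a local name, so option a) runs the two together. The local name "Link" below is a hypothetical example, not a name taken from the schema.

```python
# Candidate namespace URIs from Question 1.
CANDIDATES = {
    "a": "http://schemas.ogf.org/nml/base/2013/10",
    "b": "http://schemas.ogf.org/nml/base/2013/10/",
    "c": "http://schemas.ogf.org/nml/base/2013/10#",
}


def term_uri(namespace: str, local_name: str) -> str:
    """Form a term URI by plain concatenation, as RDF prefix expansion does."""
    return namespace + local_name


def has_separator(namespace: str) -> bool:
    """True if the namespace ends in a character that separates the local name."""
    return namespace.endswith(("/", "#"))


for option, namespace in CANDIDATES.items():
    # "Link" is an illustrative local name only.
    print(option, term_uri(namespace, "Link"), has_separator(namespace))
```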
...
Further recap from the meeting notes:
...
In Catania NML decided on Instance identifiers format: urn:ogf:network:<domain part>:<local part>
<local>  is opaque only processed by end parts
GLIF also agreed to use this format.
Richard & Freek did put together a doc for the IETF RFC to define the URI
Freek has translated to xml but he needs to consult Joel on web site details
Case insensitive:
RFC says have to specify case sensitive/insensitive
So need to define urn:ogf:  at OGF level
Then :network: and the rest case insensitive.
i.e. have to define the lexical equivalence.
Rough consensus on:
- Different objects eg link and port MUST have different identifiers
- instance identifiers are case insensitive
- instance identifiers are non-international (thus an URI instead of IRI)
- URI are not restricted by length, other than possible restrictions
   by RFC 2141 (the current GLIF recommendation is max 48 to 80 bytes)
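
A minimal sketch of what the identifier format and the case-insensitive consensus could mean in practice. The function names are hypothetical, and folding the whole identifier to lower case is an assumption based on the rough consensus above; the final lexical-equivalence rule may differ.

```python
from typing import Tuple

PREFIX = "urn:ogf:network:"


def split_identifier(identifier: str) -> Tuple[str, str]:
    """Split urn:ogf:network:<domain part>:<local part> into its two parts."""
    if not identifier.lower().startswith(PREFIX):
        raise ValueError("not a urn:ogf:network identifier: " + identifier)
    rest = identifier[len(PREFIX):]
    # The first colon ends the domain part; the local part is opaque
    # and may itself contain further colons.
    domain, separator, local = rest.partition(":")
    if not domain or not separator or not local:
        raise ValueError("missing domain or local part: " + identifier)
    return domain, local


def equivalent(a: str, b: str) -> bool:
    """Compare two identifiers case-insensitively (assumed equivalence rule)."""
    return a.lower() == b.lower()


print(split_identifier("urn:ogf:network:es.net:link_A_to_B"))
print(equivalent("URN:OGF:Network:ES.net:Link_A_to_B",
                 "urn:ogf:network:es.net:link_a_to_b"))
```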
I forgot to mention in the notes that we had a discussion on how to refer to
identifiers.
(see slides 14-18 in http://forge.gridforum.org/sf/go/doc16081)
- RDF uses the attributes rdf:about and rdf:resource
- NM-WG uses the attributes id and idref
- The BUILT-IN XML ID and IDREF attributes can not be used,
   since they only work within a document.
We had a discussion on whether we should re-use the id and idref from the NM
working group (formally: re-use the attributes in the
http://ggf.org/ns/nmwg/base/2.0/ namespace) or redefine these
attributes.
I forgot what the consensus was.
Question 2. What attributes to use for references in XML?
a) existing id and idref in NM-WG namespace
b) redefine id and idref in NML namespace
c) create dedicated namespace for just id and idref
As I stated in person (but will restate for this list) it's uncommon to 
try and associate attributes with a specific namespace other than what 
is associated with the parent element.  E.g.:

   <ns:element attribute="something" />

Implies that 'attribute' is in the 'ns' namespace.  It is uncommon to 
see this:

   <ns:element ns2:attribute="something" />

But it is possible.

I think b) makes the most sense; we do this in NM/NMC now.
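
How a namespace-aware parser reports the two forms can be checked with Python's standard-library ElementTree. The example.org namespace URIs are placeholders. Note that the parser reports the unprefixed attribute without a namespace URI of its own, so tying it to the parent element's namespace is a matter of convention, not something the parser records.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are placeholders for illustration only.
DOC = """
<root xmlns:ns="http://example.org/ns" xmlns:ns2="http://example.org/ns2">
  <ns:element attribute="something"/>
  <ns:element ns2:attribute="something"/>
</root>
"""

root = ET.fromstring(DOC)
for element in root:
    # ElementTree reports qualified names in {namespace-uri}local form;
    # an unprefixed attribute appears as a bare local name.
    print(element.tag, sorted(element.attrib))
```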
...
We decided on the urn:ogf:network:example.net:opaque-identifier syntax.
We have not yet defined what characters should be allowed in the opaque
identifier part. We have the following options:
Allowed characters:
GLIF:       A-Z a-z 0-9 - .
RFC2141:    A-Z a-z 0-9 - . _ ( ) + , : = @ ; $ ! * ' %hex
pchar:      A-Z a-z 0-9 - . _ ~ ( ) + , : = @ ; $ ! * ' & %hex
unreserved: A-Z a-z 0-9 - . _ ~
where %hex is a percentage-encoding. E.g. %2E.
- unreserved and pchar are definitions from RFC 3986, which defines URIs
- GLIF is what is defined in the GLIF working group. This is extremely
limited (: and _ are not allowed).
- RFC 2141 is what is currently allowed in a URN. (this list excludes 4
"reserved" characters which are in the definition for future use.)
- RFC 2141 is currently being revised. It is very likely that & and ~
will be allowed, making the definition equal to that of pchar.
- unreserved is similar to the current GLIF list.
- Note that the following characters are NEVER allowed:
   % / ? # [ ] \ " < > ^ ` { | }
Question 3. What characters are allowed in <opaque string>?
a) GLIF:       A-Z a-z 0-9 - .
b) unreserved: A-Z a-z 0-9 - . _ ~
c) RFC2141:    A-Z a-z 0-9 - . _ ( ) + , : = @ ; $ ! * ' %hex
d) pchar:      A-Z a-z 0-9 - . _ ~ ( ) + , : = @ ; $ ! * ' & %hex
I believe we should use the approach that is going to be supported the 
most widely, in parsing tools/libraries and what is most closely matched 
to GLIF and other standards bodies.
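
To make the four options easier to compare, here is a rough sketch that encodes each character set from the list above as a regular expression and reports which options accept a given opaque string. The helper names are hypothetical, and the patterns only mirror the lists as written in this mail, not the full grammar of any RFC.

```python
import re

HEX = "%[0-9A-Fa-f]{2}"           # one percent-encoded octet, e.g. %2E
RFC2141_EXTRA = "-._()+,:=@;$!*'"  # punctuation listed for RFC 2141 above


def char_class(extra: str) -> str:
    """Character class for letters, digits, and the given punctuation."""
    return "[A-Za-z0-9" + re.escape(extra) + "]"


OPTIONS = {
    "a": re.compile(f"(?:{char_class('-.')})+"),                            # GLIF
    "b": re.compile(f"(?:{char_class('-._~')})+"),                          # unreserved
    "c": re.compile(f"(?:{char_class(RFC2141_EXTRA)}|{HEX})+"),             # RFC 2141
    "d": re.compile(f"(?:{char_class(RFC2141_EXTRA + '~&')}|{HEX})+"),      # pchar
}


def allowed_by(opaque: str) -> list:
    """Return the option letters whose character set accepts the whole string."""
    return [name for name, pattern in OPTIONS.items() if pattern.fullmatch(opaque)]


for sample in ("link_A_to_B", "port-1.example", "a~b", "a%2Eb", "a/b"):
    print(sample, allowed_by(sample))
```

Note that the underscore used in identifiers such as link_A_to_B is already outside the GLIF set.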
...
The current schema states that ALL Network Objects MUST have an identifier.
This is very strict. For example, even a network object that is never
referenced MUST still have an ID. Thus the following is NOT allowed:
<nml:bidirectionallink id="urn:ogf:network:es.net:bilink_A-C">
   <nml:link>
     <nml:relation type="serialcompound">
       <nml:link idRef="urn:ogf:network:es.net:link_A_to_B"/>
       <nml:link idRef="urn:ogf:network:es.net:link_B_to_C"/>
     </nml:relation>
   </nml:link>
   <nml:link>
     <nml:relation type="serialcompound">
       <nml:link idRef="urn:ogf:network:es.net:link_C_to_B"/>
       <nml:link idRef="urn:ogf:network:es.net:link_B_to_A"/>
     </nml:relation>
   </nml:link>
</nml:bidirectionallink>
Instead, everything MUST be named, like so:
<nml:bidirectionallink id="urn:ogf:network:es.net:bilink_A-C">
   <nml:link id="urn:ogf:network:es.net:link_A_to_C">   <!-- ADDED id -->
     <nml:relation type="serialcompound">
       <nml:link idRef="urn:ogf:network:es.net:link_A_to_B"/>
       <nml:link idRef="urn:ogf:network:es.net:link_B_to_C"/>
     </nml:relation>
   </nml:link>
   <nml:link id="urn:ogf:network:es.net:link_C_to_A">   <!-- ADDED id -->
     <nml:relation type="serialcompound">
       <nml:link idRef="urn:ogf:network:es.net:link_C_to_B"/>
       <nml:link idRef="urn:ogf:network:es.net:link_B_to_A"/>
     </nml:relation>
   </nml:link>
</nml:bidirectionallink>
Question 4. MUST all objects have an id?
a) All Network Objects MUST have an identifier.
b) All Network Objects SHOULD have an identifier.
"SHOULD" means that an identifier may be left out, but only if it is
clear what the consequences are (in this case: the result can not be
referred to.)
As a parallel to the perfSONAR/NMC world - all first order objects have 
an ID field (e.g. data, metadata, subject, parameters, key).  Some do 
not (eventType, 'parameter' [lives inside of parameters], datum, time 
formats).

I do not have a strong opinion on this, but I think that if you plan on 
ever referencing an object (e.g. in your 2nd example above creating the 
serial compound A_C out of A_B and B_C) it should have an ID.  If the 
relationship is temporal and will never be referenced it won't need the 
ID, but it doesn't seem like a stretch to just give it one anyway.

I suppose I would prefer a) to be safe, but won't defend it to the death.
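
If option a) is chosen, a check along these lines could flag unnamed objects. This is a sketch only: it assumes the namespace from Question 1 option b), the unprefixed id and idRef attributes used in the examples above, and a hypothetical list of element names that count as Network Objects.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: option b) from Question 1.
NML = "http://schemas.ogf.org/nml/base/2013/10/"

SAMPLE = """<nml:bidirectionallink xmlns:nml="http://schemas.ogf.org/nml/base/2013/10/"
    id="urn:ogf:network:es.net:bilink_A-C">
  <nml:link>
    <nml:relation type="serialcompound">
      <nml:link idRef="urn:ogf:network:es.net:link_A_to_B"/>
      <nml:link idRef="urn:ogf:network:es.net:link_B_to_C"/>
    </nml:relation>
  </nml:link>
</nml:bidirectionallink>"""


def unnamed_objects(xml_text, object_names=("link", "bidirectionallink")):
    """Return local names of listed objects that carry neither id nor idRef."""
    missing = []
    for element in ET.fromstring(xml_text).iter():
        if not element.tag.startswith("{" + NML + "}"):
            continue
        local = element.tag.split("}", 1)[1]
        # An element with idRef is a reference to an object defined elsewhere.
        is_named = "id" in element.attrib or "idRef" in element.attrib
        if local in object_names and not is_named:
            missing.append(local)
    return missing


print(unnamed_objects(SAMPLE))
```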
...
The current schema states that the Syntax of the identifier MUST follow
the urn:ogf:network syntax.
This might make future compatibility harder (e.g. when trying to combine
it with other protocols; I can imagine that in the future other naming
schemes may be developed).
Question 5. MUST urn:ogf:network syntax be used?
a) All identifiers MUST follow the urn:ogf:network syntax
b) All identifiers MUST be a URI, and SHOULD follow the urn:ogf:network
syntax
c) All identifiers MUST be unique, and MAY follow the urn:ogf:network
syntax
(some more variants are possible)
No strong preference.  I think that using the urn syntax helps to 
guarantee uniqueness, but I would need to see examples of when it would 
be impossible to assign this type of ID to a given object.
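
For comparison, a rough sketch of how the three variants would treat different identifiers. The function name and the example identifiers are hypothetical, and the presence of a URI scheme is only a necessary test for being a URI, not a full RFC 3986 validation.

```python
from urllib.parse import urlsplit


def conformance(identifier: str) -> str:
    """Classify an identifier against the variants of Question 5."""
    if identifier.lower().startswith("urn:ogf:network:"):
        return "urn:ogf:network"  # acceptable under a), b) and c)
    if urlsplit(identifier).scheme:
        return "other URI"        # acceptable under b) and c) only
    return "not a URI"            # acceptable under c) only, and only if unique


for candidate in ("urn:ogf:network:es.net:link_A_to_B",
                  "http://example.net/topology#link1",
                  "link_A_to_B"):
    print(candidate, "->", conformance(candidate))
```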

-jason

Jason Zurawski