XML syntax for NML relations

Hi all,
Let me follow up on a technical part of yesterday's discussion.
At the end of the session, we discussed how to encode the relation name (here: "serialcompound") in XML: each relation as a different element, as a different attribute, or as a different value of a fixed attribute.
So far, the relation type is encoded in the value of a type attribute (e.g. <nml:relation type="serialcompound">). Freek pointed out that this makes it harder to create a meaningful schema for validation. For example, a different syntax is associated with next relations and serialcompound relations: <nml:relation type="serialcompound"> may contain 0 or more children, while <nml:relation type="next"> must contain exactly one child. The RNC schema does provide a means to specify this difference. Jeff confirmed this and pointed out that it is of course still possible to check the syntax using other logic; it is just not possible to create a detailed schema. (Freek pointed out that NMC has the same problem, which is the reason that no non-trivial WSDL was created for perfSONAR, and no software used a SOAP library, despite the fact that most NMC messages are embedded in a SOAP envelope.)
The encoding of the type of relation in an element (e.g. <nml:serialcompound>) allows more meaningful validation. However, this has another drawback: if a client does not know about a particular relation, it does not know whether the unknown element is a subclass of a relation, and whether it can or cannot be ignored. In fact, it cannot even do basic syntax checking (as if it were just an unknown NML relation), since it does not know it is an NML relation without parsing the schema definition (which will list the new element as a subclass of the base element).
So there seem to be conflicting requirements:
* ability to do extended validation
* know from just looking at the XML (but not the schema) that something is an NML relation
It may be possible to find a solution that caters to both requirements. For example, in NM, subclasses use the same name (a trick known as chameleon namespace in RNC). E.g. a subclass of http://schema.ogf.org/nm/base/port is also called "port", e.g. http://schema.ogf.org/nm/layer2/port. A similar trick is to enforce that all relations must always be defined in the same namespace (which may only contain relations). E.g. everything that starts with http://schema.ogf.org/nml/relation/ is a relation.
I think we should discuss this further. I know the above is a difficult technical discussion, so I appreciate that you read this far. Please provide feedback on whether this was clear, and on any opinion or thoughts you have on this topic.
Regards, Freek
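A minimal sketch of the last idea (recognising a relation purely from its namespace URI), assuming Python's standard xml.etree.ElementTree. Only the prefix http://schema.ogf.org/nml/relation/ comes from the message above; the base namespace URI, the "serialcompound/" sub-path and the unrelated namespace are hypothetical placeholders, not part of any agreed schema.

```python
import xml.etree.ElementTree as ET

# Prefix proposed in the message above; everything below it is hypothetical.
RELATION_NS_PREFIX = "http://schema.ogf.org/nml/relation/"

# Hypothetical instance document: the nml base URI and the 'sc' and 'other'
# namespaces are made up for illustration only.
DOC = """
<nml:link xmlns:nml="http://example.org/nml/base/"
          xmlns:sc="http://schema.ogf.org/nml/relation/serialcompound/"
          xmlns:other="http://example.org/unrelated/"
          id="urn:ogf:network:example.net:link_A-to-C">
  <sc:relation/>
  <other:relation/>
</nml:link>
"""

def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

def is_relation(element):
    """True if the element's namespace URI starts with the relation prefix."""
    uri, _ = split_tag(element.tag)
    return uri.startswith(RELATION_NS_PREFIX)

link = ET.fromstring(DOC)
relations = [split_tag(child.tag) for child in link if is_relation(child)]
print(relations)
```

With this convention the parser needs no schema to tell that sc:relation is a relation, while other:relation is left alone even though it has the same local name.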

Hi Freek; On 7/16/11 4:39 PM, thus spake Freek Dijkstra:
Just to confirm ... do you mean to say '<serialcompound:relation>' where the 'serialcompound' namespace is something defined in addition to the nml base namespace? This would foster use of the same element 'relation' in both the base and subsequent namespaces (for reference the concept is described here: http://books.xmlschemata.org/relaxng/relax-CHP-11-SECT-5.html). In your example above you have created a new element, which would imply a special case instead of using the relation element at all. Since this is a complicated construct I do want to be sure we are all on the same page. Thanks; -jason

Hi,
I've been thinking about the relation syntax.
So far, we have seen these two proposals:
  <nml:link id="urn:ogf:network:example.net:link_A-to-C">
    <nml:relation type="serialcompound">
      ...
    </nml:relation>
  </nml:link>
and:
  <nml:link id="urn:ogf:network:example.net:link_A-to-C">
    <nmlserialcompound:relation>
      ...
    </nmlserialcompound:relation>
  </nml:link>
The advantage of the first syntax is that it is very easily extendable, and it is still obvious for a parser to understand that it is some kind of nml:relation, even if the particular type of relation is not known by the parser.
The advantage of the second syntax is that it is easy to create a meaningful validator for each specific nml:relation.
I dislike both syntaxes, and was hoping for a syntax that would provide both benefits.
If I'm correct, the following syntax will do just that:
  <nml:link id="urn:ogf:network:example.net:link_A-to-C">
    <nml:relations>
      <nmlserialcompound:relation>
        ...
      </nmlserialcompound:relation>
    </nml:relations>
  </nml:link>
This adds a parent element to the relation elements, signifying that <nmlserialcompound:relation> is indeed an nml:relation. So even a parser that has no knowledge about this particular nml:relation still knows its base syntax, while a parser that understands the details can still use a meaningful syntax validator (such as XSD) to make sure the syntax is correct.
Would this do, and is this syntax acceptable to all?
Regards, Freek
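A rough sketch of how a parser might use the proposed <nml:relations> wrapper, assuming Python's standard xml.etree.ElementTree. Both namespace URIs are hypothetical placeholders (the thread does not give them), and the rule "every child of nml:relations is a relation" is the assumption the proposal rests on.

```python
import xml.etree.ElementTree as ET

NML = "http://example.org/nml/base/"            # hypothetical base namespace
SC = "http://example.org/nml/serialcompound/"   # hypothetical extension namespace

# This imaginary parser only knows the base relation element.
KNOWN_RELATIONS = {f"{{{NML}}}relation"}

DOC = f"""
<nml:link xmlns:nml="{NML}" xmlns:nmlserialcompound="{SC}"
          id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relations>
    <nmlserialcompound:relation/>
  </nml:relations>
</nml:link>
"""

def classify_relations(link):
    """Return (tag, status) for every child of an nml:relations wrapper."""
    result = []
    for wrapper in link.findall(f"{{{NML}}}relations"):
        for child in wrapper:
            # Unknown children are still treated as relations because of
            # where they sit, not because of their name.
            status = "known" if child.tag in KNOWN_RELATIONS else "generic relation"
            result.append((child.tag, status))
    return result

classified = classify_relations(ET.fromstring(DOC))
print(classified)
```

The parser here has never heard of the serialcompound namespace, yet it can still label the element as a generic relation instead of an arbitrary unknown element.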

On 2011-08-16 12:09, Freek Dijkstra wrote:
Hi,
Hi Freek,
This adds a parent element to the relation elements, signifying that <nmlserialcompound:relation> is indeed an nml:relation. So even a parser that has no knowledge about this particular nml:relation still knows its base syntax, while a parser that understands the details can still use a meaningful syntax validator (such as XSD) to make sure the syntax is correct.
The solution with namespaces gives you that (nmlserialcompound:relation inherits from the base nml:relation). nml:relations only complicates the XML structure without adding much.
Cheers, Roman
Would this do, and is this syntax acceptable to all?
Regards, Freek
_______________________________________________
nml-wg mailing list
nml-wg@ogf.org
http://www.ogf.org/mailman/listinfo/nml-wg

Comments follow original text: Roman Łapacz wrote:
The solution with namespaces gives you that (nmlserialcompound:relation inherits from the base nml:relation). nml:relations only complicates the xml structure without giving too much.
You mean you prefer the following?:
  <nml:link id="urn:ogf:network:example.net:link_A-to-C">
    <nmlserialcompound:relation>
      ...
    </nmlserialcompound:relation>
  </nml:link>
How should the parser know that nmlserialcompound:relation inherits from the base nml:relation? I can think of two things:
- Because the parser has knowledge of the schema definition
- Because the parser assumes that all elements named "relation" are subclasses of nml:relation.
The problem with the first is that it requires all parsers to know all schemas beforehand. I see a risk with backward compatibility after future extensions if that is required.
The problem with the second is that this fails if some schema (for whatever reason) includes a namespace where relation has a different meaning. E.g. family:relation or work:relation.
For this reason, including the extra nesting with <nml:relations> seems to me a relatively simple solution to these problems.
Regards, Freek
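The name-collision problem with the second option can be illustrated with a short sketch, assuming Python's standard xml.etree.ElementTree. The namespace URIs and the <root> element are hypothetical, chosen only to mirror the family:relation example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing an NML extension with an unrelated vocabulary.
DOC = """
<root xmlns:nmlserialcompound="http://example.org/nml/serialcompound/"
      xmlns:family="http://example.org/family/">
  <nmlserialcompound:relation/>
  <family:relation/>
</root>
"""

def local_name(tag):
    """Strip the '{uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]

root = ET.fromstring(DOC)

# A parser that only looks at the local name cannot tell the two apart.
matches = [child.tag for child in root if local_name(child.tag) == "relation"]
print(len(matches))
```

Both elements match, so a parser relying on the local name alone would wrongly treat family:relation as an NML relation.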

Hi Freek/All; On 8/16/11 8:32 AM, thus spake Freek Dijkstra:
Comments follow original text:
Roman Łapacz wrote:
The solution with namespaces gives you that (nmlserialcompound:relation inherits from the base nml:relation). nml:relations only complicates the xml structure without giving too much.
You mean you prefer the following?:
<nml:link id="urn:ogf:network:example.net:link_A-to-C"> <nmlserialcompound:relation> ... </nmlserialcompound:relation> </nml:link>
I would prefer there to be just 'nml:relation' because it is still not clear to me what you intend to put in the 'nmlserialcompound' relationship that is special enough to be in a different namespace. Do you have examples of what you intend to do with this relationship? In any event, what I think doesn't matter in this case, because your sub-namespace (nmlserialcompound) would derive from the base namespace (nml), which is the entire point of doing things in this manner. Even though it's a special namespace, it can be reduced to the base, which should make services happy.
How should the parser know that nmlserialcompound:relation inherits from the base nml:relation? I can think of two things:
- Because the parser has knowledge of the schema definition
Yes, it's written in the schema implicitly. If the parser for some service was loaded with the 'nmlserialcompound' version of the schema, that schema has to have in it somewhere that 'nmlserialcompound:relation' is really just an 'nml:relation' that has been extended. This service is intelligent enough to parse both 'nml' and 'nmlserialcompound' versions. The converse is that a service that has no knowledge of the 'nmlserialcompound' relationship may not be able to read the elements. It would only be able to interpret the relations as 'nml', if it is able to interpret them at all. Strict validation may force the 'nmlserialcompound' namespaced elements to be ignored or rejected (depending on the semantic rules).
- Because the parser assumes that all elements named "relation" are subclasses of nml:relation.
I don't think you can make this assumption. It is true that some non-strict checking services may take this path, but semantic rules can be enforced if so desired.
The problem with the first is that it requires all parsers to know all schemas beforehand. I see a risk with backward compatibility after future extensions if that is required.
This is a little extreme - note that the 'standard' use case would be that all services would have knowledge of the base. This is the 'english' (so to speak) of web services. Since all elements derive from this base dialect, it is assumed that services will communicate and be assured that their messages are semantically correct. If you implement a service that absolutely requires a special namespace (e.g. 'nmlserialcompound' for some reason) this encoding will *only* make sense to other services that understand it. E.g. the services are now speaking a different language. Here is an example:
  <nmlserialcompound:relation freeks_special_attribute="true" />
If this was passed to a service that doesn't understand 'nmlserialcompound', the service could simply reject the entire element. It also could accept the foreign namespace, and attempt to parse the element with the only version of relation it happens to know about (nml). This would mean that the foreign attribute would most likely be ignored, but it still allows the message to be parsed.
As I have said in previous exchanges - special dialects are useful if you intend to have specialized services and clients. From the perfSONAR/NMC world we are constructing the 'base' protocol that is really useless by itself - no services speak 'base'. We are also making a Measurement Archive protocol which is an extension of the base and uses concepts that only this type of service would care about. No services will speak this explicitly. Lastly there would be specific forms of data that a service would be expected to speak (e.g. throughput data, one way delay data, SNMP data, etc.). These are the dialects that the services would be comfortable speaking. You can't simply send a request for utilization to a service that only speaks one way delay - even though the messages have the same structure, the semantics of what is being requested are slightly different.
The service can attempt to parse the message in a base dialect, but most likely will reject things outright.
The problem with the second is that this fails if some schema (for whatever reason) includes a namespace where relation has a different meaning. Eg: family:relation or work:relation.
If you don't know the schema, you fail. This is how it has to be.
For this reason, including the extra nesting with <nml:relations> seems to me a relatively simple solution to these problems.
After your entire mail, I am not exactly sure how you reached this conclusion. Let's say I had this:
  <nml:relations>
    <jason:relation>
      <!-- other things ... -->
    </jason:relation>
  </nml:relations>
I am still in the situation you described above, and now I have an extra element that really doesn't help me solve any problem.
If you are still unclear on the concept of the namespaces I am happy to try to explain them more, but the exact things you are trying to solve with the addition of more elements can be done through namespaces and inheritance. NMC has done this for a while and it has worked well in practice in perfSONAR.
Thanks; -jason
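The fallback behaviour described earlier in this message (parse a foreign-namespace relation with the only definition the service knows, and ignore foreign attributes) could look roughly like the sketch below. It assumes Python's standard xml.etree.ElementTree; the namespace URI, the "id" attribute on a relation and the set of base attributes are hypothetical, and this is not how perfSONAR services are actually implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the attributes that the base 'nml:relation' is assumed to define.
BASE_ATTRIBUTES = {"id"}

def parse_as_base_relation(element):
    """Keep only attributes the base relation knows; report the rest."""
    kept = {k: v for k, v in element.attrib.items() if k in BASE_ATTRIBUTES}
    ignored = sorted(set(element.attrib) - BASE_ATTRIBUTES)
    return kept, ignored

# Element from the example above, with a hypothetical namespace URI and id.
element = ET.fromstring(
    '<nmlserialcompound:relation '
    'xmlns:nmlserialcompound="http://example.org/nml/serialcompound/" '
    'id="r1" freeks_special_attribute="true" />'
)

kept, ignored = parse_as_base_relation(element)
print(kept, ignored)
```

Whether a service takes this lenient path or rejects the element outright is a policy decision that lives in the application, not in the parser.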

Jason Zurawski wrote:
<nml:link id="urn:ogf:network:example.net:link_A-to-C"> <nmlserialcompound:relation> ... </nmlserialcompound:relation> </nml:link>
I would prefer there to be just 'nml:relation' because it is still not clear to me what you intend to put in the 'nmlserialcompound' relationship that is special enough to be in a different namespace.
I meant it as a generic example of an extension to the base relations. It could be just as well:
  <nmlversion2:some_future_relation_we_havent_thought_of_yet>
or in your preferred syntax:
  <nml:relation type="some_future_relation_we_havent_thought_of_yet">
I meant nothing special by choosing this namespace in the example.
In any event, what I think doesn't matter in this case, because your sub-namespace (nmlserialcompound) would derive from the base namespace (nml), which is the entire point of doing things in this manner. Even though it's a special namespace, it can be reduced to the base, which should make services happy.
What do you mean with "special" namespace? I'm currently not making any assumptions about the namespace. (I only make the assumption that there is a schema that defines the element nmlserialcompound:relation as a subclass of nml:relation)
How should the parser know that nmlserialcompound:relation inherits from the base nml:relation? I can think of two things:
- Because the parser has knowledge of the schema definition
Yes, it's written in the schema implicitly. If the parser for some service was loaded with the 'nmlserialcompound' version of the schema, that schema has to have in it somewhere that 'nmlserialcompound:relation' is really just an 'nml:relation' that has been extended. This service is intelligent enough to parse both 'nml' and 'nmlserialcompound' versions.
What I'm talking about is a service which does understand the generic nml:relation concept, and some nml:relation subclasses, but not all nml:relation subclasses. I'm thinking about some future extension. Let's say we create an NML version 2 in 5 years' time. How should a parser which knows about NML 1 handle an NML version 2 message? Fail completely, or is there a way to make NML extensible from the start? E.g. by allowing someone to create a new nml:relation subclass, without completely rewriting the NML base. Thus: would it be possible to create a new schema that extends NML base, and let old parsers (which are not aware of this new schema) know that the message contains some additional relations based on this new extension? It is my hope that a parser would be able to read the NML it does know, and ignore the new schema additions.
If you don't know the schema, you fail. This is how it has to be.
You mean in the above scenario the parser should fail completely? Or only ignore the unknown extensions?
- Because the parser assumes that all elements named "relation" are subclasses of nml:relation.
I don't think you can make this assumption.
I'm pleased to hear that -- it was not clear to me from Roman's email if he made that assumption or not, so I was polling for that.
<nmlserialcompound:relation freeks_special_attribute="true" />
If this was passed to a service that doesn't understand 'nmlserialcompound', the service could simply reject the entire element.
That sounds good to me. It should reject the element, not the entire message. I think we agree on that (good!).
It also could accept the foreign namespace, and attempt to parse the element with the only version of relation it happens to know about (nml). This would mean that the foreign attribute would most likely be ignored, but it still allows the message to be parsed.
I think we're on the right track here. I like the above idea, that a service may attempt to parse the element using some basic knowledge about the base. However, I don't like the idea of a parser that comes across any unknown element and simply tries to parse it using a base element, _without knowing if this unknown element is actually a subclass of this base element_. So I want to include some clue in the XML telling the parser just that ("hey, here is a nmlserialcompound:relation, and if you don't know what that is, simply treat it as a generic nml:relation") The original syntax: <nml:relation type="serialcompound"> did this perfectly. However, as I tried to explain, this particular syntax has a drawback if you want to do meaningful syntax validation. Hence my proposal.
For this reason, including the extra nesting with <nml:relations> seems to me a relatively simple solution to these problems.
After your entire mail, I am not exactly sure how you reached this conclusion.
Sorry, I left out a crucial bit in the email (no point in having this clear in my head if I don't make it clear here): All child elements of nml:relations MUST be a subclass of nml:relation.
Let's say I had this:
<nml:relations> <jason:relation> <!-- other things ... --> </jason:relation> </nml:relations>
I am still in the situation as you described above, and now I have an extra element that really doesn't help me solve any problem.
So <jason:relation> must be a nml:relation subclass because of the above requirement.
If you are still unclear on the concept of the namespaces I am happy to try to explain them more, but the exact things you are trying to solve with the addition of more elements can be done through namespaces and inheritance, NMC has done this for a while and it has worked well in practice in perfSONAR.
I understand it, and I agree for the most part that it works well, with the exception of the syntax validation requirement. I was trying to come up with a solution that has all the benefits that the NMC gives, but also has this benefit. My first attempt at that (<nmlserialcompound:relation>) failed, well, miserably (sorry, I'm not that smart). I think (hope?) my second attempt (<nml:relations><nmlserialcompound:relation>) does a better job at that. Let me also try to clarify a point you made in your follow-up email:
It is a mistake to assume that a parser by itself is capable of rich semantic interpretation.
I agree, that is not possible. However, I do hope that a parser is -by only knowing a base schema and a few (but not necessarily all!) schema extensions- able to validate the SYNTAX as well as it can. That is currently not the case in the current perfSONAR implementations, and I regret seeing that. I hope that by at least allowing easy and meaningful SYNTAX validation, the parser code can concentrate on what is important: the SEMANTICS. Thus the application specific stuff. I think that is where the real fun is, and I hope to use some library for the boring SYNTAX checking stuff. I DO think that SYNTAX checking is important (albeit boring); if we do it well, we can specify how parsers should treat syntactically invalid messages, and avoid a slurry of implementation incompatibility problems later.
Regards, Freek
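For comparison, the "check the syntax using other logic" route mentioned at the start of the thread can be sketched in application code. This assumes Python's standard xml.etree.ElementTree and the type-attribute syntax; the child-count rules for "next" and "serialcompound" are the ones quoted in the first message, while the namespace URI and the "future" relation type are hypothetical.

```python
import xml.etree.ElementTree as ET

NML = "http://example.org/nml/base/"   # hypothetical namespace URI

# Child-count rules as stated in the first message of this thread.
RULES = {
    "serialcompound": lambda n: n >= 0,   # zero or more children
    "next": lambda n: n == 1,             # exactly one child
}

def check_relation(relation):
    """Return 'valid', 'invalid', or 'unknown type' for one nml:relation."""
    rule = RULES.get(relation.get("type"))
    if rule is None:
        return "unknown type"
    return "valid" if rule(len(relation)) else "invalid"

DOC = f"""
<nml:link xmlns:nml="{NML}">
  <nml:relation type="serialcompound"/>
  <nml:relation type="next"><nml:link/><nml:link/></nml:relation>
  <nml:relation type="future"/>
</nml:link>
"""

link = ET.fromstring(DOC)
results = [check_relation(r) for r in link.findall(f"{{{NML}}}relation")]
print(results)
```

This gives the partial-knowledge behaviour hoped for above (known types are checked, unknown ones are flagged rather than failing the message), but the rules live in hand-written code instead of a schema that a generic validation library could consume.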

Hi Freek; I really believe we need to take a step back here, there were a lot of emails exchanged this morning and we are talking past each other on many of these key issues. If there is a fundamental misunderstanding on either end, I would suggest a call instead of continuing via email. This is a very long and involved email, and if we are not on the same page I fear this is really not a constructive exercise. This being said I will try to answer your further concerns below: On 8/16/11 10:19 AM, thus spake Freek Dijkstra:
Jason Zurawski wrote:
<nml:link id="urn:ogf:network:example.net:link_A-to-C"> <nmlserialcompound:relation> ... </nmlserialcompound:relation> </nml:link>
I would prefer there to be just 'nml:relation' because it is still not clear to me what you intend to put in the 'nmlserialcompound' relationship that is special enough to be in a different namespace.
I meant it as a generic example of an extension to the base relations. It could be just as well: <nmlversion2:some_future_relation_we_havent_thought_of_yet>
or in your preferred syntax: <nml:relation type="some_future_relation_we_havent_thought_of_yet">
I meant nothing special by choosing this namespace in the example.
If there is a 'new' item that needs to be modeled in the base, then you make a new version of the schema - this is perfectly normal. Extension to different namespaces implies something different. This usually means that the base is insufficient for some reason, and the new namespace will include special elements that are too specific for the base or model a behavior of a service that will be using the dialect.
In any event, what I think doesn't matter in this case, because your sub-namespace (nmlserialcompound) would derive from the base namespace (nml), which is the entire point of doing things in this manner. Even though it's a special namespace, it can be reduced to the base, which should make services happy.
What do you mean with "special" namespace? I'm currently not making any assumptions about the namespace. (I only make the assumption that there is a schema that defines the element nmlserialcompound:relation as a subclass of nml:relation)
To review XML for one second:
  <prefix:element attribute="something" xmlns:prefix="http://something.net" />
- prefix = shorthand notation for a namespace specified in the 'xmlns:' definition
- element = XML element. When 'prefix' precedes the element name, the element is assumed to live in (and take its definition from) the namespace. Lack of a prefix implies the element is in the default namespace for a given instance document
- attribute = attribute of the XML element. Assumed to live in the same or default namespace.
- schema = formal syntactic definition of the above
Each time you start your examples with 'nmlserialcompound:relation', this to me means you are proposing a new namespace that will include elements that go above and beyond what is included in 'nml:relation'. If you do not intend to do this, then you should use different notation to be clear about what you are trying to do. In our experience, schema definitions are produced to describe the denizens of a specific namespace. It is possible to use multiple namespaces in a single schema, but so far your examples are born out of instance documents, not actual schema. I apologize if I am taking your examples the wrong way, but you seem to be confusing several different concepts at once.
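The prefix/namespace distinction in the review above can be checked with a few lines, assuming Python's standard xml.etree.ElementTree. The URI http://something.net is the placeholder from the example; the alternative prefix "other" is made up. Note that ElementTree reports the unprefixed attribute under its plain name, without a namespace URI.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, three different spellings in the instance document.
a = ET.fromstring('<prefix:element xmlns:prefix="http://something.net" attribute="something"/>')
b = ET.fromstring('<other:element xmlns:other="http://something.net" attribute="something"/>')
c = ET.fromstring('<element xmlns="http://something.net" attribute="something"/>')

# ElementTree expands every tag to '{namespace-uri}local-name'.
print(a.tag)

# The prefix is only shorthand: all three elements have the same expanded name.
print(a.tag == b.tag == c.tag)

# The xmlns declaration is not an ordinary attribute, so only one entry remains.
print(a.attrib)
```

So 'nmlserialcompound:relation' and 'nml:relation' are different elements only because their prefixes are bound to different namespace URIs, not because of the prefix text itself.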
How should the parser know that nmlserialcompound:relation inherits from the base nml:relation? I can think of two things:
- Because the parser has knowledge of the schema definition
Yes, it's written in the schema implicitly. If the parser for some service was loaded with the 'nmlserialcompound' version of the schema, that schema has to have in it somewhere that 'nmlserialcompound:relation' is really just an 'nml:relation' that has been extended. This service is intelligent enough to parse both 'nml' and 'nmlserialcompound' versions.
What I'm talking about is a service which does understand the generic nml:relation concept, and some nml:relation subclasses, but not all nml:relation subclasses.
I'm thinking about some future extension. Let's say we create an NML version 2 in 5 years' time. How should a parser which knows about NML 1 handle an NML version 2 message? Fail completely, or is there a way to make NML extensible from the start? E.g. by allowing someone to create a new nml:relation subclass, without completely rewriting the NML base.
You are still confusing two very different concepts: versioning and extension are not the same thing and really should not be treated as such. I would not expect there to be explicit backwards compatibility in either method (new schema vs extension into a new version); backwards compatibility at the semantic level can easily be done, but it is very hard to attempt this with syntactic constructs. Consider these poorly drawn pictures in tree form:
  v1
    -> sub namespace 1
      -> sub sub namespace 1
    -> sub namespace 2
    -> sub namespace 3
    -> v2
      -> sub namespace 4
      ...
Note that adding in v2 as an extension of the first separates all of the sub-namespaces that are already in use. I would claim this is cleaner, but it still doesn't completely offer backward compatibility:
  v1
    -> sub namespace 1
      -> sub sub namespace 1
    -> sub namespace 2
    -> sub namespace 3
  v2
    -> sub namespace 4
Authors of the old sub-namespaces would have to adapt and re-package in either case. As an example, NM v1 is not compatible at all with NM v2. And to further that example, NM v2 has been in place for nearly 7 years. Do you have a large concern that NML will be creating new versions quite frequently? I really don't believe this will be the case considering how long it has taken to get the first version 'right'.
Thus: would it be possible to create a new schema that extends NML base, and let old parsers (which are not aware of this new schema) know that the message contains some additional relations based on this new extension? It is my hope that a parser would be able to read the NML it does know, and ignore the new schema additions.
I am very confused as to why you want this behavior. Rarely will it be the case that an entire service can simply be given a new schema and work flawlessly. The introduction of new semantic concepts via the syntax will force underlying logical changes in old services. It is unrealistic to assume that all old services will continue to function as they have before. They may still 'work', in that they can downcast to the last schema they happen to know about, but I would not expect a service author to implement this sort of behavior anyway. It's an additional layer of 'permissiveness' that does not get you much functionality.
If you don't know the schema, you fail. This is how it has to be.
You mean in the above scenario the parser should fail completely? Or only ignore the unknown extensions?
If you are relying solely on syntactic validation, which is what you seem to be proposing, yes. If the instance doesn't match the schema, it is rejected at the parser level immediately. This is why in our experience we avoid syntactic validation and rely on semantic rules (outside of the parser, and inside of the application) instead. It is much more flexible to make a semantic decision which can allow for a richer set of rules than simply relying on a 'yes or no' from the parser.
- Because the parser assumes that all elements named "relation" are subclasses of nml:relation.
I don't think you can make this assumption.
I'm pleased to hear that -- it was not clear to me from Roman's email if he made that assumption or not, so I was polling for that.
<nmlserialcompound:relation freeks_special_attribute="true" />
If this was passed to a service that doesn't understand 'nmlserialcompound', the service could simply reject the entire element.
That sounds good to me.
It should reject the element, not the entire message. I think we agree on that (good!).
Again, it depends on the nature of your validation. Parser syntactical validation will toss the entire instance. Semantic validation can be constructed to skip unknown things.
It also could accept the foreign namespace, and attempt to parse the element with the only version of relation it happens to know about (nml). This would mean that the foreign attribute would most likely be ignored, but it still allows the message to be parsed.
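For illustration, the two behaviours discussed here (skip the unknown element, or read it with the base definition only) could look roughly like the sketch below. It assumes Python's standard xml.etree.ElementTree, a service that knows only the base nml namespace, and placeholder namespace URIs; the function names and the set of base attributes are hypothetical. Note that the fallback branch decides on the local name "relation" alone, which is exactly the assumption questioned earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for illustration only.
NML = "http://example.org/nml/base/"
SERIAL = "http://example.org/nml/serialcompound/"

KNOWN_NAMESPACES = {NML}          # this service only knows the base schema
BASE_RELATION_ATTRS = {"type"}    # assumed attributes of the base relation


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def read_relations(link, fallback_to_base):
    """Return (accepted, skipped) for the children of a link element."""
    accepted, skipped = [], []
    for child in link:
        uri, local = split_tag(child.tag)
        if local != "relation":
            skipped.append(child.tag)
        elif uri in KNOWN_NAMESPACES:
            accepted.append(dict(child.attrib))
        elif fallback_to_base:
            # Foreign namespace: keep only what the base relation defines,
            # so foreign attributes are silently ignored.
            attrs = {k: v for k, v in child.attrib.items()
                     if k in BASE_RELATION_ATTRS}
            accepted.append(attrs)
        else:
            # Reject the element, not the entire message.
            skipped.append(child.tag)
    return accepted, skipped


DOC = f"""<nml:link xmlns:nml="{NML}" xmlns:nmlserialcompound="{SERIAL}">
  <nml:relation type="next"/>
  <nmlserialcompound:relation freeks_special_attribute="true"/>
</nml:link>"""

link = ET.fromstring(DOC)
strict = read_relations(link, fallback_to_base=False)
lenient = read_relations(link, fallback_to_base=True)
```

In the strict call the foreign element ends up in the skipped list; in the lenient call it is accepted as a bare relation with the foreign attribute dropped.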
I think we're on the right track here.
I like the above idea, that a service may attempt to parse the element using some basic knowledge about the base. However, I don't like the idea of a parser that comes across any unknown element and simply tries to parse it using a base element, _without knowing if this unknown element is actually a subclass of this base element_. So I want to include some clue in the XML telling the parser just that ("hey, here is a nmlserialcompound:relation, and if you don't know what that is, simply treat it as a generic nml:relation")
The original syntax: <nml:relation type="serialcompound">
did this perfectly.
However, as I tried to explain, this particular syntax has a drawback if you want to do meaningful syntax validation.
Hence my proposal.
And I will re-iterate that you will never be able to have all of the "meaningful syntax validation" you seem to want, and get full extensibility. It's a trade-off that needs to be made: either extremely expressive syntax that can be fully validated by the parser (only), or the ability to extend to additional use cases by using general concepts. perfSONAR/NMC has gone with the latter, and I believe this has been very successful in allowing extensibility to other use cases beyond the base schema. You have pointed out that you dislike that a human can encode something wrong and things 'fail'; I stand by my statement from Salt Lake City that this is a necessary evil: if you encode things wrong, it will fail. I prefer <nml:relation type="serialcompound">, and based on the results of the meeting the audience seemed to agree that this was all that was needed. I find it strange that we must keep going around on this, because the minimal method gives the system and the human everything they need to encode the data.
For this reason, including the extra nesting with <nml:relations> seems to me a relatively simple solution to these problems.
After your entire mail, I am not exactly sure how you reached this conclusion.
Sorry, I left out a crucial bit in the email (no point in having this clear in my head if I don't make it clear here):
All child elements of nml:relations MUST be a subclass of nml:relation.
Let's say I had this:
<nml:relations>
  <jason:relation>
    <!-- other things ... -->
  </jason:relation>
</nml:relations>
I am still in the situation as you described above, and now I have an extra element that really doesn't help me solve any problem.
So <jason:relation> must be a nml:relation subclass because of the above requirement.
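A rough sketch of how a reader could apply that rule, assuming Python's standard xml.etree.ElementTree and placeholder namespace URIs (the URIs, the `relations_of` helper and the extra `jason:other` element are hypothetical): every child of nml:relations is collected as a relation whatever its namespace, while an unknown element elsewhere in the link is not.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for illustration only.
NML = "http://example.org/nml/base/"
JASON = "http://example.org/jason/"


def relations_of(link):
    """Collect the tag of every child of nml:relations.

    Under the proposed rule each such child MUST be a subclass of
    nml:relation, so no knowledge of the child's own schema is needed.
    """
    found = []
    for wrapper in link.findall(f"{{{NML}}}relations"):
        for child in wrapper:
            found.append(child.tag)
    return found


DOC = f"""<nml:link xmlns:nml="{NML}" xmlns:jason="{JASON}">
  <nml:relations>
    <jason:relation/>
  </nml:relations>
  <jason:other/>
</nml:link>"""

link = ET.fromstring(DOC)
relation_tags = relations_of(link)
```

Here `jason:relation` is reported as a relation because of where it sits, and `jason:other` is left alone because nothing says what it is.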
If you are still unclear on the concept of the namespaces I am happy to try to explain them more, but the exact things you are trying to solve with the addition of more elements can be done through namespaces and inheritance; NMC has done this for a while and it has worked well in practice in perfSONAR.
I understand it, and I agree for the most part that it works well, with the exception of the syntax validation requirement.
I guess we are going to have to agree to disagree on how useful this is. At this point, the conversation is between us, and I doubt we will convince each other to move from the current positions. If you wish to instantiate a vote so we can move forward, this may be the best option.
I was trying to come up with a solution that has all the benefits that the NMC gives, but also has this benefit. My first attempt at that (<nmlserialcompound:relation>) failed, well, miserably (sorry, I'm not that smart). I think (hope?) my second attempt (<nml:relations><nmlserialcompound:relation>) does a better job at that.
Let me also try to clarify a point you made in your follow up email:
It is a mistake to assume that a parser by itself is capable of rich semantic interpretation.
I agree, that is not possible. However, I do hope that a parser is -by only knowing a base schema and a few (but not necessarily all!) schema extensions- able to validate the SYNTAX as well as it can.
You are confusing semantics and syntax again. Parsers only know what they are told, and to my knowledge a parser can only verify against a single schema at any given time. A parser must be told to verify an instance against a schema, and the schema itself has to have the tooling to 'include' the other possibilities that may be derived from other sources. This is what I have been pushing all along - use the derivative namespaces when possible, but put most of the trust that the service will simply 'do the right thing' in at the semantic level. Thanks; -jason
That is currently not the case in the current perfSONAR implementations, and I regret seeing that.
I hope that by at least allowing easy and meaningful SYNTAX validation, the parser code can concentrate on what is important: the SEMANTICS. Thus the application specific stuff. I think that is where the real fun is, and I hope to use some library for the boring SYNTAX checking stuff.
I DO think that SYNTAX checking is important (albeit boring); if we do it well, we can specify how parsers should treat syntactically invalid messages, and avoid a slurry of implementation incompatibility problems later.
Regards, Freek

Hi Jason,
Let me be the first to say that I very much appreciate your feedback, despite the disagreements we sometimes have. I for one still gladly invite you for a cup of coffee (or any other beverage) next OGF :)
The discussion this afternoon was indeed a bit heated, but I think we both can separate feelings from the technical discussion.
Perhaps a phone call may help. I know there is an NMC call scheduled this Thursday, otherwise I would have proposed an NML call at that time.
There is still a lot to say regarding your previous mail, I may try to summarise it in a separate mail, but I'll gladly postpone that for now if you feel a short break serves the discussion.
There's one off-hand remark that caught my eye:
to my knowledge a parser can only verify against a single schema at any given time.
To my knowledge it is possible for a parser to validate against multiple schema at the same time.
I fear this alone may be responsible for quite a lot of our misunderstandings in this discussion.
For example, to me that meant I saw no functional difference between an element in the same namespace and an element in a different namespace, while to you they are clearly very different (and I now understand why they are different to you; I previously didn't understand that). I also fear that the whole discussion at the first half of your previous email was just an exponent of this misunderstanding. At least, reading back I can now understand where your argument was coming from.
Let's hope we can find the other underlying differences in assumptions we sure have (preferably with less lengthy mails ;) ).
If you know a magical way to find these differences, let me know ...
All the best, Freek

Hi Freek/All; On 8/16/11 2:41 PM, thus spake Freek Dijkstra:
Hi Jason,
Let me be the first to say that I very much appreciate your feedback, despite the disagreements we sometimes have. I for one still gladly invite you for a cup of coffee (or any other beverage) next OGF :)
The discussion this afternoon was indeed a bit heated, but I think we both can separate feelings from the technical discussion.
Heated discussions are needed to get things done, but sometimes too much information on a mailing list is a bad thing. If things are getting too technical to describe in email we should move to phone.
Perhaps a phone call may help. I know there is an NMC call scheduled this Thursday, otherwise I would have proposed an NML call at that time.
NMC will need to meet this week, perhaps NML can meet next.
There is still a lot to say regarding your previous mail, I may try to summarise it in a separate mail, but I'll gladly postpone that for now if you feel a short break serves the discussion.
There's one off-hand remark that caught my eye:
to my knowledge a parser can only verify against a single schema at any given time.
To my knowledge it is possible for a parser to validate against multiple schema at the same time.
In my experience (libxml, some older Java libraries) a single schema is loaded into the parser. It is possible for schemas to reference each other, e.g. in RELAX NG compact syntax:
include "something.rnc" {
  # include things ...
}
Trying to validate the same instance against different schemata simultaneously does not seem like a very fruitful exercise for a parser, unless there are multiple parsing passes being applied. If the latter is true, I would argue that more time is being spent in syntax checking than in the real guts of semantic evaluation. If you have real world examples I am happy to be proven wrong. Thanks; -jason
I fear this alone may be responsible for quite a lot of our misunderstandings in this discussion.
For example, to me that meant I saw no functional difference between an element in the same namespace and an element in a different namespace, while to you they are clearly very different (and I now understand why they are different to you; I previously didn't understand that). I also fear that the whole discussion at the first half of your previous email was just an exponent of this misunderstanding. At least, reading back I can now understand where your argument was coming from.
Let's hope we can find the other underlying differences in assumptions we sure have (preferably with less lengthy mails ;) ).
If you know a magical way to find these differences, let me know ...
All the best, Freek

Hi Freek/All; On 8/16/11 7:36 AM, thus spake Roman Łapacz:
On 2011-08-16 12:09, Freek Dijkstra wrote:
Hi,
Hi Freek,
I've been thinking about the relation syntax.
So far, we have seen these two proposals:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relation type="serialcompound">
    ...
  </nmlserialcompound:relation>
</nml:link>
I think you mean this instead:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relation type="serialcompound">
    ...
  </nml:relation>
</nml:link>
Your example above would not parse correctly.
and:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nmlserialcompound:relation>
    ...
  </nmlserialcompound:relation>
</nml:link>
The advantage of the first syntax is that it is very easily extendable, and it is still obvious for a parser to understand that it is some kind of nml:relation, even if the particular type of relation is not known by the parser.
The advantage of the second syntax is that it is easy to create a meaningful validator for each specific nml:relation.
I dislike both syntaxes, and was hoping for a syntax that would provide both benefits.
If I'm correct, the following syntax will do just that:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relations>
    <nmlserialcompound:relation>
      ...
    </nmlserialcompound:relation>
  </nml:relations>
</nml:link>
This adds a parent element to the relation elements, signifying that <nmlserialcompound:relation> is indeed a nml:relation. So even a parser that has no knowledge about this particular nml:relation still knows its base syntax, while a parser that understands the details can still use a meaningful syntax validator (such as XSD) to make sure the syntax is correct.
The solution with namespaces gives you that (nmlserialcompound:relation inherits from the base nml:relation). nml:relations only complicates the xml structure without giving too much.
I agree with Roman, the use of the 'relations' element is really not necessary here. I am still not clear why you believe this element is necessary. It is a 'grouping' concept from what I can tell, but this does not add any inheritance into the sub elements except that of parent/child. The concept of namespaces gives you the inheritance that I think you want. -jason
Cheers, Roman
Would this do, and is this syntax acceptable to all?
Regards, Freek

Jason Zurawski wrote:
So far, we have seen these two proposals:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relation type="serialcompound">
    ...
  </nmlserialcompound:relation>
</nml:link>
I think you mean this instead:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relation type="serialcompound">
    ...
  </nml:relation>
</nml:link>
Yes, you are right of course -- thanks for catching that.
I agree with Roman, the use of the 'relations' element is really not necessary here.
I am still not clear why you believe this element is necessary. It is a 'grouping' concept from what I can tell, but this does not add any inheritance into the sub elements except that of parent/child. The concept of namespaces gives you the inheritance that I think you want.
The base element (<nml:relation type="serialcompound">) has the problem that it is hard to create a meaningful syntax validator.
The subelement (<nmlserialcompound:relation>) has the problem that all parsers would need to know about the nmlserialcompound schema in advance, which hinders extensibility. (*)
The subelement with extra parent element (<nml:relations><nmlserialcompound:relation>) does not have either of these two problems.
Which of these 3 statements do you disagree with?
Freek
(*) I'm aware of some subtleties regarding my statement on the disadvantage of <nmlserialcompound:relation> -- a parser may still ignore it, not knowing it is a relation subclass -- I'm most happy to elaborate on that if you think that <nmlserialcompound:relation> is a better option than <nml:relations><nmlserialcompound:relation>.

Maybe it helps to put my requirements for the "relation" XML syntax on the table again, without the examples.
1. Be extensible
2. It should be possible to create a specific validator for each relation type.
3. Parsers should be able to recognise an unknown relation type as a relation subclass (rather than simply an unknown element)
Hope this helps a bit.
Freek
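As a point of comparison, the sketch below shows how requirements 2 and 3 could be met in application logic (rather than in the schema) for the attribute syntax <nml:relation type="...">. It is an illustration only: it assumes Python's standard xml.etree.ElementTree, uses the child-count rules mentioned at the start of this thread (serialcompound: 0 or more children, next: exactly one child), and the table and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Per-type rules as (min children, max children); None means unbounded.
# New relation types are added by extending this table (requirement 1).
CHILD_COUNT_RULES = {
    "serialcompound": (0, None),
    "next": (1, 1),
}


def check_relation(elem):
    """Return 'valid', 'invalid' or 'unknown-type' for a relation element."""
    rule = CHILD_COUNT_RULES.get(elem.get("type"))
    if rule is None:
        # Still recognisable as a relation, only its specific type is unknown.
        return "unknown-type"
    low, high = rule
    count = len(elem)  # number of direct child elements
    if count < low or (high is not None and count > high):
        return "invalid"
    return "valid"


results = [
    check_relation(ET.fromstring('<relation type="next"><link/></relation>')),
    check_relation(ET.fromstring('<relation type="next"/>')),
    check_relation(ET.fromstring('<relation type="serialcompound"/>')),
    check_relation(ET.fromstring('<relation type="futuretype"><a/></relation>')),
]
```

This is validation outside the schema, so it does not by itself give the schema-level validation that requirement 2 is after.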

Hi Freek; On 8/16/11 9:20 AM, thus spake Freek Dijkstra:
Maybe it helps to put my requirements for the "relation" XML syntax on the table again, without the examples.
1. Be extensible
Defining a base namespace, and using extension namespaces is the most straightforward way to accomplish this. See the 5+ years of examples from perfSONAR/NMC if you need examples of how these work in practice.
2. It should be possible to create a specific validator for each relation type.
See above.
3. Parsers should be able to recognise an unknown relation type as a relation subclass (rather than simply an unknown element)
Parsers are not able to do this alone. Parsers are stupid. They simply walk the instance tree, and if they are validating, compare the extracted results vs the schema/syntactic definition that they are supplied. It is a mistake to assume that a parser by itself is capable of rich semantic interpretation. If you really want to go this route, you can make a beautifully complex schema where all choices and meanings are explicitly stated. This schema would apply to your single use case, and it would be absolute hell to extend to others. This is a decision that perfSONAR/NMC made a long time ago with regards to the data format and message exchange - extensible vs complete. I believe that the extension mechanisms from namespaces will get you both, but there needs to be a cut point established regarding what belongs in the base and what belongs in extended namespaces. Thanks; -jason

Hi Freek; On 8/16/11 9:06 AM, thus spake Freek Dijkstra:
Jason Zurawski wrote:
So far, we have seen these two proposals:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relation type="serialcompound">
    ...
  </nmlserialcompound:relation>
</nml:link>
I think you mean this instead:
<nml:link id="urn:ogf:network:example.net:link_A-to-C">
  <nml:relation type="serialcompound">
    ...
  </nml:relation>
</nml:link>
Yes, you are right of course -- thanks for catching that.
I agree with Roman, the use of the 'relations' element is really not necessary here.
I am still not clear why you believe this element is necessary. It is a 'grouping' concept from what I can tell, but this does not add any inheritance into the sub elements except that of parent/child. The concept of namespaces gives you the inheritance that I think you want.
The base element (<nml:relation type="serialcompound">) has the problem that it is hard to create a meaningful syntax validator.
What syntax do you need to validate that would be drastically different than 'nml:relation'? If you can cite examples, we would all be happy to read and critique them.
The subelement (<nmlserialcompound:relation>) has the problem that all parsers would need to know about the nmlserialcompound schema in advance, which hinders extensibility. (*)
See my prior mail - if you rely only on syntactic validation yes, it would be rejected. If you have a rich set of semantic validation, you can insert rules that help you in this regard. E.g. the semantic rules can be constructed so that the hierarchy is well defined, and the service knows that 'nmlserialcompound:relation' is just a specific form of 'nml:relation', thus some of the meaning can be extracted. So far it appears that you are trying to get all of the benefits of semantic validation directly into the schema level. I will point out that this absolutely destroys extensibility. If the goal is to make a schema that can be extended into other use cases beyond what our tiny minds can imagine, we need to be sure that we are not locked in to a schema that has too many rules that would hinder the ability to alter it for other use cases.
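One way to picture such a semantic rule is a hierarchy table held by the service, as in the minimal sketch below. The table contents, the `IMPLEMENTED` set and the `resolve` helper are hypothetical and written in Python for illustration; they are not taken from perfSONAR/NMC code.

```python
# Hypothetical hierarchy table: each qualified name maps to its parent,
# or to None when it is the root of its hierarchy.
PARENT = {
    "nmlserialcompound:relation": "nml:relation",
    "nml:relation": None,
}

# Element types this particular service has logic for.
IMPLEMENTED = {"nml:relation"}


def resolve(name):
    """Return the nearest implemented ancestor of name, or None.

    A service without specific logic for an extension falls back to the
    closest form it does understand, provided the hierarchy is known to it.
    """
    current = name
    while current is not None:
        if current in IMPLEMENTED:
            return current
        current = PARENT.get(current)
    return None


known_extension = resolve("nmlserialcompound:relation")
unknown_extension = resolve("jason:relation")
```

The first lookup falls back to `nml:relation`. The second returns `None`, because an extension missing from the table cannot be placed in the hierarchy by the service.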
The subelement with extra parent element (<nml:relations><nmlserialcompound:relation>) does not have either of these two problems.
I still do not understand how you are able to draw the conclusion that by adding in one more element, you eliminate all of the ills you note above. You do not defeat needing to be aware of the schema beforehand, and you do not add a meaningful semantic rule into the schema. This really just seems like the addition of a new element to add an illusion of semantic meaning. Thanks; -jason
Which of these 3 statements do you disagree with?
Freek
(*) I'm aware of some subtleties regarding my statement on the disadvantage of <nmlserialcompound:relation> -- a parser may still ignore it, not knowing it is a relation subclass -- I'm most happy to elaborate on that if you think that <nmlserialcompound:relation> is a better option than <nml:relations><nmlserialcompound:relation>.