
Hi Freek;

Answers inline:

On 8/23/11 5:36 AM, thus spake Freek Dijkstra:
Jason Zurawski wrote:
Last week's mail conversation drifted from XML syntax for NML relations to the use of namespaces in NML messages.
An important difference in view was identified. Jason assumed that a single NML message would only contain one namespace.
I never said nor implied this in any way
Sorry if you feel I jumped to conclusions. You indeed only wrote:
to my knowledge a parser can only verify against a single schema at any given time.
Perhaps we still need to take a few steps back.
Do you think that an NML message may contain multiple namespaces?
Do you agree with the following requirements I wrote earlier:
1. Be extensible
2. It should be possible to create a specific validator for each relation type.
3. Parsers should be able to recognise an unknown relation type as a relation subclass (rather than simply an unknown element)
If you have time to phone today, that would be great.
You are conflating several concepts and using them interchangeably. I believe this is what is causing the confusion. To be clear, I am going to ask once again that you please (*please*) attempt to read some of the prior art from NMC/perfSONAR. The reason I keep bringing this up is twofold:

a) the examples are short and easy to understand. Instead of going around and around on email, we could make up a lot of ground starting from known examples.
b) it is working in practice today, and mimics the needs of NML in the extensibility space.

Consider this "schema file":

https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/sch...

It represents the construction of one type of message (e.g. the "SetupDataRequest" message, specifically for utilization data). Note some interesting things about it:

- It represents a single 'schema', i.e. it is one file that contains the definitions to verify one specific message type only.
- It incorporates several other 'schema' definitions through inclusion (e.g. 'include xxx { ... }').
- It features *several* namespaces, and elements in this same 'schema' file (or the other files) may use these namespaces.
- Example instances that can be verified against this schema can be found here:
  https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/req...
  https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/req...
  https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/req...
  https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/req...
  https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/req...

To address your concerns above: a parser (and when I say parser I am imagining something like libxml) is allowed to verify an instance against one "schema file" at a time.
This schema file may feature 'includes', thus expanding the available definition space, but there are no options (at least in my experience) that allow the programmer to give the parser some set of files and tell the parser that *"any"* of the possible files may contain the correct definition. In my opinion, giving the parser multiple options like this would really defeat the purpose of syntactic checking. If we are going to play the 'cut and paste' game using prior statements, here is the entire context of what I said on this topic, so that you can see that this is what I said before as well:
On 8/16/11 4:54 PM, thus spake Jason Zurawski: [snip]
to my knowledge a parser can only verify against a single schema at any given time.
To my knowledge it is possible for a parser to validate against multiple schema at the same time.
In my experience (libxml, some older Java libraries) a single schema is loaded into the parser. It is possible to reference schema from each other, e.g. in relax:

    include "something.rnc" {
        # include things
        ...
    }

Trying to validate the same instance against different schemata simultaneously does not seem like a very fruitful exercise for a parser, unless multiple parsing passes are being applied. If the latter is true, I would argue that more time is being spent in syntax checking than in the real guts of semantic evaluation.
To address your final concerns:
1. Be extensible
Yes, and the methods of NMC/perfSONAR we have been talking about all along enable this.
2. It should be possible to create a specific validator for each relation type.
Schema is schema; you can construct whatever type of validation system you wish to implement. I would question how far you would want to take this exercise, because there are tradeoffs that sacrifice other desirable qualities. My statement from our prior conversation still stands: if you wish to do strict syntactic validation, to the point of trying to use the parser as a semantic analyzer as well, you give up a portion of #1; this is the tradeoff that must be considered. For example:

a)
    <relation type="something">
        <link />
        <link />
    </relation>

vs. b)
    <somethingrelation>
        <link />
        <link />
    </somethingrelation>

vs. c)
    <something:relation type="something">
        <link />
        <link />
    </something:relation>

I would argue that a) is our base: it is generic and minimal. It allows the construction of any number of relationship types, as required for most situations. Someone who needs something different/special that cannot be done in the base has two choices: b) or c).

The b) option is the creation of a new element, something that *does not* derive from the base, and therefore cannot be cast into the base element. This is not extension. For the simultaneous strict syntactic/semantic checking done by the parser alone, this allows someone to claim that 'somethingrelation' is very much different from 'relation', and perhaps this is what they need.

The c) option is an extension namespace of the a) element. There is the opportunity to try to downcast it into the original element, and the ability to add 'new' things that were not thought of in the base. Syntactic checking can add *some* semantics in this case, perhaps not as much as with b). This is much more extensible, and I would claim more desirable, for NML. It is what is used in NMC/pS today.
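To make the a)/b)/c) distinction concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It is not NMC/perfSONAR code: the namespace URIs (http://example.org/nml/base, http://example.org/nml/something), the extra <something:extra /> child, and the downcast() function are hypothetical, chosen only for illustration. The idea it shows is that a service (not the parser) can recognise a) and c) as relations by their local name 'relation' and downcast c) into the base form, keeping the type and the links, while b) gives it nothing to hold on to.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, for illustration only.
BASE_NS = "http://example.org/nml/base"       # namespace of the base form a)
EXT_NS = "http://example.org/nml/something"   # namespace of the extension c)

FORM_A = (f'<relation xmlns="{BASE_NS}" type="something">'
          '<link /><link /></relation>')
FORM_B = (f'<somethingrelation xmlns="{BASE_NS}">'
          '<link /><link /></somethingrelation>')
FORM_C = (f'<something:relation xmlns:something="{EXT_NS}" type="something">'
          '<link /><link /><something:extra /></something:relation>')


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def downcast(xml_text):
    """Return a base-namespace <relation> built from the input, or None.

    Any element whose local name is 'relation' (form a or c) is treated as a
    relation: its 'type' attribute and <link> children are kept, and any
    extension-only children are dropped. Form b) has no such hook: None.
    """
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "relation":
        return None
    base = ET.Element(f"{{{BASE_NS}}}relation")
    if root.get("type") is not None:
        base.set("type", root.get("type"))
    for child in root:
        if local_name(child.tag) == "link":
            base.append(copy.deepcopy(child))
    return base


for name, doc in (("a", FORM_A), ("b", FORM_B), ("c", FORM_C)):
    rel = downcast(doc)
    if rel is None:
        print(name, "not recognisable as a relation")
    else:
        print(name, rel.get("type"), len(rel), "links")
```

Downcasting rather than rejecting is a policy choice made in the service; a stricter service could return an error for an unknown namespace on 'relation' instead, without touching the parser.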
3. Parsers should be able to recognise an unknown relation type as a relation subclass (rather than simply an unknown element)
Every parser is different in this respect, and I am not going to be able to give you a concrete answer. This is exactly why perfSONAR does not do strict syntactic checking at the parser level, and favors semantic checks in the service itself. Relying on a strict schema that mandates syntax does not foster extensibility. There are two outcomes when an 'unknown' element comes in:

a) Strict syntactic checking will, in most cases, reject the entire instance without comment. E.g. you have constructed your schema, and the parser knows of some number of elements, each having a possible namespace (or namespaces, depending on how the schema is constructed). If an unknown element comes in, many parsers will simply reject the entire document. Certain types of event-driven parsers may be able to panic-parse around something like this, but I do not have much experience with them. I would estimate that more time would be spent constructing a special parser in this case, just to work with the strict schema, than is healthy.

b) Semantic checking, which is what we have the most experience with, takes all documents as-is, does some combination of syntactic/semantic checking within the service itself, and can be made as permissive as required for certain situations. E.g. an unknown namespace on a common element (e.g. relation) can be rejected, or the element can be downcast into the base schema (we normally do the latter).

Hope this all helps;

-jason