
Hi Freek;

Answers inline. If you need a quick reference for RELAX NG, consider reading the online documentation: http://books.xmlschemata.org/relaxng/page2.html

On 6/20/11 11:36 AM, thus spake Freek Dijkstra:
Hi,
Today, I was trying to create and improve an example topology file based on the RNC schema.
Unfortunately, the current RNC schemata do not validate when used with a stricter parser. We tried last week with Jing-Trang, and that gave no errors. Today, I tried with http://validator.nu/ and got a few more errors.
Look into using MSV (it works with many different schema languages): http://msv.java.net/ We use it alongside Trang/Jing. I have never used the website you mention, so I can't comment on whether it's useful. I have typically found it best to use Trang to convert the RNC schema into other forms (RNG, and then XSD) and then use one of those schema languages for instance verification. I believe the workflow looks like this:

Trang -> RNC to RNG
Trang -> RNG to XSD
MSV -> validate XML against RNG or XSD

Validating against the RNC can sometimes produce ambiguous parse errors for some of the items you note below (e.g. anyElement); converting can strengthen the meaning of the schema and remove ambiguous paths in the grammar.
Could someone answer my noob questions on RNC? (Either on-list or off-list).
1) What is the difference between

Lifetime = element lifetime { StartTime, (EndTime | Duration)? }

and

Lifetime = element lifetime { StartTime & (EndTime | Duration)? }

and which one should I use? The goal is a lifetime element with a start element (defined in the StartTime rule) and optionally an end OR duration element (respectively defined in the EndTime and Duration rules).
& = joining things without caring about the order.
, = joining things and enforcing the order.

In your first example above, the start element has to come first, so you would only be able to do:

<lifetime>
  <startTime></startTime>
  <endTime></endTime>
</lifetime>

The second would also allow something like this (which the first would treat as out of order):

<lifetime>
  <endTime></endTime>
  <startTime></startTime>
</lifetime>

In our experience we try to avoid the comma when we can. Enforcing ordering in an XML document puts the burden on the tools (or humans) creating the XML to emit it in exactly the order the schema mandates, instead of letting the XML be 'structured' by nesting elements without caring about the order in which sibling elements appear. My personal opinion would be to always use '&' and avoid enforcing order, since I still do not believe a 'list' element is required.
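To make the comparison concrete, here is a minimal sketch of both variants side by side. The element names (startTime, endTime, duration), the datatypes, and the rule names OrderedLifetime and UnorderedLifetime are assumptions for illustration; the real StartTime, EndTime, and Duration rules are defined in the schema under discussion.

```rnc
# Assumed definitions, for illustration only.
StartTime = element startTime { xsd:dateTime }
EndTime   = element endTime   { xsd:dateTime }
Duration  = element duration  { xsd:duration }

# Sequence (,): startTime must come first, optionally followed by
# either endTime or duration.
OrderedLifetime = element lifetime { StartTime, (EndTime | Duration)? }

# Interleave (&): the same children are allowed, in either order.
UnorderedLifetime = element lifetime { StartTime & (EndTime | Duration)? }
```

In both variants the end or duration element stays optional, and at most one of the two may appear; only the ordering constraint differs.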
2) I read on http://relaxng.org/compact-tutorial-20030326.html that the order is relevant in RNC. Thus <location><latitude>51.5155</latitude><longitude>-0.0922</longitude></location> is different from <location><longitude>-0.0922</longitude><latitude>51.5155</latitude></location>. Is there a way to specify in the RNC schema that this order is irrelevant in the XML?
See above, the use of & and ,
3) The NML Group is, by its current definition, recursive: a Group is an NML NetworkObject, and a Group can contain NML NetworkObjects, thus including other Groups. I have a problem with such recursive definitions in RNC. At least the validator complains about patterns defined later on in the document. Can't I do that, or am I just doing something wrong? (I'm happy to provide off-list the URLs of the RNC schema and example topology file I'm currently working on, so you can see the errors for yourself.)
I don't understand this question/problem. Is this a problem of not being able to validate something, or is it just a perception problem where a recursive definition personally bothers you? Messages from the parser (along with which parser is being used and how it was invoked) would be helpful.
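For reference, RELAX NG allows a pattern to be referenced before it is defined, and allows recursion as long as the recursive reference sits inside an element pattern. A minimal sketch, using hypothetical names rather than the actual NML schema:

```rnc
# Hypothetical names; this is not the actual NML schema.
start = element topology { NetworkObject* }

# Group is referenced here before its definition below;
# the order of definitions in a grammar does not matter.
NetworkObject = Node | Group

Node = element node { attribute id { xsd:anyURI } }

# Recursive: a group may contain nodes and other groups.
# The recursion passes through the "group" element pattern.
Group = element group { attribute id { xsd:anyURI }, NetworkObject* }
```

If a schema shaped like this is rejected, the parser's exact message should show whether the complaint is about the recursion itself or about something else in the referenced patterns.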
4) In the current RNC schema, extensibility was ensured using the "anyElement" rule. E.g.

BasePortContent = NetworkObject & element capacity { xsd:float }? & anyElement*

Unfortunately, the validator complained about this.
Was it a 'warning' or an 'error'? The two have different implications. For example, a common warning that we have seen is 'choice between attributes and children cannot be represented; approximating', which is frequently caused by the use of anyElement. This will sometimes result in an ambiguous XSD being generated, but it can still be used to validate instance documents.
When checking a document, it is unclear whether a "location" element should be parsed according to the second rule (element capacity { xsd:float }?) or the third rule (anyElement*). When reading about this, it was suggested to remove the anyElement* from the BasePortContent, since it is still possible to add new allowed elements in the following way:

BasePortContent = NetworkObject & element capacity { xsd:float }?
# later extension:
BasePortContent &= element my_extension { xsd:string }?
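Another way to keep a wildcard while removing the overlap, sketched here as an assumption rather than something taken from the current schema, is to exclude the explicitly declared names from the wildcard's name class. The anyElement definition below follows the pattern from the RELAX NG compact tutorial; anyOtherElement is a hypothetical rule name, and NetworkObject is assumed to be defined elsewhere in the schema.

```rnc
# Assumed definition of anyElement (tutorial-style wildcard).
anyElement = element * { (attribute * { text } | text | anyElement)* }

# Hypothetical wildcard that excludes "capacity", so a capacity
# element can only match the dedicated rule below.
anyOtherElement =
  element * - capacity { (attribute * { text } | text | anyElement)* }

BasePortContent =
  NetworkObject
  & element capacity { xsd:float }?
  & anyOtherElement*
```

Any other element declared explicitly in the same interleave (for instance inside NetworkObject) would need to be added to the exclusion list as well, e.g. `* - (capacity | location)`.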
This is one of the dangers/benefits of using anyElement. In practice, define it as 'late' as possible in any ruleset, and the parser is smart enough to choose the longest match (e.g. location) first. Without seeing what you have done, in terms of how the tools were called and the error messages they displayed, I won't be able to comment further.

Thanks;
-jason
I have some more questions, but these were the most important ones. If some RNC expert could help me out or point me in the right way, GREAT!
Freek