choosing XML Document structure for GLUE 2.0 rendering

Dear all,
I've set up a poll to collect your preferences on how GLUE 2.0 should be structured as an XML document. Please express your opinion by December 12:
http://www.doodle.ch/participation.html?pollId=sg4v8qvy3h4h6d9t
Cheers, Sergio
-- 
Sergio Andreozzi
INFN-CNAF, Tel: +39 051 609 2860
Viale Berti Pichat, 6/2 Fax: +39 051 609 2746
40126 Bologna (Italy) Web: http://www.cnaf.infn.it/~andreozzi

Sergio Andreozzi wrote:
I've set up a poll to know your preference for the GLUE 2.0 structure as XML document. Please, express your opinion by December 12:
http://www.doodle.ch/participation.html?pollId=sg4v8qvy3h4h6d9t
This is a gentle reminder about the vote on the XML document structure. Please express your opinion by December 12. If you choose option 3 or 6, you are invited to send your alternative as well.
Cheers, Sergio

Sergio,
There is a 3rd option, similar to B but simpler, where the AdminDomainID appears directly in each Service:
  <Grid>
    <AdminDomain>
      <ID>A</ID>
      <Name>...</Name>
      ...
    </AdminDomain>
    <Service>
      <ID>B</ID>
      <AdminDomainID>A</AdminDomainID>
      <Name>..</Name>
      <Endpoint>
        <ID>F</ID>
      </Endpoint>
    </Service>
    <Service>
      <ID>C</ID>
      <AdminDomainID>A</AdminDomainID>
      <Name>..</Name>
      <Endpoint>
        <ID>E</ID>
      </Endpoint>
    </Service>
  </Grid>
Could this be considered?
Regards, JP Navarro
On Dec 10, 2007, at 10:43 PM, Sergio Andreozzi wrote:
Sergio Andreozzi wrote:
I've set up a poll to know your preference for the GLUE 2.0 structure as XML document. Please, express your opinion by December 12:
http://www.doodle.ch/participation.html?pollId=sg4v8qvy3h4h6d9t
this is a gentle reminder for the voting about XML document structure. Please, express your opinion by December, 12. If you choose for option 3. or 6. you are invited to send your alternative as well.
Cheers, Sergio

Hi JP,
JP Navarro wrote:
Sergio,
There is 3rd option, similar to B but simpler, where the AdminDomainID appears directly in each Service:
  <Grid>
    <AdminDomain>
      <ID>A</ID>
      <Name>...</Name>
      ...
    </AdminDomain>
    <Service>
      <ID>B</ID>
      <AdminDomainID>A</AdminDomainID>
      <Name>..</Name>
      <Endpoint>
        <ID>F</ID>
      </Endpoint>
    </Service>
    <Service>
      <ID>C</ID>
      <AdminDomainID>A</AdminDomainID>
      <Name>..</Name>
      <Endpoint>
        <ID>E</ID>
      </Endpoint>
    </Service>
  </Grid>
Could this be considered?
Sure, we can consider this. What advantage do you see in this over the other options? Also, can I ask you to investigate which aggregation-level option is the most MDS4-friendly? What happens in the case of the aggregation-level strategies named A, B, and yours? (http://forge.ogf.org/sf/wiki/do/viewPage/projects.glue-wg/wiki/GLUE2XMLSchem...)
Cheers, Sergio
Regards,
JP Navarro

On Dec 11, 2007, at 6:18 PM, Sergio Andreozzi wrote:
sure we can consider this. What is the advantage of this vs. the other options that you like?
It is simpler for one entity to reference another than to create a 3rd entity that describes the relationship. Creating a 3rd entity that describes a relationship between two other entities makes the most sense when you can't alter the related entities (because they're already defined or someone else owns them), or when they are generally unrelated except in a special case of limited interest. If the intention is that Services should generally be associated with AdminDomains, it's more straightforward to describe this as service attributes.
And also, can I ask you to investigate what is the most MDS4-friendly aggregation level option? What does happen in case of aggregation-level strategies named A/B/yours? (http://forge.ogf.org/sf/wiki/do/viewPage/projects.glue-wg/wiki/GLUE2XMLSchema)
MDS4-friendliness wasn't a factor in the above suggestion. I believe MDS4 is neutral about how entities relate to each other.
Regards, JP
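To make the trade-off concrete, here is a minimal Python sketch of how a consumer might resolve the AdminDomainID reference in the flat layout suggested above. Only the element names come from the example in this thread; the sample document, its IDs and names, and the helper services_by_domain are illustrative assumptions, not part of any agreed GLUE rendering.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample document in the flat style; IDs and names are made up for illustration.
DOC = """
<Grid>
  <AdminDomain><ID>A</ID><Name>Example domain</Name></AdminDomain>
  <Service><ID>B</ID><AdminDomainID>A</AdminDomainID><Name>svc-b</Name>
    <Endpoint><ID>F</ID></Endpoint></Service>
  <Service><ID>C</ID><AdminDomainID>A</AdminDomainID><Name>svc-c</Name>
    <Endpoint><ID>E</ID></Endpoint></Service>
</Grid>
"""

def services_by_domain(xml_text):
    """Group Service IDs under the AdminDomain ID each Service references."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for service in root.findall("Service"):
        grouped[service.findtext("AdminDomainID")].append(service.findtext("ID"))
    return dict(grouped)

print(services_by_domain(DOC))
```

The grouping step is the "join" that a nested layout would avoid; whether that cost matters depends on the queries the group wants to support.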

Hi Sergio, rest-of-list,
On Tuesday 11 December 2007 06:43:49 Sergio Andreozzi wrote:
http://www.doodle.ch/participation.html?pollId=sg4v8qvy3h4h6d9t
this is a gentle reminder for the voting about XML document structure. Please, express your opinion by December, 12. If you choose for option 3. or 6. you are invited to send your alternative as well.
Sorry, I'm new to this discussion, but the current proposals don't make much sense to me, and looking at the discussion [1] confuses me further. At the risk of disrupting this process, I'd like to ask some questions...
[1] http://forge.ogf.org/sf/wiki/do/viewPage/projects.glue-wg/wiki/GLUE2XMLSchem...
First, I see that one of the rules is that ID is an element. Why is this? It seems we're reinventing the wheel here: XML already defines the concept of an ID (see [2] and [3]), which is used in related standards ([4], [5], etc.). Why not just use this?
[2] http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-attribute-types
[3] http://www.w3.org/TR/2005/REC-xml-id-20050909
[4] http://www.w3.org/TR/1999/REC-xslt-19991116
[5] http://www.w3.org/TR/1999/REC-xpath-19991116
Is the plan to render (nearly) everything as elements rather than attributes? GLUE has many items that have "required" (1) or "optional" (0..1) cardinality and contain no further markup, so I feel they would, for the most part, be better rendered as XML attributes.
Is the proposed GLUE/XML intended to be used by services when they publish information about themselves for site-level aggregation? If so, the current proposal (the rule that One-to-Many relationships are represented as parent-child) looks as if a Service must know site-level information (such as AdminDomain). This is undesirable for data normalisation.
As an alternative, suppose One-to-Many relationships were represented either as an XML element hierarchy or (for top-level elements only) as an attribute ("parent", say) that has type URI and contains the URI of the containing element's ID. A service could publish its information and only have to know the parent element's URI.
Finally (just as a general comment), my impression is that there is too great an emphasis on XML Schema; because of this, the GLUE/XML rendering appears hampered by the limitations of XSD, and the rules are designed as if the XML has to fit what XSD supports (e.g., the "extensible enumerations" section). If so, I feel this is "putting the cart before the horse": the XML should convey a precise and compact representation of the schema, whilst being easy to parse and comprehend. "Hacks" to support extensibility in the XSD, like <State> vs <RunningState>, obfuscate the XML in favour of the XML passing XSD validation checks.
(I'm in favour of providing a validation mechanism, but does the validation need to be strong? If it's a choice between a simple XML design that can only be validated weakly via XSD and a complex XML that can be strongly validated, I'd prefer the former.)
Cheers, Paul.

Hello Paul,
Paul Millar wrote:
Hi Sergio, rest-of-list,
On Tuesday 11 December 2007 06:43:49 Sergio Andreozzi wrote:
http://www.doodle.ch/participation.html?pollId=sg4v8qvy3h4h6d9t
this is a gentle reminder for the voting about XML document structure. Please, express your opinion by December, 12. If you choose for option 3. or 6. you are invited to send your alternative as well.
Sorry I'm new to this discussion, but the current proposals don't make much sense to me. Looking at the discussion[1] further confuses me. At the risk of disrupting this process, I'd like to ask some questions...
[1] http://forge.ogf.org/sf/wiki/do/viewPage/projects.glue-wg/wiki/GLUE2XMLSchem...
First, I see that one of the rules is ID is an element. Why is this? It seems we're reinventing the wheel here: XML already defines the concept of an ID (see [2] and [3]), which is used in related standards ([4], [5], etc...). Why not just use this?
[2] http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-attribute-types [3] http://www.w3.org/TR/2005/REC-xml-id-20050909 [4] http://www.w3.org/TR/1999/REC-xslt-19991116 [5] http://www.w3.org/TR/1999/REC-xpath-19991116
The previous version of the XML rendering proposal had the ID as an attribute; then, after a discussion in the last telecon, we agreed to change it to an element. I actually do not have a strong opinion on this. As regards your references, the most interesting to me is [3]. I would say that we are not reinventing the wheel, because we are doing something different. [3] defines a way to attach a unique ID (unique within an XML document) to an XML element. We are defining a property of a Grid concept (ID) which is supposed to be globally unique and is a URI. From a semantic viewpoint they are different. They sit in different namespaces, so there should be no problem with that (if you see problems, please let me know).
Is the plan to render (nearly) everything as elements rather than attributes?
in the last telecon, we agreed that we'll use attributes only for metadata-like properties (basically CreationTime and Validity, see Sec. 4.1 of the spec), while all the rest will be mapped to XML elements.
GLUE has many items have "required" (1) or "optional" (0..1) cardinality and contain no further markup, so I feel they would, for the most part, be better rendered as an XML attributes.
In my experience, this choice is mainly a matter of style. Attributes can only be of simple types and single-valued. Going for elements gives more flexibility for future changes and is probably also more usable (when writing queries, people don't have to remember which properties are single-valued, i.e. attributes, and which are multi-valued, i.e. elements).
Is the proposed GLUE/XML intended to be used by services when they publish information about themselves for site-level aggregate? If so, the current proposal (the rule that One-to-Many relationships are represented as parent-child) looks as if a Service must know site-level information (such as AdminDomain). This is undesirable for data normalisation.
The proposal is intended to be used both by primary services (e.g., OGSA-BES, SRM) that want to advertise their characteristics and by information services (both primary publishers and aggregators). For primary services, the only constraint is to know the ID of their AdminDomain. That's all. They are not supposed to publish other AdminDomain attributes. The AdminDomain ID will be used to perform the aggregation at the higher level. I prefer Option A because it looks easier to make queries by AdminDomain (no need for a join). And at the aggregation level, all the info for a given AdminDomain is aggregated under a single element. I don't know how MDS 4 performs aggregation at the higher level, or whether this is compatible with its strategies. This is something to be investigated.
As an alternative, suppose One-to-Many relationships be represented as either an XML element hierarchy or (for top-level elements, only) as an attribute ("parent", say) that has type URI and contains the URI of the containing element's ID. A service could publish its information and only have to know the parent element's URI.
Yep, this is an option as well. Many options are available. We should probably take a step back and clarify what we want to optimize. In my opinion, we should concentrate on giving the final user the easiest and most intuitive way to query the properties. For sure, we need more experience with this, with a number of queries written for the different approaches. One advantage of option A that I like is that a query remains valid whether you query the primary source of information or the aggregated layer. Consider, for instance, a simple XPath asking for a service of type org.glite.wms that is part of a certain AdminDomain:
/glue:Grid/AdminDomain[ID='urn:admindomain:t1.infn.it']/Service[Type='org.glite.wms']
This query works at both the primary source level and the aggregated level, and also looks quite simple to me. Of course, we need a larger set of queries to use for evaluation.
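A rough way to try this out is sketched below in Python, running the same query against a primary-style and an aggregated-style document under option A. The two sample documents, the service IDs, and the placeholder type org.example.other are assumptions made up for illustration. Namespaces are left out of the sketch, and because Python's ElementTree only accepts paths relative to an element, the query starts at the Grid root with "./" instead of "/glue:Grid".

```python
import xml.etree.ElementTree as ET

# Same query for both documents, written relative to the <Grid> root element.
QUERY = "./AdminDomain[ID='urn:admindomain:t1.infn.it']/Service[Type='org.glite.wms']"

# Hypothetical primary document: one service, wrapped in its AdminDomain.
PRIMARY = """<Grid>
  <AdminDomain><ID>urn:admindomain:t1.infn.it</ID>
    <Service><ID>urn:service:wms1</ID><Type>org.glite.wms</Type></Service>
  </AdminDomain>
</Grid>"""

# Hypothetical aggregated document: several services and domains.
AGGREGATED = """<Grid>
  <AdminDomain><ID>urn:admindomain:t1.infn.it</ID><Name>T1</Name>
    <Service><ID>urn:service:wms1</ID><Type>org.glite.wms</Type></Service>
    <Service><ID>urn:service:other1</ID><Type>org.example.other</Type></Service>
  </AdminDomain>
  <AdminDomain><ID>urn:admindomain:elsewhere</ID>
    <Service><ID>urn:service:wms2</ID><Type>org.glite.wms</Type></Service>
  </AdminDomain>
</Grid>"""

def matching_service_ids(xml_text):
    """Return the IDs of the services selected by QUERY."""
    root = ET.fromstring(xml_text)
    return [service.findtext("ID") for service in root.findall(QUERY)]

print(matching_service_ids(PRIMARY))
print(matching_service_ids(AGGREGATED))
```

Both calls select the same service, which is the property being argued for; the sketch says nothing about how the primary document would be produced.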
Finally (just as a general comment) my impression is that there is too great an emphasis on XML Schema; because of this, the GLUE/XML rendering appears hampered by limitations of XSD and the rules are designed as if the XML is to fit what XSD supports (e.g., the "extensible enumerations" section). If so, I feel this is "putting the cart before the horse": I feel the XML should convey precise and compact representation of the schema, whilst being easy to parse and comprehend. "Hacks" to support extensibility in the XSD, like <State> vs <RunningState>, obfuscate the XML in favour of XML passing XSD validation checks.
We are trying to find the right balance, mainly preserving ease of use. In the rules, I mentioned the option of SubstitutionGroups for completeness, but this is not the currently selected option. At the moment, we prefer to go for the annotation option.
(I'm in favour of providing a validation mechanism, but does the validation needs to be strong? If it's a choice between having a simple XML design that can only be validated weakly via XSD or a complex XML that can be strongly validated, I'd perfer the former.)
Yep, me too. Thanks for your constructive feedback. I hope we can dedicate one more call before Xmas to the XML rendering, so that we can refine all these choices and align on the rationale behind them. Please keep contributing, as opinions from different perspectives help us make better choices.
Cheers, Sergio
Cheers,
Paul.

Hi Sergio, I've interleaved my comments below. For the most part, the comments are mildly in favour of using xml:id; but I'm concerned that the primary information XML will be unnecessarily hard for the information providers. On Wednesday 12 December 2007 01:11:15 Sergio Andreozzi wrote:
Paul Millar wrote:
First, I see that one of the rules is ID is an element. [....] [3] http://www.w3.org/TR/2005/REC-xml-id-20050909
the previous version of the XML rendering proposal had the ID as attribute, then after a discussion in the last telecon we agreed to change it to element. I actually do not have a strong opinion on this.
Yes, I don't feel this is a big issue either. I think there's an opportunity to use an existing standard. We might get some leverage if GLUE/XML uses the attribute-based xml:id ID, but it's certainly not essential.
As regards your references, the most interesting to me is [3]. I would say that we are not reinventing the wheel because we are doing something different. [3] defines a way to attach a unique ID (unique within an XML document) to an XML element.
True, although I think the emphasis with xml:id is on providing a unique reference point in a schema-type invariant way. GLUE could define an attribute within its namespace (via XSD, as glue:ID, for example). But, by using xml:id, an XML parser (that supports xml:id) can infer that the attribute has type unique-ID without having to understand the definition in the DTD / XSD / RelaxNG etc. Because of this, simple (non-validating) parsers can still identify xml:id as a "global document identifier" and treat it accordingly. The benefit for us is that we don't have to provide DTD and XSD and RelaxNG and ... for an XML parser to understand what xml:id "means". The GLUE/XML implementation(s) may choose to provide these, but it's optional.
We are defining a property of a Grid concept (ID) which is supposed to be globally unique and is a URI.From a semantical viewpoint they are different. They sit in different namespaces, therefore there should be no problem for that (if you see problems, please let me know).
I'm not sure I completely follow you here. The two are separate namespaces, but have similar properties. So, isn't mapping the GLUE ID as xml:id a choice GLUE/XML is free to make?
The GLUE Schema's ID attribute ("GLUE-ID" to prevent confusion) is a globally unique URI: a unique "name" within any aggregation of valid GLUE. (Presumably it's a URI to allow easy delegation of the namespace within a distributed community.) The ID attribute is a required value for certain GLUE components (Service, UserDomain, AdminDomain, ...).
The current GLUE/XML mapping (as far as I understand it) provides an XML element for each major GLUE grid component; in particular, those components that require a GLUE-ID are represented as XML elements.
The XML attribute xml:id describes a globally unique string: a unique "name" within any aggregation of valid XML. So, one can injectively map GLUE-ID into xml:id; i.e., any valid GLUE-ID can be written as a unique, valid xml:id.
Whilst there is no requirement for GLUE/XML to use xml:id (as you say, the two are separate), there's also no reason not to. The GLUE/XML mapping is free to define that xml:id is to be used or (as currently) to use a schema-specific declaration: the ID element.
Here is a list of the advantages and disadvantages I could think of:
Advantages of using xml:id:
o it's the W3C-recommended way of doing "this sort of thing",
o ID-like semantics are built into parsers that support xml:id (which might not support more general validation),
o potential "reuse" of GLUE-ID with other XML software and standards,
o there is no GLUE-specific behavior when combining different GLUE XML files: no need to hard-code the value or derive behavior from some DTD/XSD/...
o .. others? ..
Disadvantages of using xml:id:
o the mapping between GLUE-ID and xml:id is not surjective: there are valid xml:id values that are not valid GLUE-ID values (does this matter?)
o xml:id is an attribute rather than an element.
o some issues with Canonical XML (although xml:id considers xml-c14n to be broken in this and some other respects)
o .. others? ..
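One point that may be worth checking against the list above: as far as I recall, the xml:id Recommendation requires the attribute value to be an NCName, which excludes characters such as ":" and "/". If that reading is right, a URI-valued GLUE-ID would need some encoding step before it could be used as an xml:id. The Python sketch below uses an ASCII-only approximation of NCName (the real production also admits many non-ASCII characters), so treat it as a rough check only; the function name looks_like_xml_id is hypothetical.

```python
import re

# ASCII-only approximation of the XML NCName production: a letter or
# underscore, followed by letters, digits, '.', '-' or '_'. No ':' or '/'.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def looks_like_xml_id(value):
    """Return True if value matches the simplified NCName pattern above."""
    return NCNAME_ASCII.fullmatch(value) is not None

# IDs taken from examples in this thread, plus one made-up plain name.
for candidate in ("urn:admindomain:t1.infn.it", "glue://gla.scotgrid.ac.uk/SE", "SE1"):
    print(candidate, looks_like_xml_id(candidate))
```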
Is the plan to render (nearly) everything as elements rather than attributes?
in the last telecon, we agreed that we'll use attributes only for metadata-like properties (basically CreationTime and Validity, see Sec. 4.1 of the spec), while all the rest will be mapped to XML elements.
[Maybe section 4.2 ("metadata"), rather than 4.1.]
GLUE has many items have "required" (1) or "optional" (0..1) cardinality and contain no further markup, so I feel they would, for the most part, be better rendered as an XML attributes.
given my experience, this choice is mainly a matter of style. Attributes can be only of simple types and single-value. Going for elements gives more flexibility for future changes and also is probably more usable (people don't have to remember which properties are single value, i.e. attributes or multi-value. i.e. elements when writing queries).
Sure, this isn't a big deal and is largely a matter of style. Always using elements does tend to inflate the document size, which may matter when providing a large amount of information. There are some GLUE attributes that could probably be rendered as XML attributes, but it's no big deal.
[Problem with primary producer having to know too much]
the proposal is intended to be used by both primary services (e.g., OGSA-BES, SRM) which want to advertise their characteristics and by information services (both primary publishers and aggregators). For primary services, the only constraint is to know the ID of their AdminDomain. That's all. They are not supposed to publish other AdminDomain attributes.
OK, but the example primary document "A" (P.A option, when voting) contained more information than this: it showed a complete hierarchy, as if the service were alone in the Grid.
The AdminDomain ID will be used to perform the aggregation at the higher-level.
The reason for which I prefer Option A is because it looks easier to make queries by AdminDomain (no need for join). And at the aggregation level, you have all info under a certain AdminDomain aggregated under a single element.
N.B. Here, I'm referring to my option P.O [4], where the primary information is presented as a sub-tree of the full GLUE/XML. This is analogous to how DocBook provides aggregation, where files may (individually) contain a Book (or Article), Part, Chapter, and so on. Aggregation happens through "other means" (with DocBook this is typically via XInclude; with the toy example [4] it is included in the XSLT).
[4] http://www.ogf.org/pipermail/glue-wg/2007-December/000249.html
I'm not sure I follow how it is easier to make queries: the queries (against the complete, aggregated GLUE/XML infoset) are just as easy.
However, the problem I see is that, if the storage element were to provide information that is directly queryable (with queries identical to those against the final GLUE/XML), the info-provider would need to know its ancestor hierarchy (parent, parent's parent, etc.); specifically, how many domains (and of what type) are "above" it.
For example, suppose a Tier-2 site has three AdminDomains within its combined Domain. The final (aggregated) published XML would look like:
  <Grid>
    <Domain>
      <Name>SCOTGRID</Name>
      <Description>Scotland's distributed grid site</Description>
      <!-- Further Domain-level information here -->
      <AdminDomain>
        <Name>SCOTGRID-GLA</Name>
        <Description>The ScotGrid site at University of Glasgow</Description>
        <!-- Further AdminDomain-level information here -->
        <StorageService>
          <!-- Further StorageService information here -->
          <StorageResource>
            <ID>glue://gla.scotgrid.ac.uk/SE</ID>
            <Name>ScotGrid-GLA DPM instance</Name>
            <ImplementationName>DPM</ImplementationName>
            <!-- ...etc... -->
          </StorageResource>
        </StorageService>
      </AdminDomain>
    </Domain>
  </Grid>
So, if I've understood the primary information "A" option (P.A.) correctly, the storage service would publish XML like:
  <Grid>
    <Domain>
      <AdminDomain>
        <StorageService>
          <!-- Further StorageService information here -->
          <StorageResource>
            <ID>glue://gla.scotgrid.ac.uk/SE</ID>
            <Name>ScotGrid-GLA DPM instance</Name>
            <ImplementationName>DPM</ImplementationName>
          </StorageResource>
        </StorageService>
      </AdminDomain>
    </Domain>
  </Grid>
What's bad here is that the info-provider must know its hierarchy: that it is inside an AdminDomain, which is itself inside a Domain. This is ugly; it should not need to know this!
In contrast, a Tier-1 site might have no containing Domain. A storage service must then publish information like:
  <Grid>
    <AdminDomain>
      <StorageService>
        <!-- Storage Service info here -->
      </StorageService>
    </AdminDomain>
  </Grid>
With an alternative (option P.O, see [4]), in which services provide only the information they know (by directly examining the software) plus a hint (the "site-level" GLUE-ID), this can be avoided. In fact, the "parent" back-link isn't needed: it just makes configuring the site-level aggregation a little easier. One could configure parent-child links explicitly (e.g. Services within AdminDomains) and avoid having to specify the Parent within the child.
To me, this makes much more sense: each service is (genuinely) providing only the information it knows. Admin sites would aggregate (as with site-level BDIIs currently) and Domains then aggregate from multiple AdminSites, as necessary.
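As a rough illustration of the sub-tree idea, the Python sketch below has each service publish only its own element plus a "parent" hint, and lets a site-level aggregator place the fragments under the right AdminDomain. It is not the XSLT from [4]; the attribute name parent, the IDs, and the function aggregate are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical site-level record; only the aggregator needs to know it.
SITE = "<AdminDomain><ID>urn:example:site-a</ID><Name>Site A</Name></AdminDomain>"

# Hypothetical fragments as services might publish them: a sub-tree plus a
# "parent" hint naming the AdminDomain they belong to.
FRAGMENTS = [
    '<StorageService parent="urn:example:site-a"><ID>urn:example:se</ID></StorageService>',
    '<Service parent="urn:example:site-a"><ID>urn:example:ce</ID></Service>',
    '<Service parent="urn:example:site-b"><ID>urn:example:elsewhere</ID></Service>',
]

def aggregate(site_xml, fragments):
    """Build a <Grid> tree, attaching each fragment whose hint matches the site ID."""
    grid = ET.Element("Grid")
    domain = ET.fromstring(site_xml)
    grid.append(domain)
    site_id = domain.findtext("ID")
    for text in fragments:
        fragment = ET.fromstring(text)
        # The hint is only needed for aggregation, so it is dropped from the output.
        if fragment.attrib.pop("parent", None) == site_id:
            domain.append(fragment)
    return grid

grid = aggregate(SITE, FRAGMENTS)
print(ET.tostring(grid, encoding="unicode"))
```

The services never mention Grid, Domain or AdminDomain elements; knowledge of the hierarchy stays with whoever configures the aggregation.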
I don't know how MDS 4 performs aggregations at higher level and if this is compatible with its strategies. This is something to be investigated.
Yes, it would be interesting to compare: I don't know too much about MDS-4.
As an alternative, suppose One-to-Many relationships be represented as either an XML element hierarchy [...]
yep, this is an option as well. Many options are available. Probably, we should make one step back and clarify what we want to optimize. In my opinion, we should concetrate on giving the final user the easiest and more intuitive way to query the properties.
OK. I've two additional (friendly) amendments: a. adjust this to: "[easiest and most intuitive way to query] the final, aggregated GLUE/XML Schema." b. also add: "make it easy for components to provide the necessary information."
For sure, we need more experience on this with a number of queries to be written for different approaches. One advantage that I like of option A. is that a query would remain valid if you query either the primary source of information or the aggregated layer.
Whilst I agree this would be nice, do we have a use-case for users querying the primary source of information directly? I skimmed through the use-case document and searched for keywords ("primary", "source", "provider", etc..), but couldn't find any requirement for end-users to query information providers directly. Given the flexible hierarchy by (potentially) nesting an AdminDomain within multiple Domains, this could be difficult to achieve without requiring that primary sources of information know something of the global structure.
Consider this for instance. A simple XPath to ask for a service which type is org.glite.wms part of a certain adminDomain:
/glue:Grid/AdminDomain[ID='urn:admindomain:t1.infn.it']/Service[Type='org.glite.wms']
[Sorry, v. minor point: assuming GLUE provides an XML namespace, wouldn't the query have to specify the namespace at each level?
/glue:Grid/glue:AdminDomain[glue:ID='urn:...']/glue:Service[glue:Type=...] ]
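The effect can be checked with a small Python sketch. The namespace URI below is a placeholder (the real GLUE namespace is not given in this thread), and the sample document is made up; ElementTree is used with paths relative to the Grid root.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for illustration only.
NS = {"glue": "http://example.org/glue/2.0"}

# Every element inherits the default namespace declared on <Grid>.
DOC = """<Grid xmlns="http://example.org/glue/2.0">
  <AdminDomain><ID>urn:admindomain:t1.infn.it</ID>
    <Service><ID>urn:service:wms1</ID><Type>org.glite.wms</Type></Service>
  </AdminDomain>
</Grid>"""

root = ET.fromstring(DOC)

# Without prefixes the steps refer to elements in no namespace, so nothing matches.
unqualified = root.findall(
    "./AdminDomain[ID='urn:admindomain:t1.infn.it']/Service[Type='org.glite.wms']"
)

# With the prefix on every step, including inside the predicates, the service is found.
qualified = root.findall(
    "./glue:AdminDomain[glue:ID='urn:admindomain:t1.infn.it']"
    "/glue:Service[glue:Type='org.glite.wms']",
    NS,
)

print(len(unqualified), len(qualified))
```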
this query works both at the primary source level and aggregated level and is also quite simple to me.
Again, do we really need to provide a service where end-users can query the information provided by the primary sources in an identical fashion to the complete (aggregated) resource? I understand it would be nice (mostly for debugging reasons), but I don't see how this can be done without *every* primary info-provider within a Grid knowing (at least something of) the grid structure, in order to provide the correct XML documents. I feel this would be quite an inflexible solution.
Of course, we need a larger set of queries to be used for evaluation.
I suspect that XPath will be sufficient to query the aggregated GLUE/XML: once you get your head around XPath, it's pretty intuitive and friendly.
[XML Schema balance...]
we are trying to find the right balance and mainly preserving easy of use. In the rules, I mentioned the option of SubstitutionGroups for completeness, but this is not the current selected option. At the moment, we prefer to go for the annotation option
[snip: agreement on simple XML design over complicated, strongly validating design]
Thanks for your constructive feedback. I hope we can dedicate one more call before XMas to XML rendering so that we can refine all these choices and align about the rationale behind them. Please, keep contributing as opinion from different perspectives help us to make better choices.
I'll do my best! Cheers, Paul.
participants (3): JP Navarro, Paul Millar, Sergio Andreozzi