
Hi JP,

On Thursday 13 March 2008 15:52:56 JP Navarro wrote:
I'm less familiar with the XML terminology you use, but I would second your suggestion, restated in more common terminology: we should be able to independently publish subsets of the GLUE schema hierarchy. The ability to develop and run independent info-providers for subsets of information is a very useful design. Did I understand your proposal correctly?

More or less. What I would like to avoid is that storage service info-providers publish information like:

  <Grid>
    <AdminDomain>
      <AdminDomain>
        <StorageService>
          <ImplementationName>foo</ImplementationName>
          <ImplementationVersion>1.0</ImplementationVersion>
          <!-- ...etc... -->
        </StorageService>
      </AdminDomain>
    </AdminDomain>
  </Grid>

as this requires the SE publisher to know it's part of a distributed Tier-2 site (hence the two levels of AdminDomain elements). That said, the current XSD doesn't seem to support nested AdminDomain elements, which would be needed to describe distributed sites.

An alternative would be for the SE to publish information like:

  <StorageService>
    <ImplementationName>foo</ImplementationName>
    <ImplementationVersion>1.0</ImplementationVersion>
    <!-- ...etc... -->
  </StorageService>

and have the aggregation happen at the site level, which would publish information like:

  <AdminDomain>
    <Name>Example Site</Name>
    <Services>
      <ComputingService>
        <!-- CE information goes here -->
      </ComputingService>
      <StorageService>
        <ImplementationName>foo</ImplementationName>
        <ImplementationVersion>1.0</ImplementationVersion>
        <!-- ...etc... -->
      </StorageService>
    </Services>
  </AdminDomain>

The final aggregation would encapsulate multiple sites within a Grid element:

  <Grid>
    <AdminDomain>
      <!-- Site 1 info -->
    </AdminDomain>
    <AdminDomain>
      <!-- Site 2 info -->
    </AdminDomain>
  </Grid>

The disadvantage of this approach is that one cannot query the primary SE info (the XML provided by the SE info-provider) with exactly the same query one would use when querying the top-level aggregation.
For example, to extract all StorageEndpoints for the site "Example Site", one could use the XPath:

  /Grid//AdminDomain[Name='Example Site']/Services/StorageService/StorageEndpoint

Something like:

  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:variable name="href" select="'http://glue.example.org/grid-glue'"/>
    <xsl:variable name="site" select="'Example Site'"/>
    <xsl:template match="/">
      <xsl:copy-of select="document($href)/Grid//AdminDomain[Name=$site]/Services/StorageService/StorageEndpoint"/>
    </xsl:template>
  </xsl:stylesheet>

But if one substituted the URI of the primary information (via the href variable), this particular query wouldn't work: the primary XML would have neither the Grid element nor any AdminDomain elements. I don't think this is a big deal, though: it makes sense that this query should return no results when querying the SE info-provider directly, and there are other queries that would work (e.g., select all StorageEndpoints).

The advantage of publishing with StorageService as the top-level element is that the SE info-provider need know nothing about the GLUE hierarchy above it. This (should) simplify the info-provider and, at the same time, allow the same information to be (easily) published under different GLUE hierarchies; for example, if a site is a member of more than one Grid. To me, this advantage outweighs the disadvantage.
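
To make the difference between the two queries concrete, here is a minimal Python sketch rather than XSLT. The sample documents, the StorageEndpoint/URL content and the function names are all made up for illustration. ElementTree only supports a subset of XPath, so the leading /Grid step is emulated with an explicit check on the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents. Element names follow the examples in this
# mail; the StorageEndpoint content is invented for illustration only.
PRIMARY = """
<StorageService>
  <ImplementationName>foo</ImplementationName>
  <ImplementationVersion>1.0</ImplementationVersion>
  <StorageEndpoint><URL>srm://se.example.org:8443/</URL></StorageEndpoint>
</StorageService>
"""

AGGREGATED = """
<Grid>
  <AdminDomain>
    <Name>Example Site</Name>
    <Services>
      <StorageService>
        <ImplementationName>foo</ImplementationName>
        <ImplementationVersion>1.0</ImplementationVersion>
        <StorageEndpoint><URL>srm://se.example.org:8443/</URL></StorageEndpoint>
      </StorageService>
    </Services>
  </AdminDomain>
  <AdminDomain>
    <Name>Other Site</Name>
    <Services/>
  </AdminDomain>
</Grid>
"""


def site_endpoints(xml_text, site):
    """Endpoints of one named site; only meaningful below a Grid root."""
    root = ET.fromstring(xml_text)
    if root.tag != "Grid":
        # Emulates the absolute /Grid step: nothing matches in primary XML.
        return []
    path = (f".//AdminDomain[Name='{site}']"
            "/Services/StorageService/StorageEndpoint")
    return root.findall(path)


def all_endpoints(xml_text):
    """Every StorageEndpoint, regardless of the hierarchy above it."""
    root = ET.fromstring(xml_text)
    return list(root.iter("StorageEndpoint"))


if __name__ == "__main__":
    for label, doc in (("primary", PRIMARY), ("aggregated", AGGREGATED)):
        print(label,
              "site query:", len(site_endpoints(doc, "Example Site")),
              "all endpoints:", len(all_endpoints(doc)))
```

The site-specific query only finds something in the aggregated document, while the hierarchy-agnostic one works against both.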

One question this raises is how one binds or links these separately published subset documents to each other. Would we need to introduce attributes in each subset that bind it to other related subsets?

I believe that, currently, how the documents are merged isn't defined. One approach is to use XSLT to do the merging. There's a (working) toy implementation that demonstrates that here:

  http://www.ogf.org/pipermail/glue-wg/2007-December/000249.html

HTH,

Paul.
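
For illustration only, the merging step could also be sketched in a few lines of Python. This is not the XSLT approach from the linked toy implementation, and the helper names (aggregate_site, aggregate_grid) are hypothetical. It assumes each info-provider hands over a complete, well-formed document with a single service element at the top level.

```python
import copy
import xml.etree.ElementTree as ET


def aggregate_site(name, service_docs):
    """Wrap independently published service documents in one AdminDomain."""
    domain = ET.Element("AdminDomain")
    ET.SubElement(domain, "Name").text = name
    services = ET.SubElement(domain, "Services")
    for doc in service_docs:
        # Each document is parsed and attached unchanged; the publisher
        # never needs to know which site or Grid it ends up under.
        services.append(ET.fromstring(doc))
    return domain


def aggregate_grid(domains):
    """Collect site-level AdminDomain elements under a Grid element."""
    grid = ET.Element("Grid")
    for domain in domains:
        # Copy so that the same site can be published under several Grids.
        grid.append(copy.deepcopy(domain))
    return grid


# Primary information as an SE info-provider might publish it.
se_doc = (
    "<StorageService>"
    "<ImplementationName>foo</ImplementationName>"
    "<ImplementationVersion>1.0</ImplementationVersion>"
    "</StorageService>"
)

site = aggregate_site("Example Site", [se_doc])
grid_a = aggregate_grid([site])
grid_b = aggregate_grid([site])

if __name__ == "__main__":
    print(ET.tostring(grid_a, encoding="unicode"))
```

Because the site element is copied rather than moved, the same primary document can appear under more than one Grid without the SE info-provider changing at all.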