
David Horat from CERN will start looking at the LDAP schema rendering. Does anyone have any comments/suggestions before he gets started?
Laurence

Thanks Laurence and hello to everyone!
After taking a close look at the official Glue 2.0 specification and the latest LDAP schemas, I have noticed that there are several differences between them and that the schemas do not reflect the specification entirely, so I have decided to start them from scratch.
After some research on how to deal with LDAP schemas and on how Glue 1.3 was done, several suggestions from Laurence, and some ideas I already have, my first draft will be as follows:
- Each class in the specification will have its own schema file, which is much better for development purposes.
- One bash/python script will compile them all to create a single schema file, which is much better for production purposes.
- I will create one ldif file per object as an example, and these files will also be used as unit tests.
- All LDAP objects and attributes will be prefixed with the string 'Glue' to ensure some backwards compatibility and make it easier to change the current applications. This can be changed in the future.
- I will represent the specification in the schemas as closely as possible, so that it is much easier to make changes or answer questions when they arise (as they will).
- Thus, I will follow the exact inheritance used in the specification. This inheritance will be done at the object level, so we don't need to provide several objectClass values when we create a new object. I read somewhere that this was better because we ensure inheritance from the specification, although it can be changed in future versions.
- Since our specification is relational and LDAP is NOT relational, I am still unsure how to implement this properly in LDAP. I am used to implementing this in SQL but not in LDAP. I have read that, simply, you shouldn't try to make something relational into LDAP [1]. So if anyone has suggestions here, I will be glad to hear them.
- Very few attribute types of the specification can be mapped directly to LDAP types, so I will just follow the ones used in the latest schema definitions. LDAP is just not prepared for all the types that we use in the specification, and other integrity mechanisms should be used.
- The only mandatory attribute that I will specify is the GlueEntityId (aka the 'key'); all the others will be optional, as is implemented in the latest schemas. This can be changed in future versions.
If you have any comments or suggestions, do not hesitate to make them. If you want to wait until the first draft is out, you may. I expect to have it done in one week, so by next Monday.
IMPORTANT: To make it easier for all of us to comment on it, I created a wiki page [2] where we can all discuss these issues. Please use it.
Links:
[1] http://mysqldump.azundris.com/archives/74-LDAP-is-not-relational.html
[2] http://forge.gridforum.org/sf/wiki/do/viewPage/projects.glue-wg/wiki/GLUE2LD...
PS: Laurence, remember your motivational stickers! ^_^
Regards,
David
On Fri, Mar 20, 2009 at 4:48 PM, Laurence Field <Laurence.Field@cern.ch> wrote:
David Horat from CERN will start looking at the LDAP schema rendering. Does anyone have any comment/suggestions before he gets started?
Laurence
-- David Horat Software Engineer specialized in Grid and Web technologies IT Department – Grid Deployment Group CERN – European Organization for Nuclear Research » Where the web was born http://davidhorat.com/ http://cern.ch/horat http://www.linkedin.com/in/davidhorat
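
The "one schema file per class, compiled into a single file" step in the plan above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `compile_schemas`, the `*.schema` naming and the idea of ordering files by name are hypothetical and do not come from the thread. A real script would probably need an ordering in which a parent class is defined before its children.

```python
from pathlib import Path
from typing import List
import tempfile


def compile_schemas(source_dir: Path, output_file: Path) -> List[str]:
    """Concatenate every *.schema file in source_dir into output_file.

    Files are merged in sorted name order (an assumption made here so the
    result is reproducible). Returns the names of the merged files.
    """
    target = output_file.resolve()
    # Collect the parts first, skipping the output file if it lives in source_dir.
    parts = sorted(p for p in source_dir.glob("*.schema") if p.resolve() != target)
    with output_file.open("w", encoding="utf-8") as out:
        for part in parts:
            out.write(f"# --- {part.name} ---\n")
            out.write(part.read_text(encoding="utf-8").rstrip("\n") + "\n\n")
    return [p.name for p in parts]


if __name__ == "__main__":
    # Hypothetical file names; numeric prefixes force parents before children.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        (src / "00-Entity.schema").write_text("# Entity definitions\n", encoding="utf-8")
        (src / "10-Location.schema").write_text("# Location definitions\n", encoding="utf-8")
        merged = compile_schemas(src, src / "glue2.schema")
        print(merged)
```

Keeping the per-class files as the source of truth and treating the merged file as a build product matches the development/production split described in the plan.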

Hola David, welcome to the club! Your plans make sense to me, but I hope others will comment on the technical aspects as needed (maybe next week after CHEP). One comment inline below.
- Since our specifications is relational and LDAP is NOT relational, I am still unsure how to implement this properly in LDAP. I am used to implement this in SQL but not in LDAP. I have read that, simply, you shouldn't try to make something relational into LDAP [1]. So if any one have suggestions here, I will be glad to hear them.
We probably will have to resort to "tricks" as used in GLUE 1.x. For example, an object can refer to a logical parent via a ChunkKey and to an object with a different ancestry through a ForeignKey. As we are still defining the LDAP rendering, we might invent further such notions as needed or desirable (e.g. for clarity or elegance).

On Wednesday 25 March 2009 16:57:41 Maarten Litmaath wrote:
- Since our specifications is relational and LDAP is NOT relational, I am still unsure how to implement this properly in LDAP. I am used to
We probably will have to resort to "tricks" as used in GLUE 1.x. For example, an object can refer to a logical parent via a ChunkKey and to an object with a different ancestry through a ForeignKey. As we are still defining the LDAP rendering, we might invent further such notions as needed or desirable (e.g. for clarity or elegance).
Please be aware that, for some time now, there has been support for "extensible queries" (see RFC 4515). This removes the need for GlueChunkKey.
For example, to query all GlueSAPath attributes (of objects with objectClass GlueSA) that are children of SE "se.ngcc.acad.bg" (GlueSEUniqueID=se.ngcc.acad.bg), one could conduct the following query, which uses GlueChunkKey:
ldapsearch -LLL -x -H ldap://lcg-bdii.cern.ch:2170/ -b o=grid '(&(objectClass=GlueSA)(GlueChunkKey=GlueSEUniqueID=se.ngcc.acad.bg))' GlueSAPath
However, the same result is obtained without the ChunkKey by using the ":dn:=" extensible query:
ldapsearch -LLL -x -H ldap://lcg-bdii.cern.ch:2170/ -b o=grid '(&(objectClass=GlueSA)(GlueSEUniqueID:dn:=se.ngcc.acad.bg))' GlueSAPath
IIRC, there was an argument for continuing to support GlueChunkKey in Glue v1.3 to allow legacy queries. Moreover, (afaik) there was never an official document describing the LDAP binding for GLUE v1.3 (so nothing to update ;-)
A reason to stop using GlueChunkKey is that it gives the info-providers the opportunity to publish inconsistent information. In contrast, the extensible query is always consistent (unless there's a bug in the server software).
Since GLUE v2.0 is not meant to be backward compatible, I'd say we can safely drop GlueChunkKey in favour of the safer ":dn:=" style queries.
HTH,
Paul.
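
To make the difference between the two query styles above concrete, here is a small sketch that builds both filter strings from the same inputs. The helper names `chunk_key_filter` and `dn_match_filter` are made up for illustration, and the values are assumed not to need escaping.

```python
def chunk_key_filter(object_class: str, parent_attr: str, parent_id: str) -> str:
    """GLUE 1.x style: match on the published GlueChunkKey attribute."""
    return f"(&(objectClass={object_class})(GlueChunkKey={parent_attr}={parent_id}))"


def dn_match_filter(object_class: str, parent_attr: str, parent_id: str) -> str:
    """Extensible-match style: ':dn:' also matches attributes that appear in the entry's DN."""
    return f"(&(objectClass={object_class})({parent_attr}:dn:={parent_id}))"


if __name__ == "__main__":
    args = ("GlueSA", "GlueSEUniqueID", "se.ngcc.acad.bg")
    print(chunk_key_filter(*args))
    # (&(objectClass=GlueSA)(GlueChunkKey=GlueSEUniqueID=se.ngcc.acad.bg))
    print(dn_match_filter(*args))
    # (&(objectClass=GlueSA)(GlueSEUniqueID:dn:=se.ngcc.acad.bg))
```

The first filter depends on an attribute the information provider has to publish correctly. The second depends only on where the entry sits in the tree, which is the consistency argument made above.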

Nice to know. I will make an entry about it in the wiki. :)
On Mon, Mar 30, 2009 at 6:58 PM, Paul Millar <paul.millar@desy.de> wrote:
Please be aware that, for some time now, there is support for "extensible queries" (see RFC 4515). This removes the need for GlueChunkKey. [...]

glue-wg-bounces@ogf.org on behalf of Paul Millar said:
Please be aware that, for some time now, there is support for "extensible queries" (see RFC 4515). This removes the need for GlueChunkKey.
No it doesn't, as I've said several times. For one thing you can't do wildcard queries with the extensible query format, and for another it's often useful to query *for* the ChunkKey so you can extract the relevant ID, which is a lot easier than extracting it from the DN.
Stephen
(Typing this in Prague airport, so just a quick reply, I'll get back to this later in the week.)

Hi Stephen,
Sorry, this one was delayed whilst I was away (helping run a training workshop, then Easter).
On Tuesday 31 March 2009 17:46:47 Burke, S (Stephen) wrote:
[extensible filters instead of GlueChunkKey] For one thing you can't do wildcard queries with the extensible query format,
I spent a little bit of time investigating this. It's actually a complex, somewhat interesting and layered issue.
Having read (only the relevant bits of ;-) RFC-4511 I couldn't find anything to suggest that one cannot use substring match rules in an extensible filter.
However, from my reading of the RFC, it seems that the people writing the LDAPv3 spec. didn't really consider substring-like extensible filters, since the ABNF in RFC-4515 removes the ability to specify an ASTERISK character in extensible filters. In contrast, the ABNF in RFC-4511 (sec. 4.5.1) does not contain this restriction. See:
http://tools.ietf.org/html/rfc4515#section-3
The restriction in RFC-4515 contradicts RFC-4511 (sec. 4.5.1.7.7). When describing extensible filters, it says:
http://tools.ietf.org/html/rfc4511#section-4.5.1.7.7
The matchingRule used for evaluation determines the syntax for the assertion value. [...]
So, using a substring-like matching rule should allow substring-like assertion values, in contradiction with the ABNF in RFC-4515.
In summary, one should be able to conduct a substring query by explicitly specifying the matchingRule in an extensible filter.
However, none of this matters, since OpenLDAP currently doesn't support substring-like extensible filters. Scanning through the source code of the latest version, it seems to make a tacit assumption that the assertion value is formatted for exact-match-like filters and that ASTERISK characters are forbidden.
However, even this is immaterial since clients MUST NOT (as in RFC 2119) conduct wildcard searches against the DN. On page 5 of the GLUE v2.0 spec:
[...]. The ID MUST NOT be interpreted by the user or the system as having any meaning other than an identifier. [...]
Currently, we use the object's ID attribute in the RDNs to build the DNs. I believe filtering using a substring-like filter would imply that the ID attributes have some searchable structure, which contradicts the "[not] having any meaning" bit above.
So, I interpret this to mean clients MUST NOT use wildcard searches in the DN.
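
The ASTERISK restriction discussed above can be illustrated with a short sketch. In the RFC 4515 string form, a literal "*" inside an assertion value has to be written as the escape sequence \2a, so in an extensible filter it can only ever mean a literal asterisk and never a wildcard. The function names below are hypothetical, and the code only builds strings; it does not talk to a server.

```python
# Characters that RFC 4515 requires to be escaped inside an assertion value.
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a literal value for use in an RFC 4515 filter string."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def extensible_dn_filter(attr: str, literal_value: str) -> str:
    """Build an '(attr:dn:=value)' filter; the value is always treated literally."""
    return f"({attr}:dn:={escape_filter_value(literal_value)})"


if __name__ == "__main__":
    print(extensible_dn_filter("GlueSEUniqueID", "se.ngcc.acad.bg"))
    # (GlueSEUniqueID:dn:=se.ngcc.acad.bg)
    print(extensible_dn_filter("GlueSEUniqueID", "se.*.bg"))
    # (GlueSEUniqueID:dn:=se.\2a.bg)
```

The second call shows why a wildcard cannot be expressed this way: the asterisk survives only as an escaped literal character.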
and for another it's often useful to query *for* the chunkey so you can extract the relevant ID, which is a lot easier than extracting it from the DN.
This is certainly true: LDAP provides a very limited ability to search across links (effectively dereferencing a link).
As an aside: there was a proposal to add support for cross-querying by adding a new matchingRule; however, the proposal seems to have been shot down due to interop. concerns.
With the scheme David is proposing, we could include the upward pointing links. Omitting the objectClass declarations, this could be:
dn: GlueStorageShareID=someShare,GlueStorageServiceID=someSE,AdminDomain=aGridSite,o=grid
GlueStorageShareID: someShare
GlueStorageShareName: example storage share
GlueStorageShareStorageService: someSE
Cheers,
Paul.
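
As a rough illustration of why an explicit upward link is convenient for clients, the sketch below compares reading the parent ID from the attribute with recovering it from the DN. The entry mirrors the example above. The helper `rdn_value` is hypothetical and deliberately naive: it splits on commas and ignores escaped commas and multi-valued RDNs.

```python
from typing import Optional


def rdn_value(dn: str, attr: str) -> Optional[str]:
    """Return the value of the first RDN named attr, or None if absent.

    Naive parsing: assumes no escaped commas and no multi-valued RDNs.
    """
    for rdn in dn.split(","):
        name, sep, value = rdn.strip().partition("=")
        if sep and name.lower() == attr.lower():
            return value
    return None


# The example entry from the message above, as a plain dictionary.
entry = {
    "dn": "GlueStorageShareID=someShare,GlueStorageServiceID=someSE,AdminDomain=aGridSite,o=grid",
    "GlueStorageShareID": "someShare",
    "GlueStorageShareName": "example storage share",
    "GlueStorageShareStorageService": "someSE",
}

if __name__ == "__main__":
    # With the upward link, the parent ID is a plain attribute lookup.
    print(entry["GlueStorageShareStorageService"])
    # Without it, the client has to parse the DN and know the parent's RDN name.
    print(rdn_value(entry["dn"], "GlueStorageServiceID"))
```

Both lookups return the same ID here, but only the attribute lookup keeps working if the tree layout is not prescribed.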

Paul Millar [mailto:paul.millar@desy.de] said:
Having read (only the relevant bits of ;-) RFC-4511 I couldn't find anything to suggests that one cannot use substring match rules in an extensible filter.
My experience was more practical, I tried it and observed that it didn't work! However, with more time to think about this I'm not sure it's directly relevant anyway. The current use for the ChunkKey is to relate objects with only a LocalID to their parents, but we now have global IDs for everything so I think we can just use the same kind of key for all relations. At a quick look I don't see any case where we have both parent and non-parent type relations between the same object classes, but if we do we should probably treat them as independent relations. (Hmm ... AdminDomains? Sites belong to both EGEE and LCG, but neither of those is a parent of the other, so EGEE -> LCG has a different status to EGEE -> DESY. But looking at the diagram we don't seem to allow for peer relations at all, so EGEE can't relate to LCG :)
Anyway as I said earlier this is closely coupled with the question of how we structure the DIT - my personal preference is that we should try to avoid prescribing a particular tree, which would imply that you shouldn't need to do extensible queries. Also the schema is rather more complex than a tree can capture so I think we're bound to end up with relations that go outside it - e.g. Benchmark seems to be a "child" of both ExecutionEnvironment and ComputingManager.
What we may perhaps want to think about is whether we might want any "double hop" references. For example, ExecutionEnvironment is not directly related to ComputingService (or DataStore to StorageService). Obviously you can get the link by doing two queries, but if it's likely to be a common thing it could be useful to have it explicitly?
However, even this is immaterial since clients MUST NOT (as in RFC 2119) conduct wildcard searches against the DN.
I take your point, but there are different grades of clients - I often construct ad-hoc queries which use properties I happen to know are true (or maybe just "mostly true") even though they aren't mandated by the schema. That would be naughty in real production code, but can be useful on the fly. And conversely I don't think I've ever used an extensible query in a real-world case, only to test it.
Stephen
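
The "double hop" question raised above can be sketched with in-memory data standing in for LDAP entries. Everything here is an assumption made for illustration: the attribute names, the IDs, and the idea that the intermediate object is a ComputingManager are not taken from the thread or from any schema draft.

```python
# Hypothetical entries keyed by ID; each holds a foreign key to its related object.
execution_environments = {"env1": {"ComputingManagerForeignKey": "mgr1"}}
computing_managers = {"mgr1": {"ComputingServiceForeignKey": "svc1"}}


def service_via_two_hops(env_id: str) -> str:
    """Resolve the service with two lookups, as a client would with two queries."""
    manager_id = execution_environments[env_id]["ComputingManagerForeignKey"]  # first hop
    return computing_managers[manager_id]["ComputingServiceForeignKey"]  # second hop


# The alternative: publish the extra link directly on the environment.
execution_environments_explicit = {
    "env1": {
        "ComputingManagerForeignKey": "mgr1",
        "ComputingServiceForeignKey": "svc1",  # redundant "double hop" reference
    }
}

if __name__ == "__main__":
    print(service_via_two_hops("env1"))
    print(execution_environments_explicit["env1"]["ComputingServiceForeignKey"])
```

The explicit link saves a query, at the cost of a second place where published information can become inconsistent, which is the same trade-off discussed for GlueChunkKey.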

Since we are using a relational diagram which does not fit in a tree (that is the reason why we need foreign keys), I would also recommend not to force the DIT, but rather make recommendations on how to implement it. Right now Laurence is working on it.
Regards,
David
On Thu, Apr 16, 2009 at 5:40 PM, Burke, S (Stephen) <stephen.burke@stfc.ac.uk> wrote:
My experience was more practical, I tried it and observed that it didn't work! However, with more time to think about this I'm not sure it's directly relevant anyway. [...]
glue-wg mailing list glue-wg@ogf.org http://www.ogf.org/mailman/listinfo/glue-wg
participants (5)
- Burke, S (Stephen)
- David Horat
- Laurence Field
- Maarten Litmaath
- Paul Millar