StorageSharePath multiplicity requested by Atlas

Hi Paul,
Some time ago, after collecting the feedback from the Atlas community, you proposed to change the multiplicity of StorageSharePath from "0..1" to "0..*".
This is Stephen's comment about this change: "it still isn't clear to me that multiplicity * makes sense. This is supposed to be a default path, if there are several you don't know which to use. Also if there is any software using this (possibly not) it would expect a unique result and probably break if it got more than one. If this is really wanted I would say add a new multivalued attribute Paths or OtherPaths."
I agree with him that adding a specific multivalued attribute would be better than modifying the multiplicity of the existing one: would you agree as well?
Cheers,
Alessandro
-- 
Dr. Alessandro Paolini
Operations Officer - EGI Foundation
Science Park 140
1098 XG Amsterdam
The Netherlands
skype: alessandro.paolini.egi
*********************************
"I believe in the power of laughter and tears"
"as an antidote to hatred and terror"
"A day without laughter"
"is a wasted day"
>>> Charlie Chaplin

Hi Alessandro,
On 07/06/18 10:33, Alessandro Paolini wrote:
some time ago, after collecting the feedback from the Atlas community, you proposed to change the multiplicity of StorageSharePath from "0..1" to "0..*"
Yes, that could be -- it's been a while now :)
This is Stephen's comment about this change: "it still isn't clear to me that multiplicity * makes sense. This is supposed to be a default path, if there are several you don't know which to use.
IIRC, the idea is to describe how writing into different paths would consume capacity from the same StorageShare. Perhaps a concrete example would help. Imagine a normal Linux/Unix machine that has NFS-mounted a remote server. The same server could be mounted at two different places in the namespace, and writing into either path would consume capacity on that NFS server. In reality, this isn't about NFS servers, but rather that a particular storage resource may be writeable through two different paths that do not share a parent-child relationship; for example, /data/tape/type-A and /data/tape/type-B. In dCache, user write requests may target some subset of all available capacity. The namespace is one way of choosing which resources are eligible for storing data, somewhat similar to how, on a Linux machine, writing into a particular directory might target an NFS server or a USB drive.
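To make the idea concrete, here is a minimal Python sketch, assuming a hypothetical in-memory table that maps each published path to a share ID. The share IDs and the /data/disk path are made up for illustration; this is not how dCache or any GLUE publisher actually works. A write is attributed to the share whose published path is the longest whole-component prefix of the target, so two unrelated directories can both draw on one share.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical publication: two paths with no parent-child relationship
# both consume capacity from the same (made-up) share "tape-share".
PUBLISHED_PATHS = {
    "/data/tape/type-A": "tape-share",
    "/data/tape/type-B": "tape-share",
    "/data/disk": "disk-share",
}


def share_for(target: str) -> Optional[str]:
    """Return the share whose published path is the longest prefix of target."""
    target_path = PurePosixPath(target)
    best_depth, best_share = -1, None
    for prefix, share in PUBLISHED_PATHS.items():
        prefix_path = PurePosixPath(prefix)
        # Compare whole path components, so /data/diskX does not match /data/disk.
        if target_path == prefix_path or prefix_path in target_path.parents:
            depth = len(prefix_path.parts)
            if depth > best_depth:
                best_depth, best_share = depth, share
    return best_share


print(share_for("/data/tape/type-A/run1/file.root"))  # tape-share
print(share_for("/data/tape/type-B/file.root"))       # tape-share
print(share_for("/data/other/file"))                  # None
```

Longest-prefix matching is only one possible way to attribute a path to a share; it is used here because it also handles nested published paths.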
Also if there is any software using this (possibly not) it would expect a unique result and probably break if it got more than one. If this is really wanted I would say add a new multivalued attribute Paths or OtherPaths."
I agree with him that adding a specific multivalued attribute would be better than modifying the multiplicity of the existing one: would you agree as well?
Yes, fair enough. I have a slight preference for "Paths" over "OtherPaths", since OtherPaths suggests the value of Path is somehow canonical or preferred, which (I believe) isn't the intention. We should also take care to define the relationship between Path and Paths. For example, if there is a single path, is it published under Path, under Paths, or under both? If there are multiple paths, should nothing be published under Path, or should some preferred path be published?
Cheers,
Paul.

glue-wg <glue-wg-bounces@ogf.org> On Behalf Of Paul Millar said:
I have a slight preference for "Paths" over "OtherPaths", since OtherPaths suggests the value of Path is somehow canonical or preferred, which (I believe) isn't the intention.
It's preferred in the sense of being a default, use that path if you don't have a reason to do something else. That's how lcg-utils always worked, if you just specified a file name rather than a full path the code would write it to the default path. If there's only one default, that's unambiguous; if you have several, it's not clear what the semantics should be, and anyway it's a change. Lcg-utils is now deprecated in favour of gfal and that doesn't use the information system, it just leaves it to the user, but that still gives the user the same question of how they construct a path if they don't have other knowledge.
Stephen
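As a rough illustration of the rule Stephen describes, here is a hypothetical re-creation in Python (not lcg-utils' actual code; `resolve_target` and the /data/vo paths are made up). An absolute path is used as given; a relative name is placed under the share's default path. With more than one candidate default there is no mechanical way to choose, which is exactly the ambiguity he raises.

```python
import posixpath
from typing import Sequence


def resolve_target(name: str, default_paths: Sequence[str]) -> str:
    """Hypothetical helper: turn a user-supplied name into a full path.

    An absolute path is used unchanged. A relative name is placed under the
    share's default path, which only works if exactly one default is known.
    """
    if name.startswith("/"):
        return name
    if len(default_paths) != 1:
        raise ValueError(
            f"cannot pick a default path from {len(default_paths)} candidates"
        )
    return posixpath.join(default_paths[0], name)


print(resolve_target("file.dat", ["/data/vo"]))   # /data/vo/file.dat
print(resolve_target("/data/vo/x/file.dat", []))  # /data/vo/x/file.dat
try:
    resolve_target("file.dat", ["/data/vo/a", "/data/vo/b"])
except ValueError as err:
    print(err)  # cannot pick a default path from 2 candidates
```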

Shall we name the new attribute "AdditionalPaths"? By default the attribute "Path" would be used, and the other paths could be published in "AdditionalPaths".
-- Dr. Alessandro Paolini

Hi Alessandro,
On 07/06/18 12:36, Alessandro Paolini wrote:
shall we name the new attribute "AdditionalPaths"?
so by default the attribute "Path" will be used and the other paths can be published in "AdditionalPaths"
This has the same problem: it suggests that the other paths are secondary. Here's a concrete proposal:
- Path: A preferred path for writing data that should consume resources from this StorageShare. This will typically be used as a prefix when generating a path under which the data is stored.
- Paths: A list of paths for writing data that should consume resources from this StorageShare. If a Path attribute is specified then the value MUST be included as one of the Paths values.
HTH,
Paul.
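The MUST rule in this proposal is simple enough to check mechanically. A minimal Python sketch, assuming a hypothetical representation where Path is an optional string and Paths a list of strings; it reads the rule literally, so a share that publishes Path but no Paths fails the check. The example paths reuse the /data/tape values from earlier in the thread.

```python
from typing import Optional, Sequence


def satisfies_path_rule(path: Optional[str], paths: Sequence[str]) -> bool:
    """Check the proposed rule: if Path is published, it MUST appear in Paths.

    Read literally, so a share that publishes Path but no Paths fails.
    """
    return path is None or path in paths


print(satisfies_path_rule("/data/tape/type-A",
                          ["/data/tape/type-A", "/data/tape/type-B"]))  # True
print(satisfies_path_rule("/data/tape/type-A", ["/data/tape/type-B"]))  # False
print(satisfies_path_rule(None, ["/data/tape/type-A", "/data/tape/type-B"]))  # True
```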

Hi Stephen,
On 07/06/18 12:01, Stephen Burke - UKRI STFC wrote:
glue-wg <glue-wg-bounces@ogf.org> On Behalf Of Paul Millar said:
I have a slight preference for "Paths" over "OtherPaths", since OtherPaths suggests the value of Path is somehow canonical or preferred, which (I believe) isn't the intention.
It's preferred in the sense of being a default, use that path if you don't have a reason to do something else.
I don't really buy that as an argument. Either a StorageShare is writeable from a single path, in which case it is (trivially) the default, or there are multiple paths that could be used. If multiple paths exist, then the choice of which one to use is very likely to be domain-specific and not something that is universally true for all users. A (made up) example: the ATLAS Higgs group should default to /data/atlas/higgs and the ATLAS SUSY group should default to /data/atlas/susy, even though writing into either path would consume resources from the same StorageShare (and writing into /data/atlas would consume resources on a different StorageShare). Which path should be advertised as Path: /data/atlas/higgs or /data/atlas/susy?
That's how lcg-utils always worked, if you just specified a file name rather than a full path the code would write it to the default path.
Perhaps; however, I find this a poor argument: lcg-utils is dead.
If there's only one default, that's unambiguous; if you have several, it's not clear what the semantics should be, and anyway it's a change. Lcg-utils is now deprecated in favour of gfal
Not just deprecated, lcg-utils is no longer supported (== dead) and has been for years.
and [gfal] doesn't use the information system, it just leaves it to the user,
Exactly. The concept of a "default path" is broken, which is why gfal doesn't support it.
but that still gives the user the same question of how they construct a path if they don't have other knowledge.
I think only the user (or, by extension, the VO) can know into which path they should write their data.
Cheers,
Paul.
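Paul's position is that the choice of path belongs to the user or VO, not to the share. A minimal Python sketch of that idea, assuming a hypothetical VO-side table keyed by group (the group names and paths come from his made-up Higgs/SUSY example; `default_path_for` is illustrative only). It refuses to guess when a group has no configured path, and checks the configured path against the paths the share publishes.

```python
from typing import Mapping, Sequence

# Hypothetical VO-side configuration, using Paul's made-up example.
GROUP_DEFAULTS = {
    "higgs": "/data/atlas/higgs",
    "susy": "/data/atlas/susy",
}


def default_path_for(group: str, published_paths: Sequence[str],
                     defaults: Mapping[str, str] = GROUP_DEFAULTS) -> str:
    """Pick the write path for a group; the VO, not the share, decides."""
    try:
        chosen = defaults[group]
    except KeyError:
        raise LookupError(f"no default path configured for group {group!r}") from None
    if chosen not in published_paths:
        raise LookupError(f"{chosen!r} is not published for this share")
    return chosen


share_paths = ["/data/atlas/higgs", "/data/atlas/susy"]
print(default_path_for("higgs", share_paths))  # /data/atlas/higgs
print(default_path_for("susy", share_paths))   # /data/atlas/susy
```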

Paul Millar <paul.millar@desy.de> said:
It's preferred in the sense of being a default, use that path if you don't have a reason to do something else.
I don't really buy that as an argument.
It's a statement more than an argument, it's been done like that since the early EDG days and was embedded in the GLUE schema since the storage part was added in 2003, so you're about 15 years late to disagree. You were also involved in the GLUE 2 definition and had a chance to make your arguments then.
A (made up) example: the ATLAS Higgs group should default to /data/atlas/higgs and the ATLAS SUSY group should default to /data/atlas/susy, even though writing into either path would consume resources from the same StorageShare (and writing into /data/atlas would consume resources on a different StorageShare).
ATLAS is really irrelevant here; they have a very elaborate system that imposes a particular behaviour. As far as I remember, their reason for wanting multiple paths was to do with files already written, so you could tell which storage area an existing path belonged to, but I don't think they have any interest at all in the info system any more. The real use case here is for small VOs who don't have any particular relationship with the sites, are just using opportunistic storage and want to know where their files should go. In that case you need a simple rule that can be implemented mechanically.
Anyway, I really find this discussion pointless. The current revision of GLUE 2 is supposed to be backward-compatible; your suggestion is not backward-compatible and therefore should not be accepted. I don't propose to discuss this further.
Stephen
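To see why changing the multiplicity raises a backward-compatibility concern, consider a minimal Python sketch of a legacy consumer, assuming a hypothetical record format where a single-valued attribute is a string and a multivalued one is a list (real GLUE 2 renderings such as LDAP differ, and `legacy_default_path` is made up). Turning Path itself into a list breaks the consumer; adding a separate multivalued attribute, here AdditionalPaths as proposed earlier in the thread, leaves it working because it never reads the new key.

```python
from typing import Any, Mapping


def legacy_default_path(share: Mapping[str, Any]) -> str:
    """Old-style consumer (hypothetical): assumes Path is a single string."""
    path = share["Path"]
    if not isinstance(path, str):
        raise TypeError(f"expected one Path, got {path!r}")
    return path


# Option 1: change Path's multiplicity to 0..* (breaks the legacy consumer).
changed = {"Path": ["/data/tape/type-A", "/data/tape/type-B"]}

# Option 2: keep Path single-valued and add a separate multivalued attribute.
extended = {"Path": "/data/tape/type-A",
            "AdditionalPaths": ["/data/tape/type-B"]}

print(legacy_default_path(extended))  # /data/tape/type-A
try:
    legacy_default_path(changed)
except TypeError as err:
    print("legacy consumer fails:", err)
```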

Hi All,
Firstly, I think we all agree that the goal of this group is to discuss, among all the interested parties, the best way to publish information, in order to propose an information schema that allows publishing the information required by, and usable by, all of us. All this is in the context of defining a standard that should suit all our needs and be applicable to a broader audience. In this context, I think we all recognize that all feedback matters.
Secondly, coming back to Alessandro's question: as it was made clear that GLUE should remain backwards-compatible, and as you all seem to have agreed on adding a new multivalued attribute, can we update the GLUE 2.1 draft with the following?
- keeping StorageSharePath as it is
- adding StorageShareAdditionalPaths as a multi-valued attribute
Best regards,
Baptiste
On Thu, 7 Jun 2018 at 15:25 Paul Millar <paul.millar@desy.de> wrote:
On 07/06/18 14:41, Stephen Burke - UKRI STFC wrote:
ATLAS is really irrelevant here
OK, that's clear.
I disagree. However, if this represents the opinion of this group then I'll stop trying to provide feedback.
I wish you all the best,
Paul.
_______________________________________________
glue-wg mailing list
glue-wg@ogf.org
https://www.ogf.org/mailman/listinfo/glue-wg
-- Baptiste Grenier EGI Foundation - Operations Officer Phone: +31 627 860 852 Skype: baptiste.grenier.egi

On 07/06/2018 14:25, Paul Millar wrote:
On 07/06/18 14:41, Stephen Burke - UKRI STFC wrote:
ATLAS is really irrelevant here
OK, that's clear.
I disagree. However, if this represents the opinion of this group then I'll stop trying to provide feedback.
Sadly, ATLAS is irrelevant only because they have stopped using GLUE and moved on to their own format, which has multiple paths. As I think Paul suggests, if GLUE doesn't meet the communities' needs then they will do something else, and eventually it will be GLUE that is irrelevant.
Cheers
--jens
participants (5)
- Alessandro Paolini
- Baptiste Grenier
- Jens Jensen
- Paul Millar
- Stephen Burke - UKRI STFC