
Hi Stephen,

On 05/03/14 19:10, stephen.burke@stfc.ac.uk wrote:
Paul Millar [mailto:paul.millar@desy.de] said:
Stephen, to you DPM, dCache and Castor provide the same functionality, so you would be happy with instances of all three published as Service.Type of 'storage' (or similar).
Not entirely - the object names are prefixed with "Storage" anyway, so simply publishing a Type of "storage" would be redundant.
Yup, fair point.
Also it seems to me that something like a standalone xrootd server or a "classic SE" as we used to have would reasonably be different types of storage service, even aside from the details of which protocols they support.
Possibly, but then perhaps DPM and dCache also provide sufficiently different storage behaviour to qualify. Without a benchmark it's difficult to decide.
However, we do have a family of SRM-based SEs which seem to me to represent a common type - indeed I thought that one of the goals of EMI for dcache, DPM and StoRM was precisely to make them interoperable!
I think you continue to have this fallacy that dCache is somehow SRM-based. Yes, one can read data using SRM, but one can easily read the same data from the same dCache instance without using SRM: switch off SRM and dCache still works perfectly well. The interoperability has always been at the protocol level, not the implementation level, so it should appear in the Endpoint or AccessProtocol.
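As a rough sketch of what "interoperability at the protocol level" means for a client: it would select services by the interface published on their endpoints, not by a service-wide type. The class names, hostnames and ports below are hypothetical and only illustrate the idea; they are not taken from any real info-provider or client library.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    url: str
    interface_name: str  # stands in for Endpoint.InterfaceName


@dataclass
class StorageService:
    product_name: str  # stands in for Manager.ProductName
    endpoints: list = field(default_factory=list)


def services_supporting(services, interface_name):
    """Return the services with at least one endpoint speaking the interface."""
    return [
        s for s in services
        if any(e.interface_name == interface_name for e in s.endpoints)
    ]


# Hypothetical published data: the second dCache instance has SRM switched off.
services = [
    StorageService("dCache", [
        Endpoint("https://dcache.example.org/", "https"),
        Endpoint("httpg://dcache.example.org:8443/srm/managerv2", "SRM"),
    ]),
    StorageService("dCache", [
        Endpoint("https://nosrm.example.org/", "https"),
    ]),
    StorageService("DPM", [
        Endpoint("httpg://dpm.example.org:8446/srm/managerv2", "SRM"),
    ]),
]

srm_capable = services_supporting(services, "SRM")
```

Here the SRM-less dCache instance is simply absent from `srm_capable`, while it still appears when selecting on `https`; nothing in the selection depends on which product is behind the endpoint.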
In the past I would have suggested "SRM" as a Type, but since we now seem to be making moves away from the use of SRM that may not be ideal as a name.
From my pov, 'SRM' was never a good name: SRM is a protocol, not a storage system. dCache, at least, has never been "based on" SRM.
From a dcache POV, what do you see as providing commonality with DPM and StoRM? (Beyond all being storage systems.)
One commonality between dCache and DPM is the immutable nature of stored data: once written, data may only be modified by replacing the old data with completely new data. I think StoRM also provides an immutable filesystem, but a StoRM person would need to confirm. However, this immutable nature could change in the (not too distant) future. Other than that, I don't think there's much that's similar: they're rather different implementations, with different design choices.
Somebody who needs some unique characteristic provided by dCache (or DPM, or ...) might want a more detailed Type, specifically one stating that the service provides the dCache-like facilities (or DPM-like or ...).
If someone really wants to know the implementation they can look at the EndpointImplementationName or ManagerProductName - although of course it's undesirable to have anything which is implementation-specific.
Both are certainly true: they can look at Manager.ProductName, and tying behaviour to implementation is undesirable.
For me, to be a valid type it would have to be the case that a completely different vendor could potentially produce an independent product which could reasonably be described as "a DPM" or "a dcache" - even conceptually, can you see such a thing as being meaningful? If so, how would you define it? You use "dcache-like" above, but what does that mean (in terms of external interfaces)?
I think the problem with Type is in deciding the use-case for querying it. When would a client query Type rather than, say, Manager.ProductName? AFAIK, we don't have concrete examples where this information is useful. In terms of "dCache-like", there are any number of behavioural characteristics that distinguish dCache from DPM; for example, hot-spot detection and mitigation, overload protection, ability to stage files from tape, ... A client may adjust its behaviour if it detects that the storage system is "dCache-like" (or if it isn't dCache-like). As you point out, this could be discovered through Manager.ProductName, so it goes back to the above point: what are the use-cases for querying StorageService.Type?
For the xrootd protocol, dCache currently publishes

    Endpoint.URL: xroot://xrootd-door.example.org/
    EndpointInterface.Name: xroot

and

    StorageAccessProtocol.Type: xrootd
What protocol name do you recognise in e.g. a getTURL operation to return an xroot TURL?
Currently it's 'root://'
Does it match what DPM and StoRM use?
I couldn't say: you would need to ask DPM and StoRM people.
What about webdav?
dCache SRM will return a TURL that starts 'http://' or 'https://'.
For WebDAV, dCache is currently publishing as either 'http' or 'https', depending on whether SSL/TLS tunnelling is enabled or not.
Bear in mind that the scheme name in the URL is not the same as the InterfaceName. I don't know a lot about webdav but my impression is that it's far from being identical with http as far as file access goes, so I would expect a different InterfaceName even if the URL is https:// (c.f. SRM vs. httpg://).
I'm pretty sure that, for uploading and downloading data, the HTTP and WebDAV requests *are* identical. WebDAV is about adding the "missing file-system ideas", like the concept of directories.
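To make that split concrete, here is a small sketch that classifies request methods. The WebDAV set is the one RFC 4918 adds on top of HTTP; the helper name `needs_webdav` is made up for illustration.

```python
# Methods defined by plain HTTP, including the two used for data transfer.
HTTP_METHODS = {"GET", "HEAD", "PUT", "POST", "DELETE", "OPTIONS"}

# Methods that WebDAV (RFC 4918) adds for namespace and property handling.
WEBDAV_METHODS = {
    "PROPFIND", "PROPPATCH", "MKCOL", "COPY", "MOVE", "LOCK", "UNLOCK",
}


def needs_webdav(method: str) -> bool:
    """True if the request method only exists in the WebDAV extension."""
    return method.upper() in WEBDAV_METHODS


# Downloading (GET) and uploading (PUT) are ordinary HTTP requests;
# creating a directory (MKCOL) or listing one (PROPFIND) needs WebDAV.
for method in ("GET", "PUT", "MKCOL", "PROPFIND"):
    print(method, "needs WebDAV" if needs_webdav(method) else "is plain HTTP")
```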
When publishing an Endpoint object that describes an HTTP or a WebDAV endpoint with unencrypted access, the URL SHOULD start 'http://' and the InterfaceName SHOULD be 'http'. If the endpoint is encrypted then the URL SHOULD start 'https://' and the InterfaceName SHOULD be 'https'. If the endpoint supports WebDAV then a SupportedProfile of 'http://webdav.org/' SHOULD be published.
If it's necessary to make that distinction, I think I would prefer to publish both http and webdav endpoints; doing it your way would seem likely to be error-prone.
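A minimal sketch of the proposed rule, assuming an info-provider that builds the Endpoint attributes from two flags. The function name, the dictionary layout and the example hostname are hypothetical; only the attribute values follow the wording of the proposal.

```python
WEBDAV_PROFILE = "http://webdav.org/"


def describe_endpoint(host, encrypted, webdav, port=None):
    """Build the Endpoint attributes for an HTTP or WebDAV door.

    The scheme doubles as the InterfaceName; WebDAV support is signalled
    only through SupportedProfile, not through a separate endpoint.
    """
    scheme = "https" if encrypted else "http"
    authority = host if port is None else f"{host}:{port}"
    attrs = {
        "URL": f"{scheme}://{authority}/",
        "InterfaceName": scheme,
        "SupportedProfile": [],
    }
    if webdav:
        attrs["SupportedProfile"].append(WEBDAV_PROFILE)
    return attrs


# An encrypted WebDAV door on a made-up host.
print(describe_endpoint("webdav-door.example.org", encrypted=True, webdav=True))
```

With this shape, a further HTTP-based extension would add another SupportedProfile value to the same endpoint and not a further Endpoint object.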
Yes, but please bear in mind that there are many extensions that build on top of HTTP and that an endpoint may support many (into double-digits) of them concurrently. Publishing an endpoint for each results in (excessive?) duplication. While publishing multiple endpoints is possible, I was hoping we could come up with something better.

Cheers,

Paul.