Re: [glue-wg] New Endpoint and Service types

6 Mar 2014

Hi Stephen,

On 05/03/14 19:10, stephen.burke@stfc.ac.uk wrote:
...
Paul Millar [mailto:paul.millar@desy.de] said:
...
Stephen, to you DPM, dCache and Castor provide the same
functionality, so you would be happy with instances of all three
published as Service.Type of 'storage' (or similar).
Not entirely - the object names are prefixed with "Storage" anyway,
so simply publishing a Type of "storage" would be redundant.
Yup, fair point.
...
Also it seems to me that something like a standalone xrootd server or
a "classic SE" as we used to have would reasonably be different types
of storage service, even aside from the details of which protocols
they support.
Possibly, but then perhaps DPM and dCache also provide sufficiently
different storage behaviour to qualify.  It's difficult to decide
without a benchmark.
...
However, we do have a family of SRM-based SEs which seem to me to
represent a common type - indeed I thought that one of the goals of
EMI for dcache, DPM and StoRM was precisely to make them
interoperable!
I think you continue to have this fallacy that dCache is somehow
SRM-based.  Yes, one can read data using SRM, but one can easily read
the same data from the same dCache instance without using SRM: switch
off SRM and dCache works perfectly well, as it always has.

The interoperability has always been at the protocol level, not
the implementation level, so should appear in the Endpoint or
AccessProtocol.
...
In the past I would have suggested "SRM" as a Type, but since we now
seem to be making moves away from the use of SRM that may not be
ideal as a name.
From my pov, 'SRM' was never a good name: SRM is a protocol, not a
storage system.  dCache, at least, has never been "based on" SRM.
...
From a dcache POV, what do you see as providing commonality with DPM
and StoRM? (Beyond all being storage systems.)
One commonality between dCache and DPM is the immutable nature of stored
data: once written, data may only be modified by replacing the old data
with completely new data.  I think StoRM also provides an immutable 
filesystem, but a StoRM person would need to confirm.

However, this immutable nature could change in the (not too distant) future.

Other than that, I don't think there's much that's similar: they're
rather different implementations, with different design choices.
...
...
Somebody who needs some unique characteristic provided by dCache
(or DPM, or ...) might want more detailed Type, specifically that
the service provides the dCache-like facilities (or DPM-like or
...).
If someone really wants to know the implementation they can look at
the EndpointImplementationName or ManagerProductName - although of
course it's undesirable to have anything which is
implementation-specific.
Both are certainly true: they can look at Manager.ProductName, and
tying behaviour to implementation is undesirable.
...
For me, to be a valid type it would have to be the case that a
completely different vendor could potentially produce an independent
product which could reasonably be described as "a DPM" or "a dcache"
- even conceptually, can you see such a thing as being meaningful? If
so, how would you define it? You use "dcache-like" above, but what
does that mean (in terms of external interfaces)?
I think the problem with Type is in deciding the use-case for querying
it.  When would a client query Type rather than, say, Manager.ProductName?
AFAIK, we don't have concrete examples where this information is useful.

In terms of "dCache-like", there are any number of behavioural
characteristics that distinguish dCache from DPM; for example, hot-spot
detection and mitigation, overload protection, the ability to stage files
from tape, ...  A client may adjust its behaviour if it detects that the
storage system is "dCache-like" (or that it isn't).

As you point out, this could be discovered through Manager.ProductName, 
so it goes back to the above point: what are the use-cases for querying 
StorageService.Type?
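
As a rough illustration of the Manager.ProductName route, a client-side
check might look like the sketch below.  The dict layout and the helper
name are hypothetical; only the ProductName attribute and the listed
dCache characteristics come from this thread.

```python
# Characteristics named in this thread as distinguishing dCache from DPM.
DCACHE_LIKE_FEATURES = frozenset({
    "hot-spot detection and mitigation",
    "overload protection",
    "staging files from tape",
})


def expected_features(manager):
    """Return the behaviours a client might assume from a Manager record.

    `manager` is a hypothetical dict of published attributes.  Matching on
    ProductName ties the client to one implementation, which is exactly
    the drawback discussed above.
    """
    product = manager.get("ProductName", "").strip().lower()
    if product == "dcache":
        return DCACHE_LIKE_FEATURES
    return frozenset()  # unknown product: assume nothing


print(sorted(expected_features({"ProductName": "dCache"})))
```

A StorageService.Type would only add something if a client could make
this decision without naming the product.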
...
...
For the xrootd protocol, dCache currently publishes

    Endpoint.URL: xroot://xrootd-door.example.org/
    EndpointInterface.Name: xroot

and

    StorageAccessProtocol.Type: xrootd

What protocol name do you recognise in e.g. a getTURL operation to
return an xroot TURL?
Currently it's 'root://'
...
Does it match what DPM and StoRM use?
I couldn't say: you would need to ask DPM and StoRM people.
...
What about webdav?
dCache SRM will return a TURL that starts 'http://' or 'https://'.
...
...
For WebDAV, dCache is currently publishing as either 'http' or
'https', depending on whether SSL/TLS tunnelling is enabled or
not.
Bear in mind that the scheme name in the URL is not the same as the
InterfaceName. I don't know a lot about webdav but my impression is
that it's far from being identical with http as far as file access
goes, so I would expect a different InterfaceName even if the URL is
https:// (c.f. SRM vs. httpg://).
I'm pretty sure that, for uploading and downloading data, the HTTP and 
WebDAV requests *are* identical.

WebDAV is about adding the "missing file-system ideas", like the concept 
of directories.
...
...
When publishing an Endpoint object that describes an HTTP or a
WebDAV endpoint with unencrypted access then the URL SHOULD start
'http://' and the InterfaceName SHOULD be 'http'.  If the endpoint
is encrypted then the URL SHOULD start 'https://' and the
InterfaceName SHOULD be 'https'. If the endpoint supports WebDAV
then a SupportedProfile of 'http://webdav.org/' SHOULD be
published.
If it's necessary to make that distinction, I think I would prefer to
publish both http and webdav endpoints; doing it your way would seem
likely to be error-prone.
Yes, but please bear in mind that there are many extensions that build 
on top of HTTP and that an endpoint may support many (into 
double-digits) of them concurrently.  Publishing an endpoint for each 
results in (excessive?) duplication.

While publishing multiple endpoints is possible, I was hoping we could 
come up with something better.
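
To make the single-endpoint proposal quoted above concrete, here is a
minimal sketch of how a publisher might derive the attributes.  The
function name, the dict layout and the host name are hypothetical; the
URL scheme, InterfaceName and SupportedProfile values are the ones from
the proposal.

```python
def http_endpoint(host, encrypted, webdav, port=None):
    """Build the attributes for one HTTP or WebDAV Endpoint record."""
    # The URL scheme and the InterfaceName both follow the encryption setting.
    scheme = "https" if encrypted else "http"
    authority = host if port is None else f"{host}:{port}"
    endpoint = {
        "URL": f"{scheme}://{authority}/",
        "InterfaceName": scheme,
        "SupportedProfile": [],
    }
    # WebDAV is advertised as a profile on the same endpoint, not as a
    # second endpoint.
    if webdav:
        endpoint["SupportedProfile"].append("http://webdav.org/")
    return endpoint


# Hypothetical host name, for illustration only.
print(http_endpoint("webdav-door.example.org", encrypted=True, webdav=True))
```

Any further HTTP extension would then be one more SupportedProfile entry
rather than another Endpoint object.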

Cheers,

Paul.