Re: [glue-wg] ComputingService and Endpoints, a point of view

27 Aug 2012

      Hi Stephen,

On 2012-08-25 12:12, stephen.burke@stfc.ac.uk wrote:
...
Florido Paganelli [mailto:florido.paganelli@hep.lu.se] said:
...
...
What would you propose to do with Share, Resource and Manager?
Same approach. As I said, this depends if we want to override the
associations or not. This cannot be represented in UML, but makes
sense in realizations.
And what about the relations between them? And the same for the
storage classes? I think this would be quite a big change which would
need a significant advantage to be worthwhile, and so far I don't
think you've given one.
There are no changes. As I said, UML cannot express inheritance as well
as an implementation can, where it is straightforward.

But we have the opportunity to fix it in the realization documents that 
are not final yet.

I did not spend time reasoning about the other associations, but if we 
agree on a composition-driven approach (every specification adds, does 
not overload) rather than a bare inheritance-driven approach (every 
specification overloads associations) I see no problem whatsoever. We're 
still fully consistent with the model, everything works as expected.
...
...
In LDAP, I would scope the search for endpoints starting from the
ComputingService,
You can do that if you have chosen one specific ComputingService, but
in your own example of a delegation endpoint which could serve
computing or others kind of service, the current definition lets you
search for all the endpoints which serve computing services but not
the others.
Yes, I understand what you mean. But what if I have a delegation endpoint
that can be used both for computing and for storage? Should I replicate
such an endpoint in a ComputingService and in a StorageService? In
that case the same delegation endpoint would be a ComputingEndpoint and
a StorageEndpoint, with two different IDs, but in the end it is the same
endpoint!
How do I express that it is the same endpoint? With the same ID? But then
the records would have different objectclasses and associations... It's
kind of bad to have different records with the same ID.

I would rather call it Endpoint, add associations pointing to both the 
StorageService and ComputingService it serves, give it the same ID and 
place it in both Computing and Storage services.
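A minimal Python sketch of what I mean, assuming made-up class and
attribute names (these are not GLUE2 attribute names, and the URNs are
invented): one Endpoint record, one ID, with associations to every
Service it serves.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Endpoint:
    id: str
    interface: str
    # IDs of every Service this endpoint serves (hypothetical attribute).
    service_ids: Set[str] = field(default_factory=set)


@dataclass
class Service:
    id: str
    type: str
    endpoints: Dict[str, Endpoint] = field(default_factory=dict)

    def attach(self, endpoint: Endpoint) -> None:
        """Place the endpoint in this service and record the association."""
        self.endpoints[endpoint.id] = endpoint
        endpoint.service_ids.add(self.id)


computing = Service("urn:example:service:computing-1", "computing")
storage = Service("urn:example:service:storage-1", "storage")
delegation = Endpoint("urn:example:endpoint:delegation-1", "delegation")

# The same record, with the same ID, is placed in both services.
computing.attach(delegation)
storage.attach(delegation)
```

A client that meets the endpoint under either service sees the same ID
and the same set of associations, so it can tell it is one endpoint.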
...
...
but then give me something to relate a local information service
and its endpoints (some OpenLDAP service), or an independent
delegation Service to the box where the ComputingService is,
otherwise I run the
As I already said, I think an information endpoint should be a
separate Service. For a delegation service I can't say, it would
depend on how closely it's bound to the computing service and what
the use cases are.
...
risk of quering twice the information system(s) for no reason, and
submit jobs twice to the same endpoint because I cannot
distinguish between them.
Queries are normally very lightweight compared with real service
interactions like job submission, unless you're doing a very large
number of them - querying twice is not a problem. Being able to
recognise that you have the same Endpoint multiple times obviously is
important, but I don't see why it would be difficult to recognise
duplicates.
Querying twice is a problem with big numbers. Say I have 20 information
endpoints and 40 submission endpoints in an index, such as EMIR, in
which every Endpoint record also has the Service.ID of the Service the
endpoint belongs to.

A client retrieves all the 60 of them. Then, it might want to query 
information endpoints to scan for submission endpoints.

Scenario 1)
I have Endpoints and ComputingEndpoints in a ComputingService.

I'll make it easy here. A single box might have more than one
information/submission endpoint, which means deciding which
information/submission endpoints belonging to the same box one doesn't
want to query. So, let's simplify the scenario and suppose each
submission endpoint belongs to a different box and each information
endpoint belongs to a different box.

BUT there might be information endpoints on the same box as at least one
submission endpoint.

Then, since Endpoints and ComputingEndpoints are in the same
ComputingService, IF the information endpoint has the same Service.ID as
a submission endpoint, the client might decide not to query it.

Operation cost: one comparison for each information endpoint and 
submission endpoint at most, 20*40 = 800 ops
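To make the count concrete, here is a minimal Python sketch of the
Scenario 1 check. The record layout, the function name and the IDs are
made up for illustration; only the 20/40 figures come from the text
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRecord:
    endpoint_id: str
    service_id: str  # Service.ID stored with the record in the index
    kind: str        # "information" or "submission"


def info_endpoints_to_query(info, submission):
    """Drop information endpoints that share a Service.ID with a
    submission endpoint, and count the comparisons performed."""
    comparisons = 0
    to_query = []
    for i in info:
        shares_service = False
        for s in submission:
            comparisons += 1
            if i.service_id == s.service_id:
                shares_service = True
                break
        if not shares_service:
            to_query.append(i)
    return to_query, comparisons


# Made-up index content: 40 submission endpoints, one per service.
submission = [EndpointRecord(f"sub-{n}", f"svc-{n}", "submission")
              for n in range(40)]

# Worst case: no information endpoint shares a service with a
# submission endpoint, so every pair is compared (20 * 40 = 800).
info_worst = [EndpointRecord(f"info-{n}", f"infosvc-{n}", "information")
              for n in range(20)]
to_query_worst, comparisons_worst = info_endpoints_to_query(
    info_worst, submission)

# Five information endpoints live in the same ComputingService as a
# submission endpoint, so the client can skip querying them.
info_mixed = [EndpointRecord(f"info-{n}",
                             f"svc-{n}" if n < 5 else f"infosvc-{n}",
                             "information")
              for n in range(20)]
to_query_mixed, comparisons_mixed = info_endpoints_to_query(
    info_mixed, submission)

print(len(to_query_worst), comparisons_worst)
print(len(to_query_mixed), comparisons_mixed)
```

The comparisons never exceed 800, and every shared Service.ID removes
one query to an information endpoint.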

Scenario 2) Different services:
Endpoints in an Information Service and ComputingEndpoints in a
ComputingService.

We then have different Service.IDs for each endpoint, because 
information endpoints belong to different services than submission 
endpoints.

The client cannot know which relationship exists between services, and 
then it must query information endpoints.

Suppose every information endpoint outputs 10 submission endpoints, some 
registered to the index (i.e. belonging to the set of 40 taken from the 
index) and some not (i.e. not in those 40 present in the index), ~200 
endpoints.

As said, since there is no information on how information and submission 
endpoints are coupled, I need to scan the information endpoints as I can 
gather more submission endpoints there. A client cannot just suppose 
that all the useful submission endpoints are in the index.

Hence I must check all 40 submission endpoints in the index against
the 200 retrieved from the information endpoints, in order not to
submit twice to the same endpoint.

In the worst case that is 20 queries to information endpoints + 40*200 =
8000 comparison operations, 8020 operations in total, which is an order
of magnitude more than in Scenario 1.
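The same count as a minimal Python sketch, assuming made-up endpoint IDs
and the worst case in which none of the discovered submission endpoints
is already in the index. The function name is hypothetical; the figures
(20 queries, 10 endpoints per query, 40 indexed endpoints) are the ones
used above.

```python
def new_submission_endpoints(indexed_ids, discovered_ids):
    """Pairwise check of discovered endpoints against the indexed ones.
    Returns the endpoints not yet known and the comparisons performed."""
    comparisons = 0
    new_ids = []
    for d in discovered_ids:
        known = False
        for i in indexed_ids:
            comparisons += 1
            if d == i:
                known = True
                break
        if not known:
            new_ids.append(d)
    return new_ids, comparisons


# 40 submission endpoints already taken from the index (made-up IDs).
indexed_ids = [f"sub-{n}" for n in range(40)]

# One query per information endpoint, each returning 10 submission
# endpoints; in the worst case none of them is in the index.
queries = 20
discovered_ids = [f"extra-{q}-{k}" for q in range(queries) for k in range(10)]

new_ids, comparisons = new_submission_endpoints(indexed_ids, discovered_ids)
total_operations = queries + comparisons
print(len(discovered_ids), comparisons, total_operations)
```

A client could replace the inner loop with a set lookup on the IDs; the
sketch keeps the pairwise comparison because that is what the count
above assumes.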

The numbers are arbitrary, but I can tell you that ARC will have at 
least 3 submission endpoints per box and you know what happens if you 
take a site-bdii as an information endpoint (one might easily reach 10 
there on big sites)

It is easy to see that as the number of job requests increases we might
incur an incredible amount of work just to submit a single job. Of
course clients can use fancy ranking algorithms and/or dynamic
programming to solve the problem better.
...
...
In my initial implementation I wanted to use the
service-to-service association described in GFD1.47 (page 7, page
13); however I was told that this was not the purpose for it to be
there, but it was more to reflect some hierarchy between Services.
I don't see how it could represent a hierarchy unless you had some
other way to express it - Service-Service is a peer relation, there
is no directionality (unlike e.g. Domain-Domain). In any case, as
I've said repeatedly, the question is not what the purpose was when
the schema was defined (none in particular as far as a I remember)
but whether it can be used to satisfy whatever requirements you have
now in a specific case. For the things you're describing this may
well be sufficient.
It might then be worth pushing these association records into an index.
Many developers underestimate these associations in their
implementations, so I tend not to consider them reliable.
I can see that they were meant as an approach to database integrity with
a relational DB in mind.

These things nowadays are better realized via graph databases. Maybe the 
IDs in the associations might be used as a foundation to query and build 
a graph database of relationships between services, but this is dreaming 
of the future :)
...
...
I think the flaw in such an association based approach would be
that the unique ID might be wrong at a certain point in time (for
example because of ID renewal) and not refer anymore to the record
it points to.
Persistency of IDs is a separate question, and a general one - IDs
must be persistent for as long as necessary for all the possible
uses. ServiceIDs in particular should probably change only when
services are reconfigured in a major way. If references to IDs can't
be followed the whole schema will be unusable!
I agree with both of these comments! We must push for those IDs to be
treated as crucial in implementations. Their value and importance for
distributed deployments to work has been underestimated, especially
regarding the rules regulating their persistence. I guess it is already
part of your EGI profile, Stephen.

Cheers,
-- 
Florido Paganelli
Lund University - Particle Physics
ARC Middleware
EMI Project