
Hi Stephen,
On 2012-08-25 12:12, stephen.burke@stfc.ac.uk wrote:
Florido Paganelli [mailto:florido.paganelli@hep.lu.se] said:
What would you propose to do with Share, Resource and Manager?
Same approach. As I said, this depends on whether we want to override the associations or not. This cannot be represented in UML, but it makes sense in realizations.
And what about the relations between them? And the same for the storage classes? I think this would be quite a big change which would need a significant advantage to be worthwhile, and so far I don't think you've given one.
There are no changes. As I said, UML cannot express inheritance as straightforwardly as an implementation can. But we have the opportunity to fix it in the realization documents, which are not final yet. I did not spend time reasoning about the other associations, but if we agree on a composition-driven approach (every specification adds, does not overload) rather than a bare inheritance-driven approach (every specification overloads associations), I see no problem whatsoever. We're still fully consistent with the model, and everything works as expected.
In LDAP, I would scope the search for endpoints starting from the ComputingService,
You can do that if you have chosen one specific ComputingService, but in your own example of a delegation endpoint which could serve computing or other kinds of service, the current definition lets you search for all the endpoints which serve computing services but not the others.
Yes, I understand what you mean. But what if I have a delegation endpoint that can be used both for computing and for storage? Should I replicate such an endpoint in a ComputingService and in a StorageService? In that case the same delegation endpoint would be a ComputingEndpoint and a StorageEndpoint, with two different IDs, but in the end it is the same endpoint! How do I express that it is the same endpoint? With the same ID? But then the records would have different objectclasses and associations... It's kind of bad to have different records with the same ID. I would rather call it Endpoint, add associations pointing to both the StorageService and the ComputingService it serves, give it the same ID and place it in both the Computing and Storage services.
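To make the last proposal concrete, here is a minimal sketch of a single Endpoint record associated with two services. The class, the field names and the URN-style IDs are made up for illustration; they are not the attribute names of any GLUE2 rendering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical record layout, not an actual GLUE2 rendering.
    id: str
    url: str
    service_ids: frozenset  # IDs of every Service this endpoint serves


delegation = Endpoint(
    id="urn:example:endpoint:delegation",
    url="https://example.org/delegation",
    service_ids=frozenset({
        "urn:example:service:computing",
        "urn:example:service:storage",
    }),
)

# The same record (same ID) is listed under both services.
services = {
    "urn:example:service:computing": [delegation],
    "urn:example:service:storage": [delegation],
}

# A client collecting endpoints from both services ends up with one entry,
# because the ID is the same in both places.
unique = {ep.id: ep for eps in services.values() for ep in eps}
print(len(unique))
```

With two separate ComputingEndpoint and StorageEndpoint records carrying different IDs, the same collection step would yield two entries for what is really one endpoint.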
but then give me something to relate a local information service and its endpoints (some OpenLDAP service), or an independent delegation Service, to the box where the ComputingService is, otherwise I run the risk of querying twice the information system(s) for no reason, and submit jobs twice to the same endpoint because I cannot distinguish between them.
As I already said, I think an information endpoint should be a separate Service. For a delegation service I can't say, it would depend on how closely it's bound to the computing service and what the use cases are.
Queries are normally very lightweight compared with real service interactions like job submission, unless you're doing a very large number of them - querying twice is not a problem. Being able to recognise that you have the same Endpoint multiple times obviously is important, but I don't see why it would be difficult to recognise duplicates.
Querying twice is a problem with big numbers. Say I have 20 information endpoints and 40 submission endpoints in an index, such as EMIR, in which every Endpoint record also carries the Service.ID of the Service the endpoint belongs to. A client retrieves all 60 of them. Then it might want to query the information endpoints to scan for submission endpoints.
Scenario 1) I have Endpoints and ComputingEndpoints in a ComputingService. I'll make it easy here. A single box might have more than one information/submission endpoint, which means deciding which of the information/submission endpoints belonging to the same box one doesn't want to query. So let's simplify the scenario and suppose that submission endpoints belong to different boxes and information endpoints belong to different boxes, BUT there might be information endpoints on the same box as at least one submission endpoint. Then, since Endpoints and ComputingEndpoints are in the same ComputingService, IF an information endpoint has the same Service.ID as a submission endpoint, the client might decide not to query it. Operation cost: at most one comparison for each pair of information endpoint and submission endpoint, 20*40 = 800 ops.
Scenario 2) Different services: Endpoints in an Information Service and ComputingEndpoints in a ComputingService. We then have different Service.IDs for each endpoint, because information endpoints belong to different services than submission endpoints. The client cannot know which relationship exists between the services, so it must query the information endpoints. Suppose every information endpoint outputs 10 submission endpoints, some registered to the index (i.e. belonging to the set of 40 taken from the index) and some not (i.e. not among those 40 present in the index), ~200 endpoints in total. As said, since there is no information on how information and submission endpoints are coupled, I need to scan the information endpoints, as I can gather more submission endpoints there.
A client cannot just suppose that all the useful submission endpoints are in the index. Hence I must check all the 40 submission endpoints in the index against the 200 retrieved from the information endpoints, in order not to submit twice to the same endpoint. In the worst case that is 20 queries to information endpoints + 40*200 = 8000 comparison operations, 8020 operations in total, and we have gone up an order of magnitude.
The numbers are arbitrary, but I can tell you that ARC will have at least 3 submission endpoints per box, and you know what happens if you take a site-bdii as an information endpoint (one might easily reach 10 there on big sites). It is easy to see that, as the number of job requests increases, we might incur a huge amount of work just to submit a single job. Of course clients can use fancy ranking algorithms and/or dynamic programming to solve the problem better.
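The arithmetic of the two scenarios can be written down as a small sketch. The function names are made up for illustration, and the cost model is only the naive worst case described above (pairwise comparisons, one operation per query), not a measurement of any real client.

```python
def cost_same_service(n_info, n_submit):
    """Scenario 1: compare the Service.ID of every information endpoint
    with that of every submission endpoint (worst case)."""
    return n_info * n_submit


def cost_separate_services(n_info, n_submit, found_per_info):
    """Scenario 2: query every information endpoint, then compare each
    indexed submission endpoint with each endpoint found that way."""
    queries = n_info
    retrieved = n_info * found_per_info
    comparisons = n_submit * retrieved
    return queries + comparisons


print(cost_same_service(20, 40))           # 800
print(cost_separate_services(20, 40, 10))  # 8020
```

Changing `found_per_info` shows how quickly the second figure grows with the number of endpoints each information endpoint publishes.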
In my initial implementation I wanted to use the Service-to-Service association described in GFD.147 (page 7, page 13); however, I was told that this was not its purpose, and that it was there more to reflect some hierarchy between Services.
I don't see how it could represent a hierarchy unless you had some other way to express it - Service-Service is a peer relation, there is no directionality (unlike e.g. Domain-Domain). In any case, as I've said repeatedly, the question is not what the purpose was when the schema was defined (none in particular as far as I remember) but whether it can be used to satisfy whatever requirements you have now in a specific case. For the things you're describing this may well be sufficient.
It might then be worth pushing these association records into an index. Many developers underestimate these associations in implementations, and I tend not to consider them reliable. I can see that they were meant as an approach to database integrity with a relational DB in mind. These things nowadays are better realized via graph databases. Maybe the IDs in the associations could be used as a foundation to query and build a graph database of relationships between services, but this is dreaming of the future :)
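As a toy illustration of that idea, assuming the association records were available as plain pairs of Service IDs (the IDs and the `build_graph` helper below are hypothetical), an undirected graph of related services could be built like this:

```python
from collections import defaultdict

# Hypothetical Service-Service association records: pairs of Service IDs.
associations = [
    ("urn:example:service:computing", "urn:example:service:information"),
    ("urn:example:service:computing", "urn:example:service:delegation"),
]


def build_graph(pairs):
    """Build an undirected adjacency map, since Service-Service is a
    peer relation with no directionality."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


graph = build_graph(associations)
print(sorted(graph["urn:example:service:computing"]))
```

A client could then look up which information or delegation services are related to a given ComputingService, provided the IDs in the records stay valid.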
I think the flaw in such an association-based approach would be that the unique ID might be wrong at a certain point in time (for example because of ID renewal) and no longer refer to the record it points to.
Persistency of IDs is a separate question, and a general one - IDs must be persistent for as long as necessary for all the possible uses. ServiceIDs in particular should probably change only when services are reconfigured in a major way. If references to IDs can't be followed the whole schema will be unusable!
I agree with both of these comments! We must push for those IDs to be treated as crucial in implementations. Their value and importance for making distributed deployments work have been underestimated, especially regarding the rules regulating their persistence. I guess it is already part of your EGI profile, Stephen.
Cheers,
--
Florido Paganelli
Lund University - Particle Physics
ARC Middleware
EMI Project