Notes from yesterday's F2F data session

These are my notes on the comments made during yesterday's presentation. I stress that we want people to review the documents and give us feedback. I also think it is urgent that we produce a common subset of scenarios combining EMS & Data services.

1. All references to "profiles" in the presentation should have been to "further normative specifications", which may include profiles where appropriate. The main point is that we need to encourage the development of key specifications and to incorporate them into the OGSA Roadmap.

2. The slides did not make clear that the "Architecture" takes a toolbox approach rather than defining a fixed relationship between the services.

3. The diagram showing the relationships between interfaces, services & resources should allow multiple resources per service. It should use the same style as the later diagrams (i.e. showing services & resources separately rather than as concentric layers).

3a. The Architecture and Services documents should use the same diagram styles, probably the one used in the OGSA 1.5 document.

4. The "Basic Structure" diagram should add a "Storage Access" interface (or perhaps just a sink/source interface for data transfer). Also, "Other Data Services" should be renamed to "Data Management"; there is no intention to distinguish these services per se from the other "Data Management" services in the diagram. Note: This diagram also appears in the OGSA 1.5 document.

5. The "Composite Entities" diagram should include a "federation management interface" and a "cache management interface". I got the impression that the diagram did not succeed in showing that the federation and cache services expose the same interfaces as their "supplier" services; it might be better to repeat these interfaces on the federation and cache services rather than using the dotted line. Note: This diagram also appears in the OGSA 1.5 document.

6. An alternative implementation of the Data Pipelining scenario could use StreamableByteIO between the Visualisation service and the stored animations. It might be worth giving this example to show the different ways in which the services can be used to implement a solution.

7. The "Bringing data online" scenario should clarify why data transfer is used instead of data access. Perhaps the phrase "read files" should be changed to "transfer files".

8. In the "Data Staging" scenario, the "Parameter Space Exploration Service" should probably be replaced by a script engine or workflow enactment engine.

9. There was a discussion about the meaning of the term "policy". It was suggested that the sample "policies" listed on the slides would be better described as sample parameters for policies. The Data Architecture document attempts to use the term "policy" consistently, and we'd be happy to take guidance on improving this presentation.

10. The slide about "Common Properties" was only floating an idea. The group commented that if these properties were to be generalised, ParentDataResource should be an EPR rather than a URI and should allow multiple parents. Also, the "Sensitivity" properties were basically an expression of the consistency model, which in the general case could be more complex (as we have indeed noted elsewhere in the Data Architecture document).

11. Some of the functionality of the (optional) WS-DAI CoreResourceList interface will be provided by the WS-Naming facility, specifically the functionality of the Resolve operation. RNS is not the appropriate reference here because it deals in strings rather than URIs. The other operation provided by the CoreResourceList interface, GetDataResourceList, does not belong with WS-Naming (and would perhaps fit better with the possibly-generic properties on the previous slide).

12. DFDL was mentioned in the discussion about "OGSA Data gaps" as a possible mechanism for describing file "schemas". In fact the slide was simply referring to POSIX-style "stat" information, but we do see DFDL as a potential description mechanism. The current section on "Data Description" in the Data Architecture document probably needs fleshing out to consider possible languages in more detail.

13. There was some discussion of what is meant by a "data service". In the terminology of the Data Architecture document, "Data Services" are the subset of the OGSA services that deal with data, and hence include services for data transfer, data access, data replication, storage management, etc. There is an alternative body of opinion that data access services should be terminologically privileged, i.e. that the term "data service" should mean a data access service.

14. Regarding the Information Model, Jay mentioned that a key piece of information is whether a data resource is logically "near" another resource. This is useful to schedulers, workflow enactment engines and applications trying to choose which data resource to use, among other examples. It isn't clear how to incorporate this information into an information model (although this should not stop us developing other aspects of that model).

I encourage everyone to review the Data Architecture and Data Services documents. These documents will be released under the OGSA brand and we are at a good stage to take on board any comments.

Best wishes,

Dave Berry
Deputy Director, Research & E-infrastructure Development
National e-Science Centre, 15 South College Street
Edinburgh, EH8 9AA
+44 131 651 4039