Re: [jsdl-wg] Questions about JSDL schema

Dear All,

Some time ago I sent to the list the email to inform you that we are going to use JSDL to implement a uniform access to several Grid brokers and we asked you some questions concerning the specification. Now, since the JSDL schema changed, you can find the next portion of questions and comments below... :-)

First of all we must admit that the new schema is much better designed than the previous version. Many of comments and questions, which we had before, became outdated. Nevertheless, the XML schema and specification changed considerably so the first questions are: What is your versioning policy and plans for further development? Is the number of current version 1.1? Are you going to introduce any changes to this version or you plan to release subsequent versions? Do you think we can base our interfaces on this version of JSDL schema?

We also noticed that the JSDL specification is not always compatible with the XML schema. I suppose that we should base on the XML schema, which is more up-to-date, rather than specification? Are there any elements in the specification that haven't been put into schema yet? When do you think you'll manage to finish the specification (or at least to synchronize with the current version of the schema)?

We also have questions concerning some details in the schema:

1. We don't see where we could specify a type of application distribution, e.g. MPI, OpenMPI. Do you think this is too specific to add it to the application element?
2. How would you suggest to specify a logical file name: using predefined syntax or using a special attribute in the FileName element? Can a whole directory be specified as well?
3. Why Source & Target in DataStaging are in one element? How to use different name for an input and output file? We would suggest to add a choice element above Source and Target or to define some separate elements: e.g. SourceFiles and TargetFiles (probably of the same type).
4. Why the CreationFlag element is mandatory? I think this feature is not available in many systems. Furthermore, a reasonable default value can be chosen.
5. Definition 5.6.6.1: "A Source element contains the location and may contain a user on the remote system. This file MUST be staged in from the location specified by the URL as the user on the remote host before the job has terminated." Do you really mean "terminated" or it should be "started"?
6. What is expected as operatingSystemDesc? Is it human readable description?
7. I saw the use-cases containing descriptions of a job consisting of multiple processes and threads requiring multiple resources and/or processors. I guess that specification of alternative configurations is not possible, e.g. 4 nodes 4 processors each OR 1 node, 16 processors?
8. I didn't find a specification of the queue name in the new schema. I think it might be sometimes useful similarly as an implicit specification of a specific host.
9. Regarding your question in the spec about the webService value of the ApplicationTypeEnumeration, we think this is definitely needed (invoking an existing WebService).
10. Do limits mean that if they are exceeded during application execution the application must be terminated?
11. Within the FileSystem element there is a sub-element MountPoint. Who should specify this? A user? I think a more common use case is that users use predefined variables (e.g. home, tmp) to specify paths that are relative to these variables (but mount points depend on a local system).
12. I asked in the previous email about defining software dependencies, e.g. necessary libraries and you requested some use-cases. Simple examples are as follows:
    - interactive application may need additional software (e.g. VNC) to enable users to access remotely application's user interface
    - graphical application may need the OpenGL library to run
    - Java applications need Java Virtual Machine installed
    Therefore, in my opinion such an element would be useful and general. Of course, we can add it using the extension possibility.

Best regards & thanks,
Ariel Oleksiak

----- Original Message -----
From: "Donal K. Fellows" <donal.k.fellows@manchester.ac.uk>
To: "Ariel Oleksiak" <ariel@man.poznan.pl>
Cc: "JSDL WG" <jsdl-wg@gridforum.org>
Sent: Tuesday, October 26, 2004 4:21 PM
Subject: Re: [jsdl-wg] Questions about JSDL schema
Ariel Oleksiak wrote:
1. How to define the MPI job that needs N processors - can we use JobCategory + ProcessTopology? We just need to set a job type to MPI and to specify number of processes. BTW, have you already defined a final structure of the ProcessTopology element (I have seen discussions concerning this on the mailing list)?
This is a definite use-case, and we intend to nail down the ProcessTopology syntax at the up-coming face-to-face.
2. We couldn't find in the schema the "operator" attribute (according to the specification) to express the equal, min, and max operators. Moreover, we found in the spec: "<PhysicalMemory>4000 Units=M Constraint=Min</PhysicalMemory>". What method should be used? The latter is not structured enough for an XML schema, and units and constraint should probably be defined in the form of attributes.
That's one of the spots that remains to be tidied up. It's likely to end up as something like: <PhysicalMemory units="MB">4000</PhysicalMemory> With the default operator for numeric properties being "Min" (or whatever it is called this week.)
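[Editorial illustration] A minimal Python sketch of how a consumer might read the attribute-based form suggested in the reply above. Only the `units` attribute, the value "MB", and the "Min" default come from the thread. The `operator` attribute name, the "Max" and "Equal" values, the unit table, and both helper functions are assumptions for illustration, not part of any JSDL schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit table; the thread only mentions "MB".
UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_requirement(xml_text):
    """Return (operator, amount_in_bytes) for a PhysicalMemory-style element."""
    elem = ET.fromstring(xml_text)
    units = elem.get("units", "MB")         # assumed default unit
    operator = elem.get("operator", "Min")  # "Min" default, as the reply suggests
    amount = int(elem.text.strip()) * UNIT_BYTES[units]
    return operator, amount


def satisfies(requirement, available_bytes):
    """Check a resource value against a parsed requirement."""
    operator, amount = requirement
    if operator == "Min":
        return available_bytes >= amount
    if operator == "Max":
        return available_bytes <= amount
    if operator == "Equal":
        return available_bytes == amount
    raise ValueError("unknown operator: " + operator)


req = parse_requirement('<PhysicalMemory units="MB">4000</PhysicalMemory>')
```

With no operator given, the requirement is read as a lower bound, so a host with 8 GB would satisfy it and a host with 1 GB would not.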
3. We found the Executable complex element in the schema. Are you going to use this element anywhere? Or you think ExecutionName is enough and Executable is left in the schema by mistake?
I've got use-cases where I need things other than just a path to an executable binary. (One case is where we provide access to an application but do not really wish to expose to users just what magic incantation they use to invoke it on the grounds that it is likely to change from time to time anyway and is really just a sysadmin thing.) Experience shows that you can't build a scalable grid system if every node has to understand masses of details in order to run a job; these abstract Executable elements are a significant part of the strategy to push these details closer to the resource providers.
Endpoint JSDL-accepting job engines might not understand these higher-level things of course. But then they can just throw out the jobs they don't understand; I'd expect other JSDL engines layered on top of them (*cough*resource brokers*cough*) to handle those sorts of aspects.
4. How to express required software dependencies, e.g. necessary libraries? Should they be inserted in the extend section in software requirements?
Something like that. Use-cases here would be helpful.
5. Where the extensions can be added (to be still compatible with the JSDL spec)? Only in the parts of the schema where the "Extend" element is added? What about the remaining parts?
What parts remain? :^D
6. Have you considered specification of a type of a local resource management system (e.g. LSF, PBS, fork) to which user wants to submit a job? This could be helpful for a resource broker.
On one level, I don't consider that sort of information to actually be useful to a resource broker (wanting to run under a particular type of batch queue is an unusual requirement) but if such information was required it would be a software resource requirement. Possibly an extension though.
7. We also need a uniform description of information about job after submission, e.g. status etc. but as far as I understand this is out of the scope of JSDL spec?
Yes, it's out of our scope (and much more in the scope of JSIM).
8. Should "objectives", which are used to direct the search for the best resources, be a part of the JSDL document (e.g. as an extend element) or they are totally out of the scope of JSDL and should be placed in a separate document?
Out of scope, but it would be really cool to hear about how such things are done. I'd actually expect to be putting them in a document with a JSDL document as component of that outer document, just as we'd also do that sort of thing if we were expressing a workflow or some kind of advanced scheduling requirements. JSDL is very much only part of the overall picture (but we hope it will be a very useful part!) and we hope that we'll be able to work on these broader specifications in the future; there's definitely a need for such things, even if agreement on them is going to be more difficult to come by.
Donal.

Ariel Oleksiak wrote:
What is your versioning policy and plans for further development? Is the number of current version 1.1? Are you going to introduce any changes to this version or you plan to release subsequent versions? Do you think we can base our interfaces on this version of JSDL schema?
I think we plan to make the first version actually submitted to the formal part of the GGF process into version 1.0; everything before that is "bleeding edge".
1. We don't see where we could specify a type of application distribution, e.g. MPI, OpenMPI. Do you think this is too specific to add it to the application element?
Either it is a Resource (something which has to be there for the job to run) or it is part of the Application (an inherent aspect of the app running itself). I don't fully understand which. :^) Either way, it should be possible to fit it as an extension element into one of the two places.
2. How would you suggest to specify a logical file name: using predefined syntax or using a special attribute in the FileName element? Can a whole directory be specified as well?
I don't see why not.
3. Why Source & Target in DataStaging are in one element? How to use different name for an input and output file? We would suggest to add a choice element above Source and Target or to define some separate elements: e.g. SourceFiles and TargetFiles (probably of the same type).
They are in a single element so you can have a (logical) file that is staged in, modified, and then staged out again. No file has to have both Source and Target, and I think there are use-cases for having neither (e.g. where you just want to closely control some deletion behaviour).
4. Why the CreationFlag element is mandatory? I think this feature is not available in many systems. Furthermore, a reasonable default value can be chosen.
Pass. :^)
5. Definition 5.6.6.1: "A Source element contains the location and may contain a user on the remote system. This file MUST be staged in from the location specified by the URL as the user on the remote host before the job has terminated." Do you really mean "terminated" or it should be "started"?
Sounds like a typo.
6.What is expected as operatingSystemDesc? Is it human readable description?
I think that's one of the items we hope to borrow from CIM.
7. I saw the use-cases containing descriptions of a job consisting of multiple processes and threads requiring multiple resources and/or processors. I guess that specification of alternative configurations is not possible, e.g. 4 nodes 4 processors each OR 1 node, 16 processors?
That's something you'd specify using Profiles I think.
8. I didn't find a specification of the queue name in the new schema. I think it might be sometimes useful similarly as an implicit specification of a specific host.
It got removed. It's a scheduling attribute, and so not within the domain of JSDL (we know we need further XML languages to specify these things; JSDL is definitely just a part of the wider picture of computational workflow template description.)
10. Do limits mean that if they are exceeded during application execution the application must be terminated?
They're meant to be interpreted as what the user wants his POSIX system limits to be.
11. Within the FileSystem element there is a sub-element MountPoint. Who should specify this? A user? I think a more common use case is that users use predefined variables (e.g. home, tmp) to specify paths that are relative to these variables (but mount points depend on a local system).
IIRC, the user can specify it and the system can either ensure that the FS is mounted, or it can just check to see if it is mounted and throw the job out if it isn't.
12. I asked in the previous email about defining software dependencies, e.g. necessary libraries and you requested some use-cases. Simple examples are as follows: - interactive application may need additional software (e.g. VNC) to enable users to access remotely application's user interface - graphical application may need the OpenGL library to run - Java applications need Java Virtual Machine installed Therefore, in my opinion such an element would be useful and general. Of course, we can add it using the extension possibility.
In the specific case of Java, there's a specialized ApplicationType (which needs some more fleshing out IIRC). But these are all good examples of "software resources". Thanks.
Donal.

On Fri, 7 Jan 2005, Donal K. Fellows wrote:
Thanks for your answers! Please find a few additional questions below.
I think we plan to make the first version actually submitted to the formal part of the GGF process into version 1.0; everything before that is "bleeding edge".
Ok, but when do you plan to do this? Are you going to introduce any changes to the current version? If we want to use it we just need a version of the schema to refer to it (to say that our implementation is based on JSDL version x.x, date: ...). Next question is what is more up-to-date: XML schema or the specification?
3. Why Source & Target in DataStaging are in one element? How to use different name for an input and output file? We would suggest to add a choice element above Source and Target or to define some separate elements: e.g. SourceFiles and TargetFiles (probably of the same type).
They are in a single element so you can have a (logical) file that is staged in, modified, and then staged out again. No file has to have both Source and Target, and I think there are use-cases for having neither (e.g. where you just want to closely control some deletion behaviour).
What if an input file differs from the output file (which is a very common use-case)? Then their FileName elements must be different so do you have to specify two DataStaging elements? What if in one of them both Source and Target are defined?
11. Within the FileSystem element there is a sub-element MountPoint. Who should specify this? A user? I think a more common use case is that users use predefined variables (e.g. home, tmp) to specify paths that are relative to these variables (but mount points depend on a local system).
IIRC, the user can specify it and the system can either ensure that the FS is mounted, or it can just check to see if it is mounted and throw the job out if it isn't.
But MountPoint is a mandatory element. How to specify required disc space without knowledge about mount points on local systems?
We also noticed that some of elements (e.g. DataStaging) don't contain extensible elements (##other namespace). Is it an oversight or there is a reason for it?
Regards, Ariel
In the specific case of Java, there's a specialized ApplicationType (which needs some more fleshing out IIRC). But these are all good examples of "software resources". Thanks.
Donal.

Ariel Oleksiak wrote:
Ok, but when do you plan to do this? Are you going to introduce any changes to the current version? If we want to use it we just need a version of the schema to refer to it (to say that our implementation is based on JSDL version x.x, date: ...). Next question is what is more up-to-date: XML schema or the specification?
They're going to be a single document in the end anyway. And I forget which is more up to date. :^) Probably the schema, but please report any discrepancies between the two.
What if an input file differs from the output file (which is a very common use-case)? Then their FileName elements must be different so do you have to specify two DataStaging elements? What if in one of them both Source and Target are defined?
If you have two local files, you need two <DataStaging>s. If any one has both <Source> and <Target>, it will be staged in and staged out.
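[Editorial illustration] A short Python sketch of the rule just described: one DataStaging per local file, with Source meaning stage-in and Target meaning stage-out. The element names DataStaging, FileName, Source and Target are taken from the thread. The enclosing `JobDescription` wrapper, the absence of namespaces, the URLs, and the `staging_plan` helper are hypothetical simplifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment: a separate input file, a separate
# output file, and one file that is staged in, modified, and staged out.
JOB = """
<JobDescription>
  <DataStaging>
    <FileName>input.dat</FileName>
    <Source>http://example.org/data/input.dat</Source>
  </DataStaging>
  <DataStaging>
    <FileName>result.dat</FileName>
    <Target>http://example.org/results/result.dat</Target>
  </DataStaging>
  <DataStaging>
    <FileName>state.db</FileName>
    <Source>http://example.org/data/state.db</Source>
    <Target>http://example.org/data/state.db</Target>
  </DataStaging>
</JobDescription>
"""


def staging_plan(xml_text):
    """Map each local file name to the staging actions it requires."""
    plan = {}
    root = ET.fromstring(xml_text.strip())
    for staging in root.findall("DataStaging"):
        actions = []
        if staging.find("Source") is not None:
            actions.append("stage-in")
        if staging.find("Target") is not None:
            actions.append("stage-out")
        plan[staging.findtext("FileName")] = actions
    return plan


plan = staging_plan(JOB)
```

Under these assumptions, differing input and output files simply become two DataStaging elements, each with only one of Source or Target.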
But MountPoint is a mandatory element. How to specify required disc space without knowledge about mount points on local systems?
IIRC, the mount point does not need to be associated with an explicit location. (I hope someone else will try to answer this question!)
We also noticed that some of elements (e.g. DataStaging) don't contain extensible elements (##other namespace). Is it an oversight or there is a reason for it?
Probably a mistake/oversight. :^)
Donal.

Hi
I got an action at the last teleconf to follow up on the filesystem/mountpoint/diskspace issues Ariel raised. Sorry for being late with this. (I've edited/re-arranged the two emails to gather the relevant parts together)
Donal K. Fellows wrote:
Ariel Oleksiak wrote:
:
11. Within the FileSystem element there is a sub-element MountPoint. Who should specify this? A user? I think a more common use case is that users use predefined variables (e.g. home, tmp) to specify paths that are relative to these variables (but mount points depend on a local system).
IIRC, the user can specify it and the system can either ensure that the FS is mounted, or it can just check to see if it is mounted and throw the job out if it isn't.
:
But MountPoint is a mandatory element. How to specify required disc space without knowledge about mount points on local systems?
IIRC, the mount point does not need to be associated with an explicit location. (I hope someone else will try to answer this question!)
There are 3 scenarios here:
1. User specifies that certain filesystems must be available at certain locations.
2. User specifies certain filesystems must be available but doesn't care where, as long as the information is passed back somehow, e.g., environment variables.
3. User only cares that sufficient DiskSpace is available because there is access to some other mechanism that will deploy/configure the resource before the job is executed.
At the moment we clearly support (1) and somewhat obscurely (2), and perhaps even (3) as Donal mentions. If we agree that these 3 scenarios are equally valid I wonder if we shouldn't make the rules clearer. For example:
- make the MountPoint optional
- if the MountPoint is specified then the filesystem MUST be available at that location
- if the MountPoint is not specified then the filesystem MUST be available at some location and there MUST be an environment variable (subject to the rules of ApplicationType) defined by the execution system with the same name as the "FileSystem id". The value of this environment variable MUST be the location of the filesystem.
And for case (3) we could allow the DiskSpace element to appear directly under Resource.
Andreas
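[Editorial illustration] The proposed MountPoint rules can be sketched from the execution system's side as follows. This is only a reading of the proposal above, not agreed JSDL behaviour. The function name, its parameters, the `LookupError` used to reject a job, and the example paths are all hypothetical.

```python
def place_filesystem(fs_id, mount_point, available_mounts, chosen_location):
    """Return (location, env_to_export) for one FileSystem requirement.

    fs_id            -- the "FileSystem id" from the job description
    mount_point      -- requested MountPoint, or None if omitted
    available_mounts -- set of locations this system can provide (assumed input)
    chosen_location  -- where the system would put the filesystem when the
                        user did not ask for a specific place (assumed input)
    """
    if mount_point is not None:
        # Rule: a specified MountPoint MUST be honoured at exactly that location.
        if mount_point not in available_mounts:
            raise LookupError("cannot provide filesystem at " + mount_point)
        # The proposal does not say whether a variable is exported in this case.
        return mount_point, {}
    # Rule: MountPoint omitted, so the system picks a location and MUST publish
    # it in an environment variable named after the FileSystem id.
    return chosen_location, {fs_id: chosen_location}


location, env = place_filesystem("SCRATCH", None, {"/home"}, "/var/tmp/job")
```

Here `env` would be merged into the job's environment, so the application can find the filesystem without knowing local mount points in advance.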

Ariel, Donal
Donal K. Fellows wrote:
Ariel Oleksiak wrote:
:
4. Why the CreationFlag element is mandatory? I think this feature is not available in many systems. Furthermore, a reasonable default value can be chosen.
Pass. :^)
When the CreationFlag was not mandatory, behaviour was stated as implementation dependent if the flag was left undefined. The consensus was that it wasn't right to leave it implementation dependent. And since as a general approach we have chosen not to specify 'default default' values (we cannot say what is default for the grid) the flag became mandatory.
I guess the main question (for me) would be whether the values defined for this flag at the moment (including 'dontOverwrite' which is not in the spec but is in the latest schema) cover the behaviour most people would want to specify.
Obviously, whether this flag is mandatory or not, it is still possible to specify a value that is not supported by many systems. That only means that the job can only be executed in the, very restricted, environment that supports that option. We should probably provide informative text in the spec. on what we think is a reasonably common choice for this flag.
--
Andreas Savva <andreas.savva@jp.fujitsu.com>
Fujitsu Laboratories Ltd
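[Editorial illustration] A small Python sketch of what a mandatory CreationFlag could mean when a file is staged to a local path. Only 'dontOverwrite' is named in this thread. The values "overwrite" and "append" are assumed here for illustration, and the `stage_in` helper is hypothetical. The flag is a required argument with no default, mirroring the decision described above.

```python
import os
import tempfile


def stage_in(path, data, creation_flag):
    """Write staged data to path according to creation_flag.

    Returns True if data was written, False if an existing file was kept.
    """
    if creation_flag == "dontOverwrite":
        if os.path.exists(path):
            return False          # keep the existing file untouched
        mode = "wb"
    elif creation_flag == "overwrite":   # assumed value, not from the thread
        mode = "wb"
    elif creation_flag == "append":      # assumed value, not from the thread
        mode = "ab"
    else:
        # A system that does not support the requested value rejects the job.
        raise ValueError("unsupported CreationFlag: " + creation_flag)
    with open(path, mode) as handle:
        handle.write(data)
    return True


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "input.dat")
first = stage_in(target, b"a", "dontOverwrite")
second = stage_in(target, b"b", "dontOverwrite")
```

The second call leaves the file as it was, which is the behaviour the 'dontOverwrite' name suggests.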
participants (3): Andreas Savva, Ariel Oleksiak, Donal K. Fellows