Questions and potential changes to JSDL, as seen from HPC Profile point-of-view

Hi;

Coming from the point-of-view of the HPC Profile working group, I have several questions about JSDL, as well as some straw man thoughts about how JSDL should/could relate to the HPC Profile specification that I'm involved with. Some of my questions lead me to restrictions on JSDL that an HPC profile specification might make. Other questions lead to potential changes that might be made as part of creating future versions of JSDL. (I'm well aware that JSDL 1.0 was meant as a starting point rather than the final word on job submission descriptions and so please interpret my questions as being an attempt at constructive suggestions rather than a criticism of a very fine first step by the JSDL working group.)

At a high level, there are several general questions that came up when reading the JSDL 1.0 specification:

* Can JSDL documents describe jobs other than Linux/Unix/Posix jobs? For example, things like mount points and mount sources do not map in a completely straight-forward manner to how file systems are provided in the Windows world.

* Is JSDL expressive enough to describe all the needs of a job? For example, it is unclear how one would specify a requirement for something like a particular instruction set variation of the x86 architecture (e.g. the SSE3 version of the Pentium) or how one would specify that AMD processors are required rather than Intel ones (because the optimized libraries and the optimizations generated by the compiler used will differ for each). For another example, it is unclear how one would specify that all the compute nodes used for something like an MPI job should have the same hardware.

* How will JSDL's normative set of enumeration values for things like processor architecture and operating system be kept up-to-date and relevant? Also, how should things like operating system version get specified in a normative manner that will enable interoperability among multiple clients and job scheduling services? For example, things like Linux and Windows versions are constantly being introduced, each with potentially significant differences in capabilities that a job might depend on. Without a normative way of specifying these constantly evolving version sets it will be difficult, if not impossible, to create interoperable job submission clients and job scheduling services (including meta-scheduling services where multiple schedulers must interoperate with each other).

* Although JSDL specifies a means of including additional non-normative elements and attributes in a document, non-normative extensions make interoperability difficult. This implies the need for normative extensions to JSDL beyond the Posix extension currently described in the 1.0 specification. Are there plans to define additional extension profiles to address the above questions surrounding expressive power and normative descriptions of things like current OS types and versions?

* If one accepts the need for a variety of extension profiles then this raises the question of what should be in the base case. For example, it could be argued that data staging - with its attendant aspects such as mount points and mount sources - should be defined in an extension rather than in the core specification that will need to cover a variety of systems beyond just Linux/Unix/Posix. Similarly, one might argue that the base case should focus on what's functionally necessary to execute a job correctly and should leave things that are "optimization hints", such as CPU speed and network bandwidth specifications, to extension profiles.

* How are concepts such as IndividualCPUSpeed and IndividualNetworkBandwidth intended to be defined and used in practice? I understand the concept of specifying things like the amount of physical memory or disk space that a job will require in order to be able to run. However, CPU speed and network bandwidth don't represent functional requirements for a job - meaning that a job will correctly run and produce the same results irrespective of the CPU speed and network bandwidth available to it. Also, the current definitions seem fuzzy: the megahertz number for a CPU does not tell you how fast a given compute node will be able to execute various kinds of jobs, given all the various hardware factors that can affect the performance of a processor (consider the presence/absence of floating point support, the memory caching architecture, etc.). Similarly, is network bandwidth meant to represent the theoretical maximum of a compute node's network interface card? Is it expected to take into account the performance of the switch that the compute node is attached to? Since switch performance is partially a function of the pattern of (aggregate) traffic going through it, the network bandwidth that a job such as an MPI application can expect to receive will depend on the type of communications patterns employed by the application. How should this aspect of network bandwidth be reflected - if at all - in the network bandwidth values that a job requests and that compute nodes advertise?

* JSDL is intended for describing the requirements of a job being submitted for execution. To enable matchmaking between submitted jobs and available computational resources there must also be a way of describing existing/available resources. While much of JSDL can be used for this purpose, it is also clear that various extensions are necessary. For example, to describe a compute cluster requires that one be able to specify the resources for each compute node in the cluster (which may be a heterogeneous lot). Similarly, to describe a compute node with multiple network interfaces would require an extension to the current model, which assumes that only a single instance of such things can exist. This raises the question of whether something other than JSDL is intended to be used for describing available computational resources or whether there are intentions to extend JSDL to enable it to describe such resources.

* The current specification stipulates that conformant implementations must be able to parse all the elements and attributes defined in the spec, but doesn't require that any of them be supplied. Thus, a scheduling service that does nothing could claim to be compliant as long as it can correctly parse JSDL documents. For interoperability purposes, I would argue that the spec should define a minimum set of elements that any compliant service must be able to supply. Otherwise clients will not be able to make any assumptions about what they can specify in a JSDL document and, in particular, client applications that programmatically submit job submission requests will not be possible since they can't assume that any valid JSDL document will actually be acceptable to any given job submission service.

* I have a number of questions about data staging:

  * Although the notions of working directory and environment variables are defined in the posix extension, they are implicitly assumed in the data staging section of the core specification. This implies to me that either (a) data staging is made an extension or (b) these concepts are made a normative, required part of the core specification.

  * Recursive directory copying can be specified, but is not required to be supplied by any job submission service. This makes it difficult to write applications that programmatically define their data staging needs since they cannot in the current design determine whether any given job submission service implements recursive directory copying. In practice this may mean that programmatically generated job submissions will only ever use lists of individual files to stage.

  * The current definitions of the well-known file systems seem imprecise to me. In particular:

    * What are the navigation rules associated with each? Can you cd out of the subtree that each represents? ROOT almost certainly does not allow that. Is there an assumption that one can cd out of HOME or TMP or SCRATCH? Hopefully not, since that would make these file systems even more Unix/Linux-centric, plus one would now need to specify what clients can expect to see when they do so.

    * What is ROOT intended to be used for? Are there assumptions about what resides under root? Are there assumptions about what an application can read/write under the ROOT subtree? (ROOT also seems like the most Unix-specific of the 4 file system types defined.)

    * What are the sharing/consistency semantics of each file system in situations where a job is a multi-node application running on something like a cluster? Is HOME visible to all compute nodes in a data-consistent manner? I'm guessing that TMP would be assumed to be strictly local to each compute node, so that things like MPI applications would need to be cognizant that they are writing multiple files to multiple separate storage systems when they write to a file in TMP - and furthermore that data staging of such files after a job has run will result in multiple files that all map to the same target file.

    * Can other users write over or delete your data in TMP and/or SCRATCH? Is data in these file systems visible to other users or does each job get its own private TMP and SCRATCH?

    * How long does data in SCRATCH stay around? Without some normative definition - or at least a normative lower bound - on data lifetime clients will have to assume that the data can vanish arbitrarily and things like multi-job workflows will be very difficult to write if they try to take advantage of SCRATCH space to avoid unnecessary data staging actions to/from a computing facility.

  * From an interoperability and programmatic submission point-of-view, it is important to know which transports any given job submission service can be expected to support. This seems like another area where a normative minimal set that all job submission services must implement needs to be defined.

Given these questions, as well as the mandate for the HPC profile to define a simple base interface (that can cover the HPC use case of submitting jobs to a compute cluster), I would like to present the following straw man proposal for feedback from this community:

* Restructure the JSDL specification as a small core specification that must be universally implemented - i.e. not just parsable, but also suppliable by all compliant job submission services - and a number of optional extension profiles.

* Declare concepts such as executable path, command-line arguments, environment variables, and working directory to be generic and include them in the core JSDL specification rather than the posix extension. This may enable the core specification to support things like Windows-based jobs (TBD). The goal here is to define a core JSDL specification that in-and-of-itself could enable job submission to a fairly wide range of execution subsystems, including both the Unix/Linux/Posix world and the Windows world.

* Move data staging to an extension.

* Create precise definitions of the various concepts introduced in the data staging extension, including normative requirements about whether or not one can change directory up and out of a file system's root directory, etc.

* Define which transports are expected to be implemented by all compliant services.

* Move the various enumeration types - e.g. for CPU architecture and OS - to separate specification documents so that they can evolve without requiring corresponding and constant revision of the core JSDL specification.

* Define extension profiles (eventually, not right away) that enable richer description of hardware and software requirements, such as details of the CPU architecture or OS capabilities. As part of this, move optimization hints, such as CPU speed and network bandwidth elements out of the JSDL core and into a separate extension profile.

* Embrace the issue of how to specify available resources at an execution subsystem. Start by defining a base case that allows the description of compute clusters by creating a compound JSDL document that consists of an outer element that ties together a sequence of individual JSDL elements, each of which describes a single compute node of a compute cluster. Define an explicit notion of extension profiles that could define other ways of describing computational resources beyond just an array of simple JSDL descriptions.

Now, as presented above, my straw man proposal looks like suggestions for changes that might go into a JSDL-1.1 or JSDL-2.0 specification. In the near-term, the HPC profile working group will be exploring what can be done with just JSDL-1.0 and restrictions to that specification. The restrictions would correspond to disallowing those parts of the JSDL-1.0 specification that the above proposal advocates moving to extension profiles. It will also explore whether a restricted version of the posix extension could be used to cover most common Windows cases.

Marvin.

Marvin Theimer wrote:
Coming from the point-of-view of the HPC Profile working group, I have several questions about JSDL, as well as some straw man thoughts about how JSDL should/could relate to the HPC Profile specification that I’m involved with. Some of my questions lead me to restrictions on JSDL that an HPC profile specification might make. Other questions lead to potential changes that might be made as part of creating future versions of JSDL. (I’m well aware that JSDL 1.0 was meant as a starting point rather than the final word on job submission descriptions and so please interpret my questions as being an attempt at constructive suggestions rather than a criticism of a very fine first step by the JSDL working group.)
I'm going to work through these things as I read through them, so the answers (well, my answers) might be a little disjointed. :-)
At a high level, there are several general questions that came up when reading the JSDL 1.0 specification:
· Can JSDL documents describe jobs other than Linux/Unix/Posix jobs? For example, things like mount points and mount sources do not map in a completely straight-forward manner to how file systems are provided in the Windows world.
Most certainly. The intent is that ultimately JSDL should be able to describe pretty much any request for an atomic activity, and the POSIXApplication stuff was just a seed so that at least one common case would be handled by the initial specification. Work is ongoing on an extension to that to support parallel (mainly MPI, but also some other architectures too) jobs, and we've had in mind other kinds of jobs for a while (including SQL jobs, Web-service invocation jobs, and JVM jobs, but obviously not limited to those).

On the matter of mount points, the interpretation of a mount source is not that the mount source should be mounted at the mount point, but rather that the job should fail if the mount is not present. Now, a JSDL consumer might react to that failure by trying to perform the mount, but it is not required to. (The meaning of the name of the mount source is not defined IIRC, though it probably ought to be URI-like, meaning that SMB mounts would work fine under Windows with suitable munging.) We'd hope that most jobs would not actually specify the mount point, but would instead use the facilities provided by the JSDL abstract file system processing semantics to adapt to whatever was available.
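For concreteness, a declaration along the following lines (an illustrative sketch based on my reading of the JSDL 1.0 schema, not normative text; the paths are invented) says "fail unless a file system like this is present":

    <jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <!-- Sketch: the job requires a file system known as HOME.
           MountPoint and MountSource are optional; if supplied and not
           matched by the local configuration, the consumer may simply
           fault the job rather than attempt the mount. -->
      <jsdl:FileSystem name="HOME">
        <jsdl:MountPoint>/home/someuser</jsdl:MountPoint>
        <jsdl:MountSource>\\fileserver\homes\someuser</jsdl:MountSource>
      </jsdl:FileSystem>
    </jsdl:Resources>

Jobs that omit MountPoint and MountSource, and refer to the file system only by name, are the ones that will port between sites.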
· Is JSDL expressive enough to describe all the needs of a job? For example, it is unclear how one would specify a requirement for something like a particular instruction set variation of the x86 architecture (e.g. the SSE3 version of the Pentium) or how one would specify that AMD processors are required rather than Intel ones (because the optimized libraries and the optimizations generated by the compiler used will differ for each). For another example, it is unclear how one would specify that all the compute nodes used for something like an MPI job should have the same hardware.
I think with processor types we just grabbed a snapshot of the CIM model and went with that; updating to use a later version of that would not cause great difficulty (though the reverse problem might then exist, in that it might become more difficult to say that any kind of x86 arch is OK for a particular job). However, I believe we would assume the following interpretation of processor requirements: if specified, that's what they want for all processors associated with the job. If they didn't specify, they didn't care and anything is therefore good enough.
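For instance (a sketch using the 1.0 enumeration as I remember it), requiring that every processor allocated to the job be some x86 part would look like:

    <jsdl:Resources>
      <!-- Interpreted as applying to every CPU associated with the job;
           omitting the element means "any architecture is fine". -->
      <jsdl:CPUArchitecture>
        <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
      </jsdl:CPUArchitecture>
    </jsdl:Resources>

Note that the enumeration gives no way to say "AMD rather than Intel" or "must support SSE3", which is exactly the expressiveness gap being pointed out.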
· How will JSDL’s normative set of enumeration values for things like processor architecture and operating system be kept up-to-date and relevant? Also, how should things like operating system version get specified in a normative manner that will enable interoperability among multiple clients and job scheduling services? For example, things like Linux and Windows versions are constantly being introduced, each with potentially significant differences in capabilities that a job might depend on. Without a normative way of specifying these constantly evolving version sets it will be difficult, if not impossible, to create interoperable job submission clients and job scheduling services (including meta-scheduling services where multiple schedulers must interoperate with each other).
I don't know. :-) Maybe we should say that additional things as defined in some other model (e.g. CIM) SHOULD be accepted? (As I said above, we just took a snapshot of that model; updating isn't really a big deal.)
· Although JSDL specifies a means of including additional non-normative elements and attributes in a document, non-normative extensions make interoperability difficult. This implies the need for normative extensions to JSDL beyond the Posix extension currently described in the 1.0 specification. Are there plans to define additional extension profiles to address the above questions surrounding expressive power and normative descriptions of things like current OS types and versions?
We do not currently have *specific* plans to do this, but that does not mean we cannot have such specific plans in fairly short order. :-)
· If one accepts the need for a variety of extension profiles then this raises the question of what should be in the base case. For example, it could be argued that data staging – with its attendant aspects such as mount points and mount sources – should be defined in an extension rather than in the core specification that will need to cover a variety of systems beyond just Linux/Unix/Posix. Similarly, one might argue that the base case should focus on what’s /functionally/ necessary to execute a job correctly and should leave things that are “optimization hints”, such as CPU speed and network bandwidth specifications, to extension profiles.
Sounds fairly reasonable, though the abstract filesystem stuff has real uses in that it makes it much easier to write a job request that deals with things like varying locations of home directories and scratch space. The alternative is to assume that temporary files are always written to somewhere like /tmp, immediately stuffing interop even between Unix-based HPC centres (we don't write large files to /tmp here because that's not a cluster-wide resource and is therefore not very useful) let alone with any Windows-based service. But it is entirely reasonable to support mount points and sources by saying things like "if it doesn't match my current configuration, I'll fault". That is most certainly a legal interpretation of how to process a JSDL document. This is probably an issue that ought to be covered in the primer, when we finally write it. :-)
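To sketch that adaptation (element names per my reading of JSDL 1.0; the dataset URI is invented for illustration), a staging step can name the abstract SCRATCH file system instead of hard-coding /tmp or any other absolute path:

    <jsdl:DataStaging>
      <jsdl:FileName>input.dat</jsdl:FileName>
      <!-- Resolved against wherever this particular site puts its
           SCRATCH space, so the same document works across differently
           configured sites. -->
      <jsdl:FilesystemName>SCRATCH</jsdl:FilesystemName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Source>
        <jsdl:URI>http://example.org/datasets/input.dat</jsdl:URI>
      </jsdl:Source>
    </jsdl:DataStaging>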
· How are concepts such as IndividualCPUSpeed and IndividualNetworkBandwidth intended to be defined and used in practice? I understand the concept of specifying things like the amount of physical memory or disk space that a job will require in order to be able to run. However, CPU speed and network bandwidth don’t represent functional requirements for a job – meaning that a job will correctly run and produce the same results irrespective of the CPU speed and network bandwidth available to it. Also, the current definitions seem fuzzy: the megahertz number for a CPU does not tell you how fast a given compute node will be able to execute various kinds of jobs, given all the various hardware factors that can affect the performance of a processor (consider the presence/absence of floating point support, the memory caching architecture, etc.). Similarly, is network bandwidth meant to represent the theoretical maximum of a compute node’s network interface card? Is it expected to take into account the performance of the switch that the compute node is attached to? Since switch performance is partially a function of the pattern of (aggregate) traffic going through it, the network bandwidth that a job such as an MPI application can expect to receive will depend on the /type/ of communications patterns employed by the application. How should this aspect of network bandwidth be reflected – if at all – in the network bandwidth values that a job requests and that compute nodes advertise?
CPU speed is a fairly meaningless value really, since it is at best only a poor approximant to application performance (which is what people are really interested in) though app-perf is not portable in any sensible way as you can't extrapolate from the performance of one application to that of another. But it's probably the best we've got (we could do FLOPS or MIPS instead I suppose, but I suspect neither is much better). Network bandwidth is worse, because it is only meaningful when defined with respect to a defined pair of endpoints (or, more particularly here, w.r.t. a defined remote endpoint, since the other one is defined by where the job is submitted to). What's worse is that latency isn't defined at all, and that's at least as important for complex apps. In short, I think we didn't get the network bandwidth right. :-\ However, the general policy of accepting quality-of-service requirements on resources is one I agree with, since they really do matter and they are constraints on whether a particular resource is fit for the user's purpose.
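For what it's worth, JSDL 1.0 expresses such quality-of-service constraints as range values; a sketch (units as I read the spec, hertz and bits per second):

    <jsdl:Resources>
      <!-- "at least 2 GHz per CPU" - a hint at best, for the reasons above -->
      <jsdl:IndividualCPUSpeed>
        <jsdl:LowerBoundedRange>2.0E9</jsdl:LowerBoundedRange>
      </jsdl:IndividualCPUSpeed>
      <!-- "at least 1 Gb/s per node" - with endpoint and latency undefined -->
      <jsdl:IndividualNetworkBandwidth>
        <jsdl:LowerBoundedRange>1.0E9</jsdl:LowerBoundedRange>
      </jsdl:IndividualNetworkBandwidth>
    </jsdl:Resources>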
· JSDL is intended for describing the requirements of a job being submitted for execution. To enable matchmaking between submitted jobs and available computational resources there must also be a way of describing existing/available resources. While much of JSDL can be used for this purpose, it is also clear that various extensions are necessary. For example, to describe a compute cluster requires that one be able to specify the resources for each compute node in the cluster (which may be a heterogeneous lot). Similarly, to describe a compute node with multiple network interfaces would require an extension to the current model, which assumes that only a single instance of such things can exist. This raises the question of whether something other than JSDL is intended to be used for describing available computational resources or whether there are intentions to extend JSDL to enable it to describe such resources.
Strictly this is outside the scope of JSDL, where we've stuck firmly to the niche of describing user requests and not the things with which those requests may be satisfied. However, I do have some ideas on this. :-) JSDL terms can indeed be used for resource description, and this is because you can interpret them as saying something like "this is the maximal set of processors I will allocate to any job you submit". The UniGrids project has looked at several ways to do such resource descriptions based over JSDL. The simplest model we've found was to say that each target system service (BES-analog) supports a single unified homogeneous resource description, and that where we have a heterogeneous cluster we describe that as multiple services, each with smaller claims on the range of resources allocated to it. This allows for a simple resource model and matching rules, but it covers the 90% case neatly.

Let me flesh that out with an example. Suppose we have a cluster of machines, four from Intel (with 2GB memory each) and four from AMD (two with 1GB, two with 4GB). This induces 5 services, with resource claims as follows:

* 2 AMD processors, 4GB
* 4 AMD processors, 1GB
* 4 Intel processors, 2GB
* 6 x86 processors, 2GB
* 8 x86 processors, 1GB

It should be noted that these separate services would actually be pretty cheap in our implementation, since we can host them in the same container at a cost of a few extra objects. :-) Maybe other approaches would be better, but the matter of resource description is politically tricky for this WG since it gets into space claimed by others.
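Rendered in JSDL terms (illustrative only -- this reuses jsdl:Resources as an advertisement, which is the reinterpretation described above), the "6 x86 processors, 2GB" service's claim might read:

    <jsdl:Resources>
      <jsdl:CPUArchitecture>
        <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
      </jsdl:CPUArchitecture>
      <!-- The maximal envelope for any job submitted to this service -->
      <jsdl:TotalCPUCount>
        <jsdl:UpperBoundedRange>6</jsdl:UpperBoundedRange>
      </jsdl:TotalCPUCount>
      <jsdl:IndividualPhysicalMemory>
        <jsdl:UpperBoundedRange>2147483648</jsdl:UpperBoundedRange>
      </jsdl:IndividualPhysicalMemory>
    </jsdl:Resources>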
· The current specification stipulates that conformant implementations must be able to parse all the elements and attributes defined in the spec, but doesn’t require that any of them be supplied. Thus, a scheduling service that does nothing could claim to be compliant as long as it can correctly parse JSDL documents. For interoperability purposes, I would argue that the spec should define a minimum set of elements that any compliant service must be able to supply. Otherwise clients will not be able to make any assumptions about what they can specify in a JSDL document and, in particular, client applications that programmatically submit job submission requests will not be possible since they can’t assume that any valid JSDL document will actually be acceptable to any given job submission service.
I'd argue that this profiling of JSDL should be done by BES or yourselves (the HPC profile). This is because there are other cases (e.g. as synchronization points in workflow processing) where null jobs are actually useful.
· I have a number of questions about data staging:
I have one major observation: the data staging stuff is known to be a long way from perfect.
· Although the notions of working directory and environment variables are defined in the posix extension, they are implicitly assumed in the data staging section of the core specification. This implies to me that either (a) data staging is made an extension or (b) these concepts are made a normative, required part of the core specification.
Good point. I suppose our response to this should be contingent on whether "context location" (i.e. working directory) can be defined for all currently conceived-of job types. I don't know how to answer this yet. It's certainly possible for many of the things we've identified, but all?
· Recursive directory copying can be specified, but is not required to be supplied by any job submission service. This makes it difficult to write applications that programmatically define their data staging needs since they cannot in the current design determine whether any given job submission service implements recursive directory copying. In practice this may mean that programmatically generated job submissions will only ever use lists of individual files to stage.
It means that only _interoperable_ ones will do that, but I think there are already implementations of directory staging out there and clients that are generating jobs that use it. I may be wrong though. :-)
· The current definitions of the well-known file systems seem imprecise to me. In particular:
· What are the navigation rules associated with each? Can you cd out of the subtree that each represents? ROOT almost certainly does not allow that. Is there an assumption that one can cd out of HOME or TMP or SCRATCH? Hopefully not, since that would make these file systems even more Unix/Linux-centric, plus one would now need to specify what clients can expect to see when they do so.
We don't specify. Portable applications don't change directory at all in my experience; it's too full of strange behaviour as the meaning of all relative paths changes...
· What is ROOT intended to be used for? Are there assumptions about what resides under root? Are there assumptions about what an application can read/write under the ROOT subtree? (ROOT also seems like the most Unix-specific of the 4 file system types defined.)
Fair points, and I'd usually assume that the root FS was not writable. It probably is fairly Unix-specific. But it does make life much easier for integrating with legacy job systems which can handle the other FS types by translation into the root and adding a prefix to the paths. FWIW, I wouldn't use ROOT in my jobs. :-)
· What are the sharing/consistency semantics of each file system in situations where a job is a multi-node application running on something like a cluster? Is HOME visible to all compute nodes in a data-consistent manner? I’m guessing that TMP would be assumed to be strictly local to each compute node, so that things like MPI applications would need to be cognizant that they are writing multiple files to multiple separate storage systems when they write to a file in TMP – and furthermore that data staging of such files after a job has run will result in multiple files that all map to the same target file.
I've been assuming that (or at least configuring our local systems so that) TMP was node-local and SCRATCH was cluster-wide.
· Can other users write over or delete your data in TMP and/or SCRATCH? Is data in these file systems visible to other users or does each job get its own private TMP and SCRATCH?
I'd assume that other users never can overwrite your data and wouldn't make any assumptions at all about the level of isolation of either TMP or SCRATCH with respect to other jobs owned by the same user. But that would make an excellent topic to be included in any system policy statement. (Another policy might be that your job submission has to be digitally signed and the signer's certificate has to be signed in turn by a particular CA.) It might be a good idea to codify some best practice on this in the HPC profile.
· How long does data in SCRATCH stay around? Without some normative definition – or at least a normative lower bound – on data lifetime clients will have to assume that the data can vanish arbitrarily and things like multi-job workflows will be very difficult to write if they try to take advantage of SCRATCH space to avoid unnecessary data staging actions to/from a computing facility.
Again, that's something that is a site policy (I think we've locally got a "one month after last use, with some fairly coarse granularity" policy). However, grid systems bring something to the table here in that by describing jobs as resources in their own right (with definite known lifespans) it should be possible to design systems that make better decisions over when a piece of temporary data has become unreferenced and may be deleted. Profiling some best practice here seems sensible.
· From an interoperability and programmatic submission point-of-view, it is important to know which transports any given job submission service can be expected to support. This seems like another area where a normative minimal set that all job submission services must implement needs to be defined.
Agreed, but this is something that we basically punted on. (Also, the notion of what is a source or destination for a staging action turns out to be messy sometimes. Alas.)
Given these questions, as well as the mandate for the HPC profile to define a simple base interface (that can cover the HPC use case of submitting jobs to a compute cluster), I would like to present the following straw man proposal for feedback from this community:
· Restructure the JSDL specification as a small core specification that must be universally implemented – i.e. not just parsable, but also suppliable by all compliant job submission services – and a number of optional extension profiles.
Sounds sensible.
· Declare concepts such as executable path, command-line arguments, environment variables, and working directory to be generic and include them in the core JSDL specification rather than the posix extension. This may enable the core specification to support things like Windows-based jobs (TBD). The goal here is to define a core JSDL specification that in-and-of-itself could enable job submission to a fairly wide range of execution subsystems, including both the Unix/Linux/Posix world and the Windows world.
Again, it's not quite clear to me that all those concepts are meaningful in all job types (as opposed to those that are clearly just a way to execute some binary with a bunch of arguments).
· Move data staging to an extension.
I'm not sure about this.
· Create precise definitions of the various concepts introduced in the data staging extension, including normative requirements about whether or not one can change directory up and out of a file system’s root directory, etc.
Good idea.
· Define which transports are expected to be implemented by all compliant services.
Very good idea.
· Move the various enumeration types – e.g. for CPU architecture and OS – to separate specification documents so that they can evolve without requiring corresponding and constant revision of the core JSDL specification.
Excellent idea. :-)
· Define extension profiles (eventually, not right away) that enable richer description of hardware and software requirements, such as details of the CPU architecture or OS capabilities. As part of this, move optimization hints, such as CPU speed and network bandwidth elements out of the JSDL core and into a separate extension profile.
Sounds pretty sensible to me.
· Embrace the issue of how to specify available resources at an execution subsystem. Start by defining a base case that allows the description of compute clusters by creating a compound JSDL document that consists of an outer element that ties together a sequence of individual JSDL elements, each of which describes a single compute node of a compute cluster. Define an explicit notion of extension profiles that could define other ways of describing computational resources beyond just an array of simple JSDL descriptions.
Interesting. Probably a good topic for discussion going forward.
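To make that concrete, the outer wrapper might look something like this purely hypothetical sketch (the ClusterDescription element and its namespace are invented for illustration; nothing like them exists in JSDL 1.0):

    <hpcp:ClusterDescription
        xmlns:hpcp="http://example.org/ns/hpc-profile"
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <!-- One jsdl:Resources element per compute node; nodes may differ -->
      <jsdl:Resources><!-- node 1: 2 AMD CPUs, 4GB --></jsdl:Resources>
      <jsdl:Resources><!-- node 2: 2 Intel CPUs, 2GB --></jsdl:Resources>
      <!-- ... one entry per remaining node ... -->
    </hpcp:ClusterDescription>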
Now, as presented above, my straw man proposal looks like suggestions for changes that might go into a JSDL-1.1 or JSDL-2.0 specification. In the near-term, the HPC profile working group will be exploring what can be done with just JSDL-1.0 and restrictions to that specification. The restrictions would correspond to disallowing those parts of the JSDL-1.0 specification that the above proposal advocates moving to extension profiles. It will also explore whether a restricted version of the posix extension could be used to cover most common Windows cases.
Sounds like a reasonable plan to me. Donal.

One thing Donal mentioned which I would like to emphasize: The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware. This may sound pedantic, but it will be crucial for interop. The discovery has to capture realistic operating policy, and not just give enticing catalogues of resources which can never be combined in a single request! karl -- Karl Czajkowski karlcz@univa.com

Karl Czajkowski wrote:
One thing Donal mentioned which I would like to emphasize:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
This may sound pedantic, but it will be crucial for interop. The discovery has to capture realistic operating policy, and not just give enticing catalogues of resources which can never be combined in a single request!
I'm in strong agreement with Karl here (yes, it does happen from time to time! ;-)) I'd go further and state that a resource that I cannot access should not exist at all. Well, at least from my perspective. A side effect of this is that resource discovery and selection *must* be aware of the identity of the user for whom a resource is being chosen (and that in turn means that the EPS spec will have to make non-trivial statements about security. Yuck.) Donal.

Hi;

For the case I'm focused on -- namely a BES service exposing what it has available -- one can argue that the service already knows who the requestor is because of standard WS-Security. So it can choose to tailor its answers accordingly if it wants to.

By the way, I'm now thinking that maybe the simplest case for describing available resources is actually to specify a very small number of aggregate values, such as the number of compute nodes that are available, rather than specifying an array of JSDL documents. What do you think?

Marvin.

Marvin Theimer wrote:
For the case I'm focused on -- namely a BES service exposing what it has available -- one can argue that the service already knows who the requestor is because of standard WS-Security. So it can choose to tailor its answers accordingly if it wants to.
If the BES is being accessed directly by a service that's able to pass on the requestor identity, that'll work just fine. The complication comes when there's some kind of info-service as a mediator, and you need that to make things scale. On the other hand, I suspect that the answers to that challenge are still a bit too close to being research problems, and can be ignored (from a standards perspective, but not necessarily an implementation perspective) for now.
By the way, I'm now thinking that maybe the simplest case for describing available resources is actually to specify a very small number of aggregate values, such as the number of compute nodes that are available, rather than specifying an array of JSDL documents. What do you think?
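Something as minimal as the following sketch, reusing JSDL's existing Total* elements purely for illustration (whether they are the right vocabulary for advertisements is part of the question):

    <jsdl:Resources>
      <!-- "32 nodes, 64 CPUs available" as a bare aggregate claim -->
      <jsdl:TotalResourceCount>
        <jsdl:Exact>32</jsdl:Exact>
      </jsdl:TotalResourceCount>
      <jsdl:TotalCPUCount>
        <jsdl:Exact>64</jsdl:Exact>
      </jsdl:TotalCPUCount>
    </jsdl:Resources>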
Sounds fair enough to me. :-) Donal.

Karl Czajkowski wrote:
One thing Donal mentioned which I would like to emphasize:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
Yes! Yes! *waves the supporting flag*
This may sound pedantic, but it will be crucial for interop. The discovery has to capture realistic operating policy, and not just give enticing catalogues of resources which can never be combined in a single request!
Spot on. After reading this mail, I think we would be best served by providing *two* resource models. While this may sound impractical, wasteful of resources, or even impossible, I think the idea is worth exploring. (Note that, for the sake of simplicity, I take quite a few assumptions for granted.)

While system administrators are interested in making the best use of their machines (simple reason: return on investment), job submitters are interested in having their jobs actually executed rather than optimised 'til the last percent (while I acknowledge that there actually are submitters who want to or even need to optimise that way, I think this is a relatively small subset of Grid users, at least in the future).

A solution to this dilemma may be to provide two "languages", one fitting each group best: One language that job submitters use to specify resources they need, which sacrifices accuracy for practicability. This can be very simple, even name/value pairs. The other language, however, aims for maximum accuracy; I envision it as a feature-rich, (strongly?) typed representation of the providers' resources.

Obviously, these two languages need matching when a job is submitted. The natural candidate for that is (Donal, forgive me for inaccuracy here) "something" coming from the RSS WG.

Any boos, rotten eggs?

Cheers, Michel -- Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com Fujitsu Laboratories of Europe +44 20 8606 4834

Michel Drescher wrote:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
Yes! Yes! *waves the supporting flag*
I also agree. From a user's POV, I pretty much don't care on what resource (within the constraints I specify) the job is run, just that it is run at some point.
After reading this mail, I think we would be best served by providing *two* resource models. While this may sound impractical, wasteful of resources, or even impossible, I think the idea is worth exploring:
I'm unsure whether one needs *two* different resource models here.
While system administrators are interested in making the best use of their machines (simple reason: return on investment), job submitters are interested in having their jobs actually executed rather than optimised
I think this observation is correct. The whole idea is that the user does not care about the specific resource, but only about a defined TOS (type of service, for the non-acronym impaired), and even then, only regarding the proper (and potentially early) execution of his job.
A solution to this dilemma may be to provide two "languages", one fitting each group best: One language that job submitters use to specify resources they need, which sacrifices accuracy for practicability. This can be very simple, even name/value pairs.
The other language, however, aims for maximum accuracy; I envision it as a feature-rich, (strongly?) typed representation of the providers' resources.
Obviously, these two languages need matching when a job is submitted. The natural candidate for that is (Donal, forgive me for inaccuracy here) "something" coming from the RSS WG.
Maybe this can be covered by one "language". Then, one would need to specify two, let's say, versions of it:

- the provider version, which should be strict, to ensure accurate descriptions of the service's capabilities, and
- the consumer version, which is more relaxed, in that it allows certain parts to be left out in order to specify "loose" constraints on service requirements.

IMHO, this would cover what Michel suggested, but make life easier for the involved WGs (not having to maintain two languages which are more or less identical) and -- later -- for the implementors (not having to provide two libraries for more or less the same functionality). Furthermore, the match-maker people wouldn't have to suffer that much...

Greetings, Alexander -- Dipl.-Inform. Alexander Papaspyrou | 44221 Dortmund, NRW (Germany) Robotics Research Institute | phone : +49(231)755-5058 Information Technology Section | fax : +49(231)755-3251 Dortmund University | web : http://www.irf.de/
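One way to picture the single-language/two-versions idea is with JSDL-style range values (a sketch; the elements are borrowed from JSDL 1.0 purely for illustration):

    <!-- Consumer version: relaxed, leaves most things unspecified -->
    <jsdl:IndividualPhysicalMemory>
      <jsdl:LowerBoundedRange>1073741824</jsdl:LowerBoundedRange> <!-- >= 1 GB -->
    </jsdl:IndividualPhysicalMemory>

    <!-- Provider version: strict, an exact statement of what is offered -->
    <jsdl:IndividualPhysicalMemory>
      <jsdl:Exact>2147483648</jsdl:Exact> <!-- exactly 2 GB per node -->
    </jsdl:IndividualPhysicalMemory>

The match-maker then only has to check that each value the provider states exactly falls within the range the consumer left open.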

Hi All, On Fri, Jun 09, 2006 at 01:23:37PM +0200, Alexander Papaspyrou wrote:
Michel Drescher wrote:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
Yes! Yes! *waves the supporting flag*
I also agree. From a user's POV, I pretty much don't care on what resource (within the constraints I specify) the job is run, just that it is run at some point.
If my job may run on ten different machines, I may be interested in the one which provides the best value (objective function, min cost, min completion time, ...). Hence, just meeting the constraints may not be sufficient. Cheers, Thomas -- ---------------------------------------------------------------------- Thomas Roeblitz, roeblitz@zib.de, http://www.zib.de/roeblitz ----------------------------------------------------------------------

Thomas Röblitz wrote:
If my job may run on ten different machines, I may be interested in the one which provides the best value (objective function, min cost, min completion time, ...). Hence, just meeting the constraints may not be sufficient.
That's what the Execution Planning Service (the output from the OGSA-RSS group which I chair) is supposed to deal with. Donal.

Hi all,
While system administrators are interested in making the best use of their machines (simple reason: return on investment), job submitters are interested in having their jobs actually executed rather than optimised
I think this observation is correct. The whole idea is that the user does not care about the specific resource, but only about a defined TOS (type of service, for the non-acronym impaired), and even then, only regarding the proper (and potentially early) execution of his job.
As a user, I strongly disagree. I *am* interested in having my jobs executed as soon as possible, for sure. This means I want them to be sent by a workload management system not just to any site that matches job requirements, but to the best site - e.g., with the fastest processor, bigger memory, better bandwidth etc. I may also be interested in sending them to the cheapest site, or to the fastest site among the cheap ones. I may prefer to stay away from sites that use AFS, and I may need to specify that I need inbound connectivity for a worker node. I perhaps only want to use sites in one specific country, for some licensing reasons. There are so many levels of optimization that users need, one can write a book about it.

This is not a hypothetical case: I know many users that schedule jobs by hand to sites that in their experience are better, while the workload management system cannot tell this from the available information or job description. This manual scheduling is orthogonal to the Grid idea, I dare say. Instead, job specification should include very explicit attributes, including potentially a preferred sysadmin name :-)

As it is pretty difficult to define a boundary between generic service-level descriptions and specific information for fine-tuning, I would say it's better to stay with one "language" that covers all.

Cheers, Oxana

Oxana Smirnova wrote:
As a user, I strongly disagree. I *am* interested in having my jobs executed as soon as possible, for sure. This means I want them to be sent by a workload management system not just to any site that matches job requirements, but to the best site - e.g., with the fastest processor, bigger memory, better bandwidth etc. I may also be interested in sending them to the cheapest site, or to the fastest site among the cheap ones. I may prefer to stay away from sites that use AFS, and I may need to specify that I need inbound connectivity for a worker node. I perhaps only want to use sites in one specific country, for some licensing reasons. There are so many levels of optimization that users need, one can write a book about it.
I agree. However, this is a policy within the Resource Broker / Workload Manager / Scheduler. Sorry if I didn't make this clear (maybe since I'm coming from the scheduling POV). All things that you have mentioned can be interpreted as constraints for your job (price, speed, AFS, blue logos on the lower right, ad nauseam).
This is not a hypothetical case: I know many users that schedule jobs by hand to sites that in their experience are better, while the workload management system cannot tell this from the available information or job description. This manual scheduling is orthogonal to the Grid idea,
Yes, it is (alas, many users -- at the moment -- know better than the available schedulers). Still, we should keep this open: either the user keeps his requirements general, such that a Resource Broker can decide; or he specifies his constraints in a way that, in the end, only a single resource (the one the user wants) matches. Greetings, Alexander -- Dipl.-Inform. Alexander Papaspyrou | 44221 Dortmund, NRW (Germany) Robotics Research Institute | phone : +49(231)755-5058 Information Technology Section | fax : +49(231)755-3251 Dortmund University | web : http://www.irf.de/

Alexander Papaspyrou wrote:
Yes, it is (alas, many users -- at the moment -- know better than the available schedulers). Still, we should keep this open: either the user keeps his requirements general, such that a Resource Broker can decide; or he specifies his constraints in a way that, in the end, only a single resource (the one the user wants) matches.
The way we are testing now with ARC is to let users specify as many (or as few) requirements as they find suitable, with the same language, *and* to allow the user to select different brokering algorithms. That is, one may define a "hello world" job and select the "random" brokering algorithm. Or, one may specify a job that needs a lot of input files, and select the "data proximity" algorithm. I don't suppose there can be a universal algorithm that is optimal for every kind of task, even if the job description is so detailed that it matches only two sites (I hope the Grid will be big enough to find more than one site that satisfies the requirements). Oxana

Oxana Smirnova wrote:
As a user, I strongly disagree. I *am* interested in having my jobs executed as soon as possible, for sure. This means I want them to be sent by a workload management system not just to any site that matches job requirements, but to the best site - e.g., with the fastest processor, bigger memory, better bandwidth etc. I may also be interested in sending them to the cheapest site, or to the fastest site among the cheap ones. I may prefer to stay away from sites that use AFS, and I may need to specify that I need inbound connectivity for a worker node. I perhaps only want to use sites in one specific country, for some licensing reasons. There are so many levels of optimization that users need, one can write a book about it.
This is certainly a good indication that there cannot be a fixed set of resource selection descriptions. Any truly workable solution to this problem has to be really quite general. I've already designed (and implemented) such a scheme as it happens. ;-)
This is not a hypothetical case: I know many users that schedule jobs by hand to sites that in their experience are better, while the workload management system cannot tell this from the available information or job description. This manual scheduling is orthogonal to the Grid idea, I dare say.
It's certainly reflecting the (too common) case where users and admins are in different warring camps. :-(
Instead, job specification should include very explicit attributes, including potentially a preferred sysadmin name :-) As it is pretty difficult to define a boundary between generic service-level descriptions and specific information for fine-tuning, I would say it's better to stay with one "language" that covers all.
I should note that this is properly part of the domain of the OGSA-RSS working group, which I happen to chair. :-) I'm currently looking for a co-chair (more than one would be cool!) so that I'm not completely overworked. Volunteers will be able to partake in numerous benefits, such as invitations to speak at OGSA F2F meetings and the chance to go to the Chairs' Appreciation Night... Donal.

Hi;

I'm going to reply to a bunch of the previous emails in this one. As several people, including Oxana and Karl and Michel and Donal, have pointed out, the fully general problem of describing job requirements and the resources available from various candidate execution systems is awfully hard. Indeed, it is arguably still a research problem and hence not something to define standards about. The HPC profile work is about defining a standard for those simple situations that are well-understood and common enough that a standard would provide some tangible benefit in terms of allowing multiple implementations of things like job schedulers and job scheduling client libraries to interoperate with each other. From that point-of-view, I need to divide the world of JSDL and RSS solutions into at least two parts: a (very simple) part that we understand well enough to standardize now and a research part that requires further exploration and should NOT be standardized now. This is why I keep banging the drum about structuring things as base cases and extensions, so that I can define a very simple base case and a few simple extensions for purposes of the HPC profile standards work now (i.e. this summer) while allowing a graceful path to the future for all the really interesting and important work yet to come.

So, what should be the base case and the few simple extensions that should get standardized by the HPC profile working group now (meaning this summer)? I would love to hear people's opinions and feedback -- especially in terms of deltas on the straw man proposal that I posted in my previous email(s). :-)

With thanks in advance, Marvin.

Marvin Theimer wrote:
So, what should be the base case and the few simple extensions that should get standardized by the HPC profile working group now (meaning this summer)? I would love to hear people's opinions and feedback -- especially in terms of deltas on the straw man proposal that I posted in my previous email(s). :-)
I think the most accurate characterisation of my position would be that I really support this effort and I want you to succeed, and on the timetable you've proposed. (It also fits very well with what I believe to be the general aims of the GGF[*] over the coming couple of years.) Any quibbling I've done is over minor things in the hope of making the result better and more practical from an implementor's and user's PoV. Donal. [* Or rather, the organization shortly to be formerly known as GGF. ]

Hi; If you structure things to explicitly support evolution and extension then I believe you get the desired effects:
- Simple base case
- Extensions that selectively and optionally add richness while building on the base
- An explicit sanction to make early progress on the simple, well-understood parts of a problem without preventing future evolution to richer, more complete solutions.
Marvin. -----Original Message----- From: Alexander Papaspyrou [mailto:alexander.papaspyrou@udo.edu] Sent: Friday, June 09, 2006 4:24 AM To: Michel Drescher Cc: Karl Czajkowski; Donal K. Fellows; Marvin Theimer; JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [ogsa-bes-wg] Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view Michel Drescher wrote:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
Yes! Yes! *waves the supporting flag*
I also agree. From a user's POV, I pretty much don't care what resource (within the constraints I specify) the job runs on, just that it runs at some point.
After reading this mail, I think we are probably best served by providing *two* resource models. This may sound impractical, wasteful of resources, or even impossible, but I think the idea may be worth exploring:
I'm unsure whether one needs *two* different resource models here.
While system administrators are interested in making the best use of their machines (simple reason: return on investment), job submitters are interested in having their jobs actually executed rather than optimised 'til the last percent
I think this observation is correct. The whole idea is that the user does not care about the specific resource, but only about a defined TOS (type of service, for the non-acronym impaired), and even then, only regarding the proper (and potentially early) execution of his job.
A solution to this dilemma may be to provide two "languages", one fitting each group best: One language that job submitters use to specify resources they need, which sacrifices accuracy for practicability. This can be very simple, even name/value pairs.
The other language, however, aims for maximum accuracy; I envision it to be a feature-rich, (strongly?) typed representation of their resources.
Obviously, these two languages need matching when a job is submitted. The natural candidate for that is (Donal, forgive me for inaccuracy here) "something" coming from the RSS WG.
Maybe this can be covered by one "language". Then one would need to specify two, let's say, versions of it:
- the provider version, which should be strict, to ensure accurate descriptions of the service's capabilities, and
- the consumer version, which is more relaxed in that it allows certain parts to be left out, in order to be able to specify "loose" constraints on the service requirements.
IMHO, this would cover what Michel suggested, but make life easier for the involved WGs (not having to maintain two languages which are more or less identical) and -- later -- for the implementors (not having to provide two libraries for more or less the same functionality). Furthermore, the matchmaker people wouldn't have to suffer so much... Greetings, Alexander -- Dipl.-Inform. Alexander Papaspyrou | 44221 Dortmund, NRW (Germany) Robotics Research Institute | phone : +49(231)755-5058 Information Technology Section | fax : +49(231)755-3251 Dortmund University | web : http://www.irf.de/
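As a rough sketch of Alexander's "one language, two versions" idea: the same jsdl:Resources vocabulary could serve both sides, with the provider version fully specified and the consumer version leaving out whatever it doesn't care about. The element usage below follows JSDL 1.0, but the "strict vs. relaxed" profiles themselves are hypothetical, and the values are illustrative only:

    <!-- Provider version: strict; every element supplied, exact values. -->
    <jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <jsdl:CPUArchitecture>
        <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
      </jsdl:CPUArchitecture>
      <jsdl:TotalCPUCount>
        <jsdl:Exact>64.0</jsdl:Exact>
      </jsdl:TotalCPUCount>
      <jsdl:IndividualPhysicalMemory>
        <jsdl:Exact>2147483648.0</jsdl:Exact> <!-- 2 GB per node, in bytes -->
      </jsdl:IndividualPhysicalMemory>
    </jsdl:Resources>

    <!-- Consumer version: relaxed; omitted elements mean "don't care",
         and ranges stand in for exact values. -->
    <jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <jsdl:IndividualPhysicalMemory>
        <jsdl:LowerBoundedRange>1073741824.0</jsdl:LowerBoundedRange> <!-- at least 1 GB -->
      </jsdl:IndividualPhysicalMemory>
    </jsdl:Resources>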

Hi; While it is certainly interesting to explore multiple languages with different levels of expressiveness, the HPC profile work faces a challenge that I would argue is mostly focused on your first point about simplicity. This is because interoperability gets harder the more complicated the design is, and the working group also faces a time deadline of end-of-summer. So, whereas I very much encourage you and others to explore richer languages, the one I'm interested in (for the moment) is the first one. From that perspective, I want a simple, practicable means of specifying both job submission requirements and available resource descriptions (where we all agree that "available" means "available to the requestor", not "raw availability"). I see two "base" cases for describing available resources:
- Some simple ways of describing aggregates, such as the number of available compute nodes in a cluster or the overall "load" of a system (a number between 0% and 100%).
- A simple way of describing the actual hardware/software resources available in a system, so that clients like simple meta-schedulers can at least get at the raw data (with all the caveats about how much of the raw data to expose to any given requestor). This second type of description seems like it could be achieved with Donal Fellows' suggestion of an array of JSDL infosets (see the sketch after Michel's message below).
I fully recognize that these two base cases only cover some of the common scenarios that can occur in grids, but I would argue that they cover an important set and that they are relatively easy to provide, implying that the HPC profile work could employ them without too much delay. Marvin. -----Original Message----- From: Michel Drescher [mailto:Michel.Drescher@uk.fujitsu.com] Sent: Friday, June 09, 2006 3:35 AM To: Karl Czajkowski Cc: Donal K. Fellows; Marvin Theimer; JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [ogsa-bes-wg] Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view Karl Czajkowski wrote:
One thing Donal mentioned which I would like to emphasize:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
Yes! Yes! *waves the supporting flag*
This may sound pedantic, but it will be crucial for interop. The discovery has to capture realistic operating policy, and not just give enticing catalogues of resources which can never be combined in a single request!
Hit base. After reading this mail, I think we are probably best served by providing *two* resource models. This may sound impractical, wasteful of resources, or even impossible, but I think the idea may be worth exploring. (Note that I take quite a few assumptions for granted for the sake of simplicity.) While system administrators are interested in making the best use of their machines (simple reason: return on investment), job submitters are interested in having their jobs actually executed rather than optimised 'til the last percent (while I acknowledge that there actually are submitters who want to or even need to optimise that way, I think this is a relatively small subset of Grid users, at least in the future). A solution to this dilemma may be to provide two "languages", one fitting each group best: one language that job submitters use to specify the resources they need, which sacrifices accuracy for practicability - this can be very simple, even name/value pairs. The other language aims for maximum accuracy; I envision it to be a feature-rich, (strongly?) typed representation of their resources. Obviously, these two languages need matching when a job is submitted. The natural candidate for that is (Donal, forgive me for inaccuracy here) "something" coming from the RSS WG. Any boos, rotten eggs? Cheers, Michel -- Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com Fujitsu Laboratories of Europe +44 20 8606 4834
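The "array of JSDL infosets" idea that Marvin refers to above might look like the following sketch. The res:ClusterDescription wrapper and its namespace are invented here for illustration; the nested elements follow JSDL 1.0:

    <res:ClusterDescription
        xmlns:res="http://example.org/ns/cluster-description"
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <!-- one jsdl:Resources infoset per compute node (or homogeneous group) -->
      <jsdl:Resources>
        <jsdl:CPUArchitecture>
          <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
        </jsdl:CPUArchitecture>
        <jsdl:IndividualPhysicalMemory>
          <jsdl:Exact>2147483648.0</jsdl:Exact> <!-- 2 GB, in bytes -->
        </jsdl:IndividualPhysicalMemory>
      </jsdl:Resources>
      <jsdl:Resources>
        <!-- ... description of the next node ... -->
      </jsdl:Resources>
    </res:ClusterDescription>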

Marvin: I think one decision to make is whether BES services are homogeneous or not. I think Donal is advocating homogeneity. However, I do not think this is the main source of complexity. In either case, I agree with you that JSDL ought to be usable as a core syntax for describing the "resources available from a BES instance" as well as the "resources required for an activity". As you describe it, this is sort of a "class ad" in the Condor sense of the word.

The problem comes from trying to advertise a resource that can handle multiple jobs simultaneously. The tricky part is that this is not just "nodes free", but must be intersected with policies such as maximum job size. Should there be a vocabulary for listing the total free resources and the job sizing policies directly? Or should the advertisement list a set of jobs that can be supported simultaneously, e.g. I publish 512 nodes as quantity 4 128-node job availability slots? The latter is easier to match, but probably doesn't work in the simple case because of the combinatoric problem of grouping jobs which are not maximal. How does a user know whether they can have quantity 8 64-node jobs or not?

Also, I am ignoring the very real problem of capturing per-user policies. I do not think it is as simple as returning a customized response for the authenticating client. How is middleware supposed to layer on top of BES here? How does a meta-scheduler know whether quantity 8 64-node jobs can be accepted for one user? For 8 distinct users? Does a (shared) meta-scheduler now need to make separate queries for every client? How does it understand the interference of multiple user jobs? I think there is really a need for a composite availability view so such meta-schedulers can reasonably think about a tentative future, in which they try to subdivide and claim parts of the BES resource for multiple jobs. Can this be handled with a declarative advertisement, or does it require some transactional dialogue? The transactional approach seems too tightly coupled to me, i.e. I should be able to compute a sensible candidate plan before I start negotiating.

If we say all of this is too "researchy" for standardization, then I am not sure what the standard will really support. Perhaps the best approach is the first one I mentioned, where relatively raw data is exposed on several extensible axes (subject to authorization checks): overall resource pool descriptions, job sizing policies, user rights information, etc. The simple users may only receive a simple subset of this information, which requires minimal transformation to tell them what they can submit. The middleware clients receive more elaborate data (if trusted) and can do more elaborate transformation of the data to help their planning.

The only alternative I can imagine, right now, would be a very elaborate resource description language utilizing the JSDL "range value" concept to expose some core policy limits, as well as a number of extensions to express overall constraints which define the outer bounds of the combinatoric solution space. This DOES seem pretty "researchy" to me... but maybe someone else sees a more appealing middle ground? karl -- Karl Czajkowski karlcz@univa.com
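To picture the first option Karl mentions - raw data exposed on several extensible axes - an advertisement might pair the free pool with the sizing policy. The adv: wrapper elements below are invented for illustration; the range-value form is the one JSDL 1.0 already defines:

    <adv:ResourceOffer
        xmlns:adv="http://example.org/ns/resource-offer"
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <adv:NodesFree>512</adv:NodesFree> <!-- overall pool description -->
      <!-- job sizing policy: requests for 1..128 nodes are acceptable -->
      <jsdl:TotalResourceCount>
        <jsdl:Range>
          <jsdl:LowerBound>1.0</jsdl:LowerBound>
          <jsdl:UpperBound>128.0</jsdl:UpperBound>
        </jsdl:Range>
      </jsdl:TotalResourceCount>
    </adv:ResourceOffer>

Note that even this sketch leaves Karl's combinatoric question open: it says nothing about how many 128-node jobs can actually run at once.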

Karl Czajkowski wrote:
I think one decision to make is whether BES services are homogeneous or not. I think Donal is advocating homogeneity. However, I do not think this is the main source of complexity. In either case, I agree with you that JSDL ought to be usable as a core syntax for describing the "resources available from a BES instance" as well as the "resources required for an activity". As you describe it, this is sort of a "class ad" in the Condor sense of the word. The problem comes from trying to advertise a resource that can handle multiple jobs simultaneously.
I'd largely agree with this paragraph, except that I'd note that I'm only advocating that a BES instance export a (maximal) homogeneous view of itself, and that more complex configurations be modelled as multiple containers. This helps keep the reasoning for resource selection simple, and that's good because the reasoning is already quite complex.
The tricky part is that this is not just "nodes free", but must be intersected with policies such as maximum job size. Should there be a vocabulary for listing the total free resources and the job sizing policies directly? Or should the advertisement list a set of jobs that can be supported simultaneously, e.g. I publish 512 nodes as quantity 4 128-node job availability slots? The latter is easier to match, but probably doesn't work in the simple case because of the combinatoric problem of grouping jobs which are not maximal. How does a user know whether they can have quantity 8 64-node jobs or not?
Firstly, a container should not publish a capacity that exceeds the max size of job it is willing to accept. The size of the physical resource is uninteresting; the queue capacity is what matters. Secondly, the ability to run simultaneous isolated jobs is independent of that. That's instead something that is discovered by talking to some reservation service and discovering that you can get two (or more) overlapping reservations for the same container. I don't think anyone is working on standardising reservation services at the moment (and nor can they really be described as being part of the simple HPC profile; there are lots of resources out there without reservation capability and they still work fine). In short, in the example above the user *doesn't* know that they can have that number of jobs running at once, but they can know that they can submit that many jobs to that resource and have them run eventually (according to system policy). Of course, a site might choose to publish additional information about their system policies allowing users to really know such things. But again, they might not; local autonomy rules after all. [eliding more points that are interesting, but a bit imponderable]
If we say all of this is too "researchy" for standardization, then I am not sure what the standard will really support. Perhaps the best approach is the first one I mentioned, where relatively raw data is exposed on several extensible axes (subject to authorization checks): overall resource pool descriptions, job sizing policies, user rights information, etc. The simple users may only receive a simple subset of this information which requires minimal transformation to tell them what they can submit. The middleware clients receive more elaborate data (if trusted) and can do more elaborate transformation of the data to help their planning.
I advocate a simple model. If a BES publishes a resource description to me, then it should accept a job from me that asks for the maximal resources from that description. It need not execute that job straight away, but it should do so as soon as reasonably possible given workload (and other things like maintenance periods, etc.). This way, there is no need to publish further policy information; any relevant policies have already been applied by the time an invitation to offer is sent. Now, while it is possible that there are some configurations that will not be captured by this (e.g. ascribing a polynomial scoring function to each resource type and capping the maximum total score), what I describe is going to capture what most people do, I think. It's also pretty easy to implement and roll out.
The only alternative I can imagine, right now, would be a very elaborate resource description language utilizing the JSDL "range value" concept to expose some core policy limits, as well as a number of extensions to express overall constraints which define the outer bounds of the combinatoric solution space. This DOES seem pretty "researchy" to me... but maybe someone else sees a more appealing middle ground?
I think I'm probably with Marvin on this. Let's get something that's workable for the 90% case without closing off the 10% from being tackled in the future (though maybe by a different route). That's got to be the way with the best payoff in the next 6-12 months. Donal.

Hi; One thing that I think might be useful is to enumerate use cases in order to identify the simple, common ones. In particular, I suspect that looking at what various existing production (as opposed to research) meta-schedulers support will be instructive, since they already face the task of scheduling against programmatically defined interfaces to existing BES-like services (as compared to human clients, who can eyeball a given service's resource descriptions/policies and then make scheduling decisions in wetware). My first cut at looking at various meta-schedulers - including LSF and Condor - is that the following types of resource descriptions get used:
* Simple aggregate descriptions, such as the number of available CPUs in a subsidiary scheduling service (e.g. a compute cluster), the average CPU load, and the job queue length.
* Named queues that jobs can be submitted/forwarded to. LSF, in particular, allows for the definition of a variety of queues that are effectively globally visible and that an LSF (meta-)scheduler can forward jobs to. A concrete example is a "fan-in" scenario, in which a large central compute cluster accepts "large jobs" from a set of distributed workgroup clusters to which human users submit their various jobs. A "large job" queue is defined that all the workgroup cluster schedulers are aware of, and users then submit all their jobs to their local workgroup cluster; large jobs they submit to the "large job" queue, and the workgroup scheduler forwards jobs received for that queue to the central compute cluster's job scheduler.
* "Full" descriptions of subsidiary system compute resources. In this case the meta-scheduler receives "full" information about all the compute resources that exist in all of the subsidiary scheduling systems. LSF supports this with a notion of "resource leasing", whereby a compute cluster's LSF scheduler can lease some (or all) of its compute nodes to a remote LSF scheduler. In that case all the state information that would normally go to the local LSF scheduler about the leased nodes is also forwarded to the remote scheduler owning the lease. Condor supports something similar with its class-ads design: a meta-scheduler will receive class-ad descriptions for all the compute nodes that it may do match-making for. In this case, a "full" description consists of whatever has been put into the class-ads by the individual compute nodes participating in the system.
I would love to hear from other members of the community what their characterization of the common simple use cases is. Also, it would be great if people could provide additional characterizations of various existing production meta-schedulers and post that information to the mailing list (or point me to where it has already been posted, if I'm unaware of it :-)). Several things leap to my mind from looking at these usage examples:
* A few relatively simple, standardized aggregate description quantities can enable one of the most desirable common use cases, namely spreading volumes of high-throughput jobs across multiple clusters in an approximately load-balanced manner.
* Condor's extensible class-ad design, of arbitrary name-value pairs with some number of standardized names, has been fairly successful and provides a lot of flexibility. As an example, note that LSF's job forwarding queues can be implemented as class-ad elements.
Note that the open-ended nature of class-ads means that any installation can define its own queues (with associated semantics) that are meaningful within, for example, a particular organization.
* To efficiently describe something like the leased compute nodes of an LSF cluster, or the class-ads for an entire compute cluster, may require introducing the notion of arrays of descriptions/class-ads.
* The key to interoperability is to define a useful set of standard elements that clients can specify in job submission requests and that resource management services (including compute nodes and schedulers) can advertise. The interesting/key question is how small this set can be while still enabling an interesting set of actual usage scenarios. It would be interesting to know what LSF exports when leasing compute nodes to remote LSF schedulers, and what the "commonly used" set of class-ad terms is across representative sets of Condor installations. (I'm guessing that the JSDL working group has already looked at questions like this and has some sense of what the answers are?)
I know that the JSDL WG is already discussing the topic of class-ad-like approaches. I guess I'm placing a vote in favor of looking at such a design approach, and adding the question of what a beginning "base" set of standardized class-ad names should be (see the sketch following this message). This would be "one" approach that might be workable without requiring that we first solve one or more research problems. If JSDL is structured to allow for multiple approaches then it would allow for progress now while not excluding the more ambitious approaches of the kind Karl outlined in his email. Marvin. -----Original Message----- From: owner-ogsa-bes-wg@ggf.org [mailto:owner-ogsa-bes-wg@ggf.org] On Behalf Of Karl Czajkowski Sent: Saturday, June 10, 2006 10:39 PM To: Marvin Theimer Cc: Michel Drescher; Donal K. Fellows; JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [ogsa-bes-wg] Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view Marvin: I think one decision to make is whether BES services are homogeneous or not. I think Donal is advocating homogeneity. However, I do not think this is the main source of complexity. In either case, I agree with you that JSDL ought to be usable as a core syntax for describing the "resources available from a BES instance" as well as the "resources required for an activity". As you describe it, this is sort of a "class ad" in the Condor sense of the word. The problem comes from trying to advertise a resource that can handle multiple jobs simultaneously. The tricky part is that this is not just "nodes free", but must be intersected with policies such as maximum job size. Should there be a vocabulary for listing the total free resources and the job sizing policies directly? Or should the advertisement list a set of jobs that can be supported simultaneously, e.g. I publish 512 nodes as quantity 4 128-node job availability slots? The latter is easier to match, but probably doesn't work in the simple case because of the combinatoric problem of grouping jobs which are not maximal. How does a user know whether they can have quantity 8 64-node jobs or not? Also, I am ignoring the very real problem of capturing per-user policies. I do not think it is as simple as returning a customized response for the authenticating client. How is middleware supposed to layer on top of BES here? How does a meta-scheduler know whether quantity 8 64-node jobs can be accepted for one user?
For 8 distinct users? Does a (shared) meta-scheduler now need to make separate queries for every client? How does it understand the interference of multiple user jobs? I think there is really a need for a composite availability view so such meta-schedulers can reasonably think about a tentative future, in which they try to subdivide and claim parts of the BES resource for multiple jobs. Can this be handled with a declarative advertisement, or does it require some transactional dialogue? The transactional approach seems too tightly coupled to me, i.e. I should be able to compute a sensible candidate plan before I start negotiating. If we say all of this is too "researchy" for standardization, then I am not sure what the standard will really support. Perhaps the best approach is the first one I mentioned, where relatively raw data is exposed on several extensible axes (subject to authorization checks): overall resource pool descriptions, job sizing policies, user rights information, etc. The simple users may only receive a simple subset of this information, which requires minimal transformation to tell them what they can submit. The middleware clients receive more elaborate data (if trusted) and can do more elaborate transformation of the data to help their planning. The only alternative I can imagine, right now, would be a very elaborate resource description language utilizing the JSDL "range value" concept to expose some core policy limits, as well as a number of extensions to express overall constraints which define the outer bounds of the combinatoric solution space. This DOES seem pretty "researchy" to me... but maybe someone else sees a more appealing middle ground? karl -- Karl Czajkowski karlcz@univa.com
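The class-ad-like approach Marvin votes for above might look something like this sketch, with a handful of standardized names plus installation-defined ones. The ca: namespace, element name, and attribute names are all invented for illustration (the attribute names merely echo common Condor usage):

    <ca:ClassAd xmlns:ca="http://example.org/ns/classad-ext">
      <!-- candidate "base" set of standardized names -->
      <ca:Attribute name="OpSys">LINUX</ca:Attribute>
      <ca:Attribute name="Arch">x86_64</ca:Attribute>
      <ca:Attribute name="TotalCpus">64</ca:Attribute>
      <ca:Attribute name="LoadAvg">0.35</ca:Attribute>
      <!-- installation-defined name: an LSF-style forwarding queue -->
      <ca:Attribute name="ForwardQueue">large_job</ca:Attribute>
    </ca:ClassAd>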

For clarity, I should add that I was not intending to exclude any performance or dynamic availability indicators. I was merely trying to emphasize that what is needed is something predictive: the types of jobs that are acceptable, which ought to include "terms of service" parameters such as scheduling response time in a non-trivial QoS environment. This is not the same thing as an administrative overview, because the full resource set is meaningless without taking into account the operating policies (min/max job sizes and job durations) and the availability of the backing resources as influenced by the (site-specific) scheduling policy and load. And of course, this predictive data will exist in a continuum of precision, accuracy, and determinism. I don't want my statement below to be misconstrued as saying the user-level discovery is easy. On the contrary, it is MUCH more difficult than merely enumerating resources for an administrative user or "console" application. Oxana makes a point later on that I also would endorse. In the absence of good solutions to this problem, the administrative view is necessary to allow fallback. The status quo of our field is that users are "application administrators" who make all sorts of cunning decisions to coax the system into behaving more optimally than it ought to based on its own internal capabilities. I don't think one of these views can be addressed to the exclusion of the other. karl On Jun 09, Karl Czajkowski modulated:
One thing Donal mentioned which I would like to emphasize:
The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware.
This may sound pedantic, but it will be crucial for interop. The discovery has to capture realistic operating policy, and not just give enticing catalogues of resources which can never be combined in a single request!
-- Karl Czajkowski karlcz@univa.com
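A sketch of the distinction Karl is drawing: discovery that answers "what types of job are acceptable" would publish policy-filtered bounds and predictive terms-of-service parameters rather than a raw resource catalogue. Everything in the tos: namespace below is invented for illustration, and the values are assumed to already have the operating policy applied:

    <tos:AcceptableJobClass
        xmlns:tos="http://example.org/ns/terms-of-service"
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <!-- policy-filtered job size bound, not the full resource set -->
      <jsdl:TotalResourceCount>
        <jsdl:UpperBoundedRange>128.0</jsdl:UpperBoundedRange>
      </jsdl:TotalResourceCount>
      <tos:MaxWallClockTime>PT12H</tos:MaxWallClockTime>
      <!-- predictive TOS parameter, e.g. expected scheduling response time -->
      <tos:ExpectedQueueWait>PT2H</tos:ExpectedQueueWait>
    </tos:AcceptableJobClass>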

Hi; I totally agree. You make a very important point. That said, the simplest common case to define may be an array of available resources. Specifying richer policy restrictions can quickly lead down the slippery slope to very complicated, very difficult-to-provide designs and hence must be done with great care. Marvin. -----Original Message----- From: Karl Czajkowski [mailto:karlcz@univa.com] Sent: Friday, June 09, 2006 3:08 AM To: Donal K. Fellows Cc: Marvin Theimer; JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view One thing Donal mentioned which I would like to emphasize: The discovery ought to be "what types of job are acceptable" and not what resources are there. Or rather, the latter is part of some administrative interface which is misleading for job-submitting users and middleware. This may sound pedantic, but it will be crucial for interop. The discovery has to capture realistic operating policy, and not just give enticing catalogues of resources which can never be combined in a single request! karl -- Karl Czajkowski karlcz@univa.com

Hi; I think with processor types we just grabbed a snapshot of the CIM model and went with that; updating to use a later version of that would not cause great difficulty (though the reverse problem might then exist, in that it might become more difficult to say that any kind of x86 arch is OK for a particular job). However, I believe we would assume the following interpretation of processor requirements: if specified, that's what they want for all processors associated with the job. If they didn't specify, they didn't care and anything is therefore good enough. Agreed. Also, one possibility is to explicitly specify some of the commonly occurring "semi-bound" scenarios, such as "any x86" architecture. I'm not familiar enough with the CIM world to know if they can provide us with guidance on how to solve the problem in general. Sounds fairly reasonable, though the abstract filesystem stuff has real uses in that it makes it much easier to write a job request that deals with things like varying locations of home directories and scratch space. The alternative is to assume that temporary files are always written to somewhere like /tmp, immediately stuffing interop even between Unix-based HPC centres (we don't write large files to /tmp here because that's not a cluster-wide resource and is therefore not very useful) let alone with any Windows-based service. But it is entirely reasonable to support mount points and sources by saying things like "if it doesn't match my current configuration, I'll fault". That is most certainly a legal interpretation of how to process a JSDL document. This is probably an issue that ought to be covered in the primer, when we finally write it. :-) If we narrow the definitions of mountpoint and mountsource enough and precisely describe their semantics then we might arrive at something that could be fairly widely used. I'm thinking of things like saying that you can't navigate "out" of a file system via "cd ..", etc. This is definitely something to explore. Since the HPC profile base case treats data staging as being out-of-scope, the base interface profile will exclude these; but that can be done independently of anything else. (And, of course, the data staging extension to the HPC profile will need to deal with this subject in any case, even if it's ignored in the base case.) Strictly this is outside the scope of JSDL, where we've stuck firmly to the niche of describing user requests and not the things with which those requests may be satisfied. However, I do have some ideas on this. :-) The HPC profile (and BES) have to deal with the issue of describing available resources. So, one way or the other, the subject will get addressed this summer. As much as possible, I'd like to avoid duplicating the work done in JSDL for that - if for no other reason than that users will likely be unhappy if they have to learn two different ways of describing what they will perceive as being variations of the core concept, namely resource description - both required and available. Maybe other approaches would be better, but the matter of resource description is politically tricky for this WG since it gets into space claimed by others. Any advice on this subject would be greatly appreciated. As I said above, I have to deal with this subject one way or the other and would prefer to do so with the minimum of feather-ruffling (while still making progress that results in a usable HPC profile by the end of the summer). Good point. 
I suppose our response to this should be contingent on whether "context location" (i.e. working directory) can be defined for all currently conceived-of job types. I don't know how to answer this yet. It's certainly possible for many of the things we've identified, but all? If you allow for the notion of file systems and mount points to be in the core spec then I would argue that you are implicitly buying into systems that also support the notion of current working directory (some jobs may of course not use it). We don't specify. Portable applications don't change directory at all in my experience; it's too full of strange behaviour as the meanings of all relative paths change... I would argue that not specifying allowed/disallowed behaviors is a bad approach when interoperability is at issue. (I'm talking about disallowing "cd .." out the top, not disallowing change-directory within the subtree specified by a file system element.) Fair points, and I'd usually assume that the root FS was not writable. It probably is fairly Unix-specific. But it does make life much easier for integrating with legacy job systems which can handle the other FS types by translation into the root and adding a prefix to the paths. FWIW, I wouldn't use ROOT in my jobs. :-) Again, from an interop point-of-view, this seems dangerous. It might be a good idea to codify some best practice on this in the HPC profile. Agreed. Regarding your reactions to my straw man proposal, it seems like you pretty much agree with everything except the following:
* You're not convinced how universal the posix extension elements for things like command-line arguments and working directory are. My response is that I think they are at least as universal as the data staging elements.
* You don't want to move the data staging section out of the core specification. For the HPC profile base case, data staging elements will be prohibited since they are out-of-scope for the base case. The HPC profile extension for data staging will allow the JSDL data staging elements. Whether these should be in a separate JSDL extension or whether they can be generalized to cover a wide(r) range of systems is a topic for future discussion.
* You're leery of tackling the resource description problem. Understandable, although the HPC profile working group will have to, and will be seeking guidance from the JSDL and other communities on how to do so.
Is that a fair characterization of your position? Thanks, Marvin. -----Original Message----- From: Donal K. Fellows [mailto:donal.k.fellows@manchester.ac.uk] Sent: Friday, June 09, 2006 2:45 AM To: Marvin Theimer Cc: JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view Marvin Theimer wrote: > Coming from the point-of-view of the HPC Profile working group, I have > several questions about JSDL, as well as some straw man thoughts about > how JSDL should/could relate to the HPC Profile specification that I'm > involved with. Some of my questions lead me to restrictions on JSDL > that an HPC profile specification might make. Other questions lead to > potential changes that might be made as part of creating future versions > of JSDL.
(I'm well aware that JSDL 1.0 was meant as a starting point > rather than the final word on job submission descriptions and so please > interpret my questions as being an attempt at constructive suggestions > rather than a criticism of a very fine first step by the JSDL working > group.) I'm going to work through these things as I read through them, so the answers (well, my answers) might be a little disjointed. :-) > At a high level, there are several general questions that came up when > reading the JSDL 1.0 specification: > > * Can JSDL documents describe jobs other than Linux/Unix/Posix > jobs? For example, things like mount points and mount sources do not > map in a completely straight-forward manner to how file systems are > provided in the Windows world. Most certainly. The intent is that ultimately JSDL jobs should be able to describe pretty much any request for an atomic activity, and the POSIXApplication stuff was just a seed so that at least one common case would be handled by the initial specification. Work is ongoing with an extension to that to support parallel (mainly MPI, but also some other architectures too) jobs, and we've had in mind other kinds of jobs for a while (including SQL jobs, Web-service invocation jobs, and JVM jobs, but obviously not limited to those). On the matter of mount points, the interpretation of a mount source is not that the mount source should be mounted at the mount point, but rather that the job should fail if the mount is not present. Now, a JSDL consumer might react to that failure by trying to perform the mount, but it is not required. (The meaning of the name of the mount source is not defined IIRC, though it probably ought to be URI-like, meaning that SMB mounts would work fine under Windows with suitable munging.) We'd hope that most jobs would not actually specify the mount point, but would instead use the facilities provided by the JSDL abstract file system processing semantics to adapt to whatever was available. > * Is JSDL expressive enough to describe all the needs of a job? > For example, it is unclear how one would specify a requirement for > something like a particular instruction set variation of the IA86 > architecture (e.g. the SSE3 version of the Pentium) or how one would > specify that AMD processors are required rather than Intel ones (because > the optimized libraries and the optimizations generated by the compiler > used will differ for each). For another example, it is unclear how one > would specify that all the compute nodes used for something like an MPI > job should have the same hardware. I think with processor types we just grabbed a snapshot of the CIM model and went with that; updating to use a later version of that would not cause great difficulty (though the reverse problem might then exist, in that it might become more difficult to say that any kind of x86 arch is OK for a particular job). However, I believe we would assume the following interpretation of processor requirements: if specified, that's what they want for all processors associated with the job. If they didn't specify, they didn't care and anything is therefore good enough. > * How will JSDL's normative set of enumeration values for things > like processor architecture and operating system be kept up-to-date and > relevant? Also, how should things like operating system version get > specified in a normative manner that will enable interoperability among > multiple clients and job scheduling services?
For example, things like > Linux and Windows versions are constantly being introduced, each with > potentially significant differences in capabilities that a job might > depend on. Without a normative way of specifying these constantly > evolving version sets it will be difficult, if not impossible, to create > interoperable job submission clients and job scheduling services > (including meta-scheduling services where multiple schedulers must > interoperate with each other). I don't know. :-) Maybe we should say that additional things as defined in some other model (e.g. CIM) SHOULD be accepted? (As I said above, we just took a snapshot of that model; updating isn't really a big deal.) > * Although JSDL specifies a means of including additional > non-normative elements and attributes in a document, non-normative > extensions make interoperability difficult. This implies the need for > normative extensions to JSDL beyond the Posix extension currently > described in the 1.0 specification. Are there plans to define > additional extension profiles to address the above questions surrounding > expressive power and normative descriptions of things like current OS > types and versions? We do not currently have *specific* plans to do this, but that does not mean we cannot have such specific plans in fairly short order. :-) > * If one accepts the need for a variety of extension profiles > then this raises the question of what should be in the base case. For > example, it could be argued that data staging - with its attendant > aspects such as mount points and mount sources - should be defined in an > extension rather than in the core specification that will need to cover > a variety of systems beyond just Linux/Unix/Posix. Similarly, one might > argue that the base case should focus on what's /functionally/ necessary > to execute a job correctly and should leave things that are > "optimization hints", such as CPU speed and network bandwidth > specifications, to extension profiles. Sounds fairly reasonable, though the abstract filesystem stuff has real uses in that it makes it much easier to write a job request that deals with things like varying locations of home directories and scratch space. The alternative is to assume that temporary files are always written to somewhere like /tmp, immediately stuffing interop even between Unix-based HPC centres (we don't write large files to /tmp here because that's not a cluster-wide resource and is therefore not very useful) let alone with any Windows-based service. But it is entirely reasonable to support mount points and sources by saying things like "if it doesn't match my current configuration, I'll fault". That is most certainly a legal interpretation of how to process a JSDL document. This is probably an issue that ought to be covered in the primer, when we finally write it. :-) > * How are concepts such as IndividualCPUSpeed and > IndividualNetworkBandwidth intended to be defined and used in practice? > I understand the concept of specifying things like the amount of > physical memory or disk space that a job will require in order to be > able to run. However, CPU speed and network bandwidth don't represent > functional requirements for a job - meaning that a job will correctly > run and produce the same results irrespective of the CPU speed and > network bandwidth available to it. 
Also, the current definitions seem > fuzzy: the megahertz number for a CPU does not tell you how fast a given > compute node will be able to execute various kinds of jobs, given all > the various hardware factors that can affect the performance of a > processor (consider the presence/absence of floating point support, the > memory caching architecture, etc.). Similarly, is network bandwidth > meant to represent the theoretical maximum of a compute node's network > interface card? Is it expected to take into account the performance of > the switch that the compute node is attached to? Since switch > performance is partially a function of the pattern of (aggregate) > traffic going through it, the network bandwidth that a job such as an > MPI application can expect to receive will depend on the /type/ of > communications patterns employed by the application. How should this > aspect of network bandwidth be reflected - if at all - in the network > bandwidth values that a job requests and that compute nodes advertise? CPU speed is a fairly meaningless value really, since it is at best only a poor approximant to application performance (which is what people are really interested in) though app-perf is not portable in any sensible way as you can't extrapolate from the performance of one application to that of another. But it's probably the best we've got (we could do FLOPS or MIPS instead I suppose, but I suspect neither is much better). Network bandwidth is worse, because it is only meaningful when defined with respect to a defined pair of endpoints (or, more particularly here, w.r.t. a defined remote endpoint, since the other one is defined by where the job is submitted to). What's worse is that latency isn't defined at all, and that's at least as important for complex apps. In short, I think we didn't get the network bandwidth right. :-\ However, the general policy of accepting quality-of-service requirements on resources is one I agree with, since they really do matter and they are constraints on whether a particular resource is fit for the user's purpose. > * JSDL is intended for describing the requirements of a job being > submitted for execution. To enable matchmaking between submitted jobs > and available computational resources there must also be a way of > describing existing/available resources. While much of JSDL can be used > for this purpose, it is also clear that various extensions are > necessary. For example, to describe a compute cluster requires that one > be able to specify the resources for each compute node in the cluster > (which may be a heterogeneous lot). Similarly, to describe a compute > node with multiple network interfaces would require an extension to the > current model, which assumes that only a single instance of such things > can exist. This raises the question of whether something other than > JSDL is intended to be used for describing available computational > resources or whether there are intensions to extend JSDL to enable it to > describe such resources. Strictly this is outside the scope of JSDL, where we've stuck firmly to the niche of describing user requests and not the things with which those requests may be satisfied. However, I do have some ideas on this. :-) JSDL terms can indeed be used for resource description, and this is because you can interpret them as saying something like "this is the maximal set of processors I will allocate to any job you submit". 
The UniGrids project has looked at several ways to do such resource descriptions based on JSDL. The simplest model we've found was to say that each target system service (BES-analog) supports a single unified homogeneous resource description, and that where we have a heterogeneous cluster we describe that as multiple services, each with a smaller claim on the range of resources allocated to it. This allows for a simple resource model and simple matching rules, and it covers the 90% case neatly. Let me flesh that out with an example. Suppose we have a cluster of machines, four from Intel (with 2GB memory each) and four from AMD (two with 1GB, two with 4GB). This induces 5 services, with resource claims as follows:
* 2 AMD processors, 4GB
* 4 AMD processors, 1GB
* 4 Intel processors, 2GB
* 6 x86 processors, 2GB
* 8 x86 processors, 1GB
It should be noted that these separate services would actually be pretty cheap in our implementation, since we can host them in the same container at a cost of a few extra objects. :-) Maybe other approaches would be better, but the matter of resource description is politically tricky for this WG since it gets into space claimed by others. > * The current specification stipulates that conformant > implementations must be able to parse all the elements and attributes > defined in the spec, but doesn't require that any of them be supplied. > Thus, a scheduling service that does nothing could claim to be compliant > as long as it can correctly parse JSDL documents. For interoperability > purposes, I would argue that the spec should define a minimum set of > elements that any compliant service must be able to supply. Otherwise > clients will not be able to make any assumptions about what they can > specify in a JSDL document and, in particular, client applications that > programmatically submit job submission requests will not be possible > since they can't assume that any valid JSDL document will actually be > acceptable by any given job submission service. I'd argue that this profiling of JSDL should be done by BES or yourselves (the HPC profile). This is because there are other cases (e.g. as synchronization points in workflow processing) where null jobs are actually useful. > * I have a number of questions about data staging: I have one major observation: the data staging stuff is known to be a long way off perfect. > * Although the notions of working directory and environment > variables are defined in the posix extension, they are implicitly > assumed in the data staging section of the core specification. This > implies to me that either (a) data staging is made an extension or (b) > these concepts are made a normative, required part of the core > specification. Good point. I suppose our response to this should be contingent on whether "context location" (i.e. working directory) can be defined for all currently conceived-of job types. I don't know how to answer this yet. It's certainly possible for many of the things we've identified, but all? > * Recursive directory copying can be specified, but is not > required to be supplied by any job submission service. This makes it > difficult to write applications that programmatically define their data > staging needs since they cannot in the current design determine whether > any given job submission service implements recursive directory > copying. In practice this may mean that programmatically generated job > submissions will only ever use lists of individual files to stage.
It means that only _interoperable_ ones will do that, but I think there are already implementations of directory staging out there and clients that are generating jobs that use it. I may be wrong though. :-) > * The current definitions of the well-known file systems seem > imprecise to me. In particular: > > * What are the navigation rules associated with each? Can you cd > out of the subtree that each represents? ROOT almost certainly does not > allow that. Is there an assumption that one can cd out of HOME or TMP > or SCRATCH? Hopefully not, since that would make these file systems > even more Unix/Linux-centric, plus one would now need to specify what > clients can expect to see when they do so. We don't specify. Portable applications don't change directory at all in my experience; it's too full of strange behaviour as the meaning of all relative paths change... > * What is ROOT intended to be used for? Are there assumptions > about what resides under root? Are there assumptions about what an > application can read/write under the ROOT subtree? (ROOT also seems > like the most Unix-specific of the 4 file system types defined.) Fair points, and I'd usually assume that the root FS was not writable. It probably is fairly Unix-specific. But it does make life much easier for integrating with legacy job systems which can handle the other FS types by translation into the root and adding a prefix to the paths. FWIW, I wouldn't use ROOT in my jobs. :-) > * What are the sharing/consistency semantics of each file system > in situations where a job is a multi-node application running on > something like a cluster? Is HOME visible to all compute nodes in a > data-consistent manner? I'm guessing that TMP would be assumed to be > strictly local to each compute node, so that things like MPI > applications would need to be cognizant that they are writing multiple > files to multiple separate storage systems when they write to a file in > TMP - and furthermore that data staging of such files after a job has > run will result in multiple files that all map to the same target file. I've been assuming that (or at least configuring our local systems so that) TMP was node-local and SCRATCH was cluster-wide. > * Can other users write over or delete your data in TMP and/or > SCRATCH? Is data in these file systems visible to other users or does > each job get its own private TMP and SCRATCH? I'd assume that other users never can overwrite your data and wouldn't make any assumptions at all about the level of isolation of either TMP or SCRATCH with respect to other jobs owned by the same user. But that would make an excellent topic to be included in any system policy statement. (Another policy might be that your job submission has to be digitally signed and the signer's certificate has to be signed in turn by a particular CA.) It might be a good idea to codify some best practice on this in the HPC profile. > * How long does data in SCRATCH stay around? Without some > normative definition - or at least a normative lower bound - on data > lifetime clients will have to assume that the data can vanish > arbitrarily and things like multi-job workflows will be very difficult > to write if they try to take advantage of SCRATCH space to avoid > unnecessary data staging actions to/from a computing facility. Again, that's something that is a site policy (I think we've locally got a "one month after last use, with some fairly coarse granularity" policy). 
However, grid systems bring something to the table here in that by describing jobs as resources in their own right (with definite known lifespans) it should be possible to design systems that make better decisions over when a piece of temporary data has become unreferenced and may be deleted. Profiling some best practice here seems sensible. > * From an interoperability and programmatic submission > point-of-view, it is important to know which transports any given job > submission service can be expected to support. This seems like another > area where a normative minimal set that all job submission services must > implement needs to be defined. Agreed, but this is something that we basically punted on. (Also, the notion of what is a source or destination for a staging action turns out to be messy sometimes. Alas.) > Given these questions, as well as the mandate for the HPC profile to > define a simple base interface (that can cover the HPC use case of > submitting jobs to a compute cluster), I would like to present the > following straw man proposal for feedback from this community: > > * Restructure the JSDL specification as a small core > specification that must be universally implemented - i.e. not just > parsable, but also suppliable by all compliant job submission services - > and a number of optional extension profiles. Sounds sensible. > * Declare concepts such as executable path, command-line > arguments, environment variables, and working directory to be generic > and include them in the core JSDL specification rather than the posix > extension. This may enable the core specification to support things > like Windows-based jobs (TBD). The goal here is to define a core JSDL > specification that in-and-of-itself could enable job submission to a > fairly wide range of execution subsystems, including both the > Unix/Linux/Posix world and the Windows world. Again, it's not quite clear to me that all those concepts are meaningful in all job types (as opposed to those that are clearly just a way to execute some binary with a bunch of arguments). > * Move data staging to an extension. I'm not sure about this. > * Create precise definitions of the various concepts introduced > in the data staging extension, including normative requirements about > whether or not one can change directory up and out of a file system's > root directory, etc. Good idea. > * Define which transports are expected to be implemented by all > compliant services. Very good idea. > * Move the various enumeration types - e.g. for CPU architecture > and OS - to separate specification documents so that they can evolve > without requiring corresponding and constant revision of the core JSDL > specification. Excellent idea. :-) > * Define extension profiles (eventually, not right away) that > enable richer description of hardware and software requirements, such as > details of the CPU architecture or OS capabilities. As part of this, > move optimization hints, such as CPU speed and network bandwidth > elements out of the JSDL core and into a separate extension profile. Sounds pretty sensible to me. > * Embrace the issue of how to specify available resources at an > execution subsystem. Start by defining a base case that allows the > description of compute clusters by creating a compound JSDL document > that consists of an outer element that ties together a sequence of > individual JSDL elements, each of which describes a single compute node > of a compute cluster. 
Define an explicit notion of extension profiles > that could define other ways of describing computational resources > beyond just an array of simple JSDL descriptions. Interesting. Probably a good topic for discussion going forward. > Now, as presented above, my straw man proposal looks like suggestions > for changes that might go into a JSDL-1.1 or JSDL-2.0 specification. In > the near-term, the HPC profile working group will be exploring what can > be done with just JSDL-1.0 and restrictions to that specification. The > restrictions would correspond to disallowing those parts of the JSDL-1.0 > specification that the above proposal advocates moving to extension > profiles. It will also explore whether a restricted version of the > posix extension could be used to cover most common Windows cases. Sounds like a reasonable plan to me. Donal.
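One of the five induced services from Donal's Intel/AMD example ("4 Intel processors, 2GB") might advertise a resource claim like the sketch below, with the upper bounds read as "the maximal set I will allocate to any one job". Element usage follows JSDL 1.0; note that the vendor distinction itself (Intel vs. AMD) is exactly what the 1.0 enumeration cannot express:

    <jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <jsdl:CPUArchitecture>
        <!-- "x86" is as specific as the JSDL 1.0 enumeration gets -->
        <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
      </jsdl:CPUArchitecture>
      <jsdl:TotalCPUCount>
        <jsdl:UpperBoundedRange>4.0</jsdl:UpperBoundedRange>
      </jsdl:TotalCPUCount>
      <jsdl:IndividualPhysicalMemory>
        <jsdl:UpperBoundedRange>2147483648.0</jsdl:UpperBoundedRange> <!-- 2 GB per node -->
      </jsdl:IndividualPhysicalMemory>
    </jsdl:Resources>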

On Jun 09, Marvin Theimer modulated: ...
Agreed. Also, one possibility is to explicitly specify some of the commonly occurring “semi-bound” scenarios, such as “any x86” architecture. I’m not familiar enough with the CIM world to know if they can provide us with guidance on how to solve the problem in general.
It is a flat enumeration set, I believe, so some logical composition model would be required, e.g. "give me cpu type i386 OR i486 OR pentium OR pentium-m OR opteron ..." BTW, I sometimes wonder if CIM is the right model here. The above values are a subset of those available to GCC to control ABI and processor-specific optimizations. They seem pretty aggressive about introducing new processor support in a backwards-compatible manner. Perhaps a slightly "weaker-typed" constraint ought to be defined to accept strings which are interpreted as a compiler-specific concept? I haven't thought about this too much, so maybe I am off base. But in some sense, compiler settings are closest to what the user/app really cares about for predictability. On the negative side, it means having to support multiple compiler-specific namespaces for these constraints, and having reasonable best practices for translating between them in service implementations.
If we narrow the definitions of mountpoint and mountsource enough and precisely describe their semantics then we might arrive at something that could be fairly widely used. I’m thinking of things like saying that you can’t navigate “out” of a file system via “cd ..”, etc. This is definitely something to explore.
Since the HPC profile base case treats data staging as being out-of-scope, the base interface profile will exclude these; but that can be done independently of anything else. (And, of course, the data staging extension to the HPC profile will need to deal with this subject in any case, even if it’s ignored in the base case.)
I'm not sure if scoping away staging means we can ignore storage system structure. How will one compose a staging service activity with a compute activity if there is no underlying model for how the storage is manipulated and named?
Any advice on this subject would be greatly appreciated. As I said above, I have to deal with this subject one way or the other and would prefer to do so with the minimum of feather-ruffling (while still making progress that results in a usable HPC profile by the end of the summer).
Personally, I think it is silly to say that a "space" can be claimed by others! Anyone with good ideas and willingness to contribute should feel free to develop ideas. If they are great, maybe the others will realize they would benefit by working together. If the others are unsure, maybe there will have to be competing proposals and a decision made in the marketplace... top-down architecture really doesn't work very well with voluntary, collaborative projects. karl -- Karl Czajkowski karlcz@univa.com

Hi; One of the advantages of trying to leverage CIM is that there is already lots of support and tooling out there for it, so generating CIM-based solutions may be pragmatically desirable. Regarding data staging, my argument for making it an extension is that there are lots of systems where explicit data staging is unnecessary and hence that's why it's not in the HPC base use case. Anyone who needs to deal with data staging -- whether as part of a job, part of a larger workflow, or part of a data staging service -- will need to employ the data staging extension to JSDL and the HPC profile. Regarding the question of who "owns" which part of the problem space within GGF, that's something I really want to stay away from. As co-chair of the HPC profile working group, with a mandate to get things done by the end-of-summer, I need to arrive at concrete specifications irrespective of by whom or how they are generated. I'm happy to work with anyone and I'm more than happy to share credit with everyone. :-) Marvin.

The processor type values were *not* taken from CIM. (We looked at CIM but decided it was not appropriate in this case.) The values are "based on the Instruction Set Architecture (ISA) names of a small number of common processor architectures." (p10 of the spec.) You can make the statement for "any x86" with the defined values. Marvin Theimer wrote:
Hi;
I think with processor types we just grabbed a snapshot of the CIM model
and went with that; updating to use a later version of that would not
cause great difficulty (though the reverse problem might then exist, in
that it might become more difficult to say that any kind of x86 arch is
OK for a particular job).
However, I believe we would assume the following interpretation of
processor requirements: if specified, that's what they want for all
processors associated with the job. If they didn't specify, they didn't
care and anything is therefore good enough.
Agreed. Also, one possibility is to explicitly specify some of the commonly occurring “semi-bound” scenarios, such as “any x86” architecture. I’m not familiar enough with the CIM world to know if they can provide us with guidance on how to solve the problem in general.
-- Andreas Savva Fujitsu Laboratories Ltd

Marvin Theimer wrote:
Agreed. Also, one possibility is to explicitly specify some of the commonly occurring “semi-bound” scenarios, such as “any x86” architecture. I’m not familiar enough with the CIM world to know if they can provide us with guidance on how to solve the problem in general.
The problem with CIM (certainly up to the most recent release, 2.12) is that they model processor types as a pair: (enum value, string) with the string being used only when the enum is "Other". Moreover, the only matching model that you can have using such a model is exact matching; for example, there is no concept that "80386" (the string rendering of value 5 for the CIM_Processor.Family field) is effectively subsumed by "80486" (value 6). Given that, I'd say that while CIM is a nice catalog of processor types (particularly given that someone else is maintaining it ;-)) it's utterly impossible to use for resource selection, since the one thing you can say for sure is that people won't match the processor types precisely.

What we'd need is some kind of partial ordering (or set of POs, I fear) that allows inexact matching of processor types, but that is itself quite a bit of work (and it's ongoing work too). And such relations are not at all clean to describe in CIM either; although they've got a rich relation model, they're about relating classes and not values of enumerations. Fixing the model that way would entail a major change to the CIM core itself, I think, and not just defining more classes on top. Either that, or changing the specification of the processor part of CIM to model processor families as classes, which would be very disruptive to what is a pretty static (and hence widely deployed) part of CIM.

And IIRC, CIM's OS types (CIM_OperatingSystem.OSType) suffer from the same problem. (If I have a binary that works on Win2k, I'd expect it to be fine on WinServer2003, for example, but that's not information captured within CIM.)

CIM does description of what is out there well. But it's a very poor base for anything with much richer semantics than equality. I hope this message explains why adequately. Donal.
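For what it's worth, a separately maintained subsumption document could stay out of CIM entirely; a rough sketch follows (all element names and the namespace are invented, and the edges shown are examples rather than a vetted ordering):

  <sub:ProcessorSubsumption xmlns:sub="http://example.org/cpu-subsumption">
    <!-- a requirement for the named family is also satisfied by any
         family reachable through SatisfiedBy edges -->
    <sub:Family name="80386">
      <sub:SatisfiedBy>80486</sub:SatisfiedBy>
    </sub:Family>
    <sub:Family name="80486">
      <sub:SatisfiedBy>pentium</sub:SatisfiedBy>
    </sub:Family>
    <!-- matchers would compute the transitive closure, so a request
         for 80386 is satisfied by 80486 or pentium nodes -->
  </sub:ProcessorSubsumption>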

I think you've definitely touched on the problem with using the CIM model, but I have to imagine that the people discussing these issues in the DMTF working groups have faced exactly the same issue as we're seeing, and the result is that they went for this simple approach. So we can go through the same exercise they did, perhaps with a chance that we arrive at a different approach which allows us more flexibility, but I'm guessing we'd arrive at the same place. How do we keep up (in standardization) with the fast-changing world around us? I for one want to make sure that when someone asks me for a particular OS or processor type, I can understand the "token" that I'm given. -- Chris

Christopher Smith wrote:
I think you've definitely touched on the problem with using the CIM model, but I have to imagine that the people discussing these issues in the DMTF working groups have faced exactly the same issue as we're seeing [...]
I'm not at all convinced that they've got sufficiently similar terms of reference to us for them to have hit the same problem. If your problem domain is "describe what is out there in a format compatible with big iron databases" then CIM does that nicely. OTOH, it would be nice if there was a group in the DMTF that was prepared to work on this, not just because that would take the work out of our hands :-) but also because it is introducing a level of semantic richness that will stand the CIM model in good stead. Though I'm not sure if most CPU and OS designers are actually good enough to satisfy the requirements of a true partial order over their components. I know both groups too well to trust them to get that right; CPU designers are too keen on "cute" hacks and OS designers tend to not believe just how important keeping things *really* compatible is... If we (GGF) do the subsumption ordering ourselves, we need to check whether our process can take such regular publication of recommendation track documents. It seemed rather slow the last time we tried. :-\ (Also, perhaps we'd want to do this outside of the JSDL activity; it's only tangentially related at best.) Donal.

On Jun 12, Donal K. Fellows modulated: ...
Though I'm not sure if most CPU and OS designers are actually good enough to satisfy the requirements of a true partial order over their components. I know both groups too well to trust them to get that right; CPU designers are too keen on "cute" hacks and OS designers tend to not believe just how important keeping things *really* compatible is...
I'd go a step further, and say that these relationships are application specific and not intrinsic to the OS or architecture. The underlying matching system should include some logical combining mechanism, and "user friendly" client-side tools should allow imposition of certain ordering idioms if this is desired. I don't think ANY standards body will capture all of the combinations which matter to applications, unless they produce a logical combination structure and canonically name nearly every combination. ;-) karl -- Karl Czajkowski karlcz@univa.com

Karl Czajkowski wrote:
I'd go a step further, and say that these relationships are application specific and not intrinsic to the OS or architecture. The underlying matching system should include some logical combining mechanism, and "user friendly" client-side tools should allow imposition of certain ordering idioms if this is desired. I don't think ANY standards body will capture all of the combinations which matter to applications, unless they produce a logical combination structure and canonically name nearly every combination. ;-)
If I remember right, this is why the only matching semantics that are actually defined for operating system or CPU type by the JSDL spec are "exact match", despite that being not very useful in most cases. Another case that has the same problem is application version; some apps define a sane scheme, but others are just mad (for example, the Linux version scheme where the low bit of the second digit indicates stability status and not actual versioning, and Java product versioning is driven by marketing and not technical utility). Exact matching is the only universal matching plan, but it does not serve users well; if a point release has been done to fix some obscure bug, why should that invalidate a user's job submissions? They probably weren't bitten by that bug in the first place.

The best approach for users is for them to only give CPU types, OS types/versions and application versions when they *really* need it, and for them to leave those resource facets unspecified otherwise. Indeed, that works well in highly managed environments (such as those targeted by UNICORE) where applications are things that are installed by admins and users just use them. That's obviously not what everyone wants, but perhaps it is all that can be done interoperably. :-( Donal.

-----Original Message----- From: Donal K. Fellows [mailto:donal.k.fellows@manchester.ac.uk] Sent: Monday, June 12, 2006 3:38 PM To: JSDL Working Group; ogsa-bes-wg@ggf.org Cc: Marvin Theimer; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [ogsa-bes-wg] Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view

> If we (GGF) do the subsumption ordering ourselves, we need to check whether our process can take such regular publication of recommendation track documents. It seemed rather slow the last time we tried. :-\ (Also, perhaps we'd want to do this outside of the JSDL activity; it's only tangentially related at best.) Donal.

[MARVIN] Is there a pragmatic "next step" to take to try to solve this dilemma? If the main problem you see is that GGF will need to define a somewhat different document publication process for things like subsumption specs then that's at least something that is in our (meaning GGF's) control. Marvin.

Marvin Theimer wrote:
[MARVIN] Is there a pragmatic "next step" to take to try to solve this dilemma? If the main problem you see is that GGF will need to define a somewhat different document publication process for things like subsumption specs then that's at least something that is in our (meaning GGF's) control.
The immediate pragmatic step is to say that only exact comparison of CPU, OS and App types/versions is possible, and to recommend that jobs leave as much information out as possible. Actually, the same problem exists with the JSDL Host resource, which should also usually be left out. It's going to be easy for people to overspecify their requirements, but since there are going to be use-cases for such highly specified stuff I can't see it being practical to drop them completely. Any tackling of the version satisfaction relation is going to be hard and will take a while to do. As such, it had better be out of scope of the HPC profile. ;-) It's probably also out of scope of JSDL, and should be the domain of the core info modelling crowd (once they've stopped rearranging the syntactic deck-chairs). Donal.
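To see why leaving information out matters, consider this ordinary JSDL 1.0 fragment (the version string is chosen purely for illustration); under exact-match semantics it rejects a node running 2.6.17 even though the job would almost certainly run there:

  <jsdl:OperatingSystem xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
    <jsdl:OperatingSystemType>
      <jsdl:OperatingSystemName>LINUX</jsdl:OperatingSystemName>
    </jsdl:OperatingSystemType>
    <!-- exact match only: a 2.6.17 kernel does NOT satisfy this -->
    <jsdl:OperatingSystemVersion>2.6.16</jsdl:OperatingSystemVersion>
  </jsdl:OperatingSystem>

Omitting OperatingSystemVersion (and, unless it really matters, the whole element) widens the match to every Linux node.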

On Jun 16, Donal K. Fellows modulated: ...
The immediate pragmatic step is to say that only exact comparison of CPU, OS and App types/versions is possible, and to recommend that jobs leave as much information out as possible. Actually, the same problem exists with the JSDL Host resource, which should also usually be left out. It's going to be easy for people to overspecify their requirements, but since there are going to be use-cases for such highly specified stuff I can't see it being practical to drop them completely.
What about introducing a disjunctive constraint syntax to allow listing of multiple types, and only one needs to match? This is similar to the host matching, where obviously no single host matches all the different names... it also seems consistent w/ the use of the range-value concept on numeric constraints. karl -- Karl Czajkowski karlcz@univa.com

Karl Czajkowski wrote:
What about introducing a disjunctive constraint syntax to allow listing of multiple types, and only one needs to match? This is similar to the host matching, where obviously no single host matches all the different names... it also seems consistent w/ the use of the range-value concept on numeric constraints.
Disjunctions are much more complex to match (and matching those range-value type instances against each other isn't pretty, having written that code) and are going to be correspondingly slower to specify too. If we can target most of the real traffic with a much simpler solution, I think we should go for that as a first effort. Given the projected timescale for the first rev of the HPC profile, that's got to be the best approach. Donal (shooting for "good enough" and not "best").

Marvin Theimer wrote:
If we narrow the definitions of mountpoint and mountsource enough and precisely describe their semantics then we might arrive at something that could be fairly widely used. I’m thinking of things like saying that you can’t navigate “out” of a file system via “cd ..”, etc. This is definitely something to explore.
Change "can't" to "shouldn't" and I'd agree. I don't regard the mount stuff as being a way of describing security enforcement points. Systems can do it that way, but at least some won't. In fact, I'd be happy enough with the profile stating that paths in JSDL documents should not contain either the "." or the ".." elements at all. That's a fairly strong requirement and guarantees that the job won't fail on systems where your style of semantics are enforced.
Since the HPC profile base case treats data staging as being out-of-scope, the base interface profile will exclude these; but that can be done independently of anything else. (And, of course, the data staging extension to the HPC profile will need to deal with this subject in any case, even if it’s ignored in the base case.)
Perhaps. Virtual FS references also sneak into arguments, stdio redirection, working directory specs, and environment variables, all of which may well need to be defined with respect to some VFS root. One way of dealing with this is to define a minimal set of VFSes in the profile (probably based on the JSDL set?) and state that profile-conforming job submissions will only use those.
The HPC profile (and BES) /have/ to deal with the issue of describing available resources. So, one way or the other, the subject will get addressed this summer. As much as possible, I’d like to avoid duplicating the work done in JSDL for that – if for no other reason than that users will likely be unhappy if they have to learn two different ways of describing what they will perceive as being variations of the core concept, namely resource description – both required and available.
Sure, but you're using the elements to have subtly different meanings. Not that I'm objecting; just pointing it out. :-) The best bet I'd guess is to state that the published elements represent a description of the maximal job (maximal capacity, most specific capability) supported by the container. You'll also need to manufacture a way of describing the set of apps (and other related things, like libraries) supported. I don't think that the way JSDL does this (even under the "maximal" interpretation outlined above) is going to work in a sane-enough way.
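Under that "maximal job" reading, a container's advertisement could reuse JSDL syntax unchanged; for instance (numbers invented), the fragment below would be read as "jobs may ask for at most 16 CPUs and at most 2 GiB of physical memory per node" rather than as a job's requirement:

  <jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
    <!-- read as a capacity ceiling, not a request -->
    <jsdl:TotalCPUCount>
      <jsdl:UpperBoundedRange>16</jsdl:UpperBoundedRange>
    </jsdl:TotalCPUCount>
    <jsdl:IndividualPhysicalMemory>
      <jsdl:UpperBoundedRange>2147483648</jsdl:UpperBoundedRange>
    </jsdl:IndividualPhysicalMemory>
  </jsdl:Resources>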
Any advice on this subject would be greatly appreciated. As I said above, I have to deal with this subject one way or the other and would prefer to do so with the minimum of feather-ruffling (while still making progress that results in a usable HPC profile by the end of the summer).
You've got a mandate to ruffle feathers. You're writing a profile. JSDL isn't a profile, and so we (with my JSDL hat on) need to be a little bit more circumspect, if only to keep the noise down. :-)
If you allow for the notion of file systems and mount points to be in the core spec then I would argue that you are implicitly buying into systems that also support the notion of current working directory (some jobs may of course not use it).
Good argument. :-)
I would argue that one not specifying allowed/disallowed behaviors is a bad approach when interoperability is at issue. (I’m talking about disallowing “cd ..” out the top, not disallowing change-directory within the subtree specified by a file system element.)
I'd certainly say that it is a feature nobody has to implement, and which no interoperable job may make use of. (I don't see any way of preventing the job itself from trying to do a 'cd ..' internally, nor even any way to properly analyse whether the job will try to do it. Any security checks required by a VFS interpretation must still be enforced at runtime anyway.) [re root filesystems]
Again, from an interop point-of-view, this seems dangerous.
This seems to me like an argument in favour of not including ROOT in the HPC profile. That doesn't cause me problems, though I suspect that everyone building a BES on top of unix will add it. (Who knows, perhaps it would be in a POSIX-HPC sub-profile?)
Regarding your reactions to my straw man proposal, it seems like you pretty much agree with everything except the following:
· You’re not convinced how universal the posix extension elements for things like command-line arguments and working directory are. My response is that I think they are at least as universal as the data staging elements.
I'm likely to agree with you there.
· You don’t want to move the data staging section out of the core specification. For the HPC profile base case, data staging elements will be prohibited since they are out-of-scope for the base case. The HPC profile extension for data staging will allow the JSDL data staging elements. Whether or not these should be in a separate JSDL extension or whether they can be generalized to cover a wide(r) range of systems is a topic for future discussion.
Sounds fair enough. The inclusion of data staging was so that JSDL would be relevant in cases that don't really match the HPC case, such as clusters where everything has to be staged into the machines for anything at all to work, and where no real workflow coordinator was available. (Myself, I prefer the data staging to be in a separate workflow step, but workflows are not really in scope of any GGF working group at the moment.) IIRC, it was Darren Pulsipher who had this use case.
· You’re leery of tackling the resource description problem. Understandable, although the HPC profile working group will have to and will be seeking guidance from the JSDL and other communities on how to do so.
I think it's hard and some people have very entrenched views on this. But I also think that you can get something workable for the 90% case together in short order here. I suggest that the "homogeneous container" model under the maximal resource interpretation is good enough. :-)
Is that a fair characterization of your position?
Pretty fair. I can't speak for whether it is a characterisation of anyone else's position. :-) Donal.

Donal K. Fellows wrote:
Marvin Theimer wrote:
If we narrow the definitions of mountpoint and mountsource enough and precisely describe their semantics then we might arrive at something that could be fairly widely used. I’m thinking of things like saying that you can’t navigate “out” of a file system via “cd ..”, etc. This is definitely something to explore.
Change "can't" to "shouldn't" and I'd agree. I don't regard the mount stuff as being a way of describing security enforcement points. Systems can do it that way, but at least some won't.
+1 from me. In fact, I think this should be part of JSDL in a "maintenance release" sort of publication anyway.
In fact, I'd be happy enough with the profile stating that paths in JSDL documents should not contain either the "." or the ".." elements at all. That's a fairly strong requirement and guarantees that the job won't fail on systems where your style of semantics are enforced.
Again, +1 (and having it normatively mentioned in the JSDL publication)
[...]
If you allow for the notion of file systems and mount points to be in the core spec then I would argue that you are implicitly buying into systems that also support the notion of current working directory (some jobs may of course not use it).
Good argument. :-)
Yes. However, it does *not* imply that the current working directory
a) is always the same regardless of the container a job has been sent to,
b) is even relative to (or relatively addressable to or from) a particular other directory on the container (e.g. a common libs dir),
c) is configured so that it can be "cd'd out" of to another directory elsewhere on the container,
d) is even something that should be externally published in its local incarnation,
e) or that any other "abstractable"(?) directory (e.g. "ROOT") is actually a mount point in the sense of NFS.
(Yes, using "mount point" as an element name in JSDL may be a bad choice.) I guess there is a need to agree on an abstract naming pattern for such use cases. We should short-cut with WS-Naming on this issue, I think. Cheers, Michel -- Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com Fujitsu Laboratories of Europe +44 20 8606 4834

Comments inline. Michel Drescher wrote:
Donal K. Fellows wrote:
Marvin Theimer wrote:
If we narrow the definitions of mountpoint and mountsource enough and precisely describe their semantics then we might arrive at something that could be fairly widely used. I’m thinking of things like saying that you can’t navigate “out” of a file system via “cd ..”, etc. This is definitely something to explore.
Change "can't" to "shouldn't" and I'd agree. I don't regard the mount stuff as being a way of describing security enforcement points. Systems can do it that way, but at least some won't.
+1 from me. In fact, I think this should be part of JSDL in a "maintenance release" sort of publication anyway.
-1 from me for adding this in JSDL. It is not a language issue. I do think the HPC Profile should probably speak to this with respect to the execution environment that a job should expect.
In fact, I'd be happy enough with the profile stating that paths in JSDL documents should not contain either the "." or the ".." elements at all. That's a fairly strong requirement and guarantees that the job won't fail on systems where your style of semantics are enforced.
Again, +1 (and having it normatively mentioned in the JSDL publication)
I too see this as a profiling issue. I have no problem with the HPC profile making a stronger statement than the JSDL spec on this as a security consideration. So -1 from me for adding this in the JSDL spec normatively. -- Andreas Savva Fujitsu Laboratories Ltd

Hi; If the HPC profile defines the semantics of something and the JSDL spec doesn't then that implies that some other profile is free to define the semantics differently. Is that really what you want to allow? That seems like it will invite unexpected mishaps for anyone who tries to run both HPC and other workloads on a grid. Marvin.

Hi Marvin, I think that, in general, if a restriction the HPC profile chooses to make is specific to its domain then, yes, it is possible that some other profile may choose to make different choices. Are you saying that JSDL must be changed to fit exactly what the HPC profile requires? What we do with the topics below is still under discussion, of course. As I said on the call, if the group decides something is a bug then we should fix it. Andreas Marvin Theimer wrote:
Hi;
If the HPC profile defines the semantics of something and the JSDL spec doesn't then that implies that some other profile is free to define the semantics differently. Is that really what you want to allow? That seems like it will invite unexpected mishaps for anyone who tries to run both HPC and other workloads on a grid.
Marvin.

On Jun 16, Andreas Savva modulated:
Hi Marvin,
I think that, in general, if a restriction the HPC profile chooses to make is specific to its domain then, yes, it is possible that some other profile may choose to make different choices. Are you saying that JSDL must be changed to fit exactly what the HPC profile requires?
If I might interpret, I think the issue is that there is a subtle and qualitative judgment involved in deciding whether something is:
a) a basic syntax and semantics that ought to underlie every profile
b) a restriction of the base to simplify a profile
c) an extension of the base for a domain-specific profile
d) a contradiction of the base in a profile
I am sure that the people who have been involved in JSDL for a long period of time would have a certain level of "shared taste" as to how to draw these lines. Other people looking from the outside might not appreciate the history of interactions which creates this group perspective. Alas, I do not think there is any simple technical or procedural solution to make these distinctions and to support the evolution of ideas. At best, we can periodically reassure each other that we are willing to consider new viewpoints and alternatives, while also being patient enough to "educate" new arrivals, thereby having the necessary hysteresis in the process to make progress and yet avoid thrashing. karl -- Karl Czajkowski karlcz@univa.com

In short, yes. Thanks Karl. Sometimes things might have been left unspecified for a reason---because, for example, we decided we could not define grid-default behaviour in a language specification. Sometimes because we might have missed something. I think the HPCP effort is good since it will help us make these distinctions clearer and also to see whether we 'drew the lines' at reasonable places. Andreas
-- Andreas Savva Fujitsu Laboratories Ltd

Marvin Theimer wrote:
If the HPC profile defines the semantics of something and the JSDL spec doesn't then that implies that some other profile is free to define the semantics differently. Is that really what you want to allow? That seems like it will invite unexpected mishaps for anyone who tries to run both HPC and other workloads on a grid.
Yes, but ideally we'll be able to arrange things - maybe not in the first iteration - so that the HPC profile is always just a restriction of the JSDL spec. In that case, we're still all OK. It's the cases where you (hypothetically) introduce strict semantic extensions that we'll need to be careful in. Not that you shouldn't; they're just the places to watch out. As long as we keep talking to each other, we'll manage somehow. :-) Donal.

Matching in with some of the comments from Andreas, the JSDL specification is the language, not how it is used. If there are mistakes in the language these should be fixed. That said, there may be some justification for writing a base profile document which all profiles (such as HPC) should conform with. Thoughts? steve.. Marvin Theimer wrote:
Hi;
If the HPC profile defines the semantics of something and the JSDL spec doesn't then that implies that some other profile is free to define the semantics differently. Is that really what you want to allow? That seems like it will invite unexpected mishaps for anyone who tries to run both HPC and other workloads on a grid.
Marvin.
-- ------------------------------------------------------------------------ Dr A. Stephen McGough http://www.doc.ic.ac.uk/~asm ------------------------------------------------------------------------ Technical Coordinator, London e-Science Centre, Imperial College London, Department of Computing, 180 Queen's Gate, London SW7 2BZ, UK tel: +44 (0)207-594-8409 fax: +44 (0)207-581-8024 ------------------------------------------------------------------------

A S McGough wrote:
Thoughts?
While I don't approve of thinking on Friday afternoon :-) I believe I agree with you. The only complication is the breadth of possible kinds of application types; we don't want to over-restrict ourselves there so that we can't do SQL queries or invoke webservices or the other things we talked about (thinking back to Hawaii). Donal.

Now thinking back to Hawaii on a Friday afternoon - that's a good thing ;-) Yes - this is one of my concerns with specifying things. We don't want to restrict things. Though we could have base cases such as "running an executable". steve..
-- ------------------------------------------------------------------------ Dr A. Stephen McGough http://www.doc.ic.ac.uk/~asm ------------------------------------------------------------------------ Technical Coordinator, London e-Science Centre, Imperial College London, Department of Computing, 180 Queen's Gate, London SW7 2BZ, UK tel: +44 (0)207-594-8409 fax: +44 (0)207-581-8024 ------------------------------------------------------------------------

Hi; This is in response to several emails that were sent out on the subject of what things to define in JSDL and what things to define in profiles that layer on top of JSDL. I understand the desire to avoid restricting JSDL to just the HPC use cases. That said, actual implementations of JSDL that desire to be interoperable will have to pick precise definitions for what each JSDL element means. All I'm advocating is that we stick to good software engineering principles and avoid ending up in situations where someone who wants to run both HPC and non-HPC workloads has to needlessly deal with context-sensitive definitions of JSDL elements. Along those lines, I like your suggestion of defining a base profile document that seeks to specify those things that we believe are "universally" definable across all workloads. I haven't, however, thought about how large - or how small - such a base profile would be. Marvin.

Marvin, In essence you are correct here. We don't want to end up with different non-interoperable JSDL use cases. Though likewise we need to ensure that we don't restrict the use of JSDL. Perhaps the best approach here is for different groups (HPC etc) to propose profiles and for the JSDL people to distill this into a base profile? Thoughts? steve.. Marvin Theimer wrote:
Hi;
This is in response to several emails that were sent out on the subject of what things to define in JSDL and what things to define in profiles that layer on top of JSDL.
I understand the desire to avoid restricting JSDL to just the HPC use cases. That said, actual implementations of JSDL that desire to be interoperable will /have/ to pick precise definitions for what each JSDL element means. All I'm advocating is that we stick to good software engineering principles and avoid ending up in situations where someone who wants to run both HPC and non-HPC workloads has to needlessly deal with context-sensitive definitions of JSDL elements. Along those lines, I like your suggestion of defining a base profile document that seeks to specify those things that we believe are "universally" definable across all workloads. I haven't, however, thought about how large -- or how small -- such a base profile would be.
Marvin.
-- ------------------------------------------------------------------------ Dr A. Stephen McGough http://www.doc.ic.ac.uk/~asm ------------------------------------------------------------------------ Technical Coordinator, London e-Science Centre, Imperial College London, Department of Computing, 180 Queen's Gate, London SW7 2BZ, UK tel: +44 (0)207-594-8409 fax: +44 (0)207-581-8024 ------------------------------------------------------------------------

Hi;
[...]
If you allow for the notion of file systems and mount points to be in
the core spec then I would argue that you are implicitly buying into systems that also support the notion of current working directory (some jobs may of course not use it).
Good argument. :-)
Yes. However, it does *not* imply that the current working directory
a) is always the same regardless of the container a job has been sent to,
b) is even relative to (or relatively addressable to or from) a particular other directory on the container (e.g. a common libs dir),
c) is configured so that it can be "cd'd out" of to another directory elsewhere on the container,
d) is even something that should be externally published in its local incarnation,
e) or that any other "abstractable"(?) directory (e.g. "ROOT") is actually a mount point in the sense of NFS.
(Yes, using "mount point" as an element name in JSDL may be a bad choice.) I guess there is a need to agree on an abstract naming pattern for such use cases. We should short-cut with WS-Naming on this issue, I think. [MARVIN] This is an excellent summary of some of the things that need to be made explicit in order to enable interop. Marvin.

Hi; Comments in-line. Marvin. -----Original Message----- From: Donal K. Fellows [mailto:donal.k.fellows@manchester.ac.uk] Sent: Monday, June 12, 2006 2:45 AM To: Marvin Theimer Cc: JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view Marvin Theimer wrote:
If we narrow the definitions of mountpoint and mountsource enough and precisely describe their semantics then we might arrive at something that could be fairly widely used. I'm thinking of things like saying that you can't navigate "out" of a file system via "cd ..", etc. This
is definitely something to explore.
Change "can't" to "shouldn't" and I'd agree. I don't regard the mount stuff as being a way of describing security enforcement points. Systems can do it that way, but at least some won't. [MARVIN] The main thing I'm after is that the behavior of trying to step "outside" the mountpoint by cd'ing out the top must be either (a) prohibited or (b) explicitly marked as undefined in its behavior, with an error fault potentially being generated. This is because in the Windows world I can imagine that a mountpoint definition might map to setting up a drive letter and you can't cd up out of a drive letter. In fact, I'd be happy enough with the profile stating that paths in JSDL documents should not contain either the "." or the ".." elements at all. That's a fairly strong requirement and guarantees that the job won't fail on systems where your style of semantics are enforced.
Since the HPC profile base case treats data staging as being out-of-scope, the base interface profile will exclude these; but that can be done independently of anything else. (And, of course, the data
staging extension to the HPC profile will need to deal with this subject in any case, even if it's ignored in the base case.)
The HPC profile (and BES) /have/ to deal with the issue of describing available resources. So, one way or the other, the subject will get addressed this summer. As much as possible, I'd like to avoid duplicating the work done in JSDL for that - if for no other reason
Perhaps. Virtual FS references also sneak into arguments, stdio redirection, working directory specs, and environment variables, all of which may well need to be defined with respect to some VFS root. One way of dealing with this is to define a minimal set of VFSes in the profile (probably based on the JSDL set?) and state that profile-conforming job submissions will only use those. [MARVIN] Yes, but these references don't require the BES service to parse the mountpoint and mountsource elements or do something like execute a stat operation on the mountpoint to confirm that it properly exists. Excluding these elements from the HPC profile base case doesn't mean that an implicitly required mountpoint doesn't exist, merely that the compliant (BES) service is allowed to be oblivious to it.
than that users will likely be unhappy if they have to learn two different ways of describing what they will perceive as being variations of the core concept, namely resource description - both required and available.
Sure, but you're using the elements to have subtly different meanings. Not that I'm objecting; just pointing it out. :-) The best bet I'd guess is to state that the published elements represent a description of the maximal job (maximal capacity, most specific capability) supported by the container. You'll also need to manufacture a way of describing the set of apps (and other related things, like libraries) supported. I don't think that the way JSDL does this (even under the "maximal" interpretation outlined above) is going to work in a sane-enough way. [MARVIN] All very good points. But I'll have to make progress on this front one way or the other ...
Any advice on this subject would be greatly appreciated. As I said above, I have to deal with this subject one way or the other and would
prefer to do so with the minimum of feather-ruffling (while still making progress that results in a usable HPC profile by the end of the summer).
You've got a mandate to ruffle feathers. You're writing a profile. JSDL isn't a profile, and so we (with my JSDL hat on) need to be a little bit more circumspect, if only to keep the noise down. :-)
If you allow for the notion of file systems and mount points to be in the core spec then I would argue that you are implicitly buying into systems that also support the notion of current working directory (some jobs may of course not use it).
Good argument. :-)
I would argue that one not specifying allowed/disallowed behaviors is a bad approach when interoperability is at issue. (I'm talking about disallowing "cd .." out the top, not disallowing change-directory within the subtree specified by a file system element.)
I'd certainly say that it is a feature nobody has to implement, and which no interoperable job may make use of. (I don't see any way of preventing the job itself from trying to do a 'cd ..' internally, nor even any way to properly analyse whether the job will try to do it. Any security checks required by a VFS interpretation must still be enforced at runtime anyway.) [MARVIN] Agreed. [re root filesystems]
Again, from an interop point-of-view, this seems dangerous.
This seems to me like an argument in favour of not including ROOT in the HPC profile. That doesn't cause me problems, though I suspect that everyone building a BES on top of unix will add it. (Who knows, perhaps it would be in a POSIX-HPC sub-profile?) [MARVIN] I think the HPC profile should say that specifying ROOT will result in undefined behavior: the compliant service is free to accept the specification, ignore it, or raise a fault. Similarly, if an activity is created then it must be prepared for undefined behavior if it tries to use ROOT in a file name that it opens.
Regarding your reactions to my straw man proposal, it seems like you pretty much agree with everything except the following:
* You're not convinced how universal the posix extension elements for things like command-line arguments and working directory are. My response is that I think they are at least as universal as the data staging elements.
I'm likely to agree with you there.
* You don't want to move the data staging section out of the core specification. For the HPC profile base case, data staging elements will be prohibited since they are out-of-scope for the base case. The
HPC profile extension for data staging will allow the JSDL data staging elements. Whether or not these should be in a separate JSDL extension
or whether they can be generalized to cover a wide(r) range of systems
is a topic for future discussion.
Sounds fair enough. The inclusion of data staging was so that JSDL would be relevant in cases that don't really match the HPC case, such as clusters where everything has to be staged into the machines for anything at all to work, and where no real workflow coordinator was available. (Myself, I prefer the data staging to be in a separate workflow step, but workflows are not really in scope of any GGF working group at the moment.) IIRC, it was Darren Pulsipher who had this use case.
* You're leery of tackling the resource description problem. Understandable, although the HPC profile working group will have to and will be seeking guidance from the JSDL and other communities on how to
do so.
I think it's hard and some people have very entrenched views on this. But I also think that you can get something workable for the 90% case together in short order here. I suggest that the "homogeneous container" model under the maximal resource interpretation is good enough. :-) [MARVIN] Definitely an approach that we should look at carefully. :-)
Is that a fair characterization of your position?
Pretty fair. I can't speak for whether it is a characterisation of anyone else's position. :-) Donal.

Marvin Theimer wrote:
[MARVIN] The main thing I'm after is that the behavior of trying to step "outside" the mountpoint by cd'ing out the top must be either (a) prohibited or (b) explicitly marked as undefined in its behavior, with an error fault potentially being generated. This is because in the Windows world I can imagine that a mountpoint definition might map to setting up a drive letter and you can't cd up out of a drive letter.
As I said, forbidding ".." from anywhere in *any* path in a HPC-profiled JSDL document would be reasonable, and forbidding interoperable jobs from doing a chdir() or equivalent at all is also reasonable (in both cases, if people do those things then we can say that we don't define what happens; it's up to vendors, with faulting being a nice strategy). Also, using drive letter mappings for JSDL virtual paths on Windows is a perfectly reasonable implementation strategy (but not something we should mandate at either specification or profile level, of course). Looks to me like we're in 100% agreement on this.
[MARVIN] Yes, but these references don't require the BES service to parse the mountpoint and mountsource elements or do something like execute a stat operation on the mountpoint to confirm that it properly exists. Excluding these elements from the HPC profile base case doesn't mean that an implicitly required mountpoint doesn't exist, merely that the compliant (BES) service is allowed to be oblivious to it.
The best way forward is probably to just define a small set of VFSes that should/must be supported. That will scope the interop problem nicely. I think (based on UNICORE experience) that the key locations for a job are:
Working Directory - main place for the job to work; may be isolated from other jobs (though that's really a quality-of-service feature that will be good for some jobs, bad for others, and neutral for the rest).
Fast Local Temporary Directory - basically /tmp on Unix; need not be shared across a cluster; no isolation guarantees.
Large Temporary Directory - place for things like staged-in BLAST databases, intermediate weather model dumps, etc. Should be shared across the cluster, and should have a longer-term delete policy than /tmp if different from it. May be the same as the FLTD; no isolation guarantees.
User's Home Directory - for application settings; definitely long-term persistence (backups strongly recommended!) but might not permit very large files. Definitely possible to access outside the scope of the job. Need not be isolated from other jobs, and isolation from other users is dependent on user and site policy.
In theory, all could be the same directory, but that would be very odd. All those FSes should have standardized names, and we should state that profile-compliant jobs will not refer to any other filesystem. (And that punts loads of complexity right out of the ballpark!) (Any features of the above FS types I've missed, anyone? Any preferred names? I'm happy to call them John, Paul, George and Ringo, but that might be confusing to others...)
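For concreteness, such a profile-defined set could be written as ordinary JSDL FileSystem declarations. In this sketch the names WORK, TMP, SCRATCH and HOME are placeholders for whatever names the profile would standardize, and each Description merely echoes the definitions above:

  <jsdl:Resources>
    <jsdl:FileSystem name="WORK">
      <jsdl:Description>Working directory; may be isolated per job</jsdl:Description>
    </jsdl:FileSystem>
    <jsdl:FileSystem name="TMP">
      <jsdl:Description>Fast local temporary space; not shared across the cluster</jsdl:Description>
    </jsdl:FileSystem>
    <jsdl:FileSystem name="SCRATCH">
      <jsdl:Description>Large shared temporary space with a longer delete policy</jsdl:Description>
    </jsdl:FileSystem>
    <jsdl:FileSystem name="HOME">
      <jsdl:Description>User's home; long-term persistence, possibly size-limited</jsdl:Description>
    </jsdl:FileSystem>
  </jsdl:Resources>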
[MARVIN] I think the HPC profile should say that specifying ROOT will result in undefined behavior: the compliant service is free to accept the specification, ignore it, or raise a fault. Similarly, if an activity is created then it must be prepared for undefined behavior if it tries to use ROOT in a file name that it opens.
Since I've not put ROOT in the basic set above, I'd argue that if the job refers to ROOT, it's not in the space that should be defined by the HPC profile. And so we don't need to solve the problem of how to handle it on multi-rooted operating systems. :-) Donal.

Hi; I haven't tried to figure out if there are corner cases to consider, but I think I agree with everything you say in your reply. :-) Marvin. -----Original Message----- From: Donal K. Fellows [mailto:donal.k.fellows@manchester.ac.uk] Sent: Friday, June 16, 2006 3:31 AM To: Marvin Theimer Cc: JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view Marvin Theimer wrote:
[...]

Hi Marvin, Thanks for the (fairly long) email, you've raised quite a few interesting points - which I'll address inline below. First off I'd just like to say that the JSDL document is meant to be a language specification document, thus a large number of the issues about how JSDL should be used and what implementations have to support are not really in scope for that document. However, I do agree with you that such a document needs to exist - but for all uses of JSDL, not just HPC. I would like to take your straw man and use it as the starting point for this document for the section on "using JSDL for HPC". Let me know what you think. More comments below: Marvin Theimer wrote:
Hi;
Coming from the point-of-view of the HPC Profile working group, I have several questions about JSDL, as well as some straw man thoughts about how JSDL should/could relate to the HPC Profile specification that I'm involved with. Some of my questions lead me to restrictions on JSDL that an HPC profile specification might make. Other questions lead to potential changes that might be made as part of creating future versions of JSDL. (I'm well aware that JSDL 1.0 was meant as a starting point rather than the final word on job submission descriptions and so please interpret my questions as being an attempt at constructive suggestions rather than a criticism of a very fine first step by the JSDL working group.)
Will do.
At a high level, there are several general questions that came up when reading the JSDL 1.0 specification:
· Can JSDL documents describe jobs other than Linux/Unix/Posix jobs? For example, things like mount points and mount sources do not map in a completely straight-forward manner to how file systems are provided in the Windows world.
The idea is that JSDL (possibly through extensions) will be able to describe all kinds of jobs that can be submitted. This may include database queries, control of instruments or web invocations. We did the Posix job type first, as that was the one the majority of people in the group wanted. We're currently working on a ParallelApplication extension for JSDL. We'd be more than happy to see if an extension for Windows (or any other system) can be done through tweaks to the existing setup or by adding a new extension. Could you say how file systems don't map to the Windows world? My naive assumption was that you could do it.
· Is JSDL expressive enough to describe all the needs of a job? For example, it is unclear how one would specify a requirement for something like a particular instruction set variation of the IA86 architecture (e.g. the SSE3 version of the Pentium) or how one would specify that AMD processors are required rather than Intel ones (because the optimized libraries and the optimizations generated by the compiler used will differ for each). For another example, it is unclear how one would specify that all the compute nodes used for something like an MPI job should have the same hardware.
NO. I doubt JSDL is expressive enough in its current state to describe the needs of all jobs. We're working with the Information model people in the OGSA group at the moment on this - please help! I liked some of your ideas for this below, by the way.
· How will JSDL's normative set of enumeration values for things like processor architecture and operating system be kept up-to-date and relevant? Also, how should things like operating system version get specified in a normative manner that will enable interoperability among multiple clients and job scheduling services? For example, things like Linux and Windows versions are constantly being introduced, each with potentially significant differences in capabilities that a job might depend on. Without a normative way of specifying these constantly evolving version sets it will be difficult, if not impossible, to create interoperable job submission clients and job scheduling services (including meta-scheduling services where multiple schedulers must interoperate with each other).
Agreed. We don't yet have a way to add to the normative enumerations. I think you suggest below to move these into a separate document so that they can be updated more easily - this would seem a good idea. As for OS versioning I have my ideas though JSDL doesn't have a central plan yet. Again input here would be appreciated.
· Although JSDL specifies a means of including additional non-normative elements and attributes in a document, non-normative extensions make interoperability difficult. This implies the need for normative extensions to JSDL beyond the Posix extension currently described in the 1.0 specification. Are there plans to define additional extension profiles to address the above questions surrounding expressive power and normative descriptions of things like current OS types and versions?
Yes. The intention with JSDL has always been to produce more normative extensions post JSDL 1.0.
· If one accepts the need for a variety of extension profiles then this raises the question of what should be in the base case. For example, it could be argued that data staging -- with its attendant aspects such as mount points and mount sources -- should be defined in an extension rather than in the core specification that will need to cover a variety of systems beyond just Linux/Unix/Posix. Similarly, one might argue that the base case should focus on what's /functionally/ necessary to execute a job correctly and should leave things that are "optimization hints", such as CPU speed and network bandwidth specifications, to extension profiles.
Personally I'd agree with you that file staging should be in an extension. Though the view of the group was that most current DRM systems which would consume JSDL had file staging as a core element. I also agree on the idea of "optimization hints".
· How are concepts such as IndividualCPUSpeed and IndividualNetworkBandwidth intended to be defined and used in practice? I understand the concept of specifying things like the amount of physical memory or disk space that a job will require in order to be able to run. However, CPU speed and network bandwidth don't represent functional requirements for a job -- meaning that a job will correctly run and produce the same results irrespective of the CPU speed and network bandwidth available to it. Also, the current definitions seem fuzzy: the megahertz number for a CPU does not tell you how fast a given compute node will be able to execute various kinds of jobs, given all the various hardware factors that can affect the performance of a processor (consider the presence/absence of floating point support, the memory caching architecture, etc.). Similarly, is network bandwidth meant to represent the theoretical maximum of a compute node's network interface card? Is it expected to take into account the performance of the switch that the compute node is attached to? Since switch performance is partially a function of the pattern of (aggregate) traffic going through it, the network bandwidth that a job such as an MPI application can expect to receive will depend on the /type/ of communications patterns employed by the application. How should this aspect of network bandwidth be reflected -- if at all -- in the network bandwidth values that a job requests and that compute nodes advertise?
As said above we really need to define this in a separate "profile" document.
· JSDL is intended for describing the requirements of a job being submitted for execution. To enable matchmaking between submitted jobs and available computational resources there must also be a way of describing existing/available resources. While much of JSDL can be used for this purpose, it is also clear that various extensions are necessary. For example, to describe a compute cluster requires that one be able to specify the resources for each compute node in the cluster (which may be a heterogeneous lot). Similarly, to describe a compute node with multiple network interfaces would require an extension to the current model, which assumes that only a single instance of such things can exist. This raises the question of whether something other than JSDL is intended to be used for describing available computational resources or whether there are intensions to extend JSDL to enable it to describe such resources.
The writing of a resource description language was something we were told we couldn't do in the JSDL group. I do agree that it's now important that we have one. I think we'd need to go back to GGF (or whatever their name is this week) and ask to set up a group to do this. Perhaps we could take all the stuff out of JSDL which is appropriate as a starting point?
· The current specification stipulates that conformant implementations must be able to parse all the elements and attributes defined in the spec, but doesn't require that any of them be supplied. Thus, a scheduling service that does nothing could claim to be compliant as long as it can correctly parse JSDL documents. For interoperability purposes, I would argue that the spec should define a minimum set of elements that any compliant service must be able to supply. Otherwise clients will not be able to make any assumptions about what they can specify in a JSDL document and, in particular, client applications that programmatically submit job submission requests will not be possible since they can't assume that any valid JSDL document will actually be acceptable by any given job submission service.
Yes - this is true - though as the current document is a description of the JSDL "language" this is correct. These issues should all be clarified in the profile document.
· I have a number of questions about data staging:
· Although the notions of working directory and environment variables are defined in the posix extension, they are implicitly assumed in the data staging section of the core specification. This implies to me that either (a) data staging is made an extension or (b) these concepts are made a normative, required part of the core specification.
Hmm - well spotted. Personally, as I've said, I'd like to see it made into an extension. This probably needs some discussion on the list.
· Recursive directory copying can be specified, but is not required to be supplied by any job submission service. This makes it difficult to write applications that programmatically define their data staging needs since they cannot in the current design determine whether any given job submission service implements recursive directory copying. In practice this may mean that programmatically generated job submissions will only ever use lists of individual files to stage.
This is a major problem as many of the systems that are currently available out there do not support recursive directory copying. Again we could clarify the use of this through an HPC profile.
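Pending such clarification, a generator that cannot rely on recursive copying has to flatten a directory into one DataStaging element per file. A sketch (file names and URIs are illustrative only):

  <!-- instead of naming the directory "db" in a single element: -->
  <jsdl:DataStaging>
    <jsdl:FileName>db/part1.dat</jsdl:FileName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:Source><jsdl:URI>ftp://example.org/db/part1.dat</jsdl:URI></jsdl:Source>
  </jsdl:DataStaging>
  <jsdl:DataStaging>
    <jsdl:FileName>db/part2.dat</jsdl:FileName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:Source><jsdl:URI>ftp://example.org/db/part2.dat</jsdl:URI></jsdl:Source>
  </jsdl:DataStaging>

Verbose, but acceptable to any service that implements only per-file staging.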
· The current definitions of the well-known file systems seem imprecise to me. In particular:
· What are the navigation rules associated with each? Can you cd out of the subtree that each represents? ROOT almost certainly does not allow that. Is there an assumption that one can cd out of HOME or TMP or SCRATCH? Hopefully not, since that would make these file systems even more Unix/Linux-centric, plus one would now need to specify what clients can expect to see when they do so.
Again not defined here. Though I'd assume we can easily say in the profile that you can't cd out of it.
· What is ROOT intended to be used for? Are there assumptions about what resides under root? Are there assumptions about what an application can read/write under the ROOT subtree? (ROOT also seems like the most Unix-specific of the 4 file system types defined.)
Personally I don't have a use for it. Anyone else?
· What are the sharing/consistency semantics of each file system in situations where a job is a multi-node application running on something like a cluster? Is HOME visible to all compute nodes in a data-consistent manner? I'm guessing that TMP would be assumed to be strictly local to each compute node, so that things like MPI applications would need to be cognizant that they are writing multiple files to multiple separate storage systems when they write to a file in TMP -- and furthermore that data staging of such files after a job has run will result in multiple files that all map to the same target file.
Again profile issue.
· Can other users write over or delete your data in TMP and/or SCRATCH? Is data in these file systems visible to other users or does each job get its own private TMP and SCRATCH?
Profile.
· How long does data in SCRATCH stay around? Without some normative definition -- or at least a normative lower bound -- on data lifetime clients will have to assume that the data can vanish arbitrarily and things like multi-job workflows will be very difficult to write if they try to take advantage of SCRATCH space to avoid unnecessary data staging actions to/from a computing facility.
Profile.
· From an interoperability and programmatic submission point-of-view, it is important to know which transports any given job submission service can be expected to support. This seems like another area where a normative minimal set that all job submission services must implement needs to be defined.
This gets very difficult and political! Though we should be able to come up with a core set for the profile.
Given these questions, as well as the mandate for the HPC profile to define a simple base interface (that can cover the HPC use case of submitting jobs to a compute cluster), I would like to present the following straw man proposal for feedback from this community:
· Restructure the JSDL specification as a small core specification that must be universally implemented -- i.e. not just parsable, but also suppliable by all compliant job submission services -- and a number of optional extension profiles.
Hopefully the language as it stands at the moment (with a few exceptions) is a good core set. With profiles for different use cases we could mandate the implemented side too.
· Declare concepts such as executable path, command-line arguments, environment variables, and working directory to be generic and include them in the core JSDL specification rather than the posix extension. This may enable the core specification to support things like Windows-based jobs (TBD). The goal here is to define a core JSDL specification that in-and-of-itself could enable job submission to a fairly wide range of execution subsystems, including both the Unix/Linux/Posix world and the Windows world.
Why do these need to be in the core? We had problems before in a pre-release version when they were in the core as people who wanted to do database submissions (and other things) were trying to map these into such elements.
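For illustration only, one possible shape for such a profile-level application element; the hpcp prefix, its namespace, and the element names are invented for this sketch and are not part of JSDL 1.0 or its POSIX extension:

  <hpcp:HPCApplication xmlns:hpcp="http://example.org/hpc-profile/application">
    <hpcp:Executable>simulate.exe</hpcp:Executable>
    <hpcp:Argument>-steps</hpcp:Argument>
    <hpcp:Argument>1000</hpcp:Argument>
    <hpcp:Environment name="MODEL_DIR">models</hpcp:Environment>
    <hpcp:WorkingDirectory>work</hpcp:WorkingDirectory>
  </hpcp:HPCApplication>

None of these four concepts is POSIX-specific: the same element set could be bound to CreateProcess on Windows as easily as to exec on Unix, which is the point of pulling them out of the posix extension.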
· Move data staging to an extension.
· Create precise definitions of the various concepts introduced in the data staging extension, including normative requirements about whether or not one can change directory up and out of a file system's root directory, etc.
· Define which transports are expected to be implemented by all compliant services.
Quite possibly - and the use of a profile.
· Move the various enumeration types -- e.g. for CPU architecture and OS -- to separate specification documents so that they can evolve without requiring corresponding and constant revision of the core JSDL specification.
Sounds good. Even better if we can get someone else to update these for us.
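For reference, the enumerations under discussion appear in a JSDL 1.0 document like this (x86 and LINUX are values from the spec's normative lists; the version string is illustrative). Moving the value lists into separate documents would let them grow without reissuing the core schema:

  <jsdl:CPUArchitecture>
    <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
  </jsdl:CPUArchitecture>
  <jsdl:OperatingSystem>
    <jsdl:OperatingSystemType>
      <jsdl:OperatingSystemName>LINUX</jsdl:OperatingSystemName>
    </jsdl:OperatingSystemType>
    <jsdl:OperatingSystemVersion>2.6.13</jsdl:OperatingSystemVersion>
  </jsdl:OperatingSystem>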
· Define extension profiles (eventually, not right away) that enable richer description of hardware and software requirements, such as details of the CPU architecture or OS capabilities. As part of this, move optimization hints, such as CPU speed and network bandwidth elements out of the JSDL core and into a separate extension profile.
This should come from the work we are doing with the Information model people - please join in.
· Embrace the issue of how to specify available resources at an execution subsystem. Start by defining a base case that allows the description of compute clusters by creating a compound JSDL document that consists of an outer element that ties together a sequence of individual JSDL elements, each of which describes a single compute node of a compute cluster. Define an explicit notion of extension profiles that could define other ways of describing computational resources beyond just an array of simple JSDL descriptions.
Not entirely sure what you mean on this one. Can you explain further?
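Marvin's clarification in his reply below suggests a compound document along roughly these lines. A sketch only: the ex prefix, its namespace, and the ComputeCluster/ComputeNode wrapper elements are invented here for illustration, while the content of each node entry is ordinary JSDL Resources material:

  <ex:ComputeCluster xmlns:ex="http://example.org/resource-description"
                     xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
    <ex:ComputeNode>
      <jsdl:Resources>
        <jsdl:CPUArchitecture>
          <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
        </jsdl:CPUArchitecture>
        <jsdl:IndividualPhysicalMemory>
          <jsdl:Exact>4294967296.0</jsdl:Exact>
        </jsdl:IndividualPhysicalMemory>
      </jsdl:Resources>
    </ex:ComputeNode>
    <!-- one ComputeNode entry per node; entries may differ in a heterogeneous cluster -->
  </ex:ComputeCluster>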
Now, as presented above, my straw man proposal looks like suggestions for changes that might go into a JSDL-1.1 or JSDL-2.0 specification. In the near-term, the HPC profile working group will be exploring what can be done with just JSDL-1.0 and restrictions to that specification. The restrictions would correspond to disallowing those parts of the JSDL-1.0 specification that the above proposal advocates moving to extension profiles. It will also explore whether a restricted version of the posix extension could be used to cover most common Windows cases.
Marvin.
OK for those who have made it this far - possibly not many. I'm going to propose a JSDL call on this in a new email so all can see it. steve..

Hi; My responses are in-line below. Marvin. ________________________________ From: A S McGough [mailto:asm@doc.ic.ac.uk] Sent: Friday, June 09, 2006 3:03 AM To: Marvin Theimer Cc: JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view
[...]
I would like to take your straw man and use it as the starting point for this document for the section on "using JSDL for HPC". Let me know what you think.
[Marvin] As long as the HPC profile specification has some document/specification that it can employ to normatively define behaviors, I'm happy. Presumably "compliance" with JSDL will be defined to mean compliance with this second document that you propose to create?
[...]
Could you say how file systems don't map to the Windows world? My naive assumption was that you could do it.
[Marvin] With suitable - and hopefully relatively minor - restrictions the concepts of mount-point and mount-source might be made to map to the Windows world. It's the details that may not map precisely. In general, there are concepts in Unix and Windows file systems that don't map to each other. For example, there are no symbolic links in NTFS and there is no notion of fine-grained op-locks in the Unix file system.
[...]
The writing of a resource description language was something we were told we couldn't do in the JSDL group. [...] Perhaps we could take all the stuff out of JSDL which is appropriate as a starting point?
[Marvin] Whichever group the work is done in, the HPC profile working group will need to deal with the matter sooner rather than later (during this summer, to be precise). It may be the case that the HPC profile working group will end up defining a "Basic Resource Description" specification in the same spirit as BES is a "basic" version of what's being pursued in the EMS working group. But that's a personal speculation thus far.
[...]
Why do these need to be in the core? We had problems before in a pre-release version when they were in the core as people who wanted to do database submissions (and other things) were trying to map these into such elements.
[Marvin] A Windows HPC job is not completely posix-compliant, yet has overlap on the above-listed set of concepts (and actually many more). So I would argue that we need something that abstracts out the core concepts of a traditional HPC job. Given the presence of file data staging elements in the core specification - which I would argue are meaningless for database submissions - it seems like the above-listed elements are at least as generic as the data staging elements.
[...]
Not entirely sure what you mean on this one. Can you explain further?
[Marvin] I'm basically advocating two things: (a) tackle the problem of how to describe available resources since it's so closely allied to the topic of describing required resources, and (b) start with a simple approach and a means of allowing evolution/extension to support richer approaches later on.
[...]
OK for those who have made it this far - possibly not many. I'm going to propose a JSDL call on this in a new email so all can see it.
[Marvin] Great idea. I will try hard to be on that call. (If you send me a direct email then that will increase the likelihood since all my GGF email now goes into one folder and I sometimes miss important ones in the deluge of all emails.)
steve..

Hi Marvin, Steve, First off, Marvin, could I ask you to put unresolved issues into the JSDL post-v1 tracker (as separate artifacts)? https://forge.gridforum.org/sf/tracker/do/listArtifacts/projects.jsdl-wg/tra... (I'm sure there are many, so I, and hopefully a few others from the group, can help out if you want. It's difficult to tell what's agreed on and what's not after wading through long email threads.) Some other comments in-line below. I've deleted some blocks of text for clarity and I didn't try to reply to everything... Andreas Marvin Theimer wrote:
Hi;
My responses are in-line below.
Marvin.
------------------------------------------------------------------------
From: A S McGough [mailto:asm@doc.ic.ac.uk] Sent: Friday, June 09, 2006 3:03 AM To: Marvin Theimer Cc: JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS) Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view
Hi Marvin,
Thanks for the (fairly long) email, you've raised quite a few interesting points - which I'll address inline below. First off I'd just like to say that the JSDL document is meant to be a language specification document thus a large number of the issues about how JSDL should be used and what they have to support is not really in scope for that document. However, I do agree with you that such a document needs to exist - but for all uses of JSDL not just HPC. I would like to take your straw man and use it as the starting point for this document for the section on "using JSDL for HPC". Let me know what you think.
[Marvin] As long as the HPC profile specification has /some/ document/specification that it can employ to normatively define behaviors, I'm happy. Presumably "compliance" with JSDL will be defined to mean compliance with this second document that you propose to create?
I agree with Steve's general statement that a number of these issues are profiling issues. But I would have thought that the "JSDL for HPC" is what the HPC profile would define through appropriate restrictions to the JSDL spec.(?)
3. How will JSDL’s normative set of enumeration values for things like processor architecture and operating system be kept up-to-date and relevant? Also, how should things like operating system version get specified in a normative manner that will enable interoperability among multiple clients and job scheduling services? For example, things like Linux and Windows versions are constantly being introduced, each with potentially significant differences in capabilities that a job might depend on. Without a normative way of specifying these constantly evolving version sets it will be difficult, if not impossible, to create interoperable job submission clients and job scheduling services (including meta-scheduling services where multiple schedulers must interoperate with each other).
Agreed. We don't yet have a way to add to the normative enumerations. I think you suggest below to move these into a separate document so that they can be updated more easily - this would seem a good idea. As for OS versioning I have my ideas though JSDL doesn't have a central plan yet. Again input here would be appreciated.
In general I agree with the idea of separate schemas for the values. And if there is an appropriate schema already defined in CIM, or elsewhere, we could just point to it. But I think the JSDL-WG shouldn't be in the business of maintaining such lists. We should reuse what's available.
5. If one accepts the need for a variety of extension profiles then this raises the question of what should be in the base case. For example, it could be argued that data staging – with its attendant aspects such as mount points and mount sources – should be defined in an extension rather than in the core specification that will need to cover a variety of systems beyond just Linux/Unix/Posix. Similarly, one might argue that the base case should focus on what’s /functionally/ necessary to execute a job correctly and should leave things that are “optimization hints”, such as CPU speed and network bandwidth specifications, to extension profiles.
Personally I'd agree with you that file staging should be in an extension, though the view of the group was that most current DRM systems that would consume JSDL treat file staging as a core element. I also agree on the idea of "optimization hints".
I don't see a problem if the HPC Profile decides to have data staging as part of its extended features. Data staging may be in the main JSDL namespace but that does not mean that everyone has to implement it. It's there because we felt that it is common enough that everyone would want to.
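For reference, this is the shape of the core-namespace staging element that a base-case profile could disallow and an extension profile re-admit (a sketch following the JSDL 1.0 schema; the file name and URI are illustrative):

    <jsdl:DataStaging>
      <jsdl:FileName>input.dat</jsdl:FileName>
      <jsdl:FilesystemName>HOME</jsdl:FilesystemName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Source>
        <jsdl:URI>http://example.org/data/input.dat</jsdl:URI>
      </jsdl:Source>
    </jsdl:DataStaging>

Because the element is optional in any given document, a profile can forbid or require it without changing the schema.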
6. How are concepts such as IndividualCPUSpeed and IndividualNetworkBandwidth intended to be defined and used in practice? I understand the concept of specifying things like the amount of physical memory or disk space that a job will require in order to be able to run. However, CPU speed and network bandwidth don’t represent functional requirements for a job – meaning that a job will correctly run and produce the same results irrespective of the CPU speed and network bandwidth available to it. Also, the current definitions seem fuzzy: the megahertz number for a CPU does not tell you how fast a given compute node will be able to execute various kinds of jobs, given all the various hardware factors that can affect the performance of a processor (consider the presence/absence of floating point support, the memory caching architecture, etc.). Similarly, is network bandwidth meant to represent the theoretical maximum of a compute node’s network interface card? Is it expected to take into account the performance of the switch that the compute node is attached to? Since switch performance is partially a function of the pattern of (aggregate) traffic going through it, the network bandwidth that a job such as an MPI application can expect to receive will depend on the /type/ of communications patterns employed by the application. How should this aspect of network bandwidth be reflected – if at all – in the network bandwidth values that a job requests and that compute nodes advertise?
As said above we really need to define this in a separate "profile" document.
Network bandwidth is only meant to represent the theoretical maximum of the node's NIC. The motivation was to allow at least some simple statement about what networking capabilities you want the node to have. Obviously not enough.
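As the specification stands, both elements take the generic JSDL range-value form, with CPU speed measured in hertz and bandwidth in bits per second; for example (a sketch with illustrative values):

    <!-- "At least a 2 GHz CPU" - says nothing about FPU, caches, etc. -->
    <jsdl:IndividualCPUSpeed>
      <jsdl:LowerBoundedRange>2.0E9</jsdl:LowerBoundedRange>
    </jsdl:IndividualCPUSpeed>

    <!-- "At least a 1 Gbit/s NIC" - read as the card's theoretical
         maximum, ignoring switch behavior and traffic patterns -->
    <jsdl:IndividualNetworkBandwidth>
      <jsdl:LowerBoundedRange>1.0E9</jsdl:LowerBoundedRange>
    </jsdl:IndividualNetworkBandwidth>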
8. The current specification stipulates that conformant implementations must be able to parse all the elements and attributes defined in the spec, but doesn’t require that any of them be supplied. Thus, a scheduling service that does nothing could claim to be compliant as long as it can correctly parse JSDL documents. For interoperability purposes, I would argue that the spec should define a minimum set of elements that any compliant service must be able to supply. Otherwise clients will not be able to make any assumptions about what they can specify in a JSDL document and, in particular, client applications that programmatically submit job submission requests will not be possible since they can’t assume that any valid JSDL document will actually be acceptable by any given job submission service.
Yes, this is true - though since the current document is a description of the JSDL "language", that is by design. These issues should all be clarified in the profile document.
Yes, I think this is a profiling decision.
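As one concrete (purely illustrative) floor, a profile could mandate that every compliant service accept at least a simple executable invocation, giving programmatic clients a guaranteed baseline:

    <jsdl:JobDefinition
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
        xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
      <jsdl:JobDescription>
        <jsdl:Application>
          <jsdl-posix:POSIXApplication>
            <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
            <jsdl-posix:Argument>hello</jsdl-posix:Argument>
          </jsdl-posix:POSIXApplication>
        </jsdl:Application>
      </jsdl:JobDescription>
    </jsdl:JobDefinition>

Anything beyond such a mandated minimum would remain subject to per-service capabilities, which the profile would then need a way to advertise.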
9. I have a number of questions about data staging:
10. Although the notions of working directory and environment variables are defined in the posix extension, they are implicitly assumed in the data staging section of the core specification. This implies to me that either (a) data staging is made an extension or (b) these concepts are made a normative, required part of the core specification.
Hmm - well spotted. Personally, as I've said, I'd like to see it made into an extension. This probably needs some discussion on the list.
I think environment variables do not refer to data staging, so could you give me a text reference? I could be forgetting something. (Both environment variables and DataStaging have a linkage to the FileSystem resource element.) The PosixApplication working directory has a relationship with DataStaging in that it may specialize the DataStaging location. But I don't see this as reason in itself to make DataStaging an extension.

Having said the above, I do agree that, eventually, DataStaging should be separated out. But I think a few other things have to be done before that happens. For example, we would probably have to have a way to combine what are likely to be separate JSDL documents describing the staging and execution stages, and to define what their dependencies are. As far as I remember, this was the main reason that data staging was not made a JSDL extension in the first place.

To come to the HPC profile: I see no reason (and no problem) why you cannot simply restrict JSDL 1.0 usage in your base case to not include staging and define it instead in a profile extension. Or am I missing something?
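To make the linkage concrete: it runs through the declared FileSystem name, which both core and posix elements may reference (a sketch; paths and names are illustrative, and the filesystemName attributes reflect my reading of the posix extension schema):

    <!-- Declared once, under Resources (core namespace) -->
    <jsdl:FileSystem name="HOME">
      <jsdl:MountPoint>/home/user</jsdl:MountPoint>
    </jsdl:FileSystem>

    <!-- Referenced from the posix extension -->
    <jsdl-posix:WorkingDirectory filesystemName="HOME">run1</jsdl-posix:WorkingDirectory>
    <jsdl-posix:Environment name="DATA_DIR" filesystemName="HOME">data</jsdl-posix:Environment>

    <!-- Referenced from core data staging -->
    <jsdl:DataStaging>
      <jsdl:FileName>run1/input.dat</jsdl:FileName>
      <jsdl:FilesystemName>HOME</jsdl:FilesystemName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:Source>
        <jsdl:URI>http://example.org/input.dat</jsdl:URI>
      </jsdl:Source>
    </jsdl:DataStaging>

The core staging element can thus be written without any posix concepts; the working-directory relationship only appears once the posix extension is in play.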
12. The current definitions of the well-known file systems seem imprecise to me. In particular:
13. What are the navigation rules associated with each? Can you cd out of the subtree that each represents? ROOT almost certainly does not allow that. Is there an assumption that one can cd out of HOME or TMP or SCRATCH? Hopefully not, since that would make these file systems even more Unix/Linux-centric, plus one would now need to specify what clients can expect to see when they do so.
Again not defined here. Though I'd assume we can easily say in the profile that you can't cd out of it.
The intention of well-known names was to provide some minimal common definitions of what could be expected. They are not normative, not intended to be normative, and in retrospect they are more of a profiling issue. Perhaps they shouldn't be in the JSDL spec at all.
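For reference, a well-known name is carried purely by the FileSystem name attribute, so restricting or dropping the well-known set would be a profile-level statement about interpretation rather than a schema change (a sketch; the disk-space bound is illustrative and expressed in bytes):

    <jsdl:FileSystem name="SCRATCH">
      <jsdl:Description>Per-job scratch space</jsdl:Description>
      <jsdl:DiskSpace>
        <jsdl:LowerBoundedRange>1.0E10</jsdl:LowerBoundedRange>
      </jsdl:DiskSpace>
    </jsdl:FileSystem>

A profile could then pin down navigation semantics (e.g. "a job must not depend on reading outside SCRATCH") without touching JSDL itself.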
20. Declare concepts such as executable path, command-line arguments, environment variables, and working directory to be generic and include them in the core JSDL specification rather than the posix extension. This may enable the core specification to support things like Windows-based jobs (TBD). The goal here is to define a core JSDL specification that in-and-of-itself could enable job submission to a fairly wide range of execution subsystems, including both the Unix/Linux/Posix world and the Windows world.
Why do these need to be in the core? We had problems with a pre-release version, when they were in the core: people who wanted to do database submissions (and other things) were having to shoehorn their jobs into these elements.
[*Marvin*] A Windows HPC job is not completely posix-compliant, yet has overlap on the above-listed set of concepts (and actually many more). So I would argue that we need /something/ that abstracts out the core concepts of a traditional HPC job. Given the presence of file data staging elements in the core specification – which I would argue are meaningless for database submissions – it seems like the above-listed elements are at least as generic as the data staging elements.
I think Donal's suggestion for a more generic ExecutableApp, together with a WindowsApp are interesting enough to look into further. Especially if the changes to the schema would be compatible with existing JSDL 1.0 documents.
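Since no such schema exists yet, purely as a hypothetical illustration (the namespace and element names below are invented), a generic ExecutableApp might factor out the shared concepts so that a Windows job needs no posix pretense:

    <jsdl:Application>
      <!-- Hypothetical: not part of JSDL 1.0 or any agreed extension -->
      <x:ExecutableApp xmlns:x="urn:example:jsdl-executable-sketch">
        <x:Executable>C:\apps\winsim.exe</x:Executable>
        <x:Argument>-steps</x:Argument>
        <x:Argument>100</x:Argument>
        <x:Environment name="SIM_HOME">C:\apps\sim</x:Environment>
        <x:WorkingDirectory>C:\runs\run1</x:WorkingDirectory>
      </x:ExecutableApp>
    </jsdl:Application>

A WindowsApp could then extend this the way POSIXApplication adds resource limits and user/group names, and existing JSDL 1.0 documents would remain valid as long as POSIXApplication itself is untouched.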
Now, as presented above, my straw man proposal looks like suggestions for changes that might go into a JSDL-1.1 or JSDL-2.0 specification. In the near-term, the HPC profile working group will be exploring what can be done with just JSDL-1.0 and restrictions to that specification. The restrictions would correspond to disallowing those parts of the JSDL-1.0 specification that the above proposal advocates moving to extension profiles. It will also explore whether a restricted version of the posix extension could be used to cover most common Windows cases.
OK for those who have made it this far - possibly not many. I'm going to propose a JSDL call on this in a new email so all can see it.
I've already set up the call as Steve asked. I think the 1.x branch should stay compatible with 1.0 and not introduce any radical changes to the underlying schema. Longer term work might result in a 2.0 with a possibly different structure. But I think we should give priority to things that can be fixed within the 1.0 structure, letting the HPC profile work progress.

--
Andreas Savva
Fujitsu Laboratories Ltd

Hi;

Yes, I will be happy to do this.

Marvin.

-----Original Message-----
From: Andreas Savva [mailto:andreas.savva@jp.fujitsu.com]
Sent: Monday, June 12, 2006 12:34 AM
To: Marvin Theimer
Cc: A S McGough; JSDL Working Group; ogsa-bes-wg@ggf.org; Ed Lassettre; Ming Xu (WINDOWS)
Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view

Hi Marvin, Steve,

First off, Marvin, could I ask you to put unresolved issues into the JSDL post-v1 tracker (as separate artifacts)? https://forge.gridforum.org/sf/tracker/do/listArtifacts/projects.jsdl-wg/tracker.jsdl_post_v1 (I'm sure there are many, so I, and hopefully a few others from the group, can help out with them if you want. It's difficult to tell what's agreed on and what's not after wading through long email threads.)

Some other comments in-line below. I've deleted some blocks of text for clarity and I didn't try to reply to everything...

Andreas
participants (10)

- A S McGough
- Alexander Papaspyrou
- Andreas Savva
- Christopher Smith
- Donal K. Fellows
- Karl Czajkowski
- Marvin Theimer
- Michel Drescher
- Oxana Smirnova
- Thomas Röblitz