Basic POSIX Component & template draft

Hi all,
Jun Tatemura and I took on a homework assignment on the "Basic POSIX Component" for BLAST application deployment.
Attached is my first-cut draft, which I hope to discuss at the GGF17 OGSA EMS session. Please have a look and give your feedback before or at GGF17. Your feedback is very important.
Thanks and see you in Tokyo!
-- Hiro Kishimoto

This is really good, and I do agree: we do need a set of foundational things to deploy.
One thing I would argue, having learned the lessons of both Ant and SmartFrog, is that a consistent model of file system components comes first. That is, we need components to model
- files (with a liveness test that verifies the file is present)
- paths (an ordered list of files/directories)
- directories (with components that can create a dir on deployment and, optionally, delete it and its contents on termination)
- wildcarded sets of files (**/*.csv)
- temporary files and directories
- cached downloads
- text files
I'm attaching the PDF file describing the current SmartFrog set of components that do this. The underlying way that this is done is that all components that consider themselves to be sources of filenames/paths set the attribute absolutePath at runtime. They usually also export an RMI interface for RPC-style operations, alongside their RESTy state.
A standard set of filesystem components lets you include the operations to create and clean up directories, temporary files and other housekeeping operations in the workflows of a deployment.
Looking at the Posix proposal, I'd suggest starting with the reusable set of filesystem types, including a <fs:mount> component that could mount a local or remote filesystem, and which would then pass the local location to interested parties as the filesystem:absolutePath attribute.
You absolutely don't need to introduce a new shared datatype FileSystemName, because that implies some kind of alternate naming/locating scheme outside of what CDL and the runtime have built in. Dynamic CDL references are sufficient to crosslink information inside the graph of deployed things, and offer flexibility as well as ease of management (you see them when you walk the graph).
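The component model described above can be illustrated with a minimal Python sketch. It is not SmartFrog's or CDL's actual API: the class names, the dictionary used as the deployment graph, and the resolve() helper are all assumptions made for illustration. It shows components publishing absolutePath at deploy time, a liveness test, cleanup on termination, and a reference that is only resolved when the dependent component deploys.

```python
import os
import shutil
import tempfile


class Component:
    """Hypothetical base class: a node in the deployment graph."""

    def __init__(self):
        self.attributes = {}  # runtime attributes, e.g. absolutePath

    def deploy(self):
        pass

    def is_live(self):
        return True

    def terminate(self):
        pass


class TempDir(Component):
    """Creates a temporary directory on deployment, deletes it on termination."""

    def deploy(self):
        self.attributes["absolutePath"] = tempfile.mkdtemp(prefix="work")

    def is_live(self):
        return os.path.isdir(self.attributes.get("absolutePath", ""))

    def terminate(self):
        path = self.attributes.pop("absolutePath", None)
        if path:
            shutil.rmtree(path, ignore_errors=True)


class File(Component):
    """A file under a directory that is named by a lazy reference."""

    def __init__(self, graph, dir_ref, filename, must_exist=False):
        super().__init__()
        self.graph = graph
        self.dir_ref = dir_ref
        self.filename = filename
        self.must_exist = must_exist

    def deploy(self):
        # The reference is resolved here, at deploy time, not at construction.
        base = resolve(self.graph, self.dir_ref)
        path = os.path.join(base, self.filename)
        self.attributes["absolutePath"] = path
        if self.must_exist and not os.path.isfile(path):
            raise FileNotFoundError(path)

    def is_live(self):
        # Liveness test: the file is still present.
        return os.path.isfile(self.attributes.get("absolutePath", ""))


def resolve(graph, ref):
    """Resolve a reference such as "/tmp/absolutePath" against the graph."""
    name, attribute = ref.strip("/").split("/", 1)
    return graph[name].attributes[attribute]
```

Because every component exposes its location through the same attribute, a consumer only needs a reference into the graph and never a separate naming scheme.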
First, we'd need filesystem types:
<tmp cdl:extends="fs:TempFileSystem">
  <fs:Description> … </fs:Description>
  <fs:DiskSpace>
    <fs:LowerBoundedRange>10737418240.0</fs:LowerBoundedRange>
  </fs:DiskSpace>
</tmp>
<home cdl:extends="fs:Directory">
  <fs:Description>Chris's home directory</fs:Description>
  <fs:dir>/home/csmith</fs:dir>
</home>
When deployed, both of these would add absolutePath as an attribute, so that tmp/absolutePath would resolve to, say, tmp/work01234/ and home/absolutePath to /home/csmith.
You'd then mount a blast dir, say on an NFS filesystem whose URL you provide
<blastfs cdl:extends="fs:RemoteFileSystem">
  <fs:url>nfs://filestore/csmith/blast</fs:url>
  <fs:mountUser>csmith</fs:mountUser>
</blastfs>
The database file is under here, and you declare that it must exist. Deployment will fail if it is absent:
<db cdl:extends="fs:File">
  <fs:dir cdl:ref="/tmp/absolutePath" cdl:lazy="true" />
  <fs:filename>db/ncbiblast/est</fs:filename>
  <fs:filemustexist>true</fs:filemustexist>
</db>
To use it, just refer to its path
<blast:database cdl:ref="/db/absolutePath" cdl:lazy="true"/>
Paths could be represented by some list-like element, whose entries are evaluated at deploy time and then turned into a platform-specific path
<path cdl:extends="fs:Path">
  <pathentry cdl:ref="/blastfs/absolutePath" cdl:lazy="true" />
  <pathentry cdl:ref="/home/absolutePath" cdl:lazy="true"/>
</path>
On Windows /path/absolutePath would become something like "\\filestore\csmith\blast;c:\documents and settings\csmith" while on Unix it could be "/nfs/filestore/csmith/blast:/home/csmith"
This information now goes down to the jsdl component
<jsdl-posix:Environment name="PATH" cdl:ref="/path/absolutePath" cdl:lazy="true"/>
<jsdl-posix:Environment name="TMPDIR" cdl:ref="/tmp/absolutePath" cdl:lazy="true"/>
-Steve
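The fs:Path idea above, entries evaluated at deploy time and then joined in a platform-specific way, can be sketched in Python roughly as follows. The function name, the platform labels and the use of zero-argument callables to stand in for lazy CDL references are assumptions for illustration only, not part of CDL or JSDL.

```python
# Separator used between entries of a search path on each platform family.
PATH_SEPARATORS = {"windows": ";", "unix": ":"}


def build_search_path(entries, platform):
    """Join lazily evaluated path entries into one platform-specific string.

    entries  -- zero-argument callables, each returning an already resolved
                absolutePath (a stand-in for a lazy reference)
    platform -- "windows" or "unix" (hypothetical labels)
    """
    separator = PATH_SEPARATORS[platform]
    resolved = [entry() for entry in entries]  # evaluated only now
    return separator.join(resolved)


def build_environment(path_entries, tmp_dir, platform):
    """Produce the environment handed to the job, as in the PATH/TMPDIR example."""
    return {
        "PATH": build_search_path(path_entries, platform),
        "TMPDIR": tmp_dir(),
    }
```

Keeping the entries as callables means the descriptor can name locations that do not exist until the mount or temporary directory has actually been deployed.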

Hi Steve,
Your file & filesystem components are very good. They are what I am looking for.
One more question I have in my mind is how to set user-id, group-id, and permissions to deployed files.
Can you call in to the OGSA-EMS session on Wednesday? It starts at 5:45pm JST = 9:45am UK.
The dial-in numbers for this session:
US: +1 718 3541071 (New York) or +1 408 9616509 (San Jose)
UK: +44 (0)207 3655269 (London)
Japan: +81 (0)3 3570 8225 (Tokyo)
PIN: 4371991
Thanks,
----
Hiro Kishimoto

Hiro Kishimoto wrote:
Hi Steve,
Your file & filesystem components are very good. They are what I am looking for.
I'd say they are a start. What is important is to have things that are generally useful yet at a high enough level to avoid a deployment descriptor evolving into a workflow script listing every single pre- and post-staging operation that takes place.
The TempDir component example is a good one: it not only creates a temporary directory, it cleans it up on closedown. You could imagine an extension which added purging of files over 72 hours old to keep transient directories from overflowing on a long-running deployment. The higher level the components are, the more useful declarative deployment becomes. Otherwise you are writing shell scripts in XML, which is nearly the worst of all possible worlds.
The other goal is to integrate with all the other work, not, as Dave Berry indirectly hints, to try and repeat it. I'd envisage staging components that set up paths to stuff in ACS repositories or other data sources. It's not the job of the deployment descriptor to say how that stuff gets close, only that at deployment time, the deployed program needs to know the path to where the stuff it wants is.
All we need is reusable components that make use of all the file system/staging/data stuff being done to set things up right at deploy time, clean up when terminating, and check for valid configurations and health at deploy time.
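The purging extension imagined above could look roughly like this in Python. This is only a sketch: the function name and parameters are hypothetical, and file age is assumed to mean time since last modification.

```python
import os
import time


def purge_old_files(directory, max_age_hours=72, now=None):
    """Delete regular files under `directory` not modified for max_age_hours.

    Returns the list of paths that were removed. `now` can be supplied
    (seconds since the epoch) to make the cutoff reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

A temporary-directory component could call this from its periodic liveness check, so the descriptor only declares the retention period and never lists individual cleanup steps.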
One more question I have in my mind is how to set user-id, group-id, and permissions to deployed files.
For Java implementations, you wait for Java 6 or execute chmod programs. Most irritatingly, Java operations to copy a file reset its permissions to the current defaults rather than keeping those of the source file. Over in Ant-land, this is a continuous source of support calls, right up there with "no easy way to work with symlinks" and file case sensitivity problems, as places where the limitations of Java's portable filesystem model poke through.
-Steve
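The copy-then-fix-permissions pattern described above can be sketched as follows. This is Python rather than Java, and the deploy_file name and its parameters are hypothetical. shutil.copyfile copies only the file contents, so the permission bits are applied explicitly afterwards; changing the owner works on Unix only and normally requires suitable privileges.

```python
import os
import shutil
import stat


def deploy_file(source, destination, mode=None, uid=-1, gid=-1):
    """Copy a file, then set its permissions and, optionally, its owner.

    mode     -- permission bits such as 0o640; defaults to those of the source
    uid, gid -- numeric owner and group; -1 leaves the value unchanged
    """
    shutil.copyfile(source, destination)  # contents only, no metadata
    if mode is None:
        mode = stat.S_IMODE(os.stat(source).st_mode)
    os.chmod(destination, mode)
    if uid != -1 or gid != -1:
        os.chown(destination, uid, gid)  # Unix only
    return destination
```

Making the mode an explicit attribute of a deployed-file component would let a descriptor state the required permissions declaratively instead of relying on whatever defaults the copy operation leaves behind.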