
And hi again.

The current strawman API does not contain any means to find entries
in a name space (e.g. to find a specific logical file in a replica
catalog).  The only way to perform such an operation is currently to
walk through the name space 'by foot', evaluating the output of
'list', the various inspection methods (isFile, isDir...) and
possibly the attributes attached to the entries.

We left out 'find' until now, mainly because a good and mostly
complete query language seemed difficult to define.  However, my
opinion is that a _simple_ find can easily be defined, hence this
proposal for a simple find on name spaces:

  1) find on names

     Common find operations on name spaces search for name patterns:

       "ls *.tex"
       "ls data/*/summary.dat"

     Such operations should be covered by a find.

  2) find on meta data

     Another common find paradigm is to find entries according to
     some meta data:

       "find files named summary.dat"
       "find files owned by Fritz"
       "find files older than 3 days"
       "find files of size 1024 bytes"
       "find files larger than 1 MB"

The second set of find examples implies some knowledge about
attribute semantics, and that is where the complexity of find
operations comes in.  E.g. "find files older than 3 days" needs a
compare relation for dates and times, and a specification of the
meta data structure (how is date/time represented?).

Part one of the problem is, in my opinion, easily dealt with, in two
ways:

  - allow shell wildcards for all 'name' arguments on the name space
    operations:

      dir.list ("*.tex");
      dir.move ("data/*/summary.dat", "/data/summary/");

  - add a simple find, which runs recursively (removing the burden
    of implementing that on application level over and over again):

      dir.find ("data/*.tex");

Part two of the problem is, as said, more difficult.  However, I
think it can be dealt with coherently, in a way which is consistent
with the evolution of SAGA.  The strawman API includes the
AttributeSet interface.
By simply adding that interface to the NameSpace, and hence adding
attributes to name space entries, a find can run over these
attributes.  More complex find semantics (dates, owners etc.) can
evolve with more complex attribute types, which are planned to be
defined in the future anyway.

BTW: That argument can also be made in reverse: attributes will be
more useful if there is a find which can operate on them.

BTW: The same argument can be made for ACLs, which are still to be
defined in SAGA.

As attributes are currently just key/value pairs of strings, the
name space would look like:

  interface SAGA::NSEntry : extends-all SAGA::Attributes
  {
  }

  interface SAGA::NSDir : extends-all SAGA::NSEntry
  {
    find (in  array<string, 1>  query,
          out array<NSEntry, 1> matches);
  }

The query would be defined as a set of key/value pairs, such as:

  "name=data/*.tex"
  "tag=useful"
  "enabled="          (meaning the key must be present, with any value)

Note that this does not allow searches for "size>2MB" or
"date>2days" etc. - again, such queries need to wait until the
attribute interface becomes more sophisticated.

I would be happy to hear feedback on the list.

  - does it make sense at all?
  - is it too simple to be of any use?
  - is it too complex to fit into SAGA?

Cheers, Andre.

PS.: This proposal does not imply that any name space instance has
to implement arbitrary persistent meta data!  Attributes can be read
only (file size), and the set of attributes supported can be limited
by the class specification (we do the very same for the
JobDefinition).

-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www: http://www.merzky.net  |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+
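
For illustration only, the query matching proposed above (a list of
"key=value" strings, shell wildcards in the values, and "key=" meaning
"key present with any value") could be sketched in Python roughly as
follows.  This is not part of the strawman API: the function names
(parse_query, match_path, find) and the dict standing in for a
directory tree of NSEntry objects are hypothetical.  It also makes
three assumptions the proposal leaves open: all query items must
match (logical AND), '*' in a name pattern does not cross a '/'
boundary, and attribute values other than the name are compared with
the same wildcard rules.

```python
from fnmatch import fnmatchcase


def parse_query(query):
    """Turn ["key=pattern", ...] into a list of (key, pattern) pairs."""
    pairs = []
    for item in query:
        key, sep, pattern = item.partition("=")
        if not sep or not key:
            raise ValueError("query item must look like 'key=value': %r" % item)
        pairs.append((key, pattern))
    return pairs


def match_path(path, pattern):
    """Shell-like match: '*' never crosses a '/' boundary (assumption)."""
    p_parts, q_parts = path.split("/"), pattern.split("/")
    return len(p_parts) == len(q_parts) and all(
        fnmatchcase(p, q) for p, q in zip(p_parts, q_parts))


def find(entries, query):
    """Return the sorted paths of all entries satisfying every query item.

    entries: dict mapping a path (relative to the directory searched)
             to a dict of string attributes.  Hypothetical stand-in
             for the recursive walk over NSEntry objects.
    """
    pairs = parse_query(query)
    matches = []
    for path, attrs in entries.items():
        for key, pattern in pairs:
            if key == "name":
                ok = match_path(path, pattern)
            elif pattern == "":
                ok = key in attrs  # "key=": present, any value
            else:
                ok = key in attrs and fnmatchcase(attrs[key], pattern)
            if not ok:
                break
        else:
            # No query item failed, so the entry matches.
            matches.append(path)
    return sorted(matches)


# Made-up sample data, entries at several depths below the directory.
entries = {
    "data/a.tex": {"tag": "useful", "enabled": "yes"},
    "data/b.tex": {"tag": "draft"},
    "data/run1/summary.dat": {"tag": "useful"},
    "notes.tex": {"tag": "useful", "enabled": ""},
}

print(find(entries, ["name=data/*.tex"]))
print(find(entries, ["name=data/*.tex", "tag=useful"]))
print(find(entries, ["enabled="]))
```

With this sample data the first query selects data/a.tex and
data/b.tex, the second only data/a.tex, and the third data/a.tex and
notes.tex.  Queries such as "size>2MB" are deliberately not handled,
in line with the restriction noted above.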