NOTE: We could force further consolidation of items one (1) and
two (2) on the list above. I elected to separate each item based
on the uniqueness of their semantics.
Some additional information:
There seems to be a growing trend in the html5 communities to
adopt the use of the http 1.1 RANGE header to provide query
pagination. For example, the dojo javascript library.
An open, patent free approach to query pagination for URI query
expression does not seems to exist.
If OOCI/JSON elects to adopt that approach, it will need clear
definition describing the behaviors surrounding insertions,
deletes and modification of OCCI information records. I have
included below an example document showing a the level detail we
should consider for the OCCI specification.
Since REST is a context free approach to API and there a no
ordering and sequence constraints defined for the delivery,
modification and insertion of OCCI information, there may be
opportunities where duplicate and missing OCCI information can
occur. We should also consider a well defined approach to
reconcile inconsistencies in paged operations.
I'll be following up with more information so we can develop
recommendations to fully resolve paging issues for URI, HTTP and
OCCI operations in payload data.
cheers,
gary mazzaferro
Cloud Data Management Interface
(CDMI™) Version 1.0.2
5.13.3 Range Support
The server shall support HTTP Range headers and partial content responses (see Section 14.16 of RFC
2616).
Document Object Model (DOM) Level 2 Traversal and Range Specificationhttp://www.w3.org/TR/xquery/
Version 1.0
W3C Recommendation 13 November, 2000
1. Document Object Model Traversal
1.1. Overview
1.1.1. NodeIterators
1.1.2. NodeFilters
1.1.3. TreeWalker
1.2. Formal Interface Definition NodeIterator, NodeFilter, TreeWalker, DocumentTraversal
2. Document Object Model Range
2.1. Introduction
2.2. Definitions and Notation
2.2.1. Position
2.2.2. Selection and Partial Selection
2.2.3. Notation
2.3. Creating a Range
2.4. Changing a Range's Position
2.5. Comparing Range Boundary-Points
2.6. Deleting Content with a Range
2.7. Extracting Content
2.8. Cloning Content
2.9. Inserting Content
2.10. Surrounding Content
2.11. Miscellaneous Members
2.12. Range modification under document mutation
2.12.1. Insertions
2.12.2. Deletions
2.13. Formal Description of the Range Interface Range, DocumentRange, RangeException, RangeExceptionCode
XQuery 1.0: An XML Query Language (Second Edition)
W3C Recommendation 14 December 2010 (Link errors corrected 3 January 2011)
2.1 Path Expressionshttp://www.w3.org/TR/sparql11-query/#sparqlOffsetLimit
XQuery path expressions use the abbreviated syntax of XPath, extended with a new "dereference" operator and a new type of predicate called a "range predicate" (described below). XPath syntax is used in several XML-related applications including XSLT [XSLT] and XPointer [XPointer].
In XQuery, the result of a path expression is an ordered list of nodes (of course, each node includes its descendant nodes, so the result can be thought of as an ordered forest.) The top-level nodes in the path expression result are ordered according to their position in the original hierarchy, in top-down, left-to-right order. The result of a path expression may contain duplicate values (i.e., multiple nodes with the same type and content), but it will not contain duplicate nodes (i.e., multiple nodes with the same node identity).
A path expression consists of a series of steps. Each step represents movement through a document in a particular direction, and each step can apply one or more predicates to eliminate nodes that fail to satisfy a given condition. The result of each step is a list of nodes that serves as a starting point for the next step.
A path expression can begin with an expression that identifies a specific node, such as the function document(string), which returns the root node of a named document. A query can also contain a path expression beginning with "/" or "//" which represents an implicit root node, determined by the environment in which the query is executed.
A complete discussion of XPath abbreviated syntax can be found in [XPath]. Briefly, the following symbols are used:
. Denotes the current node.
.. Denotes the parent of the current node.
/ Denotes the root node, or a separator between steps in a path.
// Denotes descendants of the current node.
@ Denotes attributes of the current node.
* Denotes "any" (node with unrestricted name).
[ ] Brackets enclose a Boolean expression that serves as a predicate for a given step.
[n] When a predicate consists of an integer, it serves to select the element with the given ordinal number from a list of elements.
......
It is sometimes desirable to isolate a list of elements whose ordinal numbers span a certain range. For this purpose, XQuery provides a RANGE predicate that is adapted from XQL [XQL], as illustrated in the following example. The first element in a list has ordinal number 1. The ordinal numbers of elements in a list are not affected by the presence of other types of nodes such as comments or processing instructions.
(Q2) Find all the figures in chapters 2 through 5 of the document named "zoo.xml."
document("zoo.xml")/chapter[RANGE 2 TO 5]//figure
In addition to the operators of the XPath abbreviated syntax, XQuery introduces an operator called the dereference operator ("->"). When a dereference operator follows an IDREF-type attribute, it returns the element(s) that are referenced by the attribute. A dereference operator is followed by a "name test" that specifies the target element. Following the usual XPath convention, a name test of "*" allows the target element to be of any type.
2.1.2 Dynamic Context
[Definition: The dynamic context of an expression is defined as information that is available at the time the expression is evaluated.] If evaluation of an expression relies on some part of the dynamic context that has not been assigned a value, a dynamic error is raised [err:XPDY0002].
The individual components of the dynamic context are summarized below. Further rules governing the semantics of these components can be found in C.2 Dynamic Context Components.
The dynamic context consists of all the components of the static context, and the additional components listed below.
[Definition: The first three components of the dynamic context (context item, context position, and context size) are called the focus of the expression. ] The focus enables the processor to keep track of which items are being processed by the expression.
Certain language constructs, notably the path expression E1/E2 and the predicate E1[E2], create a new focus for the evaluation of a sub-expression. In these constructs, E2 is evaluated once for each item in the sequence that results from evaluating E1. Each time E2 is evaluated, it is evaluated with a different focus. The focus for evaluating E2 is referred to below as the inner focus, while the focus for evaluating E1 is referred to as the outer focus. The inner focus exists only while E2 is being evaluated. When this evaluation is complete, evaluation of the containing expression continues with its original focus unchanged.
[Definition: The context item is the item currently being processed. An item is either an atomic value or a node.][Definition: When the context item is a node, it can also be referred to as the context node.] The context item is returned by an expression consisting of a single dot (.). When an expression E1/E2 or E1[E2] is evaluated, each item in the sequence obtained by evaluating E1 becomes the context item in the inner focus for an evaluation of E2.
[Definition: The context position is the position of the context item within the sequence of items currently being processed.] It changes whenever the context item changes. When the focus is defined, the value of the context position is an integer greater than zero. The context position is returned by the expression fn:position(). When an expression E1/E2 or E1[E2] is evaluated, the context position in the inner focus for an evaluation of E2 is the position of the context item in the sequence obtained by evaluating E1. The position of the first item in a sequence is always 1 (one). The context position is always less than or equal to the context size.
[Definition: The context size is the number of items in the sequence of items currently being processed.] Its value is always an integer greater than zero. The context size is returned by the expression fn:last(). When an expression E1/E2 or E1[E2] is evaluated, the context size in the inner focus for an evaluation of E2 is the number of items in the sequence obtained by evaluating E1.
Example: XPath="newsSection/news[position() < 6]
SPARQL 1.1 Query Language
W3C Proposed Recommendation 08 November 2012
18.2.5.5 OFFSET and LIMIThttp://www.ietf.org/rfc/rfc2696.txt
If the query contains "OFFSET start" or "LIMIT length"
M := Slice(M, start, length)
start defaults to 0
length defaults to (size(M)-start).
Network Working Group Request for Comments: 2696 LDAP Control Extension for Simple Paged Results Manipulation
2. The Controlhttp://tools.ietf.org/html/rfc2616
This control is included in the searchRequest and searchResultDone
messages as part of the controls field of the LDAPMessage, as defined
in Section 4.1.12 of [LDAPv3]. The structure of this control is as
follows:
pagedResultsControl ::= SEQUENCE {
controlType 1.2.840.113556.1.4.319,
criticality BOOLEAN DEFAULT FALSE,
controlValue searchControlValue
}
The searchControlValue is an OCTET STRING wrapping the BER-encoded
version of the following SEQUENCE:
realSearchControlValue ::= SEQUENCE {
size INTEGER (0..maxInt),
-- requested page size from client
-- result set size estimate from server
cookie OCTET STRING
}
3. Client-Server Interaction
An LDAP client application that needs to control the rate at which
results are returned MAY specify on the searchRequest a
pagedResultsControl with size set to the desired page size and cookie
set to the zero-length string. The page size specified MAY be greater
than zero and less than the sizeLimit value specified in the
searchRequest.
If the page size is greater than or equal to the sizeLimit value, the
server should ignore the control as the request can be satisfied in a
single page. If the server does not support this control, the server
MUST return an error of unsupportedCriticalExtension if the client
requested it as critical, otherwise the server SHOULD ignore the
control. The remainder of this section assumes the server does not
ignore the client's pagedResultsControl.
Each time the server returns a set of results to the client when
processing a search request containing the pagedResultsControl, the
server includes the pagedResultsControl control in the
searchResultDone message. In the control returned to the client, the
size MAY be set to the server's estimate of the total number of
entries in the entire result set. Servers that cannot provide such an
estimate MAY set this size to zero (0). The cookie MUST be set to an
empty value if there are no more entries to return (i.e., the page of
search results returned was the last), or, if there are more entries
to return, to an octet string of the server's choosing,used to resume
the search.
The client MUST consider the cookie to be an opaque structure and
make no assumptions about its internal organization or value. When
the client wants to retrieve more entries for the result set, it MUST
send to the server a searchRequest with all values identical to the
initial request with the exception of the messageID, the cookie, and
optionally a modified pageSize. The cookie MUST be the octet string
on the last searchResultDone response returned by the server.
Returning cookies from previous searchResultDone responses besides
the last one is undefined, as the server implementation may restrict
cookies from being reused.
13.5.4 Combining Byte Ranges
A response might transfer only a subrange of the bytes of an entity-
body, either because the request included one or more Range
specifications, or because a connection was broken prematurely. After
several such transfers, a cache might have received several ranges of
the same entity-body.
If a cache has a stored non-empty set of subranges for an entity, and
an incoming response transfers another subrange, the cache MAY
combine the new subrange with the existing set if both the following
conditions are met:
- Both the incoming response and the cache entry have a cache
validator.
- The two cache validators match using the strong comparison
function (see section 13.3.3).
If either requirement is not met, the cache MUST use only the most
recent partial response (based on the Date values transmitted with
every response, and using the incoming response if these values are
equal or missing), and MUST discard the other partial information.
14.16 Content-Range
The Content-Range entity-header is sent with a partial entity-body to
specify where in the full entity-body the partial body should be
applied. Range units are defined in section 3.12.
Content-Range = "Content-Range" ":" content-range-spec
content-range-spec = byte-content-range-spec
byte-content-range-spec = bytes-unit SP
byte-range-resp-spec "/"
( instance-length | "*" )
byte-range-resp-spec = (first-byte-pos "-" last-byte-pos)
| "*"
instance-length = 1*DIGIT
The header SHOULD indicate the total length of the full entity-body,
unless this length is unknown or difficult to determine. The asterisk
"*" character means that the instance-length is unknown at the time
when the response was generated.
Unlike byte-ranges-specifier values (see section 14.35.1), a byte-
range-resp-spec MUST only specify one range, and MUST contain
absolute byte positions for both the first and last byte of the
range.
A byte-content-range-spec with a byte-range-resp-spec whose last-
byte-pos value is less than its first-byte-pos value, or whose
instance-length value is less than or equal to its last-byte-pos
value, is invalid. The recipient of an invalid byte-content-range-
spec MUST ignore it and any content transferred along with it.
A server sending a response with status code 416 (Requested range not
satisfiable) SHOULD include a Content-Range field with a byte-range-
resp-spec of "*". The instance-length specifies the current length of
the selected resource. A response with status code 206 (Partial
Content) MUST NOT include a Content-Range field with a byte-range-
resp-spec of "*".
Examples of byte-content-range-spec values, assuming that the entity
contains a total of 1234 bytes:
. The first 500 bytes:
bytes 0-499/1234
. The second 500 bytes:
bytes 500-999/1234
. All except for the first 500 bytes:
bytes 500-1233/1234
. The last 500 bytes:
bytes 734-1233/1234
When an HTTP message includes the content of a single range (for
example, a response to a request for a single range, or to a request
for a set of ranges that overlap without any holes), this content is
transmitted with a Content-Range header, and a Content-Length header
showing the number of bytes actually transferred. For example,
HTTP/1.1 206 Partial content
Date: Wed, 15 Nov 1995 06:25:24 GMT
Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
Content-Range: bytes 21010-47021/47022
Content-Length: 26012
Content-Type: image/gif
When an HTTP message includes the content of multiple ranges (for
example, a response to a request for multiple non-overlapping
ranges), these are transmitted as a multipart message. The multipart
media type used for this purpose is "multipart/byteranges" as defined
in appendix 19.2. See appendix 19.6.3 for a compatibility issue.
A response to a request for a single range MUST NOT be sent using the
multipart/byteranges media type. A response to a request for
multiple ranges, whose result is a single range, MAY be sent as a
multipart/byteranges media type with one part. A client that cannot
decode a multipart/byteranges message MUST NOT ask for multiple
byte-ranges in a single request.
When a client requests multiple byte-ranges in one request, the
server SHOULD return them in the order that they appeared in the
request.
If the server ignores a byte-range-spec because it is syntactically
invalid, the server SHOULD treat the request as if the invalid Range
header field did not exist. (Normally, this means return a 200
response containing the full entity).
If the server receives a request (other than one including an If-
Range request-header field) with an unsatisfiable Range request-
header field (that is, all of whose byte-range-spec values have a
first-byte-pos value greater than the current length of the selected
resource), it SHOULD return a response code of 416 (Requested range
not satisfiable) (section 10.4.17).
Note: clients cannot depend on servers to send a 416 (Requested
range not satisfiable) response instead of a 200 (OK) response for
an unsatisfiable Range request-header, since not all servers
implement this request-header.
14.35 Range
14.35.1 Byte Ranges
Since all HTTP entities are represented in HTTP messages as sequences
of bytes, the concept of a byte range is meaningful for any HTTP
entity. (However, not all clients and servers need to support byte-
range operations.)
Byte range specifications in HTTP apply to the sequence of bytes in
the entity-body (not necessarily the same as the message-body).
A byte range operation MAY specify a single range of bytes, or a set
of ranges within a single entity.
ranges-specifier = byte-ranges-specifier
byte-ranges-specifier = bytes-unit "=" byte-range-set
byte-range-set = 1#( byte-range-spec | suffix-byte-range-spec )
byte-range-spec = first-byte-pos "-" [last-byte-pos]
first-byte-pos = 1*DIGIT
last-byte-pos = 1*DIGIT
The first-byte-pos value in a byte-range-spec gives the byte-offset
of the first byte in a range. The last-byte-pos value gives the
byte-offset of the last byte in the range; that is, the byte
positions specified are inclusive. Byte offsets start at zero.
If the last-byte-pos value is present, it MUST be greater than or
equal to the first-byte-pos in that byte-range-spec, or the byte-
range-spec is syntactically invalid. The recipient of a byte-range-
set that includes one or more syntactically invalid byte-range-spec
values MUST ignore the header field that includes that byte-range-
set.
If the last-byte-pos value is absent, or if the value is greater than
or equal to the current length of the entity-body, last-byte-pos is
taken to be equal to one less than the current length of the entity-
body in bytes.
By its choice of last-byte-pos, a client can limit the number of
bytes retrieved without knowing the size of the entity.
suffix-byte-range-spec = "-" suffix-length
suffix-length = 1*DIGIT
A suffix-byte-range-spec is used to specify the suffix of the
entity-body, of a length given by the suffix-length value. (That is,
this form specifies the last N bytes of an entity-body.) If the
entity is shorter than the specified suffix-length, the entire
entity-body is used.
If a syntactically valid byte-range-set includes at least one byte-
range-spec whose first-byte-pos is less than the current length of
the entity-body, or at least one suffix-byte-range-spec with a non-
zero suffix-length, then the byte-range-set is satisfiable.
Otherwise, the byte-range-set is unsatisfiable. If the byte-range-set
is unsatisfiable, the server SHOULD return a response with a status
of 416 (Requested range not satisfiable). Otherwise, the server
SHOULD return a response with a status of 206 (Partial Content)
containing the satisfiable ranges of the entity-body.
Examples of byte-ranges-specifier values (assuming an entity-body of
length 10000):
- The first 500 bytes (byte offsets 0-499, inclusive): bytes=0-
499
- The second 500 bytes (byte offsets 500-999, inclusive):
bytes=500-999
- The final 500 bytes (byte offsets 9500-9999, inclusive):
bytes=-500
- Or bytes=9500-
- The first and last bytes only (bytes 0 and 9999): bytes=0-0,-1
- Several legal but not canonical specifications of the second 500
bytes (byte offsets 500-999, inclusive):
bytes=500-600,601-999
bytes=500-700,601-999
14.35.2 Range Retrieval Requests
HTTP retrieval requests using conditional or unconditional GET
methods MAY request one or more sub-ranges of the entity, instead of
the entire entity, using the Range request header, which applies to
the entity returned as the result of the request:
Range = "Range" ":" ranges-specifier
A server MAY ignore the Range header. However, HTTP/1.1 origin
servers and intermediate caches ought to support byte ranges when
possible, since Range supports efficient recovery from partially
failed transfers, and supports efficient partial retrieval of large
entities.
If the server supports the Range header and the specified range or
ranges are appropriate for the entity:
- The presence of a Range header in an unconditional GET modifies
what is returned if the GET is otherwise successful. In other
words, the response carries a status code of 206 (Partial
Content) instead of 200 (OK).
- The presence of a Range header in a conditional GET (a request
using one or both of If-Modified-Since and If-None-Match, or
one or both of If-Unmodified-Since and If-Match) modifies what
is returned if the GET is otherwise successful and the
condition is true. It does not affect the 304 (Not Modified)
response returned if the conditional is false.
In some cases, it might be more appropriate to use the If-Range
header (see section 14.27) in addition to the Range header.
If a proxy that supports ranges receives a Range request, forwards
the request to an inbound server, and receives an entire entity in
reply, it SHOULD only return the requested range to its client. It
SHOULD store the entire received response in its cache if that is
consistent with its cache allocation policies.
14.27 If-Range
If a client has a partial copy of an entity in its cache, and wishes
to have an up-to-date copy of the entire entity in its cache, it
could use the Range request-header with a conditional GET (using
either or both of If-Unmodified-Since and If-Match.) However, if the
condition fails because the entity has been modified, the client
would then have to make a second request to obtain the entire current
entity-body.
The If-Range header allows a client to "short-circuit" the second
request. Informally, its meaning is `if the entity is unchanged, send
me the part(s) that I am missing; otherwise, send me the entire new
entity'.
If-Range = "If-Range" ":" ( entity-tag | HTTP-date )
If the client has no entity tag for an entity, but does have a Last-
Modified date, it MAY use that date in an If-Range header. (The
server can distinguish between a valid HTTP-date and any form of
entity-tag by examining no more than two characters.) The If-Range
header SHOULD only be used together with a Range header, and MUST be
ignored if the request does not include a Range header, or if the
server does not support the sub-range operation.
If the entity tag given in the If-Range header matches the current
entity tag for the entity, then the server SHOULD provide the
specified sub-range of the entity using a 206 (Partial content)
response. If the entity tag does not match, then the server SHOULD
return the entire entity using a 200 (OK) response.
3. Paged Feeds A paged feed is a set of linked feed documents that together contain the entries of a logical feed, without any guarantees about the stability of each document's contents. Paged feeds are lossy; that is, it is not possible to guarantee that clients will be able to reconstruct the contents of the logical feed at a particular time. Entries may be added or changed as the pages of the feed are accessed, without the client becoming aware of them. Therefore, clients SHOULD NOT present paged feeds as coherent or complete, or make assumptions to that effect. Paged feeds can be useful when the number of entries is very large, infinite, or indeterminate. Clients can "page" through the feed, only accessing a subset of the feed's entries as necessary. For example, a search engine might make query results available as a paged feed, so that queries with very large result sets do not overwhelm the server, the network, or the client. The feed documents in a paged feed are tied together with the following link relations: o "first" - A URI that refers to the furthest preceding document in a series of documents. o "last" - A URI that refers to the furthest following document in a series of documents. o "previous" - A URI that refers to the immediately preceding document in a series of documents. o "next" - A URI that refers to the immediately following document in a series of documents. Paged feed documents MUST have at least one of these link relations present, and should contain as many as practical and applicable. Example: Atom-formatted Paged Feed <?xml version="1.0" encoding="utf-8"?> <feed xmlns="http://www.w3.org/2005/Atom"> <title>Example Feed</title> <link href="http://example.org/"/> <link rel="self" href="http://example.org/index.atom"/> <link rel="next" href="http://example.org/index.atom?page=2"/> <updated>2003-12-13T18:30:02Z</updated> <author> <name>John Doe</name> </author> <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id> <entry> <title>Atom-Powered Robots Run Amok</title> <link href="http://example.org/2003/12/13/atom03"/> <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id> <updated>2003-12-13T18:30:02Z</updated> <summary>Some text.</summary> </entry> </feed> This specification does not address duplicate entries in paged feeds.