Hi

This is a preliminary assessment of paging for OCCI/JSON on HTTP/REST.  This is not a completed list of findings.

There was some debate and interest in categorizing the different methods used to implement a query page mechanism.  The I've examined several specifications and standards sourced from SNIA, IETF, OMG, OGF, W3C ANSI and others. The odata specification is not considered due to protected proprietary technologies.  Pertinent specifications an excepts are noted below.

The preliminary findings discovered 9 methods for controlling the delivery of page results from query operations.
  1. HTTP Range/Content-Range header (HTTP 1.1)
  2. OFFSET/LIMIT parameters (SPARQL)
  3. RANGE predicate (Xquery/Xpath)
  4. pagedResultsControl.size/Cookie (LDAP)
  5. Links to pages (ATOM)
  6. LIMIT<offset>, <rowcount> (SQL)
  7. TOP <rowcount> (SQL)
  8. WHERE ROWNUM eval Value (SQL)
  9. OFFSET <value> (SQL)
We can distill this list down to four (4) basic approaches
  1. Query parameters specifying result row Range (start_offset, end_offset)
  2. Query parameters specifying result row offset and count
  3. Query parameters specifying count, assuming first query and a opaque handle for context
  4. Links to pages

NOTE: We could force further consolidation of items one (1) and two (2) on the list above.  I elected to separate each item based on the uniqueness of their semantics.

Some additional information:

There seems to be a growing trend in the html5 communities to adopt the use of the http 1.1 RANGE header to provide query pagination. For example, the dojo javascript library.

An open, patent free approach to query pagination for URI query expression does not seems to exist. 

If OOCI/JSON elects to adopt that approach,  it will need clear definition describing the behaviors surrounding insertions, deletes and modification of OCCI information records.  I have included below an example document showing a the level detail we should consider for the OCCI specification.  

Since REST is a context free approach to API and there a no ordering and sequence constraints defined for the delivery, modification and insertion of OCCI information, there may be opportunities where duplicate and missing OCCI information can occur. We should also consider a well defined approach to reconcile inconsistencies in paged operations.

I'll be following up with more information so we can develop recommendations to fully resolve paging issues for URI, HTTP and OCCI operations in payload data.   


cheers,
gary mazzaferro



http://snia.org/sites/default/files/CDMI%20v1.0.2.pdf
Cloud Data Management Interface
(CDMI™) Version 1.0.2
5.13.3 Range Support
The server shall support HTTP Range headers and partial content responses (see Section 14.16 of RFC
2616).


http://www.w3.org/TR/DOM-Level-2-Traversal-Range/traversal.html

Document Object Model (DOM) Level 2 Traversal and Range Specification
Version 1.0
W3C Recommendation 13 November, 2000

1. Document Object Model Traversal
1.1. Overview
1.1.1. NodeIterators
1.1.2. NodeFilters
1.1.3. TreeWalker
1.2. Formal Interface Definition NodeIterator, NodeFilter, TreeWalker, DocumentTraversal

2. Document Object Model Range
2.1. Introduction
2.2. Definitions and Notation
2.2.1. Position
2.2.2. Selection and Partial Selection
2.2.3. Notation
2.3. Creating a Range
2.4. Changing a Range's Position
2.5. Comparing Range Boundary-Points
2.6. Deleting Content with a Range
2.7. Extracting Content
2.8. Cloning Content
2.9. Inserting Content
2.10. Surrounding Content
2.11. Miscellaneous Members
2.12. Range modification under document mutation
2.12.1. Insertions
2.12.2. Deletions
2.13. Formal Description of the Range Interface Range, DocumentRange, RangeException, RangeExceptionCode
http://www.w3.org/TR/xquery/
XQuery 1.0: An XML Query Language (Second Edition)
W3C Recommendation 14 December 2010 (Link errors corrected 3 January 2011)
2.1 Path Expressions

XQuery path expressions use the abbreviated syntax of XPath, extended with a new "dereference" operator and a new type of predicate called a "range predicate" (described below). XPath syntax is used in several XML-related applications including XSLT [XSLT] and XPointer [XPointer].

In XQuery, the result of a path expression is an ordered list of nodes (of course, each node includes its descendant nodes, so the result can be thought of as an ordered forest.) The top-level nodes in the path expression result are ordered according to their position in the original hierarchy, in top-down, left-to-right order. The result of a path expression may contain duplicate values (i.e., multiple nodes with the same type and content), but it will not contain duplicate nodes (i.e., multiple nodes with the same node identity).

A path expression consists of a series of steps. Each step represents movement through a document in a particular direction, and each step can apply one or more predicates to eliminate nodes that fail to satisfy a given condition. The result of each step is a list of nodes that serves as a starting point for the next step.

A path expression can begin with an expression that identifies a specific node, such as the function document(string), which returns the root node of a named document. A query can also contain a path expression beginning with "/" or "//" which represents an implicit root node, determined by the environment in which the query is executed.

A complete discussion of XPath abbreviated syntax can be found in [XPath]. Briefly, the following symbols are used:

.    Denotes the current node.
..    Denotes the parent of the current node.
/    Denotes the root node, or a separator between steps in a path.
//    Denotes descendants of the current node.
@    Denotes attributes of the current node.
*    Denotes "any" (node with unrestricted name).
[ ]    Brackets enclose a Boolean expression that serves as a predicate for a given step.
[n]    When a predicate consists of an integer, it serves to select the element with the given ordinal number from a list of elements.
......
It is sometimes desirable to isolate a list of elements whose ordinal numbers span a certain range. For this purpose, XQuery provides a RANGE predicate that is adapted from XQL [XQL], as illustrated in the following example. The first element in a list has ordinal number 1. The ordinal numbers of elements in a list are not affected by the presence of other types of nodes such as comments or processing instructions.

(Q2) Find all the figures in chapters 2 through 5 of the document named "zoo.xml."

document("zoo.xml")/chapter[RANGE 2 TO 5]//figure
In addition to the operators of the XPath abbreviated syntax, XQuery introduces an operator called the dereference operator ("->"). When a dereference operator follows an IDREF-type attribute, it returns the element(s) that are referenced by the attribute. A dereference operator is followed by a "name test" that specifies the target element. Following the usual XPath convention, a name test of "*" allows the target element to be of any type.

2.1.2 Dynamic Context

[Definition: The dynamic context of an expression is defined as information that is available at the time the expression is evaluated.] If evaluation of an expression relies on some part of the dynamic context that has not been assigned a value, a dynamic error is raised [err:XPDY0002].

The individual components of the dynamic context are summarized below. Further rules governing the semantics of these components can be found in C.2 Dynamic Context Components.

The dynamic context consists of all the components of the static context, and the additional components listed below.

[Definition: The first three components of the dynamic context (context item, context position, and context size) are called the focus of the expression. ] The focus enables the processor to keep track of which items are being processed by the expression.

Certain language constructs, notably the path expression E1/E2 and the predicate E1[E2], create a new focus for the evaluation of a sub-expression. In these constructs, E2 is evaluated once for each item in the sequence that results from evaluating E1. Each time E2 is evaluated, it is evaluated with a different focus. The focus for evaluating E2 is referred to below as the inner focus, while the focus for evaluating E1 is referred to as the outer focus. The inner focus exists only while E2 is being evaluated. When this evaluation is complete, evaluation of the containing expression continues with its original focus unchanged.

[Definition: The context item is the item currently being processed. An item is either an atomic value or a node.][Definition: When the context item is a node, it can also be referred to as the context node.] The context item is returned by an expression consisting of a single dot (.). When an expression E1/E2 or E1[E2] is evaluated, each item in the sequence obtained by evaluating E1 becomes the context item in the inner focus for an evaluation of E2.

[Definition: The context position is the position of the context item within the sequence of items currently being processed.] It changes whenever the context item changes. When the focus is defined, the value of the context position is an integer greater than zero. The context position is returned by the expression fn:position(). When an expression E1/E2 or E1[E2] is evaluated, the context position in the inner focus for an evaluation of E2 is the position of the context item in the sequence obtained by evaluating E1. The position of the first item in a sequence is always 1 (one). The context position is always less than or equal to the context size.

[Definition: The context size is the number of items in the sequence of items currently being processed.] Its value is always an integer greater than zero. The context size is returned by the expression fn:last(). When an expression E1/E2 or E1[E2] is evaluated, the context size in the inner focus for an evaluation of E2 is the number of items in the sequence obtained by evaluating E1.

Example:  XPath="newsSection/news[position() < 6]

http://www.w3.org/TR/sparql11-query/#sparqlOffsetLimit
SPARQL 1.1 Query Language
W3C Proposed Recommendation 08 November 2012
18.2.5.5 OFFSET and LIMIT
If the query contains "OFFSET start" or "LIMIT length"
M := Slice(M, start, length)
start defaults to 0
length defaults to (size(M)-start).
http://www.ietf.org/rfc/rfc2696.txt

Network Working Group
Request for Comments: 2696
LDAP Control Extension for Simple Paged Results Manipulation

2. The Control

This control is included in the searchRequest and searchResultDone
messages as part of the controls field of the LDAPMessage, as defined
in Section 4.1.12 of [LDAPv3]. The structure of this control is as
follows:
pagedResultsControl ::= SEQUENCE {
controlType 1.2.840.113556.1.4.319,
criticality BOOLEAN DEFAULT FALSE,
controlValue searchControlValue
}

The searchControlValue is an OCTET STRING wrapping the BER-encoded
version of the following SEQUENCE:

realSearchControlValue ::= SEQUENCE {
size INTEGER (0..maxInt),
-- requested page size from client
-- result set size estimate from server
cookie OCTET STRING
}

3. Client-Server Interaction

An LDAP client application that needs to control the rate at which
results are returned MAY specify on the searchRequest a
pagedResultsControl with size set to the desired page size and cookie
set to the zero-length string. The page size specified MAY be greater
than zero and less than the sizeLimit value specified in the
searchRequest.

If the page size is greater than or equal to the sizeLimit value, the
server should ignore the control as the request can be satisfied in a
single page. If the server does not support this control, the server
MUST return an error of unsupportedCriticalExtension if the client
requested it as critical, otherwise the server SHOULD ignore the
control. The remainder of this section assumes the server does not
ignore the client's pagedResultsControl.

Each time the server returns a set of results to the client when
processing a search request containing the pagedResultsControl, the
server includes the pagedResultsControl control in the
searchResultDone message. In the control returned to the client, the
size MAY be set to the server's estimate of the total number of
entries in the entire result set. Servers that cannot provide such an
estimate MAY set this size to zero (0). The cookie MUST be set to an
empty value if there are no more entries to return (i.e., the page of
search results returned was the last), or, if there are more entries
to return, to an octet string of the server's choosing,used to resume
the search.

The client MUST consider the cookie to be an opaque structure and
make no assumptions about its internal organization or value. When
the client wants to retrieve more entries for the result set, it MUST
send to the server a searchRequest with all values identical to the
initial request with the exception of the messageID, the cookie, and
optionally a modified pageSize. The cookie MUST be the octet string
on the last searchResultDone response returned by the server.
Returning cookies from previous searchResultDone responses besides
the last one is undefined, as the server implementation may restrict
cookies from being reused.

http://tools.ietf.org/html/rfc2616
RFC 2616                        HTTP/1.1                       June 1999

13.5.4 Combining Byte Ranges

   A response might transfer only a subrange of the bytes of an entity-
   body, either because the request included one or more Range
   specifications, or because a connection was broken prematurely. After
   several such transfers, a cache might have received several ranges of
   the same entity-body.

   If a cache has a stored non-empty set of subranges for an entity, and
   an incoming response transfers another subrange, the cache MAY
   combine the new subrange with the existing set if both the following
   conditions are met:

      - Both the incoming response and the cache entry have a cache
        validator.

      - The two cache validators match using the strong comparison
        function (see section 13.3.3).

   If either requirement is not met, the cache MUST use only the most
   recent partial response (based on the Date values transmitted with
   every response, and using the incoming response if these values are
   equal or missing), and MUST discard the other partial information.

14.16 Content-Range

   The Content-Range entity-header is sent with a partial entity-body to
   specify where in the full entity-body the partial body should be
   applied. Range units are defined in section 3.12.

       Content-Range = "Content-Range" ":" content-range-spec

       content-range-spec      = byte-content-range-spec
       byte-content-range-spec = bytes-unit SP
                                 byte-range-resp-spec "/"
                                 ( instance-length | "*" )

       byte-range-resp-spec = (first-byte-pos "-" last-byte-pos)
                                      | "*"
       instance-length           = 1*DIGIT

   The header SHOULD indicate the total length of the full entity-body,
   unless this length is unknown or difficult to determine. The asterisk
   "*" character means that the instance-length is unknown at the time
   when the response was generated.

   Unlike byte-ranges-specifier values (see section 14.35.1), a byte-
   range-resp-spec MUST only specify one range, and MUST contain
   absolute byte positions for both the first and last byte of the
   range.

   A byte-content-range-spec with a byte-range-resp-spec whose last-
   byte-pos value is less than its first-byte-pos value, or whose
   instance-length value is less than or equal to its last-byte-pos
   value, is invalid. The recipient of an invalid byte-content-range-
   spec MUST ignore it and any content transferred along with it.

   A server sending a response with status code 416 (Requested range not
   satisfiable) SHOULD include a Content-Range field with a byte-range-
   resp-spec of "*". The instance-length specifies the current length of
   the selected resource. A response with status code 206 (Partial
   Content) MUST NOT include a Content-Range field with a byte-range-
   resp-spec of "*".

   Examples of byte-content-range-spec values, assuming that the entity
   contains a total of 1234 bytes:

      . The first 500 bytes:
       bytes 0-499/1234

      . The second 500 bytes:
       bytes 500-999/1234

      . All except for the first 500 bytes:
       bytes 500-1233/1234

      . The last 500 bytes:
       bytes 734-1233/1234

   When an HTTP message includes the content of a single range (for
   example, a response to a request for a single range, or to a request
   for a set of ranges that overlap without any holes), this content is
   transmitted with a Content-Range header, and a Content-Length header
   showing the number of bytes actually transferred. For example,

       HTTP/1.1 206 Partial content
       Date: Wed, 15 Nov 1995 06:25:24 GMT
       Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
       Content-Range: bytes 21010-47021/47022
       Content-Length: 26012
       Content-Type: image/gif

   When an HTTP message includes the content of multiple ranges (for
   example, a response to a request for multiple non-overlapping
   ranges), these are transmitted as a multipart message. The multipart
   media type used for this purpose is "multipart/byteranges" as defined
   in appendix 19.2. See appendix 19.6.3 for a compatibility issue.

   A response to a request for a single range MUST NOT be sent using the
   multipart/byteranges media type.  A response to a request for
   multiple ranges, whose result is a single range, MAY be sent as a
   multipart/byteranges media type with one part. A client that cannot
   decode a multipart/byteranges message MUST NOT ask for multiple
   byte-ranges in a single request.

   When a client requests multiple byte-ranges in one request, the
   server SHOULD return them in the order that they appeared in the
   request.
  
   If the server ignores a byte-range-spec because it is syntactically
   invalid, the server SHOULD treat the request as if the invalid Range
   header field did not exist. (Normally, this means return a 200
   response containing the full entity).

   If the server receives a request (other than one including an If-
   Range request-header field) with an unsatisfiable Range request-
   header field (that is, all of whose byte-range-spec values have a
   first-byte-pos value greater than the current length of the selected
   resource), it SHOULD return a response code of 416 (Requested range
   not satisfiable) (section 10.4.17).

      Note: clients cannot depend on servers to send a 416 (Requested
      range not satisfiable) response instead of a 200 (OK) response for
      an unsatisfiable Range request-header, since not all servers
      implement this request-header.

14.35 Range

14.35.1 Byte Ranges

   Since all HTTP entities are represented in HTTP messages as sequences
   of bytes, the concept of a byte range is meaningful for any HTTP
   entity. (However, not all clients and servers need to support byte-
   range operations.)

   Byte range specifications in HTTP apply to the sequence of bytes in
   the entity-body (not necessarily the same as the message-body).

   A byte range operation MAY specify a single range of bytes, or a set
   of ranges within a single entity.

       ranges-specifier = byte-ranges-specifier
       byte-ranges-specifier = bytes-unit "=" byte-range-set
       byte-range-set  = 1#( byte-range-spec | suffix-byte-range-spec )
       byte-range-spec = first-byte-pos "-" [last-byte-pos]
       first-byte-pos  = 1*DIGIT
       last-byte-pos   = 1*DIGIT

   The first-byte-pos value in a byte-range-spec gives the byte-offset
   of the first byte in a range. The last-byte-pos value gives the
   byte-offset of the last byte in the range; that is, the byte
   positions specified are inclusive. Byte offsets start at zero.

   If the last-byte-pos value is present, it MUST be greater than or
   equal to the first-byte-pos in that byte-range-spec, or the byte-
   range-spec is syntactically invalid. The recipient of a byte-range-
   set that includes one or more syntactically invalid byte-range-spec
   values MUST ignore the header field that includes that byte-range-
   set.

   If the last-byte-pos value is absent, or if the value is greater than
   or equal to the current length of the entity-body, last-byte-pos is
   taken to be equal to one less than the current length of the entity-
   body in bytes.

   By its choice of last-byte-pos, a client can limit the number of
   bytes retrieved without knowing the size of the entity.

       suffix-byte-range-spec = "-" suffix-length
       suffix-length = 1*DIGIT

   A suffix-byte-range-spec is used to specify the suffix of the
   entity-body, of a length given by the suffix-length value. (That is,
   this form specifies the last N bytes of an entity-body.) If the
   entity is shorter than the specified suffix-length, the entire
   entity-body is used.

   If a syntactically valid byte-range-set includes at least one byte-
   range-spec whose first-byte-pos is less than the current length of
   the entity-body, or at least one suffix-byte-range-spec with a non-
   zero suffix-length, then the byte-range-set is satisfiable.
   Otherwise, the byte-range-set is unsatisfiable. If the byte-range-set
   is unsatisfiable, the server SHOULD return a response with a status
   of 416 (Requested range not satisfiable). Otherwise, the server
   SHOULD return a response with a status of 206 (Partial Content)
   containing the satisfiable ranges of the entity-body.

   Examples of byte-ranges-specifier values (assuming an entity-body of
   length 10000):

      - The first 500 bytes (byte offsets 0-499, inclusive):  bytes=0-
        499

      - The second 500 bytes (byte offsets 500-999, inclusive):
        bytes=500-999

      - The final 500 bytes (byte offsets 9500-9999, inclusive):
        bytes=-500

      - Or bytes=9500-

      - The first and last bytes only (bytes 0 and 9999):  bytes=0-0,-1

      - Several legal but not canonical specifications of the second 500
        bytes (byte offsets 500-999, inclusive):
         bytes=500-600,601-999
         bytes=500-700,601-999

14.35.2 Range Retrieval Requests

   HTTP retrieval requests using conditional or unconditional GET
   methods MAY request one or more sub-ranges of the entity, instead of
   the entire entity, using the Range request header, which applies to
   the entity returned as the result of the request:

      Range = "Range" ":" ranges-specifier

 A server MAY ignore the Range header. However, HTTP/1.1 origin
   servers and intermediate caches ought to support byte ranges when
   possible, since Range supports efficient recovery from partially
   failed transfers, and supports efficient partial retrieval of large
   entities.

   If the server supports the Range header and the specified range or
   ranges are appropriate for the entity:

      - The presence of a Range header in an unconditional GET modifies
        what is returned if the GET is otherwise successful. In other
        words, the response carries a status code of 206 (Partial
        Content) instead of 200 (OK).

      - The presence of a Range header in a conditional GET (a request
        using one or both of If-Modified-Since and If-None-Match, or
        one or both of If-Unmodified-Since and If-Match) modifies what
        is returned if the GET is otherwise successful and the
        condition is true. It does not affect the 304 (Not Modified)
        response returned if the conditional is false.

   In some cases, it might be more appropriate to use the If-Range
   header (see section 14.27) in addition to the Range header.

   If a proxy that supports ranges receives a Range request, forwards
   the request to an inbound server, and receives an entire entity in
   reply, it SHOULD only return the requested range to its client. It
   SHOULD store the entire received response in its cache if that is
   consistent with its cache allocation policies.

14.27 If-Range

   If a client has a partial copy of an entity in its cache, and wishes
   to have an up-to-date copy of the entire entity in its cache, it
   could use the Range request-header with a conditional GET (using
   either or both of If-Unmodified-Since and If-Match.) However, if the
   condition fails because the entity has been modified, the client
   would then have to make a second request to obtain the entire current
   entity-body.

   The If-Range header allows a client to "short-circuit" the second
   request. Informally, its meaning is `if the entity is unchanged, send
   me the part(s) that I am missing; otherwise, send me the entire new
   entity'.

        If-Range = "If-Range" ":" ( entity-tag | HTTP-date )

   If the client has no entity tag for an entity, but does have a Last-
   Modified date, it MAY use that date in an If-Range header. (The
   server can distinguish between a valid HTTP-date and any form of
   entity-tag by examining no more than two characters.) The If-Range
   header SHOULD only be used together with a Range header, and MUST be
   ignored if the request does not include a Range header, or if the
   server does not support the sub-range operation.

   If the entity tag given in the If-Range header matches the current
   entity tag for the entity, then the server SHOULD provide the
   specified sub-range of the entity using a 206 (Partial content)
   response. If the entity tag does not match, then the server SHOULD
   return the entire entity using a 200 (OK) response.

http://www.ietf.org/rfc/rfc5005.txt
Feed Paging and Archiving
Request for Comments: 5005                                September 2007
3.  Paged Feeds

   A paged feed is a set of linked feed documents that together contain
   the entries of a logical feed, without any guarantees about the
   stability of each document's contents.

   Paged feeds are lossy; that is, it is not possible to guarantee that
   clients will be able to reconstruct the contents of the logical feed
   at a particular time.  Entries may be added or changed as the pages
   of the feed are accessed, without the client becoming aware of them.

   Therefore, clients SHOULD NOT present paged feeds as coherent or
   complete, or make assumptions to that effect.

   Paged feeds can be useful when the number of entries is very large,
   infinite, or indeterminate.  Clients can "page" through the feed,
   only accessing a subset of the feed's entries as necessary.
 
   For example, a search engine might make query results available as a
   paged feed, so that queries with very large result sets do not
   overwhelm the server, the network, or the client.

   The feed documents in a paged feed are tied together with the
   following link relations:

   o  "first" - A URI that refers to the furthest preceding document in
      a series of documents.

   o  "last" - A URI that refers to the furthest following document in a
      series of documents.

   o  "previous" - A URI that refers to the immediately preceding
      document in a series of documents.

   o  "next" - A URI that refers to the immediately following document
      in a series of documents.

   Paged feed documents MUST have at least one of these link relations
   present, and should contain as many as practical and applicable.

   Example: Atom-formatted Paged Feed

   <?xml version="1.0" encoding="utf-8"?>
   <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Example Feed</title>
    <link href="http://example.org/"/>
    <link rel="self" href="http://example.org/index.atom"/>
    <link rel="next" href="http://example.org/index.atom?page=2"/>
    <updated>2003-12-13T18:30:02Z</updated>
    <author>
      <name>John Doe</name>
    </author>
    <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
    <entry>
      <title>Atom-Powered Robots Run Amok</title>
      <link href="http://example.org/2003/12/13/atom03"/>
      <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
      <updated>2003-12-13T18:30:02Z</updated>
      <summary>Some text.</summary>
    </entry>
   </feed>

   This specification does not address duplicate entries in paged feeds.