Did we omit fn:codepoints-to-string and fn:string-to-codepoints by mistake? They seem quite important, even though in DFDL they could only be used for a single character (because we only allow single-node return sequences).

I have a data format I am modeling where there is a sort of ad-hoc encoding using 5 bits.

The codepoints are the values 0-31, assigned to the characters 0-7, A-H, J-N, P-Z (that is, the digits 0-7 and the letters A-Z without I and O). Without functions that create characters from codepoint integers, the best I can do is a 32-leaf if-then-else tree in an inputValueCalc.

This is ok, but it seems we should have a function to convert from a character to its codepoint as an integer, and back.
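To make the point concrete, here is a rough Python sketch (not DFDL) of how the 5-bit mapping collapses from 32 branches to four ranges once integer-to-character conversion is available. Python's built-in chr() and ord() stand in for single-character fn:codepoints-to-string and fn:string-to-codepoints. The function names are hypothetical, and the sketch assumes the values are assigned in the order listed above (0 is '0', 8 is 'A', 31 is 'Z').

```python
# Assumption: values map in listed order: 0-7 -> '0'-'7', 8-15 -> 'A'-'H',
# 16-20 -> 'J'-'N', 21-31 -> 'P'-'Z' (the letters I and O are skipped).

def value_to_char(value: int) -> str:
    """Convert a 5-bit value to its character using code point arithmetic."""
    if not 0 <= value <= 31:
        raise ValueError(f"not a 5-bit value: {value}")
    if value < 8:
        return chr(ord("0") + value)
    if value < 16:
        return chr(ord("A") + (value - 8))
    if value < 21:
        return chr(ord("J") + (value - 16))
    return chr(ord("P") + (value - 21))


def char_to_value(ch: str) -> int:
    """Convert a character of the 5-bit alphabet back to its value."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(ch)
    if ord("0") <= cp <= ord("7"):
        return cp - ord("0")
    if ord("A") <= cp <= ord("H"):
        return cp - ord("A") + 8
    if ord("J") <= cp <= ord("N"):
        return cp - ord("J") + 16
    if ord("P") <= cp <= ord("Z"):
        return cp - ord("P") + 21
    raise ValueError(f"character not in the 5-bit alphabet: {ch!r}")
```

The same four-range arithmetic could presumably be written as a short expression in an inputValueCalc if the two functions were available.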


7.2 Functions to Assemble and Disassemble Strings

Function                   Meaning
fn:codepoints-to-string    Creates an xs:string from a sequence of Unicode code points.
fn:string-to-codepoints    Returns the sequence of Unicode code points that constitute an xs:string.

7.2.1 fn:codepoints-to-string

fn:codepoints-to-string($arg as xs:integer*) as xs:string

Summary: Creates an xs:string from a sequence of [The Unicode Standard] code points. Returns the zero-length string if $arg is the empty sequence. If any of the code points in $arg is not a legal XML character, an error is raised [err:FOCH0001].

7.2.1.1 Examples
  • fn:codepoints-to-string((2309, 2358, 2378, 2325)) returns "अशॊक"
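For readers who want to try the behaviour described in 7.2.1, a rough Python analogue might look like the following. The function name codepoints_to_string is hypothetical. The legality check uses the XML 1.0 Char production, and a Python ValueError stands in for err:FOCH0001.

```python
from typing import Iterable


def is_xml_char(cp: int) -> bool:
    """True if cp is a legal character under the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def codepoints_to_string(codepoints: Iterable[int]) -> str:
    """Build a string from code points; an empty input gives the empty string."""
    chars = []
    for cp in codepoints:
        if not is_xml_char(cp):
            # Stand-in for the err:FOCH0001 error raised by the XPath function.
            raise ValueError(f"FOCH0001: code point {cp} is not a legal XML character")
        chars.append(chr(cp))
    return "".join(chars)
```

Calling codepoints_to_string([2309, 2358, 2378, 2325]) yields the same four-character string as the example above.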

7.2.2 fn:string-to-codepoints

fn:string-to-codepoints($arg as xs:string?) as xs:integer*

Summary: Returns the sequence of [The Unicode Standard] code points that constitute an xs:string. If $arg is a zero-length string or the empty sequence, the empty sequence is returned.

7.2.2.1 Examples
  • fn:string-to-codepoints("Thérèse") returns the sequence (84, 104, 233, 114, 232, 115, 101)
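The reverse direction described in 7.2.2 can be sketched the same way. The name string_to_codepoints is hypothetical, and Python's None stands in for the empty sequence passed as $arg.

```python
from typing import List, Optional


def string_to_codepoints(s: Optional[str]) -> List[int]:
    """Return the code points of s; None or "" gives an empty list."""
    if not s:
        return []
    # Python strings iterate by Unicode code point, so ord() per character suffices.
    return [ord(ch) for ch in s]
```

The result for "Thérèse" matches the sequence above only when é and è are stored as single precomposed characters (code points 233 and 232), not as a base letter followed by a combining accent.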



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy