July 2014 - dfdl-wg - lists.ogf.org

still no need for fn:error
by Mike Beckerle 03 Aug '16

Here is the example, as suggested by Jonathan, of how to issue the error that I previously argued required fn:error. I'm convinced this is sufficient, so we can avoid the need for fn:error for now.

<xs:element name="magic_number" type="ex:uint32" dfdl:byteOrder="bigEndian">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert test="{ (xs:unsignedInt(.) eq dfdl:hUInt('0xa1b2c3d4')) or (xs:unsignedInt(.) eq dfdl:hUInt('0xd4c3b2a1')) }"
                   message="{ fn:concat('Magic number ', dfdl:hexBinary(dfdl:unsignedInt(.)), ' was not 0xA1B2C3D4 (for bigEndian) or 0xD4C3B2A1 (for littleEndian).') }" />
      <dfdl:setVariable ref="ex:bOrd">{ xs:unsignedInt(.) }</dfdl:setVariable>
    </xs:appinfo>
  </xs:annotation>
</xs:element>

--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
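For readers less familiar with DFDL, the check that the assert performs can be sketched outside DFDL. This is only an illustration in Python, assuming the field is read as a big-endian unsigned 32-bit integer as the element declares; the function name check_magic is hypothetical and not part of any DFDL implementation.

```python
import struct

BIG_ENDIAN_MAGIC = 0xA1B2C3D4     # value seen when the data really is big-endian
LITTLE_ENDIAN_MAGIC = 0xD4C3B2A1  # same bytes written by a little-endian producer


def check_magic(data: bytes) -> int:
    """Read the first 4 bytes big-endian and return the value if it is one
    of the two accepted magic numbers; otherwise raise ValueError."""
    (value,) = struct.unpack(">I", data[:4])
    if value not in (BIG_ENDIAN_MAGIC, LITTLE_ENDIAN_MAGIC):
        raise ValueError(
            "Magic number %08X was not 0xA1B2C3D4 (for bigEndian) "
            "or 0xD4C3B2A1 (for littleEndian)." % value
        )
    return value
```

The returned value plays the role of the ex:bOrd variable: later code can compare it against the two constants to decide which byte order to use.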

bit order documents updated
by Mike Beckerle 01 Aug '14

There are now several working documents having to do with bit order. The prior document has been taken down and replaced by 3 others. These are all up on redmine: http://redmine.ogf.org/dmsf/dfdl-wg?folder_id=5485

1) draft-gwdi-bit-order-features-v2.05.docx - Describes just the new features plus the TDML extension. The biggest section is organized like an erratum, specifying where things change in the spec. This should get incorporated into the errata, or perhaps just referenced from an erratum. Section 3 of this doc is the proposed errata and spec language for review. There is one major open issue here: it is unclear whether the mixed-endian byte order that was previously proposed is actually needed. The swapping of 16-bit words appears to be not a per-element thing, but something done as pre-processing of an entire message body before parsing. This isn't something we can handle in DFDL, much like base64-encoded data and so forth. This may be the only place that needed the 16-bit word swapping.

2) Understanding Bit Orderings - draft-gwdi-mil-std-2045-understanding-bit-order-v2.05.docx - Material about bit order: the Wire model vs. Number model material. This is effectively just archiving this material for posterity. If you already read this, you don't need to read it again.

3) draft-gwdi-mil-std-2045-additional-features-v2.05.docx - Material about additional DFDL features that would be helpful in modeling MIL-STD-2045. This can also be reviewed. Not as urgent as (1) above.

4) draft-gwdi-dfdl-standard-encodings-v03.docx - This is material that will be integrated back into the spec as an appendix. It incorporates feedback on prior versions, and adds a 6-bit ASCII encoding that is used by the same binary format standards as the 7-bit one. It works the same way, just with 6 bits instead of 7, so there are some codepoint changes.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>

draft r20
by Mike Beckerle 01 Aug '14

I have posted draft r20 to redmine.

- Resolved a few comment bubbles raised by Steve H.
- Moved a few things from the Appendix on standard charsets to the glossary, and improved the examples there.
- Changed header and footer to August 2014.
- Added missing "Code Point" glossary entry.
- Minor editorial fixes.

I believe the only thing left is to get the bit-order materials finalized and in.

One open item: Steve H suggested appendix tables and figures should be numbered A.1, A.2, B.1, etc. I am uncertain how to do this, so r20 is content changes only; if I can figure out how to do the numbering, that will be another revision.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

On Thu, Jul 31, 2014 at 12:31 PM, Steve Hanson <smh(a)uk.ibm.com> wrote:
> Mike
>
> Reviewed draft r18, and created draft r19. A few editorial changes, plus
> some editorial comments, plus a comment about the Glossary definition of
> Bit Position for when bitOrder is added. I'll let you post to Redmine in
> case you want to make any more edits.
>
> Regards
>
> Steve Hanson
> Architect, IBM DFDL <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh(a)uk.ibm.com
> tel:+44-1962-815848
>
> From: Mike Beckerle <mbeckerle.dfdl(a)gmail.com>
> To: "dfdl-wg(a)ogf.org" <dfdl-wg(a)ogf.org>,
> Date: 25/07/2014 02:34
> Subject: [DFDL-WG] draft r18 on redmine
> Sent by: dfdl-wg-bounces(a)ogf.org
> ------------------------------
>
> I posted spec r18 to redmine.
>
> This incorporates all the errata as of the 1.5 experience doc 1.
>
> - One exception - the errata about implementation-defined and
>   implementation-dependent. I went through and updated all the places
>   where this was not correctly stated per Jonathan's document.
> - However, I did not create an appendix also listing all these things.
>   That turns out to be hard to do, and redundant (you end up copying text
>   blocks - then they're in two places....)
> - I think if we want an index to every place that
>   "implementation-defined" and "implementation-dependent" appear in the
>   document, that's what an index is for, so we should generate one and put
>   those in as indexed terms.
>
> I added the appendix of DFDL Standard Character Set Encodings, which
> includes both 7-bit and 6-bit flavors now.
>
> I believe the last thing to do is bit order, but that should be reviewed
> as a separate document first.
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com <http://www.tresys.com/>
> --
> dfdl-wg mailing list
> dfdl-wg(a)ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Fw: DFDL schemas for 4690 Tlog (SA) published on GitHub
by Steve Hanson 31 Jul '14

The repository in DFDLSchemas GitHub which contains DFDL schemas for the Toshiba Commerce (formerly IBM) 4690 Transaction Log (Tlog) point-of-sale format has been updated. It now includes schemas for SuperMarket Application (SA) in addition to the existing ACE schemas. The schemas are compatible with IBM DFDL 1.0 and 1.1 releases, as shipped in IBM WebSphere Message Broker 8.0.0.2 onwards and IBM Integration Bus 9.0.0.1 onwards, respectively. Example data streams and XML infosets are included. Full details in the repository readme at https://github.com/DFDLSchemas/IBM4690-TLOG. Regards Steve Hanson Architect, IBM DFDL Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh(a)uk.ibm.com tel:+44-1962-815848 Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

clarification on dfdl:occursIndex() function
by Mike Beckerle 31 Jul '14

This function says it can be called on non-array elements. However, it does not say what the result is.

If called when "." is not itself an array element, there are only two possible behaviors consistent with the fact that it is explicitly allowed on non-array elements. The result has to be either

(a) 1
(b) the occursIndex of the nearest enclosing array parent, or 1 if there is no enclosing array parent.

I claim (a) is fairly pointless. You will just end up having to create newVariableInstances to carry the array current index downward into expressions. I cannot think of a use case where one would want to call occursIndex() polymorphically, i.e., where you want a number in the case of an array, but 1 otherwise.

So (b) is the preferable behavior.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
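The difference between behaviors (a) and (b) can be sketched on a toy infoset. This is only an illustration in Python, not how any DFDL processor is implemented; the Node class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy infoset node. array_index is the 1-based position when the node
    is an occurrence of an array element, and None otherwise."""
    name: str
    parent: Optional["Node"] = None
    array_index: Optional[int] = None


def occurs_index_a(node: Node) -> int:
    # Behavior (a): non-array elements always yield 1.
    return node.array_index if node.array_index is not None else 1


def occurs_index_b(node: Node) -> int:
    # Behavior (b): walk upward to the nearest node that is an array
    # occurrence (starting with the node itself); yield 1 if none exists.
    current: Optional[Node] = node
    while current is not None:
        if current.array_index is not None:
            return current.array_index
        current = current.parent
    return 1
```

With a scalar field nested inside the third occurrence of an array, (a) yields 1 while (b) yields 3, which is the value an expression on the field would otherwise need a variable to obtain.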

Rescheduled: DFDL Working Group Call (4 Aug 15:00 GDT in DE3S09)
by Steve Hanson 31 Jul '14

OGF DFDL WG weekly call dial-in details.

Passcode for Participants: 5381214

Canada Toll-Free 888-426-6840
China Toll-Free 10-800-711-1071 CHINA NETCOM GROUP USERS
China Toll-Free 10-800-110-0996 CHINA TELECOM SOUTH USERS
France Toll-Free 0800-94-0558
Germany Toll-Free 0800-000-1018
India Toll-Free 000-800-100-1176
Ireland Toll-Free 1-800-943-427
Israel Toll-Free 1-809-417-783
United Kingdom Caller Paid 0-20-30596451
United Kingdom Toll-Free 0800-368-0638
USA Caller Paid 215-861-6239
USA Toll-Free 888-426-6840

Other international numbers available - e-mail smh(a)uk.ibm.com.

OGF DFDL Home: http://www.ogf.org/dfdl
Redmine DFDL: http://redmine.ogf.org/projects/dfdl-wg

Rescheduling as Mike and Steve can't make 5th

more xpath functions: bitwise operations
by Mike Beckerle 30 Jul '14

There seems to be no reasonable way to deal with binary data in our XPath subset.

Ex: This formula, for unsigned binary integer n (C programming language):

(n >> 1) ^ (-(n & 1))

This is the decoder expression for what are called "zig zag integers", which is a way of encoding signed integers into unsigned integers so that there is no sign bit that must be extended to the full length. (A trick used in Google Protocol Buffers - an increasingly popular format.)

There is no reason DFDL cannot encode and decode these zig-zag integers, except that there is no bit-wise arithmetic to do the calculation on inputValueCalc and outputValueCalc.

The shift right isn't hard, and the & 1 becomes a simple enough if-then-else and a division. But there is no reasonable way to do a bitwise XOR (the ^ operator).

I suspect we need a library of bitwise operations that interpret their arguments as binary integers. My first cut would be:

arithmetic-shift-right(width, count)
arithmetic-shift-left(width, count)
logical-shift-right(width, count)
logical-shift-left(width, count)
bitwise-and(width, n1, n2)
bitwise-or(width, n1, n2)
bitwise-not(width, n1)
bitwise-xor(width, n1, n2)

Where width is 8, 16, 32, or 64.

The arithmetic shifts and the logical shift left are for symmetry only. They can easily be implemented by multiplies and divides by 2.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
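To make the zig-zag decoder concrete, here is a rough Python sketch of how width-limited operations like the ones listed could combine to compute (n >> 1) ^ (-(n & 1)). The Python names mirror the proposed function names but are hypothetical; in particular, passing the value n to logical_shift_right as a separate argument is an assumption, since the list above does not spell that out.

```python
def bitwise_xor(width: int, n1: int, n2: int) -> int:
    """XOR of two unsigned integers, truncated to `width` bits."""
    mask = (1 << width) - 1
    return (n1 ^ n2) & mask


def logical_shift_right(width: int, n: int, count: int) -> int:
    """Shift a `width`-bit unsigned integer right, filling with zeros."""
    mask = (1 << width) - 1
    return (n & mask) >> count


def zigzag_decode(width: int, n: int) -> int:
    """Decode a zig-zag encoded unsigned integer of `width` bits into the
    signed integer it represents."""
    mask = (1 << width) - 1
    # -(n & 1) in two's complement is all ones when n is odd, else zero.
    sign_fill = mask if (n & 1) else 0
    raw = bitwise_xor(width, logical_shift_right(width, n, 1), sign_fill)
    # Reinterpret the width-bit pattern as a signed two's-complement value.
    if raw >> (width - 1):
        return raw - (1 << width)
    return raw
```

The encoded values 0, 1, 2, 3 decode to 0, -1, 1, -2, which is the interleaving of non-negative and negative numbers that gives the encoding its name.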

OGF DFDL WG Call Minutes 2014-07-29
by Steve Hanson 30 Jul '14

Please find minutes from the above call at http://redmine.ogf.org/dmsf_files/13313?download= Please review spec draft r18 on Redmine. Email any comments to dfdl-wg(a)ogf.org. Regards Steve Hanson Architect, IBM DFDL, Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh(a)uk.ibm.com tel:+44-1962-815848

OGF DFDL WG Call Agenda 2014-07-29
by Steve Hanson 29 Jul '14

Please find agenda for call on Redmine at http://redmine.ogf.org/dmsf_files/13313?download= Please review spec draft r18 on Redmine. Email any comments to dfdl-wg(a)ogf.org. Regards Steve Hanson Architect, IBM Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh(a)uk.ibm.com tel:+44-1962-815848

Review of draft-gwdi-mil-std-2045-additional-features
by Steve Hanson 28 Jul '14

IBM has continued its review of the proposed additions to lengthKind and occursCountKind to simplify the modelling of MIL-STD-2045 formats. The email below carries on from an earlier email but has removed everything to do with bitOrder etc. New stuff is in blue.

Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh(a)uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 28/07/2014 11:31 -----

1) Proposed new dfdl:lengthKind 'fixedLengthOrTerminated'.

A new enum implies that it can be used in any scenario, so the following need to be specified.

dfdl:terminator must be set and cannot be the empty string or contain ES on its own.

If xs:string or xs:hexBinary, can the maxLength facet be used instead of dfdl:length? (Suggest no - this is variable length data so min/maxLength are for validation only.)

Can dfdl:length be an expression? (Suggest no unless a specific use case is identified.)

My use case needs only constants as the maximum, hence the enum name contains the "fixed" prefix, not "explicit".

Any special rules for emptyValueDelimiterPolicy and nilValueDelimiterPolicy?

Since a terminator must be set, then these cannot be "none" or "initiator".

SMH: Doesn't follow. Today, if I specify a terminator, it must be present, modulo EVDP/NVDP. So why is the same not true for the new enum? If we add a new enum, it has to work in a way that is consistent with other lengthKinds and not just for MIL-STD-2045 use cases.

Use on complex element. Presumably dfdl:length is first used to extract a 'box', but within that box does the parser immediately scan for the dfdl:terminator, or does it descend into the complex type and parse the children, expecting to either consume all the box or to find the terminator at the end? (Suggest the latter.)

I have no use case that requires this for complex types at all. Perhaps we can dodge this by having it be simpleFixedLengthOrTerminated, and restricting it to simple types only?

SMH: Perhaps, but that makes this lengthKind enum different from all the others, and that doesn't seem right.

Use on complex element. Last child cannot be dfdl:lengthKind 'endOfParent'.

Scanning rules: Use of this new dfdl:lengthKind switches off any in-scope stack of terminating markup in force at that point. Put another way, when we are scanning for the dfdl:terminator, we are not looking for any markup from an outer scope.

So there's plenty to think about with this new dfdl:lengthKind. A good rule for deciding whether a new dfdl:lengthKind or dfdl:occursCountKind should be added is whether it bends some other part of the spec out of shape. The new dfdl:lengthKind looks ok so far. However we *think* we have come up with an alternative model which is simpler than the one you state in the document. Example for field 'varstr' with max length 100:

<xs:sequence dfdl:terminator="{if (fn:string-length(varstr) eq 100) then '%ES;' else '%DEL;'}" ...>
  <xs:element name="varstr" type="xs:string" dfdl:lengthKind="pattern" dfdl:pattern="([^\x7F].\x7F)|(.{100})" ... />
</xs:sequence>

Can't put dfdl:terminator with a self-referencing expression on the element. Might need fn:exists in the dfdl:terminator expression to handle optionality. Does that work?

I don't think this will work as %ES; isn't allowed in terminators. There is a proposal to allow it, but only when the length kind is such that one is not scanning for delimiters (same restriction as for WSP*). Let's assume that we allow %ES; for now.

SMH: This has been incorporated as an update to erratum 2.148 and is in the latest spec draft.

One beauty of your idea here is that unparsing will "just work", so that's nice. But I think your pattern has a bug: I think it should be

dfdl:pattern="[^\x7F]{0,99}(?=\x7F)|.{100}"

This will not capture more than 99 characters prior to the DEL, and will not include the DEL as part of the string in the case where a DEL is found (uses lookahead in regex).
Hence, the DEL will be available to be picked off as the terminator. Without this you end up with the DEL in the payload. With that I think your approach would work. So thanks for that idea.

SMH: Yes my pattern was wrong, thanks for correcting.

SMH: Also realised that the dfdl:terminator expression is illegal, as it looks downwards. The correct DFDL is:

<xs:sequence ...>
  <xs:element name="varstr" type="xs:string" dfdl:lengthKind="pattern" dfdl:pattern="[^\x7F]{0,9}(?=\x7F)|.{10}" ... />
  <xs:sequence dfdl:terminator="{if (fn:string-length(./varstr) eq 10) then '%ES;' else '%DEL;'}" .../>
</xs:sequence>

I have tested this (using {if (fn:string-length(./varstr) eq 10) then '%WSP*;' else '%DEL;'} as %ES; is not yet allowed in a terminator) and it works ok for both parse and unparse.

It was noted that if the terminator expression was allowed to refer to the value of its own element then this could be simplified to:

<xs:element name="varstr" type="xs:string" dfdl:lengthKind="pattern" dfdl:pattern="[^\x7F]{0,9}(?=\x7F)|.{10}"
            dfdl:terminator="{if (fn:string-length(.) eq 10) then '%ES;' else '%DEL;'}" .../>

Clearly this relaxation could only occur when lengthKind was not delimited. (That is, the same condition that we have proposed for allowing %ES; in terminator/separator.) But I think it also violates the known-to-exist rules? Certainly IBM DFDL says it can't find '.' in the infoset when I tried this. So perhaps this is not a good idea.

2) Proposed new dfdl:occursCountKind 'prefixed'.

The motivation here is to avoid the explosion of global groups needed for the hidden presence indicators. It was observed that a single global group could be used if the expression used a predicate when referring to the FPI element, though obviously that makes the schema very fragile.

At first glance the new enum would appear to be symmetric with lengthKind 'prefixed', but on closer examination this is not true:

Presumably the new enum would apply to optional elements and arrays.
It would have to fit into the grammar thus:

Array = [ [PrefixOccursCount Separator] EnclosedElement [ Separator EnclosedElement ]* [ Separator StopValue] ]
PrefixOccursCount = SimpleNormalRep

It would be wrong to couple the prefix more tightly to the first occurrence (by more tightly I mean like prefix length, where the length occurs after the element's left framing region). When parsing, if the value is 0 then nothing else is expected in the data - zero occurrences, so no other DFDL properties are even examined. It must therefore occur ahead of all occurrences. If it is doing that, then it may as well have its own left and right framing, hence the use of SimpleNormalRep rather than SimpleContent, and work with delimiters.

However IBM questions the need for the enum, as it can also be modelled using a choice of two sequences which, if you put the discriminator on the hidden FPI element itself, means you can get away with just two global groups. And you don't need outputValueCalc as you can just use defaults.

...
<xs:choice>
  <xs:sequence>
    <xs:sequence dfdl:hiddenGroupRef="vmdfdl:gh_mil_std_2045_FPI_true" />
    <xs:element name="unit_name" type="..." ... />
  </xs:sequence>
  <xs:sequence dfdl:hiddenGroupRef="vmdfdl:gh_mil_std_2045_FPI_false" />
</xs:choice>
<xs:choice>
  <xs:sequence>
    <xs:sequence dfdl:hiddenGroupRef="vmdfdl:gh_mil_std_2045_FPI_true" />
    <xs:element name="unit_type" type="..." ... />
  </xs:sequence>
  <xs:sequence dfdl:hiddenGroupRef="vmdfdl:gh_mil_std_2045_FPI_false" />
</xs:choice>
...
<xs:group name="vmdfdl:gh_mil_std_2045_FPI_true" >
  <xs:sequence>
    <xs:element name="FPI" type="xs:boolean" default="true" ... >
      <dfdl:discriminator test="{. eq fn:true()}" />
    </xs:element>
  </xs:sequence>
</xs:group>
<xs:group name="vmdfdl:gh_mil_std_2045_FPI_false" >
  <xs:sequence>
    <xs:element name="FPI" type="xs:boolean" default="false" ... >
    </xs:element>
  </xs:sequence>
</xs:group>

3) Proposed new dfdl:occursCountKind 'repeatUntil'.
It seems to IBM that the only practical effect of the new enum 'repeatUntil' is to simplify the discriminator. It doesn't remove it, nor does it remove the need for the hidden FRI element. IBM does not see the benefit of the new enum in its proposed form.

Further... If the above proposal is used for the FPI, the dfdl:occursIndex() branch of the discriminator simplifies to fn:true(). The FRI is local to the array element so, when parsing at least, there is no need for a globally unique group for each array. That simplifies the discriminator to the following and means you only need one global group for FRI.

<dfdl:discriminator>
  if (dfdl:occursIndex() eq 1) then fn:true() else ../<array>[dfdl:occursIndex()-1]/vmdfdl:gh_mil_std_2045_FRI
</dfdl:discriminator>

For that to work on unparsing there needs to be a generic way to set the (Boolean) FRI from within the hidden group. Something like

dfdl:outputValueCalc="{dfdl:occursIndex() eq fn:count(..)}"

There is a problem with this though. The property is on the FRI element, so what does dfdl:occursIndex() return? The spec says it returns "the position of the current item within an array" but also says "this function may be used on non-array elements". I'm not clear what it would return for the latter case - does it return 1 or does it look back to its parent or ... ? Here we want the index of the parent. Perhaps this function needs to take an argument to be unambiguous, e.g., . or .. or ../.., i.e., it can only refer back up to the root. (In fact this problem applies whether there is a single FRI or one per array.)

A counter proposal... One way to really simplify this type of occurrence indicator is to consider it as part of the element, in the same way as a length prefix. This tight binding makes sense here, because there is an indicator per occurrence.

dfdl:occursCountKind="stopIndicator"
dfdl:occursStopIndicatorType="<type>"

The stop indicator type must be derived from xs:boolean. True means the occurrence is the last.
False means it is not. (Or we can do it the other way round.) The DFDL Boolean properties of the type can always be used to compensate.

The parser would work a bit like it does for 'stopValue' - it keeps parsing speculatively until it finds an occurrence which indicates the end of the array - the difference being that in this case it is added to the infoset.

The oddity about this is that it applies to arrays only and does not work with optional elements, so it cannot be used with minOccurs = '0'.

Grammar becomes:

SimpleNormalRep = LeftFraming StopIndicator PrefixLength SimpleContent RightFraming
ComplexNormalRep = LeftFraming StopIndicator PrefixLength ComplexContent ElementUnused RightFraming
StopIndicator = SimpleContent

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
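The behavior of the corrected pattern from item 1, [^\x7F]{0,9}(?=\x7F)|.{10}, can be tried outside DFDL. The following Python sketch uses the standard re module as a stand-in for the DFDL regular expression engine, which is an assumption (the two dialects agree on the constructs used here, but may differ elsewhere). The function name parse_varstr is hypothetical, and an empty string stands in for the %ES; terminator.

```python
import re
from typing import Tuple

DEL = "\x7f"


def parse_varstr(data: str, max_len: int = 10) -> Tuple[str, str, str]:
    """Split `data` into (value, terminator, remaining data).

    The value is either up to max_len - 1 characters followed by DEL, or
    exactly max_len characters with no terminator at all."""
    pattern = re.compile(
        r"[^\x7F]{0,%d}(?=\x7F)|.{%d}" % (max_len - 1, max_len), re.DOTALL
    )
    match = pattern.match(data)
    if match is None:
        raise ValueError("neither a DEL-terminated nor a full-length value")
    value = match.group(0)
    rest = data[match.end():]
    # Mirrors the terminator expression: empty string (ES) when the value
    # is full length, DEL otherwise. The lookahead guarantees DEL is next.
    terminator = "" if len(value) == max_len else DEL
    return value, terminator, rest[len(terminator):]
```

A short value such as "abc" followed by DEL yields the value without the DEL in the payload, while a full 10-character value is taken as-is and whatever follows is left for the next element.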
