1. 16.2 scannability with lengthKind pattern
2. Current Actions
3. Steve H issues with draft 039
4. Tim's (major) issues with draft 039
5. Status of specification (for OGF28)
1. 16.2 scannability with lengthKind pattern:
In summary, you can use a data pattern on any element (complex, simple
text, simple binary) as long as the bytes are legal in the stated
encoding. Where binary data is involved, this in practice means an 8-bit,
single-byte encoding.
Binary data can be handled using some of the conveniences of text by
treating it as text with encoding="iso-8859-1". In this case literal text,
such as length patterns, is interpreted in the iso-8859-1 character
encoding, and the correspondence of byte values in the data to a string in
the DFDL infoset is one to one. That is, a byte with value N produces an
infoset character with character code N.
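The one-to-one correspondence can be checked with a short Python sketch. It is an illustration only, not DFDL processor behaviour: the helper name pattern_length and the sample data are hypothetical, and Python's re module stands in for whatever pattern language the specification finally adopts.

```python
import re
from typing import Optional

# Every possible byte value, 0..255.
raw = bytes(range(256))

# Decoding as iso-8859-1 maps the byte with value N to character code N.
text = raw.decode("iso-8859-1")
assert all(ord(ch) == n for n, ch in enumerate(text))

# Encoding the string again restores the original bytes exactly.
assert text.encode("iso-8859-1") == raw


def pattern_length(data: bytes, pattern: str) -> Optional[int]:
    """Length in bytes matched by `pattern` at the start of `data`.

    Hypothetical helper: the binary data is viewed as iso-8859-1 text so
    that an ordinary regular expression can find the length. Because one
    character corresponds to one byte, the match length is a byte count.
    """
    match = re.match(pattern, data.decode("iso-8859-1"), re.DOTALL)
    return len(match.group(0)) if match else None


# A binary field that runs up to and including a 0xFF byte.
sample = bytes([0x01, 0x80, 0x0A, 0xFF, 0x02])
length = pattern_length(sample, r"[^\xff]*\xff")
```

Here length is the number of bytes in the field, including the closing 0xFF byte.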
2. Current Actions:
No | Action
049 | 20/05 AP: Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed whether this really needs to be in the specification. It
seemed that the main value is that it defines a schema location for
downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it
can be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress. The predefined formats do not need to be available
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't
look as though the way text numbers are defined is very usable. He will
document this for the next call.
03/02: No progress
10/02: No progress
17/03: No progress
24/03: No progress
|
066 | Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
03/02: IBM is still investigating
10/02: IBM is still investigating
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
|
079 | MB: Encoding for binary fields when lengthKind is pattern
17/02: Discussed but no conclusion
24/03: Mike has found an encoding that matches the first 255 codepoints
of iso 10646. Will document its use for binary fields.
|
080 | AP: Clarify semantics of fn:position and fn:count
17/02: no progress
24/03: No progress
|
083 | MB: To correct syntax diagram for FinalUnused and suggest wording
for the Sequence section
|
3. Steve H issues with draft 039
1) Name of property dfdl:textNumberRepresentation
is not consistent with dfdl:binaryNumberRep, dfdl:binaryFloatRep,
etc.
2) The dfdl:numberPattern etc properties
that have been moved from the defunct dfdl:textNumberFormat annotation
to dfdl:element etc should be called dfdl:textNumberPattern etc.
Otherwise users will think they apply to binary numbers too.
3) In section 14.3 on sequences, there
are several sub-sections that talk about parsing according to different
ways of specifying length (i.e., lengthKind). But dfdl:sequence no longer
carries dfdl:lengthKind, so I think these sub-sections are not in the right
place. I think they should be in section 12, under the correct 12.3.x
lengthKind sub-section.
4) Section 19 on built-in specifications.
Given that we don't have any for the public comment phase, we should
reword this section.
4. Tim's (major) issues with draft 039
12.2 Delimiters: Text Markup
- The term 'Delimiters' is not
accurate. Most readers will not think of an initiator as a 'delimiter'.
- It's not 'Text' markup any more -
especially since v0.39 has allowed lengthKind="delimited" for
elements with binary representation.
Title should be 'Markup' and the explanation
can then deal with what it really is, rather than justifying the inaccurate
title :-)
Syntax for specifying markup:
It's not clear from this description
that each item in the space-separated list is a DFDL string literal.
initiator ( and all other space-separated
properties )
It is not clear whether the order of
the space-separated properties matters. Must the parser test them in the
order in which they are specified?
( Q: What if %ES; is the first in the
list? )
terminator:
Is it OK if the final terminator is
missing within the scope of a known-length parent? Seems like a reasonable
extension of the rule (in all other scenarios, the end of a known-length
parent acts like the end of the data stream for items within its scope).
documentFinalTerminatorCanBeMissing:
Let's try to avoid creating another
property for the postfix separator scenario. I think this property provides
a way of modelling the data naturally.
We can recommend use of infix-with-a-terminator
rather than 'postfix' if the final terminator can be missing.
outputNewLine
Should we validate that the 'characterOrCharacters'
are all newline characters from the set described by the %NL; mnemonic?
Otherwise the DFDL serializer will output data which cannot be parsed by
the DFDL parser.
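A minimal sketch of such a check, in Python. It assumes that %NL; stands for LF, CR, CRLF, NEL (U+0085) or LS (U+2028); that set is an assumption made for illustration and should be checked against the draft. The function name is hypothetical. The sketch takes the stricter reading that the whole value must be a single sequence from the set, so that the parser could consume it as one newline.

```python
# Assumed expansion of the %NL; mnemonic (to be confirmed against the draft):
# LF, CR, CRLF, NEL and LS.
NL_SEQUENCES = {"\n", "\r", "\r\n", "\u0085", "\u2028"}


def is_valid_output_newline(value: str) -> bool:
    """Return True if `value` is exactly one sequence that %NL; would match.

    Hypothetical validation for dfdl:outputNewLine: anything else would be
    written by the serializer but not recognised as a newline on parsing.
    """
    return value in NL_SEQUENCES
```

With this check, "\r\n" is accepted, while ";" or a doubled "\n\n" is rejected.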
dfdl:lengthKind endOfParent
'endOfParent' has almost the same meaning
as 'delimited' so should have the same semantics.
· the item’s terminator (if specified)
· an enclosing construct’s separator or terminator
· the end of an enclosing construct designated by its known length
· the end of the data stream
The effect would be that the element could
be ended by the nearest known-length parent, not just the immediate parent.
Also, the immediate parent could have lengthKind 'implicit'.
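One possible reading of the four conditions above is that the element ends at whichever of them occurs first. The Python sketch below illustrates that reading only. The function name, its parameters and the simple byte search are hypothetical, and real markup matching (escapes, mnemonics, encodings) is ignored.

```python
from typing import List, Optional


def end_of_element(data: bytes, start: int, terminator: Optional[bytes],
                   enclosing_markup: List[bytes],
                   known_length_end: Optional[int]) -> int:
    """Return the offset at which the element starting at `start` ends."""
    # The end of the data stream always bounds the element.
    limit = len(data)
    # The end of the nearest enclosing construct with a known length, if any.
    if known_length_end is not None:
        limit = min(limit, known_length_end)
    candidates = [limit]
    # The item's own terminator, then any enclosing separator or terminator.
    markup = ([terminator] if terminator else []) + enclosing_markup
    for mark in markup:
        pos = data.find(mark, start, limit)
        if pos != -1:
            candidates.append(pos)
    # The earliest candidate ends the element.
    return min(candidates)


data = b"abc,def;ghi"
first_end = end_of_element(data, 0, None, [b","], None)
last_end = end_of_element(data, 8, None, [b","], None)
```

In the example, the first element is ended by the enclosing separator and the last one by the end of the data stream.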
choiceKind 'Fixed'
When lengthKind='implicit', all alternative
branches of the choice are padded to the fixed length of the largest one,
so that overall the entire choice construct is fixed length.
There must be a restriction that the
length of at least one branch of the choice must be statically defined.
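The padding rule can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, branch lengths are given in bytes with None for a branch whose length is not statically defined, and a space is used as the fill byte purely for the example.

```python
from typing import List, Optional


def choice_fixed_length(branch_lengths: List[Optional[int]]) -> int:
    """Fixed length of the choice: the largest statically defined branch."""
    known = [n for n in branch_lengths if n is not None]
    if not known:
        # Mirrors the proposed restriction: at least one branch must have
        # a statically defined length.
        raise ValueError("no branch has a statically defined length")
    return max(known)


def pad_branch(data: bytes, fixed_length: int, fill: bytes = b" ") -> bytes:
    """Pad one branch's representation up to the fixed choice length."""
    if len(data) > fixed_length:
        raise ValueError("branch is longer than the fixed choice length")
    return data + fill * (fixed_length - len(data))


fixed = choice_fixed_length([4, None, 10])
padded = pad_branch(b"abc", fixed)
```

Whichever branch is chosen, the padded representation then occupies the same number of bytes.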
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM, MP211, Hursley Park, Hursley, SO21 2JN, United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com