de·lim·it·er (dĭ-lĭm'ĭ-tər)
n. Computer Science A character or sequence of characters marking the beginning or end of a unit of data.
delimiter character
A character or string used to separate, or mark the start and end of, items of data in, e.g., a database, source code, or text file.
See also: record.
(2001-03-16)
These definitions are consistent with our usage of the term. I suggest no change in our terminology here.
<TK>
Point taken re: the term 'delimiter'. I still have reservations about calling it 'Text Markup' in the title, though. I think the intro paragraph should explain the common usage (initiators, separators and terminators in text formats) and the exceptional usage (handling delimited binary data and other non-text markup).
</TK>
Syntax for specifying markup:
It's not clear from this description that each item in the space-separated list is a DFDL string literal.
These have always bugged me. Any better solution is welcome. XML/XSD does tend to make a space-separated list the standard way to specify more than one value.
<TK>
In a future revision of the spec we need a list of property value types which can then be used consistently in the tables that describe properties:
- Enumeration
- DFDL string literal
- List of DFDL string literals
- DFDL expression
- DFDL regular expression
- Boolean
- Non-negative integer
- any more?
In some cases it will be necessary to place restrictions on the type of content allowed in the string literal (disallow raw byte values, require raw byte values to represent a character, etc.).
</TK>
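To make the proposed list concrete, here is a hypothetical Python sketch of the value types as an enumeration, with a lookup table from property name to type and a parser for the list-of-literals case. The names PropertyValueType, PROPERTY_TYPES and parse_value are invented for illustration, and only a few properties are mapped; this is not taken from any DFDL implementation.

```python
from enum import Enum
from typing import List, Union


class PropertyValueType(Enum):
    """Candidate property value types from the list above (hypothetical)."""
    ENUMERATION = "Enumeration"
    STRING_LITERAL = "DFDL string literal"
    STRING_LITERAL_LIST = "List of DFDL string literals"
    EXPRESSION = "DFDL expression"
    REGULAR_EXPRESSION = "DFDL regular expression"
    BOOLEAN = "Boolean"
    NON_NEGATIVE_INTEGER = "Non-negative integer"


# Hypothetical mapping that the property tables could be generated from,
# so every table describes a given property's type the same way.
PROPERTY_TYPES = {
    "initiator": PropertyValueType.STRING_LITERAL_LIST,
    "separator": PropertyValueType.STRING_LITERAL_LIST,
    "terminator": PropertyValueType.STRING_LITERAL_LIST,
    "lengthKind": PropertyValueType.ENUMERATION,
}


def parse_value(prop: str, raw: str) -> Union[str, List[str]]:
    """Split list-typed values on whitespace; return other values unchanged."""
    if PROPERTY_TYPES[prop] is PropertyValueType.STRING_LITERAL_LIST:
        return raw.split()
    return raw
```

For example, parse_value("terminator", "; %NL;") gives [";", "%NL;"]. A plain whitespace split cannot express a literal that itself contains a space, so such a literal would need some escape or mnemonic for the space character (an assumption here, not something the sketch handles).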
initiator (and all other space-separated properties):
It is not clear whether the order of the values in a space-separated property matters. Must the parser test them in the order in which they are specified? (Q: What if %ES; is first in the list?)
I think the order should not matter, and the parser should test them longest first.
<TK>
Good idea. I have another related suggestion below.
</TK>
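A minimal sketch of the "order does not matter, test longest first" rule, assuming the property value has already been split into individual literals and that DFDL mnemonics such as %ES; or %NL; are resolved elsewhere (this code treats every candidate as plain text). match_delimiter is a hypothetical helper name.

```python
from typing import List, Optional


def match_delimiter(data: str, pos: int, candidates: List[str]) -> Optional[str]:
    """Return the longest candidate that matches at data[pos:], or None.

    Candidates are tried longest first, so the order in which they were
    written in the property value makes no difference to the result.
    Empty candidates are skipped so they never produce a zero-length match.
    """
    for cand in sorted(candidates, key=len, reverse=True):
        if cand and data.startswith(cand, pos):
            return cand
    return None
```

With candidates ":" and "::" listed in either order, scanning "::x" from offset 0 matches "::" rather than stopping at the first ":".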
terminator:
Is it OK if the final terminator is missing within the scope of a known-length parent? This seems a reasonable extension of the rule (in all other scenarios, the end of a known-length parent acts like the end of the data stream for items within its scope).
I believe this should be true. "Final" is relative in my mind.
<TK>
Good - it's much easier to implement if the end of a known-length parent is always equivalent to the end of the data stream, from the point of view of enclosed elements. But see next point...
</TK>
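A minimal sketch of the agreed rule, assuming a single byte-string terminator and that the caller already knows the end offset of the nearest known-length parent (or None if there is none). scan_to_terminator and its parameters are hypothetical names; the point is only that one code path serves both cases.

```python
from typing import Optional, Tuple


def scan_to_terminator(data: bytes, start: int, parent_end: Optional[int],
                       terminator: bytes) -> Tuple[bytes, int]:
    """Return (content, next offset) for a terminated item starting at start.

    parent_end is the end offset of the nearest known-length parent, or None
    if there is none. The scan never looks past that limit, and reaching the
    limit without finding the terminator is accepted, so a missing final
    terminator behaves the same inside a known-length parent as at the end
    of the data stream.
    """
    limit = len(data) if parent_end is None else min(parent_end, len(data))
    # bytes.find with an end bound only reports matches lying wholly before it.
    idx = data.find(terminator, start, limit)
    if idx == -1:
        return data[start:limit], limit
    return data[start:idx], idx + len(terminator)
```

For data b"ab;cdXYZ" inside a parent of known length 5, the first item is b"ab" and the second is b"cd", ended by the parent boundary rather than by a terminator.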
documentFinalTerminatorCanBeMissing:
Let's try to avoid creating another property for the postfix separator scenario. I think this property provides a natural way of modelling the data. We can recommend using infix with a terminator rather than 'postfix' if the final terminator can be missing.
Copasetic.
<TK>
Had to look up 'copasetic'. I'm amazed that my Mum never came out with that one - she's a walking dictionary.
This property has caused problems with naming and interpretation all along the line. Last time we discussed it, I don't think we considered this option (we did talk about something like it):
- If %ES; is included in the list of values for separator or terminator then
a) The parser ignores it while performing ordinary scanning (otherwise it would always cause a zero-length string to be scanned).
b) The parser accepts 'end of data stream' as a match for the %ES; mnemonic. That makes this property (and the equivalent one for separators) redundant.
c) Other usages of %ES; remain unchanged.
</TK>
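A minimal sketch of rules a) and b) for a terminator list, assuming the literals are plain text and that "%ES;" appears verbatim in the list. find_terminator is a hypothetical name; rule c) (other uses of %ES;) is outside its scope.

```python
from typing import List, Optional, Tuple

ES = "%ES;"  # the end-of-stream mnemonic as written in the property value


def find_terminator(data: str, start: int,
                    candidates: List[str]) -> Optional[Tuple[int, str]]:
    """Return (offset where the content ends, matched delimiter), or None.

    Rule a): %ES; is left out of ordinary scanning, because as an empty
    string it would match immediately and yield zero-length content.
    Rule b): reaching the end of the data counts as a match for %ES;.
    """
    literals = sorted((c for c in candidates if c != ES), key=len, reverse=True)
    for pos in range(start, len(data)):
        for lit in literals:
            if data.startswith(lit, pos):
                return pos, lit
    if ES in candidates:
        return len(data), ES
    return None
```

With terminator "; %ES;", the data "abc;def" yields "abc" ended by ";" and then "def" ended by %ES;, which is the case documentFinalTerminatorCanBeMissing was meant to cover. Without %ES; in the list, the second item has no terminator and the scan reports no match.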
outputNewLine:
Should we validate that the 'characterOrCharacters' are all newline characters from the set described by the %NL; mnemonic? Otherwise the DFDL serializer will output data which cannot be parsed by the DFDL parser.
Nice catch.
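A minimal sketch of the proposed check. The set of sequences that %NL; matches is an assumption made for illustration (LF, CR, CRLF, NEL and LS); it should be replaced by whatever set the spec finally defines. is_valid_output_newline is a hypothetical name.

```python
# Assumed set of sequences that %NL; can match (illustrative only):
# LF, CR, CRLF, NEL (U+0085) and LS (U+2028).
NL_SEQUENCES = frozenset({"\n", "\r", "\r\n", "\u0085", "\u2028"})


def is_valid_output_newline(value: str) -> bool:
    """True if the whole outputNewLine value is one sequence %NL; can match,
    so that whatever the serializer writes can be parsed back."""
    return value in NL_SEQUENCES
```

This is slightly stricter than "all characters are newline characters": a value such as "\n\n" is rejected, because a single %NL; would match only the first newline on parsing.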
dfdl:lengthKind endOfParent
'endOfParent' has almost the same meaning as 'delimited', so it should have the same semantics, i.e. the item ends at the first of:
· the item’s terminator (if specified)
· an enclosing construct’s separator or terminator
· the end of an enclosing construct designated by its known length
· the end of the data stream
The effect would be that the element could be ended by the nearest known-length parent, not just the immediate parent. Also, the immediate parent could have lengthKind 'implicit'.
Agreed.
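A minimal sketch of the "nearest known-length parent" part of these semantics only; terminators and separators would still be found by the normal delimited scan. The Node class and end_of_parent_limit are hypothetical stand-ins for whatever structure a parser actually builds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A hypothetical element or group in the structure being parsed."""
    name: str
    known_end: Optional[int] = None  # absolute end offset, if length is known
    parent: Optional["Node"] = None


def end_of_parent_limit(node: Node, stream_len: int) -> int:
    """Offset at which an endOfParent item must stop at the latest: the end of
    the nearest ancestor with a known length (ancestors with implicit length
    are skipped), or the end of the data stream if there is no such ancestor."""
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.known_end is not None:
            return ancestor.known_end
        ancestor = ancestor.parent
    return stream_len
```

For a field inside an implicit-length group inside a 40-byte record, the limit is offset 40, even though the field's immediate parent has no known length.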
choiceKind 'Fixed'
When lengthKind='implicit', all alternative branches of the choice are padded to the fixed length of the largest one, so that the entire choice construct is fixed length overall.
There must be a restriction that the length of at least one branch of the choice must be statically defined.
Also good catch.
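A minimal sketch of the fixed-length choice rule and the proposed restriction, assuming branch lengths are known in bytes where they are static, and that a single space byte is the pad character (both assumptions). fixed_choice_length and pad_branch are hypothetical names.

```python
from typing import List, Optional


def fixed_choice_length(branch_lengths: List[Optional[int]]) -> int:
    """Length of a fixed-length choice: the largest statically known branch
    length. None marks a branch whose length is not statically defined."""
    known = [n for n in branch_lengths if n is not None]
    if not known:
        # The restriction discussed above: no static length, no fixed choice.
        raise ValueError("at least one choice branch needs a statically defined length")
    return max(known)


def pad_branch(value: bytes, choice_len: int, pad_byte: bytes = b" ") -> bytes:
    """Pad a branch's serialized value up to the fixed choice length."""
    if len(pad_byte) != 1:
        raise ValueError("pad_byte must be a single byte")
    if len(value) > choice_len:
        raise ValueError("branch is longer than the fixed choice length")
    return value + pad_byte * (choice_len - len(value))
```

With branch lengths 4, unknown and 7, the choice is 7 bytes long, and a 3-byte branch value is padded with four spaces.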
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU