
I'm going to send this and then duck - we've discussed the subject of missing-ness and defaulting at considerable length already. However, I genuinely do have some new information for your consideration, so please hear me out.

I'm seeking the opinion of the working group on the following questions:
a) can an element reliably be categorised as 'missing' when separatorPolicy='suppressed'?
b) is it possible for an element to be 'missing' if it has lengthKind='explicit' and its length is a static, non-zero value?
c) is it possible for an element to be 'missing' if it has a discriminator that has already evaluated to 'true'?

For reference, the specification ( v0.42 ) says this concerning missing elements:

Definition 'missing element'
On parsing, an element is missing if its content region in the data stream is empty. The initiator and terminator regions of a missing element may, or may not, also be empty as controlled by the dfdl:emptyValueDelimiterPolicy property (simple and complex element), or dfdl:nilValueDelimiterPolicy property (simple element).

Question a)
Compare the following data streams. In both cases, assume that
- separator is comma and separatorPosition is 'infix'
- missingValueDelimiterPolicy is set to 'none' so a 'missing' value should not have an initiator.
- the initiators are A:, B: and C:
- values are a,b,c.

separatorPolicy='required' : A:a,,C:c
separatorPolicy='suppressed' : A:a,C:c

In the 'required' case, the parser detects that the initiator is missing, then looks to see whether the content region is zero-length. It is, so the element is 'missing'.

In the 'suppressed' case, the parser detects that the initiator is missing, then looks to see whether the content region is zero-length. It looks for a delimiter at the current position and finds 'C'. 'C' is not a delimiter, so the content region is not zero-length. So the parser throws a processing error - "initiator for element B was not found in the data".
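
The two cases above can be modelled with a toy parser. This is only a sketch in Python, not a description of how any DFDL processor is implemented: the function parse_sequence, its arguments and the use of None to mean 'missing' are all invented for illustration, and it assumes single-character content, an infix comma separator and no initiator on a missing element.

```python
SEPARATOR = ","


def parse_sequence(data, initiators, separator_policy):
    """Toy model of the 'missing element' rule (hypothetical, not a DFDL parser).

    Returns a dict mapping each initiator to its value, or to None when the
    element is classified as missing. Raises ValueError for a processing error.
    """
    pos = 0
    values = {}
    for index, initiator in enumerate(initiators):
        if index > 0:
            if data.startswith(SEPARATOR, pos):
                pos += len(SEPARATOR)  # consume the infix separator
            elif separator_policy == "required":
                raise ValueError(
                    f"separator before element {initiator!r} was not found"
                )

        if data.startswith(initiator, pos):
            # Initiator found: the content runs up to the next separator
            # or to the end of the data.
            pos += len(initiator)
            end = data.find(SEPARATOR, pos)
            if end == -1:
                end = len(data)
            values[initiator] = data[pos:end]
            pos = end
            continue

        # Initiator not found. Under the current rule the element is missing
        # only if its content region is zero-length, i.e. a delimiter (or the
        # end of the data) sits at the current position.
        at_delimiter = pos >= len(data) or data.startswith(SEPARATOR, pos)
        if at_delimiter:
            values[initiator] = None
        else:
            raise ValueError(
                f"initiator for element {initiator!r} was not found in the data"
            )
    return values
```

Calling parse_sequence("A:a,,C:c", ["A:", "B:", "C:"], "required") classifies B as missing, whereas parse_sequence("A:a,C:c", ["A:", "B:", "C:"], "suppressed") raises the processing error, because the character at the current position is 'C' rather than a delimiter.
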

I don't think the 'suppressed' behaviour is what a user will expect, nor what the WG intended when these rules were drawn up. The problem is that the parser cannot reliably determine the length of the content region when separatorPolicy='suppressed'. It can, however, reliably detect whether the element is present - the initiator gives a strong hint about that.

Somebody may say "well, duh! Of course the content region is empty if the initiator is not present". That may be a reasonable rule, but it is not the rule currently given in the specification. Note that in this situation the content region has not been looked at, so that rule relies on the parser speculatively parsing the element and then backtracking because the initiator is not found. If we allow that, then why not allow default values to be applied after other types of processing error ( even for cases where no initiator was defined )? There are good reasons for not applying defaults after normal backtracking ( hence the current rule ), so any such 'missing initiator implies empty content' rule would have to be made explicit in the specification.

Possible refinements of the rules:

a) IF the length of the content region cannot reliably be determined ( lengthKind='delimited' and separatorPolicy='suppressed' )
AND emptyValueDelimiterPolicy does not include the initiator
AND the element has an initiator
AND the initiator was not found
THEN assume that the content length is zero and treat the element as missing.

or

b) IF ( the element has an initiator AND the initiator was not found )
THEN IF the parent group has initiatedContent='yes'
THEN the element is missing
ELSE apply the existing rules.

b) would provide a way to get defaults applied in situations where the content region's length is either fixed or undefined. Quite a lot of users might assume this behaviour anyway.

Question b)
A similar situation can arise when lengthKind='explicit' and the length is fixed ( i.e. is not a DFDL expression ).
Unless the missing field occurs at the end of a known-length structure, the length of the content region will never be zero. I think a similar rule is required for this case also:

- IF the length of the content region is fixed ( lengthKind='explicit' and length is a static, non-zero value )
AND emptyValueDelimiterPolicy does not include the initiator
AND the element has an initiator
AND the initiator was not found
THEN assume that the content length is zero and treat the element as missing.

...or apply suggestion b) above.

Question c)
Suppose that an element has a discriminator, and it has already evaluated to 'true' ( it must have been a backward reference to some previously-parsed field ). The discriminator has unambiguously stated that the element *is* present in the data. If it is subsequently found to have a zero-length content region, should the parser treat it as 'missing' and attempt to apply a default? I don't think so.

Please tell me that I'm missing something obvious here - it's starting to sound complicated again.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU