
I am looking for guidance while editing the DFDL spec. An open topic has been the RFC2119 keywords. Many specs simply reference RFC2119 and say "we use RFC2119 conventions." We can't do that, because we use the keywords MUST, SHOULD, SHALL, etc. in three different ways:

(1) Stating requirements or desires for all implementations. E.g., all implementations MUST detect schema definition errors (SDEs).

(2) Stating "requirements" that a DFDL schema author must satisfy to avoid an SDE. E.g., "The DFDL fillByte property must be a single byte or single character." We often follow this sort of requirement with "and it is an SDE otherwise." It is implied that implementations must detect this SDE and provide a diagnostic to the user.

(3) Stating "requirements" that data must satisfy in order to conform to a DFDL schema. E.g., "The representation must be followed by the terminating delimiter." The assumption in these cases is that it is a processing error if the terminator is not found.

What is notable in category (3) is how these runtime behaviors are expressed. They are not phrased in terms of how implementations must behave, but in terms of what users should expect. My use of the word "should" here is intentional: we do in fact use "should" in many cases to imply required behavior, not suggested behavior.

These categories of requirements are normal in the description of a language standard. Category (3) is, to some degree, unique to a description language, which describes other formal objects that may or may not actually correspond to the description. For comparison, there are unit test frameworks for code where one writes things like "add(1, 1) should equal 2", where "should" means the same thing as "shall" or "must".
Next, the term "may" is used to mean:

(1) optional, per RFC2119.

(2) "allows" or "admits", when describing DFDL schemas. E.g., "If the nil representation may not be zero length, but the empty representation is zero length, then the absent representation cannot occur because zero-length will be interpreted as the empty representation."

(3) a possible alternative, when discussing data. E.g., "Encodings which are 5, 6, 7, and 9 bits wide are rare, but do exist, so the overall length of the content region may not be a multiple of 8 bits wide."

My current working draft (r27, not yet on Redmine, but soon) puts all the category (1) usages in upper case (MUST, SHOULD, SHALL, MAY, etc.). I have left all category (2) and (3) usages in lower case, as there are many of them, and they really do need to be distinguished from category (1). If we reference RFC2119 at all, we have to explain these three kinds of usage. But I am seriously considering whether we should instead say that we DON'T use RFC2119 conventions, and simply explain the three categories.

Next, the terms "required" and "optional", which are also RFC2119 keywords, are used:

(1) to distinguish features that MUST be implemented from those that are OPTIONAL but RECOMMENDED and, if implemented, are REQUIRED to be implemented in a conforming manner.

(2) to describe language features that are optional to use. E.g., the testKind property of a dfdl:assert annotation is optional; if it is not provided, it is assumed to be "expression", not "pattern".

(3) as formal terms for discussing when default values are or are not used for array/optional elements.

I propose that category (1) again be written in all caps, but categories (2) and (3) not. I have not made any such changes yet.

So, what I'm looking for is feedback on these ideas. In the Terminology and Conventions section, I plan to explain these three categories of "requirements" and state that RFC2119 conventions apply only to the first category. Comments?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>