[DFDL-WG] RFC2119 keywords and capitalization e.g., must vs. MUST

15 Sep 2020

I am looking for guidance here while editing the DFDL spec.

An open topic has been the RFC2119 keywords. Many specs just refer to these
and say "we use RFC2119 conventions."

We can't do that. We use keywords MUST, SHOULD, SHALL, etc. in 3 ways.

1) Discussing requirements/desires for all implementations - e.g., all
implementations MUST detect schema definition errors.

2) Discussing "requirements" a DFDL schema author must ensure are followed
in order to avoid getting an SDE. E.g., "The DFDL fillByte property must be
a single byte or single character."  Many times we follow this sort of
requirement by "and it is an SDE otherwise." It is implied that
implementations must detect this SDE and provide a diagnostic to the user.

3) Discussing "requirements" that data must conform to, in order for the
data to conform to a DFDL schema. E.g., "The representation must be
followed by the terminating delimiter." The assumption in these cases is
that it is a processing error if the terminator is not found. It is the way
these runtime behaviors are expressed that is of note. They are not phrased
in terms of how implementations must behave, but rather in terms of what
users should expect. My use of the word "should" here is intentional - we
do in fact use "should" in many cases to imply required behavior, not
suggested behavior.

These different categories of requirement are normal in the description of
a language standard. Category (3) is to some degree specific to a
description language, one that describes other formal objects that may or
may not in fact correspond to the description. E.g., there are unit test
frameworks for code where one writes things like "add(1, 1) should equal
2", where "should" means the same thing as "shall" or "must".
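As a rough illustration of that unit-test style, here is a minimal Python
sketch. The add function and the should_equal helper are hypothetical names
made up for this example, not taken from any particular framework. The
point is only that a failed "should" is a hard failure, not a suggestion.

```python
def add(a, b):
    """Hypothetical function under test."""
    return a + b


def should_equal(actual, expected):
    """Hypothetical helper: 'should' here is mandatory, just like 'must'."""
    if actual != expected:
        raise AssertionError(f"expected {expected!r}, got {actual!r}")


# Reads as "add(1, 1) should equal 2"; any other result fails the test.
should_equal(add(1, 1), 2)
```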

Next: the term "may" is used to mean

(1) optional per RFC2119

(2) "allows" or "admits" when describing DFDL schemas. E.g., "If the nil
representation may not be zero length, but the empty representation is zero
length, then the absent representation cannot occur because zero-length
will be interpreted as the empty representation."

(3) discussing possible alternatives for data. E.g., "Encodings which are 5,
6, 7, and 9 bits wide are rare, but do exist, so the overall length of the
content region may not be a multiple of 8 bits wide."

My current working draft (r27, not yet on redmine, but soon) has put all
the category (1) usages in upper case (MUST, SHOULD, SHALL, MAY, etc.).

I have left all the category (2) and (3) as lower case, as there are many
of them, and they really do need to be distinguished from category (1).
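A minimal sketch of how keyword occurrences could be listed by case for
review, assuming the spec text is available as plain text. The
find_keywords function and the sample sentence are hypothetical. Deciding
which category a lowercase match belongs to would still be a manual
judgment.

```python
import re

# The RFC 2119 keywords; two-word forms come first so they match
# before their one-word prefixes.
KEYWORDS = [
    "must not", "shall not", "should not",
    "must", "shall", "should",
    "required", "recommended", "may", "optional",
]
PATTERN = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def find_keywords(text):
    """Return (matched_text, is_upper_case) for each keyword occurrence."""
    return [(m.group(0), m.group(0).isupper()) for m in PATTERN.finditer(text)]


sample = (
    "All implementations MUST detect schema definition errors. "
    "The fillByte property must be a single byte."
)
for word, is_upper in find_keywords(sample):
    print(word, "upper case" if is_upper else "lower case, needs review")
```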

We have to explain these 3 different kinds of usage if we're going to
reference RFC2119 at all. But I am seriously considering whether we should
just say we DON'T use RFC2119 conventions but rather explain these 3
categories of usage.

Next, the terms "required" and "optional" from RFC2119 are used

(1) about requirements that MUST be implemented vs. are OPTIONAL but
RECOMMENDED, and if implemented are REQUIRED to be implemented in a
conforming manner.

(2) about language features that are optional for use. E.g., the testKind
property of a dfdl:assert annotation is optional, and assumed to be
"expression", not "pattern" if not provided.

(3) as formal terms to discuss when default values are used or not for
array/optional elements.

I propose that category (1) above is again all caps, but categories (2) and
(3) are not.

I have not made any such changes as yet.

So, what I'm looking for is feedback on these ideas. My plan is to explain
these 3 categories of "requirements" in the section on Terminology and
Conventions, and to mention that the RFC2119 convention applies only to the
first category.

Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>