The following describes the conventions for the various types of source files.
There is great advantage to be had by obeying certain source code conventions.
Among the benefits of these source conventions are ease of reading and of editing,
the ability to determine the source of current content in the standard, and the enhanced
ease with which multiple editors can work on the same documents.
Editors are requested to abide by these conventions and to correct any deviations from them
that they find.
Source lines should not be longer than 100 characters.
Explicit line breaks should be used to maintain this.
However, there are exceptions in XML Source Files and XSL Source Files.
See the relevant sections for details.
The line break sequence should be CR LF,
that is, the Unicode characters \u000D and \u000A.
This is known as the "Windows line ending convention".
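As an illustration of the CR LF rule, the following minimal Python sketch reports lines that end in a bare LF or a bare CR.
It is not part of the project tooling; the function name lines_without_crlf is hypothetical, and the input is assumed to be the raw bytes of a source file.

```python
def lines_without_crlf(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines not terminated by CR LF.

    Hypothetical helper: a line counts as non-conforming when it ends
    in a bare LF (0x0A) or a bare CR (0x0D).
    """
    bad = []
    line_no = 1
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == 0x0D:  # CR
            if i + 1 < n and data[i + 1] == 0x0A:
                i += 2  # well-formed CR LF pair
            else:
                bad.append(line_no)  # bare CR
                i += 1
            line_no += 1
        elif byte == 0x0A:  # LF without a preceding CR
            bad.append(line_no)
            line_no += 1
            i += 1
        else:
            i += 1
    return bad
```

A file that conforms to the convention yields an empty list.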
Source lines should not be longer than 100 characters.
Explicit line breaks should be used to maintain this.
An exception to this rule is text within the tags:
<code>
... </code>
<hostcode>
... </hostcode>
<xmlcode>
... </xmlcode>
<schemacode>
... </schemacode>
In this text spaces and newlines are significant and must be preserved.
Thus lines which must appear as one in the output must be kept on a single line.
Another exception is text within the <URI> ... </URI> tag.
A URI may not contain spaces and so, in exceptional cases, the contained text may exceed
the 100 character limit.
On very rare occasions this rule may also need to be broken.
In these cases the source which does not conform must be preceded by the line:
<!-- LINE LENGTH CHECK OFF -->
and followed by the line:
<!-- LINE LENGTH CHECK ON -->
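The 100 character rule and its escape markers lend themselves to a mechanical check.
The following is a minimal Python sketch under stated assumptions, not the project's actual line length checker: the function name overlong_lines is hypothetical, the markers are assumed to stand alone on their lines, and the exemptions for <code>, <URI> and similar tags are deliberately not modelled.

```python
MAX_LENGTH = 100
CHECK_OFF = "<!-- LINE LENGTH CHECK OFF -->"
CHECK_ON = "<!-- LINE LENGTH CHECK ON -->"


def overlong_lines(text: str, limit: int = MAX_LENGTH) -> list[int]:
    """Return 1-based numbers of lines longer than `limit` characters.

    Lines between a CHECK OFF marker and the next CHECK ON marker are
    skipped. The marker lines themselves are never reported.
    """
    offenders = []
    checking = True
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped == CHECK_OFF:
            checking = False
        elif stripped == CHECK_ON:
            checking = True
        elif checking and len(line) > limit:
            offenders.append(number)
    return offenders
```

A line of exactly 100 characters passes; only lines of 101 characters or more outside a marked region are reported.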
Indentation of child elements is not used.
Partially because of the potentially arbitrarily deep nesting that might occur,
and partially because it avoids the problem of the use of spaces vs tabs.
New sentences should begin a new line.
The use of the ' and " characters in text is discouraged as it sometimes leads to corruption when
source is moved between different systems.
Simple quoted text should normally use the <quote> ... </quote> tags that give curly double
quotes, but the <squote> ... </squote> tags are also available for single curly quotes
(useful inside curly double quoted text).
To represent character strings, the tags <string> ... </string> are available for marking
strings in languages that use single quotes, and the tags <stringd> ... </stringd> are available
for marking strings in languages that use double quotes.
To represent delimited identifiers, the tags <delimId> ... </delimId> are available.
For a punctuation apostrophe (e.g. as a possessive) the tag <apos/> should be used and, if
a prime is needed, use the <prime/> tag.
There are a couple of exceptions:
In <comment>, <bar>, and <mergeInstr> elements the above markup is not available and
so the ' character is acceptable.
In <lblitem> elements, if the label includes " then the @label attribute must use the '
character as its delimiter.
The tag <ellipsis/> should be used in preference to three full stops (...).
Two blank lines should precede every <clause> element, every <annex> element, every
<subClause*> element, every <foreword> element, every <intro> element, and every
member of the %subSection.class; parameter entity (such as Format or SyntaxRules).
One blank line precedes every <para> element, every <note> element, every <*list> element,
every <item> element, and every other member of the %block.class; parameter entity.
The only exception to this rule is that each such element might be immediately preceded by
one or more <comment> elements, in which case the one blank line precedes the first such
<comment> element, and the <comment> elements are themselves not separated from the
immediately following member of the %block.class; parameter entity.
Every paragraph, list item, note, editor's note, Clause/subClause, etc. should be preceded by
one or more comments.
These comments document which proposal, email or other message, or formal ballot comment caused
the text of those items to come into being or to reach their current wording.
These comments should be as close to the actual change as possible.
New <comment> elements should use the convention that their content follows one of the
following formats:
Editorial: <Name>, <Date in ISO 8601> <Optional Description>
Editorial: <Name>, <Date in ISO 8601> <Paper number at the time of resolution>
<Ballot comment number> <Optional Description>
Email from: <Name>, <Date in ISO 8601> <HHMM> <Optional Description> <Optional Paper number>
Message from: <Name>, <Date in ISO 8601> <HHMM> <Optional Description>
<Optional Paper number>
<Paper number> <Ballot comment number> <Optional Description>
<Paper number> <Optional Description>
<Explanatory comment regarding the XML document itself unrelated to semantical changes>
<comment>s documenting additional changes caused by an Email or similar message that is related
to the application of a concrete paper should always end with the paper number of that paper.
<comment>s documenting changes caused by an Email or similar message need to be backed by a
stored copy of that Email or message in Correspondence/ according to the rules laid out in
Correspondence.
Paper numbers should always start with a leading prefix that ends in a colon (such as WG3:)
that identifies the type of paper.
<comment> elements should be arranged in ascending chronological order.
Every item, para, note, and ednote should have the initial text on the same line and have the
end tag on a separate line.
For example:
<item><specref ref="gql_refer"/>, identifies additional standards that, through reference
in this document, constitute provisions of this document.
</item>
This allows the element to be collapsed in an XML editor while still showing enough content to
allow it to be identified.
This is equivalent to the practice of quoting the initial text of rules in change proposals to
allow a check on the correct location.
An exception to this is the </defn> tag that must directly follow the last non-space character
in the <defn>.
A further exception is <tableTitle> which must have no space or newline directly after it and
</tableTitle> which must have no space or newline directly before it.
Id attributes for Clause, Subclause, paragraph, item, etc., excluding PPs and LOs (see later),
are constructed as follows:
An id attribute shall not contain a "-"; any "-" characters in names shall be replaced by "_".
The id attribute of a Clause or Subclause that is not a modification of a corresponding
Clause or Subclause in some other part of the standard will be <part>_<tag>, where:
<part> is the value of the "Id code" for the standard/part (see IdCodes).
<tag> is a shortish sequence of characters with some sort of relationship to the Clause
or Subclause’s title, giving some clue about what the Clause or Subclause is when referenced
by a use of the id in a "ref" attribute.
For example, “fnd_names” is the id associated with Subclause 5.4, “Names and identifiers”,
in SQL/Foundation.
The id attribute of a Clause or Subclause that is a modification of a corresponding
Clause or Subclause in some other part of the standard will be <part>_<tag>, where:
<part> is the value of the "Id code" for the standard/part (see IdCodes).
<tag> is the shortish sequence of characters following the underscore in the id of the Clause
or Subclause that is being modified.
For example, “xml_names” is the id associated with Subclause 5.2, “Names and identifiers”,
in SQL/XML; that Subclause modifies Subclause 5.4, “Names and identifiers”, in SQL/Foundation,
so the part of the id following “xml_” in SQL/XML is identical to the part of the corresponding
id following “fnd_” in SQL/Foundation.
The id attribute of a Rule, Description, paragraph, etc. in one part that is modified by a
Rule, Description, paragraph, etc. in some other part will be
<part>_<tag>_<code>_<tag2>, where:
<part> is the value of the "Id code" for the standard/part in which it is specified
(see IdCodes).
<tag> is the <tag> of the id of the containing Clause or Subclause.
<tag2> is a shortish sequence of characters that captures some aspect of the Rule,
Description, paragraph, etc., giving some clue about the content and/or meaning of the
Rule, Description, etc. being referenced by a use of the id in a "ref" attribute.
For example, “fnd_names_SR_localname” is the id associated with Syntax Rule 4)
(“If a <local or schema qualifier>, then”) in Subclause 5.4, “Names and identifiers”,
in SQL/Foundation.
Note that only list items that are contained in ordered lists can be referenced symbolically;
list items contained in unordered (e.g., bulleted) lists cannot be referenced symbolically.
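The id construction rules above can be sketched in Python as follows.
This is an illustration only, not project tooling: all function names are hypothetical, and the "data-types" tag used in the usage note is an invented example of a name containing a "-".

```python
def normalise(name: str) -> str:
    """Replace "-" by "_", since an id attribute shall not contain "-"."""
    return name.replace("-", "_")


def clause_id(part: str, tag: str) -> str:
    """Build <part>_<tag> for a Clause or Subclause."""
    return f"{normalise(part)}_{normalise(tag)}"


def modifying_clause_id(part: str, modified_id: str) -> str:
    """Build the id of a Clause or Subclause that modifies another one.

    The tag is whatever follows the first underscore of the modified id.
    """
    _, _, tag = modified_id.partition("_")
    return clause_id(part, tag)


def rule_id(part: str, tag: str, code: str, tag2: str) -> str:
    """Build <part>_<tag>_<code>_<tag2> for a Rule, Description, etc."""
    return "_".join(normalise(piece) for piece in (part, tag, code, tag2))
```

With these helpers, clause_id("fnd", "names") gives "fnd_names", modifying_clause_id("xml", "fnd_names") gives "xml_names", and rule_id("fnd", "names", "SR", "localname") gives "fnd_names_SR_localname", matching the examples in the text.
The invented call clause_id("fnd", "data-types") gives "fnd_data_types".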
Clauses/subClauses in parts that modify other parts.
Agreements with ISO/CS require that each such clause/subclause must contain a <modifiesPart>
tag (e.g. <modifiesPart part="02" ref="fnd_tables"/>) that should be the first item after
<bodyMatter> in the clause or subclause.
Use of the newpage attribute in the <subClause> element.
Subclauses in Clauses prior to the first Clause defining syntax, in the Clause named
"Conformance", and in the Annexes are NEVER coded with newpage="true".
Subclauses subordinate to a <subClause> element, e.g. <subClause2>, are (in principle) never
coded with newpage="true" even though the DTD allows it.
In Clauses where <subClause> elements with newpage="true" are permitted, the first subClause
in a Clause is NEVER coded with newpage="true", but all subsequent <subClause> elements in the
Clause are ALWAYS coded with newpage="true".
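The newpage rule for the top-level <subClause> elements of a single Clause can be expressed as a small Python sketch.
The function name newpage_flags and its parameters are hypothetical; whether newpage="true" is permitted in a given Clause is assumed to have been decided by the caller according to the rules above.

```python
def newpage_flags(subclause_count: int, newpage_permitted: bool) -> list[bool]:
    """Return, per <subClause> of one Clause, whether newpage="true" is coded.

    Where newpage is permitted, the first subClause is never flagged and
    every subsequent one always is; elsewhere nothing is flagged.
    """
    return [newpage_permitted and index > 0 for index in range(subclause_count)]
```

For a Clause with three subClauses where newpage is permitted, this gives [False, True, True].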
Merge instructions should be carefully crafted to be as future proof as possible.
This generally means that any rule being inserted is inserted before the rule that is to
follow it.
This avoids unfortunate situations such as “Insert after GR 5)f)iii)” (assume that the existing
rule 5 ends with sub-rule f, which in turn ends with sub-rule iii) when the inserted rule is not
intended to become GR 5)f)iv) or even GR 5)g), but to become GR 6) instead; in this case, the
instruction should read “Insert before GR 6)” (or “Insert after the last GR”).
However, if the inserted rule is to become GR 5)f)iv), then the proper formulation is
“Insert after GR 5)f)iii)”.
A more difficult situation occurs when the inserted rule is intended to become GR 5)g), in which
case the wording is probably most appropriately “Insert after GR 5)f)”.
When in doubt, consult the other editors for guidance.
Possible Problems and Language Opportunities
In the Editor’s Notes for each standard or part thereof, near the top of the .xml file, there is
a comment containing a list of Possible Problems and Language Opportunities that have not yet
been resolved for that part.
At the end of that list, there is a comment that identifies the greatest PP or LO number so far
assigned; refer to this number as GN.
GN is three characters long; if the number of PPs and LOs exceeds 999, then use uppercase Latin
letters (e.g., A12).
When a new PP or LO is added to the Editor’s Notes, the value of GN must be incremented by one.
Following the comment containing the number GN is a sequence of five templates that should be
used for creating the new PP or LO.
Select the appropriate template (Major Technical, Minor Technical, Major Editorial,
Minor Editorial, or Language Opportunity) and copy it to follow the last open PP or LO,
i.e. immediately prior to the "Placed closed PPs and LOs after this line" comment.
The starting tag for a Possible Problem takes four mandatory attributes, “id”, “number”,
“severity” and “realm” (the “severity” and “realm” attributes have already been given values in
the template that was copied).
The starting tag for a Language Opportunity takes only the “id” and “number” mandatory
attributes.
The id attribute will be PP<part><GN>, where:
<part> is the value of the "Id code" for the standard/part (see IdCodes).
<GN> is the new value of GN just computed.
The number attribute will be <part>-<GN>, where:
<part> is the value of the "PP/LO code" for the standard/part (see IdCodes).
<GN> is the new value of GN just computed.
When a PP or LO is resolved it should be preceded by a <comment> of the form "Resolved by ..."
and moved to a place after the "Placed closed PPs and LOs after this line" comment, preferably
in order of the number attribute.
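The construction of the id and number attributes of a Possible Problem can be sketched in Python.
Several things here are assumptions rather than documented behaviour: GN is assumed to be zero-padded to three characters, and beyond 999 the leading character is assumed to continue from 9 to A, B, and so on (so that 1012 becomes A12).
The function names and the code values "fnd" and "FND" are hypothetical; the real codes are listed in IdCodes.

```python
import string

# Assumed ordering of the leading character: 0-9 followed by A-Z.
LEADING = string.digits + string.ascii_uppercase


def format_gn(value: int) -> str:
    """Render GN as three characters under the assumed encoding."""
    lead, rest = divmod(value, 100)
    if not 0 <= lead < len(LEADING):
        raise ValueError(f"GN {value} cannot be written in three characters")
    return f"{LEADING[lead]}{rest:02d}"


def pp_id(id_code: str, gn: int) -> str:
    """Build the id attribute PP<part><GN>, using the "Id code"."""
    return f"PP{id_code}{format_gn(gn)}"


def pp_number(pp_lo_code: str, gn: int) -> str:
    """Build the number attribute <part>-<GN>, using the "PP/LO code"."""
    return f"{pp_lo_code}-{format_gn(gn)}"
```

Under these assumptions, incrementing a GN of 42 and calling pp_id("fnd", 43) gives "PPfnd043", and pp_number("FND", 43) gives "FND-043".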
Note: PPs and LOs are referenced by using the <PPref> element, whose “ref” attribute
is the value of the “id” attribute of the <PP> or <LO> element.
Normally these occur in sentences such as "See
<kw> tags should only be applied to words that occur in the list of <key word>s (and,
specifically, to words that are actually used in SQL grammar) and not to other objects such as
column names.
There are many column names, field names, and other object names that are spelled identically to
various keywords, but those must not be marked using <kw>.
Examples of things that must not be marked using
Dashes are used in several ways in WG 3 documents.
- (called hyphen-minus) is used as a hyphen to separate compound words, such as off-site.
− (minus; the Unicode character U+2212).
NOTE: Until 2021-08-06, the character entity – (en dash; the Unicode character U+2013) was used
for this purpose, but it was changed after agreement with ISO Central Secretariat.
&fdash; (figure dash; the Unicode character U+2012).
From-to text (such as Average values are 0 - 10 or The next meeting will be July 10 - July 15)
is represented by the character entity – (the Unicode character U+2013).
— (the Unicode character U+2014) is used to separate distinct phrases within sentences.
The grammar of most programming languages distinguishes between "keywords"
(words used in the language to have specific meaning in that
language) and "identifiers" (words used in code written in
that language that have no inherent meaning to the language
itself).
Identifiers are normally used to identify objects
created by application programs, such as data structures and
their components.
The grammar of many languages is defined such that some
keywords must be used only for the purpose defined by the
language itself; that is, no identifier can be spelled
exactly the same as those keywords.
Such keywords are commonly called "reserved keywords" or "reserved words".
Other keywords cannot be confused in their programming
language context with identifiers; those keywords are
commonly called "unreserved words" or "non-reserved words".
It is frequently difficult for a human being examining the grammar of a programming language to
determine whether a particular keyword must be reserved or can be non-reserved.
A parser for that language is often used to automatically determine which keywords belong in
which class.
Code examples (e.g. in rules defining syntactic substitution) must always be expressed in the following form:
<code>
blah, blah, blah
</code>
Change bars.
Change bars should be added before (<bar/>) and after (<endbar/>) every textual change,
except for extremely trivial editorial changes such as the correction of spelling mistakes.
These tags may incorporate explanatory text such as the paper number that caused the change.
Deletions of text should be marked with a <delbar/>, which requires explanatory text.
All change bars should be deleted whenever a new version of the document is published.
Additional conventions for the actual text are to be found in WritingConventions.
Vector graphic files should be in SVG format; other graphic files should use the PNG format.
Source lines should not be longer than 100 characters.
Explicit line breaks should be used to maintain this.
On very rare occasions this rule may need to be broken.
In these cases the source which does not conform must be preceded by the line:
<!-- LINE LENGTH CHECK OFF -->
and followed by the line:
<!-- LINE LENGTH CHECK ON -->
Code should not be disabled (commented out) unless accompanied by a comment stating under what circumstances it is to be re-enabled.
A change log should not be maintained in the document.
The Git commit messages are sufficient.
Tags included within other tags should be indented with 2 spaces (not with tabs),
except for those immediately included within the <xsl:stylesheet> tag, and unless both
the included start tag and end tag can be included on the same line.
For example:
<xsl:choose>
  <xsl:when test="self::clause">
    <xsl:text>Clause </xsl:text>
  </xsl:when>
  <xsl:when test="self::annex">
    <xsl:text>Annex </xsl:text>
  </xsl:when>
</xsl:choose>
Tags with attributes that do not fit on one line should have those attributes start on another
line but aligned with the first attribute.
For example:
<fo:block font="italic 9pt Cambria" text-align="start"
          space-before.optimum="12pt" space-before.precedence="force">
Two blank lines should separate each template.
Two blank lines should precede all entity definitions.
Parameter entities should be defined first and include the string ".class" in the name.
Every parameter entity should be preceded by a general description and a description of the entities that it incorporates in the form shown by the following example:
<!-- comments.class is the class comprising comments and the three change bar elements
* comment -
* bar defines the start of a change bar
* endbar defines the end of a change bar
* delbar marks the place where text has been deleted
-->
<!ENTITY % comments.class "comment | bar | endbar | delbar">
A change log should not be maintained in the document.
The Git commit messages are sufficient.
All documentation shall be written in Markdown format (*.md files).
Do not use `<tag>` where "tag" is code, hostcode, schemacode, mono or URI as this confuses
the line length checker.
Instead use the form <tag>.
See Markdown for help on Markdown syntax.
The documentation generation pipeline supports the generation of mermaid diagrams.
Mermaid is a popular markup syntax for specifying various kinds of diagrams, flowcharts, and
the like.
To add another diagram, create a mermaid file with the extension .mmd in doc/ and
update the Makefile accordingly.
This will make sure that the doc build renders your diagram to SVG.
Generated mermaid diagrams may be included in the documentation using the regular markdown
syntax for embedding images.
See Mermaid for help on Mermaid syntax.
This file: Copyright © 2021 Editors of ISO/IEC JTC 1/SC 32/WG 3: Jim Melton, Stephen Cannan, Jörn Bartels, Stefan Plantikow