Every "8-bit-ascii" encoding I can find has holes in the code page, that is, values that don't have a corresponding character codepoint assigned.

Example: the iso-8859-X family is a popular set of 8-bit ascii-based encodings.

If you look up iso-8859-1, it has this language:

Code values 00–1F, 7F–9F are not assigned to characters by ISO/IEC 8859-1.

The lower range 20 to 7E (the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), ...

They're saying the printable part of 7-bit ascii is included, and other codes are defined above it, but the standard does not assign a character to every value: 00 to 1F and 7F to 9F are left unassigned.

So, to me, suggesting the use of any particular code page for this purpose is somewhat ambiguous. E.g., what does &#x01 mean in a string if the encoding is iso-8859-1? There appear to be translation tables, easy to find on the web, that map such values to unicode in standard ways. But the value itself doesn't have an assigned meaning in the iso-8859-X standards.
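To make the ambiguity concrete, here is a minimal Python sketch. It assumes Python's built-in codecs are representative of the translation tables one finds on the web; `find_holes` is a hypothetical helper name, and cp1252 is used only as an illustration of a table that has real holes.

```python
def find_holes(codec_name):
    """Return the byte values (0x00-0xFF) that the named codec refuses to decode."""
    holes = []
    for value in range(256):
        try:
            bytes([value]).decode(codec_name)
        except UnicodeDecodeError:
            holes.append(value)
    return holes

# Python's table for iso-8859-1 passes byte 0x01 through as the codepoint with
# the same number. That is a convention of the translation table, not an
# assignment made by ISO/IEC 8859-1 itself.
control_char = bytes([0x01]).decode("iso-8859-1")
print(hex(ord(control_char)))

# Other tables reject some byte values outright (illustration only).
print([hex(v) for v in find_holes("cp1252")])
```

So whether a given byte "has a meaning" depends on which table the implementation happens to use, not on the encoding name alone.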

Two possible clarifications:

1) for all ascii-based character sets, we say that bytes 0x00 to 0xFF each map to the ISO 10646 codepoint with the same numeric value for the infoset, and vice versa.

2) define dfdl:encoding="bytes" as a special character set name which has the above property.
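The property both options rely on can be sketched as follows. This is only an illustration in Python, not DFDL; `decode_bytes` and `encode_bytes` are hypothetical names, and rejecting codepoints above U+00FF on output is my assumption about how the "vice versa" direction would behave.

```python
def decode_bytes(data):
    """Map each byte value 0x00-0xFF to the codepoint with the same number."""
    return "".join(chr(b) for b in data)

def encode_bytes(text):
    """Inverse mapping; rejects any codepoint above U+00FF (an assumption)."""
    for ch in text:
        if ord(ch) > 0xFF:
            raise ValueError(f"U+{ord(ch):04X} has no single-byte representation")
    return bytes(ord(ch) for ch in text)

# Every one of the 256 byte values round-trips, so there are no holes.
all_values = bytes(range(256))
text = decode_bytes(all_values)
assert [ord(ch) for ch in text] == list(range(256))
assert encode_bytes(text) == all_values
```

With a mapping like this there is nothing left for a code page to leave undefined, which is the point of both clarifications.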

Personally, I prefer 2. It is simpler to explain what is going on, and when people are depending on bytes it will be clearer that they are.

...mike