
Next question: What should happen if a regular expression uses a Unicode block or category to match against a string in a non-Unicode encoding? Should that be a Schema Definition Error, or should the regular expression just silently fail to match anything?

I would prefer a Schema Definition Error, even though detecting such a condition would be difficult. For example:

    <?xml version="1.0" encoding="UTF-8"?>
    ...
    <xs:element name="foo" type="xs:string" dfdl:encoding="Shift_JIS">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert testKind="pattern" testPattern="\p{InGreek}"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
    ...

Here \p{InGreek} matches a character in the Greek Unicode block, but such a block is incongruent with the Shift_JIS encoding. More generally, no Unicode block or category makes sense against a non-Unicode encoding. For example, \p{Lu} matches a character in the Unicode uppercase-letter category, but the set of characters in that category cannot easily be compared with a Shift_JIS encoding.

Thanks in advance,

--
Jonathan W. Cranford
Senior Information Systems Engineer
The MITRE Corporation (http://www.mitre.org)