Future feature? allow pattern facet on numbers when textNumberRep="standard" and representation="text"

It has come up often now that DFDL cannot be strict enough about text number formats, because our ICU-based textNumberPattern isn't strict enough, or expressive enough, about subtle syntax variations.

I suggest this could be fixed by just allowing the XSD pattern facet to be used on numeric types when they are known to be textual and standard (not zoned).

For example, dfdl:textNumberPattern="00.####" will allow the number "99." to be accepted. There's currently no way to say "when it's an integer, there cannot be a decimal point".

People are resistant to the notion that this requires a complex type with a bunch of different elements with different textNumberFormats, so that you have an '<int>99</int>' or '<dec>99.9</dec>' element. They really don't want there to be different paths to this value in the infoset just because of this format issue about the decimal point. It's a painful loss of polymorphism in these path expressions. Instead of a simple path expression to obtain such a value you end up with

if (fn:exists(path/int)) then path/int else path/dec

Note that DFDL's expression language has no let statement, so in the above, if "path" is actually "a/b/c/d/e/f/g", i.e., a typical deep path (which commonly has much longer path steps than my single letters), then that path is going to be repeated 3 times in the expression. This is pretty unpleasant.

Rather than come up with a bunch of ICU mods to tighten up all the places where it is lax, and to add features for suppressed decimal points, etc., we could just allow the pattern facet on textual numbers.

Then the pattern facet could be "\d\d|\d\d\.\d{1,4}", which would achieve the goal of enforcing the precise pattern desired if you validate after parsing and before unparsing. It would not prevent conversion of the text to the corresponding numeric type, but it would allow an additional, tighter check on what the text was.

Regular XML Schema allows the pattern facet on all the numeric types, so we would be eliminating what is currently a DFDL restriction, but only when the numeric types have standard text representation.

Thoughts?

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
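As a rough illustration of the facet value suggested in the message above, here is a minimal Python sketch. It is not Daffodil code: Python's re.fullmatch is used as a stand-in for XSD pattern matching (an XSD pattern has to match the whole lexical value), and the function name satisfies_facet and the sample strings are assumptions made for the example.

```python
import re

# The facet value suggested in the message above.
PROPOSED_FACET = r"\d\d|\d\d\.\d{1,4}"


def satisfies_facet(text: str) -> bool:
    """Return True if the whole text matches the proposed pattern facet.

    re.fullmatch approximates XSD pattern semantics, where the pattern
    must cover the entire lexical value rather than a substring.
    """
    return re.fullmatch(PROPOSED_FACET, text) is not None


# Print the verdict for a few sample lexical forms.
for sample in ["99", "99.9", "99.1234", "99.", "9", "99.12345"]:
    print(f"{sample!r}: {satisfies_facet(sample)}")
```

The form "99." that textNumberPattern="00.####" is reported to accept is rejected here, while "99" and "99.9" both pass, which is the distinction the facet is meant to draw.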

We've handled a really wide range of floats and integers in X3D graphics models, and have found that xsd:schema types are very useful. Unusual edge cases (for advanced error detection) can be handled with patterns (in the case of X3D, we have regex).

Only limitation with this approach is that you typically have to pick one or the other, since regex within XSD Schema only applies to xs:string types. Sometimes using xs:schema as primary with separate regex evaluation is useful in a tool. You may have more flexibility about hybrid approaches in DFDL.

* X3D Regular Expressions (regexes)
  * X3D Regular Expressions (regexes) are used to validate the correctness of string and numeric array values in an X3D scene.
  * https://www.web3d.org/specifications/X3dRegularExpressions.html

Opinion: the worst errors are the ones that remain undetected.

Season's Greetings! 8)

all the best, Don
--
Don Brutzman Naval Postgraduate School, Code USW/Br brutzman@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, navy robotics https://faculty.nps.edu/brutzman

From: dfdl-wg <dfdl-wg-bounces@lists.ogf.org> On Behalf Of Mike Beckerle
Sent: Tuesday, December 19, 2023 1:56 PM
To: DFDL-WG <dfdl-wg@ogf.org>
Subject: [DFDL-WG] Future feature? allow pattern facet on numbers when textNumberRep="standard" and representation="text"
> It has come up often now that DFDL cannot be strict enough about text number formats [...]

Unless I am misunderstanding (quite possible!), I don't think the proposal will work, because by the time validation is applied, the DFDL parser will be using the logical value from the infoset, and not the original lexical representation. That's why spec section 5.3.4 Pattern has this:

"Note: in XSD, pattern is about the lexical representation of the data, and since all is text there, everything has a lexical representation. In DFDL only strings are guaranteed to have a lexical and logical value that is identical."

I'd prefer exploring whether the dfdl:textNumberPattern property could be a list. So for your example, the pattern would be "00 00.0###". On parsing the patterns are tried in order. On unparsing the same, and it's only an error if none work. My example uses space as list item separator; I think that works, as I don't think a space character is allowed as part of a number pattern.

Is it possible to prototype this in Apache Daffodil to see whether ICU fails when we think it should do?

Regards
Steve

On Wed, Dec 20, 2023 at 4:18 AM Brutzman, Donald (Don) (CIV) <brutzman@nps.edu> wrote:
> We've handled a really wide range of floats and integers in X3D graphics models [...]
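The "try the patterns in order" idea from Steve's message can be sketched roughly as follows. This is only a model of the control flow, not ICU or Daffodil behavior: the STRICT_FORMS table pairs each pattern from the example list "00 00.0###" with a hand-written regular expression that expresses the intended strictness, and both that table and the function name parse_with_pattern_list are hypothetical. Whether ICU itself would reject "99." under these patterns is exactly the open question the message raises.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical stand-ins: each pattern in the proposed list is paired
# with a strict regex modelling what the schema author intends.
# Real ICU parsing may be laxer than these regexes.
STRICT_FORMS = {
    "00": re.compile(r"\d\d"),
    "00.0###": re.compile(r"\d\d\.\d{1,4}"),
}


def parse_with_pattern_list(text: str, pattern_list: str) -> Optional[Decimal]:
    """Try each space-separated pattern in order and use the first that fits.

    Returns None when no pattern in the list accepts the text, which is
    the only case the proposal would treat as an error.
    """
    for pattern in pattern_list.split(" "):
        if STRICT_FORMS[pattern].fullmatch(text):
            return Decimal(text)
    return None


print(parse_with_pattern_list("99", "00 00.0###"))
print(parse_with_pattern_list("99.9", "00 00.0###"))
print(parse_with_pattern_list("99.", "00 00.0###"))
```

Under these assumptions both "99" and "99.9" arrive at the same infoset element, so no extra int/dec elements are needed, and "99." fails because neither list entry accepts it.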

Don said "...regex within XSD Schema only applies to xs:string types."

I checked P. Walmsley's book "Definitive XML Schema" (which is my Bible) and it says pattern facets are allowed on all the string, numeric, and date/time types.

To be clear, this is not allowed in DFDL v1.0 (because binary numbers are not text, so pattern facets wouldn't make sense in those cases), but pattern facets are allowed in XSD 1.0 because, well, everything is text in XML.

Am I misunderstanding something?

On Tue, Dec 19, 2023 at 11:18 PM Brutzman, Donald (Don) (CIV) <brutzman@nps.edu> wrote:
> We've handled a really wide range of floats and integers in X3D graphics models [...]

No, I think what you say is correct. My point is that DFDL 1.0 does not just disallow pattern facets on numbers because some elements might have binary rep, but also because, even if the rep is text, the original text is not in the infoset, and therefore not available at validation time, which is when the facet is applied.

On Wed, Dec 20, 2023 at 2:24 PM Mike Beckerle <mbeckerle@apache.org> wrote:
> Don said "...regex within XSD Schema only applies to xs:string types." [...]

--
Regards
Steve
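The objection in Steve's message can be made concrete with a small Python sketch. It assumes a validator that sees only the logical value and has to turn it back into text before applying the pattern. Python's Decimal and str() stand in for that conversion here; they are not how Daffodil renders infoset values, and the function names are made up for the example.

```python
import re
from decimal import Decimal

# The facet value from the original proposal.
PROPOSED_FACET = r"\d\d|\d\d\.\d{1,4}"


def facet_ok(text: str) -> bool:
    """Apply the pattern to a piece of text, matching the whole string."""
    return re.fullmatch(PROPOSED_FACET, text) is not None


def validate_logical_value(original_text: str) -> bool:
    """Model validation that can only see the infoset's logical value."""
    logical = Decimal(original_text)  # the parser converts text to a number
    infoset_text = str(logical)       # the validator re-derives text from it
    return facet_ok(infoset_text)


print(facet_ok("99."))                # pattern applied to the original text
print(validate_logical_value("99."))  # pattern applied after the round trip
```

Applied to the original text, the pattern rejects "99."; applied after the round trip through a number it passes, because the trailing decimal point is no longer there to check. Making the facet useful would therefore mean keeping the lexical form available at validation time.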

Agreed that DFDL being able to have multiple forms of value validation seems like a good idea, as indicated previously. Agreed Priscilla’s work is always great.

Looking at the authoritative reference for XML Schema:

* W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes
  * W3C Recommendation 5 April 2012
  * https://www.w3.org/TR/xmlschema11-2
* Appendix G. Regular expressions
  * https://www.w3.org/TR/xmlschema11-2/#regexs
  * “A ·regular expression· R is a sequence of characters that denote a set of strings L(R). When used to constrain a ·lexical space·, a regular expression R asserts that only strings in L(R) are valid ·literals· for values of that type.”

For a few XML schema-capable tools, have found that either the xsd datatype expressions or the regex pattern (but not both) are supported at validation time. YMMV.

* X3D 4.0 XML Schema
  * http://www.web3d.org/specifications/x3d-4.0.xsd
  * excerpt:

    <xs:simpleType name="SFFloat">
        <xs:annotation>
            <xs:appinfo>
                <xs:attribute name="defaultValue" type="SFFloat" default="0.0"/>
                <!-- https://stackoverflow.com/questions/10516967/regexp-for-a-double -->
                <xs:pattern value="\s*([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*"/>
                SFFloat is a single-precision floating-point type.
            </xs:appinfo>
            <xs:documentation source="https://www.web3d.org/specifications/X3Dv4Draft/ISO-IEC19775-1v4-IS.proof/Part01/fieldsDef.html#SFFloatAndMFFloat"/>
        </xs:annotation>
        <xs:restriction base="xs:float"/>
    </xs:simpleType>

Meanwhile, just tested with XMLSpy: defining both at once is allowed as a valid XML schema construct.

    <xs:restriction base="xs:float">
        <xs:pattern value="\s*([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*"/>
    </xs:restriction>

The detailed requirements for how XML Schema validation can support multiple requirements are provided in the following sections.

* 4.1.4 Simple Type Definition Validation Rules
  * https://www.w3.org/TR/xmlschema11-2/#defn-validation-rules
* 4.1.5 Constraints on Simple Type Definition Schema Components
  * https://www.w3.org/TR/xmlschema11-2/#defn-coss

all the best, Don
--
Don Brutzman Naval Postgraduate School, Code USW/Br brutzman@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, navy robotics https://faculty.nps.edu/brutzman

From: Steve Hanson <smhdfdl@gmail.com>
Sent: Wednesday, December 20, 2023 8:49 AM
To: mbeckerle@apache.org
Cc: Brutzman, Donald (Don) (CIV) <brutzman@nps.edu>; DFDL-WG <dfdl-wg@ogf.org>
Subject: Re: [DFDL-WG] Future feature? allow pattern facet on numbers when textNumberRep="standard" and representation="text"
> No, I think what you say is correct. [...]
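To see how a pattern and a datatype check can disagree on the same text, the SFFloat pattern quoted in Don's message can be run next to a plain float conversion. This is a sketch under stated assumptions: Python's re module is not an XSD regex engine and float() is not xs:float, so both are approximations, and the helper name check is made up for the example.

```python
import re
from typing import Tuple

# Pattern copied from the SFFloat excerpt in the message above.
SFFLOAT_PATTERN = (
    r"\s*([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*"
)


def check(text: str) -> Tuple[bool, bool]:
    """Return (matches the SFFloat pattern, converts with float())."""
    matches = re.fullmatch(SFFLOAT_PATTERN, text) is not None
    try:
        float(text)
        converts = True
    except ValueError:
        converts = False
    return matches, converts


for sample in ["0.0", " -3.5E-2 ", "01.5", "99.", "1e"]:
    print(f"{sample!r}: {check(sample)}")
```

In this sketch "01.5" converts as a float but fails the pattern because of its leading zero, so the pattern adds a check the datatype alone does not make. The same pattern accepts "99.", so a schema that wants to exclude a trailing decimal point would need a different regex.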
participants (3)
- Brutzman, Donald (Don) (CIV)
- Mike Beckerle
- Steve Hanson