Re: [DFDL-WG] Action 14: Propose DFDL entity scheme v5

All

Latest proposal incorporating comments.

One question: should DFDL support standard XML entities? I have always assumed so, but it is not listed in supported XML schema functions.

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM   email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073   Fax: +44 (0)1962 816898

Ian W Parkinson/UK/IBM
22/01/2008 15:29
To: Alan Powell/UK/IBM@IBMGB
cc: DFDL-Technical-Core@IBMGB, dfdl-wg@ogf.org, "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme

Hi guys,

For reference, and from wikipedia (search "newline"):

"The Unicode standard addresses the problem by defining a large number of characters that conforming applications should recognize as line terminators:

LF: Line Feed, U+000A
CR: Carriage Return, U+000D
CR+LF: CR followed by LF, U+000D followed by U+000A
NEL: Next Line, U+0085
FF: Form Feed, U+000C
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029"

... so I guess, during parse, any of these sequences should match %NL; (maybe excluding FF and PS as being more significant than a single new line?). I agree with Mike, for unparse we'd presumably need a new property to specify this.

Again, from wikipedia, this time regarding whitespace:

"In Unicode (Unicode Character Database) the following codepoints are defined as whitespace:

U0009-U000D (Control characters, containing TAB, CR and LF)
U0020 SPACE
U0085 NEL
U00A0 NBSP
U1680 OGHAM SPACE MARK
U180E MONGOLIAN VOWEL SEPARATOR
U2000-U200A (different sorts of spaces)
U2028 LSP
U2029 PSP
U202F NARROW NBSP
U205F MEDIUM MATHEMATICAL SPACE
U3000 IDEOGRAPHIC SPACE"

....so presumably &WSP; would match any of these characters on parse. What should it generate on unparse?
Cheers,
Ian

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK

From: Alan Powell/UK/IBM@IBMGB
To: "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Cc: dfdl-wg@ogf.org, DFDL-Technical-Core%IBMGB@uk.ibm.com
Date: 22/01/2008 14:41
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme

Hi Mike

%NL; is a single character <LF> on those target platforms where that is the convention or <CR><LF> on others, etc. This is intended to make it easier for the same dfdl schema to parse messages from different platforms.

I know we avoided target platform in DFDL and was expecting that this would cause some debate. This will be a good discussion for tomorrow's call.

Alan Powell

"Mike Beckerle" <mbeckerle@OCO-INC.COM>
22/01/2008 13:29
To: Alan Powell/UK/IBM@IBMGB, <DFDL-Technical-Core%IBMGB@uk.ibm.com>, <dfdl-wg@ogf.org>
cc:
Subject: RE: [DFDL-WG] Action 14: Propose DFDL entity scheme

Is the &NL; supposed to represent a single character? Or can it be a CRLF?

There's no notion of "the target platform" in DFDL. We've specifically avoided this notion on purpose. So we need a separate property like newline="&CR;&LF;" or newline="&LF;" if we want &NL; to be meaningful, unless some other property is suitable.

There are some other Unicode whitespace and Unicode line-ending characters. Do we want to include those in the definitions of WSP and NL? I recall there are 4 line-endings in Unicode.
...mikeb

From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Alan Powell
Sent: Tuesday, January 22, 2008 6:32 AM
To: DFDL-Technical-Core%IBMGB@uk.ibm.com; dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme

All

Attached is the latest proposal for DFDL 'entities'.

The main changes are:
- No longer using XML entities as this proved to not meet all the requirements
- New generic mnemonics for <NL> and others to represent the NL on the target platform.

Alan Powell

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
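
Editorial sketch, not part of the original thread: the parse-side behaviour discussed above (any of the quoted Unicode line terminators matching the NL mnemonic, any of the quoted whitespace code points matching WSP) and Mike's suggestion of an explicit newline property for unparse could look roughly like this in Python. All names here (NL_SEQUENCES, WSP_PATTERN, split_on_nl, unparse_lines, the newline parameter) are hypothetical illustrations, not anything defined by DFDL or by the attached proposal.

```python
import re

# Line terminators from the Wikipedia "newline" list quoted in the thread.
# CR+LF comes first so the two-character sequence wins over a lone CR.
# The thread leaves open whether FF and PS should be excluded; this sketch keeps them.
NL_SEQUENCES = ["\r\n", "\n", "\r", "\u0085", "\u000c", "\u2028", "\u2029"]
NL_PATTERN = re.compile("|".join(re.escape(s) for s in NL_SEQUENCES))

# Whitespace code points from the Wikipedia list quoted in the thread.
WSP_PATTERN = re.compile(
    "[\u0009-\u000d\u0020\u0085\u00a0\u1680\u180e"
    "\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]"
)


def split_on_nl(data):
    """Parse side: treat any listed line terminator as a record boundary."""
    return NL_PATTERN.split(data)


def unparse_lines(records, newline="\n"):
    """Unparse side: the terminator is chosen explicitly by the caller
    (standing in for a newline property), never inferred from a platform."""
    return newline.join(records) + newline
```

The design point this illustrates is the asymmetry raised in the thread: parsing can accept a set of alternatives, but unparsing has to pick exactly one, so the choice needs to come from somewhere explicit.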