All
Latest proposal incorporating comments
One question: should DFDL support the standard XML entities? I have always assumed so, but they are not listed among the supported XML Schema functions.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
From: Ian W Parkinson/UK/IBM
Date: 22/01/2008 15:29
To: Alan Powell/UK/IBM@IBMGB
cc: DFDL-Technical-Core@IBMGB, dfdl-wg@ogf.org, "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme
Hi guys,
For reference, from Wikipedia (search for "newline"):
"The Unicode standard addresses the problem by defining a large number of characters that conforming applications should recognize as line terminators:
LF: Line Feed, U+000A
CR: Carriage Return, U+000D
CR+LF: CR followed by LF, U+000D followed by U+000A
NEL: Next Line, U+0085
FF: Form Feed, U+000C
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029"
...so I guess that, during parse, any of these sequences should match %NL; (perhaps excluding FF and PS, as they are more significant than a single newline?). I agree with Mike that for unparse we would presumably need a new property to specify this.
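A minimal Python sketch of the parse-side idea, assuming %NL; should accept any of the line terminators quoted above, and that CR+LF must be tried before a lone CR so the two-character sequence is consumed whole. The names NL_SEQUENCES, match_nl and split_records, and the exclude_ff_ps flag, are hypothetical and for illustration only; they are not part of the proposal.

```python
# Hypothetical sketch: which sequences %NL; might match during parse.
# CR+LF is listed first so the longest alternative wins over a lone CR.
NL_SEQUENCES = [
    "\r\n",      # CR+LF
    "\n",        # LF,  U+000A
    "\r",        # CR,  U+000D
    "\u0085",    # NEL, U+0085
    "\u000c",    # FF,  U+000C
    "\u2028",    # LS,  U+2028
    "\u2029",    # PS,  U+2029
]


def match_nl(data: str, pos: int, exclude_ff_ps: bool = False) -> int:
    """Return the length of the newline sequence at data[pos], or 0 if none.

    exclude_ff_ps is a hypothetical switch for the open question of whether
    FF and PS are more significant than a plain newline.
    """
    for seq in NL_SEQUENCES:
        if exclude_ff_ps and seq in ("\u000c", "\u2029"):
            continue
        if data.startswith(seq, pos):
            return len(seq)
    return 0


def split_records(data: str, exclude_ff_ps: bool = False) -> list[str]:
    """Split text into records wherever %NL; would match."""
    records, start, pos = [], 0, 0
    while pos < len(data):
        n = match_nl(data, pos, exclude_ff_ps)
        if n:
            records.append(data[start:pos])
            pos += n
            start = pos
        else:
            pos += 1
    records.append(data[start:])
    return records
```

For example, split_records("a\r\nb\nc\u2028d") gives ["a", "b", "c", "d"]. Trying the alternatives longest-first is what stops CR+LF from being read as two separate newlines.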
Again from Wikipedia, this time regarding whitespace:
"In Unicode (Unicode Character Database) the following code points are defined as whitespace:
- U+0009-U+000D (control characters, containing TAB, CR and LF)
- U+0020 SPACE
- U+0085 NEL
- U+00A0 NBSP
- U+1680 OGHAM SPACE MARK
- U+180E MONGOLIAN VOWEL SEPARATOR
- U+2000-U+200A (different sorts of spaces)
- U+2028 LSP
- U+2029 PSP
- U+202F NARROW NBSP
- U+205F MEDIUM MATHEMATICAL SPACE
- U+3000 IDEOGRAPHIC SPACE"
...so presumably %WSP; would match any of these characters on parse. What should it generate on unparse?
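A similar sketch for whitespace, assuming %WSP; matches exactly one character from the set quoted above. The names WSP_CHARS, match_wsp and unparse_wsp are hypothetical, and the unparse default of U+0020 SPACE is only an assumption, since the question is still open.

```python
# Hypothetical sketch: the whitespace set quoted above, as characters.
WSP_CHARS = frozenset(
    [chr(c) for c in range(0x0009, 0x000D + 1)]    # TAB, LF, VT, FF, CR
    + ["\u0020", "\u0085", "\u00a0", "\u1680", "\u180e"]
    + [chr(c) for c in range(0x2000, 0x200A + 1)]  # EN QUAD .. HAIR SPACE
    + ["\u2028", "\u2029", "\u202f", "\u205f", "\u3000"]
)


def match_wsp(data: str, pos: int) -> int:
    """Return 1 if data[pos] is a WSP character, else 0 (one char per %WSP;)."""
    return 1 if pos < len(data) and data[pos] in WSP_CHARS else 0


def unparse_wsp(fill: str = " ") -> str:
    """Return the character to emit for %WSP; on unparse.

    ASSUMPTION: a single configurable character, defaulting to U+0020 SPACE;
    the thread leaves the unparse behaviour open.
    """
    if fill not in WSP_CHARS:
        raise ValueError(f"{fill!r} is not a WSP character")
    return fill
```

The set is spelled out explicitly rather than relying on Python's str.isspace(), because Python's notion of whitespace differs from the list above (it also accepts U+001C to U+001F, for instance).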
Cheers,
Ian
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
From: Alan Powell/UK/IBM@IBMGB
To: "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Cc: dfdl-wg@ogf.org, DFDL-Technical-Core%IBMGB@uk.ibm.com
Date: 22/01/2008 14:41
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme
Hi Mike
%NL; is the single character <LF> on target platforms where that is the convention, <CR><LF> on others, and so on. This is intended to make it easier for the same DFDL schema to parse messages from different platforms. I know we have avoided the notion of a target platform in DFDL, and I expected this would cause some debate.
This will be a good discussion for tomorrow's call.
Alan Powell
From: "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Date: 22/01/2008 13:29
To: Alan Powell/UK/IBM@IBMGB, <DFDL-Technical-Core%IBMGB@uk.ibm.com>, <dfdl-wg@ogf.org>
Subject: RE: [DFDL-WG] Action 14: Propose DFDL entity scheme
Is &NL; supposed to represent a single character, or can it be CRLF?
There's no notion of "the target platform" in DFDL; we've avoided it on purpose. So if we want &NL; to be meaningful, we need a separate property such as newline="&CR;&LF;" or newline="&LF;", unless some other property is suitable.
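The separate-property idea can be sketched as follows: a minimal Python example, assuming the newline property holds entity references such as &CR;&LF; and that &NL; in a delimiter is replaced by that value on unparse. Only the property name newline comes from the message; the mnemonic table and the functions resolve and unparse_delimiter are hypothetical.

```python
import re

# Hypothetical mnemonic table, limited to what this example needs.
MNEMONICS = {"CR": "\r", "LF": "\n", "NEL": "\u0085"}

# Matches an entity reference such as &CR; and captures its name.
ENTITY = re.compile(r"&([A-Za-z]+);")


def resolve(value: str, table: dict[str, str]) -> str:
    """Expand references such as '&CR;&LF;' using the given table."""
    def sub(m: re.Match) -> str:
        name = m.group(1)
        if name not in table:
            raise ValueError(f"unknown entity &{name};")
        return table[name]
    return ENTITY.sub(sub, value)


def unparse_delimiter(delimiter: str, newline: str) -> str:
    """Resolve a delimiter for unparse, taking &NL; from a 'newline' property."""
    nl = resolve(newline, MNEMONICS)          # e.g. '&CR;&LF;' -> '\r\n'
    return resolve(delimiter, {**MNEMONICS, "NL": nl})
```

With newline="&LF;" the delimiter END&NL; would be written as "END\n", and with newline="&CR;&LF;" as "END\r\n", so the schema author chooses the line ending without DFDL needing a notion of target platform.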
There are some other Unicode whitespace and line-ending characters. Do we want to include those in the definitions of WSP and NL? I recall there are four line endings in Unicode.
…mikeb
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Alan Powell
Sent: Tuesday, January 22, 2008 6:32 AM
To: DFDL-Technical-Core%IBMGB@uk.ibm.com; dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme
All
Attached is the latest proposal for DFDL 'entities'.
The main changes are:
- XML entities are no longer used, as they proved not to meet all the requirements.
- New generic mnemonics, for <NL> and others, to represent the newline (NL) on the target platform.
Alan Powell
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg