Hi guys,

For reference, from Wikipedia (search "newline"):

"The Unicode standard addresses the problem by defining a large number of characters that conforming applications should recognize as line terminators:

 LF:    Line Feed, U+000A
 CR:    Carriage Return, U+000D
 CR+LF: CR followed by LF, U+000D followed by U+000A
 NEL:   Next Line, U+0085
 FF:    Form Feed, U+000C
 LS:    Line Separator, U+2028
 PS:    Paragraph Separator, U+2029"

... so I guess that, during parse, any of these sequences should match %NL; (maybe excluding FF and PS, as being more significant than a single new line?). I agree with Mike: for unparse we'd presumably need a new property to specify which one to write.
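To make the parse-side idea concrete, here is a minimal Python sketch of a matcher that accepts any of the terminators listed above. The names NL_PATTERN, NL_PATTERN_NARROW and split_on_nl are made up for illustration and are not part of any DFDL proposal; the narrow variant shows the "excluding FF and PS" option.

```python
import re

# All line terminators from the quoted list. CR+LF is tried first so the
# pair is consumed as one terminator rather than as a CR followed by an LF.
NL_PATTERN = re.compile("\r\n|[\n\r\u0085\u000c\u2028\u2029]")

# Hypothetical narrower variant that leaves out FF and PS.
NL_PATTERN_NARROW = re.compile("\r\n|[\n\r\u0085\u2028]")


def split_on_nl(text, pattern=NL_PATTERN):
    """Split text at every terminator the given pattern recognises."""
    return pattern.split(text)
```

With the wide pattern, a string mixing CR+LF, LF, CR, NEL and LS splits into one piece per line; with the narrow pattern, a form feed stays inside its line.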


Again from Wikipedia, this time regarding whitespace:

"In Unicode (Unicode Character Database) the following codepoints are defined as whitespace:


... so presumably &WSP; would match any of these characters on parse. What should it generate on unparse?

Cheers,

Ian

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK



From: Alan Powell/UK/IBM@IBMGB
To: "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Cc: dfdl-wg@ogf.org, DFDL-Technical-Core%IBMGB@uk.ibm.com
Date: 22/01/2008 14:41
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme






Hi Mike


%NL; is a single character <LF> on those target platforms where that is the convention, <CR><LF> on others, and so on. This is intended to make it easier for the same DFDL schema to parse messages from different platforms. I know we have avoided the notion of a target platform in DFDL, and I was expecting this to cause some debate.
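As a rough illustration of the platform-dependent reading described above, the following Python sketch writes whatever terminator is conventional on the machine running it. The function names platform_nl and unparse_record are hypothetical, and tying %NL; to the local platform is exactly the assumption under debate in this thread, not an agreed behaviour.

```python
import os


def platform_nl():
    """Return the line terminator conventional on the local platform."""
    # os.linesep is "\n" on POSIX systems and "\r\n" on Windows.
    return os.linesep


def unparse_record(fields, sep=","):
    """Join the fields and end the record with the platform terminator."""
    return sep.join(fields) + platform_nl()
```

The same call therefore produces different bytes on different machines, which is the portability concern raised in the replies.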


This will be a good discussion for tomorrow's call.


Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell@uk.ibm.com  
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898



From: "Mike Beckerle" <mbeckerle@OCO-INC.COM>
To: Alan Powell/UK/IBM@IBMGB, <DFDL-Technical-Core%IBMGB@uk.ibm.com>, <dfdl-wg@ogf.org>
Date: 22/01/2008 13:29
Subject: RE: [DFDL-WG] Action 14: Propose DFDL entity scheme







 
Is the &NL; supposed to represent a single character? Or can it be a CRLF?

 
There’s no notion of “the target platform” in DFDL. We’ve specifically avoided this notion on purpose. So we need a separate property like newline=”&CR;&LF;” or newline=”&LF;” if we want &NL; to be meaningful, unless some other property is suitable.
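A minimal Python sketch of how such a property might drive unparsing, assuming a property value written with the mnemonics from the examples above. MNEMONICS, resolve_newline and unparse_lines are illustrative names only, and the table covers just the two mnemonics mentioned here.

```python
import re

# Hypothetical mnemonic table, limited to the names used in this message.
MNEMONICS = {"CR": "\r", "LF": "\n"}


def resolve_newline(prop):
    """Expand a property value such as '&CR;&LF;' into concrete characters."""
    return re.sub(r"&(\w+);", lambda m: MNEMONICS[m.group(1)], prop)


def unparse_lines(lines, newline_prop):
    """Write each line followed by the terminator the property specifies."""
    nl = resolve_newline(newline_prop)
    return "".join(line + nl for line in lines)
```

The terminator is then chosen by the schema author rather than by the machine doing the unparse, which keeps any notion of a target platform out of the picture.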

 
There are some other Unicode whitespace and Unicode line-ending characters. Do we want to include those in the definitions of WSP and NL? I recall there are 4 line-endings in Unicode.

 
…mikeb

 
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Alan Powell
Sent: Tuesday, January 22, 2008 6:32 AM
To: DFDL-Technical-Core%IBMGB@uk.ibm.com; dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Action 14: Propose DFDL entity scheme

 

All


Attached is the latest proposal for DFDL 'entities'.


The main changes are:

- No longer using XML entities, as these proved not to meet all the requirements

- New generic mnemonics for <NL> and others to represent the NL on the target platform.




Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell@uk.ibm.com  
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898






 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU















--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 http://www.ogf.org/mailman/listinfo/dfdl-wg





