July 2008 - dfdl-wg - lists.ogf.org

DFDL: Minutes from OGF WG call, 22 July 2008
by Alan Powell 23 Jul '08

Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
13:00 GMT, 22 July 2008

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)

1. Update to Decimal supplement

Discussed MB's comments on SH's decimal supplement document. Agreed:
- Add zeroSign to packedDecimalSignCodes, which allows zero to be represented as all-zero nibbles.
- Move numberCheckPolicy out of dfdl:DefineNumberFormat (renamed to dfdl:DefineTextNumberFormat) so that it can apply to binary representations.
- Add numberZeroRep to provide a special value for zero, eg 'zero'.

2. Discuss lengthKind issues

Discussed the lengthKind/scanability email distributed by AP. The table needs truncation, fill for binary, and more logical/physical type combinations such as logical number with text representation.

lengthKind=implicit
No pad/fill for sequences.
Long discussion on what the implicit length should be on unparse for various logical/physical type combinations. Decimal/integer with text representations look like a problem.
Action: SH to propose algorithm.
For number/text, decided that min/maxInclusive would be used only for validation and not for any physical attributes.
Action: Check all simple types vs representation and investigate whether numberFormat can be used.

lengthKind=explicit
Decided that for choice, dfdl:length will be used; choiceKind is used only for lengthKind=implicit.
Discussion of truncation if the logical length is too big. Decided that truncation is valid only for fixed length strings and that we need a new property to control whether it is valid or not.
Action: propose new fixed length string truncation property.
For text numbers, if numberPattern produces a text string that is too long at runtime then it is a runtime error.

lengthKind=prefixed
Decided that there is no padding for variable length lengthKinds.

Will carry on the discussion on the next call.

3. Hidden proposal

Not discussed.

4. Unresolved WTX issues

Not discussed.

5. AOB

Next week's (30 July) call will start at 13:00 UK and run for 2 hours instead of holding an additional call.

Meeting closed, 14:30 GMT

Actions raised at this meeting

No   Action
009  SH: propose algorithm unparse length lengthKind=implicit decimal type/binary representation
010  AP: Document Fixed length string truncation property
011  SH: Update decimal supplement

Current Actions:

No   Action
003  AP: Update spec from WTX document
     16/7: in progress
004  MB: Mike will also author a new section in response to comments from Sandy Gao and Suman Kalia, explicitly connecting syntax with DFDL semantics
     16/7: no progress
006  SH: Distribute hidden proposal
     16/7: Done. MB to review
007  AP: enum + expression wording
     16/7: no progress

033 Work items:

No   Item
001  String XML type (Ian P) - Apr 30, 2008
002  Escape schemes (Ian P) - Apr 30, 2008
003  Variables - ??, 2008 (Mike)
004  Selectors (Suman) - Apr 30, 2008
005  Improvements on property descriptions - ??, 2008 (All - split TBD)
006  Envelopes and Payloads (Steve) - Apr 30, 2008
007  (from draft 32) valueCalc (Mike) - ??, 2008 *Mostly complete*
008  (from draft 32) Property precedence for writing (Steve) - *complete but under review*
009  (from draft 32) Variable markup (Steve) - Mar 31, 2008 *proposal needs writing up*
010  (from draft 32) Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 *in progress*
011  (from draft 32) How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) *in progress*
012  (from draft 32) Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) *not started*
013  (from F2F) New scoping rules
014  (from F2F) Occurs, OccurSeparator changes
015  (from F2F) choices and Output (Mike)
016  (from F2F) xpath forward references (Alan)
017  (IBM WTX review) Minor agreed updates (Alan)
018  (IBM WTX review) Review generateNewLine (Alan)
020  (IBM WTX review) Special value for zero seen eg 'zero'. (Steve)
021  (IBM WTX review) 'EndOfData' changes (Alan)
022  (IBM WTX review) Unresolvable choices - infoset changes
023  (IBM WTX review) separatorKind='prefix', 'infix' or 'postfix' (Alan)
024  (IBM WTX review) StopValue clarification (Alan)
025  Augmented infoset and unparsing (Alan)

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
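
The two unparse decisions recorded under lengthKind=explicit in the minutes above (truncation only for fixed length strings, and a runtime error for over-long text numbers) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the minutes only say that a new truncation property is needed, so `allow_truncation`, `pad_char`, `UnparseError` and both function names are placeholders, and right-padding of short strings is an assumption rather than something the group agreed.

```python
class UnparseError(Exception):
    """Raised when a value does not fit the space available on unparse."""


def unparse_fixed_length_string(value: str, length: int,
                                allow_truncation: bool = False,
                                pad_char: str = " ") -> str:
    """Write a string into a fixed-length field.

    allow_truncation stands in for the not-yet-named truncation property.
    """
    if len(value) > length:
        if not allow_truncation:
            raise UnparseError(
                f"string of length {len(value)} exceeds fixed length {length}")
        return value[:length]
    # Assumption: short values are padded on the right.
    return value.ljust(length, pad_char)


def unparse_text_number(formatted: str, length: int) -> str:
    """Text numbers are never truncated; an over-long result is an error."""
    if len(formatted) > length:
        raise UnparseError(
            f"formatted number {formatted!r} exceeds length {length}")
    return formatted
```

For example, `unparse_fixed_length_string("ABCDEF", 4, allow_truncation=True)` gives `"ABCD"`, while `unparse_text_number("123456", 4)` raises `UnparseError`.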

Resolution of the remaining WTX spec issues
by Alan Powell 21 Jul '08

Steve and I went through the WTX issues document and decided which issues had been resolved and which require further discussion.

I have updated the latest version (033) of the DFDL specification for (most of) the resolved issues.

We should discuss the remaining unresolved issues on the next DFDL_WG call. I have added a table to make it easier to track the unresolved issues.

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898

DFDL: Minutes from OGF WG call, 16 July 2008
by Alan Powell 16 Jul '08

Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
14:00 GMT, 16 July 2008

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)

1. Go through actions

Status updated below.
003: AP is updating the spec from the WTX review and will produce a list of remaining unresolved items.
007: Discussion about how expressions can be allowed in enumerated fields such as encoding. Simplest would be to convert enum properties to string, but that wouldn't allow schema validation to catch errors.

2. Version 033 work items

The list of work items had been distributed and was discussed. Items requiring discussion will be converted to actions. Work items will also be tracked in this meeting.

3. Update to Decimal supplement

SH had distributed an updated proposal, which was discussed. General agreement on the changes, with the main point of discussion being the use of numberCheckPolicy to control the behaviour of the new numberZeroRep property. MB to review.

4. Discuss lengthKind issues identified by WTX

Not discussed.

5. Hidden proposal

Not discussed.

6. AOB

Agreed an additional weekly WG call is needed. MB to set up.

Meeting closed, 15:00 GMT

Actions raised at this meeting

No   Action
008  MB: set up additional WG call

Current Actions:

No   Action
001  MB: update outputvalueCalc doc
     Done
002  AP: Send out updated work item list
     16/7: done
003  AP: Update spec from WTX document
     16/7: in progress
004  MB: Mike will also author a new section in response to comments from Sandy Gao and Suman Kalia, explicitly connecting syntax with DFDL semantics
     16/7: no progress
005  AP: update spec with unparsing and augmented infoset.
     16/7: no progress. Converted to work item
006  SH: Distribute hidden proposal
     16/7: Done. MB to review
007  AP: enum + expression wording
     16/7: no progress

033 Work items:

No   Item
001  String XML type (Ian P) - Apr 30, 2008
002  Escape schemes (Ian P) - Apr 30, 2008
003  Variables - ??, 2008 (Mike)
004  Selectors (Suman) - Apr 30, 2008
005  Improvements on property descriptions - ??, 2008 (All - split TBD)
006  Envelopes and Payloads (Steve) - Apr 30, 2008
007  (from draft 32) valueCalc (Mike) - ??, 2008 *Mostly complete*
008  (from draft 32) Property precedence for writing (Steve) - *complete but under review*
009  (from draft 32) Variable markup (Steve) - Mar 31, 2008 *proposal needs writing up*
010  (from draft 32) Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 *in progress*
011  (from draft 32) How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) *in progress*
012  (from draft 32) Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) *not started*
013  (from F2F) New scoping rules
014  (from F2F) Occurs, OccurSeparator changes
015  (from F2F) choices and Output (Mike)
016  (from F2F) xpath forward references (Alan)
017  (IBM WTX review) Minor agreed updates (Alan)
018  (IBM WTX review) Review generateNewLine (Alan)
020  (IBM WTX review) Special value for zero seen eg 'zero'. (Steve)
021  (IBM WTX review) 'EndOfData' changes (Alan)
022  (IBM WTX review) Unresolvable choices - infoset changes
023  (IBM WTX review) separatorKind='prefix', 'infix' or 'postfix' (Alan)
024  (IBM WTX review) StopValue clarification (Alan)
025  Augmented infoset and unparsing (Alan)

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898

DFDL Decimal - final proposal
by Steve Hanson 16 Jul '08

Here's the revised decimal supplement again for final approval. Please can we discuss it on the call today for inclusion in draft 33.

This has been updated to reflect the debate below around properties dfdl:decimalFormat and dfdl:integerFormat (because either could be used with xs:int and xs:decimal, and at runtime the parser does not know which one to apply). So dfdl:decimalFormat has been removed and replaced by dfdl:numberFormat, defined below.

Property Name   Description
numberFormat    String
                Valid values are 'text', 'zoned', 'packed', 'BCD', 'twosComplement'.
                When the representation is 'text' the allowable values are 'text' and 'zoned'.
                When the representation is 'binary' the allowable values are 'packed', 'BCD' and 'twosComplement'.

I'd also like to propose that we rename dfdl:defineNumberFormat to dfdl:defineTextNumberFormat, to prevent confusion.

The other change is around the packed decimal convention, sometimes used, that zero is indicated by all bytes being hex zero, even though this is not technically a valid packed decimal number. I had said that on parsing, whether to tolerate this is governed by the numberCheckPolicy property, and that on unparsing this convention is not used. That won't work, because we are talking about (binary) packed decimals and numberCheckPolicy is a property within (text) dfdl:defineNumberFormat. One solution is to move numberCheckPolicy outside of dfdl:defineNumberFormat and have it apply to both text and binary numbers. However, it can be observed that numberCheckPolicy is getting rather bloated and is covering several behaviours. There's yet another behaviour that could be added: the TX team review wants a dfdl:defineNumberFormat property called numberZeroRep to handle special zero representations. That's fine, but on parsing, whether to allow just the zero rep or both the rep and '0' is a requirement from TX, which we could accommodate by extending numberCheckPolicy.
Question is, are we overloading numberCheckPolicy, or is it time to make it more granular?

Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh(a)uk.ibm.com
Phone (+44)/(0) 1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 16/07/2008 12:15 -----

Steve Hanson/UK/IBM
09/04/2008 15:44
To: <mbeckerle.dfdl(a)gmail.com>
cc: dfdl-wg(a)ogf.org
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

Hi Mike - answers in-line below.

Regards, Steve

"Mike Beckerle" <mbeckerle.dfdl(a)gmail.com>
09/04/2008 15:05
Please respond to <mbeckerle.dfdl(a)gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
cc: <dfdl-wg(a)ogf.org>
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

Thanks for these clarifications.

Do we have a way to represent 'unpacked' decimal numbers. This is like zoned, except the 'zones' are zero instead of 'F' (in ebcdic encodings).
<smh>No we don't. Neither MRM nor TX support that. Have you seen such an example? Is it encoding sensitive?

Also, can a BCD number have a sign?
<smh>What we are calling a BCD can not have a sign, as far as I know. That's where packed decimal comes in.

...mikeb

From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, April 09, 2008 10:00 AM
To: mbeckerle.dfdl(a)gmail.com
Cc: 'Mike Beckerle'; Alan Powell; Ian W Parkinson
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

Hi Mike - answers in-line below.

Regards, Steve

"Mike Beckerle" <mbeckerle.dfdl(a)gmail.com>
09/04/2008 01:43
Please respond to <mbeckerle.dfdl(a)gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Alan Powell/UK/IBM@IBMGB
cc: Ian W Parkinson/UK/IBM@IBMGB, 'Mike Beckerle'
Subject: RE: Fw: DFDL Decimal - proposal - correcting wrong attachment

I prefer one property dfdl:numberFormat, the valid values of which depend on dfdl:representation
<smh>The advantage of two properties is that you can set scoping for text and binary numbers separately.

I like the analysis that text formats are ones which depend on encoding, and not byteOrder, and binary depend on byte order, and NOT encoding.
<smh>Me too.

There's also format specifiers for floating point. Should those also go on here, be allowed only for representation='binary'?
<smh>I did think about this, but I think we are better off keeping floats separate. Otherwise people might think you can declare a logical float to be rep'd by physical integer. MRM allows this, and I wish it didn't. It also exacerbates the problem noted above - I couldn't set a default float format, which is something that would almost certainly never vary within a data stream.

The rest of the proposal looks fine. I found decimalVirtualPoint an odd name, but it is clear and obeys the conventions.
<smh>I agree it's a bit odd. An alternative is 'decimalimpliedPlaces' which uses TX terminology - but that doesn't match the 'V' pattern character we are proposing in the ICU pattern (which matches COBOL)

I was a bit unclear on how do you represent an unsigned packed decimal. This is common. There is no sign nibble at all. It lets you do an even number of digits. MMDDYY is commonly this, 3 unsigned packed numbers.
<smh>What you have described is dfdl:numberFormat="BCD".
An unsigned packed decimal is dfdl:numberFormat="packed" with the sign nibble always unsigned, so dfdl:packedDecimalSignCodes="F F F".

...mikeb

From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, April 02, 2008 11:54 AM
To: Alan Powell
Cc: Ian W Parkinson; Mike Beckerle
Subject: Re: Fw: DFDL Decimal - proposal - correcting wrong attachment

Alan, Ian and myself reviewed this today.

The main issue was that the loss of dfdl:representation="decimal" means that it is no longer clear when to use dfdl:integerFormat and dfdl:decimalFormat, because an xs:decimal can have a binary integer rep and an xs:int can have a binary decimal rep. It was noted that both IBM models (MRM and TX type tree) handle this by having a single property. I don't want to re-introduce rep=decimal; I think we should stick with text (implying encoding sensitive) and binary (potentially byte order sensitive).

Options:

a) One property dfdl:numberFormat with values "text", "zoned", "packed", "BCD", "twosComplement", "onesComplement", "signMagnitude".
- "text" and "zoned" when dfdl:representation="text"
- "packed", "BCD", "twosComplement", "onesComplement", "signMagnitude" when dfdl:representation="binary"

Number
xs:int, xs:decimal     text    => numberFormat
xs:float, xs:double    text    =>
xs:int, xs:decimal     binary  => numberFormat
xs:float               binary  => floatFormat

b) Two properties dfdl:textNumberFormat and dfdl:binaryNumberFormat, allowable enums split as above.
- this means the existing dfdl:textNumberFormat property gets renamed to dfdl:textNumberPattern or dfdl:textNumberScheme

Number
xs:int, xs:decimal     text    => textNumberFormat
xs:float, xs:double    text    =>
xs:int, xs:decimal     binary  => binaryNumberFormat
xs:float               binary  => floatFormat

Other suggestions?
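
Option (a) above amounts to a lookup from dfdl:representation to the set of permitted dfdl:numberFormat values. A minimal Python sketch of that check follows; the enum values are copied from option (a), while the function name `check_number_format` and the dictionary name are illustrative only and not part of any DFDL processor.

```python
# Allowed dfdl:numberFormat values for each dfdl:representation, per option (a).
ALLOWED_NUMBER_FORMATS = {
    "text": {"text", "zoned"},
    "binary": {"packed", "BCD", "twosComplement",
               "onesComplement", "signMagnitude"},
}


def check_number_format(representation: str, number_format: str) -> None:
    """Raise ValueError if number_format is not valid for representation."""
    allowed = ALLOWED_NUMBER_FORMATS.get(representation)
    if allowed is None:
        raise ValueError(f"unknown representation: {representation!r}")
    if number_format not in allowed:
        raise ValueError(
            f"numberFormat {number_format!r} is not allowed when "
            f"representation is {representation!r}; "
            f"expected one of {sorted(allowed)}")
```

Under option (b) the same table would simply be split into two independent enumerations, one per property, which is what allows the two to be scoped separately.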
Regards, Steve

Alan Powell/UK/IBM
28/03/2008 16:45
To: Steve Hanson/UK/IBM@IBMGB
cc: Ian W Parkinson/UK/IBM@IBMGB, mbeckerle(a)oco-inc.com
Subject: Re: Fw: DFDL Decimal - proposal - correcting wrong attachment

Steve

Technically seems OK. Need quite a bit of editorial work before it can be included in the spec which I have started.

Alan Powell

From: Steve Hanson/UK/IBM
To: mbeckerle(a)oco-inc.com
Cc: Alan Powell/UK/IBM, Ian W Parkinson/UK/IBM
Date: 28/03/2008 13:59
Subject: Fw: DFDL Decimal - proposal - correcting wrong attachment

Here's an attempt at a revised decimal supplement, that takes into account the stuff in my mail below.

[attachment "ggf-dfdl-supplement-advanced-decimal-properties-v1.0-003.doc" deleted by Alan Powell/UK/IBM]

Some discussion points:

1) I've removed the representation 'Decimal' - a decimal is either 'Text' or 'Binary'. Property decimalFormat says whether it is text or zoned (for text) or packed or BCD (for binary).

2) There's no need for a decimalSigned property, as zoned uses numberPattern for this, BCD is always unsigned, and packed indicates this via sign code

3) I've added VDP property for BCD and packed - zoned uses numberPattern for this. However, VDP property is also needed for binary integers - this is missing from spec. COBOL PIC 99V99 COMP will create an xs:decimal with binary integer rep, so we need to support this. I suggest we have a single VDP property that applies to all binary reps that can be used to represent xs:decimal. So my VDP property gets removed to main spec.

4) The resultant properties are less than before. I'm not sure that a separate supplement is justified.

5) I would like to remove numberCheckPolicy from dfdl:DefineNumberFormat, and make it a separate property. Two reasons:
- I think the decision to use strict/lax checking is not an attribute of the number format but more an attribute of the schema as a whole.
- It means we can control packed decimal sign nibble oddities with the same property as other strict/lax number checking,

Let's review on next OGF WG call.

Regards, Steve

----- Forwarded by Steve Hanson/UK/IBM on 28/03/2008 12:33 -----

Steve Hanson/UK/IBM
27/03/2008 15:29
To: Mike Beckerle (Work)
cc:
Subject: DFDL Decimal - proposal

Hi Mike

I've finally got round to looking at the decimal supplement, and I'd like to get your opinion on something.

The WTX team have been reviewing draft 031 and had the following observation (actually they had quite a few good ones, and when they've finished we need to discuss them all on a OGF WG call).

"13.3. Is a zoned decimal textual or non-textual? If all overpunched variants result in well-known characters then the data is scannable and therefore more like a textual field."

It turns out that the type hierarchy in TX for decimal looks like below. They consider Zoned as text as it always consists of reasonable characters and is subject to encoding conversion, padding, justification, etc. There's a lot of appeal in that. It's always bothered me a bit that MRM viewed it as a binary type.

Number
  -> Character
       -> Decimal (meaning text decimal)
          Integer (meaning text integer)
          Zoned
  -> Binary
       -> Integer (meaning binary integer)
          Float
          Packed
          BCD

Also, their Zoned does not have separate sign option. They point out that a separate signed Zoned is just a Text decimal. And they are correct.
We got the separate sign thing from MRM, which after some digging turns out to have got it from the CAM Type Descriptor model, which had no other way of representing a text decimal number with a separate sign.

As part of my rework of the decimal supplement, I'd like to take both these into account. The implications are:

- Zoned => overpunched only
- Zoned decimal can pick up on the textNumberxxx properties, including textNumberFormat
  => use the numberPattern (ie, ICU pattern) property to say which end the (overpunched) sign goes
  => can get away without a separate pattern language for binary decimals, which as you point out has endian-ness issues
- Binary decimals are packed and BCD
- There are a lot fewer properties for decimals
- dfdl:representation = "text" can have subdivisions - that's not occurred until now (we could think about making dfdl:representation = "xml" a subdivision of "text"?)

If you think there is merit in this approach then let me know by return and I'll see if I can write something up tomorrow. I'm WAH on +44-1794-340899 if you want to discuss.

Your "crazy idea" below is interesting - but I think is a tooling thought rather than a core spec thing.

(Sorry about call yesterday - I thought I mailed something out a couple of calls ago about DST mismatch, but perhaps I didn't).

Regards, Steve

----- Forwarded by Steve Hanson/UK/IBM on 27/03/2008 15:04 -----

Mike Beckerle/Worcester/IBM@IBMUS
21/11/2007 15:26
To: Steve Hanson/UK/IBM@IBMGB
cc: DFDL-Technical-Core, Suman Kalia/Toronto/IBM@IBMCA
Subject: DFDL Decimal - was Re: DFDL & length prefixes - proposal

I think decimal has signed and unsigned variants based on dfdl:decimalSigned boolean. If this is false then it's unsigned and packedUnsignedRep specifies the sign nibble used for unsigned.
The doc doesn't specify that one can say "" for this indicating no sign nibble at all.

I've been rereading the decimal properties supplement and starting v002 of it based on changes to dfdl:representation in the core spec.

This needs a general clean up. There are errors here, in that there is a decimalType="zoned", or "packed" or "BCD", and also a bcdIsPacked, and bcdUnpackedRep="ebcdic", which is the same as zoned I think. We need there to be one way to express these things.

Right now the bias is a set of orthogonal flags: signed or unsigned, what's the sign nibble for unsigned, what sign nibbles for signed, packed or unpacked, what's in the zones - the unused nibbles - (ebcdic, i.e., "F", ascii, i.e., "3", or zero - but that's not enough as I've seen data with "2" in the zones - some non IBM cobol compiler does this.).

A better choice may be to specify decimalType as a larger enum which includes most of these properties, so that we don't end up with too much ability to express variants that have simply never existed.

A list of the use cases needs to be added to the doc also. Here's a few:

-1234 as expressed as bytes in hex in increasing position order, i.e., LSB first.

packed ibm, signed: D01234
zoned ibm, overpunched leading sign: D1F2F3F4 (are signs usually leading or trailing.... I think trailing actually.)
big endian zoned ascii, ascii-translated overpunched leading sign: 4A323334 (yuck - so much for treating decimal as "binary" data).

Here's a crazy idea: I believe there is a set of magic numbers which if you give me their translations in bytes, I can determine exactly what the encoding properties are. E.g., if you give me the bytes for +0000, -1234, +789 I believe I can determine all of the properties. This might be a better way to specify decimal formats. I.e., give me those byte patterns expressed as hex, and I reverse engineer all the property settings.

e.g.,

decimalFormat="+0000=C00000 -1234=D01234 +789=C789"
(signed, packed, leading sign, padded to even number of nibbles, big endian, zero carries a sign, "C" is plus, "D" is minus)

or

decimalFormat="+0000=00000000 -1234=D1F2F3F4 +789=C7F8F9"
(ebcdic zoned, leading overpunched sign, big endian, zero is allowed to have zero as sign and all zero bytes, "C" is plus, "D" is minus)

This may make more sense for the tooling than the DFDL language though. I.e., point it at some data and it tries to guess these properties.

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan priordan(a)us.ibm.com 508-599-7046
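
The byte patterns quoted in Mike's use cases can be reproduced with a short Python sketch. It follows the leading-sign layouts exactly as written in that message, which are that message's own working assumption (the message itself suspects trailing signs are more usual, and standard IBM packed decimal does carry the sign in the last nibble). The function names and the `width` parameter are illustrative and not taken from any DFDL draft; the all-zero-bytes convention for zero is deliberately not modelled.

```python
def packed_leading_sign(value: int, width: int = 0) -> str:
    """Hex nibbles for a packed decimal with a leading sign nibble.

    'C' is plus and 'D' is minus; a zero nibble is inserted after the
    sign when needed to reach an even number of nibbles.
    """
    digits = str(abs(value)).zfill(width)
    sign = "D" if value < 0 else "C"
    nibbles = sign + digits
    if len(nibbles) % 2:
        nibbles = sign + "0" + digits
    return nibbles


def zoned_ebcdic_leading_sign(value: int, width: int = 0) -> str:
    """Hex for EBCDIC zoned decimal with the sign overpunched on the first digit."""
    digits = str(abs(value)).zfill(width)
    sign = "D" if value < 0 else "C"
    # The first zone nibble carries the sign; the remaining zones are 'F'.
    zones = [sign] + ["F"] * (len(digits) - 1)
    return "".join(zone + digit for zone, digit in zip(zones, digits))


def ebcdic_to_ascii_hex(ebcdic_hex: str) -> str:
    """Translate EBCDIC (code page 037) bytes to ASCII bytes, as hex."""
    text = bytes.fromhex(ebcdic_hex).decode("cp037")
    return text.encode("ascii").hex().upper()
```

With these definitions, `packed_leading_sign(-1234)` yields `D01234`, `zoned_ebcdic_leading_sign(-1234)` yields `D1F2F3F4`, and `ebcdic_to_ascii_hex("D1F2F3F4")` yields `4A323334`, matching the three use cases listed for -1234.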

Info on BEAs Format Builder - Java GUI to convert binary to/from XML
by RPost 10 Jul '08

Hi,

Open mouth - insert foot. At my current contract site I happened to mention to some people that BEA has a Format Builder app that can convert binary to/from XML. Seems they had a team of people halfway through a project writing custom Java code to read Mainframe files (mostly copybook) and load the data into Oracle but hadn't known that the BEA Integration suite they had been using for years already had that functionality.

Reminds me of my favorite definition: Contractor - someone who borrows your watch to tell you what time it is.

I had assumed you all were aware of BEA's Format Builder but figured I would pass the info on anyway.

1. From the BEA docs page: http://e-docs.bea.com/
2. Select BEA WebLogic Integration under the BEA WebLogic section (http://e-docs.bea.com/wli/docs102/index.html)
3. Select 'Using Format Builder' under the Data Transformation section (http://e-docs.bea.com/wli/docs102/fbhelp/index.html)

You can select 'view as PDF' to save a local copy.

There is also a tutorial on creating a WebLogic Integration Process to extract mainframe data (COBOL copybook) and insert the data into an Oracle database: http://www.dev2dev.co.kr/pub/a/2004/06/Bukhari_WLIProcess.jsp

If you download the author's files (http://ftpna2.bea.com/pub/downloads/WLIProcess_Bukhari.zip) the zip file will have the copybook format (employee.cpy) and the BEA MFL (message format language) file (Employee.mfl), which shows how the COBOL format was mapped to the schema file. The zip also contains various *.java files that show the bean files that were created. The 'Mainframe Data Extraction' doc in the zip file contains the tutorial text and diagrams as a Word document.

The Format Builder GUI lets you build the XML tree graphically and can launch a test GUI that lets you test the conversion in both directions.

The BEA Integration Suite is available for free evaluation (as are all of Oracle's products) on the Oracle edelivery site at http://edelivery.oracle.com/ if anyone is interested in trying the tool.

I can also email some screenshots if there is any interest.

Rick

DFDL Spec 033 Completion Items
by Alan Powell 01 Jul '08

Below is the list of work items for the next version (033) of the DFDL specification

Draft 32: Published 25/5/2008

valueCalc (Mike) - ??, 2008 *Mostly complete*
Remaining aspects of null/default/optionals (Alan) - Mar 31, 2008 *complete*
2-level description of schema components, including UML (Simon) - *complete*
Property precedence for writing (Steve) - *complete but under review*
Variable markup (Steve) - Mar 31, 2008 *proposal needs writing up*
Regular expressions for lengths - Mar 31, 2008 (Alan) *complete*
Bring supplements up-to-date (Steve) - Mar 31, 2008 *complete*
Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 *in progress*
How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) *in progress*
Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) *not started*

Draft 33:

Escape schemes (Ian P) - Apr 30, 2008
String XML type (Ian P) - Apr 30, 2008
Variables - ??, 2008 (Mike)
Selectors (Suman) - Apr 30, 2008
Improvements on property descriptions - ??, 2008 (All - split TBD)
Envelopes and Payloads (Steve) - Apr 30, 2008
(from draft 32) valueCalc (Mike) - ??, 2008 *Mostly complete*
(from draft 32) Property precedence for writing (Steve) - *complete but under review*
(from draft 32) Variable markup (Steve) - Mar 31, 2008 *proposal needs writing up*
(from draft 32) Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 *in progress*
(from draft 32) How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) *in progress*
(from draft 32) Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) *not started*
(from F2F) New scoping rules
(from F2F) Occurs, OccurSeparator changes
(from F2F) choices and Output
(from F2F) xpath forward references
(IBM WTX review) Minor agreed updates
(IBM WTX review) Review generateNewLine
(IBM WTX review) Special value for zero seen eg 'zero'.
(IBM WTX review) 'EndOfData' changes
(IBM WTX review) Unresolvable choices - infoset changes
(IBM WTX review) separatorKind='prefix', 'infix' or 'postfix'
(IBM WTX review) StopValue clarification

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
