
We need the next revision of the DFDL spec to be in ISO format. Bill Ash (ISO JTC1 SC38) put me in contact with Jim Melton (ISO SQL standard) about their process, which generates the document from an XML source. Jim has passed along a chapter of their XML, the XSLT, and other documentation about how they do it, which we can learn from.
Below is the email thread with Jim Melton. I'm putting the contents of the ZIP attachment he mentions into our GitHub repo.
Now all we have to do is somehow convert the DFDL v1.0 MS-Word document into marked-up XML.
Interestingly, the practice at my employer for doing this kind of thing to large data format specifications is to first convert the MS-Word document to text using PDFBox, then use DFDL to mine the text. The resulting DFDL infoset is converted to XML, then Scala or XSLT transformations, or both, can turn it into something maintainable.
---------- Forwarded message ---------
From: Jim Melton <SheltieJim@xmission.com>
Date: Thu, Jun 6, 2024 at 12:46 PM
Subject: Re: Creating ISO document from XML source
To: <mbeckerle@apache.org>
Cc: <SheltieJim@xmission.com>
Mike,
Sorry I took several days to reply, but I'm leaving for a WG 3 meeting in China today and have been busy preparing for that meeting.
I've attached a ZIP file to this message. In that ZIP archive, you will find some XML files and some XSL (XSLT, actually) files, as well as a DTD. These should have enough information to give you an idea of how we're using XML. The ZIP also contains some HTML files (generated from MarkDown) that serve as our documentation.
The file named sql-psm.xml is the source file for one part of the SQL standard, ISO/IEC 9075-4:2023. I did not include the PDF that results from compiling sql-psm.xml, in large part because it's over 7MB. If you want that file, I can send it separately, but not for a couple of days (because I'm leaving shortly for the airport).
If you want to talk about any of this, ask questions about how/why we did something, etc., I'll be back from China on June 17 or 18.
Hope this helps, Jim
On 2024-06-03 09:31, Mike Beckerle wrote:
Jim,
Sure. Please send the bundle of things you suggested would be of interest. We can start by looking at what you have done and seeing how our situation matches and differs.
If it's too large for a single big zip attachment, you can split it up how you like, or we can come up with some alternative path.
Thank you
Mike Beckerle
On Fri, May 31, 2024 at 4:11 PM Jim Melton <SheltieJim@xmission.com> wrote:
Mike,
Pleased to meet you 😉
In JTC 1/SC 32/WG 3, we never developed our standards in Word, although other WGs in SC 32 do so. ISO/IEC 9075 (Database language SQL) was originally developed in some primitive IBM markup language whose name I have forgotten. When I became editor of the standard, I converted it to an "SGML-like" product named DECdocument (I worked at Digital Equipment Corp at the time). Later, after I left Digital, I converted the DECdocument markup to XML, created an XML vocabulary (in the form of a DTD), generated XSL-FO from the XML, then found a rendering engine (XEP, by RenderX) to convert to final PDF form.
At this point (mid-2024), we have a highly functional system that generates PDF documents with massive numbers of internal and inter-document hot links, smaller "pointer documents" that allow a given document to be referenced in multiple meetings without duplicating the storage, artifacts (such as the BNF of the language), etc. We also have numerous programs doing source code validity checking (i.e., following our coding standards and ISO's Directives). I realize that not everything we have done will be relevant to other standards, but it will provide assistance in figuring out how to do what you want and may serve as a starting point for you.
I would be happy to send you a standard document in both PDF and XML versions, as well as the DTD and primary stylesheets, and a MarkDown document that explains our coding standards and more. Just let me know! And, of course, I'm happy to answer any questions you might have about all of this stuff.
Hope this helps, Jim
On 2024-05-31 09:15, Mike Beckerle wrote:
Jim,
I have heard you have an ISO-document creation system that starts from an XML source.
We have a monster MS-Word document that is unwieldy, and not useful for generating any other formal artifacts.
I hope to convert it to XML and then generate the ISO document from that XML, as well as other artifacts like online-doc for GUIs, annotated versions of the spec, etc.
Can you point me at how your team(s) use XML in their work?
Any help greatly appreciated.
Mike Beckerle
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Apache Daffodil PMC | daffodil.apache.org
Owl Cyber Defense | www.owlcyberdefense.com
--
Jim Melton
SheltieJim at xmission dot com
-------------------------------------------
Owner and Captain: Passport 40 #018 Dream SeQueL
Co-Owner and Co-Captain Passport 42 #040 Turtle Blues
-------------------------------------------
Shelties since 1969; ASSA member since 1992
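As a rough illustration of the text-to-XML step described at the top of this thread, here is a minimal Python sketch. It is not the PDFBox/DFDL/Scala/XSLT toolchain mentioned there, and it is not how the SQL standard's XML is produced; it only shows the general idea of mining plain text for numbered headings and emitting nested markup. The element names (spec, section, title, para), the heading pattern, and the function name text_to_xml are all assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading shape: a dotted section number, whitespace, then a title,
# e.g. "1.1 Purpose". Real extracted text would need a more careful pattern,
# since a paragraph that starts with a number would also match.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")

def text_to_xml(text):
    """Turn extracted plain text into nested <section> elements."""
    root = ET.Element("spec")
    # Stack of (depth, element) for the currently open sections.
    # The root sits at depth 0 and is never popped.
    stack = [(0, root)]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            number, title = match.groups()
            depth = number.count(".") + 1
            # Close every open section at this depth or deeper.
            while stack[-1][0] >= depth:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section", {"number": number})
            ET.SubElement(section, "title").text = title
            stack.append((depth, section))
        else:
            # Any other non-empty line becomes a paragraph of the open section.
            ET.SubElement(stack[-1][1], "para").text = line
    return root

sample = """\
1 Introduction
DFDL describes data formats.
1.1 Purpose
This section states the purpose.
2 Overview
An overview paragraph.
"""
print(ET.tostring(text_to_xml(sample), encoding="unicode"))
```

Once the text is in a tree like this, later passes (XSLT, Scala, or more Python) can refine the markup step by step, which is the appeal of getting to XML early.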