
Thought this might be of interest to the workgroup, or we might spawn a thread from this in the DFDL workgroup.

---------- Forwarded message ---------
From: Mike Beckerle <mbeckerle@apache.org>
Date: Fri, Feb 21, 2025 at 1:22 PM
Subject: AI/LLMs - was: ....
To: <users@daffodil.apache.org>

Consistent with my experience. There's nowhere near enough DFDL material on GitHub or Stack Overflow (yet) for an LLM to really create DFDL from a spec, and I've had little luck uploading format specs and asking it questions about them.

Just asking GPT4o about DFDL is quite interesting, though. Here's a chat I did:
https://chatgpt.com/share/67b8bf26-1814-800f-95d6-d08ac63e14b8

I actually tried to use GPT4o to help generate a property index into the PDF so that one could quickly jump to properties. It can't even list the properties in DFDL, but asking it was still very interesting.

DFDL has a property named "finalTerminatorCanBeMissing". If you read that once and grokked the concept but couldn't remember the exact name, "terminatorCanBeRequired" is not a bad guess, and that was GPT4o's guess. That shows it does have this stuff organized somehow as concepts, not as a bunch of snippets to regurgitate. When I followed up with "I haven't heard of the dfdl:terminatorCanBeRequired before. How does that work?", it corrected its mistake and got the name dfdl:finalTerminatorCanBeMissing this time.

So the challenge is this: if you know DFDL well, you can spot its mistakes, but if you are a naive user trying to learn DFDL, it gets quite a bit right, yet the mistakes it does make are perhaps problematic.

I believe I can prompt GPT4.0 to create a DFDL schema for me. Since I know DFDL, this is not faster than typing for me. But if you don't know DFDL, you can ask it to do lots of things that would save time, like 'move that local simple type definition to a separate file and name it "intType2"'. And it is going to do that sort of thing just fine, particularly because that is mostly just using knowledge of XSD.

This is all pretty cool.

On Sat, Feb 15, 2025 at 7:37 AM Claude Mamo <claude.mamo@gmail.com> wrote:
I don't know if it's possible to capture the DICOM format in DFDL
Hah, a research team generated DICOM DFDL schemas from LLMs:
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DI...
The schema from Gemini particularly stands out ;)
...