
On Monday 26 October 2009 15:23:06 stephen.burke@stfc.ac.uk wrote:
Paul Millar [mailto:paul.millar@desy.de] said:
I don't know the details here but I'd imagine that, if we supported UTF-8 then publishing arbitrary UTF-8 information would just work.
Hopefully yes, but right now the LDAP schema uses IA5, and even if we change glue 2 we'll probably leave glue 1 alone (?).
That sounds reasonable: we can sell Glue 2 on its i18n :)
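To make the IA5 limitation above concrete: the LDAP IA5 String syntax only admits 7-bit characters (U+0000 to U+007F), so any value containing an umlaut cannot be published unchanged. The sketch below is purely illustrative; the helper name fits_ia5 is made up for this example and is not part of any GLUE or LDAP tool.

```python
def fits_ia5(value: str) -> bool:
    """Return True if every character is in the 7-bit IA5 range (hypothetical helper)."""
    return all(ord(ch) < 0x80 for ch in value)


# A plain ASCII service name passes the check.
print(fits_ia5("globus-gridftp-server"))

# A string containing U+00FC (u with dots) does not.
print(fits_ia5("ausgef\u00fchrt"))
```

An information provider could run a check like this before publishing a string attribute, then decide whether to reject, escape, or transliterate the value; which of those is appropriate is a policy question this sketch does not answer.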
Unfortunately that particular trick doesn't seem to work, it translates [...]
I'm guessing you're typing stuff into the command line here, right? Also, it would be useful if you could pipe the output through "hexdump -C". Diagnosing these kinds of problems is hard when some programs are "helpfully" mapping strings back into UTF-8 (e.g., the email program, the terminal, Perl, etc). So, ü (u with dots) is Unicode 00FC, which (according to my terminal) is C3 BC in UTF-8. Misinterpreting this 2-byte sequence as Latin-1 would give Ã¼. I'm not sure where the upside-down question-mark ½-symbol comes from, though.
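The byte arithmetic above can be checked without relying on a terminal. This is a minimal Python sketch, not the encode.pl script under discussion; the function names utf8_hex and misread_as_latin1 are invented for illustration.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated upper-case hex."""
    return text.encode("utf-8").hex(" ").upper()


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# U+00FC (u with dots) becomes a two-byte sequence in UTF-8.
print(utf8_hex("\u00fc"))  # C3 BC

# Reading those two bytes as Latin-1 yields two characters, U+00C3 and U+00BC.
mangled = misread_as_latin1("\u00fc")
print(len(mangled))  # 2
```

As a possible (unconfirmed) lead on the other symbol: the Unicode replacement character U+FFFD is EF BF BD in UTF-8, and misread_as_latin1("\ufffd") gives three characters ending in the upside-down question mark and the one-half sign. That would fit a program first replacing an undecodable byte with U+FFFD and a later stage misreading the result as Latin-1, but the hexdump output is still needed to tell.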
Wide character in print at encode.pl line 6, <> line 5. globus-gridftp-server (PID 3522) wird ausgef�hrt...
This looks like a perl problem. Does the program know it's getting UTF-8 input?
The output could be from some i18n software, which could be localised to their local language. Wouldn't this force GLUE clients to understand all possible languages?
By clients do you mean computers or people? In many cases, including this one, string attributes are designed to be human-readable and I'm not sure we should in general be forcing everyone to use English.
That's what I was wondering: if it's for local human consumption then supporting i18n text is desirable.
... clearly if things are supposed to be digested by a program, e.g. the various enumerated lists, then you can't easily localise them without losing interoperability.
Yup, agreed! Cheers, Paul.