Re: [glue-wg] Strings

26 Oct 2009

      On Monday 26 October 2009 15:23:06 stephen.burke@stfc.ac.uk wrote:
...
Paul Millar [mailto:paul.millar@desy.de] said:
...
I don't know the details here but I'd imagine that, if we
supported UTF-8 then
publishing arbitrary UTF-8 information would just work.
Hopefully yes, but right now the LDAP schema uses IA5, and even if we
 change glue 2 we'll probably leave glue 1 alone (?).
That sounds reasonable: we can sell Glue 2 on its i18n :)
...
Unfortunately that particular trick doesn't seem to work, it
 translates [...]
I'm guessing you're typing stuff into the command line here, right?

Also, it would be useful if you could pipe the output through "hexdump -C".
Diagnosing these kinds of problems is hard when some programs are "helpfully"
mapping strings back into UTF-8 (e.g., the email program, the terminal, Perl,
etc.).
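
If hexdump isn't to hand, something like the following gives a similar view of the raw bytes before any terminal or mail program gets a chance to reinterpret them. This is only a rough sketch: the hexdump() helper is my own illustration (not part of any GLUE or Globus tool), and the sample string "ausgeführt" is an assumption about what the message was meant to say.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a hexdump -C style listing of data (illustrative helper)."""
    pad = width * 3  # room for "xx " per byte so the ASCII column lines up
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is, everything else as "."
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}|{ascii_part}|")
    return "\n".join(lines)


# Assumed sample text; encode explicitly so we control the byte encoding.
sample = "ausgef\u00fchrt".encode("utf-8")
dump = hexdump(sample)
print(dump)
```

If the string really is UTF-8, the u-umlaut should show up as the two bytes c3 bc in the dump; a single fc byte would point to Latin-1 instead.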

So, ü (u with dots) is Unicode 00FC, which (according to my terminal) is C3 BC 
in UTF-8.

Misinterpreting this 2-byte sequence as Latin-1 would give Ã¼.  I'm not sure 
where the upside-down question-mark ½-symbol comes from, though.
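
For what it's worth, the mis-decoding above is easy to reproduce. The sketch below also tries one possible explanation for the other symbols, which is only a guess and not something confirmed here: if an earlier step had already replaced the character with the Unicode replacement character U+FFFD, reading its UTF-8 bytes as Latin-1 would produce that kind of three-character sequence. The variable names are purely illustrative.

```python
# u-umlaut is U+00FC; its UTF-8 encoding is the two bytes C3 BC.
u_umlaut = "\u00fc"
utf8_bytes = u_umlaut.encode("utf-8")

# Reading those two bytes as Latin-1 yields two characters (U+00C3, U+00BC).
misread = utf8_bytes.decode("latin-1")

# Guess, not confirmed: U+FFFD (the replacement character) is EF BF BD in
# UTF-8, and reading that as Latin-1 yields U+00EF, U+00BF, U+00BD.
replacement = "\ufffd"
replacement_bytes = replacement.encode("utf-8")
replacement_misread = replacement_bytes.decode("latin-1")

print(utf8_bytes.hex(), misread)
print(replacement_bytes.hex(), replacement_misread)
```

If that guess is right, the character was already lost (replaced by U+FFFD) before the Latin-1 misreading happened, so no amount of re-decoding the final output would recover it.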
...
Wide character in print at encode.pl line 6, <> line 5.
globus-gridftp-server (PID 3522) wird ausgefï¿½hrt...
This looks like a Perl problem.  Does the program know it's getting UTF-8 
input?
...
...
The output could be from some i18n software, which could be
localised to their local language.  Wouldn't this force GLUE
clients to understand all possible languages?
By clients do you mean computers or people? In many cases, including this
 one, string attributes are designed to be human-readable and I'm not sure
 we should in general be forcing everyone to use English
That's what I was wondering:  if it's for local human consumption then 
supporting l10n text is desirable.
...
... clearly if
 things are supposed to be digested by a program, e.g. the various
 enumerated lists, then you can't easily localise them without losing
 interoperability.
Yup, agreed!

Cheers,

Paul.