unicode question for tcl

Will Coleda will at coleda.com
Tue Jun 9 03:54:11 UTC 2009


The tcl spec test "string.test" fails with:

  Invalid character for UTF-8 encoding

This is due to the following tcl test:

test string-24.5 {string reverse command - shared unicode string} {
    set x abcde\udead
    string reverse $x
} \udeadedcba

I can't see a way to generate this string (\udead) in Parrot; my attempts
with utf16, utf8, and ucs2 each fail in a different way (shown below).

According to the tcl spec:

\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) give a
sixteen-bit hexadecimal value for the Unicode character that will be
inserted.

Parrot mentions \u in docs/book/ch03_pir.pod and
docs/pdds/pdd19_pir.pod, which I assume should correspond directly to
the \u used in tclsh.

.sub main :main
  $S1 = utf16:unicode:"abcd\udead" # Illegal escape sequence in uxxx escape - too short
  $S1 = utf8:unicode:"abcd\udead"  # Malformed UTF-8 string
  $S1 = ucs2:unicode:"abcd\udead"  # ucs2 unimpl
.end

Using \Ud800dead instead of \udead fails in the same way for each encoding.
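
One possible explanation, offered as a sketch and not as a statement about Parrot's internals: U+DEAD lies in the UTF-16 surrogate range (U+D800 to U+DFFF), and a lone surrogate is not a valid Unicode scalar value, so strict UTF-8 encoders reject it. The Python snippet below only illustrates that behaviour in another language; the helper name is_surrogate is made up for this example, and nothing here is Parrot or Tcl API.

```python
# Illustration only: shows why a lone \udead is awkward for strict encoders.

def is_surrogate(ch):
    """Return True if ch is a code point in the surrogate range."""
    return 0xD800 <= ord(ch) <= 0xDFFF

text = "abcde\udead"          # same string as the Tcl test, built via escape

# A strict UTF-8 encoder refuses the lone surrogate.
try:
    text.encode("utf-8")
    strict_utf8_ok = True
except UnicodeEncodeError:
    strict_utf8_ok = False

# Python's "surrogatepass" error handler emits the three-byte form anyway,
# which strict decoders elsewhere would flag as malformed UTF-8.
raw = "\udead".encode("utf-8", "surrogatepass")

# Reversing by code point still works, which is what the Tcl test expects.
reversed_text = text[::-1]

# Assumption: if \Ud800dead is read as one 32-bit value, it is far above
# the largest Unicode code point, so it cannot name a character either.
too_large = 0xD800DEAD > 0x10FFFF

print(is_surrogate("\udead"), strict_utf8_ok, raw.hex(), too_large)
```

If this reading is right, passing the test would need a string representation that tolerates unpaired surrogates, which is an assumption about what Tcl does internally and not something confirmed here.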

Thanks in advance for any help.

-- 
Will "Coke" Coleda

