[Parrot-users] validate and Unicode

NotFound julian.notfound at gmail.com
Thu Jul 22 16:58:05 UTC 2010


> I believe this is because "validate" isn't a valid operation on a string, so
> I'm wondering how I invoke the validate function that I find in
> parrot/src/string/charset/unicode.c? That validate:
>
> static UINTVAL validate(PARROT_INTERP, const STRING *src)
>
> appears to be registered in association with the character set for Unicode,
> but does that mean that I have to have an explicitly Unicode string in hand?
> How do I mark this string as Unicode, even if I think it might be invalid?
> The goal, here, is to provide a test in Perl which takes a string and
> returns true if it contains valid Unicode codepoints. Is there a better way
> to do that in PIR?

A Parrot string always contains codepoints that are valid for its charset
and encoding, except transiently inside internal code such as string
building.

If you want, for example, to check whether a binary string contains valid
utf8, you can assign it to a ByteBuffer and get a utf8 string from it.
That operation throws if the contents of the binary string can't be
interpreted as utf8.

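# s holds the binary string to check
.local pmc bb
.local string s2
bb = new 'ByteBuffer'
bb = s                                    # copy the bytes of s into the buffer
s2 = bb.'get_string'('unicode', 'utf8')   # throws if the bytes are not valid utf8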
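Since the goal is a test that returns true or false, the throw can be
turned into a boolean with an exception handler. This is a minimal,
untested sketch: it assumes the usual push_eh / .get_results / pop_eh
handler idiom, and the sub name is_valid_utf8 is hypothetical.

```pir
# Hypothetical helper: returns 1 if the bytes of s decode as utf8, else 0.
.sub 'is_valid_utf8'
    .param string s
    .local pmc bb, ex
    bb = new 'ByteBuffer'
    bb = s                           # copy the raw bytes into the buffer
    push_eh not_utf8                 # install handler before the risky call
    $S0 = bb.'get_string'('unicode', 'utf8')
    pop_eh                           # decoding succeeded, remove the handler
    .return (1)
  not_utf8:
    .get_results (ex)                # must be the first thing in the handler
    pop_eh
    .return (0)
.end
```

Called as $I0 = 'is_valid_utf8'(s), it yields 1 for valid utf8 and 0
otherwise. Note that it catches any exception raised by get_string, not
only the utf8 decoding error.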

-- 
Regards

