FileHandle.read and multi-byte encodings

Nick Wellnhofer wellnhofer at aevum.de
Fri Jan 7 19:18:44 UTC 2011


On 07/01/11 19:47, Andrew Whitworth wrote:
> Is that how Rakudo reads from all files, by reading in binary mode
> directly into an encoding-less byte buffer?

No, Rakudo also uses readline with utf8 and other encodings.

> I'm strongly in favor of some kind of read() method that respects
> encodings. If we can do it in-place I would prefer that, but adding a
> new method as necessary would be an acceptable second option.

read() does respect encodings. The only hard case is when it's given a
byte count and is expected not to fail when that count ends in the
middle of a multi-byte character.
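
To illustrate the partial-character problem, here is a minimal sketch in
Python rather than Parrot code. It assumes UTF-8 input and uses Python's
standard incremental decoder; the function name read_in_chunks is made
up for this example. Bytes of a character that is cut off at a chunk
boundary are held back and emitted with the next chunk instead of
raising an error.

```python
import codecs
import io


def read_in_chunks(stream, byte_count, encoding="utf-8"):
    """Read a binary stream in fixed-size byte chunks and decode each one.

    Hypothetical helper for illustration only. An incomplete multi-byte
    sequence at the end of a chunk stays buffered inside the decoder
    until the following chunk completes it.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    while True:
        chunk = stream.read(byte_count)
        at_eof = chunk == b""
        # final=True only at EOF, so truly truncated input still fails there.
        pieces.append(decoder.decode(chunk, final=at_eof))
        if at_eof:
            break
    return pieces


data = "naïve café".encode("utf-8")  # 'ï' and 'é' take two bytes each

# Decoding a fixed byte count directly fails: byte 3 is half of 'ï'.
try:
    data[:3].decode("utf-8")
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

pieces = read_in_chunks(io.BytesIO(data), 3)
text = "".join(pieces)
```

The price is that a call asking for N bytes may return fewer characters
than the bytes consumed would suggest, because the tail is deferred to
the next call.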

If users like Rakudo want to read binary and string data from the same 
handle, they could temporarily switch encodings. But it'd be much nicer 
to have a 'read_bytes' method, especially if it can directly return a 
ByteBuffer.
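
A rough sketch of what such a split could look like, again in Python and
not Parrot's actual FileHandle API. The class Handle and its methods are
hypothetical; bytearray stands in for a ByteBuffer. The point is only
that text reads and raw byte reads can share one handle without
switching encodings in between.

```python
import io


class Handle:
    """Hypothetical handle offering both decoded and raw reads."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # underlying binary stream
        self.encoding = encoding   # used by text reads only

    def readline(self):
        # Text read: bytes up to and including the newline, decoded.
        return self._raw.readline().decode(self.encoding)

    def read_bytes(self, count):
        # Binary read: up to count raw bytes, never decoded.
        return bytearray(self._raw.read(count))


# A text header, a 4-byte binary payload, then more text.
payload = b"len=4\n" + b"\xde\xad\xbe\xef" + "ü\n".encode("utf-8")
handle = Handle(io.BytesIO(payload))

header = handle.readline()
blob = handle.read_bytes(4)
trailer = handle.readline()
```

Since read_bytes never touches the decoder, the payload may contain
byte sequences that are invalid in the handle's encoding.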

Nick

