Access the internal pointer to parrot strings

Bart Wiegmans bartwiegmans at gmail.com
Mon Jun 4 20:24:31 UTC 2012


Hi everybody,

I want to access the pointer to the internal buffer of a parrot
string, so I can pass it to relevant IO functions outside of parrot.
(The 'relevant IO function' in question is ap_rwrite, which takes a
const void pointer, a byte count, and a pointer to a request_rec
structure. I wanted to paste the link to its definition but it is
frightfully long. Its signature is
int ap_rwrite(const void *, int, request_rec *).)

My reason for wanting this is that I want to minimise the amount of
copying and string scanning in general. Consider what happens from the
inside of mod_parrot to apache:

* A script calls say("Hello world"), because it likes newlines
* This string is copied (or not, I'm not sure how stringhandle works)
into the stringhandle buffer
* Then, the script ends, and I read the stringhandle for its contents,
which are copied (or not, again no idea how this works) into a
Parrot_String,
* And I export this Parrot_String to ASCII, which adds another copy and
a terminating zero.
* And then finally I give it to ap_rwrite, which copies the string
again into a buffer of its own.
* If I were to use ap_rputs, which is easier for me to use, apache
needs to scan the /entire/ string for the zero to know how many bytes
to send.
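
To make the difference concrete, here is a minimal sketch in plain C. Nothing
in it is real Parrot or Apache API: demo_string, demo_sink and demo_rwrite are
hypothetical stand-ins I made up, with demo_rwrite only mimicking the
pointer-plus-length shape of ap_rwrite. It counts the copies and scanned bytes
for the "export to a C string first" path versus handing over the raw buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Parrot string: bytes plus a known length.
   The real Parrot STRING layout is NOT assumed here. */
typedef struct {
    const char *bytes;   /* not necessarily zero-terminated */
    size_t      len;
} demo_string;

/* Hypothetical stand-in for Apache's output buffer, with counters. */
typedef struct {
    char   data[256];
    size_t used;
    size_t copies;   /* how many times the payload was copied */
    size_t scanned;  /* how many bytes were scanned looking for a zero */
} demo_sink;

/* Same shape as ap_rwrite (pointer, byte count, destination), nothing more. */
static size_t demo_rwrite(const void *buf, size_t nbytes, demo_sink *sink)
{
    assert(sink->used + nbytes <= sizeof sink->data);
    memcpy(sink->data + sink->used, buf, nbytes);
    sink->used += nbytes;
    sink->copies++;
    return nbytes;
}

/* Path A: export to a zero-terminated C string, then find the length
   again with strlen, the way an ap_rputs-style call would have to. */
static size_t send_via_cstring(const demo_string *s, demo_sink *sink)
{
    char   tmp[256];
    size_t n;

    assert(s->len < sizeof tmp);
    memcpy(tmp, s->bytes, s->len);   /* the extra copy */
    tmp[s->len] = '\0';              /* the extra zero */
    sink->copies++;

    n = strlen(tmp);                 /* scans the entire string */
    sink->scanned += n;
    return demo_rwrite(tmp, n, sink);
}

/* Path B: pass the internal pointer and the known length directly. */
static size_t send_direct(const demo_string *s, demo_sink *sink)
{
    return demo_rwrite(s->bytes, s->len, sink);
}
```

For a 12-byte "Hello world\n", path A ends up with two copies and a 12-byte
scan, path B with one copy and no scan, and the sink contents are identical.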

I count at least 3 and at most 5 copies, never mind the useless
addition of a zero.
Only one is ever needed, and that is from parrot to the apache buffer,
which may or may not directly copy it to the kernel buffer. I do not
know nor care much about that, but I believe the developers of Apache
thought Really Hard about that function and that It Works.

Note that, as this is HTTP, and I can (and should) send the correct
character encoding alongside the data, there is really not even much
reason to worry about 'converting' to the correct format. Nor do I
need a zero-terminated string, because I know the length. In short,
the raw pointer to the buffer is perfect for my needs.

Now, I see the following possible issues. First of all, I don't
suppose that a string buffer, once allocated, will remain there
forever when the pointer is exported. The pointer is not really 'safe'
to use outside of a very limited scope, because the buffer may be
garbage collected (or concurrently accessed, although that might not
be so serious; I don't know). In my example it might be safe to use
because, for the duration of the call, the string never goes out of
scope and the copying will be safe. Unless ap_rwrite is implemented
asynchronously, or the garbage collector is of the concurrently
copying type.
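
One way to picture that "very limited scope" is a scoped-borrow helper: the
pointer is only handed to a callback, and is considered dead once the callback
returns. This is only a sketch under my own assumptions. demo_gc_string, its
pinned counter and demo_with_bytes are hypothetical names, and I am not
claiming Parrot's collector has (or needs) a pin mechanism like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string header. 'pinned' > 0 is meant to tell an imaginary
   collector that 'bytes' must not be moved or freed right now. */
typedef struct {
    char  *bytes;
    size_t len;
    int    pinned;
} demo_gc_string;

/* Callback that consumes the bytes synchronously. */
typedef size_t (*demo_bytes_fn)(const void *buf, size_t len, void *ctx);

/* Lend the internal buffer to fn for the duration of one call only. */
static size_t demo_with_bytes(demo_gc_string *s, demo_bytes_fn fn, void *ctx)
{
    size_t result;

    assert(s != NULL && fn != NULL);
    s->pinned++;                          /* buffer must stay put from here */
    result = fn(s->bytes, s->len, ctx);   /* e.g. a synchronous write */
    s->pinned--;                          /* pointer must not be used past here */
    return result;
}

/* Example consumer: just adds up the byte count it was given. */
static size_t demo_count_bytes(const void *buf, size_t len, void *ctx)
{
    size_t *total = ctx;

    (void)buf;
    *total += len;
    return len;
}
```

This only helps if the consumer really is synchronous; an asynchronous writer
that keeps the pointer after returning would defeat it.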

The next issue is that it is not guaranteed - nor should it be - that
the buffer containing the string is contiguous. It might be now, but
there are reasons for it not to be. If it isn't, get_pointer might
have to do compaction, and that might be problematic in itself because
the buffer might be used by another string (e.g. a substring).
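
If the storage ever did become non-contiguous, one alternative to compaction
would be to hand the pieces to the writer one at a time. The sketch below
assumes a hypothetical chunk list (demo_chunk) and a hypothetical emit
callback with the same pointer-plus-length shape as before; none of this
is existing Parrot API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical piece of a non-contiguous string. */
typedef struct {
    const char *bytes;
    size_t      len;
} demo_chunk;

typedef size_t (*demo_emit_fn)(const void *buf, size_t len, void *ctx);

/* Emit every chunk in order without joining them first.
   Returns the total bytes emitted and stops at the first short write. */
static size_t demo_write_chunks(const demo_chunk *chunks, size_t count,
                                demo_emit_fn emit, void *ctx)
{
    size_t total = 0;
    size_t i;

    assert(chunks != NULL || count == 0);
    for (i = 0; i < count; i++) {
        size_t n = emit(chunks[i].bytes, chunks[i].len, ctx);
        total += n;
        if (n != chunks[i].len)
            break;
    }
    return total;
}

/* Hypothetical output buffer used to try the function out. */
typedef struct {
    char   data[64];
    size_t used;
} demo_out;

static size_t demo_append(const void *buf, size_t len, void *ctx)
{
    demo_out *out = ctx;

    if (out->used + len > sizeof out->data)
        return 0;                         /* refuse rather than overflow */
    memcpy(out->data + out->used, buf, len);
    out->used += len;
    return len;
}
```

The cost is one call per chunk instead of one call in total, but no buffer
shared with another string has to be touched.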

Anyway, I was wondering what your thoughts were, before I go off on a
wild goose chase and implement something that might be a really bad
idea. Especially people who are rewriting the IO subsystem might
chime in with good advice :-).

Regards,
Bart Wiegmans

