embed_api discussion at PDS
Andrew Whitworth
wknight8111 at gmail.com
Fri Dec 3 16:15:36 UTC 2010
Attached is a write-up of the status and design ideas in the new
embed_api branch that we've been working on. I would like to talk
about this branch work at PDS, and would like to both solicit feedback
and get new developers interested in helping with the effort.
The new Embedding API that we have written is a subset of what it
really should be. We've basically written only enough so that the
Parrot executable and some of the other utility programs (pbc_merge,
pbc_disassemble, pbc_to_exe) can use it exclusively. There is more to
write, which can be done up-front or on a per-request basis from new
embedding applications.
The branch is currently failing a few tests, especially those
involving the formatting of error messages. The embed_api branch is
going to fix all these test failures before merging, so that we can
avoid a deprecation boundary, so many of those tests really become
moot anyway.
I look forward to seeing everybody at PDS, and talking about this work.
--Andrew Whitworth
-------------- next part --------------
On Sunday at PDS I would like to take some time to talk about the new Embedding API work I (and bluescreen) have been doing in the embed_api and embed_api2 branches. This writeup is going to act like a primer for the conversation so everybody can come to the meeting knowing exactly what I am trying to do, and everybody can take the time to look over the code and generate some feedback.
The new Embedding API is being created as a new layer over existing functions. The current functions in src/embed.c and elsewhere will still be exported, and I am not adjusting their behavior. If you have an embedding application that uses the old API, that application should continue to work in the presence of the new API. I think everybody will want to upgrade when they see the new capabilities, but that's just a hunch. Once the embed_api2 branch is approved and merged, we can talk about deprecating the older interface (which I strongly recommend), among other things. First I'll talk about the general design of the new API, and then I will talk about some of the specific changes made to the Parrot internals.
The new embedding API is located entirely in the src/embed/ directory. The new header file is "include/parrot/api.h". api.h is to be used ONLY by the embedding applications, and should be the only header file used. In the new system, the embedding app should NEVER #include "parrot/parrot.h". Likewise, internal development should never use api.h, because it is only intended for embedding applications. All API functions are named "Parrot_api_*", and all of them are decorated with the "PARROT_API" macro (currently defined identically to PARROT_EXPORT, but it may change).
All API functions, without exception, have the following form:
PARROT_API
Parrot_Int
Parrot_api_some_func(Parrot_PMC interp_pmc, <args>, <&returns>)
{
EMBED_API_CALLIN(interp_pmc, interp)
... Logic goes here ...
EMBED_API_CALLOUT(interp_pmc, interp)
}
There are several things to notice here. First, every API function without exception returns a Parrot_Int (INTVAL, for you internals-junkies). This is a status flag that indicates normal success or some other kind of situation. For instance, normal behavior is to "exit 0" from a PIR application, or a ".return()" from a Parrot Sub executed directly. An "exit 1" indicates a non-normal return, but it's not necessarily an error. Any other kind of unhandled exception is an error.
Once an API function returns 0 (indicating that it did not succeed), we can use the Parrot_api_get_result function to gather information about what happened. That function has the following signature:
PARROT_API
Parrot_Int
Parrot_api_get_result(Parrot_PMC interp, Parrot_Int *is_error, Parrot_PMC *exception, Parrot_Int *exit_code, Parrot_String *errmsg);
It contains a flag to say whether we have an error condition (typically an unhandled exception that isn't an EXCEPT_exit). It also returns the Exception PMC. As a convenience, it currently returns the exit_code value from the PMC and also its string message.
Because all API functions return a boolean success, we can chain them together in embedding applications, and share error-handling routines:
if (Parrot_api_do_one_thing(...) && Parrot_api_do_another_thing(...)) {
...
} else if (Parrot_api_get_result(...)) {
... Handle Error ...
}
Notice also that the Parrot_api_get_result function returns a boolean. If that fails, we're in a pretty catastrophic situation and can't even get information about the error that caused us to crash.
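To make the chaining pattern concrete, here is a minimal self-contained sketch. It does not link against libparrot: every Demo_* type and demo_* function is a hypothetical stand-in invented for illustration, and the real Parrot_api_* signatures may differ. It only shows how boolean return values let two calls share one error-handling path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; NOT the real Parrot types. */
typedef long Demo_Int;

typedef struct {
    Demo_Int    is_error;   /* nonzero after a failed call */
    Demo_Int    exit_code;  /* exit status recorded by the failed call */
    const char *errmsg;     /* human-readable message, or NULL */
} Demo_Interp;

/* Mock API call: returns 1 on success, 0 on failure. On failure it
 * records the details on the interpreter instead of printing them. */
Demo_Int demo_api_step(Demo_Interp *interp, int should_fail, const char *what)
{
    assert(interp != NULL);
    if (should_fail) {
        interp->is_error  = 1;
        interp->exit_code = 1;
        interp->errmsg    = what;
        return 0;
    }
    return 1;
}

/* Mock result query: fills the out-parameters and itself returns a
 * boolean, so the caller can tell when even this lookup has failed. */
Demo_Int demo_api_get_result(Demo_Interp *interp, Demo_Int *is_error,
                             Demo_Int *exit_code, const char **errmsg)
{
    if (interp == NULL)
        return 0;
    *is_error  = interp->is_error;
    *exit_code = interp->exit_code;
    *errmsg    = interp->errmsg;
    return 1;
}

/* Embedding-side logic: chain the calls with &&, then share one error
 * path. Returns the exit status the embedding application would use. */
int demo_run(int fail_first, int fail_second)
{
    Demo_Interp interp    = {0, 0, NULL};
    Demo_Int    is_error  = 0;
    Demo_Int    exit_code = 0;
    const char *errmsg    = NULL;

    if (demo_api_step(&interp, fail_first, "first step failed")
     && demo_api_step(&interp, fail_second, "second step failed")) {
        return 0;                       /* both calls succeeded */
    }
    else if (demo_api_get_result(&interp, &is_error, &exit_code, &errmsg)) {
        if (is_error)                   /* the application owns all output */
            fprintf(stderr, "error: %s\n", errmsg);
        return (int)exit_code;
    }
    return 2;                           /* could not even fetch the result */
}
```

Because && short-circuits, the second mock call is never made once the first one fails, and the message printed by the embedding side is the one recorded by the call that actually failed.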
As a design decision, I have been trying to use the 4 native Parrot types exclusively in API function signatures: Parrot_PMC, Parrot_String, Parrot_Int, and Parrot_Float. There are a handful of places where this doesn't make sense, especially during interpreter initialization and when taking C strings from the embedding application. Where the input string is likely available as a constant, for instance, or where it is only used once, it didn't make sense to me to force the user to wrap it up into a Parrot_String. We can talk about those kinds of details if people are interested.
Another thing to point out is that I never use a raw Parrot_Interp pointer directly. Interpreters are always passed as ParrotInterpreter PMCs. Since the interpreter structure is considered opaque, and since the ParrotInterpreter PMC has a number of useful methods that the embedding app may want to use, this seemed to me to be the most natural choice. It also opens the possibility that the embedding application could substitute in a *subclass* of ParrotInterpreter to get some custom behavior. I don't think we support ParrotInterpreter subclasses in the API yet, but if people are interested in the feature it shouldn't be too hard to add.
I've used the new API in most of the executables in the embed_api branch. If you look at src/main.c, src/pbc_disassemble.c, src/merge.c, or the fakecutables generated by pbc_to_exe, you'll see the new embedding API in action. There are a handful of places where it needs work, of course, but we have plenty of time to sort out any issues.
As for the internal design of the system, several things have changed to enable this new API. I'll list them for convenience.
1) The interpreter configuration hash is now set as a PMC, instead of as a raw stream of bytes. Also, the config hash can be set at any time after the interpreter is created (instead of having to be set before the first interpreter is created), and we can assign a new config hash to each new interpreter created. Where a config hash is not provided, some sane default values are provided. Internally the config hash is mostly used to set up search paths.
Since the config hash can be set as a PMC by the embedding application, there's no real reason why it would have to be a Hash at all. The embedding application has complete freedom to set anything they want here (including an HLL-friendly subclass).
As a caveat, once the config hash is set on the interpreter, there is currently no good clean way to change it. That is, you can change the PMC itself, but there is no good way to undo the changes made to the library search paths array. If this is a feature that people want we can work on it, but for now it seems like a non-issue to me.
2) Similarly to the config hash, the command-line args (available as IGLOBALS_ARGV_ARRAY in the interpreter) are no longer static. The arguments to :main can be any arbitrary PMC, and are passed as an argument to the new Parrot_api_run_bytecode function. You can set this value fresh on every call to Parrot_api_run_bytecode so individual interpreters in your program can all take different argument PMCs. There's no real reason why you would have to pass your :main function an array of strings either, if you don't want. It can be any PMC type, including a Hash or an HLL-friendly subclass.
3) Parrot_exit no longer calls "exit()". The new API sets a jump point (the EMBED_API_CALLIN and EMBED_API_CALLOUT macros handle all this). When you "exit" the interpreter's program you jump immediately back to the API call and return the status information back to the embedding application. In fact, Parrot_exit (recently renamed to Parrot_x_exit) should no longer be called in most situations. Parrot_x_exit runs several exit handlers, some of which may be destructive (like finalizing GC). Parrot_x_exit should now only be called in conjunction with Parrot_destroy when we are actually cleaning up the interpreter and not planning to execute anything else with it.
There is now a function "Parrot_x_jump_out" in src/exit.c that should be used most times when you want to "exit" your program. This returns control directly to the embedding application and communicates the current status.
4) die_from_exception, the fallback when we throw an exception but cannot find any handlers for it, no longer prints error or backtrace information to stderr. All the necessary information is packaged up in the Exception and passed back to the embedding application through the Parrot_api_get_result function. The embedding application can dissect that PMC and handle all the necessary output operations. Besides debugging situations, it's my goal that libparrot should NEVER use fprintf to communicate error information directly to the user. libparrot should communicate with the embedding application, and that application is in charge of interfacing with the user.
5) The longopt family of functions in src/longopt.c is not linked in with libparrot anymore. We do still compile the object file, and embedding applications (like parrot.exe) may link with it if they want it. Some of the code changes here were a little bit ugly. We are still trying to tease out some of the argument processing code from IMCC, for instance.
6) We still have a ways to go with this, but IMCC no longer executes the :main program directly. Instead, it returns a PBC PMC (currently an UnManagedStruct with a pointer to a PackFile structure). The user can use the Parrot_api_run_bytecode routine to run the PBC PMC. There is also a Parrot_api_load_bytecode_file that can be used to load in a pre-compiled bytecode file to get the PBC PMC, and then run it from there. This creates the opportunity that any front-end which can produce a PBC PMC of some form can be used in place of IMCC.
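The jump-point mechanism described in item 3 can be illustrated with standard setjmp/longjmp. The sketch below is an assumption about the general technique, not the actual expansion of EMBED_API_CALLIN and EMBED_API_CALLOUT or the body of Parrot_x_jump_out; every Demo_* and demo_* name is hypothetical. It shows an "exit" deep inside the running program landing back in the API call, which then reports status to the caller instead of terminating the process.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical interpreter state for illustration only. */
typedef struct {
    jmp_buf *api_jmp_buf;  /* landing point; NULL outside an API call */
    int      exit_code;    /* status requested by the running program */
    int      exited;       /* 1 if the program left via demo_jump_out */
} Demo_Interp;

/* Stand-in for the "jump out" idea: record the status and unwind
 * straight back to the API call. This function never returns. */
void demo_jump_out(Demo_Interp *interp, int status)
{
    assert(interp->api_jmp_buf != NULL);
    interp->exit_code = status;
    interp->exited    = 1;
    longjmp(*interp->api_jmp_buf, 1);
}

/* Stand-in for user code: a non-negative status means "exit now",
 * a negative status means "return normally". */
void demo_program(Demo_Interp *interp, int status)
{
    if (status >= 0)
        demo_jump_out(interp, status);
}

/* Stand-in for an API entry point: set the jump point (the "call-in"),
 * run the program, clear the jump point (the "call-out"), and return
 * 1 for a normal finish or exit 0, and 0 for any other exit status. */
int demo_api_run(Demo_Interp *interp, int status)
{
    jmp_buf env;

    interp->api_jmp_buf = &env;
    interp->exit_code   = 0;
    interp->exited      = 0;

    if (setjmp(env) == 0)
        demo_program(interp, status);   /* may longjmp back to setjmp */

    interp->api_jmp_buf = NULL;
    return interp->exit_code == 0;
}
```

The important property is that the process itself never calls exit(): the embedding application gets control back after every call and can keep using the same interpreter afterwards.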
The goal of the embedding API work is to start approaching a new vision for what Parrot could be. We should really be thinking about Parrot as two parts: libparrot (a language-agnostic bytecode interpreter and runtime) and the Parrot executable (the IMCC PIR/PASM front-end for libparrot). The Parrot executable now embeds libparrot using the new embedding API. And if the Parrot executable can do it, anybody else can too. The idea is that any application can embed libparrot and use it to execute any bytecode without making assumptions about the language that the program was written in, and without including all sorts of infrastructure that the application does not need. We don't assume that code came from IMCC. We don't assume we are using conventions from PIR/PASM. We give control to the embedding application to set the environment and all inputs, which makes for a much more flexible and powerful tool.
At the time of writing this, the embed_api2 branch has seen a few major changes and is failing some tests. I'm going to get it fixed up again tonight so people can start playing with it if they want. I would very much like to get some feedback this week and during PDS so I can focus my efforts to make this work acceptable to the community and hopefully get it merged before 3.0.