[Parrot-users] [UNSURE] lua: bugfixes, PIR best practices, Unicode, doc questions

François Perrad francois.perrad at gadz.org
Sat Aug 6 08:56:18 UTC 2011


2011/8/4 Aaron Faanes <dafrito at gmail.com>:
> This email is specific to lua, but the questions are pretty basic, so
> please answer if you know about precedent in other HLLs.
>
> I've forked fperrad's lua implementation on github, so that I can
> publish commits and issue pull requests, etc. It can be found here:
>
> https://github.com/dafrito/lua
>
> I have a few questions/comments/etc. regarding some things I've run
> into. Sorry for the digest-style format, but I didn't want to spam the
> list with lots of little emails. ;)
>
> ## Bugfix regarding naive optimization
>
> https://github.com/dafrito/lua/commit/8ea8e825c4a932af83b87a08e1d47cd858d5634f
>
> There's an optimization that breaks some specially crafted tables. For
> a quick example, consider the following snippet:
>
> function Foo()
>        return 42
> end
>
> return { Foo() }
>
> The optimization removes a temporary variable on returned function
> invocations, but this variable is used if the returned value is a
> slurped table literal. As a result, the optimization's removal means
> the resulting PIR code refers to a null register.
>
> I fixed it in the following commit by removing the optimization. I
> wasn't sure how to make it more specific.
>
> As a corollary, is this sort of optimization necessary or expected in
> PIR code? I would expect Parrot's optimizer to catch it.
>
> ## Unicode problems
>
> I ran into a "Lossy conversion to single byte encoding" when trying to
> parse a lua file containing some special characters (such as
> '←','↑','→','↓', etc.). Changing the encoding to utf8 fixed this
> issue. My patches can be found here:
>
> https://github.com/dafrito/lua/commit/ee17af1c560171e3f0fc0264e86f73c070416212
> https://github.com/dafrito/lua/commit/fcf40eed4f939c018e1b435ba85a3ce1acb7e908
>
> This fixes the issue for compilation and for installable_luap.
> installable_lua exits normally, but incorrectly displays the
> characters. I imagine there's another "encoding" somewhere to find.

installable_lua can handle both Lua source scripts and Lua bytecode,
so the file must be read as binary and then reinterpreted as utf8
if it is not bytecode.
(installable_luap handles only Lua source, so the file is read as utf8.)
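
The flow described above (read raw bytes, then decode as UTF-8 only when
the file is not bytecode) could be sketched as follows. This is a Python
illustration, not the PIR used by installable_lua; load_chunk is a
hypothetical helper, and the only fact assumed about the file format is
that precompiled Lua chunks start with the signature ESC "Lua".

```python
LUA_SIGNATURE = b"\x1bLua"  # header of precompiled Lua chunks (ESC 'L' 'u' 'a')


def load_chunk(data: bytes):
    """Classify raw file bytes; hypothetical helper for illustration only."""
    if data.startswith(LUA_SIGNATURE):
        # Bytecode must stay as raw bytes: decoding it as text would corrupt it.
        return ("bytecode", data)
    # Otherwise it is source text, so reinterpret the same bytes as UTF-8.
    return ("source", data.decode("utf-8"))


kind, text = load_chunk('print("←↑→↓")'.encode("utf-8"))
print(kind, text)
```

The point of the sketch is the order of operations: the decision between
bytecode and source is made on the raw bytes, before any decoding happens.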

But that is not the current behavior of trans_encoding:

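.sub 'main' :main
    # raw bytes: the UTF-8 encoding of "François"
    $S0 = binary:"Fran\xc3\xa7ois"
    say $S0
    $S9 = escape $S0
    say $S9

    # convert the binary string to utf8 with trans_encoding
    $I0 = find_encoding 'utf8'
    $S1 = trans_encoding $S0, $I0
    say $S1
    $S9 = escape $S1
    say $S9

    # the utf8 string that the conversion should have produced
    $S2 = utf8:"Fran\xe7ois"
    say $S2
    $S9 = escape $S2
    say $S9
.end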

As you can see, $S1 is not equal to $S2.
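
One way to picture why the two strings can differ: reinterpreting bytes as
UTF-8 is not the same operation as transcoding each byte as its own
character. The Python sketch below only illustrates that distinction.
Whether trans_encoding performs exactly the second operation is an
assumption, not something the PIR output above proves.

```python
raw = b"Fran\xc3\xa7ois"  # the same bytes as the binary: string above

# Reinterpretation: treat the existing bytes as UTF-8 text.
reinterpreted = raw.decode("utf-8")

# Transcoding (assumed behaviour): every byte becomes one character,
# so the two bytes of the encoded c-cedilla turn into two characters.
transcoded = raw.decode("latin-1")

# What utf8:"Fran\xe7ois" is taken to denote here (U+00E7 as one character).
expected = "Fran\xe7ois"

print(reinterpreted == expected)            # True
print(transcoded == expected)               # False
print(len(reinterpreted), len(transcoded))  # 8 9
```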

>
> I don't know the performance/portability ramifications of ASCII versus
> utf8 for compiling code. The official Lua implementation isn't that
> clear, either, on what functionality is available.

A small benchmark on my box with the whole test suite (./setup test):
    ascii encoding : ~ 90s
    utf8 encoding : ~ 130s
So utf8 is clearly slower (roughly 1.4x on this run).

>
> ## Refactorings in POSTGrammar
>
> https://github.com/dafrito/lua/commits/calling_refactor
>
> As I worked on the preceding bug, I started writing up some refactors
> in POSTGrammar.tg. I started putting them in a separate
> "calling_refactor" branch. The commits usually fall in one of the
> following categories:
>
> * Adding some pod and comments
> * Using .local instead of $P0, $S0, etc. for long-lived variables
> * Using logical names for labels: using FINISH instead of L5, for example
> * Adding some noop labels for readability or consistency purposes
>
> Please review the commit history if you're interested.
>
> ## PIR Best Practices
>
> A few assorted best-practice questions for PIR:
>
> * Is it preferred to use L0, L1, etc. for label names?
> * Is there consensus regarding how labels should be capitalized? I
> capitalized them, to distinguish them from variables.
> * Is it preferred to use .local over $P0? My rule-of-thumb was if the
> variable lived beyond a few lines, it should have a name.
> * Is it common/frowned-upon to use labels solely for readability
> purposes? I added them in a few places to group a chunk of code in a
> long function together, or for symmetry reasons (to explicitly note a
> branch that's accessed via fall-through)
>

It is just MY style, and I use it consistently throughout the project.

> ## Documentation questions
>
> I noticed there's ~140 places where the copyright's referenced year is
> not current. Should these be updated to refer to 2011? (even if they
> haven't been modified since then?) My preference is to update them all
> at once, rather than updating on modification since this adds noise to
> commits.

It is not my current policy.

>
> The documentation assumes, in several places, that lua resides in
> $PARROT/languages/lua. Parrot's fetch_languages script does place the
> git repo there, but the repository is able to stand alone. Should
> lua's documentation assume it's installed in the languages
> subdirectory?

Those are out-of-date links which reflect the past (~3 years ago).

>
> --
> Aaron Faanes <dafrito at gmail.com>

