[DRAFT] Parrot module ecosystem

Geoffrey Broadwell geoff at broadwell.org
Tue Aug 11 07:43:20 UTC 2009


Thank you for the detailed review!

Allison wrote:
> Geoffrey Broadwell wrote:
> >   * We can create a centralized metadata store, but do not want to
> >     build and manage a module distribution network ...
> 
> Good.
> 
> >   * However it should be possible for another group to do so.
> 
> <shrug> I'm not sure it is possible. Keep in mind that we're not just 
> talking about custom Parrot modules, we're talking about the entire 
> collected history of Perl, Python, PHP, Ruby, Lua, etc. modules. That's
> just an insane amount of data, even Google isn't crazy enough to want to 
> take that on.

A friend suggested that a much more likely scenario than a distribution
network for every individual module (especially one using CPAN-style
tarballs-over-HTTP distribution) is a torrent service for full Power
Packs; the top few Packs are likely to be popular enough to actually
benefit from traffic sharing.

I'm not saying *we* do this, just that it is a reasonable thing for some
enterprising group to decide to do -- without having to do Google-scale
work.

> And, an additional requirement:
>   * The API for extracting info from the metadata store, and the process 
> of installing modules based on that metadata should be dead simple and 
> clearly specified so people can clone it. If we see implementations 
> popping up in Python, PHP, Ruby, Perl (5&6), etc it'll be a mark of success.

Agreed.

> >   * Tools must properly handle the difference between user-local,
> >     site-local, and vendor-installed modules.
> 
> Not sure this really matters.

I seem to recall the Perl 5 toolchain people actively iterating to get
this right, so I'm trusting they've thought about it enough to share
their institutional knowledge here.
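For what it's worth, here is a minimal sketch of how a tool might keep the three scopes apart. It assumes that user-local shadows site-local, which in turn shadows vendor, and that only a system packager may write into the vendor tree. The prefixes and the `packaging` flag are hypothetical, chosen only for illustration.

```python
from enum import Enum
from pathlib import Path


class Scope(Enum):
    USER = "user"      # user-local: private to one account
    SITE = "site"      # site-local: installed by the local administrator
    VENDOR = "vendor"  # vendor: owned by the OS package manager


# Hypothetical prefixes for illustration; "~" is expanded at install time.
PREFIXES = {
    Scope.USER: Path("~") / ".parrot" / "lib",
    Scope.SITE: Path("/usr/local/lib/parrot"),
    Scope.VENDOR: Path("/usr/lib/parrot"),
}


def install_prefix(scope: Scope, packaging: bool = False) -> Path:
    """Where a module installed at `scope` should go.

    Only a system packager (building a DEB, RPM, etc.) may write to the
    vendor tree, so the tool never tramples natively installed modules."""
    if scope is Scope.VENDOR and not packaging:
        raise PermissionError("vendor tree is managed by the system package tool")
    return PREFIXES[scope]


def search_order() -> list:
    """Module lookup order: user-local shadows site-local shadows vendor."""
    return [PREFIXES[Scope.USER], PREFIXES[Scope.SITE], PREFIXES[Scope.VENDOR]]
```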

> >   * Sufficient for automated programs to create system packages
> >     (DEB, RPM, etc.).
> 
> That would be tough, but we can at least cover the simple cases, with 
> guidance to point people toward how to extend the generated template to 
> handle the harder cases.

Sure.  As long as we don't treat perfection as a hard requirement,
thinking about the issues here can help flesh out weak points in the
spec.

> >   * Separate static v. configure-discovered v. hand-edited metadata.
> >     Separate files?
> 
> Sounds overly complex. Provide a field in the metadata for "data_source".

This came from a thread I noticed in #toolchain.  I haven't had time to
investigate it fully, but I'll ask about it in more depth for the next
iteration.
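As a rough sketch of the single-file alternative, a "data_source" tag can be attached to each field. This assumes the tag lives on each field rather than once per record, which is one possible reading of the suggestion. The three origin names come from the outline item above; everything else is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical origins, matching the three kinds of metadata in the outline.
DATA_SOURCES = {"static", "configure", "hand-edited"}


@dataclass(frozen=True)
class Field:
    value: object
    data_source: str  # which process produced this value

    def __post_init__(self):
        if self.data_source not in DATA_SOURCES:
            raise ValueError(f"unknown data_source: {self.data_source!r}")


def fields_from(metadata: dict, source: str) -> dict:
    """Return just the fields that came from one origin, e.g. so that a
    re-run of configure can replace its own values and leave the rest alone."""
    return {key: f.value for key, f in metadata.items() if f.data_source == source}
```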

> >   * Allows disambiguation as per Perl 6 module spec (authorities,
> >     versions, authors, etc.).
> 
> It's just extra metadata, sure, why not. We should also make sure we can 
> accommodate the metadata for RubyGems, PHP PEAR, etc.

Absolutely; also on the list for the next iteration.  (I had to stop
investigating at some point and get feedback on my outline so far,
before I sailed completely off the map ....)

> >   * Specifies rules for dependency string parsing/interpretation.
> 
> Not sure what you mean. More detail?

To give a few examples (not a complete list), a module in the wild could
reasonably declare dependencies on:

* Other modules from the same or other HLLs, or Parrot itself
* Power Packs
* System libraries, which vary between OSen in basename, version,
  basic architecture ('libfoo' versus 'Foo Framework'), etc., and may or
  may not require a matching foo-dev package to build against
* System tools, which vary in name and versioning across platforms
  ('make' v. 'nmake')
* Operating system versions ('linux 2.6.30+' or 'Windows Vista+')
* Alternatives ('iceweasel | firefox | safari | web-browser')
* Virtual dependencies (depending on something that multiple packages
  provide, such as 'web-browser' above)

Each of these is potentially in a different namespace, with its own
rules about how version strings are parsed and compared (for instance,
the cpan shell's rules are not the same as apt-get's).
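The last two kinds of dependency in the list, alternatives and virtual dependencies, interact in a way worth spelling out. Here is a minimal sketch, modeled loosely on Debian-style `|` alternatives and `Provides`. It assumes the leftmost satisfied alternative wins. The `provides` map and the function names are hypothetical.

```python
from typing import Dict, Optional, Set


def satisfies(name: str, installed: Set[str], provides: Dict[str, Set[str]]) -> bool:
    """True if `name` is installed itself, or is a virtual package that
    some installed package provides."""
    if name in installed:
        return True
    return any(name in provides.get(pkg, set()) for pkg in installed)


def pick_alternative(spec: str, installed: Set[str],
                     provides: Dict[str, Set[str]]) -> Optional[str]:
    """Resolve 'a | b | c': return the first alternative that is already
    satisfied, or None if nothing is (the first listed would be installed)."""
    for alt in (part.strip() for part in spec.split("|")):
        if satisfies(alt, installed, provides):
            return alt
    return None
```

With `lynx` installed and declared as providing `web-browser`, the spec 'iceweasel | firefox | safari | web-browser' resolves to the virtual `web-browser` entry, not to one of the named browsers.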

The easy thing is to punt off most of the parsing to namespace-specific
parsers; PHP should know how to parse and compare PHP module versions.
But even if we do that, we need a meta-syntax that allows us to specify
which namespace a dependency string should be interpreted in.
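To show why the comparison really has to be namespace-specific, here is a deliberately simplified sketch. The same two strings, "2.16" and "2.9", order differently under a Perl 5 decimal reading than under a dotted-integer reading. Real Perl 5 rules (v-strings, underscores) and Debian's epoch and revision rules are considerably more involved. The "dotted" namespace and the registry are hypothetical.

```python
from decimal import Decimal
from typing import Callable, Dict


def perl5_key(v: str) -> Decimal:
    # Simplified Perl 5 decimal versions: "2.16" means 2.160, so 2.16 < 2.9.
    return Decimal(v)


def dotted_key(v: str) -> tuple:
    # Simplified dotted-integer versions: (2, 16) > (2, 9).
    return tuple(int(part) for part in v.split("."))


# Hypothetical registry: each namespace contributes its own ordering key.
VERSION_KEYS: Dict[str, Callable[[str], object]] = {
    "perl5": perl5_key,
    "dotted": dotted_key,
}


def version_at_least(namespace: str, have: str, want: str) -> bool:
    """Compare two version strings using the rules of their namespace."""
    key = VERSION_KEYS[namespace]
    return key(have) >= key(want)
```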

As a first try, we could do:

    namespace:'dependency_string'

Thus:

    perl5:'Foo::Bar 2.16'
    perl6:':name<Baz::Biff>:auth<CPAN:QUUX>:ver<3.15.*>'
    apt:'zlib1g (>= 1:1.2.3.3.dfsg)'

But that's a bit ugly ... plus we still have other questions to resolve,
such as how to specify alternates across namespaces.
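For concreteness, here is a sketch of the `namespace:'dependency_string'` form as a tiny parser that splits off the namespace and hands the quoted string to a per-namespace parser. It assumes the quoted string contains no single quotes, since escaping is one of the details still open. `parse_perl5` and the `PARSERS` registry are hypothetical.

```python
import re
from typing import NamedTuple


class Dependency(NamedTuple):
    namespace: str  # whose rules apply, e.g. 'perl5', 'perl6', 'apt'
    spec: str       # raw string, left for that namespace's parser


# A namespace, a colon, then a single-quoted string that may contain colons.
# (Escaping an embedded quote is left unresolved here, as in the proposal.)
DEP_RE = re.compile(r"^(?P<ns>[a-z][a-z0-9_]*):'(?P<spec>[^']*)'$")


def parse_dependency(text: str) -> Dependency:
    m = DEP_RE.match(text.strip())
    if m is None:
        raise ValueError(f"expected namespace:'dependency_string', got {text!r}")
    return Dependency(m.group("ns"), m.group("spec"))


def parse_perl5(spec: str) -> tuple:
    # Hypothetical: 'Foo::Bar 2.16' -> ('Foo::Bar', '2.16'); no version -> '0'.
    name, _, version = spec.partition(" ")
    return name, version.strip() or "0"


# Namespaces without a registered parser keep their raw string for now.
PARSERS = {"perl5": parse_perl5}


def interpret(dep: Dependency):
    return PARSERS.get(dep.namespace, lambda s: s)(dep.spec)
```

The colons inside the Perl 6 string are harmless, because only the first `name:'` prefix is treated as syntax.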

All of which is just to say: because we need to work with all of the
existing dependency metadata of every other module ecosystem out there,
there's a lot to think about even if we try to punt most of the details.

> You're still thinking CPAN (the best technology 1995 had to offer).

Sure, I'll buy that.  Had to start somewhere.  :-)

> The primary interface should be a web form where people can enter 
> metadata about their module. They should also be able to *update* the 
> information stored there, to mark an older version as deprecated, that a 
> module is no longer maintained, change the owner(s), change the URI for 
> download, or to remove a module entirely. (Look at Launchpad.net for 
> inspiration.)

I can certainly see offering this as *an* interface.  But I don't buy it
as the primary interface.  Manual data entry that has to be duplicated on
both the project's source hosting site (which may be Launchpad.net or
github or what have you) and Aviary violates the main requirements:
make things dead easy for module authors, and add no extra friction to
their (probably volunteer) workload.

> From the form, we can generate a JSON dump of any module's metadata. We 
> can also accept a JSON block as an alternate input source, so someone 
> can keep a copy of the .json file checked into their repository, make a 
> few changes and paste it in the web form when releasing the next version.
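Whichever way the data arrives, the JSON round trip is straightforward. Here is a minimal sketch; the field names are hypothetical, drawn from the kinds of edits listed above (owners, download URI, deprecated, no longer maintained).

```python
import json

# Hypothetical field names, drawn from the kinds of edits described above.
REQUIRED = ("name", "version", "description", "owners", "download_uri")


def load_metadata(text: str) -> dict:
    """Accept a pasted JSON block and check that the required fields exist."""
    meta = json.loads(text)
    missing = [f for f in REQUIRED if f not in meta]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    meta.setdefault("deprecated", False)
    meta.setdefault("maintained", True)
    return meta


def dump_metadata(meta: dict) -> str:
    """Produce a stable JSON dump suitable for checking into a repository."""
    return json.dumps(meta, indent=2, sort_keys=True)
```

Sorting keys keeps the dump stable, so a copy checked into the author's repository produces small, readable diffs from one release to the next.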

Or they could register their repository *once*, and we can pull the info
(perhaps by polling, perhaps on request).  Or, if the hosting site can
feed us release notices, we can use those.
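Here is a minimal sketch of the register-once, pull-later idea. It assumes releases show up as tags on the hosting site. `fetch_tags` is a hypothetical stand-in for whatever the site actually offers (an API call, a feed of release notices, and so on) and is passed in, which keeps the logic testable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class Registration:
    repo_url: str                                # registered once by the author
    seen: Set[str] = field(default_factory=set)  # release tags already indexed


def poll_once(reg: Registration, fetch_tags: Callable[[str], Iterable[str]]) -> List[str]:
    """Ask the hosting site for release tags and return the new ones,
    marking them as seen so the next poll does not report them again."""
    new = sorted(set(fetch_tags(reg.repo_url)) - reg.seen)
    reg.seen.update(new)
    return new
```

The same function serves both polling and on-request updates. Only the trigger differs, and the author never performs a separate publish step.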

We should work very hard to add as little extra work for authors as
possible, and definitely avoid adding an extra manual publish step.
Yes, we can offer the manual process as an option ... just not the only
one.

> The metadata shouldn't keep copies of any files from the module 
> distribution, though it should have space for a description. (If 
> someone's lazy they might paste in the entire README, but that's 
> generally about building a module, and so not appropriate for someone 
> who's looking for general information about it, a.k.a. "Do I want this 
> module?".)

Hmmm.  Something tickling my brain about this, but it's too late to
figure out exactly what.  Probably something to revisit when we get
closer to working code.

> > * Core modules
> >   * parrot config  (already exists -- config.pir)
> >   * HTTP client    (at least GET, with redirect and proxy support)
> >   * zlib           (at least decompress)
> >   * tar            (at least extract)
> >   * JSON           (at least parse)
> >   * version spec   (at least parse and compare)
> >   * library probe  (shared library info: present? version? location?)
> >   * file paths     (portability: File::Spec + File::Basename + ...) 
> >   * file install   (portability: copy file, set file perms, etc.)
> >   * query metadata (perform API calls to metadata/search server)
> >   * installer lib  (all the real brains/glue for the module repo client)
> >   * installer ui   (CLI and/or Readline, minimal brains, uses lib)
> 
> Too heavy weight.

You may be right, but that partially depends on the next item ...

> And honestly, I think it's backwards. Aviary should be 
> a standalone "Pack" that has Parrot as its first dependency. (If you 
> have Parrot installed, great, if not, it'll install it for you. It's 
> pretty much what Rakudo does already.)

There's a chicken-and-egg problem here, and I see no obvious reason that
one logically comes first.  I don't think the user cares either way, as
long as they only have to perform one manual step.  More important, I
think, is the branding -- do we want to brand ourselves as Parrot or as
Aviary?  Whichever one the user installs manually is the one they will
probably perceive as the primary brand.

My reason for choosing Parrot first was actually more technical -- we
make binaries/system packages of Parrot for various operating systems,
and Parrot once installed provides a nice abstract layer insulating us
from a lot of operating system differences.  Thus building Aviary on top
of Parrot makes use of Parrot's platform.  Building Aviary separately
from Parrot, so that it can be used to install Parrot, just forces us to
solve lots of portability issues all over again.

And however we do it, we shouldn't trample the system's Parrot or Aviary
if they were installed using the native package tools.

> Details of what modules go where can come later. First step is to get 
> Pack installs working. (Basic Batteries is just a small Pack.)

Sure; though BB is mildly special from a project standpoint, you're
right that from a technical standpoint it's just like any other pack.

> Aviary could include tools to make it very easy to create a Pack from a 
> simple JSON file (a list of modules, and a title/description for the 
> Pack). All Packs should be standalone, installing Parrot and Aviary if 
> needed.

No, not all of them.  Or at least, Packs should be available in both
'regular' and 'all-in-one' forms.  This goes back to my reasons for
Parrot to be installed first.  People making Power Packs should have the
option to create bootstrapping Packs, but it doesn't make a lot of sense
to me that bootstrapping be the default.  I'm quite likely to want to
install the Games, Education, Science, and Graphics Packs all at once.
I see them as add-ons to a central system, not completely independent
things.
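To make the regular versus all-in-one distinction concrete, here is a sketch that turns the simple JSON Pack description Allison mentions (a title, a description, and a list of modules) into an install list. The `bootstrap` flag is hypothetical, and it defaults to a regular Pack that assumes Parrot and Aviary are already installed.

```python
import json


def install_plan(pack_json: str) -> list:
    """Turn a simple Pack description into an ordered list of installs.

    A regular Pack assumes Parrot and Aviary are already present; an
    all-in-one Pack sets the hypothetical 'bootstrap' flag to add them."""
    pack = json.loads(pack_json)
    for key in ("title", "description", "modules"):
        if key not in pack:
            raise ValueError(f"pack is missing {key!r}")
    plan = ["parrot", "aviary"] if pack.get("bootstrap", False) else []
    for module in pack["modules"]:
        if module not in plan:  # skip duplicates
            plan.append(module)
    return plan
```

Installing the Games, Education, Science, and Graphics Packs together then simply concatenates several regular plans on top of one existing system.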

> > * Misc recommendations
> >   * Separate 'parrot-modules' mailing list for module creators/users.
> 
> Nothing kills a good idea faster than shoving it off on a separate 
> mailing list that no one reads. If module-specific traffic ever seems to 
> be overwhelming parrot-dev we can split it off. (Since we got the ticket 
> traffic off parrot-dev, traffic is quite tolerable now.)

OK, fine by me.

> >   * Default to simple (CPAN-style) dependency resolution; upgrade to
> >     full resolution and system package awareness in Basic Batteries.
> 
> Not sure what you mean. More detail?

The first part was about handling edge cases such as conflicts/provides
graphs that can only be resolved by upgrading multiple modules in
concert to particular versions and removing other modules at the same
time.  The second part was about knowing that particular non-Parrot
system packages were installed (and what versions).
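To illustrate the simple half, here is a sketch of roughly the CPAN-style approach: install each prerequisite before the module that needs it, and ignore versions, conflicts, and provides entirely. Those are exactly what full resolution would add. The `deps` mapping is a hypothetical stand-in for data from the metadata store.

```python
def install_order(target: str, deps: dict) -> list:
    """Simple (CPAN-style) resolution: a depth-first walk that installs
    prerequisites first. No versions, conflicts, or 'provides' are
    considered, and a dependency cycle is simply reported as an error."""
    order = []
    visiting = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order
```

Anything this walk cannot express, such as upgrading two modules in concert while removing a third, is where full resolution would take over.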

> >   * Names so far suggested for module repository network:
> >     + Aviary
> 
> Love it. Very Parroty. aviary.parrot.org?

All in favor, say 'aye'.  Barring trademark collision, I say 'aye'.

> > * Manifest fields:
> >   * files
> >     + configure
> >     + build
> >     + test
> >     + install
> >       - share
> >       - docs
> >       - bin
> >       - lib
> >       - runtime
> 
> Skip the manifest, it's a pile of duplicated data that's only needed by 
> the build process (by the time you start the build process, you have the 
> tarball anyway).

This was conceptually to support script-free module installs and a few
other similar ideas.  This feels like a 'play it by ear' sort of thing;
I'm too tired to be sure whether a detailed manifest is really useful or
not.

> > * Undecided fields:
> >   * dynamic_config
> >   * no_index
> >   * digests
> >   * signatures
> 
> We should be prepared for the format to grow and change over time, 
> possibly allowing custom fields.

OK.  I have some very rough ideas, but I haven't thought through the
edge cases yet.

> I don't see anything here for "standard build instructions". As in, the 
> specific command-line instructions for "configure", "build", "test", and 
> "install". These could allow variable substitutions from parrot_config 
> (or Aviary's collected configuration information), so Rakudo's "perl 
> Configure.pl" could be "@perl@ Configure.pl", while Pynie's "python 
> setup.py build" could be "@python@ setup.py build", and a general 'make 
> test' could be "@make@ test" (to allow for nmake, etc).

The standard instructions should be 'aviary install foo-pack'.
Everything else is either advanced usage or internal details that I
think we'll work out as we come across them.
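Even if these commands end up as internal details, the `@name@` substitution Allison describes is simple to implement. Here is a minimal sketch; the plain dict stands in for values from parrot_config or Aviary's collected configuration.

```python
import re

VAR_RE = re.compile(r"@(\w+)@")


def expand(command: str, config: dict) -> str:
    """Replace each @name@ with config['name'], failing loudly on unknown
    names rather than running a half-substituted command."""
    def lookup(m: re.Match) -> str:
        name = m.group(1)
        if name not in config:
            raise KeyError(f"no configuration value for @{name}@")
        return config[name]
    return VAR_RE.sub(lookup, command)
```

On Windows, "@make@ test" would expand to "nmake test" if the configuration maps `make` to `nmake`.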


-'f