The Darwin module system for Lua

Durham, NC, U.S.A.

License

Darwin is licensed under the MIT Open Source license reproduced below.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

1. Introduction

1.1. Rationale

1.2. Design goals for Darwin

1.3. The binding for the name Darwin

2. Design

2.1. What is a module?

2.2. What is a structure?

2.3. Signatures

2.4. Module code

2.5. Turning Lua code into Darwin modules

2.5.1. Lua modules defined with the Lua module function

2.5.2. Lua modules defined without the module function

2.5.3. Lua modules defined by a file of code that returns a table

2.5.4. General Lua code

2.6. Special treatment for objects

2.7. Darwin and Lua's package table

3. Discussion

3.1. Initializers of package-specific data

3.2. What makes a good module?

3.3. Darwin's require, module, and seeall functions

3.4. Darwin modules and garbage collection

3.5. Searching for structure declarations

3.6. Note on starting Darwin

4. Comparison to other module systems

4.1. Scheme48

4.2. SML

4.3. Java

5. Darwin Reference

5.1. Functions

5.1.1. structure.declare

5.1.2. structure.load

5.1.3. structure.signature

5.1.4. structure.currentopentable

5.1.5. structure.preloadtable

5.1.6. structure.instructure

5.1.7. structure.initialize

5.2. darwin.initialstructures

6. Known bugs, limitations, and planned enhancements

6.1. Known bugs

6.2. Limitations

6.2.1. Code change needed to support MacOS

6.2.2. No explicit support for renaming exports

6.2.3. No explicit support for importing a subset of bindings

6.3. Planned enhancements

6.3.1. A Meta-Module Protocol?

6.3.2. Weak imports

6.3.3. Support for importing a subset of bindings

6.3.4. Functors

7. Bibliography

Introduction

The Darwin module system provides a structured module system for Lua. Darwin is implemented purely in Lua and is compatible with the existing Lua 5.1.x module and package facilities.

Rationale

Lua is an extraordinarily flexible language. Conceived originally as a scripting language, it has proven broadly useful in a range of application domains and application sizes. Lua has a basic module system that provides some namespace control and an (independent) ability to search for modules and files so that the programmer need only supply the name and not the location of the desired code.

There are two circumstances in which an enhanced module system is needed:

When the script writers are users whose scripts should not have the ability to harm the larger system into which Lua is embedded; and
When the amount of Lua code in a project grows beyond some point (perhaps 1k lines) and would benefit not only from partitioning into modules, but also from the composability of modules that derives from a proper, structured module system.

In the first case, one wishes to provide users with libraries of functions with which they can write their scripts (Lua programs). Suppose two libraries are provided, X and Y, and Y uses functions from X. The user may be using a combination of functions from X and Y, but in Lua's module system, the user can alter X, which can in turn break the functions implemented in Y. It may be acceptable (in general) for the user to alter X, if the most harm they can do is to break their own code. But it is unfortunate if the user can also break Y. (E.g. a function in Y might be called by the system to process results generated by the user's code. The user can make the system operations fail!)
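The hazard can be reproduced in a few lines of plain Lua, with no module system involved. This is only an illustrative sketch: the tables X and Y and their functions are hypothetical stand-ins for the two libraries.

```lua
-- Hypothetical library X: a table of functions visible to everyone.
local X = {}
function X.double(n) return 2 * n end

-- Hypothetical library Y, which looks up X.double each time it is called.
local Y = {}
function Y.quadruple(n) return X.double(X.double(n)) end

print(Y.quadruple(3))   -- 12

-- A user script "customizes" X for its own purposes...
X.double = function(n) return n + 1 end

-- ...and silently changes the behavior of Y as well.
print(Y.quadruple(3))   -- 5
```

Nothing in Y changed, yet its result did, because Y and the user share one mutable table.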

In the second case, one desires not only isolation between modules, but some composability as well.

Isolation is essential in large systems because it eliminates a particularly devilish type of bug that emerges from unexpected interactions between modules. (Such bugs can appear and disappear depending on the load order of the modules or depending even on the run-time sequence of function calls.) But isolation is not enough. In a large system, the dependencies between modules can be difficult to track and maintain. Having a declarative syntax for module dependencies becomes essential, because it enables static analysis and (by extension) a variety of automated tools.

Sometimes, dependencies can be difficult to resolve, such as when module A requires version 1 of module util and module B only works with version 2 of util. The notion of "composition" addresses this problem. When modules are composable, we can construct A using version 1 of util, and construct B using version 2 of util. The resulting system can contain both A and B. We would say that the system is composed of modules A and B, where A and B are, in turn, composed of other modules.

Design goals for Darwin

Isolation: Module code should be sufficiently isolated from code that uses the module such that no user of a module can interfere with another user of the same module.

Namespace control: When a module is used, no bindings should be exposed other than those in the designed interface to the module.

Declarative dependency representation: Dependencies between modules should be declared in a way that enables static analysis whether by automated tools or simply by visual inspection.

Interface specification: The interface to a module should be able to be specified separately from the code that implements it, in order to allow the substitution of alternate implementations.

Simplicity: The module system should be as simple as possible in order to promote adoption, e.g. by providing precise definitions of familiar concepts (like "module") and avoiding the introduction of many new concepts. Most importantly, it should be possible to write Lua (and C) code without thinking about the module system, and then to use that code later in a module.

Backwards compatibility: The module system should effortlessly co-exist with the existing Lua 5.1.4 module features (where Lua modules have the same features and limitations as today), and should further provide ways to reuse existing Lua module code in the Darwin system in order to obtain additional benefits (e.g. isolation and namespace control).

Separate compilation: The design of the module system should accommodate the separate compilation of modules and also the dynamic loading of both Lua and C modules that is provided today in Lua 5.1.4.

The binding for the name Darwin

The name Darwin is not an allusion to Charles Darwin, nor to the process of natural selection he identified. It is also not a reference to any part of any Apple Operating System. Darwin is the name of my dog.

Design

The rest of this paper describes the concepts and artifacts of the Darwin module system for Lua. References to module mean Darwin module; references to Lua module refer to the modules of Lua 5.1.4. Generally, where there is overlap in terminology between Lua 5.1.4 and Darwin 1.0.0, the reader should assume the Darwin binding for all unqualified terms.

What is a module?

A module is a collection of bindings, where a binding maps a name to a location in the store.

A module system (like Darwin) specifies how modules are created, manipulated, and used. By far the most common use of a module is to open it, i.e. to make its bindings accessible in the current environment. Sometimes called importing the bindings of a module, the process of opening a module involves a sort of merging of the module environment into the current environment (where the open is occurring).
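As a rough illustration of what "opening" means, the sketch below copies the bindings of one table into a named table inside another. It is not Darwin's implementation; the function open_sketch and the sample tables are hypothetical.

```lua
-- Copy every binding of a module environment into a named table
-- inside the current environment. All names here are hypothetical.
local function open_sketch(current_env, name, module_env)
  local target = {}
  for key, value in pairs(module_env) do
    target[key] = value
  end
  current_env[name] = target
  return target
end

local list_env = { cons = function(a, b) return { a, b } end }
local current = {}
open_sketch(current, "list", list_env)

print(type(current.list.cons))     -- function
print(current.list == list_env)    -- false: the user holds a separate table
```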

What is a structure?

In Darwin, we use the term structure instead of module, partly due to the influence of SML and Scheme48, and partly because Lua already uses the term 'module' and has a function of the same name. A structure in Darwin is a run-time object that encapsulates everything about a module:

the code that implements the module (or references to files containing that code);
other modules opened by this module;
the signature of the module;
and, any special treatment that might be needed for objects created by the module code.

To declare a structure, you use a data structure that specifies any of these items that pertain to your module. Tables are the data structure in Lua, so a structure declaration is a Lua table. Let's look at a simple example before we discuss the occasional need for "special treatment" stated in the list above. We will come to that topic before the end of Section 2.

The following example declares a "list module", in other words, a module that defines a list data type. In Darwin terminology, this example declares the list structure:

CODE EXAMPLE

structure.declare {
  name="list";
  location=".";
  open={"_G"};
  objects={"null"};
  files="list.lua";
}

The declaration itself is the table with the slots name, location, etc. As written, this example calls the function structure.declare on that table. Later, we will see how structures are loaded and used. The example above shows only how a structure is declared.

The code for the example structure is in the file "list.lua". (Note: The files clause will accept a single file name or a table of file names.) The functions structure.getpath and structure.setpath can be used to read and write the search path for file names that appear in a Darwin structure declaration. The example declaration states that the code implementing the list structure requires the functions in the base library _G. We will explain the location and objects clauses later.

The actual run-time object created when a structure is declared contains, additionally, the module environment (the actual collection of bindings) and a bit of state. The module environment is empty until the module is loaded, and the state reflects whether or not the module is loaded. Before a module can be opened, it must be loaded, which means that the code that implements the module must be loaded and (if necessary) compiled, and then executed.
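The relationship between declaring and loading can be pictured with a small record that carries a declaration, an initially empty environment, and a loaded flag. This is a sketch under our own assumptions; the field names and the run_code callback are hypothetical and do not describe Darwin's internal representation.

```lua
-- A structure record: the declaration, an initially empty module
-- environment, and a flag recording whether the module is loaded.
local function new_structure(declaration)
  return { declaration = declaration, environment = {}, loaded = false }
end

-- Load on first use. 'run_code' is a hypothetical callback that executes
-- the module code and returns the resulting table of bindings.
local function ensure_loaded(structure, run_code)
  if not structure.loaded then
    structure.environment = run_code(structure.declaration)
    structure.loaded = true
  end
  return structure.environment
end

local calls = 0
local s = new_structure { name = "list" }
local function run_code(declaration)
  calls = calls + 1
  return { name = declaration.name }
end

ensure_loaded(s, run_code)
ensure_loaded(s, run_code)
print(s.loaded, calls)   -- true  1
```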

Signatures

One of our design goals is to allow the separate specification of a module's implementation from its interface. A module's interface is the set of bindings that become accessible when the module is opened. We say that a module exports these bindings.

Inspired by Scheme48's module system (and, therefore, by SML modules), each Darwin module has a signature, which is a list of the names exported by a Darwin module. Each name is a string. No matter how many global variable bindings are created by the Lua code that implements a module, only the subset of those bindings specified by the signature will be accessible by users of the module.
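The filtering effect of a signature can be sketched in plain Lua as follows. The function export and the sample environment are hypothetical; the sketch only shows that names absent from the signature never reach the user.

```lua
-- Return a new table holding only the bindings named in the signature.
local function export(module_env, signature)
  local exported = {}
  for _, name in ipairs(signature) do
    exported[name] = module_env[name]
  end
  return exported
end

local env = {
  cons = function() end,
  car = function() end,
  helper = function() end,   -- internal, not part of the signature
}
local visible = export(env, { "cons", "car" })

print(visible.cons ~= nil, visible.helper)   -- true  nil
```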

Recall the first example in this document, which declared the list structure. The file "list.lua" contains 27 global definitions, and all 27 are available whenever the list structure is opened. But suppose I want to expose only a few of those 27 definitions, e.g. just the ones that are needed to support Scheme's pair datatype. I can declare a structure that provides only those functions:

CODE EXAMPLE

structure.declare {
  name="pair";
  signature={"cons", "car", "cdr", "isnull", "null"};
  open={"_G"};
  objects={"null"};
  files="list.lua";
}

The signature clause lets me list exactly the definitions that I want exposed when the pair structure is opened. Below is a sample transcript showing an interactive Lua session (with Darwin loaded), where the pair structure is declared, opened, and used. Note that in normal use, the pair structure declaration would be in a file, and the pair structure would likely be opened by being named in the open clause of another declaration.

TRANSCRIPT EXAMPLE

Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
Darwin 1.0.2 Copyright (c) 2009 James S. Jennings
Current package is 'user'.
> structure.declare { name="pair"; signature={"cons", "car", "cdr", "isnull", "null"}; open={"_G"}; objects={"null"}; files="list.lua"; }
> structure.open "pair"
> =pair.null
{}
> =pair.cons("a", pair.cons("b", pair.null))
{a, b}
> =pair.isnull(pair.null)
true
>

The signature concept provides 3 key benefits:

A declared signature can be inspected as part of a static analysis. (On a related note, a signature answers the user's question "What is exported?" more reliably than by reading the code, and without having to load the module.)
A family of modules may be built by using different signatures to expose the same implementation in different ways. A common use case occurs when the module writer wants to provide both a "standard" interface to users of the module and a "debugging" (or "internals") interface which exposes the inner workings to aid the development of programs that use the module.
By separating the signature from the implementation, it is possible to substitute an alternate implementation of the module without changing any of the code that uses the module. In practice, this is rarely achieved in any programming system, but it's a nice idea. Actually, it does work in practice for a useful class of problems: the implementation of (very) well-specified data structures.

Like SML, and unlike Scheme48, Darwin does not force the programmer to supply a list of exported bindings. If a list is provided in the module declaration, it is used. Otherwise, the signature is automatically constructed (when the module is loaded) and is essentially the list of global bindings created by the module code.

More precisely, when Darwin automatically constructs a module signature, it starts with a list of all the global bindings in the module environment (more on that environment later). All names that match bindings imported from other modules are filtered out. Lastly, all global tables are assumed to be "substructures" that contain more bindings, and their contents are recursively added to the module's signature. The rationale for and implications of this approach are discussed in the section "Special treatment for objects" below.
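The three steps just described (collect globals, drop imported names, recurse into tables) might look like the sketch below. It is an approximation under stated assumptions: the function auto_signature is hypothetical, nested names are written in dotted form as our own convention, and a visited set is added so that cyclic tables do not cause endless recursion.

```lua
-- Build a list of exported names from a module environment.
-- 'imported' is a set of top-level names that came from other modules.
local function auto_signature(env, imported, prefix, names, seen)
  prefix, names, seen = prefix or "", names or {}, seen or {}
  seen[env] = true
  for key, value in pairs(env) do
    local is_import = (prefix == "" and imported[key])
    if type(key) == "string" and not is_import then
      names[#names + 1] = prefix .. key
      -- Tables are assumed to be substructures holding more bindings.
      if type(value) == "table" and not seen[value] then
        auto_signature(value, imported, prefix .. key .. ".", names, seen)
      end
    end
  end
  return names
end

local env = {
  print = print,                     -- imported from _G, filtered out
  cons = function() end,
  util = { map = function() end },   -- treated as a substructure
}
local names = auto_signature(env, { print = true })
table.sort(names)
print(table.concat(names, " "))   -- cons util util.map
```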

Module code

The code that implements a Darwin module can come from many places. In nearly all cases, Darwin modules may be built out of code that was not written with modules in mind (neither Lua modules nor Darwin modules). We shall see later that both ordinary Lua code and existing Lua modules can easily be turned into Darwin modules. Darwin's structure declaration facility supports:

code supplied as strings in the structure declaration;
code loaded from files;
and, bindings that have already been created by any means that Lua supports for finding, loading, and running code.

The ability to put module code into strings given in a structure declaration is primarily meant for exposition, but it does have a practical use as well. If almost all of the code you need is in a file, you can specify the file name in the structure declaration and then provide additional code as a literal string. The additional code might establish a preferred set of default values, or it might define and insert hook functions to customize the behavior of the code stored in the file.

We expect that, most commonly, code will be loaded from files. The semantics of loading code from a file in Darwin match those of Lua's dofile; in fact, dofile is used in the implementation of Darwin.

The third way that a module can be created from existing Lua code is to use the environment clause in a structure declaration. The environment clause accepts either a table or a string. If a table value is supplied, then that table becomes the module environment. This is how the standard Lua libraries are exposed by Darwin as modules: their tables (e.g. math, string, debug, ...) are supplied in an environment clause in a structure declaration. Note that the open clause is irrelevant in this scenario, because the structure declaration contains actual bindings, not code requiring an environment in which it must be loaded and executed.

If, on the other hand, you supply a string value in the environment clause instead of a table value, Darwin will treat the string as code that produces the module environment. The code will be run in a sandbox, and it must produce a table as the first returned value. That table becomes the module environment; any other returned values are discarded.

The sandbox is the environment in which Darwin loads and executes the code string in the environment clause. Once the resulting table value is obtained, the sandbox is discarded. (Of course, the code in the string may well have created closures over objects in the sandbox, or even of the entire sandbox!)

The sandbox is created by opening all of the structures declared in the open clause, and then loading all the files specified in files, and then executing all of the other code strings from the structure declaration in order. Finally, the code string given in the environment clause is processed to produce the final environment table. The next section explains how to re-use existing Lua code as the content for Darwin modules.
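To make the sandbox idea concrete, the sketch below runs an environment-style code string inside a caller-supplied table and keeps only the first returned value. It is not Darwin's code: the helper names are hypothetical, and the sandbox here is built by hand rather than from open, files, and other clauses. The setfenv branch covers Lua 5.1, and the load branch covers later Lua versions.

```lua
-- Compile 'code' so that its global variables live in 'sandbox'.
local function compile_in(code, sandbox)
  if setfenv then                                   -- Lua 5.1 / LuaJIT
    local chunk, err = loadstring(code)
    if chunk then setfenv(chunk, sandbox) end
    return chunk, err
  end
  return load(code, "=(environment)", "t", sandbox) -- Lua 5.2 and later
end

-- Run the code string; its first returned value must be a table.
local function environment_from_string(code, sandbox)
  local chunk, err = compile_in(code, sandbox)
  if not chunk then error(err, 2) end
  local result = chunk()          -- any extra return values are dropped
  if type(result) ~= "table" then
    error("environment code must return a table", 2)
  end
  return result
end

-- A hand-built sandbox with a stubbed os.execute (an assumption for the demo).
local sandbox = { os = { execute = function() return "stubbed" end } }
local env = environment_from_string(
  [[ function f() return os.execute("date") end; return {f = f}, "ignored" ]],
  sandbox)

print(env.f())           -- stubbed
print(rawget(_G, "f"))   -- nil: the global f was created in the sandbox
```

Note that f remains a closure over the sandbox even if the caller later drops its own reference to that table.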

Turning Lua code into Darwin modules

Perhaps the most important attribute of Darwin is that Darwin modules can be easily assembled using existing Lua (and C) code. Lua modules are commonly used today, and it is easy to turn Lua modules into Darwin modules, even though Lua modules may be constructed in a variety of ways.

Lua modules defined with the Lua module function

The Lua module function creates a table and sets the (default) function environment for the loader function to be that table. The loader is the function that is processing the code, e.g. dofile. All of the bindings created by the loader function (after the module function is called) will be created in the module table, and function values will have that table as their function environment.

The package.seeall function in Lua is needed because the global environment is not accessible from the table created by the module function. The optional arguments to Lua's module function are functions, and package.seeall is a function which creates a metatable for its table argument where __index=_G, so that the global environment becomes accessible.
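The behavior described for package.seeall can be mimicked in a few lines. The function name seeall_sketch is hypothetical; the sketch follows the description above rather than quoting the library source.

```lua
-- Give a module table read access to the globals through __index.
local function seeall_sketch(module_table)
  local mt = getmetatable(module_table)
  if mt == nil then
    mt = {}
    setmetatable(module_table, mt)
  end
  mt.__index = _G
end

local M = {}
print(M.print)            -- nil: the module table cannot see globals
seeall_sketch(M)
print(M.print == print)   -- true: lookups now fall through to _G
```

The fall-through is live: whatever _G contains at lookup time is what the module sees.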

It is easy to build a Darwin module which reuses a Lua module without changes. When a Lua module creates a table containing the desired module bindings, you can assign that table to be the environment of a Darwin module with the environment clause.

CODE EXAMPLE

structure.declare {
  name="lanes";
  open={"_G", "package", "table", "string"};
  environment=[[
    require("lanes");
    return lanes
  ]];
}

The example above is based on a real package, Lua Lanes 2.0.3 (http://luaforge.net/projects/lanes). The code that implements Lanes is written in C and Lua. The Lua code uses functions from three of the standard Lua libraries: the basic functions in _G, plus functions from the table and string libraries. The declaration above opens package as well, because Lua defines the package library as two functions, require and module, together with the package table. The Lanes code does not need the package library, but the code inside the structure declaration does, because it calls require.

Recall that the value supplied in the environment clause must be either a table or code (a string value) that returns a table. Before Darwin, we would write require("lanes"), so that is what goes into the environment clause. However, the code must return a table, and we know that the call to require will produce the table lanes. Therefore, we end the environment code string with return lanes.

After the declaration in the example above, we can use the lanes structure in several different ways.

When we declare another Darwin module that uses Lanes, we include "lanes" in the open clause of that declaration.
When we load Lua code that calls require("lanes"), the lanes structure will be opened automatically, because Darwin inserts entries into package.preload for each declared structure, so that the declared structure will be shared by all structures that open it.
In an interactive Lua session, we can issue either structure.open "lanes", or require "lanes" at the prompt.
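The package.preload mechanism mentioned in the list above is standard Lua and can be tried on its own. In this sketch the module name demo_structure and the shared table are hypothetical; how Darwin itself fills in the preload entries is not shown.

```lua
-- One shared table standing in for a declared structure's bindings.
local shared = { greet = function() return "hello" end }

-- Register a loader under a hypothetical module name.
package.preload["demo_structure"] = function() return shared end

-- require consults package.preload before searching for files.
local first = require("demo_structure")
local second = require("demo_structure")

print(first == shared, first == second)   -- true  true
```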

Notice that there is no longer any need for package.seeall. Lua's package.seeall provided the module code with access to _G with no way to guarantee that the bindings in _G would be the ones that were expected by the author of Lanes. Suppose we redefine some of the names in _G before loading the Lua lanes module (as opposed to the Darwin lanes module, declared in the example above). We can easily break Lanes.

With the Darwin module system, Lanes is much more likely to work as expected because the Lanes code receives a pristine set of bindings from the _G structure as a result of opening "_G".

We can also write more "secure" code with Darwin. Perhaps it is better to say that we can better control the set of bindings available to module code using Darwin. For example, suppose that the Lanes code calls os.execute (although it does not) and you want Lanes to use your own customized version of os.execute, perhaps because your version implements a "white list" or some other form of validation on the function's argument. Darwin can ensure that Lanes has access only to your version of os.execute and not the Lua standard version. Let's look at two reasonable ways to do this.

We will use a very simple module for our example. We will define the structure environment using a code string without referring to any files, just to keep the example small. First, let's declare a structure that exports a function, f, which uses os.execute.

TRANSCRIPT EXAMPLE

> structure.declare { name="test"; open={"_G", "os"}; environment= [[ function f() os.execute("date") end; return {f=f} ]] }
> structure.open "test"
> test.f()
Sun Jul 26 08:32:08 EDT 2009
>

In our simple example, the code for the entire module is given in a literal string. But in general, the module code is more likely to be contained in one or more files. We can ensure that our own version of os.execute will be called by the functions in those files by installing our own os.execute into the environment. One approach is to leverage the ability to insert code as strings into a structure declaration, knowing they will be executed before the environment clause is processed. The change from the previous example is the added pre clause.

TRANSCRIPT EXAMPLE

> structure.declare { name="test"; open={"_G", "os"}; pre=[[ local exec=os.execute; os.execute=function(...) print("Running os.execute..."); exec(...); print("done."); end ]]; environment= [[ function f() os.execute("date") end; return {f=f} ]]; }
Warning: replacing existing structure test
> structure.close "test"
> structure.open "test"
> test.f()
Running os.execute...
Sun Jul 26 08:26:49 EDT 2009
done.
>

The code we added used the "pre" declaration clause. This clause is named "pre" because Darwin processes it before the "files" clause. There is a symmetric clause called "post" that is evaluated after "files". The last clause processed is "environment".

A good convention is to put the environment clause last, as shown, in order to emphasize that Darwin processes this clause last. The actual order of the declaration elements is irrelevant, since the declaration itself is a table with string keys.
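The point about ordering can be illustrated with a small sketch: because the declaration is a table with string keys, the processing order has to come from a fixed list, not from the order in which the clauses were written. The function processing_plan and the constant CLAUSE_ORDER are hypothetical names for illustration only.

```lua
-- Order in which the code-bearing clauses are described as being processed.
local CLAUSE_ORDER = { "pre", "files", "post", "environment" }

-- Return the clauses present in a declaration, in processing order.
local function processing_plan(declaration)
  local plan = {}
  for _, clause in ipairs(CLAUSE_ORDER) do
    if declaration[clause] ~= nil then
      plan[#plan + 1] = clause
    end
  end
  return plan
end

-- The clauses are deliberately written out of order here.
local plan = processing_plan {
  name = "test",
  environment = "return {}",
  files = "list.lua",
  pre = "-- setup code",
}
print(table.concat(plan, " -> "))   -- pre -> files -> environment
```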

Let's return to our example in which we want to provide our own "os.execute" function. Another approach, perhaps a better one than the previous example, is to define our own os module which contains our own versions of the Lua os library functions. Then we can use our own os module instead of the Lua one wherever we need it. And, changes to our module can be made in one place, where our module is defined, instead of in each structure declaration that needs our custom os functions.

CODE EXAMPLE

structure.declare {
  name="my_os";
  open={"_G", "os"};
  environment= [[
    local exec = os.execute;
    os.execute=function(...)
      print("Running os.execute...");
      exec(...);
      print("done.");
    end;
    return os
  ]]
}

This example (above) is not quite right, however. When a Darwin module is opened, by default the module bindings appear in a table named after the module. In the example above, opening "my_os" will create a table my_os with entries such as my_os.date, my_os.execute, etc.

We want the bindings of "my_os" (date, execute, ...) to appear in a table called "os" so that we can use "my_os" as a replacement for "os". Darwin's structure declaration facility provides a location keyword with which you can specify where the bindings should appear. The location clause accepts a string value, which is the table name. The name may contain the dot "." character, in which case it specifies a nested table, e.g. "a.b". Or, the name may be the dot character by itself, which has the special meaning of "at the top level".
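The three forms of a location name (a plain name, a dotted name, and the lone dot) can be handled by a short resolver like the one below. This is a sketch, not Darwin's implementation: resolve_location is a hypothetical function, and replacing a non-table value found along the path is our own simplifying choice.

```lua
-- Return the table named by 'location' inside 'root', creating
-- intermediate tables as needed. "." means the root itself.
local function resolve_location(root, location)
  if location == "." then return root end
  local target = root
  for part in string.gmatch(location, "[^.]+") do
    if type(target[part]) ~= "table" then
      target[part] = {}
    end
    target = target[part]
  end
  return target
end

local user_env = {}
resolve_location(user_env, "os").execute = "custom"
resolve_location(user_env, "a.b").x = 1
resolve_location(user_env, ".").y = 2

print(user_env.os.execute, user_env.a.b.x, user_env.y)   -- custom  1  2
```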

Here is the declaration we want. The additional line is the location clause.

CODE EXAMPLE

structure.declare {
  name="my_os";
  location="os";
  open={"_G", "os"};
  environment= [[
    local exec = os.execute;
    os.execute=function(...)
      print("Running os.execute...");
      exec(...);
      print("done.");
    end;
    return os
  ]]
}

Now we can go back to our interactive Lua session and try out a declaration for a module that was written to use the Lua os module, but which we will force to use our custom my_os module.

TRANSCRIPT EXAMPLE

> structure.declare { name="test"; open={"_G", "my_os"}; environment= [[ function f() os.execute("date") end; return {f=f} ]]; }
Warning: replacing existing structure test
> structure.close "test"
> structure.open "test"
> test.f()
Running os.execute...
Sun Jul 26 12:43:39 EDT 2009
done.
>

Notice that Darwin warns that we are redefining the structure test, which was defined by our first attempt in a prior example. When we see that warning, we should make sure that we close the current structure test, if it was already open. Otherwise, structure.open "test" will return immediately because "test" (using the prior declaration) was already open.

Lua modules defined without the module function

Using the Lua module function is one way to create a named table of bindings. Another technique is to write a file containing just one global variable assignment: one that assigns the module name to a table of bindings. To turn such a Lua module into a Darwin module, exactly the same approach is used as in the previous section (where the Lua module function was used).

For example, consider a file containing this code:

FILE EXAMPLE

-- file "mtest.lua"
local x=1
local function get_function() return(x) end
local function set_function(v) x=v end
mtest = {get=get_function, set=set_function}

When this code is loaded into Lua, e.g. with dofile("mtest.lua") or require("mtest"), the net effect is the creation of the table mtest at top level, with the two module functions accessible as mtest.get and mtest.set.

For example:

TRANSCRIPT EXAMPLE

Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> require "mtest"
> =mtest.get()
1
> =mtest.set(99)
> =mtest.get()
99
>

We can use exactly the same approach taken in the previous section to create a Darwin module. We know that after require "mtest", we will have a table called mtest which contains the bindings we want in our Darwin module. Here's how the structure can be declared and tested:

TRANSCRIPT EXAMPLE

> structure.declare { name="mtest"; open={"package"}; environment=[[ require("mtest"); return mtest ]] }
> structure.open "mtest"
> =mtest.get()
1
> =mtest.set(2000)
> =mtest.get()
2000
>

Lua modules defined by a file of code that returns a table

Some people write Lua modules by constructing a file of code that returns a table containing the module bindings. To turn such a module into a Darwin module, we use a variation on the technique shown in the prior section. In this case, we know that require modname (or, alternatively, dofile filename) returns the table we are looking for. So, once again we use the environment clause.

FILE EXAMPLE

-- file "mtest2.lua"
local M={}
local x=1
function M.get() return(x) end
function M.set(v) x=v end
return M

Given the code in "mtest2.lua" and how it is meant to be used, here is how to turn the Lua module mtest2 into a Darwin module:

TRANSCRIPT EXAMPLE

> structure.declare { name="mtest2"; open={"package"}; environment=[[ return require("mtest2") ]] }
> structure.open "mtest2"
> =mtest2.get()
1
> =mtest2.set(5)
> =mtest2.get()
5
>

General Lua code

Sometimes, developers use the Lua require function to load files of arbitrary code that are unrelated to modules. The benefit of using require instead of dofile is that require searches for the Lua file and, once the file is loaded, will not load it again. Both require and dofile can be used separately or together in order to load arbitrary units of code, thereby (typically) creating a set of bindings to useful functions.
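The load-once behavior of require can be observed without any files by registering a loader in package.preload. This is a sketch of standard Lua behavior; the unit name demo_unit and the counter are hypothetical.

```lua
local runs = 0

-- A hypothetical unit of code, registered so that require can find it
-- without touching the file system.
package.preload["demo_unit"] = function()
  runs = runs + 1
  return true
end

require("demo_unit")
require("demo_unit")
print(runs)   -- 1: the second call reused the entry in package.loaded

-- Clearing the cache entry makes require run the loader again,
-- which is what every dofile call does unconditionally.
package.loaded["demo_unit"] = nil
require("demo_unit")
print(runs)   -- 2
```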

We may wish to turn an arbitrary collection of code (such as might be loaded today using a set of require and dofile calls) into a Darwin module. Suppose we have a project whose code starts with:

CODE EXAMPLE

require "lanes"
dofile "mtest.lua"
set_debug_mode = function() DEBUG=true; end

We could package up this preamble into a module that, when opened, would have the same effect. The declaration would look like this:

CODE EXAMPLE

structure.declare {
  name="projectx";
  location=".";
  open={"_G", "package"};
  pre = [[
    require "lanes"
    dofile "mtest.lua"
    set_debug_mode = function() DEBUG=true; end
  ]]
}

Here we use the special value "." in the location clause to tell Darwin to export the bindings of projectx into the top level (i.e. global) environment. Notice that we were able to copy the 3-line preamble code without modification into the body of the declaration of projectx.

Opening projectx makes these bindings available in the global environment:

lanes (a table containing lanes.gen, lanes.linda, etc.)
mtest (a table containing mtest.get and mtest.set)
set_debug_mode (a function defined in the declaration of projectx)

Note that there is good value obtained for the small effort of creating the projectx module and using it when developing (and later shipping) Project X: We can do whatever we want to _G in the working environment without affecting the functioning of lanes or mtest. Due to Darwin's isolation of the working _G from the _G in which lanes and mtest were loaded, we will get consistent behavior of lanes and mtest code every time, from development through test and on to production use.

Special treatment for objects

Tables are more than just the data structure in Lua; they are also the data structure used for environments! It is wonderful that function environments are first-class objects in Lua and that they are implemented using a simple data structure. Many benefits derive from these design decisions, including the ability to modify Lua's behavior, e.g. with strict.lua, which signals an error when an uninitialized global is used. Also, the implementations of Lua's module and package.seeall functions are models of simplicity.

However, while a rose is a rose is a rose, a table is not always an environment. Suppose a module M exports a name Foo which denotes a table value. Foo may be an environment within M, in other words, a table containing a set of bindings. In common Lua fashion, the module writer may have grouped his functions into tables within M, using M.Foo to hold the bar, bat, and baz functions. The module writer expects the module user to access these bindings by writing M.Foo.bar, etc.

Opening M makes accessible the bindings "M.Foo.bar", etc. What Darwin actually provides to the module user is a private copy of these bindings, in order to isolate the module users from one another. So far, so good.

But what if the table Foo is not a collection of bindings, but is instead an "object" with three initial slots defined (bar, bat, and baz)? Because it is an object, Foo probably has a metatable that shapes its behavior. When Darwin exports bindings, it ignores metatables. That is, Darwin creates plain tables in the user environment to hold the bindings of modules opened by the user.

So, while the developer may intend Foo to be an instance of an object, Darwin has no way to know whether Foo is an object or a collection of bindings (another environment) within M. This scenario cannot be detected programmatically. The presence of a metatable is a good clue, but an object need not have a metatable. How can Darwin know the intended use of Foo?

When the need arises to export a binding to an object, the module writer informs Darwin that this is the case by using the objects clause of the structure declaration. For each object listed, the user of the module receives a binding to the object, not to a new table with equivalent contents. The user sees a binding to the object itself, metatable and all.

FILE EXAMPLE

-- file "memo.lua"
local function compute(string)
   -- An expensive calculation would go here,
   -- but we will just calculate string length instead
   return #string
end

local mt = {
   __index = function(self, key)
      local result = compute(key)
      rawset(self, key, result)
      return result
   end
}

results = setmetatable({}, mt)

The file in the example above defines an object called results, which is a table that memoizes the compute calculation on the keys of the table. The user of the module may request results[k] for any k, and the results object will either return the already-known value or compute the value and store it for future requests.

Our first attempt at a structure declaration will fail because Darwin will treat results as a collection of bindings to export. (Although the table happens to be empty.)

TRANSCRIPT EXAMPLE

> structure.declare { name="memo"; open={"_G"}; files="memo.lua" }
> structure.open "memo"
> =memo.results["Hello"]
nil
> =getmetatable(memo.results)
nil
>

We should have received 5 as the value of memo.results["Hello"], because the compute function returns the length of its argument. We need to tell Darwin to treat results as an object. This is the right declaration:

TRANSCRIPT EXAMPLE

> structure.declare { name="memo"; open={"_G"}; objects={"results"}; files="memo.lua" }
> structure.open "memo"
> =memo.results["Hello"]
5
> =rawget(memo.results, "Goodbye")
nil
> =memo.results["Goodbye"]
7
> =rawget(memo.results, "Goodbye")
7
>

We can see in the transcript above that using objects produced the right effect. Clearly, results has its metatable intact, because the object functions as we expect it to function (as demonstrated by the use of rawget to inspect what is happening inside the table).
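The difference between the two transcripts can be reproduced in plain Lua, without Darwin. The following is a minimal sketch and not Darwin's implementation: copy_bindings is a hypothetical helper that copies keys and values into a plain table and ignores metatables, which is the behavior described above for ordinary exported tables.

```lua
-- Hypothetical helper (not part of Darwin): copy bindings into a plain
-- table, recursing into nested tables and ignoring any metatable.
local function copy_bindings(t)
  local copy = {}
  for k, v in pairs(t) do
    if type(v) == "table" then
      copy[k] = copy_bindings(v)
    else
      copy[k] = v
    end
  end
  return copy
end

-- The memoizing object from "memo.lua", with string length as the result.
local mt = {
  __index = function(self, key)
    local result = #key
    rawset(self, key, result)
    return result
  end
}
local results = setmetatable({}, mt)

local copied = copy_bindings(results)  -- like exporting without the objects clause
local shared = results                 -- like listing "results" in the objects clause

print(copied["Hello"], getmetatable(copied))  -- both values are nil
print(shared["Hello"])                        -- 5
```

The copy has the same (empty) contents as the original but none of its behavior, which is why the first declaration returned nil.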

Darwin and Lua's package table

By default, Darwin is completely unrelated to Lua's package table. When you open a structure, it does not appear in package.loaded. Conversely, when you require a module, the require function will not be aware of structures that might be declared.

However, it is often desirable to have exactly the opposite behavior, i.e. for open structures to appear in package.loaded and for the require function to open a structure if there is one declared with the right name. Darwin provides an easy way to do this.

When you open the package structure, you get bindings for require, structure, and package. Darwin maintains a table of all declared structure names in a preload table, in which each key is a structure name, and all the values are the function structure.open. If you set the metatable __index of package.preload to be Darwin's preload table, then require(modname) will exhibit the desired behavior:

If package.loaded.modname is set, then return this value.
If package.preload.modname is set, then run this function, i.e. open the structure modname.
Else start iterating over the functions in package.loaders.
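
The lookup order above can be imitated in plain Lua. This is a sketch under stated assumptions: loaded, preload, and darwin_preload are stand-in tables (not the real package.loaded, package.preload, or the table returned by structure.preloadtable), fake_open stands in for structure.open, and the final step over package.loaders is omitted.

```lua
-- Stand-ins (assumptions): these only imitate the real tables and functions.
local opened = {}
local function fake_open(name)        -- plays the role of structure.open
  opened[#opened + 1] = name
  return { name = name }
end

local darwin_preload = { matrix = fake_open }   -- one entry per declared structure
local loaded, preload = {}, {}
setmetatable(preload, { __index = darwin_preload })

local function sketch_require(modname)
  -- Step 1: already loaded?
  if loaded[modname] then return loaded[modname] end
  -- Step 2: a preload entry, found here through the __index link
  local loader = preload[modname]
  if loader then
    local value = loader(modname)
    if value == nil then value = true end
    loaded[modname] = value
    return value
  end
  -- Step 3 (package.loaders) is left out of this sketch.
  error("module '" .. modname .. "' not found")
end

local m = sketch_require("matrix")
print(m.name, #opened)                  -- matrix  1
print(sketch_require("matrix") == m)    -- true (second call hits step 1)
```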

The Darwin function structure.preloadtable returns the global preload table. The function takes no arguments.

But we also want open structures to appear in package.loaded, so that require(modname) will simply return package.loaded.modname if modname is an open structure. Darwin provides access to the table of open packages for the current package, and the open package table has the same format as package.loaded. If you set the metatable index of package.loaded to be the Darwin open package table for the current package, then all of the open packages will appear in package.loaded.

The Darwin function structure.currentopentable returns the table of open packages for the current package. The function takes no arguments. Do not modify the table obtained through this function.
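
The effect of the __index link on package.loaded can be seen with two ordinary tables. In this sketch, open_table and loaded are stand-ins (assumptions) for the table returned by structure.currentopentable and for package.loaded.

```lua
local open_table = { matrix = { name = "matrix" } }  -- stand-in for Darwin's open table
local loaded = { string = string }                   -- stand-in for package.loaded

setmetatable(loaded, { __index = open_table })

print(loaded.matrix.name)          -- matrix (found through __index)
print(rawget(loaded, "matrix"))    -- nil (not stored in loaded itself)
```

Because the lookup falls through to the open table only on a miss, entries already present in the loaded table take precedence.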

This is how Darwin might define the user package in order to enable require to open structures and to have all open structures appear in package.loaded.

CODE EXAMPLE

structure.declare { name="user";
    open={"_G", "structure", "package"};
    pre = [[
        package.initialize();
        setmetatable(package.preload, {__index=structure.preloadtable()})
        setmetatable(package.loaded, {__index=structure.currentopentable()})
    ]]
}

Notice the call to package.initialize in the example above. This statement is necessary due to a design decision to keep Darwin as uniform as possible. In brief, each user of the package structure needs its own private copy of the package table, because each Darwin package may have a different set of open packages from all other packages. Within a module system like Darwin, there is only one instance of each structure, so if we had written require to be closed over a package table, then every package that uses require would share that table. (This is, in fact, how plain Lua works.)

In order for each package to have its own package table, the require function looks up the package table in the current environment whenever it needs it, instead of referring to a local variable (an "upvalue"). But the local package table must be initialized before it can be used; hence the need for package.initialize.

For convenience, the Darwin versions of the require and module functions automatically call package.initialize if it has not already been called.
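
The contrast between a require that is closed over one table and a require that looks up the table on each call can be sketched in plain Lua. Everything here is hypothetical: environments are passed as an explicit env argument (an assumption standing in for "the current environment"), and initialize only plays the role of package.initialize.

```lua
-- Variant 1: closed over a single table (an upvalue), so every caller shares it.
local shared_loaded = {}
local function closed_require(name)
  shared_loaded[name] = shared_loaded[name] or { name = name }
  return shared_loaded[name]
end

-- Variant 2: looks up the package table in the given environment on each call.
local function env_require(env, name)
  local loaded = env.package.loaded
  loaded[name] = loaded[name] or { name = name }
  return loaded[name]
end

-- Plays the role of package.initialize: create the per-environment table once.
local function initialize(env)
  env.package = env.package or { loaded = {}, preload = {} }
end

local env_a, env_b = {}, {}
initialize(env_a)
initialize(env_b)

env_require(env_a, "matrix")
print(env_a.package.loaded.matrix ~= nil)   -- true
print(env_b.package.loaded.matrix ~= nil)   -- false: env_b has its own table
```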

Important note:

The technique of linking package.preload and package.loaded to Darwin is intended for use when loading code into the user package, such as during an interactive Lua session. Darwin modules should not use the Lua require function to load Darwin structures. Instead, the structures needed by a module should be declared in the open clause of the structure declaration.

A correct example is:

CODE EXAMPLE

structure.declare { name="geometry"; open={"_G", "matrix", "class"}; files={"geometry-objects.lua", "dim2.lua", "dim3.lua"} }

The files listed in this example may call the require function with any of the open structures as an argument ("_G", "matrix", or "class"). The default package.preload table is empty, and the default package.loaded table contains entries for all of the structures that Darwin opened, i.e. all of the structures listed in the open clause. Now suppose "dim2.lua" included the statement: require "matrix".

The call to require will simply return package.loaded.matrix, because the matrix structure was already loaded by Darwin and opened in the geometry environment. Contrast this scenario with the following incorrect example.

INCORRECT CODE EXAMPLE

structure.declare { name="geometry"; open={"_G"}; files={"geometry-objects.lua", "dim2.lua", "dim3.lua"} }

Note that the declaration does not open the matrix or class structures. Of itself, this is not an error. However, let's assume that both of these structures have been declared, which was also the case in the prior (correct) example. Suppose one of the geometry files, such as "dim2.lua", included the statement: require "matrix".

The matrix structure is not listed in the open clause of the geometry structure declaration, so there will be no package.loaded.matrix entry. Recalling that the default package.preload table is empty, the require function will start looking for a loader for "matrix" using the package.loaders table. If it succeeds, the Lua module (or non-module) called "matrix" will be loaded. The geometry package may work correctly, but it will not be using the declared matrix structure.

Of course, it is possible to insert entries into package.preload such that requiring "matrix" will cause the matrix structure to be opened. However, this is not the intended way to use Darwin. The intended usage is to declare the structures needed by "geometry" in the open clause of the structure declaration for "geometry". The main reason to declare the needed structures is to permit static analysis of the dependencies among structures.

Darwin does no static dependency analysis today. As structures are opened, all dependent structures are opened. However, one of the benefits of a robust module system is that it is amenable to static dependency analysis. This is a benefit because it affirms the predictive power obtained by using the module system: A module depends only on the structures listed in the open clause and no others. This fact enables separate compilation of modules as well as predictable run-time behavior of modules, whether they are compiled or not.

See the section "Initializers of package-specific data" for a discussion.

Discussion

Initializers of package-specific data

Darwin promotes the run-time sharing of loaded package code. When a structure foo is declared and loaded, it can be used in many other packages and yet only one copy of the functions (and data) of foo exists in the Lua state. Any variables over which the functions of foo are closed (i.e. any "upvalues") are therefore shared by all users of foo. Since so many modules are stateless (containing no closures), this sharing property is often irrelevant. And when there are closures, sometimes the sharing behavior is in fact the expected behavior.

Consider a structure that contains a factory function, where every object produced by the factory has a unique serial number. A simple implementation would be to use a counter function implemented as a closure over a variable holding the previous count. No matter how many packages use the (same) factory, each package will receive objects with unique serial numbers, since the counter is shared.
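
A plain-Lua sketch of such a factory follows. The names make_factory, new_widget, package_a, and package_b are hypothetical; the two tables merely stand in for two packages that hold a binding to the same function.

```lua
local function make_factory()
  local count = 0                      -- upvalue holding the previous count
  return function(label)
    count = count + 1
    return { serial = count, label = label }
  end
end

local new_widget = make_factory()      -- a single instance, as in a loaded structure

-- Two "packages", each with a binding to the same factory function.
local package_a = { new_widget = new_widget }
local package_b = { new_widget = new_widget }

local w1 = package_a.new_widget("a")
local w2 = package_b.new_widget("b")
local w3 = package_a.new_widget("c")
print(w1.serial, w2.serial, w3.serial)   -- 1  2  3
```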

For another example, consider a structure that maintains a pool of reusable objects, such as buffers or processes. The sharing behavior is the expected behavior: There is one instance of the structure and the pool is shared among the various users of the package.

However, there are times when a module's functionality requires that some data structures exist on a per-user basis. In other words, each user of the structure should see its own data set, and the functions in the module should operate on the data set of the caller. This is a classic factory design pattern, and it is not made transparent by Darwin. The module should be written to export a factory function that produces (and initializes) the needed data structures. The user of the module must call the factory before using module functions that need the data. Usually, the object produced by the factory is supplied as an argument to the module functions.

Consider again the example definition of the user package (see the section "Declaration of the user package"). In this case, the per-user data is the package table, and it is a singleton. That is, a user of the package module instantiates exactly one package table. Also, the package table has a standard location in the environment, namely the table called package. Consequently, package.initialize does not need to return a value, and the package table does not need to be supplied as an argument to functions like require. Because there is only one package table, the package module is written to store it in a well-defined place (a table called "package" in the current structure environment), and the module functions (like require) are written to look for it there.

What makes a good module?

Taken as a general question, the answer is out of scope for this document. However, there are some technical considerations to keep in mind when creating Lua modules with Darwin. We will examine these considerations using a case analysis on the type of the value being exported from a module.

First, note that modules typically export functions. Because each user of a module function in Darwin has a binding to the same function, any state held by the function will be shared across users. For example, a simple counter object can be made by creating a closure over a private index variable. (In Lua terminology, the index variable is an upvalue of the counter function.) Suppose such a counter is defined in a Darwin structure called counter. Two modules that both use counter will get unique values because there is only one counter function, and it has one index variable as its upvalue.

Next, let's look at tables. By default, when a structure exports a table, Darwin creates for the user a deep copy of that table. This seems to be the right thing to do when the table contains functions or tables of functions (etc.) because in most cases the module writer wishes to export those functions. However, as we have already seen, a table in Lua might implement an object (with state and methods) rather than just being a collection of functions. To accommodate this case, Darwin provides the objects clause in structure.declare.

When a structure is opened and Darwin encounters an object in that structure, Darwin does exactly what it does for functions: it creates a binding for that table (object) in the destination environment, the place where the structure is being opened. In this case, the table (object) is a single instance that is shared between all users of the structure. Unlike functions, though, tables are writable. Altering a shared table (object) alters it for all the users of the structure that exports the table (object). Keep this in mind when exporting objects.
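
The consequence of sharing a writable table can be shown in a few lines of plain Lua. The names settings, user_a, and user_b are hypothetical; the two user tables stand in for two environments that received a binding to the same object.

```lua
local settings = { precision = 2 }     -- the single exported object

-- Each user receives a binding to the same table, not a copy.
local user_a = { settings = settings }
local user_b = { settings = settings }

user_a.settings.precision = 10         -- one user alters the object

print(user_b.settings.precision)                    -- 10
print(rawequal(user_a.settings, user_b.settings))   -- true
```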

Finally, exporting scalars has virtually no utility. Suppose a structure x exports a binding named foo that is bound to a scalar value, which in Lua includes numbers, strings, and booleans. Any structure that opens x receives a copy of the binding of foo. In other words, when you open x, you get a binding of the name foo to a scalar value, e.g. 5 or "hello". The code of the structure x cannot modify this binding.

The binding that the user sees for foo did not exist when the structure code was loaded, so there is no way for the structure code to access the user's binding for foo. Now, the structure code might modify its own binding for foo, which is the binding in the structure environment. That binding is not visible to the user.

A diagram here would really help.

We have a situation in which (1) the code in structure x can write to its own binding for foo, which cannot be read by the user; and (2) the code in structure x cannot write to the user's binding for foo, which can be read by the user. In other words, the structure code and the user code cannot communicate through scalar bindings.
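
The same situation in plain Lua, as a sketch: structure_env and user_env are hypothetical tables standing in for the structure environment and the user's environment, and the assignment on the second line models the copy made when the structure is opened.

```lua
local structure_env = { foo = 5 }
local user_env = { foo = structure_env.foo }   -- opening copies the scalar value

structure_env.foo = 6        -- the structure code rebinds its own foo
print(user_env.foo)          -- 5: the user does not see the change

user_env.foo = "hello"       -- the user rebinds its own foo
print(structure_env.foo)     -- 6: the structure does not see the change
```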

It appears that the only use case for exporting scalar values is to make constants readable to the user code. For example, the math package exports math.pi. The same reasoning can be applied to a structure that exports some data about how it is configured, e.g. a matrix package might export matrix.maximumdimension.

To share state between all users of a structure, you can export a binding to a container (which in Lua means a table).

To provide private state to each user of a structure x, the "factory" pattern should be used. To follow this approach, x should export a "factory function" that creates new instances of whatever private state is needed. Each user of x must call the factory, which returns some object. In some cases, such as Darwin's implementation of the package table, the factory is only called once by each user. In that case, the factory could simply be named initialize. In other cases, such as in a graphics rendering environment, a factory function might produce canvases, and the user code would call the factory each time it wanted to create a new canvas.
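
A minimal sketch of the factory pattern for the canvas case follows. The canvas table and its functions new and add are hypothetical names chosen for illustration; they are not part of Darwin or of any real graphics library.

```lua
local canvas = {}

-- Factory function: each call returns fresh private state.
function canvas.new(width, height)
  return { width = width, height = height, shapes = {} }
end

-- Module functions take the object produced by the factory as an argument.
function canvas.add(c, shape)
  c.shapes[#c.shapes + 1] = shape
  return #c.shapes
end

local a = canvas.new(10, 10)
local b = canvas.new(20, 5)
canvas.add(a, "circle")

print(#a.shapes, #b.shapes)   -- 1  0
```

The functions themselves are shared by all users, while the state lives only in the objects that each user creates.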

Darwin's require, module, and seeall functions

Darwin was designed to be independent of Lua's module system, but to work with it. Within a Darwin package, the Lua module capabilities behave just as they do without Darwin. That is, the package table and the require and module functions work as expected, except their scope is no longer global — it is limited to the environment of the package in which they execute. This feature of Darwin allows existing code to be used without the need to declare any Darwin structures for it.

Moreover, because the existing Lua module code is confined to the package in which it is loaded, the existing code cannot inadvertently affect other Darwin packages. Any odd effects of using existing Lua module code (namespace collisions, namespace leaks, etc.) can still occur, but only locally, in the package in which the existing code is used.

In order to achieve this isolation for Lua's module system, Darwin provides its own implementation (in Lua) of the require, module, and package.seeall functions. The plain Lua versions of these functions are closed over a single global environment (LUA_GLOBALSINDEX) and a single package loaded table (stored in the LUA_REGISTRYINDEX at "_LOADED"). The Darwin versions of require, module, and seeall are intended to be faithful ports of their C language counterparts into Lua, with only one difference: The Darwin versions access the package table from the current structure environment. The result is that each Darwin package has its own package table.

By default, Lua's module capabilities do not interact with the Darwin system. As we saw earlier in the section called "Darwin and Lua's package table", however, the two can be easily linked so that Lua's package.loaded table will reflect open structures, and Lua's package.preload table will reflect available structures that can be opened. Observe that this linkage (via metatable index) is on a per-package basis. The Darwin user can choose for some packages to open the structure modname when require(modname) executes, and for other packages to ignore the structure modname, in which case require functions as it does in plain Lua.

In all cases, the Darwin design goal remains to give the user full control of the environment in which code runs.

Darwin modules and garbage collection

Darwin module declarations and instances (i.e. loaded modules) will not be collected unless the module is deleted with structure.delete. Deletion of structure x does not affect the instances created by other structures which have already opened structure x. If the declaration is deleted, then no other structure can open the module.

For a short discussion of collection of unneeded bindings, see the "Weak imports" sub-section of the section called "Planned enhancements".

Searching for structure declarations

We do not recommend using any mechanism that searches the file system for structure declarations. The loading of declarations is meant to be a kind of administrative task in which the run-time environment for code is defined (and constrained). It would weaken the assurances provided by the Darwin module system if structure declarations were "found", because they could specify anything — even opening up security holes or in general allowing undesired capability leaks.

The intended use of Darwin in a production system is that Darwin will be loaded first, and Darwin finishes by establishing the environment that will hold user code. This is the user structure by default. Next, one or more files containing structure declarations are loaded. Finally, user code is loaded into the environment crafted especially to contain the user code. The user code runs in as free or as constrained an environment as is established for it.

In a production system in which there is no additional "user" code loaded, the user structure may not be needed at all. Or, the user structure can contain the "driver" code that starts the system running with the right input parameters.

Note on starting Darwin

The Lua command line accepts an argument to load libraries, and so "-ldarwin" may be specified on the command line. For interactive sessions, the command "lua -ldarwin -i" achieves the right effect.

The LUA_INIT environment variable can also be used to load Darwin. See the documentation for the Lua command for more information.

Comparison to other module systems

Scheme48

To do.

SML

To do.

Java

To do.

Darwin Reference

There is one part of Darwin with which you need to interact: Darwin provides a set of functions for declaring and manipulating structures.

If you want to customize the set of initial structures provided by Darwin when it starts, then you need to modify the file "initialstructures.lua", which defines all of the initial structures, including the user structure.

Functions

The following table summarizes the functions provided by Darwin. Key functions are documented further in the subsections that follow. Most users will only ever use the first 6 functions, and possibly only the first one: declare.

Function            Arguments       Returns                    Description
declare             see below       -                          see below
load                name (string)   true (boolean)             see below
open                name1, ...      environment (table)        open, loading if necessary
close               name            name or nil                close and delete bindings
delete              name            name or nil                delete from global structure list
signature           name            signature (table) or nil   see below
isdeclared          name            true or nil                is name in global structure list?
isloaded            name            true or nil                is name loaded?
isopen              name            environment or nil         is name open in current environment?
currentpackage      -               name                       name of current package
currentenvironment  -               environment                environment of current package
currentopentable    -               (table)                    analogous to package.loaded
preloadtable        -               (table)                    see below
instructure         name [, code]   (values)                   see below
declared            -               (table)                    list of declared structures
getpath             -               (string)                   search path for files clause
setpath             path            -                          set the files search path
initialize          name            true                       see below

structure.declare

To declare one or more structures, call structure.declare with a table argument. Multiple structures that share an implementation can be declared at once, in order to provide different "views" on the same underlying code. The table entries in a structure declaration are called clauses, and there are two categories of clauses: Name/signature/location clauses, and code clauses.

Name/signature/location clauses

When declaring multiple structures, you must provide a structures clause whose value is a table of name/signature/location clauses, e.g.

CODE EXAMPLE

structure.declare {
    structures= {
        { name="tableprint", signature={"tprint", "tprintf", "tdump"} };
        { name="tableprint_min", signature={"tprint"}, location="." };
        { name="tableprint_full", location="tableprint" };
    };
    open={"_G", "table"};
    files={"tableprint.lua"};
}

In this example (above), three structures are defined. They are three different views of the same code. They literally share the same code, i.e. the file "tableprint.lua" is loaded once. Users of tableprint may open any of the declared structures: "tableprint", "tableprint_min", or "tableprint_full". Opening "tableprint" yields the three functions listed in the "tableprint" signature: tprint, tprintf, and tdump. Since there is no location clause for "tableprint", these functions will be located in a table with the same name as the structure, i.e. tableprint.

By contrast, opening "tableprint_min" yields just one function, tprint (the one name listed in the signature for "tableprint_min"). And tprint will appear in the global environment, as specified by the location value "." for "tableprint_min".

Finally, "tableprint_full" provides the entire set of global bindings from the file "tableprint.lua", because no signature is specified. Presumably, there are more functions than the ones already named. Perhaps there are some functions or objects that are used to configure how tables are printed by default with tprint. When "tableprint_full" is opened, all of the global bindings from the file "tableprint.lua" will be visible in the tableprint table of the global environment (again due to the location clause, which overrides the default behavior of naming the table after the structure, which in this case would be tableprint_full).
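
The way signature and location select and place bindings can be modeled in plain Lua. This is only a model of the behavior described above, not Darwin's code: open_view is a hypothetical function, source_env stands in for the global bindings created by "tableprint.lua", the name tsetdefaults is invented for illustration, and values are bound directly rather than copied.

```lua
-- Hypothetical model: place the bindings selected by view.signature
-- at the place named by view.location inside the destination table.
local function open_view(source_env, view, dest)
  local target
  if view.location == "." then
    target = dest                          -- top level of the destination
  else
    local where = view.location or view.name
    dest[where] = dest[where] or {}
    target = dest[where]
  end
  if view.signature then
    for _, name in ipairs(view.signature) do
      target[name] = source_env[name]
    end
  else
    for name, value in pairs(source_env) do   -- no signature: export everything
      target[name] = value
    end
  end
  return dest
end

local source_env = {
  tprint = function() end, tprintf = function() end,
  tdump = function() end, tsetdefaults = function() end,
}

local user_a = open_view(source_env,
  { name = "tableprint", signature = { "tprint", "tprintf", "tdump" } }, {})
local user_b = open_view(source_env,
  { name = "tableprint_min", signature = { "tprint" }, location = "." }, {})
local user_c = open_view(source_env,
  { name = "tableprint_full", location = "tableprint" }, {})

print(user_a.tableprint.tsetdefaults)         -- nil (not in the signature)
print(user_b.tprint == source_env.tprint)     -- true (top-level binding)
print(user_c.tableprint.tsetdefaults ~= nil)  -- true (all bindings exported)
```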

Darwin's declare function allows a syntactic shortcut to be taken when declaring only one structure, which is the most common case. The shortcut is to drop the structures clause entirely, and put the name, signature, and location clauses at the top level of the declaration. The other examples in this document use that more compact syntax. Remember that signature and location are optional.
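
The relationship between the two syntaxes can be illustrated with a small normalizing function. This is a hypothetical sketch: normalize is not a Darwin function, and Darwin may treat the shortcut form differently internally. It only shows that the compact form carries the same information as a structures clause with one entry.

```lua
-- Hypothetical: rewrite the single-structure shortcut as a declaration
-- with an explicit structures clause. Other clauses are carried over.
local function normalize(decl)
  if decl.structures then return decl end
  local out = {
    structures = {
      { name = decl.name, signature = decl.signature, location = decl.location },
    },
  }
  for key, value in pairs(decl) do
    if key ~= "name" and key ~= "signature" and key ~= "location" then
      out[key] = value
    end
  end
  return out
end

local long_form = normalize { name = "memo", open = { "_G" }, files = "memo.lua" }
print(long_form.structures[1].name, long_form.files)   -- memo  memo.lua
```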

Code clauses

A set of "code clauses" specify the code that creates a Darwin structure. All of the code clauses are optional, and they may be used in any combination. The following list defines each clause; the list reflects the order in which Darwin processes code clauses when loading a structure.

open={m1, ...} Starting with an empty environment, Darwin opens each of the structures m1, ... in order. If two or more structures export the same name to the same location, a warning is displayed and the last binding will be used.
pre="..." When present, this string value is evaluated as code in the current structure environment, i.e. an environment in which m1, ... are open.
files=f1 or files={f1, ...} The files clause accepts a single string or a list of strings as a value. Each string is a file name or part of a file name. Following the pattern established by Lua's path and cpath variables, Darwin looks for f1, ... by substituting f1 (and the others in turn) for the '?' character in each part of the Darwin path (see below).
post="..." When present, this string value is evaluated as code in the current structure environment, i.e. the environment after the files clause is processed.
environment="..." When present, this string value is evaluated as code in the current structure environment, i.e. the environment after the post clause is processed. The environment code must return a table. That table becomes the structure environment. It is from this table that bindings are copied when the declared structure is opened in another module. When the environment clause is absent, the structure environment is the environment that was present after the last code clause was processed, whether it was the post clause, the files clause, the pre clause, or the open clause.

The Darwin path variable specifies how Darwin finds files that are listed in the files clause. You can examine the variable by calling structure.getpath() and set the variable by calling structure.setpath(newpathstring). Here is an example showing the default value and how to change it. Note that the default path is created from Lua's package.path. The ".lua" extension is removed from each pattern, and "?;" is prepended so that Darwin will first treat a file name as if it were an absolute file name, and only if that fails will other options be considered.

TRANSCRIPT EXAMPLE

Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
Darwin 1.0.2 Copyright (c) 2009 James S. Jennings
Current package is 'user'.
> structure.getpath()
> =structure.getpath()
?;./?;/usr/local/share/lua/5.1/?;/usr/local/share/lua/5.1/?;/usr/local/lib/lua/5.1/?;/usr/local/lib/lua/5.1/?
> structure.setpath("?")
>
> =structure.getpath()
?
>
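
The substitution of a file name for the '?' character can be sketched in plain Lua. The function candidates is hypothetical and only lists the names that would be tried, in order; it does not check whether any file exists, and it is not Darwin's search code.

```lua
-- Hypothetical: expand each ';'-separated template of a Darwin-style path
-- by substituting the file name for every '?'.
local function candidates(path, filename)
  local result = {}
  for template in path:gmatch("[^;]+") do
    -- A function replacement avoids treating '%' in filename specially.
    result[#result + 1] = (template:gsub("%?", function() return filename end))
  end
  return result
end

for _, candidate in ipairs(candidates("?;./?;/usr/local/share/lua/5.1/?", "memo.lua")) do
  print(candidate)
end
-- memo.lua
-- ./memo.lua
-- /usr/local/share/lua/5.1/memo.lua
```

The leading "?" template is what makes the file name itself the first candidate.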

structure.load

The load function, like most of the functions exposed in structure, is not normally needed when using Darwin.

Structure declarations are lazy in that they do not execute the code that is specified in their code clauses. Instead, that code is executed when a structure is opened for the first time. The process of executing the code specified in the code clauses, which creates the structure environment, is called "loading" the structure.

A structure is only loaded once, i.e. the code specified by the code clauses is executed only once, ever.

The load function is provided so that you can control when a structure is loaded, if you wish. Loading a structure does not also open it. The argument to load is a structure name, and the return value is always true (else an error is thrown).
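
Lazy, once-only loading can be sketched as follows. All names here (declarations, environments, declare, load_structure) are hypothetical and simplified: in Darwin the code comes from the code clauses, whereas this sketch uses a single build function per structure.

```lua
local declarations, environments = {}, {}

-- Declaring only records how to build the environment; nothing runs yet.
local function declare(name, build)
  declarations[name] = build
end

-- Loading runs the recorded code the first time only.
local function load_structure(name)
  if environments[name] == nil then
    local build = declarations[name]
    if not build then
      error("structure '" .. name .. "' is not declared")
    end
    environments[name] = build()
  end
  return true
end

local runs = 0
declare("memo", function()
  runs = runs + 1
  return { results = {} }
end)

print(runs)                 -- 0: declaration is lazy
load_structure("memo")
load_structure("memo")
print(runs)                 -- 1: the code ran exactly once
```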

structure.signature

The signature function takes a structure name as its argument and returns a (fresh) table that is an array of the names of the bindings exported by that structure. The array returned by signature is essentially a copy of the actual structure signature. Therefore, you cannot modify the signature of a structure using this function.

See structure.declare to learn how a single structure environment can be the basis for several different structures, each with their own signature.

Note: signature returns nil if and only if a structure is declared without an explicit signature and the structure has not been loaded.
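
Returning a fresh copy is a common way to keep internal data from being modified by callers. The sketch below is hypothetical: the signatures table stands in for whatever record Darwin keeps internally, and this signature function is a stand-in, not structure.signature itself.

```lua
local signatures = { memo = { "results" } }   -- stand-in for the internal record

local function signature(name)
  local actual = signatures[name]
  if not actual then return nil end
  local copy = {}
  for i, binding in ipairs(actual) do
    copy[i] = binding
  end
  return copy
end

local s = signature("memo")
s[#s + 1] = "compute"          -- modifies only the caller's copy

print(#s)                      -- 2
print(#signature("memo"))      -- 1: the stored signature is unchanged
```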

structure.currentopentable

This function returns the table of structures which are currently open, i.e. open in the current structure. The table has the same format as Lua's package.loaded.

N.B. This table is the actual table used by Darwin. Modifying its contents will produce unexpected and probably undesirable behavior. This is a weakness that will be addressed in a future release.

The table of open structures returned by currentopentable has one intended use: By setting the __index meta-method on package.loaded to this table, Lua's require function will see all of the open structures. Therefore, require(x) will return currentopentable[x] if x is not already in package.loaded.

The code example below is the Darwin declaration for the user structure (which may be customized as you wish). This declaration opens some of the standard libraries and also links package.loaded to the current open table and package.preload to the table of declared Darwin structures. (See structure.preloadtable for more information on the latter.)

CODE EXAMPLE

structure.declare { name="user";
    open={"_G", "structure", "package", "debug", "table", "io", "os", "string"};
    pre= [[
        package.initialize();
        setmetatable(package.preload, {__index=structure.preloadtable()})
        setmetatable(package.loaded,{__index=structure.currentopentable()})
    ]];
}

structure.preloadtable

The preloadtable function is similar to currentopentable in that it is provided only to enable a linkage between Darwin and Lua's require function. Since require uses the loaders in package.loaders, and the first entry in that table is (by default) a function that looks in package.preload, Darwin provides a "preload table" of its own that has the same format as the first default loader expects to find in package.preload: Each key is a structure name, and each value is a function which provides that structure.

In the Lua terminology, the loader function found in package.preload[x] "loads" the module x. In the Darwin terminology, the function found in preloadtable()[x] "opens" the structure x, loading it if necessary.

In fact, every entry in preloadtable() has the same value, which is the function structure.open.

See the example in structure.currentopentable to see how the table returned by preloadtable is intended to be used.

Note that you don't need preloadtable; you can simply add entries to package.preload in order to enable Lua's require function to automatically open declared structures.

structure.instructure

This function is only for use during development and debugging. It allows you to break through the walls that separate structures and execute code in the environment of another structure. With instructure, you can run code in another structure's environment and you can change the structure of the current thread to another structure.

The first argument is a structure name, and the second argument is a string that will be evaluated as code in the environment of the named structure. There are three common scenarios during development that can be addressed with instructure:

To retrieve a value directly from the structure environment, as shown in the transcript example below, where we accidentally removed our binding to dofile, but we got it back by accessing the binding in the _G structure.
To access an unexported binding from a structure's environment, which is useful when debugging.
To modify values (including function definitions) in the structure environment — but in general you will have to close the structure and then open it again in order to see the effects of the change.
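The core of these scenarios, evaluating a string of code in the environment of a named structure, can be approximated in plain Lua. This is only a sketch under stated assumptions: the environments table and the eval_in function are hypothetical and do not reflect how Darwin stores structure environments. The version check is there because setfenv exists only in Lua 5.1, while later versions pass the environment to load.

```lua
-- Hypothetical registry of structure environments, keyed by structure name.
local environments = {
  _G     = { dofile = dofile },
  config = { answer = 42 },
}

-- Evaluate a string of code in the environment of the named structure.
local function eval_in(name, code)
  local env = assert(environments[name], "no such structure: " .. name)
  local chunk
  if setfenv then
    -- Lua 5.1: compile first, then attach the environment.
    chunk = assert(loadstring(code))
    setfenv(chunk, env)
  else
    -- Lua 5.2 and later: pass the environment to load.
    chunk = assert(load(code, "=(eval_in)", "t", env))
  end
  return chunk()
end

local recovered = eval_in("_G", "return dofile")
print(recovered == dofile)
```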

TRANSCRIPT EXAMPLE

> dofile=5
> =dofile
5
> dofile=structure.instructure("_G", "return dofile")
> =dofile
function: 0xe32700
>

The next example shows the other use of instructure, which is to switch the current environment to that of another structure. Note that the prompt changes when entering the config structure. This is due to the statement _PROMPT="config> " in the file "initialpackages.lua". When developing and debugging, this is a useful trick that you can apply to your own structures.

TRANSCRIPT EXAMPLE

> structure.instructure("config")
config> =math
table: 0xff85c0
config> =structure
table: 0x1000ef0
config> =originalluapackage
table: 0xffa930
config> structure.instructure("user")
> =math
nil
> =structure
table: 0x1020140
> =originalluapackage
nil
>

structure.initialize

The initialize function initializes Darwin. You can see how it is used by looking in the file "initialstructures.lua", which is discussed later in this document. Darwin takes the initial environment (the one into which Darwin is loaded) and turns it into a structure. The name of that structure is the argument to initialize. Darwin's default configuration names this structure "config". The "config" structure is really just a container for everything that's there when Darwin is loaded, and this structure is not meant to be accessed except, perhaps, during debugging.

An attempt to open the initial structure will throw an error.

darwin.initialstructures

The file "initialstructures.lua" contains the declarations for all of the structures that will be available when Darwin finishes loading. This file initializes Darwin, defines the signatures of structure and _G, converts the standard libraries into structures, and declares the user structure.

The file "darwin.lua" switches the running thread into the user structure and prints a copyright and version number for Darwin.

Customizing "initialstructures.lua" allows you to create whatever set of structures you want to be available after Darwin loads. E.g. you can restrict the signature of structure to expose fewer functions; you can omit or add libraries in the set of initial structures; etc.

Known bugs, limitations, and planned enhancements

Known bugs

None. Surely there are bugs. Just not known bugs. Yet.

Limitations

Code change needed to support MacOS

Darwin's versions of some of the functions in package.loaders will fail on MacOS. The reason is that Lua's package.config string does not provide all of the characters needed to replicate the functionality of the functions in the Lua package library (which are implemented in C).

Here is an excerpt from the file "darwinpackage.lua":

CODE EXAMPLE

local DIRSEP = string.sub(originalluapackage.package.config, 1, 1)
local PATHSEP = string.sub(originalluapackage.package.config, 3, 3)
local PATHMARK = string.sub(originalluapackage.package.config, 5, 5)
local EXECDIR = string.sub(originalluapackage.package.config, 7, 7)
local IGMARK = string.sub(originalluapackage.package.config, 9, 9)
-- These values are #defined and should be part of package.config
local OFSEP = "_"
local POF = "luaopen_" -- !@# NOT PORTABLE, e.g. on MacOS will fail

As you can see, the last two items, OFSEP and POF, are not part of Lua's package.config. It's possible for Darwin to make a run-time check to determine the OS type, but that would require either a special library or at least the use of the os package, which may or may not be present in a given installation.

Note: A work-around, such as a run-time check, should be added to Darwin so that MacOS users will not have to edit "darwinpackage.lua" in order to try out Darwin.
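For comparison, the five values that package.config does provide can also be read by splitting the string into lines instead of indexing fixed character positions. The sketch below is only an illustration and assumes the stock layout, in which each value sits on its own line. The variable names mirror those in the excerpt above, and the sketch does not address the missing OFSEP and POF values.

```lua
-- Split package.config into its newline-separated fields.
local fields = {}
for line in package.config:gmatch("[^\n]+") do
  fields[#fields + 1] = line
end

-- Same five values as in the excerpt, taken by position in the list of lines.
local DIRSEP, PATHSEP, PATHMARK, EXECDIR, IGMARK =
  fields[1], fields[2], fields[3], fields[4], fields[5]

print(DIRSEP, PATHSEP, PATHMARK, EXECDIR, IGMARK)
```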

No explicit support for renaming exports

Some module systems, like the one in Scheme48, provide a way to export a binding under a different name, or even to rename all the exported bindings (e.g. by adding a prefix string). Darwin could, but does not, provide such a feature. Because Lua uses tables for environments, operations such as renaming can always be done within a structure declaration by supplying a post code clause containing code that operates directly on the environment.

For example, consider the following declaration:

CODE EXAMPLE

structure.declare {
    name = "verbose_trigonometry";
    location = "vmath";
    signature = { "sine", "cosine", "tangent" };
    open = { "math" };
    post = [[
        sine = math.sin;
        cosine = math.cos;
        tangent = math.tan;
    ]];
}

With this declaration, we can use verbose names for the trigonometry functions as follows:

TRANSCRIPT EXAMPLE

> structure.open "verbose_trigonometry"
> =vmath.sine(3.1416/4)
0.70710807985947
> =vmath.tangent(3.1416/4)
1.0000036732118
>

Recall that the location clause of a structure declaration is used to relocate a structure's exports to a different table name or to top-level. If we wanted to build a variation of the math package that exported different function names (such as the verbose ones in the example above), we would write location="math" in our structure declaration.
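Renaming all exports at once, for instance by adding a prefix, is equally a matter of ordinary table manipulation. Here is a minimal sketch in plain Lua, where with_prefix is a hypothetical helper and not part of Darwin:

```lua
-- Return a new table in which every binding of `exports` is renamed
-- by prepending `prefix` to its (string) name.
local function with_prefix(exports, prefix)
  local renamed = {}
  for name, value in pairs(exports) do
    renamed[prefix .. name] = value
  end
  return renamed
end

local trig = with_prefix({ sin = math.sin, cos = math.cos }, "trig_")
print(trig.trig_sin(0))
```

Code of this kind could be placed in a post clause, operating on the structure's environment rather than on a literal table.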

No explicit support for importing a subset of bindings

At times, you may wish to import only a subset of the bindings exported by a package. For example, your code for binary search might use the function math.floor, so the structure containing your code must open the math structure. Although you need only that one function, you will get a copy of all of the bindings in math. While these do not take up much space, the unneeded bindings do waste memory, which may be scarce.

Following the example in the previous section, you can devise a work-around in which you define a structure whose only export is the math.floor function. However, it would be convenient if Darwin provided a way to import a subset of the bindings exported by another structure. This is the subject of one of the planned enhancements below.
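In plain Lua terms, such a work-around amounts to copying only the wanted bindings out of a table of exports. The helper below, pick, is hypothetical and only sketches the idea; it is not a Darwin function.

```lua
-- Copy only the named bindings from a table of exports.
local function pick(exports, names)
  local subset = {}
  for _, name in ipairs(names) do
    subset[name] = exports[name]
  end
  return subset
end

local floor_only = pick(math, { "floor" })
print(floor_only.floor(7 / 2))
```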

Planned enhancements

Over and above the weaknesses already noted in this document, the following sections mention in brief some planned enhancements to Darwin.

A Meta-Module Protocol?

There are many similarities between a module system and an object system. For example, the dependency graph among a set of modules is like the inheritance graph of classes in a class-based object system with multiple inheritance. Other similarities derive from that observation, such as the parallel between method lookup and choosing the right binding when there are multiple choices. (In Darwin today, it is illegal to import the same name from multiple packages, although this is not detected in the current implementation.)

There appears to be some opportunity to apply the concept of a meta-object protocol here. (Cite AMOP here.) The result would be a module system whose properties could be altered by the module designer, i.e. the person building modules for use by themselves and others. Because Darwin can run inside of Darwin, we are already prepared for enabling one designer's customized module rules to co-exist with other modules that have the default (or some other) behavior.

Behaviors that might be customizable with such an enhancement include such things as:

Post-processing of the table of bindings produced when a structure is opened. A custom function plugged into Darwin for post-processing could make the table weak (see "Weak imports" below); or it could perform per-user initialization steps; etc.
A customizable file search function (or table of functions, like package.loaders) could replace the current search path capability. The default search function could, of course, be exactly the path-based searcher that is in Darwin today.
If a custom wrapper around the structure.open procedure could be inserted, then Darwin could be extended by its users to do things like allow aliases for structure names, allow "compound structure names" (where one name stands in for several structures), etc. A custom wrapper for open would also facilitate run-time checking of access rights to a structure, so that access permissions could be declared.
Etc. The examples above illustrate the idea, and it is easy to take this idea and run with it.
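As one illustration of the wrapper idea, the sketch below resolves aliases before delegating to an open function. Everything in it is hypothetical: open is a stand-in for structure.open that merely records the requested name, and the aliases table and open_with_aliases are invented for this example.

```lua
-- Stand-in for structure.open; here it just records what was asked for.
local function open(name)
  return { name = name }
end

-- Hypothetical alias table: alias -> real structure name.
local aliases = { trig = "verbose_trigonometry" }

-- Wrapper that resolves aliases before delegating to open.
local function open_with_aliases(name)
  return open(aliases[name] or name)
end

print(open_with_aliases("trig").name)
print(open_with_aliases("math").name)
```

An access check or a compound-name expansion would fit in the same place, before the call to open.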

Weak imports

When a structure, x, is opened, the user gets a copy of the bindings exported by x. However, the structure that uses x may not use all of the bindings exported by x. The Lua mechanism of weak tables can be employed to automatically remove (during garbage collection) all imported bindings that are not used.

The implementation would require another declaration clause that indicates which structures should be weakly opened. When you open such a structure, you will get a weak table of bindings whose values will be removed by the garbage collector when they cannot be accessed from your code.

The benefit of this approach is that you do not need to declare which functions of x you are going to use. You use the ones you need, and the bindings for the rest will be collected.

The disadvantage to this approach is that "reflective code" can fail. By "reflective", I mean code that examines its own environment, e.g. f=_G[name]. Since the value of the variable name cannot, in general, be known, this assignment statement may access any entry in _G. If the author of this statement assumed that _G would forever contain exactly the elements that were there when _G was imported, then she may be surprised to learn that the garbage collector removed unused bindings. In other words, if name is the name of a function which is not otherwise used in the author's code, then it is not able to be accessed except through _G. If _G were a weak table, such a binding would be collected.

If weak environments are supported, then Darwin will support a way to avoid this behavior and obtain regular (non-weak) tables when needed, so that "reflective code" will work as expected.
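The behavior described in this section can be observed with an ordinary weak-valued table. The sketch below does not use Darwin; make_binding and open_weakly are hypothetical names, and the bindings are created fresh so that nothing else holds a strong reference to them, which is an assumption of the sketch.

```lua
-- Create a distinct function value for each binding.
local function make_binding(label)
  return function() return label end
end

-- Build a weak-valued table of freshly created bindings.
local function open_weakly()
  local imports = setmetatable({}, { __mode = "v" })
  imports.used = make_binding("used")
  imports.unused = make_binding("unused")
  return imports
end

local imports = open_weakly()
local used = imports.used      -- the only strong reference to a binding

collectgarbage()               -- run full collection cycles
collectgarbage()

print(imports.used ~= nil)     -- still reachable through the local `used`
print(imports["unused"])       -- a reflective lookup by name finds nothing
```

The last line is the "reflective code" hazard in miniature: a lookup by computed name fails because the binding was never referenced directly.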

Support for importing a subset of bindings

A slightly different use case can be solved with a declaration that tells Darwin that you want only a subset of the bindings of structure x, and not all of them. Here, the use case may be motivated by conservation of memory, but there is a more general reason. Examining declarations statically, one can see the dependencies between structures. For static analysis, we want an explicit declaration that structure x is opened but only some of its bindings are used.

Static dependency analysis could be used for a variety of purposes, but a simple one is to identify commonly-needed subsets of a structure. A commonly-used subset of a structure is a candidate to become a separate structure.

Functors

A powerful feature of the SML module system is the concept of a functor. A functor is a function that accepts modules as arguments and produces a module.

When functors are explained in the SML literature, a common example arises when you have a choice of data structures to use in your code (as you write a module), and so you decide to write a parameterized module which takes another module as an argument.

For example, suppose I am implementing a module that renders three-dimensional geometric shapes on a two-dimensional canvas such as a computer screen. I need a fairly standard set of matrix operations in order to write my code. Suppose I can use any library that lets me create, multiply, invert, and take the determinant of matrices.

If I write my rendering module to accept a matrix module as an argument, then my rendering module can be instantiated (declared, in Darwin terminology) with a matrix module supplied at declaration time. Of course, the matrix module must support the needed operations, and use the expected names for those functions. (But we know how to write a declaration that renames exports, so the latter is not really an issue.)

With this approach, when I declare my rendering module as a Darwin structure, I must supply a matrix structure as an argument. Today, I will use the basic matrix module I wrote for myself, which is not optimized in any way. In the future, when my rendering module is working correctly and has all of the functions I need, I may look for a faster matrix module that can substitute for the one I use today.

I change the declaration of my module, supplying a new matrix module as the argument, and I get a version of my module that uses the new matrix code.

It is possible to get almost exactly this effect using Darwin today, without functors. You can declare that Darwin opens the matrix module, and then use different declarations for matrix at different times. Today, my matrix structure declaration would refer to my own code, but tomorrow I can replace the declaration for matrix with one that refers to a new library that I obtained from someone else.

The difference, at least in this example, is subtle and not especially compelling. But consider a different scenario, in which we have a module that implements a fairly generic capability such as lexicographic sorting. A sorting module needs objects to sort. We might want to sort vectors one day, strings another day, payroll records the next day, etc.

If the sorting module is parameterized, i.e. it accepts another module as a parameter, then we can make a "payroll record sorting module" by providing the payroll module as the argument. Similarly, a "vector sorting module" can be made by providing a vector module as the argument.

The sorting module will require that its argument (e.g. the payroll module) supports the set of functions needed to do sorting, such as a comparison function that maps payroll records A and B to true if and only if A precedes B.

Using a parameterized sorting module (a functor), we can have a system in which modules which do no sorting can open the "plain" payroll module, and modules that do sorting can additionally open a module produced by the sorting functor which provides the capability to sort payroll records.

Similarly, exactly the same sorting functor can be applied to a vector module to produce a module that sorts vectors. Etc.
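Because Lua modules are ordinary tables, a functor can be approximated as a function from tables to tables. The following sketch is not a Darwin feature; Sorter, precedes, and the two argument modules are hypothetical names chosen for illustration.

```lua
-- A "functor": takes a module that supplies precedes(a, b) and
-- returns a new module that sorts lists of that module's elements.
local function Sorter(element)
  assert(type(element.precedes) == "function",
         "argument module must export precedes(a, b)")
  local M = {}
  -- Return a sorted copy of `list`, leaving the original untouched.
  function M.sort(list)
    local copy = {}
    for i, item in ipairs(list) do copy[i] = item end
    table.sort(copy, element.precedes)
    return copy
  end
  return M
end

-- Two argument modules with the same signature.
local payroll = { precedes = function(a, b) return a.salary < b.salary end }
local words   = { precedes = function(a, b) return a < b end }

local payroll_sorting = Sorter(payroll)
local word_sorting    = Sorter(words)

print(table.concat(word_sorting.sort({ "pear", "apple", "fig" }), " "))
```

The assertion at the top of Sorter plays the part of a signature check: the argument module must supply the comparison function under the expected name.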

Note that, in SML, the power (and the complexity) of functors is more apparent because of SML's sophisticated type system. Even though type-related benefits would not materialize for Lua, it is possible that the ability to parameterize a module will be useful nonetheless.

Consider the ability to parameterize a module with arguments that are not other modules. For example, a "logging level" parameter could be provided when a Darwin structure is declared. This is a declaration-time action that can affect what the module code does when it is loaded. The module code could, for example, examine the logging level parameter and use it to conditionally define functions.

A module declared with a logging level of zero might be much smaller (in memory) and faster than the same module declared with a high logging level.
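The effect of such a parameter can be sketched in plain Lua with a function that builds a module table. The names make_counter, increment, and log are hypothetical, and passing the level as a function argument merely stands in for a declaration-time parameter that Darwin does not provide today.

```lua
-- Build a small counter module; `logging_level` is fixed when the module is made.
local function make_counter(logging_level)
  local M = { count = 0 }
  if logging_level > 0 then
    M.log = {}                       -- only verbose modules carry a log
    function M.increment()
      M.count = M.count + 1
      M.log[#M.log + 1] = "count is now " .. M.count
    end
  else
    function M.increment()           -- lean version: no logging code at all
      M.count = M.count + 1
    end
  end
  return M
end

local quiet = make_counter(0)
local verbose = make_counter(2)
quiet.increment()
verbose.increment()
print(quiet.log, #verbose.log)
```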

Other "compile-time" optimizations are possible in this way. For example, a matrix module could take a "maximum dimension" parameter in its declaration. Depending on the maximum dimension, the module might choose one implementation over another for some functions. For example, determinants can be very quickly calculated in 3 dimensions without resorting to a generic algorithm that works on higher dimensional matrices.

Bibliography

To do.

Cite AMOP, Scheme48, SML, Java, "Java Has No Module System"
