Re: Announce: Darwin module system
- Subject: Re: Announce: Darwin module system
- From: Jim Jennings <jennings.durham.nc+lua@...>
- Date: Mon, 7 Sep 2009 11:00:41 -0400
On Sun, 6 Sep 2009 22:27:03 -0400, Patrick Donnelly <batrick@batbytes.com> wrote:
> Can you elaborate on key problems you are trying to solve and how you
> solve them?
Sure.  The documentation at http://lua-users.org/wiki/JimJennings describes the design, but does not answer your question about the problem I'm trying to solve.  And regarding the "how" question, the doc needs some pictures to make clear how it works.  So I'll give it a go here, but without pictures.  
In this discussion, I'm talking about Lua code, although some of the issues apply to native (C) libraries also, because library functions are exposed as entries in a Lua table, just like Lua modules.
I'll start with an example of the one Darwin function that you need to use if you want to use Darwin:  structure.declare.
structure.declare { name="pair";
                    signature={"cons", "car", "cdr", "isnull", "null"};
                    open={"_G"};
                    objects={"null"};
                    files="list.lua";
                  }
The example above declares a module, which in Darwin is called a structure.  It gives a name, "pair", which might be different from the module name used in the source code file.  That file is "list.lua".  There can be multiple files, by the way.  The "open" clause lists the modules needed by "pair", of which there is only one: _G.  (Darwin turns _G and the other standard libraries into structures.)  If you don't specify a signature, then when you open "pair", you will get every global definition from "list.lua", which might be a lot of stuff.  In this example, an explicit signature is provided, so any code that uses "pair" will only see cons, car, cdr, isnull, and null.
Other modules use "pair" by listing "pair" in their own "open" clauses.  For interactive use, you can call ' structure.open "pair" '.  Or, you can simply issue ' require "pair" '.  (The enhancement to the behavior of "require" is configurable.)
The "objects" clause is not often needed.  It's described in the documentation.  See if you can figure out what it does before you look.
Problems that Darwin tries to solve:
(1) Isolation between modules
Suppose I load Lua modules A and B, where B uses the functions of A.  If my code modifies A, I can easily break B.  I could redefine a function in A that B needs, for example.  The scenario is more difficult to debug and recover from if it's not my code that changes A, but instead it's the code of another module, C, that does so.  The modification to A might be intentional or accidental.
When someone wrote B, they made a decision to use module A, and they wrote B with the expectation that A would behave the same way all the time.  We can engineer an environment in which that assumption is true.  When we load B and it requires A, we can provide B with a *copy* of A.  Now, we don't want to copy the code of A, just the bindings to A's functions.  If you think of a binding as being like a pointer, we are giving B pointers directly to A's functions.
Darwin does this for every module that uses A.  So when module C requires A, C gets a copy of the bindings to A's functions.  Suppose A is the math module.  C can redefine a math function, e.g. executing "math.cos = function(theta) ...", but the redefinition only affects the binding that C has.  In other words, C now has a binding for math.cos that refers to its own cosine function.  But the other modules, like B (as well as the "main" non-module code) have their own math.cos binding that continues to point to the original math.cos function.
Naturally, the copied bindings take up some space, but not too much.  They are "pointers" into the heap, not copies of the functions themselves.  (I think that a few changes to the Lua implementation could enable a more space-efficient implementation, but I need to give this more thought.)
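To make the idea of copied bindings concrete, here is a minimal sketch in plain Lua.  It is not Darwin's code: the helper copy_bindings and the names math_for_B and math_for_C are made up for illustration, and the copy is only one level deep.

```lua
-- Hypothetical helper (not part of Darwin): copy a module's bindings,
-- one level deep, into a fresh table.
local function copy_bindings(mod)
  local copy = {}
  for name, value in pairs(mod) do
    copy[name] = value   -- copies the reference, not the function itself
  end
  return copy
end

-- Each user of the math library gets its own table of bindings.
local math_for_B = copy_bindings(math)
local math_for_C = copy_bindings(math)

-- C redefines cos; only C's binding changes.
math_for_C.cos = function(theta) return 0 end

print(math_for_B.cos == math.cos)  -- B still refers to the original function
print(math_for_C.cos == math.cos)  -- C's binding now refers to its own function
```

Both tables share the same function objects until one of them is reassigned, which is why the space cost is one table slot per binding rather than a copy of any code.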
There are other issues around isolation as well, including some that were recently pointed out on this list.  One that springs to mind is that if module D uses "strict.lua" and module E is defined using the module function and package.seeall, then E is also subject to the restrictions of "strict.lua", which may break E.  David Manura pointed this out on this list on 13 Aug 2009.
One more example: It is well-known that package.seeall exposes a "hole" in a module namespace, making it possible to access (read and write) the global environment through a module's table.
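The "hole" can be shown without the module function at all.  The sketch below emulates what package.seeall does in Lua 5.1 (it gives the module table a metatable whose __index is the global table); the module name mymod and the global name leaked are made up for illustration.

```lua
-- Emulate the effect of package.seeall on a module table:
-- lookups that miss in the module fall through to the global table.
local mymod = {}
setmetatable(mymod, { __index = _G })

function mymod.greet() return "hello" end

-- Reading through the hole: a global is visible as if it were a module field.
print(mymod.print == print)

-- Writing through the hole: mymod._G resolves to the real global table,
-- so any holder of the module table can create or change globals.
mymod._G.leaked = "set via mymod"
print(leaked)
```

Note that print is not actually stored in mymod; it is only reachable through the metatable, which is exactly what makes the leak easy to overlook.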
(2a) Namespace control: Imports
There are many ways to write Lua modules.  I like the ones that return the module table, so that I can choose the name for the table of functions provided by the module, e.g.
rtype = require "recordtype"
Unfortunately, not too many modules are written this way, because it can be awkward to do so.  A lot of modules create a table with a name that is fixed in the source code for the module.  In fact, modules that use the Lua module function do this.
I could write a loader that loads Lua modules wherever I want them to show up, so that I could re-use existing module code without having to edit its source.  Darwin includes exactly such a feature, so that the same module source code can be used by different people with their functions appearing in different tables.  E.g. person P might load the lfs library into a table called "luafilesystem" because P has already written a module called "lfs".  Meanwhile, person Q accepts the default location for lfs functions.
With the isolation described above, two modules in the same Lua state can use lfs in both of those ways at once.  I.e. we can have one copy of lfs loaded, as well as modules P and Q loaded, where P uses lfs but sees its functions in a table called "luafilesystem", while Q sees the same functions in a table called "lfs".  No one had to modify lfs to do this.
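Here is a rough sketch of how two pieces of code can see one loaded library under two different names.  It is an illustration only, not Darwin's implementation: fslib is a made-up stand-in for lfs (so the example runs without lfs installed), and load_in_env and copy_bindings are hypothetical helpers.

```lua
-- Hypothetical helper: compile src with env as its global environment.
-- Works on Lua 5.1/LuaJIT (setfenv) and on Lua 5.2+ (load with an env).
local function load_in_env(src, name, env)
  if setfenv then
    local chunk = assert(loadstring(src, name))
    return setfenv(chunk, env)
  end
  return assert(load(src, name, "t", env))
end

local function copy_bindings(mod)
  local copy = {}
  for k, v in pairs(mod) do copy[k] = v end
  return copy
end

-- Made-up stand-in for a library such as lfs, loaded once.
local fslib = { currentdir = function() return "/tmp" end }

-- P sees the library as "luafilesystem"; Q sees it as "lfs".
local env_P = { luafilesystem = copy_bindings(fslib) }
local env_Q = { lfs = copy_bindings(fslib) }

local P = load_in_env("return luafilesystem.currentdir(), lfs", "P", env_P)
local Q = load_in_env("return lfs.currentdir(), luafilesystem", "Q", env_Q)

local p_dir, p_other = P()   -- p_other is nil: P has no table named lfs
local q_dir, q_other = Q()   -- q_other is nil: Q has no table named luafilesystem
print(p_dir, p_other, q_dir, q_other)
```

The library code is untouched; only the name under which each environment binds it differs.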
(2b) Namespace control: Exports
Several of the common ways to write Lua modules, including the way that uses the Lua "module" function, expose all of the non-local definitions to the user of the module.  If I am the module author, I have to be careful not to accidentally leave a private function exposed.  Darwin lets you specify an explicit list of exports if you want to.  Otherwise, all the globals are exported.
But catching accidental globals is not too hard to do.  What's hard to do with the current module system is to provide multiple ways of using the same module.  For example, I might write an object system that has two kinds of functions: (i) the set of functions that all users need almost all the time; and (ii) a set of functions that alter the behavior of the object system itself.  The functions in (ii) are only needed by a few users, and furthermore are only needed when setting up the object system.
Darwin allows you to declare multiple "views" on the same code.  My object system can be written in the "normal way", i.e. so that it can be used with plain Lua.  All the functions will be available.  But I can make a Darwin structure declaration that contains two distinct sets of exported functions: (i) the normal-use functions, and (ii) the customization functions.  Each set has its own module name, e.g. "LuaObjectSystemEnhancedRuntime" and "LOSER_customization".  When someone uses my object system (LOSER), they can open one or both modules.  Using either one will cause the code for LOSER to be loaded.  If you open LuaObjectSystemEnhancedRuntime, you will only see the normal-use functions.  If you open LOSER_customization, you will see only the customization functions.  You can now open the latter module in the code that sets up your run-time environment, and you can open the former module in code that creates and uses LOSER objects.  The benefit is the knowledge that your "regular code" has no access to the customization functions, so you know you cannot accidentally alter the behavior of LOSER in your code (or in code loaded by your code, or in code written by others, etc.).
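The "views" idea can be sketched in a few lines of plain Lua.  Everything here is hypothetical (the toy object system, the view helper, and the function names new and set_default_parent); Darwin expresses this through structure declarations rather than a helper like this one.

```lua
-- Toy object system written the "normal way": one table holding everything.
local loser = {}
local default_parent = "Object"

function loser.new(name)
  return { name = name, parent = default_parent }
end

function loser.set_default_parent(p)
  default_parent = p
end

-- Hypothetical helper: build a view exposing only the names in the signature.
local function view(mod, signature)
  local v = {}
  for _, name in ipairs(signature) do
    v[name] = mod[name]
  end
  return v
end

-- Two views over the same code.
local runtime       = view(loser, { "new" })
local customization = view(loser, { "set_default_parent" })

customization.set_default_parent("Base")   -- done once, in setup code
local obj = runtime.new("x")               -- regular code sees only new()

print(obj.parent, runtime.set_default_parent)
```

Both views share the same underlying functions and state, so a change made through the customization view is visible to objects created through the runtime view.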
(3) Declarative dependency specification
Once you have more than a handful of modules, some questions become both important and also hard to answer.  One is:  Which modules does module R require?  As users of LuaRocks know, it's useful to have a declarative specification for dependencies.  The module systems that I like (from other languages) force the module author to list the dependencies of her/his module.  Darwin does the same thing, but it also enforces it.
When Darwin loads the source code for module S, it loads S into an environment that contains all of the declared dependencies of S, and nothing else.  If you forget to list a dependency, S will fail (while loading, while testing, or while running, depending).
What you cannot do is be tripped up by load order.  Suppose I always load modules T, U, and V when I start Lua.  With LuaRocks, as far as I know, I could forget to list some dependency (such as T, U, or V) for my module S.  I can use S through Rocks successfully, as can anyone else who happens to have the missing dependency, say T, already loaded.  Eventually, someone may load S before T (or use S without T) and get an error.
With Darwin, that error happens as soon as you declare the dependencies for S and then try to use S, because S is not loaded into the global environment (where T may or may not exist).  Darwin loads S into its own private environment containing exactly the modules that S depends on, and no others.
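A small sketch of the private-environment idea follows.  It is an assumption-laden illustration, not Darwin's loader: load_in_env is a hypothetical helper, and the source of S is given as a string so the example is self-contained.  S declares only the string library as a dependency but also tries to use T.

```lua
-- Hypothetical helper: compile src with env as its global environment.
-- Works on Lua 5.1/LuaJIT (setfenv) and on Lua 5.2+ (load with an env).
local function load_in_env(src, name, env)
  if setfenv then
    local chunk = assert(loadstring(src, name))
    return setfenv(chunk, env)
  end
  return assert(load(src, name, "t", env))
end

-- Made-up module S: uses string (declared) and T (forgotten).
local source_S = [[
  local M = {}
  function M.shout(s) return string.upper(s) end
  function M.use_T() return T.value end
  return M
]]

T = { value = 42 }   -- T happens to be loaded in the global environment

-- The environment for S contains the declared dependencies and nothing else.
local env = { string = string }
local S = load_in_env(source_S, "S", env)()

print(S.shout("ok"))              -- works: string was declared
local ok, err = pcall(S.use_T)    -- fails: T is not visible to S
print(ok, err)
```

The failure does not depend on whether T exists globally, which is the point: the missing dependency shows up on the author's machine rather than on a user's.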
By the way, Darwin converts _G and the other standard libraries (coroutine, debug, io, etc.) into modules (which Darwin calls "structures").   A uniform treatment of all libraries/modules, whether standard or not, is a good thing.
(4) Backwards compatibility
Ok, this is not a problem I'm trying to solve, but it is a design goal.  I don't want to replace the dynamic and flexible module system that Lua has today.  I just want to add a new way of using existing and new modules -- a way that achieves the goals I've discussed here, as well as some others.
With Darwin, the Lua package library continues to work as it always has [1].  Your "main" code and the modules that you load can all use "require" and the features in the Lua package table.  If you do not declare any structures, everything will continue to be dynamic and you might fall into any of the traps discussed here as "problems I want to solve".
When you want to isolate a module to protect it from intentional or accidental modification by other code, simply declare a structure that names the module and tells Darwin what file to load.  (You can even use "require" to specify what Darwin should load -- you don't need to provide an absolute file name.)  Suppose we make a declaration for module X.
The next time your code (or the code of another module) issues ' require "X" ', Darwin will automatically provide X to whoever called "require".  But the X will be a copy of the bindings for X.  In other words, by declaring that X is a structure, we have enabled Darwin to make X available by creating a fresh table of the bindings for X for whoever calls ' require "X" '.  Now no one can accidentally modify X or interfere with another module that uses X.
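The behavior described above can be approximated with a wrapper around require.  This is only a sketch under stated assumptions: the declared table and the function isolating_require are invented for the example, the standard math library stands in for X, and Darwin's actual hook into require is configurable and more involved.

```lua
-- Made-up registry of names that have been declared as structures.
local declared = { math = true }

local function copy_bindings(mod)
  local copy = {}
  for k, v in pairs(mod) do copy[k] = v end
  return copy
end

-- Hypothetical wrapper: declared modules come back as a fresh table of
-- bindings; everything else behaves exactly like plain require.
local function isolating_require(name)
  local mod = require(name)
  if declared[name] and type(mod) == "table" then
    return copy_bindings(mod)
  end
  return mod
end

local m1 = isolating_require("math")
local m2 = isolating_require("math")

m1.floor = function() return "changed" end   -- affects m1 only

print(m1 ~= m2, m2.floor == math.floor)
print(isolating_require("string") == string) -- undeclared: shared as usual
```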
Because of the backwards compatibility with existing code, you can make structure declarations for some or all of the modules you use.  The module authors can release new versions, and you can use those new versions in most cases without changing your structure declarations.  So you can have the benefits of run-time isolation and namespace control (etc.), even though the module authors know nothing of Darwin.
Of course, if module authors use Darwin, they can include a structure declaration with their code.  (The data in a declaration is just a Lua table.)  That way, users can accept the default declaration if they wish to use the module with Darwin.  Or, the user can alter the declaration to suit their needs.  Since the declaration is separate from the code that implements the module, this is easy.
Despite the length of this email, I'm sure there are things I am forgetting.  The recent discussion on lua-l about the complexity of using modules is another example of something that Darwin can help with.  If you take a set of dynamically-loadable modules that you want to use, and declare them as Darwin structures, you will have declared the dependencies (which you'd also do if you wrote a Rock spec).  But you've also done more than that.
You now have a set of declarations that will work (in Darwin) regardless of load order, regardless of the mix of Lua and C functions, and regardless of the module table and function names chosen by the module writers.  If you are embedding a "complete Lua system", you can tell Darwin to set up the user environment (for user code) in any way you want.  Maybe your platform initialization code needs access to some low-level configuration modules, but the user code should not see those modules at all.  That is trivial to do with Darwin, because you get to declare the "user" structure (where user code runs) in the same way that you declare any other structure.
Let me know if this has answered your question.  At least I've now produced some more text for the documentation.  :-)
Jim
[1] Except for a cross-platform issue regarding the naming of exports from native libraries... see the documentation under "Limitations".  I can't fix this without one of: (i) a change to the Lua implementation, (ii) writing a trivial C library that will have to be part of Darwin, or (iii) adding a work-around that checks the os type at run-time.