On Sun, Jan 3, 2010 at 3:27 AM, Andrew Straw <str...@as...> wrote:
> Typically, the dependencies only depend on the smallest subset of what
> they require (if they don't need lapack, they'd only depend on
> python-numpy-core in your example), but yes, if there's an unsatisfiable
> condition, then apt-get will raise an error and abort. In practice, this
> system seems to work quite well, IMO.

Yes, but:
 - Debian dependency resolution is complex. I think many people don't
   realize how complex the problem really is (AFAIK, any correct scheme
   to resolve dependencies in Debian requires an algorithm which is
   NP-complete).
 - Introducing a lot of variants significantly slows down the whole
   thing.

I think it is worth asking whether our problems warrant such complexity.

> Anyhow, here's the full Debian documentation:
> http://www.debian.org/doc/debian-policy/ch-relationships.html

This is not the part I am afraid of. This is:
http://people.debian.org/~dburrows/model.pdf

cheers,

David
On Tue, Dec 29, 2009 at 6:34 AM, David Cournapeau <cou...@gm...> wrote:
> Buildout, virtualenv all work by sandboxing from the system python:
> each of them do not see each other, which may be useful for
> development, but as a deployment solution to the casual user who may
> not be familiar with python, it is useless. A scientist who installs
> numpy, scipy, etc... to try things out wants to have everything
> available in one python interpreter, and does not want to jump to
> different virtualenvs and whatnot to try different packages.

What I do -- and have documented for people in my lab to do -- is set up
one virtualenv in my user account, and use it as my default python. (I
'activate' it from my login scripts.) The advantage of this is that
easy_install (or pip) just works, without any hassle about permissions
etc. This should be easier, but I think the basic approach is sound.

"Integration with the package system" is useless; the advantage of
distribution packages is that distributions can provide a single
coherent system with consistent version numbers across all packages,
etc., and the only way to "integrate" with that is to, well, get the
packages into the distribution.

On another note, I hope toydist will provide a "source prepare" step
that allows arbitrary code to be run on the source tree (for, e.g.,
cython->C conversion, ad-hoc template languages, etc.). IME this is a
very common pain point with distutils; there is just no good way to do
it, and it has to be supported in the distribution utility in order to
get everything right. In particular:
 -- Generated files should never be written to the source tree itself,
    but only to the build directory.
 -- Building from a source checkout should run the "source prepare"
    step automatically.
 -- Building a source distribution should also run the "source prepare"
    step, and stash the results in such a way that the step can be
    skipped when later building from that source distribution. This is
    a common requirement for user convenience, and necessary if you
    want to avoid arbitrary code execution during builds.

And if you just set up the distribution util so that the only place you
can specify arbitrary code execution is in the "source prepare" step,
then even people who know nothing about packaging will automatically get
all of the above right.

Cheers,

-- Nathaniel
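A hypothetical sketch of the kind of hook Nathaniel is describing (the
function name, its signature, and the idea that the packaging tool would
call it are assumptions made purely for illustration; toydist has no
such interface):

    # Hypothetical "source prepare" hook -- name and signature are
    # invented for illustration only.
    import os
    import subprocess

    def source_prepare(src_dir, build_dir):
        """Generate files (e.g. cython -> C) into build_dir, never into
        src_dir."""
        if not os.path.exists(build_dir):
            os.makedirs(build_dir)
        for name in os.listdir(src_dir):
            if name.endswith('.pyx'):
                out = os.path.join(build_dir, name[:-len('.pyx')] + '.c')
                subprocess.check_call(['cython', '-o', out,
                                       os.path.join(src_dir, name)])

    # In the scheme described above, the tool would call this
    # automatically when building from a checkout, and again when making
    # an sdist (shipping the generated .c files so that end users never
    # execute it at install time).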
On Sun, Jan 03, 2010 at 03:05:54AM -0800, Nathaniel Smith wrote:
> What I do -- and have documented for people in my lab to do -- is set up
> one virtualenv in my user account, and use it as my default python. (I
> 'activate' it from my login scripts.) The advantage of this is that
> easy_install (or pip) just works, without any hassle about permissions
> etc. This should be easier, but I think the basic approach is sound.
> "Integration with the package system" is useless; the advantage of
> distribution packages is that distributions can provide a single
> coherent system with consistent version numbers across all packages,
> etc., and the only way to "integrate" with that is to, well, get the
> packages into the distribution.

That works because either you use packages that don't have many
hard-core compiled dependencies, or these are already installed. Think
about installing VTK or ITK this way, or even something simpler such as
umfpack. I think that you would lose most of your users. In my lab, I do
lose users on such packages, actually.

Besides, what you are describing is possible without package isolation:
it is simply the use of a per-user local site-packages, which is now
semi-automatic in Python 2.6 via the '.local' directory. I do agree
that, in a research lab, this is a best practice.

Gaël
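The '.local' mechanism Gaël refers to is PEP 370 (per-user
site-packages, added in Python 2.6); a minimal look at where it points,
with example paths:

    # PEP 370 (Python 2.6+): a per-user site-packages directory that is
    # on sys.path by default, so no sandboxing is needed for per-user
    # installs.
    import site

    print site.USER_BASE   # e.g. /home/<user>/.local  (example path)
    print site.USER_SITE   # e.g. /home/<user>/.local/lib/python2.6/site-packages

    # 'python setup.py install --user' installs into USER_SITE without
    # touching the system python.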
On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith <nj...@po...> wrote:
> On Tue, Dec 29, 2009 at 6:34 AM, David Cournapeau <cou...@gm...> wrote:
>> Buildout, virtualenv all work by sandboxing from the system python:
>> each of them do not see each other, which may be useful for
>> development, but as a deployment solution to the casual user who may
>> not be familiar with python, it is useless. A scientist who installs
>> numpy, scipy, etc... to try things out wants to have everything
>> available in one python interpreter, and does not want to jump to
>> different virtualenvs and whatnot to try different packages.
>
> What I do -- and have documented for people in my lab to do -- is set up
> one virtualenv in my user account, and use it as my default python. (I
> 'activate' it from my login scripts.) The advantage of this is that
> easy_install (or pip) just works, without any hassle about permissions
> etc.

It just works if you happen to be able to build everything from sources.
That alone means you ignore the majority of users I intend to target.

No other community (except maybe Ruby) pushes those isolated install
solutions as general deployment solutions. If it were such a great idea,
other people would have picked up those solutions.

> This should be easier, but I think the basic approach is sound.
> "Integration with the package system" is useless; the advantage of
> distribution packages is that distributions can provide a single
> coherent system with consistent version numbers across all packages,
> etc., and the only way to "integrate" with that is to, well, get the
> packages into the distribution.

Another way is to provide our own repository for a few major
distributions, with automatically built packages. This is how most open
source providers work. Miguel de Icaza explains this well:

http://tirania.org/blog/archive/2007/Jan-26.html

I hope we will be able to reuse much of the opensuse build service
infrastructure.

> On another note, I hope toydist will provide a "source prepare" step
> that allows arbitrary code to be run on the source tree (for, e.g.,
> cython->C conversion, ad-hoc template languages, etc.). IME this is a
> very common pain point with distutils; there is just no good way to do
> it, and it has to be supported in the distribution utility in order to
> get everything right. In particular:
>  -- Generated files should never be written to the source tree itself,
>     but only to the build directory.
>  -- Building from a source checkout should run the "source prepare"
>     step automatically.
>  -- Building a source distribution should also run the "source prepare"
>     step, and stash the results in such a way that the step can be
>     skipped when later building from that source distribution. This is
>     a common requirement for user convenience, and necessary if you
>     want to avoid arbitrary code execution during builds.

Build directories are hard to implement right. I don't think toydist
will support this directly. IMO, those advanced builds warrant a real
build tool - one main goal of toydist is to make integration with waf or
scons much easier. Both waf and scons have the concept of a build
directory, which should do everything you described.

David
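For reference, the build-directory support in SCons that David mentions
looks roughly like this minimal SConstruct (directory and file names are
examples only):

    # Minimal SConstruct sketch of SCons's build-directory support;
    # 'src' and 'build' are example directory names.
    VariantDir('build', 'src', duplicate=0)   # targets go under build/,
                                              # sources stay in src/
    env = Environment()
    env.Program('build/hello', ['build/hello.c'])  # compiles src/hello.c,
                                                   # outputs under build/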
On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau <cou...@gm...> wrote:
> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith <nj...@po...> wrote:
>> What I do -- and have documented for people in my lab to do -- is set up
>> one virtualenv in my user account, and use it as my default python. (I
>> 'activate' it from my login scripts.) The advantage of this is that
>> easy_install (or pip) just works, without any hassle about permissions
>> etc.
>
> It just works if you happen to be able to build everything from
> sources. That alone means you ignore the majority of users I intend to
> target.
>
> No other community (except maybe Ruby) pushes those isolated install
> solutions as general deployment solutions. If it were such a great
> idea, other people would have picked up those solutions.

AFAICT, R works more-or-less identically (once I convinced it to use a
per-user library directory); install.packages() builds from source, and
doesn't automatically pull in and build random C library dependencies.

I'm not advocating the 'every app in its own world' model that
virtualenv's designers had in mind, but virtualenv is very useful for
giving each user their own world. Normally I only use a fraction of
virtualenv's power this way, but sometimes it's handy that they've
solved the more general problem -- I can easily move my environment out
of the way and rebuild if I've done something stupid, or experiment with
new python versions in isolation, or whatever. And when you *do* have to
reproduce some old environment -- if only to test that the new improved
environment gives the same results -- then it's *really* handy.

>> This should be easier, but I think the basic approach is sound.
>> "Integration with the package system" is useless; the advantage of
>> distribution packages is that distributions can provide a single
>> coherent system with consistent version numbers across all packages,
>> etc., and the only way to "integrate" with that is to, well, get the
>> packages into the distribution.
>
> Another way is to provide our own repository for a few major
> distributions, with automatically built packages. This is how most open
> source providers work. Miguel de Icaza explains this well:
>
> http://tirania.org/blog/archive/2007/Jan-26.html
>
> I hope we will be able to reuse much of the opensuse build service
> infrastructure.

Sure, I'm aware of the opensuse build service, have built third-party
packages for my projects, etc. It's a good attempt, but it also has a
lot of problems, and when talking about scientific software it's totally
useless to me :-).

First, I don't have root on our compute cluster.

Second, even if I did, I'd be very leery of installing third-party
packages, because there is no guarantee that the version numbering will
be consistent between the third-party repo and the real distro repo --
suppose that the distro packages 0.1, then the third party packages 0.2,
then the distro packages 0.3: will upgrades be seamless? What if the
third party screws up the version numbering at some point? Debian has
"epochs" to deal with this, but third parties can't use them and
maintain compatibility. What if the person making the third-party
packages is not an expert on these random distros that they don't even
use? Will bug reporting tools work properly? Distros are complicated.

Third, while we shouldn't advocate that people screw up backwards
compatibility, version skew is a real issue. If I need one version of a
package and my lab-mate needs another and we have submissions due
tomorrow, then filing bugs is a great idea but not a solution.

Fourth, even if we had expert maintainers taking care of all these
third-party packages and all my concerns were answered, I couldn't
convince our sysadmin of that; he's the one who'd have to clean up if
something went wrong, and we don't have a big budget for overtime.

Let's be honest -- scientists, on the whole, suck at IT infrastructure,
and small individual packages are not going to be very expertly put
together. IMHO any real solution should take this into account, keep
them sandboxed from the rest of the system, and focus on providing the
most friendly and seamless sandbox possible.

>> On another note, I hope toydist will provide a "source prepare" step
>> that allows arbitrary code to be run on the source tree (for, e.g.,
>> cython->C conversion, ad-hoc template languages, etc.). IME this is a
>> very common pain point with distutils; there is just no good way to do
>> it, and it has to be supported in the distribution utility in order to
>> get everything right. In particular:
>>  -- Generated files should never be written to the source tree itself,
>>     but only to the build directory.
>>  -- Building from a source checkout should run the "source prepare"
>>     step automatically.
>>  -- Building a source distribution should also run the "source prepare"
>>     step, and stash the results in such a way that the step can be
>>     skipped when later building from that source distribution. This is
>>     a common requirement for user convenience, and necessary if you
>>     want to avoid arbitrary code execution during builds.
>
> Build directories are hard to implement right. I don't think toydist
> will support this directly. IMO, those advanced builds warrant a real
> build tool - one main goal of toydist is to make integration with waf or
> scons much easier. Both waf and scons have the concept of a build
> directory, which should do everything you described.

Maybe I was unclear -- proper build directory handling is nice,
Cython/Pyrex's distutils integration gets it wrong (not their fault,
distutils is just impossible to do anything sensible with, as you've
said), and I've never found build directories hard to implement (perhaps
I'm missing something). But what I'm really talking about is having a
"pre-build" step that integrates properly with the source and binary
packaging stages, and that's not something waf or scons have any
particular support for, AFAIK.

-- Nathaniel
On Mon, Jan 4, 2010 at 8:42 AM, Nathaniel Smith <nj...@po...> wrote:
> On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau <cou...@gm...> wrote:
>> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith <nj...@po...> wrote:
>>> What I do -- and have documented for people in my lab to do -- is set
>>> up one virtualenv in my user account, and use it as my default python.
>>> (I 'activate' it from my login scripts.) The advantage of this is that
>>> easy_install (or pip) just works, without any hassle about permissions
>>> etc.
>>
>> It just works if you happen to be able to build everything from
>> sources. That alone means you ignore the majority of users I intend to
>> target.
>>
>> No other community (except maybe Ruby) pushes those isolated install
>> solutions as general deployment solutions. If it were such a great
>> idea, other people would have picked up those solutions.
>
> AFAICT, R works more-or-less identically (once I convinced it to use a
> per-user library directory); install.packages() builds from source, and
> doesn't automatically pull in and build random C library dependencies.

As mentioned by Robert, this is different from the usual virtualenv
approach. Per-user app installation is certainly a useful (and
uncontroversial) feature. And R does support automatically-built binary
installers.

> Sure, I'm aware of the opensuse build service, have built third-party
> packages for my projects, etc. It's a good attempt, but it also has a
> lot of problems, and when talking about scientific software it's totally
> useless to me :-). First, I don't have root on our compute cluster.

True, non-root install is a problem. Nothing *prevents* dpkg from
running in a non-root environment in principle, if the package itself
does not require it, but it is not really supported by the tools ATM.

> Second, even if I did, I'd be very leery of installing third-party
> packages, because there is no guarantee that the version numbering will
> be consistent between the third-party repo and the real distro repo --
> suppose that the distro packages 0.1, then the third party packages 0.2,
> then the distro packages 0.3: will upgrades be seamless? What if the
> third party screws up the version numbering at some point? Debian has
> "epochs" to deal with this, but third parties can't use them and
> maintain compatibility.

Actually, at least with .deb-based distributions, this issue has a
solution. As packages have their own version in addition to the upstream
version, PPA-built packages get their own versions:

https://help.launchpad.net/Packaging/PPA/BuildingASourcePackage

Of course, this assumes a simple versioning scheme in the first place,
instead of the cluster-fck that versioning has become within python
packages (again, the scheme used in python is much more complicated than
everyone else's, and it seems that nobody has ever stopped and thought
for 5 minutes about the consequences, and whether this complexity was a
good idea in the first place).

> What if the person making the third-party packages is not an expert on
> these random distros that they don't even use?

I think simple rules/conventions + build farms would solve most issues.
The problem is that if you allow total flexibility as input, automatic
and simple solutions become impossible. Certainly, PPA and the build
service provide a much better experience than anything pypi has ever
given me.

> Third, while we shouldn't advocate that people screw up backwards
> compatibility, version skew is a real issue. If I need one version of a
> package and my lab-mate needs another and we have submissions due
> tomorrow, then filing bugs is a great idea but not a solution.

Nothing prevents you from using virtualenv in that case (I may sound
dismissive of those tools, but I am really not. I use them myself. What
I strongly react to is when those are pushed as the de facto, standard
method).

> Fourth, even if we had expert maintainers taking care of all these
> third-party packages and all my concerns were answered, I couldn't
> convince our sysadmin of that; he's the one who'd have to clean up if
> something went wrong, and we don't have a big budget for overtime.

I am not advocating using only packaged, binary installers. I am
advocating using them as much as possible where it makes sense - on
windows and mac os x in particular.

Toydist also aims at making it easier to build and customize installs.
Although not yet implemented, a --user-like scheme would be quite simple
to add, because the toydist installer internally uses an autoconf-like
description of directories (of which --user is a special case). If you
need sandboxed or customized installs, toydist will not prevent it. It
is certainly my intention to make it possible to use virtualenv and co
(you already can, by building eggs, actually).

I hope that by having our own "SciPi", we can actually have a more
reliable approach. For example, the static dependency description +
mandated metadata would make this much easier and more robust, as there
would not be a need to run a setup.py to get the dependencies. If you
look at hackageDB (http://hackage.haskell.org/packages/hackage.html),
they have a very simple index structure, which makes it easy to download
the whole index and reuse it locally to avoid any internet access.

> Let's be honest -- scientists, on the whole, suck at IT infrastructure,
> and small individual packages are not going to be very expertly put
> together. IMHO any real solution should take this into account, keep
> them sandboxed from the rest of the system, and focus on providing the
> most friendly and seamless sandbox possible.

I agree packages will not always be well put together - but I don't see
why this would be worse than the current situation. I also strongly
disagree with sandboxing as the solution of choice. For most users,
having only one install of most packages is the typical use case. Once
you start sandboxing, you create artificial barriers between the
sandboxes, and this becomes too complicated for most users IMHO.

> Maybe I was unclear -- proper build directory handling is nice,
> Cython/Pyrex's distutils integration gets it wrong (not their fault,
> distutils is just impossible to do anything sensible with, as you've
> said), and I've never found build directories hard to implement
> (perhaps I'm missing something).

It is simple if you have a good infrastructure in place (node
abstraction, etc...), but that infrastructure is hard to get right.

> But what I'm really talking about is having a "pre-build" step that
> integrates properly with the source and binary packaging stages, and
> that's not something waf or scons have any particular support for,
> AFAIK.

Could you explain with a concrete example what a pre-build stage would
look like? I don't think I understand what you want.

cheers,

David
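To make the hackageDB comparison concrete, the flat index David
describes could be as simple as the sketch below; the format, field
names, and version constraints are invented for illustration and are not
an actual SciPi design:

    # Hypothetical flat index for a "SciPi"-style repository: one static
    # record per release, so dependencies can be resolved without
    # running any setup.py. Field names and version numbers are
    # illustrative only.
    INDEX = {
        'numpy-1.4.0': {'depends': []},
        'scipy-0.7.1': {'depends': ['numpy >= 1.2']},
        'foo-0.1':     {'depends': ['numpy', 'scipy']},  # made-up package
    }

    def requirements(release):
        # Pure data lookup: no code from the package executes at resolve
        # time, and the whole index can be mirrored locally for offline use.
        return INDEX[release]['depends']

    print requirements('scipy-0.7.1')   # ['numpy >= 1.2']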
Nathaniel Smith <nj...@po...> wrote:
> On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau <cou...@gm...>
> wrote:
>> Another way is to provide our own repository for a few major
>> distributions, with automatically built packages. This is how most open
>> source providers work. Miguel de Icaza explains this well:
>>
>> http://tirania.org/blog/archive/2007/Jan-26.html
>>
>> I hope we will be able to reuse much of the opensuse build service
>> infrastructure.
>
> Sure, I'm aware of the opensuse build service, have built third-party
> packages for my projects, etc. It's a good attempt, but it also has a
> lot of problems, and when talking about scientific software it's totally
> useless to me :-). First, I don't have root on our compute cluster.

I use Sage for this very reason, and others use EPD or FEMHub or
Python(x,y) for the same reasons.

Rolling this into the Python package distribution scheme seems backwards
though, since a lot of binary packages that have nothing to do with
Python are used as well -- the Python parts are simply thin wrappers
around libraries that should ideally be located in /usr/lib or similar
(but are nowadays compiled into the Python extension .so because of
distribution problems).

To solve the exact problem you (and I) have, I think the best solution
is to integrate the tools mentioned above with what David is planning
(SciPI etc.). Or, if that isn't good enough, find a generic "userland
package manager" that has nothing to do with Python (I'm sure a dozen
half-finished ones must have been written, but I didn't look), finish
it, and connect it to SciPI.

What David does (I think) is separate the concerns. This makes the task
feasible, and also has the advantage of convenience for the people who
*do* want to use Ubuntu, Red Hat or whatever to roll out scientific
software on hundreds of clients.

Dag Sverre
On Mon, Jan 4, 2010 at 5:48 PM, Dag Sverre Seljebotn <da...@st...> wrote:
> Rolling this into the Python package distribution scheme seems backwards
> though, since a lot of binary packages that have nothing to do with
> Python are used as well.

Yep, exactly.

> To solve the exact problem you (and I) have, I think the best solution
> is to integrate the tools mentioned above with what David is planning
> (SciPI etc.). Or, if that isn't good enough, find a generic "userland
> package manager" that has nothing to do with Python (I'm sure a dozen
> half-finished ones must have been written, but I didn't look), finish
> it, and connect it to SciPI.

You have 0install, autopackage and klik, to cite the ones I know about.
I wish people had looked at those before rolling toy solutions to
complex problems.

> What David does (I think) is separate the concerns.

Exactly - you've described this better than I did.

David