Thursday, December 15, 2011
How big is the core Sage library?
I just did the following with Sage-4.8.alpha5:
- Installed sloccount with "sudo apt-get install sloccount".
- Copied the library with "cp -rv SAGE_ROOT/devel/sage-main /tmp/x".
- Used a script [1] to rename all .pyx and .pxi files to .py, so that sloccount counts the Cython sources as Python.
- Ran "sloccount *" in the /tmp/x directory; sloccount ignores the autogenerated .c/.cpp files that Cython produces.
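The copy, rename and count steps can also be scripted together. The following is a minimal Python 3 sketch, not the script actually used here: `prepare_copy` and `run_sloccount` are hypothetical helper names, it assumes sloccount is installed and on the PATH, and unlike script [1] it stops with an error rather than overwrite an existing .py file.

```python
import os
import shutil
import subprocess


def prepare_copy(src, dest):
    """Copy the tree at src to dest, then rename .pyx/.pxi files to .py.

    Raises FileExistsError instead of overwriting an existing .py file,
    so no source lines are silently lost from the count.
    """
    shutil.copytree(src, dest)  # dest must not exist yet, like a fresh /tmp/x
    renamed = []
    for dirpath, dirnames, filenames in os.walk(dest):
        for f in filenames:
            if f.endswith(('.pyx', '.pxi')):
                target = os.path.join(dirpath, os.path.splitext(f)[0] + '.py')
                if os.path.exists(target):
                    raise FileExistsError(target)
                os.rename(os.path.join(dirpath, f), target)
                renamed.append(target)
    return renamed


def run_sloccount(dest):
    """Run sloccount in dest on each top-level entry, like "sloccount *"."""
    # The shell glob * skips dotfiles, so skip them here too.
    entries = sorted(e for e in os.listdir(dest) if not e.startswith('.'))
    result = subprocess.run(['sloccount'] + entries, cwd=dest,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Usage would be prepare_copy('SAGE_ROOT/devel/sage-main', '/tmp/x') followed by print(run_sloccount('/tmp/x')), with SAGE_ROOT replaced by the real path.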
Here's the result for the full Sage library, which does not distinguish between Python and Cython. Note that sloccount really only counts lines of code -- comments and blank lines are ignored.
Totals grouped by language (dominant language first):
python: 530370 (96.41%)
ansic: 14538 (2.64%)
cpp: 5188 (0.94%)
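The percentages are each language's share of the combined total over all three languages, which is 550,096 lines; the "half million" figure counts the Python/Cython line alone. A quick Python 3 check of that arithmetic, with the totals copied from the sloccount output (`shares` is just an illustrative helper name):

```python
def shares(totals):
    """Return each language's percentage of the combined SLOC, to 2 places."""
    grand_total = sum(totals.values())
    return {lang: round(100 * sloc / grand_total, 2)
            for lang, sloc in totals.items()}


# Totals copied from the sloccount output.
totals = {'python': 530370, 'ansic': 14538, 'cpp': 5188}
print(sum(totals.values()))  # 550096
print(shares(totals))        # {'python': 96.41, 'ansic': 2.64, 'cpp': 0.94}
```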
This suggests that the core Sage library is just over a "half million lines of Python and Cython source code, not counting comments and whitespace".
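sloccount's rule that only lines of code count, with comments and blank lines skipped, can be roughly approximated for Python-syntax files (which, after the rename, includes the Cython sources). This is a sketch, not sloccount's actual counter: `count_sloc` is a hypothetical helper, it only recognizes # comments, and it makes no attempt to match how sloccount treats docstrings or other multi-line strings.

```python
def count_sloc(text):
    """Count lines that are neither blank nor pure '#' comment lines.

    Rough approximation only: strings and docstrings are counted as code.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            count += 1
    return count


sample = """# a comment line

def f(x):
    # an indented comment
    return x + 1  # a trailing comment does not hide the code
"""
print(count_sloc(sample))  # 2
```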
Here's the breakdown by module:
SLOC   Directory        SLOC-by-Language (Sorted)
88903  rings            python=87720,cpp=1183
72913  combinat         python=71629,cpp=1284
47747  schemes          python=46255,cpp=1492
39815  graphs           python=28377,ansic=11438
31540  matrix           python=31540
31019  modular          python=31012,ansic=7
24475  libs             python=21171,ansic=2845,cpp=459
20517  misc             python=20383,ansic=134
18006  interfaces       python=18006
17577  geometry         python=16936,cpp=641
12775  categories       python=12775
12093  server           python=12093
11971  groups           python=11971
11961  plot             python=11961
10686  crypto           python=10686
9920   modules          python=9920
8389   symbolic         python=8260,cpp=129
8150   algebras         python=8150
7260   ext              python=7198,ansic=62
7093   structure        python=7093
6364   coding           python=6364
5670   functions        python=5670
5249   homology         python=5249
4798   numerical        python=4798
4323   quadratic_forms  python=4323
3919   gsl              python=3919
3911   calculus         python=3911
3879   sandpiles        python=3879
3003   sets             python=3003
2647   databases        python=2647
2074   logic            python=2074
1736   finance          python=1736
1608   games            python=1608
1465   monoids          python=1465
1435   tests            python=1383,ansic=52
1370   stats            python=1370
971    interacts        python=971
959    tensor           python=959
906    lfunctions       python=906
308    parallel         python=308
275    probability      python=275
219    media            python=219
197    top_dir          python=197
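The table can be cross-checked against the totals: each row's per-language entries add up to its SLOC column, and summing the python=, ansic= and cpp= entries over all 43 rows gives back 530370, 14538 and 5188. A small Python 3 sketch of that check, shown on three rows copied from the table; `parse_row` and `totals_by_language` are hypothetical helper names.

```python
from collections import Counter


def parse_row(row):
    """Parse a row such as '39815 graphs python=28377,ansic=11438'."""
    sloc, directory, breakdown = row.split()
    counts = Counter()
    for item in breakdown.split(','):
        lang, n = item.split('=')
        counts[lang] += int(n)
    # The per-language counts should add up to the directory's SLOC column.
    if sum(counts.values()) != int(sloc):
        raise ValueError('row does not add up: ' + directory)
    return directory, counts


def totals_by_language(rows):
    """Sum the per-language counts over all directory rows."""
    total = Counter()
    for row in rows:
        _, counts = parse_row(row)
        total += counts
    return total


rows = [
    '88903 rings python=87720,cpp=1183',
    '39815 graphs python=28377,ansic=11438',
    '24475 libs python=21171,ansic=2845,cpp=459',
]
print(totals_by_language(rows))
```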
Here is the script [1]:
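#!/usr/bin/env python
# Rename every Cython file (.pyx or .pxi) below the current directory
# to .py, so that sloccount counts it as Python.
import os, shutil
for dirpath, dirnames, filenames in os.walk('.'):
    for f in filenames:
        if f.endswith('.pyx') or f.endswith('.pxi'):
            print(f)  # log each renamed file
            shutil.move(os.path.join(dirpath, f),
                        os.path.join(dirpath, os.path.splitext(f)[0] + '.py'))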
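One caveat with the rename script: on POSIX systems shutil.move replaces an existing destination file without warning, so if a directory held both foo.pyx and foo.py (or foo.pyx and foo.pxi), one file would be silently overwritten and the count would come out low. Whether that happens in the Sage tree is not checked here. A minimal dry-run sketch, using a hypothetical `find_collisions` helper, that lists such clashes before renaming:

```python
import os


def find_collisions(root='.'):
    """Return the .py paths that renaming .pyx/.pxi files under root would clobber.

    A target collides if a file with that name already exists (foo.py next
    to foo.pyx) or if two Cython files share a stem (foo.pyx and foo.pxi).
    """
    collisions = set()
    for dirpath, dirnames, filenames in os.walk(root):
        existing = set(filenames)
        targets = set()
        for f in filenames:
            if f.endswith(('.pyx', '.pxi')):
                target = os.path.splitext(f)[0] + '.py'
                if target in existing or target in targets:
                    collisions.add(os.path.join(dirpath, target))
                targets.add(target)
    return sorted(collisions)
```

Running print(find_collisions('/tmp/x')) before the rename script and getting an empty list would mean the rename loses nothing.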
Tuesday, December 13, 2011
Using Sage to Support Research Mathematics
When using Sage to support research mathematics, the most important point to make is to strongly encourage people to do the extra work to turn their "scruffy research code" into a patch that can be peer reviewed and included in Sage. They will have to 100% doctest it, and the quality of their code may improve dramatically as a result. Including code in Sage means that the code will continue to work as Sage is updated. Also, the code is peer reviewed and has to have examples and documentation for every function. That's a much higher bar than just "reproducible research".
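To make the doctest requirement concrete: every function carries examples that double as tests. Sage's own docstrings use sage: prompts under an EXAMPLES:: heading and are checked with sage -t; the sketch below instead uses >>> prompts and the standard-library doctest module so that it runs in plain Python. The square function is a made-up example.

```python
def square(n):
    """
    Return the square of ``n``.

    EXAMPLES::

        >>> square(3)
        9
        >>> square(-4)
        16
    """
    return n * n


if __name__ == '__main__':
    # Check every example in this module's docstrings; silent on success.
    import doctest
    doctest.testmod()
```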
Moreover, getting code up to snuff to include in Sage will often also reveal mistakes, and catching them early avoids embarrassment later. Right now I'm fixing some issues, related to a soon-to-be-finished paper, that I found while doing exactly this for trac 11975.
This final step of turning snippets of research code into a peer-reviewed contribution to Sage is: (1) a surprisingly huge amount of very important, useful work; (2) something that is emphasized as an option more for Sage than for Magma, Mathematica, or Pari (say); (3) something whose value people have to be sold on, since at present they usually get no real extra academic credit for it, and journal referees often don't care either way (I do, but I'm probably in the minority there); and (4) something that a *lot* of research mathematicians do not do. As an example of (4), in the last two months I've seen a ton of (separate!) bodies of code, all of it more or less secret research code in various Dropbox repos, and none of it currently squarely aimed at going into Sage. I've also seen a bunch of code with a similar property related to Edixhoven et al.'s algorithm for computing Galois representations (there is now trac 12132, due to my urging).
I have *not* yet done this step for this recently accepted paper. Instead I used "scrappy research code" in psage to do the fast L-series computations. The referee for Math Comp didn't care either way, actually... I hope this doesn't come back to haunt me, though there are many double checks here (e.g., BSD), so I'm not too worried. I will do this get-it-in-Sage step at some point, though.
This will be better for the community in the long run, and better for individual researchers' credibility too. And there is a lot of value in having a stable, refereed snapshot of the code on which a published (= very stable) paper is based.