You can use this site at a number of levels:
- You can look at the stemming algorithm definitions themselves, and use
them as templates for coding your own versions of stemmers in the computer
language of your choice.
- You can use the ANSI C and Java stemmers in your own programs without
having to deal with the Snowball system that generated them. To do that,
download either the C or the Java version of the libstemmer library and
follow the instructions in the README file inside the tarball. The tarballs
also contain simple example programs that let you run the stemmers from
the command line.
- You can get involved in Snowball itself. This is particularly worthwhile
if you want to adjust the stemmers or develop new stemmers. A typical reason
for adjusting the stemmers is that you are working with a different encoding
of accented letters from the ISO Latin I encoding assumed in most of the scripts
here. Then you need to make your own version of the Snowball compiler and
work with the Snowball scripts.
Snowball is a language in which stemming algorithms can be easily
represented. The Snowball compiler translates a Snowball script (a .sbl
file) into either a thread-safe ANSI C program or a Java program.
For ANSI C, each Snowball script produces a program file and a
corresponding header file (with .c and .h extensions). The language has
a full manual, and the various stemming scripts act as example programs.
- You can get deeply interested in stemming. If you do, read the
introductory paper about Snowball. It is a bit heavyweight, but provides
essential background. Also look at the notes on how you can help.
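As an illustration of the first level of use, coding a stemmer yourself from the published algorithm definitions, here is a minimal C sketch of a single step: step 1a of the original Porter algorithm, which strips plural endings with the rules sses -> ss, ies -> i, ss -> ss and s -> (nothing). The function names `ends_with` and `porter_step1a` are hypothetical, chosen for this sketch; this is not code from libstemmer or generated by Snowball, and it assumes lower-case ASCII input.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if word (of length len) ends with suffix. */
static int ends_with(const char *word, size_t len, const char *suffix)
{
    size_t slen = strlen(suffix);
    return len >= slen && strcmp(word + len - slen, suffix) == 0;
}

/* Sketch of step 1a of the original Porter algorithm, applied in place to
 * a lower-case, NUL-terminated ASCII word. The rules are tried longest
 * suffix first, and only the first matching rule is applied:
 *   sses -> ss, ies -> i, ss -> ss, s -> (deleted) */
void porter_step1a(char *word)
{
    size_t len = strlen(word);

    if (ends_with(word, len, "sses")) {
        word[len - 2] = '\0';          /* caresses -> caress */
    } else if (ends_with(word, len, "ies")) {
        word[len - 2] = '\0';          /* ponies -> poni */
    } else if (ends_with(word, len, "ss")) {
        /* caress -> caress: the rule matches but leaves the word alone */
    } else if (ends_with(word, len, "s")) {
        word[len - 1] = '\0';          /* cats -> cat */
    }
}
```

The "ss -> ss" rule looks redundant, but it matters: it stops words such as "caress" from falling through to the final rule and losing their last "s". The later steps of the algorithm also test conditions on the remaining stem before removing a suffix, which step 1a does not need.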