git.postgresql.org Git - postgresql.git/commit

parent: b076eb7
Sync our Snowball stemmer dictionaries with current upstream.
24 Sep 2018 21:29:08 +0000 (17:29 -0400)
24 Sep 2018 21:29:38 +0000 (17:29 -0400)
commit fd582317e10e26083b8c720598bfcdbf89787112

We haven't touched these since text search functionality landed in core
in 2007 :-(. While the upstream project isn't a beehive of activity,
they do make additions and bug fixes from time to time. Update our
copies of these files.

Also update our documentation about how to keep things in sync, since
they're not making distribution tarballs these days. Fortunately,
their source code turns out to be a breeze to build.

Notable changes:

* The non-UTF8 version of the hungarian stemmer now works in LATIN2
not LATIN1.

* New stemmers have appeared for arabic, indonesian, irish, lithuanian,
nepali, and tamil. These all work in UTF8, and the indonesian and
irish ones also work in LATIN1.

(There are some new stemmers that I did not incorporate, mainly because
their names don't match the underlying languages, suggesting that they're
not to be considered mainstream.)
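
The commit does not say why the Hungarian stemmer moved from LATIN1 to LATIN2. One likely factor is that Hungarian uses the double-acute letters ő and ű, which ISO-8859-2 can represent and ISO-8859-1 cannot. The sketch below checks this with Python's standard codecs only; the helper name `encodable` is hypothetical and nothing here reflects PostgreSQL's own encoding machinery.

```python
# Letters specific to Hungarian orthography (double acute accent).
HUNGARIAN_SPECIFIC = "őűŐŰ"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ISO-8859-1 is LATIN1, ISO-8859-2 is LATIN2.
latin1_ok = encodable(HUNGARIAN_SPECIFIC, "iso-8859-1")
latin2_ok = encodable(HUNGARIAN_SPECIFIC, "iso-8859-2")
print("LATIN1:", latin1_ok, "LATIN2:", latin2_ok)
```

A stemmer that works on single-byte input needs an encoding in which all of the language's letters exist, which is consistent with the change described in the first bullet.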

Worth noting: the upstream Nepali dictionary was contributed by
Arthur Zakirov.

initdb forced because the contents of snowball_create.sql have
changed.

Still TODO: see about updating the stopword lists.

Arthur Zakirov, minor mods and doc work by me

Discussion: https://postgr.es/m/20180626122025.GA12647@zakirov.localdomain
Discussion: https://postgr.es/m/20180219140849.GA9050@zakirov.localdomain
88 files changed:
doc/src/sgml/textsearch.sgml
src/backend/snowball/Makefile
src/backend/snowball/README
src/backend/snowball/dict_snowball.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_danish.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_dutch.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_english.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_finnish.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_french.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_german.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_indonesian.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_ISO_8859_1_irish.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_ISO_8859_1_italian.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_norwegian.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_porter.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_portuguese.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_spanish.c
src/backend/snowball/libstemmer/stem_ISO_8859_1_swedish.c
src/backend/snowball/libstemmer/stem_ISO_8859_2_hungarian.c [moved from src/backend/snowball/libstemmer/stem_ISO_8859_1_hungarian.c with 54% similarity]
src/backend/snowball/libstemmer/stem_ISO_8859_2_romanian.c
src/backend/snowball/libstemmer/stem_KOI8_R_russian.c
src/backend/snowball/libstemmer/stem_UTF_8_arabic.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_UTF_8_danish.c
src/backend/snowball/libstemmer/stem_UTF_8_dutch.c
src/backend/snowball/libstemmer/stem_UTF_8_english.c
src/backend/snowball/libstemmer/stem_UTF_8_finnish.c
src/backend/snowball/libstemmer/stem_UTF_8_french.c
src/backend/snowball/libstemmer/stem_UTF_8_german.c
src/backend/snowball/libstemmer/stem_UTF_8_hungarian.c
src/backend/snowball/libstemmer/stem_UTF_8_indonesian.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_UTF_8_irish.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_UTF_8_italian.c
src/backend/snowball/libstemmer/stem_UTF_8_lithuanian.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_UTF_8_nepali.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_UTF_8_norwegian.c
src/backend/snowball/libstemmer/stem_UTF_8_porter.c
src/backend/snowball/libstemmer/stem_UTF_8_portuguese.c
src/backend/snowball/libstemmer/stem_UTF_8_romanian.c
src/backend/snowball/libstemmer/stem_UTF_8_russian.c
src/backend/snowball/libstemmer/stem_UTF_8_spanish.c
src/backend/snowball/libstemmer/stem_UTF_8_swedish.c
src/backend/snowball/libstemmer/stem_UTF_8_tamil.c [new file with mode: 0644]
src/backend/snowball/libstemmer/stem_UTF_8_turkish.c
src/backend/snowball/libstemmer/utilities.c
src/backend/snowball/stopwords/nepali.stop [new file with mode: 0644]
src/bin/initdb/initdb.c
src/include/catalog/catversion.h
src/include/snowball/libstemmer/header.h
src/include/snowball/libstemmer/stem_ISO_8859_1_danish.h
src/include/snowball/libstemmer/stem_ISO_8859_1_dutch.h
src/include/snowball/libstemmer/stem_ISO_8859_1_english.h
src/include/snowball/libstemmer/stem_ISO_8859_1_finnish.h
src/include/snowball/libstemmer/stem_ISO_8859_1_french.h
src/include/snowball/libstemmer/stem_ISO_8859_1_german.h
src/include/snowball/libstemmer/stem_ISO_8859_1_hungarian.h [deleted file]
src/include/snowball/libstemmer/stem_ISO_8859_1_indonesian.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_ISO_8859_1_irish.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_ISO_8859_1_italian.h
src/include/snowball/libstemmer/stem_ISO_8859_1_norwegian.h
src/include/snowball/libstemmer/stem_ISO_8859_1_porter.h
src/include/snowball/libstemmer/stem_ISO_8859_1_portuguese.h
src/include/snowball/libstemmer/stem_ISO_8859_1_spanish.h
src/include/snowball/libstemmer/stem_ISO_8859_1_swedish.h
src/include/snowball/libstemmer/stem_ISO_8859_2_hungarian.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_ISO_8859_2_romanian.h
src/include/snowball/libstemmer/stem_KOI8_R_russian.h
src/include/snowball/libstemmer/stem_UTF_8_arabic.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_UTF_8_danish.h
src/include/snowball/libstemmer/stem_UTF_8_dutch.h
src/include/snowball/libstemmer/stem_UTF_8_english.h
src/include/snowball/libstemmer/stem_UTF_8_finnish.h
src/include/snowball/libstemmer/stem_UTF_8_french.h
src/include/snowball/libstemmer/stem_UTF_8_german.h
src/include/snowball/libstemmer/stem_UTF_8_hungarian.h
src/include/snowball/libstemmer/stem_UTF_8_indonesian.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_UTF_8_irish.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_UTF_8_italian.h
src/include/snowball/libstemmer/stem_UTF_8_lithuanian.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_UTF_8_nepali.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_UTF_8_norwegian.h
src/include/snowball/libstemmer/stem_UTF_8_porter.h
src/include/snowball/libstemmer/stem_UTF_8_portuguese.h
src/include/snowball/libstemmer/stem_UTF_8_romanian.h
src/include/snowball/libstemmer/stem_UTF_8_russian.h
src/include/snowball/libstemmer/stem_UTF_8_spanish.h
src/include/snowball/libstemmer/stem_UTF_8_swedish.h
src/include/snowball/libstemmer/stem_UTF_8_tamil.h [new file with mode: 0644]
src/include/snowball/libstemmer/stem_UTF_8_turkish.h
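
The stemmer files in the list above appear to follow a `stem_<ENCODING>_<language>` naming pattern, so the encodings available for each language can be read off the file names. The following is a minimal sketch of that idea, not part of the commit: the regular expression, the function name `parse_stemmer_path`, and the short `sample_paths` list (a handful of paths copied from the list) are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed pattern: stem_<ENCODING>_<language>.c or .h, where the encoding is
# one of the three families that occur in the file list.
STEM_FILE = re.compile(r"stem_(ISO_8859_\d+|KOI8_R|UTF_8)_([a-z]+)\.[ch]$")


def parse_stemmer_path(path):
    """Return (encoding, language) for a stemmer file, or None otherwise."""
    match = STEM_FILE.search(path)
    if match is None:
        return None
    return match.group(1), match.group(2)


# A few paths taken from the changed-file list, for illustration only.
sample_paths = [
    "src/backend/snowball/libstemmer/stem_ISO_8859_2_hungarian.c",
    "src/backend/snowball/libstemmer/stem_UTF_8_hungarian.c",
    "src/backend/snowball/libstemmer/stem_ISO_8859_1_irish.c",
    "src/backend/snowball/libstemmer/stem_UTF_8_irish.c",
    "src/backend/snowball/libstemmer/stem_UTF_8_tamil.c",
    "src/backend/snowball/libstemmer/stem_KOI8_R_russian.c",
    "src/backend/snowball/libstemmer/utilities.c",  # not a stemmer file
]

# Group the encodings seen for each language.
encodings_by_language = defaultdict(set)
for path in sample_paths:
    parsed = parse_stemmer_path(path)
    if parsed is not None:
        encoding, language = parsed
        encodings_by_language[language].add(encoding)

for language in sorted(encodings_by_language):
    print(language, sorted(encodings_by_language[language]))
```

Run over the full list, the same grouping would show, for example, that irish has both an ISO_8859_1 and a UTF_8 variant while tamil has only UTF_8, matching the commit message.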
This is the main PostgreSQL git repository.