Summary
Version v0.1.2 of my R package kgrams was just accepted by CRAN. This package provides tools for training and evaluating k-gram language models in R, supporting several probability smoothing techniques, perplexity computations, random text generation and more.
Short demo
library(kgrams)
# Get k-gram frequency counts from Shakespeare's "Much Ado About Nothing"
freqs <- kgram_freqs(kgrams::much_ado, N = 4)
# Build modified Kneser-Ney 4-gram model, with discount parameters D1, D2, D3.
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)
# Sample sentences from the language model at different temperatures
set.seed(840)
sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)
[1] "i have studied eight or nine truly by your office [...] (truncated output)"
[2] "ere you go : <EOS>"
[3] "don pedro welcome signior : <EOS>"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)
[1] "i will not be sworn but love may transform me [...] (truncated output)"
[2] "i will not fail . <EOS>"
[3] "i will go to benedick and counsel him to fight [...] (truncated output)"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)
[1] "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"
[2] "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
[3] "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"
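To give some intuition for why low temperatures produce repetitive sentences and high temperatures produce near-random ones, here is a minimal base R sketch of temperature reshaping on a toy next-word distribution. It assumes the common convention that the reshaped probability is proportional to P(w)^(1/t); the probabilities in p and the helper apply_temperature() are hypothetical, made up for illustration, and are not part of the kgrams API.

```r
# Toy next-word distribution (hypothetical values, not taken from the model).
p <- c(the = 0.6, a = 0.3, zebra = 0.1)

# Assumed reshaping rule: P_t(w) is proportional to P(w)^(1/t).
apply_temperature <- function(p, t) {
  stopifnot(t > 0)
  logits <- log(p) / t
  w <- exp(logits - max(logits))  # subtract the max for numerical stability
  w / sum(w)                      # renormalize so probabilities sum to 1
}

round(apply_temperature(p, t = 1), 3)    # distribution is left unchanged
round(apply_temperature(p, t = 0.1), 3)  # mass concentrates on the most likely word
round(apply_temperature(p, t = 10), 3)   # distribution moves towards uniform
```

Under this assumption, t = 1 samples from the model as is, t close to zero approaches always picking the most likely word, and large t approaches uniform sampling over the vocabulary, which matches the qualitative behaviour of the three samples above.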
NEWS
Overall Software Improvements
- The package’s test suite has been greatly extended.
- Improved error/warning conditions for wrong arguments.
- Re-enabled compiler diagnostics as per CRAN policy (#19)
API Changes
- verbose arguments now default to FALSE.
- probability(), perplexity() and sample_sentences() are restricted to accept only language_model class objects as their model argument.
New features
- as_dictionary(NULL) now returns an empty dictionary.
Bug Fixes
- Fixed bug causing .preprocess and .tknz_sent arguments to be ignored in process_sentences().
- Fixed previously wrong defaults for max_lines and batch_size arguments in kgram_freqs.connection().
- Added print method for class dictionary.
- Fixed bug causing invalid results in dictionary() with batch processing and non-trivial size constraints on vocabulary size.
Other
- Maintainer’s email updated
Citation
BibTeX citation:
@online{gherardi2021,
author = {Gherardi, Valerio},
title = {Kgrams V0.1.2 on {CRAN}},
date = {2021-11-13},
url = {https://vgherard.github.io/posts/2021-11-13-kgrams-v012-released/},
langid = {en}
}
For attribution, please cite this work as:
Gherardi, Valerio. 2021. "Kgrams V0.1.2 on CRAN." November
13, 2021. https://vgherard.github.io/posts/2021-11-13-kgrams-v012-released/.