sbo


sbo provides utilities for building and evaluating text predictors based on Stupid Back-off N-gram models in R. It includes functions such as:

kgram_freqs(): Extract k-gram frequency tables from a text corpus.
sbo_predictor(): Train a next-word predictor via Stupid Back-off.
eval_sbo_predictor(): Test text predictions against an independent corpus.
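
To make the notion of a k-gram frequency table concrete, here is a minimal base R sketch that counts k-grams for k = 1..N in a toy corpus. The helper `count_kgrams()` and the three example sentences are assumptions made for illustration; this is not how kgram_freqs() is implemented, and it skips dictionaries, preprocessing and End-Of-Sentence handling.

```r
# Illustrative only: `count_kgrams` is a hypothetical helper, not part of sbo.
# It returns a list whose k-th element is a table of k-gram counts.
count_kgrams <- function(sentences, N) {
  # Lowercase and split each sentence on whitespace.
  tokens <- strsplit(tolower(sentences), "\\s+")
  lapply(seq_len(N), function(k) {
    grams <- unlist(lapply(tokens, function(w) {
      # Sentences shorter than k words contribute no k-grams.
      if (length(w) < k) return(character(0))
      starts <- seq_len(length(w) - k + 1)
      vapply(starts,
             function(i) paste(w[i:(i + k - 1)], collapse = " "),
             character(1))
    }))
    table(grams)
  })
}

freqs <- count_kgrams(c("i love you", "i love it", "you love me"), N = 2)
as.integer(freqs[[1]]["love"])   # unigram count of "love"
as.integer(freqs[[2]]["i love"]) # bigram count of "i love"
```

Here `freqs[[1]]` holds the unigram counts and `freqs[[2]]` the bigram counts, so "love" is counted three times and "i love" twice.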

Installation

Released version

You can install the latest release of sbo from CRAN:

install.packages("sbo")

Development version

You can install the development version of sbo from GitHub:

# install.packages("devtools")
devtools::install_github("vgherard/sbo")

Example

This example shows how to build a text predictor with sbo:

library(sbo)
p <- sbo_predictor(sbo::twitter_train, # 50k tweets, example dataset
 N = 3, # Train a 3-gram model
 dict = sbo::twitter_dict, # Top 1k words appearing in corpus
 .preprocess = sbo::preprocess, # Preprocessing transformation
 EOS = ".?!:;" # End-Of-Sentence characters
 )

The object p can now be used to generate predictive text as follows:

predict(p, "i love") # a character vector
#> [1] "you" "it" "my"
predict(p, "you love") # another character vector
#> [1] "<EOS>" "me" "the"
predict(p, 
 c("i love", "you love", "she loves", "we love", "you love", "they love")
 ) # a character matrix
#>      [,1]    [,2]  [,3] 
#> [1,] "you"   "it"  "my" 
#> [2,] "<EOS>" "me"  "the"
#> [3,] "you"   "my"  "me" 
#> [4,] "you"   "our" "it" 
#> [5,] "<EOS>" "me"  "the"
#> [6,] "to"    "you" "and"
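
For readers unfamiliar with Stupid Back-off, the scoring rule behind rankings like the ones above can be sketched in a few lines of base R. The sketch assumes the standard formulation: use the relative frequency of the full k-gram when it was observed, otherwise drop the oldest context word and multiply the score by a back-off penalty (0.4 is the conventional value). The toy counts and the helpers `get_count()` and `sb_score()` are hypothetical and do not reflect sbo's internal implementation.

```r
# Toy k-gram counts (hypothetical data) stored as a named numeric vector.
counts <- c("i" = 2, "love" = 3, "you" = 2, "it" = 1, "me" = 1,
            "i love" = 2, "love you" = 1, "love it" = 1,
            "you love" = 1, "love me" = 1,
            "i love you" = 1, "i love it" = 1, "you love me" = 1)
n_tokens <- 9  # total number of words in the toy corpus

# Count of a k-gram, or 0 if it was never observed.
get_count <- function(gram) {
  if (gram %in% names(counts)) counts[[gram]] else 0
}

# Stupid Back-off score of `word` given a character vector `context`.
sb_score <- function(word, context, lambda = 0.4) {
  # Base case: no context left, use the unigram relative frequency.
  if (length(context) == 0) return(get_count(word) / n_tokens)
  ctx <- paste(context, collapse = " ")
  full <- get_count(paste(ctx, word))
  # Observed k-gram: relative frequency given the context.
  if (full > 0) return(full / get_count(ctx))
  # Unobserved k-gram: shorten the context and apply the penalty.
  lambda * sb_score(word, context[-1], lambda)
}

sb_score("you", c("i", "love")) # "i love you" was observed: 1 / 2
sb_score("me", c("i", "love"))  # backs off to "love me": 0.4 * 1 / 3
```

A predictor built on this idea would score every word in the dictionary for a given context and return the highest-scoring ones. The scores are not normalized probabilities, which is what makes the back-off "stupid" but cheap to compute.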

Related packages

For more general purpose utilities to work with n-gram models, you can also check out my package {kgrams}.

Help

For help, see the sbo website.
