Rust port of BudouX with optional HTML processing and CLI

neodyland/budouy

budouy

Rust port of BudouX with optional HTML processing, WebAssembly support, and a small CLI.

Try the live demo

Features

  • std: default feature for std-enabled builds.
  • alloc: no_std-compatible build using alloc and hashbrown.
  • vendored-models: bundles default Japanese, Simplified Chinese, Traditional Chinese, and Thai models.
  • html: enables HTML processing utilities based on kuchikikiki (requires std).
  • cli: enables the budouy CLI (requires std, implies vendored-models).
  • wasm: enables WebAssembly bindings via wasm-bindgen (implies alloc and vendored-models).

Note: std and alloc are mutually exclusive.

Usage

Library

Custom model:

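use std::collections::HashMap;
use budouy::{Model, Parser};
use budouy::model::FeatureKey;
// A one-feature model: in BudouX, UW4 is looked up by the character right after a candidate break.
let mut model: Model = HashMap::new();
model.insert(FeatureKey::UW4, HashMap::from([("a".to_string(), 10_000)]));
let parser = Parser::new(model);
// Only the position before the second "a" scores above zero, so the text splits there.
let chunks = parser.parse("abcdeabcd");
assert_eq!(chunks, vec!["abcde", "abcd"]);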
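
To see why this model splits the string where it does, here is a minimal, self-contained sketch of BudouX-style scoring restricted to a single feature. It is not budouy's implementation: `split_with_uw4` and its weight map are hypothetical names. It assumes the upstream BudouX rule that each candidate break starts from a base score of minus half the sum of all weights and is accepted when the total is positive.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of budouy): splits `text` using only a
/// UW4-style feature, i.e. a weight looked up by the character that comes
/// right after each candidate break position.
fn split_with_uw4(text: &str, uw4: &HashMap<char, i64>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.is_empty() {
        return Vec::new();
    }

    // Assumed BudouX rule: the base score is minus half the sum of all weights.
    let total: i64 = uw4.values().sum();
    let base = -total / 2;

    // The first character always opens the first chunk.
    let mut chunks = vec![chars[0].to_string()];
    for &c in &chars[1..] {
        // Score of breaking right before `c`.
        let score = base + uw4.get(&c).copied().unwrap_or(0);
        if score > 0 {
            chunks.push(String::new());
        }
        chunks.last_mut().expect("chunks is never empty").push(c);
    }
    chunks
}
```

With the single weight `'a' => 10_000`, the base score is -5000, so only a break before an `a` reaches a positive total. The leading `a` is never a candidate because there is no position before it.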

Default model (requires vendored-models):

use budouy::model::load_default_japanese_parser;
let parser = load_default_japanese_parser();
let chunks = parser.parse("今日は良い天気です");
println!("{:?}", chunks);

HTML processing (requires html + vendored-models):

use budouy::HTMLProcessingParser;
use budouy::model::load_default_japanese_parser;
let parser = load_default_japanese_parser();
let html_parser = HTMLProcessingParser::new(parser, None);
let input = "今日は<strong>良い</strong>天気です";
let output = html_parser.translate_html_string(input);
println!("{}", output);
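
The idea behind this kind of processing can be sketched on plain text: once the phrase boundaries are known, placing a separator such as a zero-width space (U+200B) at each boundary tells the renderer where a line break is allowed. The helper below is hypothetical, handles no markup at all, and is not how `HTMLProcessingParser` is implemented. The choice of separator is an assumption for illustration, and budouy's actual output may differ.

```rust
/// Zero-width space, used here as an assumed line-break hint.
const ZWSP: &str = "\u{200B}";

/// Hypothetical helper (not part of budouy): joins already-computed chunks
/// with a separator that marks the positions where a break is permitted.
fn join_with_break_hints(chunks: &[&str], separator: &str) -> String {
    chunks.join(separator)
}
```

Passing the chunks from `parser.parse(...)` together with `ZWSP` yields a string that looks unchanged on screen but can wrap only at chunk boundaries when the surrounding element disables ordinary CJK character breaking.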

WebAssembly

Build for web (requires wasm-pack):

wasm-pack build --target web --no-default-features --features wasm

Use from JavaScript:

import init, { BudouY } from './pkg/budouy.js';
await init();
const parser = BudouY.japanese();
const chunks = parser.parse("今日は良い天気です");
console.log(chunks); // ["今日は", "良い", "天気です"]
// Other languages
const zhHans = BudouY.simplifiedChinese();
const zhHant = BudouY.traditionalChinese();
const thai = BudouY.thai();

CLI

Build and run the CLI (requires cli):

cargo run --features cli -- parse --lang ja "今日は良い天気です"

Use a custom model JSON:

cargo run --features cli -- parse --model ./model.json "今日は良い天気です"

Read from stdin:

echo "今日は良い天気です" | cargo run --features cli -- parse --lang ja
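
The argument-or-stdin behaviour shown above is a common CLI pattern. The following is a minimal sketch of it using only the standard library. `resolve_input` is a hypothetical name, and nothing here is taken from budouy's CLI source; in particular, trimming the trailing newline is an assumption.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of budouy): returns the positional text
/// argument when one was given, otherwise reads all of `input` and strips
/// trailing line terminators such as the newline that `echo` appends.
fn resolve_input<R: Read>(arg: Option<&str>, mut input: R) -> io::Result<String> {
    match arg {
        Some(text) => Ok(text.to_string()),
        None => {
            let mut buf = String::new();
            input.read_to_string(&mut buf)?;
            let trimmed = buf.trim_end_matches(|c| c == '\n' || c == '\r');
            Ok(trimmed.to_string())
        }
    }
}
```

In a real binary the reader would be `io::stdin()`. Accepting any `Read` keeps the function testable with an in-memory byte slice.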

no_std

This crate supports no_std with alloc. Disable default features and enable alloc:

budouy = { version = "0.1", default-features = false, features = ["alloc"] }

std and alloc are mutually exclusive. The html and cli features require std.

Models

Vendored models in src/models/*.json are derived from the original BudouX project (Google) and are licensed under Apache-2.0. See LICENSE for details. This project is not affiliated with Google.

License

Apache-2.0. See LICENSE.
