Rust port of BudouX with optional HTML processing, WebAssembly support, and a small CLI.
- std: default feature for std-enabled builds.
- alloc: no_std-compatible build using alloc and hashbrown.
- vendored-models: bundles default Japanese, Simplified Chinese, Traditional Chinese, and Thai models.
- html: enables HTML processing utilities based on kuchikikiki (requires std).
- cli: enables the budouy CLI (requires std, implies vendored-models).
- wasm: enables WebAssembly bindings via wasm-bindgen (implies alloc and vendored-models).
Note: std and alloc are mutually exclusive.
Custom model:
use std::collections::HashMap;
use budouy::{Model, Parser};
use budouy::model::FeatureKey;

let mut model: Model = HashMap::new();
model.insert(FeatureKey::UW4, HashMap::from([("a".to_string(), 10_000)]));
let parser = Parser::new(model);
let chunks = parser.parse("abcdeabcd");
assert_eq!(chunks, vec!["abcde", "abcd"]);
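Why does a single UW4 weight on "a" split "abcdeabcd" before the second "a"? The sketch below is a simplified illustration, assuming budouy follows the upstream BudouX scoring rule: a bias of minus half the sum of all weights, plus the weights of the features that match at each candidate boundary, with a break wherever the total is positive. The function `split_on_uw4` is hypothetical and is not part of the budouy API. It models only the UW4 feature (the character immediately after the boundary) and ignores every other feature a real model would contain.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of budouy: BudouX-style segmentation that
/// consults only the UW4 feature (the character right after the boundary).
fn split_on_uw4(text: &str, uw4: &HashMap<char, i32>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.is_empty() {
        return Vec::new();
    }

    // Assumption: as in upstream BudouX, the bias is minus half the sum
    // of all weights in the model.
    let base: f64 = -0.5 * uw4.values().map(|&w| f64::from(w)).sum::<f64>();

    // The first character always starts the first chunk.
    let mut chunks = vec![chars[0].to_string()];
    for &c in &chars[1..] {
        let score = base + f64::from(uw4.get(&c).copied().unwrap_or(0));
        if score > 0.0 {
            // Positive score: start a new chunk at this character.
            chunks.push(c.to_string());
        } else {
            // Otherwise extend the current chunk.
            chunks.last_mut().expect("chunks is never empty").push(c);
        }
    }
    chunks
}
```

With the weight from the example (10,000 on "a"), the bias is -5,000, so the score is +5,000 in front of every "a" after the first character and -5,000 everywhere else. That yields the same two chunks as the assertion in the example.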
Default model (requires vendored-models):
use budouy::model::load_default_japanese_parser;

let parser = load_default_japanese_parser();
let chunks = parser.parse("今日は良い天気です");
println!("{:?}", chunks);
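The parsed chunks are typically rejoined with a separator that lets a renderer wrap lines only at phrase boundaries. The sketch below shows one way to do that with a zero-width space (U+200B). The helper `join_with_zwsp` is hypothetical and not part of budouy. It accepts any slice of string-like values, so it should work with whatever chunk type `parse` returns, as long as that type implements `AsRef<str>` (an assumption).

```rust
/// Hypothetical helper, not part of budouy: joins chunks with U+200B
/// (zero-width space) so text can wrap only at chunk boundaries.
fn join_with_zwsp<S: AsRef<str>>(chunks: &[S]) -> String {
    let mut out = String::new();
    for (i, chunk) in chunks.iter().enumerate() {
        if i > 0 {
            // Insert the separator between chunks, never before the first.
            out.push('\u{200B}');
        }
        out.push_str(chunk.as_ref());
    }
    out
}
```

For HTML output, the html feature described in this README is likely the better fit, since it handles markup inside the input.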
HTML processing (requires html + vendored-models):
use budouy::HTMLProcessingParser;
use budouy::model::load_default_japanese_parser;

let parser = load_default_japanese_parser();
let html_parser = HTMLProcessingParser::new(parser, None);
let input = "今日は<strong>良い</strong>天気です";
let output = html_parser.translate_html_string(input);
println!("{}", output);
Build for web (requires wasm-pack):
wasm-pack build --target web --no-default-features --features wasm
Use from JavaScript:
import init, { BudouY } from './pkg/budouy.js';

await init();
const parser = BudouY.japanese();
const chunks = parser.parse("今日は良い天気です");
console.log(chunks); // ["今日は", "良い", "天気です"]

// Other languages
const zhHans = BudouY.simplifiedChinese();
const zhHant = BudouY.traditionalChinese();
const thai = BudouY.thai();
Build and run the CLI (requires cli):
cargo run --features cli -- parse --lang ja "今日は良い天気です"
Use a custom model JSON:
cargo run --features cli -- parse --model ./model.json "今日は良い天気です"
Read from stdin:
echo "今日は良い天気です" | cargo run --features cli -- parse --lang ja
This crate supports no_std with alloc. Disable default features and enable alloc:
budouy = { version = "0.1", default-features = false, features = ["alloc"] }
std and alloc are mutually exclusive. The html and cli features require std.
Vendored models in src/models/*.json are derived from the original BudouX
project (Google) and are licensed under Apache-2.0. See LICENSE for details.
This project is not affiliated with Google.
Apache-2.0. See LICENSE.