NameDivider is a tool that divides Japanese full names into family and given names.
Try Live Demo • Documentation (Japanese) • Docker API • Rust Version
Japanese full names like "菅義偉" are typically stored as single strings with no clear boundary between family and given names. NameDivider infers that boundary automatically, with a reported accuracy of 99.91% on real-world Japanese names.
Unlike cloud-based AI solutions, NameDivider processes all data locally: no external API calls, no data transmission, and full privacy control.
```python
# Before
person_name = "菅義偉"  # How do you know where to divide?

# After
from namedivider import BasicNameDivider

divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉
```
- 99.91% accuracy - Tested on real-world Japanese names
- Multiple algorithms - Choose between speed (Basic) or accuracy (GBDT)
- Privacy-first - Local-only processing, ideal for sensitive data
- Production ready - CLI, Python library, and Docker support
- Interactive demo - Try it live with Streamlit
- Confidence scoring - Know when to trust the results
- Customizable rules - Add domain-specific patterns
```bash
pip install namedivider-python
```
```python
from namedivider import BasicNameDivider, GBDTNameDivider

# Fast, with good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result)
# 菅 義偉

# Slower, with the best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
#     'algorithm': 'gbdt',
#     'family': '菅',
#     'given': '義偉',
#     'score': 0.7300634880343344,
#     'separator': ' '
# }
```
The `nmdiv` command-line tool is suited to batch processing and automation:
```bash
# Single name
$ nmdiv name 菅義偉
菅 義偉

# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]

# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%
```
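The `score` field in the `to_dict()` output above can drive a simple review workflow. The following is a minimal sketch, not part of NameDivider: `split_by_confidence` and `REVIEW_THRESHOLD` are hypothetical names, and the cut-off of 0.5 is an arbitrary assumption that would need calibrating per algorithm, since scores from different algorithms may not be comparable. It works on plain dicts shaped like `to_dict()` output, so it runs without the library installed.

```python
from typing import Iterable, List, Tuple

# Hypothetical cut-off for illustration; NameDivider does not prescribe one.
REVIEW_THRESHOLD = 0.5


def split_by_confidence(
    results: Iterable[dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[dict], List[dict]]:
    """Separate results into (accepted, needs_review) using their 'score'."""
    accepted, needs_review = [], []
    for item in results:
        if item["score"] >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Dicts shaped like the to_dict() output and the Docker API response
# shown elsewhere in this README.
sample = [
    {"algorithm": "gbdt", "family": "菅", "given": "義偉",
     "score": 0.7300634880343344, "separator": " "},
    {"algorithm": "kanji_feature", "family": "竈門", "given": "炭治郎",
     "score": 0.3004587452426102, "separator": " "},
]

accepted, needs_review = split_by_confidence(sample)
print(len(accepted), "accepted,", len(needs_review), "flagged for review")
```

With the real library, the input list could be built as `[divider.divide_name(n).to_dict() for n in names]`.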
For environments where Python cannot be used, we provide a containerized REST API:
```bash
# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api

# Send batch requests
curl -X POST localhost:8000/divide \
  -H "Content-Type: application/json" \
  -d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'
```
Response:
```json
{
  "divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": " ", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
    {"family": "竈門", "given": "禰豆子", "separator": " ", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
  ]
}
```

Try NameDivider instantly in your browser: Live Demo →
Run locally:
```bash
cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py
```

| Algorithm | Accuracy | Speed (names/sec) | Use Case |
|---|---|---|---|
| BasicNameDivider / backend=python | 99.3% | 4152.8 | Stable & compatible |
| BasicNameDivider / backend=rust | 99.3% | 18597.7 | Max performance (if available) |
| GBDTNameDivider / backend=python | 99.9% | 1143.3 | Best accuracy, guaranteed |
| GBDTNameDivider / backend=rust | 99.9% | 6277.4 | Fast + accurate (if available) |
Run your own benchmarks:
```bash
bash scripts/benchmark_sample.sh
```
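The throughput figures in the table above can be turned into rough time estimates for a given dataset size. The sketch below only does that arithmetic; the helper names are illustrative, and actual throughput will depend on hardware and input.

```python
# Throughput in names/sec, copied from the benchmark table above.
THROUGHPUT = {
    ("basic", "python"): 4152.8,
    ("basic", "rust"): 18597.7,
    ("gbdt", "python"): 1143.3,
    ("gbdt", "rust"): 6277.4,
}


def estimated_seconds(n_names: int, algorithm: str, backend: str) -> float:
    """Estimate wall-clock seconds to divide n_names at the listed throughput."""
    return n_names / THROUGHPUT[(algorithm, backend)]


def rust_speedup(algorithm: str) -> float:
    """Ratio of Rust-backend to Python-backend throughput for one algorithm."""
    return THROUGHPUT[(algorithm, "rust")] / THROUGHPUT[(algorithm, "python")]


for algorithm, backend in THROUGHPUT:
    secs = estimated_seconds(1_000_000, algorithm, backend)
    print(f"{algorithm:5s} {backend:6s} {secs:8.1f} s per million names")

print(f"basic: {rust_speedup('basic'):.1f}x, gbdt: {rust_speedup('gbdt'):.1f}x")
```

On these numbers the Rust backend comes out at roughly 4.5x (Basic) and 5.5x (GBDT) the Python backend's throughput.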
Handle domain-specific names with custom patterns:
```python
from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule

config = BasicNameDividerConfig(
    custom_rules=[
        SpecificFamilyNameRule(family_names=["竜胆"]),  # Rare family names
    ]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')
```
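To make the idea behind a family-name rule concrete, here is a conceptual sketch of prefix matching against a known list. It is an illustration only and is not NameDivider's implementation of `SpecificFamilyNameRule`; the function `split_on_known_family` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def split_on_known_family(
    name: str, family_names: Sequence[str]
) -> Optional[Tuple[str, str]]:
    """Return (family, given) if name starts with a listed family name."""
    # Try longer family names first so the most specific prefix wins.
    for family in sorted(family_names, key=len, reverse=True):
        # Require at least one character left over for the given name.
        if name.startswith(family) and len(name) > len(family):
            return family, name[len(family):]
    return None  # No rule applies; a statistical divider would take over.


print(split_on_known_family("竜胆尊", ["竜胆"]))  # ('竜胆', '尊')
print(split_on_known_family("菅義偉", ["竜胆"]))  # None
```

A match from an explicit list is deterministic, which fits the `score=1.0` reported in the output above.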
For high-volume processing, NameDivider offers several optimization options:
```python
from namedivider import BasicNameDivider, BasicNameDividerConfig

# Load your names
with open("names.txt", "r", encoding="utf-8") as f:
    names = [line.strip() for line in f]

# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]

# Option 2: (beta) Use Rust backend (up to 4x faster)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
```
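Independently of the library options above, input handling can also reduce work on large files. The sketch below is an assumption-laden illustration: `read_names` and `divide_unique` are hypothetical helpers that skip blank lines and call the divider once per distinct name. Whether this helps on top of the library's own caching has not been measured here. A stand-in function replaces the real divider so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield stripped, non-empty names from an iterable of lines."""
    for line in lines:
        name = line.strip()
        if name:
            yield name


def divide_unique(
    names: Iterable[str], divide: Callable[[str], T]
) -> Iterator[Tuple[str, T]]:
    """Call divide once per distinct name and reuse the result for repeats."""
    seen: Dict[str, T] = {}
    for name in names:
        if name not in seen:
            seen[name] = divide(name)
        yield name, seen[name]


# Stand-in for divider.divide_name, used only to count calls in this demo.
calls = []


def fake_divide(name: str) -> str:
    calls.append(name)
    return name[:1] + " " + name[1:]


lines = ["菅義偉\n", "\n", "菅義偉\n", "竜胆尊\n"]
results = list(divide_unique(read_names(lines), fake_divide))
print(len(results), "results from", len(calls), "divider calls")
```

With the real library, `divider.divide_name` would be passed as `divide`, and an open file object can be passed directly to `read_names`.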
- Customer Data Processing - Clean and standardize name databases
- Form Validation - Real-time name splitting in web applications
- Analytics & Reports - Generate family name statistics
- Data Migration - Convert legacy systems with combined name fields
- Government & Municipal - Process citizen registration data
- Security-sensitive Environments - Process names without sending data to external APIs
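For the analytics use case in the list above, family name statistics come down to a frequency count over divided results. This is a minimal sketch on dicts shaped like `to_dict()` output; `family_name_counts` is a hypothetical helper, not a NameDivider function.

```python
from collections import Counter
from typing import Iterable


def family_name_counts(divided: Iterable[dict]) -> Counter:
    """Count how often each family name appears in divided results."""
    return Counter(item["family"] for item in divided)


# Records reuse the example outputs shown earlier in this README.
divided = [
    {"family": "竈門", "given": "炭治郎"},
    {"family": "竈門", "given": "禰豆子"},
    {"family": "菅", "given": "義偉"},
]

counts = family_name_counts(divided)
print(counts.most_common(1))  # [('竈門', 2)]
```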
- Use REST API with minimal client samples - Integration examples (7 languages available in namedivider-rs)
- Performance Optimization - Handle large datasets efficiently
- Custom Rules Examples - Domain-specific configurations
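As a rough idea of what a minimal REST client involves, here is a Python sketch using only the standard library. It assumes the endpoint, request body, and response shape from the curl example earlier in this README; the helper names are illustrative and it is not one of the official client samples.

```python
import json
import urllib.request
from typing import Iterable, List, Tuple

# Endpoint taken from the curl example; adjust host and port as needed.
API_URL = "http://localhost:8000/divide"


def build_request(names: Iterable[str], url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying {"names": [...]} as JSON."""
    body = json.dumps({"names": list(names)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> List[Tuple[str, str]]:
    """Extract (family, given) pairs from a response body."""
    payload = json.loads(raw.decode("utf-8"))
    return [(d["family"], d["given"]) for d in payload["divided_names"]]


def divide_names(names: Iterable[str], url: str = API_URL,
                 timeout: float = 10.0) -> List[Tuple[str, str]]:
    """Send names to a running API container and return the divided pairs."""
    with urllib.request.urlopen(build_request(names, url), timeout=timeout) as resp:
        return parse_response(resp.read())
```

With the container from the Docker example running, `divide_names(["竈門炭治郎", "竈門禰豆子"])` would send the same batch as the curl command.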
MIT License
cc-by-sa-4.0
English
(1) Purpose of use
family_name_repository.pickle is available for commercial/non-commercial use if you use this software to divide name, and to develop algorithms for dividing name.
Any other use of family_name_repository.pickle is prohibited.
(2) Liability
The author or copyright holder assumes no responsibility for the software.
Japanese / ๆฅๆฌ่ช
(1) ๅฉ็จ็ฎ็
ใใฎใฝใใใฆใงใขใ็จใใฆๅงๅๅๅฒใใใใณๅงๅๅๅฒใขใซใดใชใบใ ใฎ้็บใใใๅ ดๅใfamily_name_repository.pickleใฏๅ็จ/้ๅ็จๅใใๅฉ็จๅฏ่ฝใงใใ
ใใไปฅๅคใฎ็ฎ็ใงใฎfamily_name_repository.pickleใฎๅฉ็จใ็ฆใใพใใ
(2) ่ฒฌไปป
ไฝ่ ใพใใฏ่ไฝๆจฉ่ ใฏใfamily_name_repository.pickleใซ้ขใใฆไธๅใฎ่ฒฌไปปใ่ฒ ใใพใใใ
The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net(ๅๅญ็ฑๆฅnet).
- namedivider-rs - High-performance Rust implementation
- BERT Katakana Divider - Deep learning approach for katakana names
Trusted by developers worldwide