rskmoi/namedivider-python

namedivider-python🦒


NameDivider is a tool that divides Japanese full names into family and given names.

🚀 Try Live Demo • 📖 Documentation (日本語) • 🐳 Docker API • ⚡ Rust Version


💡 Why NameDivider?

Japanese full names like "菅義偉" are typically stored as single strings with no clear boundary between family and given names. NameDivider finds that boundary with up to 99.9% accuracy.

Unlike cloud-based AI solutions, NameDivider processes all data locally: no external API calls, no data transmission, and full privacy control.

# Before
person_name = "菅義偉"  # How do you know where to divide?
# After
from namedivider import BasicNameDivider
divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉

✨ Key Features

  • 🎯 99.91% accuracy - Tested on real-world Japanese names
  • ⚡ Multiple algorithms - Choose speed (Basic) or accuracy (GBDT)
  • 🔐 Privacy-first - Local-only processing, ideal for sensitive data
  • 🔧 Production ready - CLI, Python library, and Docker support
  • 🎨 Interactive demo - Try it live with Streamlit
  • 📊 Confidence scoring - Know when to trust the results
  • 🛠️ Customizable rules - Add domain-specific patterns

🚀 Quick Start

Installation

pip install namedivider-python

Basic Usage

from namedivider import BasicNameDivider, GBDTNameDivider
# Fast, with good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result)  # 菅 義偉
# Slower, with the best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
#     'algorithm': 'gbdt',
#     'family': '菅',
#     'given': '義偉',
#     'score': 0.7300634880343344,
#     'separator': ' '
# }
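
The score field can drive a simple review queue. The sketch below is one possible way to do that, not part of the library: `DividedNameLike` is a hypothetical stand-in with the same fields as the `to_dict()` output, the 0.5 threshold is an arbitrary illustration, and it assumes that a higher score means a more confident split. Scores from different algorithms may not be directly comparable, so a threshold would likely need tuning per algorithm.

```python
from dataclasses import dataclass


@dataclass
class DividedNameLike:
    """Hypothetical stand-in carrying the same fields as the to_dict() output."""
    family: str
    given: str
    separator: str
    score: float
    algorithm: str


def needs_review(result: DividedNameLike, threshold: float = 0.5) -> bool:
    """Flag a split for manual review when its score is below the threshold.

    The 0.5 default is an arbitrary illustration, not a recommended value.
    """
    return result.score < threshold


results = [
    DividedNameLike("菅", "義偉", " ", 0.7300634880343344, "gbdt"),
    DividedNameLike("竈門", "炭治郎", " ", 0.3004587452426102, "kanji_feature"),
]
# Collect the splits whose score falls below the threshold.
flagged = [f"{r.family}{r.separator}{r.given}" for r in results if needs_review(r)]
print(flagged)  # ['竈門 炭治郎']
```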

🔧 Multiple Interfaces

🖥️ Command Line Interface

Perfect for batch processing and automation:

# Single name
$ nmdiv name 菅義偉
菅 義偉
# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]
# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%
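
For readers who want to see what an accuracy check involves, here is a minimal sketch of the idea in Python. It is not the CLI's implementation: the labeled format (one "family given" pair per line, separated by a single space) is an assumption, and `naive_divide` is a toy baseline used only so the example runs without the library.

```python
from typing import Callable, Iterable


def accuracy(labeled: Iterable[str], divide: Callable[[str], str]) -> float:
    """Share of names whose predicted split equals the labeled split.

    Each labeled line is assumed to be "family given" separated by one space.
    """
    total = correct = 0
    for line in labeled:
        expected = line.strip()
        if not expected:
            continue  # skip blank lines
        undivided = expected.replace(" ", "")
        total += 1
        correct += divide(undivided) == expected
    return correct / total if total else 0.0


def naive_divide(name: str) -> str:
    """Toy baseline: always treat the first character as the family name."""
    return f"{name[:1]} {name[1:]}"


sample = ["菅 義偉", "竈門 炭治郎", "竈門 禰豆子"]
print(f"Accuracy: {accuracy(sample, naive_divide):.1%}")  # Accuracy: 33.3%
```

Since `print(result)` in the Quick Start shows the divided name as "family given", passing something like `lambda n: str(divider.divide_name(n))` as `divide` should let the same function score a real divider.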

🐳 REST API (Docker)

For environments where Python cannot be used, we provide a containerized REST API:

# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api
# Send batch requests
curl -X POST localhost:8000/divide \
  -H "Content-Type: application/json" \
  -d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'

Response:

{
  "divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": " ", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
    {"family": "竈門", "given": "禰豆子", "separator": " ", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
  ]
}
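
The same request can be sent from Python with only the standard library. This is a sketch under the assumptions visible above: the container listens on `localhost:8000`, `/divide` accepts `{"names": [...]}`, and the response has a `divided_names` list. The helper names (`build_request`, `parse_response`, `divide_names`) are made up for this example.

```python
import json
import urllib.request
from typing import List, Tuple


def build_request(names: List[str], base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /divide with a JSON body of names."""
    body = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/divide",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> List[Tuple[str, str]]:
    """Extract (family, given) pairs from a /divide response body."""
    payload = json.loads(raw.decode("utf-8"))
    return [(d["family"], d["given"]) for d in payload["divided_names"]]


def divide_names(names: List[str]) -> List[Tuple[str, str]]:
    """Send names to a running container and return the parsed pairs."""
    with urllib.request.urlopen(build_request(names), timeout=10) as resp:
        return parse_response(resp.read())


# Parse a response shaped like the sample above, without needing a live server.
sample = json.dumps({"divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": " ",
     "score": 0.3004587452426102, "algorithm": "kanji_feature"},
]}).encode("utf-8")
print(parse_response(sample))  # [('竈門', '炭治郎')]
```

With the container running, `divide_names(["竈門炭治郎", "竈門禰豆子"])` would perform the actual round trip.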

🎯 Interactive Web Demo

Try NameDivider instantly in your browser: Live Demo →

Run locally:

cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py

📊 Performance & Benchmarks

| Algorithm | Accuracy | Speed (names/sec) | Use Case |
|---|---|---|---|
| BasicNameDivider / backend=python | 99.3% | 4152.8 | Stable & compatible |
| BasicNameDivider / backend=rust | 99.3% | 18597.7 | Max performance (if available) |
| GBDTNameDivider / backend=python | 99.9% | 1143.3 | Best accuracy, guaranteed |
| GBDTNameDivider / backend=rust | 99.9% | 6277.4 | Fast + accurate (if available) |

Run your own benchmarks:

bash scripts/benchmark_sample.sh
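
To measure throughput on your own data from Python, a small timing harness is enough. The sketch below is not the repository's benchmark script; `names_per_second` and `placeholder_divide` are hypothetical names, and the placeholder workload exists only so the example runs without the library installed.

```python
import time
from typing import Callable, Sequence


def names_per_second(divide: Callable[[str], object], names: Sequence[str], repeats: int = 3) -> float:
    """Return the best observed throughput over several timed passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for name in names:
            divide(name)
        best = min(best, time.perf_counter() - start)
    return len(names) / best if best > 0 else float("inf")


def placeholder_divide(name: str) -> str:
    """Stand-in workload so the sketch runs without the library installed."""
    return f"{name[:1]} {name[1:]}"


rate = names_per_second(placeholder_divide, ["菅義偉"] * 10_000)
print(f"{rate:.1f} names/sec")
```

Passing `divider.divide_name` in place of `placeholder_divide` should give a figure comparable to the names/sec column, though results will vary with hardware and input.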

🛠️ Advanced Features

Custom Rules

Handle domain-specific names with custom patterns:

from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule
config = BasicNameDividerConfig(
    custom_rules=[
        SpecificFamilyNameRule(family_names=["竜胆"]),  # Rare family names
    ]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')
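
Conceptually, a rule like this can be thought of as a prefix match that runs before the statistical algorithm. The following is an illustrative sketch of that idea only, not the library's implementation of `SpecificFamilyNameRule`; the function name and the longest-match-first ordering are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def split_by_known_family(full_name: str, family_names: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (family, given) if the name starts with a listed family name."""
    # Try longer family names first so a longer match wins over a shorter prefix.
    for family in sorted(family_names, key=len, reverse=True):
        # Require at least one character left over for the given name.
        if full_name.startswith(family) and len(full_name) > len(family):
            return family, full_name[len(family):]
    return None  # no rule matched; a statistical algorithm would take over


print(split_by_known_family("竜胆尊", ["竜胆"]))  # ('竜胆', '尊')
print(split_by_known_family("菅義偉", ["竜胆"]))  # None
```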

Speed Up

For high-volume processing, NameDivider offers several optimization options:

from namedivider import BasicNameDivider, BasicNameDividerConfig
# Load your names, skipping blank lines
with open("names.txt", "r", encoding="utf-8") as f:
    names = [line.strip() for line in f if line.strip()]
# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
# Option 2: (beta) Use Rust backend (roughly 4-5x faster in the benchmark table)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
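
Separately from the options above, real-world name lists often contain many duplicates, and memoizing results on the caller's side is a generic way to avoid dividing the same string twice. This sketch is not a NameDivider feature and is unrelated to `cache_mask`; `slow_divide` is a placeholder for the real divide call, and the cache size is an arbitrary choice.

```python
from functools import lru_cache

calls = 0


def slow_divide(name: str) -> str:
    """Placeholder for an expensive divide call; counts invocations."""
    global calls
    calls += 1
    return f"{name[:1]} {name[1:]}"


@lru_cache(maxsize=100_000)
def cached_divide(name: str) -> str:
    """Memoize results so each distinct name is divided only once."""
    return slow_divide(name)


names = ["菅義偉", "菅義偉", "竜胆尊", "菅義偉"]
results = [cached_divide(n) for n in names]
print(calls, len(results))  # 2 4
```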

🏢 Typical Use Cases

  • Customer Data Processing - Clean and standardize name databases
  • Form Validation - Real-time name splitting in web applications
  • Analytics & Reports - Generate family name statistics
  • Data Migration - Convert legacy systems with combined name fields
  • Government & Municipal - Process citizen registration data
  • Security-sensitive Environments - Process names without sending data to external APIs

📄 License

Source code and gbdt_model_v1.txt

MIT License

bert_katakana_v0_3_0.pt

CC BY-SA 4.0

family_name_repository.pickle

English

(1) Purpose of use

family_name_repository.pickle is available for commercial and non-commercial use when you use this software to divide names or to develop name-division algorithms.

Any other use of family_name_repository.pickle is prohibited.

(2) Liability

The author or copyright holder assumes no responsibility for the software.

Japanese / ๆ—ฅๆœฌ่ชž

(1) ๅˆฉ็”จ็›ฎ็š„

ใ“ใฎใ‚ฝใƒ•ใƒˆใ‚ฆใ‚งใ‚ขใ‚’็”จใ„ใฆๅง“ๅๅˆ†ๅ‰ฒใ€ใŠใ‚ˆใณๅง“ๅๅˆ†ๅ‰ฒใ‚ขใƒซใ‚ดใƒชใ‚บใƒ ใฎ้–‹็™บใ‚’ใ™ใ‚‹ๅ ดๅˆใ€family_name_repository.pickleใฏๅ•†็”จ/้žๅ•†็”จๅ•ใ‚ใšๅˆฉ็”จๅฏ่ƒฝใงใ™ใ€‚

ใใ‚Œไปฅๅค–ใฎ็›ฎ็š„ใงใฎfamily_name_repository.pickleใฎๅˆฉ็”จใ‚’็ฆใ˜ใพใ™ใ€‚

(2) ่ฒฌไปป

ไฝœ่€…ใพใŸใฏ่‘—ไฝœๆจฉ่€…ใฏใ€family_name_repository.pickleใซ้–ขใ—ใฆไธ€ๅˆ‡ใฎ่ฒฌไปปใ‚’่ฒ ใ„ใพใ›ใ‚“ใ€‚

The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net(ๅๅญ—็”ฑๆฅnet).


Made with ❤️ by @rskmoi • Contact @rskmoi
