This repository was archived by the owner on Apr 25, 2025. It is now read-only.

knowtions/rseg

Introduction
========
Rseg is a Chinese word segmentation (中文分词) routine written in pure Ruby.
The algorithm is based on this article: http://xiecc.blog.163.com/blog/static/14032200671110224190/
Usage
========
Rseg now supports two modes: inline mode and client/server (C/S) mode.
1. Inline mode
> require 'rubygems'
> require 'rseg'
> Rseg.segment("需要分词的文章")
['需要', '分词', '的', '文章']
The first call to Rseg#segment takes about 30 seconds because it loads the dictionaries. Subsequent calls are very fast. You can also call Rseg#load to load the dictionaries manually ahead of time.
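The slow first call followed by fast later calls is typical of lazy loading. The following is a minimal sketch of that pattern only, not Rseg's actual code: the class name LazyDictionary, its methods, and the in-memory word list standing in for dictionary files are all hypothetical.

```ruby
# Hypothetical illustration of lazy dictionary loading; not Rseg's implementation.
class LazyDictionary
  attr_reader :load_count

  def initialize(words)
    @source = words   # stand-in for dictionary files on disk
    @entries = nil    # nothing is loaded until it is needed
    @load_count = 0   # counts how many times the table was built
  end

  # Explicit preload, so the cost can be paid at startup instead of on first use.
  def load
    return self if loaded?
    @entries = {}
    @source.each { |word| @entries[word] = true }
    @load_count += 1
    self
  end

  def loaded?
    !@entries.nil?
  end

  # A lookup triggers the load on first use only.
  def include?(word)
    load unless loaded?
    @entries.key?(word)
  end
end

dict = LazyDictionary.new(%w[需要 分词 的 文章])
dict.loaded?           # nothing has been loaded yet
dict.include?("分词")  # the first lookup pays the loading cost
dict.include?("文章")  # later lookups reuse the loaded table
```

Calling the explicit load method at startup moves the one-time cost out of the first request, which is the same trade-off the manual load call described above offers.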
2. C/S mode
$ rseg_server
== Sinatra/0.9.4 has taken the stage on 4100
This starts the rseg server at http://localhost:4100.
You can access it from your browser or with the rseg command:
$ rseg '需要分词的文章'
需要 分词 的 文章
You can also access the server with Rseg#remote_segment:
$ irb
> require 'rubygems'
> require 'rseg'
> Rseg.remote_segment("需要分词的文章") # This will be very fast
['需要', '分词', '的', '文章']
Performance
========
About 5M characters/s on my MacBook (Intel Core 2 Duo 2 GHz, 4 GB memory).
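A throughput figure like this can be measured with Ruby's standard Benchmark module. The sketch below is one possible way to do it, not the method used to obtain the number above: the helper chars_per_second and its repeat parameter are hypothetical, and the block uses a trivial stand-in workload so that the example runs without the gem installed.

```ruby
require 'benchmark'

# Hypothetical helper: returns characters processed per second
# for the given text and the work done in the block.
def chars_per_second(text, repeat: 10)
  elapsed = Benchmark.realtime do
    repeat.times { yield text }
  end
  # Guard against a zero reading from a very fast block.
  return Float::INFINITY if elapsed.zero?
  (text.length * repeat) / elapsed
end

# Stand-in workload: splits the text into single characters.
# To measure Rseg itself, replace the block body with Rseg.segment(t).
sample = "需要分词的文章" * 1000
rate = chars_per_second(sample) { |t| t.each_char.to_a }
puts format("%.0f characters/s", rate)
```

When measuring Rseg this way, load the dictionaries before timing so that the one-time loading cost is not counted as segmentation time.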
License
========
Rseg includes two built-in dictionaries:
* CC-CEDICT (http://cc-cedict.org/wiki/), under the Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
* Wikipedia Chinese article title list (http://download.wikimedia.org/zhwiki/), under the Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
The code and everything else in Rseg are licensed under the MIT license.
Feedback
========
All feedback is welcome: Yuanyi Zhang (zhangyuanyi#gmail.com)
