patrick013/toSpcy

toSpcy


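toSpcy is a Python package that helps preprocess datasets for model training in spaCy. It converts a labeled dataset, in which entities are marked with inline tags such as <PER>Tom</PER>, into spaCy's training format: a list of (text, {'entities': [(start, end, label), ...]}) tuples.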
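In this format each entity is a `(start, end, label)` triple of character offsets into the untagged text, with `end` exclusive as in Python slicing (in the example below, `text[5:20]` is `'Sebastian Thrun'`). Because spaCy cannot place overlapping entity spans on the same document, it may be worth sanity-checking converted data before training. The following is a minimal illustrative sketch, not part of toSpcy; `check_entities`, `Entity`, and `Example` are hypothetical names.

```python
from typing import Dict, List, Tuple

# One training example: (text, {"entities": [(start, end, label), ...]})
Entity = Tuple[int, int, str]
Example = Tuple[str, Dict[str, List[Entity]]]


def check_entities(example: Example) -> List[str]:
    """Return the problems found in one example (an empty list if none)."""
    text, annotations = example
    problems = []
    previous_end = 0
    # Sort spans by start offset so overlaps can be detected in a single pass.
    for start, end, label in sorted(annotations.get("entities", [])):
        if not (0 <= start < end <= len(text)):
            problems.append(f"span ({start}, {end}, {label!r}) is out of range")
            continue
        if start < previous_end:
            problems.append(f"span ({start}, {end}, {label!r}) overlaps an earlier span")
        previous_end = max(previous_end, end)
    return problems


example = ("Tom is traveling in China", {"entities": [(0, 3, "PER"), (20, 25, "GEO")]})
print(check_entities(example))  # []
for start, end, label in example[1]["entities"]:
    print(label, example[0][start:end])  # PER Tom, then GEO China
```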

Example

from toSpcy.toSpacy import Convertor

# Each string marks entities with inline tags of the form <TAG>entity</TAG>
dataset = [
    'When <p>Sebastian Thrun</p> started working on self-driving cars at <o>Google</o> in <d>2007</d>, few people outside of the company took him seriously.',
    '<PER>Tom</PER> is traveling in <GEO>China</GEO>',
]
myConvertor = Convertor()
spacydata = myConvertor.toSpacyFormat(dataset)
print(spacydata)
# Output (wrapped for readability):
# [('When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously.', {'entities': [(5, 20, 'p'), (61, 67, 'o'), (71, 75, 'd')]}),
#  ('Tom is traveling in China', {'entities': [(0, 3, 'PER'), (20, 25, 'GEO')]})]
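One way to reproduce this conversion without the package is sketched below. It is a minimal standalone illustration, not toSpcy's own code, and it assumes tags have the form <TAG>text</TAG>, tag names consist of word characters, and tags are never nested. `to_spacy_format` and `TAG_PATTERN` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Matches <TAG>entity text</TAG>; the backreference \1 requires a matching closing tag.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")


def to_spacy_format(dataset: List[str]) -> List[Tuple[str, Dict[str, List[Tuple[int, int, str]]]]]:
    """Strip inline tags and record each entity's offsets in the untagged text."""
    result = []
    for sentence in dataset:
        pieces = []    # fragments of the untagged text
        entities = []  # (start, end, tag) offsets into the untagged text
        length = 0     # length of the untagged text built so far
        last = 0       # position in the tagged sentence consumed so far
        for match in TAG_PATTERN.finditer(sentence):
            before = sentence[last:match.start()]
            pieces.append(before)
            length += len(before)
            tag, entity_text = match.group(1), match.group(2)
            entities.append((length, length + len(entity_text), tag))
            pieces.append(entity_text)
            length += len(entity_text)
            last = match.end()
        pieces.append(sentence[last:])  # text after the final tag
        result.append(("".join(pieces), {"entities": entities}))
    return result


print(to_spacy_format(["<PER>Tom</PER> is traveling in <GEO>China</GEO>"]))
# [('Tom is traveling in China', {'entities': [(0, 3, 'PER'), (20, 25, 'GEO')]})]
```

Each start offset is the length of the untagged text emitted before the entity, which is why "Sebastian Thrun" starts at 5 rather than 8 (its position in the tagged string, after "<p>").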

TagLabels

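You can also convert tags into the labels you want by passing "taglabels", a dictionary that maps each tag to its label, when instantiating the Convertor. Tags without an entry in the dictionary keep their original name, as 'd' does in the output below: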

dic_taglabels = {'p': 'PERSON', 'o': 'ORG'}  # tag -> label; 'd' has no entry
myConvertor = Convertor(dic_taglabels)
spacydata = myConvertor.toSpacyFormat(dataset)
print(spacydata[0])
# Output (wrapped for readability):
# ('When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously.',
#  {'entities': [(5, 20, 'PERSON'), (61, 67, 'ORG'), (71, 75, 'd')]})
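The output above suggests a simple mapping rule: a tag with an entry in the dictionary is replaced by its label, and any other tag passes through unchanged. A minimal sketch of that rule, applied as a post-processing step to data that is already in spaCy format, is shown below; `relabel` is a hypothetical helper and not part of toSpcy.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]
Example = Tuple[str, Dict[str, List[Entity]]]


def relabel(data: List[Example], taglabels: Dict[str, str]) -> List[Example]:
    """Rename entity labels via taglabels; tags without an entry keep their name.

    Only the "entities" key of each annotation dict is carried over.
    """
    return [
        (text, {"entities": [(start, end, taglabels.get(tag, tag))
                             for start, end, tag in annotations["entities"]]})
        for text, annotations in data
    ]


data = [("Tom is traveling in China", {"entities": [(0, 3, "PER"), (20, 25, "GEO")]})]
print(relabel(data, {"PER": "PERSON"}))
# [('Tom is traveling in China', {'entities': [(0, 3, 'PERSON'), (20, 25, 'GEO')]})]
```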

Installation

pip install toSpcy

License

Copyright [2019] [Patrick Ruan]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
