
Commit 6755b5f

Author: Arun Tejasvi Chaganty
Message: Initialized with protobuf and tests
Parent: 6417157

File tree

12 files changed: +3523 −0 lines

.travis.yml

Lines changed: 16 additions & 0 deletions

    # this file is *not* meant to cover or endorse the use of travis, but rather to
    # help confirm pull requests to this project.

    language: python

    env:
    - TOXENV=py27
    - TOXENV=py33
    - TOXENV=py34

    install: pip install tox

    script: tox

    notifications:
      email: false

MANIFEST.in

Lines changed: 5 additions & 0 deletions

    # Include the license file
    include LICENSE.txt

    # Include the data files
    recursive-include data *

README.rst

Lines changed: 30 additions & 0 deletions

    Stanford CoreNLP Python Bindings
    ================================

    This package contains Python bindings for `Stanford
    CoreNLP <https://github.com/stanfordnlp/CoreNLP>`_'s protobuf
    specifications, as generated by ``protoc``. These bindings can be used to
    parse binary data produced by, e.g., the `Stanford CoreNLP
    server <https://stanfordnlp.github.io/CoreNLP/corenlp-server.html>`_.

    ----

    Usage::

        from corenlp_protobuf import Document

        # document.dat contains a serialized Document.
        with open('document.dat', 'rb') as f:
            buf = f.read()
        doc = Document()
        doc.ParseFromString(buf)

        # You can access the sentences from doc.sentence.
        sentence = doc.sentence[0]

        # You can access any property within a sentence.
        print(sentence.text)

        # Likewise for tokens.
        token = sentence.token[0]
        print(token.lemma)

corenlp_protobuf/CoreNLP_pb2.py

Lines changed: 2681 additions & 0 deletions

Generated file (output of ``protoc``); not rendered by default.

corenlp_protobuf/__init__.py

Lines changed: 27 additions & 0 deletions

    from __future__ import absolute_import

    from google.protobuf.internal.decoder import _DecodeVarint
    from .CoreNLP_pb2 import *


    def parseFromDelimitedString(obj, buf, offset=0):
        """
        Stanford CoreNLP uses the Java "writeDelimitedTo" function, which
        writes the size of the message (as a varint) before the message
        itself. This function parses one such message from @buf, starting
        at @offset.

        @returns how many bytes of @buf were consumed.
        """
        size, pos = _DecodeVarint(buf, offset)
        obj.ParseFromString(buf[offset + pos:offset + pos + size])
        return pos + size


    def to_text(sentence):
        """
        Helper routine that converts a Sentence protobuf to a string from
        its tokens.
        """
        text = ""
        for i, tok in enumerate(sentence.token):
            if i != 0:
                text += tok.before
            text += tok.word
        return text
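The `to_text` helper above reconstructs the original sentence text by re-inserting each token's `before` whitespace between words. A minimal self-contained sketch of the same logic, using a stand-in `Token` namedtuple in place of the real protobuf message (so it runs without any CoreNLP data):

```python
from collections import namedtuple

# Stand-in for the protobuf Token message; only the fields to_text uses.
Token = namedtuple('Token', ['word', 'before'])


def to_text(tokens):
    """Rebuild sentence text by joining each token's word with the
    whitespace ('before') that preceded it in the original text."""
    text = ""
    for i, tok in enumerate(tokens):
        if i != 0:
            text += tok.before
        text += tok.word
    return text


tokens = [Token('Hello', ''), Token(',', ''), Token('world', ' ')]
print(to_text(tokens))  # -> Hello, world
```

Note that the first token's `before` is skipped, so leading whitespace from the original document is not reproduced.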

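`parseFromDelimitedString` leans on protobuf's internal `_DecodeVarint` to read the length prefix, which is a standard base-128 varint. A dependency-free sketch of encoding and decoding that prefix (the helper names here are hypothetical and not part of this package):

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint (protobuf style):
    7 payload bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)


def decode_varint(buf, offset=0):
    """Decode a varint from buf at offset; return (value, new_offset)."""
    result = shift = 0
    while True:
        b = buf[offset]
        offset += 1
        result |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            return result, offset


# A delimited message is: varint(len(payload)) + payload.
payload = b'hello protobuf'
framed = encode_varint(len(payload)) + payload
size, pos = decode_varint(framed)
assert framed[pos:pos + size] == payload
```

This mirrors the framing that `writeDelimitedTo` produces on the Java side, which is why `parseFromDelimitedString` must decode the varint before handing the remaining bytes to `ParseFromString`.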