This package contains Python bindings for Stanford CoreNLP's protobuf specifications, as generated by protoc. These bindings can be used to parse binary data produced by, e.g., the Stanford CoreNLP server.
Usage:
    from corenlp_protobuf import Document, parseFromDelimitedString

    # document.dat contains a serialized Document.
    # Open it in binary mode: the protobuf payload is raw bytes, not text.
    with open('document.dat', 'rb') as f:
        buf = f.read()

    doc = Document()
    parseFromDelimitedString(doc, buf)

    # You can access the sentences from doc.sentence.
    sentence = doc.sentence[0]

    # You can access any property within a sentence.
    print(sentence.text)

    # Likewise for tokens.
    token = sentence.token[0]
    print(token.lemma)
See test_read.py for more examples.
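For readers curious about what "delimited" means in parseFromDelimitedString, here is a minimal standard-library sketch. It assumes the data uses protobuf's usual length-delimited framing, in which each message is preceded by its byte length encoded as a base-128 varint. The helper names read_varint and split_delimited are hypothetical and are not part of this package; the sample payloads are made up for illustration.

```python
def read_varint(buf, pos=0):
    """Decode a base-128 varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry data; the high bit signals that more bytes follow.
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_delimited(buf):
    """Split a bytes buffer of length-prefixed messages into raw payloads."""
    payloads = []
    pos = 0
    while pos < len(buf):
        size, pos = read_varint(buf, pos)
        end = pos + size
        if end > len(buf):
            raise ValueError("truncated message")
        payloads.append(buf[pos:end])
        pos = end
    return payloads


# Two made-up payloads: 3 bytes, then 300 bytes (300 encodes as 0xAC 0x02).
stream = b"\x03abc" + b"\xac\x02" + b"x" * 300
chunks = split_delimited(stream)
print([len(c) for c in chunks])
```

Under the same assumption, each payload returned this way could be handed to a message's standard ParseFromString method, for example Document().ParseFromString(chunk). In normal use parseFromDelimitedString is the simpler choice.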