Analyzing Syntax
While most Natural Language methods analyze what a given text is about,
the analyzeSyntax method inspects the structure of the language itself.
Syntactic Analysis breaks up the given text into a series of sentences and
tokens (generally, words) and provides linguistic information about those tokens.
See Morphology & Dependency Trees for details
about the linguistic analysis and Language Support
for a list of the languages whose syntax the Natural Language API can analyze.
This section demonstrates a few ways to detect syntax in a document. For each document, you must submit a separate request.
Analyzing Syntax in a String
Here is an example of performing syntactic analysis on a text string sent directly to the Natural Language API:
Protocol
To analyze syntax in a document, make a POST request to the
documents:analyzeSyntax
REST method and provide
the appropriate request body as shown in the following example.
The example uses the gcloud auth application-default print-access-token
command to obtain an access token for a service account set up for the
project using the Google Cloud Platform gcloud CLI.
For instructions on installing the gcloud CLI and
setting up a project with a service account,
see the Quickstart.
curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'encodingType': 'UTF8',
    'document': {
      'type': 'PLAIN_TEXT',
      'content': 'Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. Sundar Pichai said in his keynote that users love their new Android phones.'
    }
  }" "https://language.googleapis.com/v1/documents:analyzeSyntax"
If you don't specify document.language, then the language will be automatically
detected. For information on which languages are supported by the Natural Language API,
see Language Support. See the Document
reference documentation for more information on configuring the request
body.
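As a rough sketch of what the request body looks like when built programmatically, the same structure can be assembled and serialized with Python's standard library. Here document.language is set explicitly rather than auto-detected; the field names follow the v1 Document reference, and the text is shortened from the example above:

```python
import json

# Request body for documents:analyzeSyntax, with the language set
# explicitly instead of letting the API detect it.
body = {
    "encodingType": "UTF8",
    "document": {
        "type": "PLAIN_TEXT",
        "language": "en",  # omit this field to auto-detect the language
        "content": "Google unveiled the new Android phone.",
    },
}

payload = json.dumps(body)
print(payload)
```

Sending this payload as the POST body is equivalent to the inline JSON in the curl example above.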
If the request is successful, the server returns a 200 OK HTTP status code and
the response in JSON format:
{
"sentences": [
{
"text": {
"content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
"beginOffset": 0
}
},
{
"text": {
"content": "Sundar Pichai said in his keynote that users love their new Android phones.",
"beginOffset": 105
}
}
],
"tokens": [
{
"text": {
"content": "Google",
"beginOffset": 0
},
"partOfSpeech": {
"tag": "NOUN",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "SINGULAR",
"person": "PERSON_UNKNOWN",
"proper": "PROPER",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 7,
"label": "NSUBJ"
},
"lemma": "Google"
},
...
{
"text": {
"content": ".",
"beginOffset": 179
},
"partOfSpeech": {
"tag": "PUNCT",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "NUMBER_UNKNOWN",
"person": "PERSON_UNKNOWN",
"proper": "PROPER_UNKNOWN",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 20,
"label": "P"
},
"lemma": "."
}
],
"language": "en"
}
The tokens array contains Token
objects representing the detected sentence tokens, which include information
such as a token's part of speech and its position in the sentence.
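If you call the REST endpoint directly rather than through a client library, the response is plain JSON and can be walked with the standard library. A minimal sketch, using a trimmed copy of the first token from the example output above as a stand-in for a real response:

```python
import json

# Trimmed, hypothetical copy of the analyzeSyntax response shown above;
# a real response contains one entry per token.
response_json = """
{
  "tokens": [
    {
      "text": {"content": "Google", "beginOffset": 0},
      "partOfSpeech": {"tag": "NOUN", "number": "SINGULAR", "proper": "PROPER"},
      "dependencyEdge": {"headTokenIndex": 7, "label": "NSUBJ"},
      "lemma": "Google"
    }
  ],
  "language": "en"
}
"""

response = json.loads(response_json)
lines = []
for token in response["tokens"]:
    edge = token["dependencyEdge"]
    lines.append(
        f'{token["text"]["content"]} ({token["partOfSpeech"]["tag"]}) '
        f'-> head {edge["headTokenIndex"]}, {edge["label"]}'
    )
print("\n".join(lines))
```

For the trimmed response above this prints `Google (NOUN) -> head 7, NSUBJ`.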
gcloud
Refer to the analyze-syntax
command for complete details.
To perform syntax analysis, use the gcloud CLI with
the --content flag to identify the content to analyze:
gcloud ml language analyze-syntax --content="Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. Sundar Pichai said in his keynote that users love their new Android phones."
If the request is successful, the server returns a response in JSON format:
{
"sentences": [
{
"text": {
"content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
"beginOffset": 0
}
},
{
"text": {
"content": "Sundar Pichai said in his keynote that users love their new Android phones.",
"beginOffset": 105
}
}
],
"tokens": [
{
"text": {
"content": "Google",
"beginOffset": 0
},
"partOfSpeech": {
"tag": "NOUN",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "SINGULAR",
"person": "PERSON_UNKNOWN",
"proper": "PROPER",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 7,
"label": "NSUBJ"
},
"lemma": "Google"
},
...
{
"text": {
"content": ".",
"beginOffset": 179
},
"partOfSpeech": {
"tag": "PUNCT",
"aspect": "ASPECT_UNKNOWN",
"case": "CASE_UNKNOWN",
"form": "FORM_UNKNOWN",
"gender": "GENDER_UNKNOWN",
"mood": "MOOD_UNKNOWN",
"number": "NUMBER_UNKNOWN",
"person": "PERSON_UNKNOWN",
"proper": "PROPER_UNKNOWN",
"reciprocity": "RECIPROCITY_UNKNOWN",
"tense": "TENSE_UNKNOWN",
"voice": "VOICE_UNKNOWN"
},
"dependencyEdge": {
"headTokenIndex": 20,
"label": "P"
},
"lemma": "."
}
],
"language": "en"
}
The tokens array contains Token
objects representing the detected sentence tokens, which include information
such as a token's part of speech and its position in the sentence.
Go
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Go API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
func analyzeSyntax(ctx context.Context, client *language.Client, text string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_Content{
				Content: text,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}
Java
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Java API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (com.google.cloud.language.v1.LanguageServiceClient language =
    com.google.cloud.language.v1.LanguageServiceClient.create()) {
  com.google.cloud.language.v1.Document doc =
      com.google.cloud.language.v1.Document.newBuilder()
          .setContent(text)
          .setType(com.google.cloud.language.v1.Document.Type.PLAIN_TEXT)
          .build();
  AnalyzeSyntaxRequest request =
      AnalyzeSyntaxRequest.newBuilder()
          .setDocument(doc)
          .setEncodingType(com.google.cloud.language.v1.EncodingType.UTF16)
          .build();
  // Analyze the syntax in the given text
  AnalyzeSyntaxResponse response = language.analyzeSyntax(request);
  // Print the response
  for (Token token : response.getTokensList()) {
    System.out.printf("\tText: %s\n", token.getText().getContent());
    System.out.printf("\tBeginOffset: %d\n", token.getText().getBeginOffset());
    System.out.printf("Lemma: %s\n", token.getLemma());
    System.out.printf("PartOfSpeechTag: %s\n", token.getPartOfSpeech().getTag());
    System.out.printf("\tAspect: %s\n", token.getPartOfSpeech().getAspect());
    System.out.printf("\tCase: %s\n", token.getPartOfSpeech().getCase());
    System.out.printf("\tForm: %s\n", token.getPartOfSpeech().getForm());
    System.out.printf("\tGender: %s\n", token.getPartOfSpeech().getGender());
    System.out.printf("\tMood: %s\n", token.getPartOfSpeech().getMood());
    System.out.printf("\tNumber: %s\n", token.getPartOfSpeech().getNumber());
    System.out.printf("\tPerson: %s\n", token.getPartOfSpeech().getPerson());
    System.out.printf("\tProper: %s\n", token.getPartOfSpeech().getProper());
    System.out.printf("\tReciprocity: %s\n", token.getPartOfSpeech().getReciprocity());
    System.out.printf("\tTense: %s\n", token.getPartOfSpeech().getTense());
    System.out.printf("\tVoice: %s\n", token.getPartOfSpeech().getVoice());
    System.out.println("DependencyEdge");
    System.out.printf("\tHeadTokenIndex: %d\n", token.getDependencyEdge().getHeadTokenIndex());
    System.out.printf("\tLabel: %s\n\n", token.getDependencyEdge().getLabel());
  }
  return response.getTokensList();
}
Node.js
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Node.js API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following line to run this code.
 */
// const text = 'Your text to analyze, e.g. Hello, world!';

// Prepares a document, representing the provided text
const document = {
  content: text,
  type: 'PLAIN_TEXT',
};

// Need to specify an encodingType to receive word offsets
const encodingType = 'UTF8';

// Detects syntax in the document
const [syntax] = await client.analyzeSyntax({document, encodingType});

console.log('Tokens:');
syntax.tokens.forEach(part => {
  console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
  console.log('Morphology:', part.partOfSpeech);
});
Python
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Python API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
from google.cloud import language_v1

def sample_analyze_syntax(text_content):
    """
    Analyzing Syntax in a String

    Args:
      text_content The text content to analyze
    """

    client = language_v1.LanguageServiceClient()

    # text_content = 'This is a short sentence.'

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}

    # Available values: NONE, UTF8, UTF16, UTF32
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_syntax(
        request={"document": document, "encoding_type": encoding_type}
    )
    # Loop through tokens returned from the API
    for token in response.tokens:
        # Get the text content of this token. Usually a word or punctuation.
        text = token.text
        print(f"Token text: {text.content}")
        print(f"Location of this token in overall document: {text.begin_offset}")

        # Get the part of speech information for this token.
        # Part of speech is defined in:
        # http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf
        part_of_speech = token.part_of_speech
        # Get the tag, e.g. NOUN, ADJ for Adjective, et al.
        print(
            "Part of Speech tag: {}".format(
                language_v1.PartOfSpeech.Tag(part_of_speech.tag).name
            )
        )
        # Get the voice, e.g. ACTIVE or PASSIVE
        print(
            "Voice: {}".format(
                language_v1.PartOfSpeech.Voice(part_of_speech.voice).name
            )
        )
        # Get the tense, e.g. PAST, FUTURE, PRESENT, et al.
        print(
            "Tense: {}".format(
                language_v1.PartOfSpeech.Tense(part_of_speech.tense).name
            )
        )
        # See API reference for additional Part of Speech information available
        # Get the lemma of the token. Wikipedia lemma description
        # https://en.wikipedia.org/wiki/Lemma_(morphology)
        print(f"Lemma: {token.lemma}")

        # Get the dependency tree parse information for this token.
        # For more information on dependency labels:
        # http://www.aclweb.org/anthology/P13-2017
        dependency_edge = token.dependency_edge
        print(f"Head token index: {dependency_edge.head_token_index}")
        print(
            "Label: {}".format(
                language_v1.DependencyEdge.Label(dependency_edge.label).name
            )
        )

    # Get the language of the text, which will be the same as
    # the language specified in the request or, if not specified,
    # the automatically-detected language.
    print(f"Language of the text: {response.language}")
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Natural Language reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Natural Language reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Natural Language reference documentation for Ruby.
Analyzing Syntax from Cloud Storage
For your convenience, the Natural Language API can perform syntactic analysis directly on a file located in Cloud Storage, without the need to send the contents of the file in the body of your request.
Here is an example of performing syntactic analysis on a file located in Cloud Storage.
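The only structural change from the inline-content request is that the document names a gcsContentUri instead of carrying content directly. A sketch in Python's standard library; the bucket and object names here are hypothetical placeholders:

```python
import json

# Cloud Storage variant of the analyzeSyntax request body. The document
# carries a gcsContentUri rather than inline content; bucket and object
# names are placeholders.
bucket = "my-bucket"
blob = "my-file.txt"

body = {
    "encodingType": "UTF8",
    "document": {
        "type": "PLAIN_TEXT",
        # Set exactly one of 'content' or 'gcsContentUri'.
        "gcsContentUri": f"gs://{bucket}/{blob}",
    },
}
print(json.dumps(body, indent=2))
```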
Protocol
To analyze syntax in a document stored in Cloud Storage,
make a POST request to the
documents:analyzeSyntax
REST method and provide
the appropriate request body with the path to the document
as shown in the following example.
curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'encodingType': 'UTF8',
    'document': {
      'type': 'PLAIN_TEXT',
      'gcsContentUri': 'gs://<bucket-name>/<object-name>'
    }
  }" "https://language.googleapis.com/v1/documents:analyzeSyntax"
If you don't specify document.language, then the language will be automatically
detected. For information on which languages are supported by the Natural Language API,
see Language Support. See the Document
reference documentation for more information on configuring the request body.
If the request is successful, the server returns a 200 OK HTTP status code and
the response in JSON format:
{
"sentences": [
{
"text": {
"content": "Hello, world!",
"beginOffset": 0
}
}
],
"tokens": [
{
"text": {
"content": "Hello",
"beginOffset": 0
},
"partOfSpeech": {
"tag": "X",
// ...
},
"dependencyEdge": {
"headTokenIndex": 2,
"label": "DISCOURSE"
},
"lemma": "Hello"
},
{
"text": {
"content": ",",
"beginOffset": 5
},
"partOfSpeech": {
"tag": "PUNCT",
// ...
},
"dependencyEdge": {
"headTokenIndex": 2,
"label": "P"
},
"lemma": ","
},
// ...
],
"language": "en"
}
The tokens array contains Token
objects representing the detected sentence tokens, which include information
such as a token's part of speech and its position in the sentence.
gcloud
Refer to the analyze-syntax
command for complete details.
To perform syntax analysis on a file in Cloud Storage, use the
gcloud CLI with the --content-file flag to identify the path of
the file that contains the content to analyze:
gcloud ml language analyze-syntax --content-file=gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME
If the request is successful, the server returns a response in JSON format:
{
"sentences": [
{
"text": {
"content": "Hello, world!",
"beginOffset": 0
}
}
],
"tokens": [
{
"text": {
"content": "Hello",
"beginOffset": 0
},
"partOfSpeech": {
"tag": "X",
// ...
},
"dependencyEdge": {
"headTokenIndex": 2,
"label": "DISCOURSE"
},
"lemma": "Hello"
},
{
"text": {
"content": ",",
"beginOffset": 5
},
"partOfSpeech": {
"tag": "PUNCT",
// ...
},
"dependencyEdge": {
"headTokenIndex": 2,
"label": "P"
},
"lemma": ","
},
// ...
],
"language": "en"
}
The tokens array contains Token
objects representing the detected sentence tokens, which include information
such as a token's part of speech and its position in the sentence.
Go
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Go API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
func analyzeSyntaxFromGCS(ctx context.Context, client *language.Client, gcsURI string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}
Java
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Java API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (com.google.cloud.language.v1.LanguageServiceClient language =
    com.google.cloud.language.v1.LanguageServiceClient.create()) {
  com.google.cloud.language.v1.Document doc =
      com.google.cloud.language.v1.Document.newBuilder()
          .setGcsContentUri(gcsUri)
          .setType(com.google.cloud.language.v1.Document.Type.PLAIN_TEXT)
          .build();
  AnalyzeSyntaxRequest request =
      AnalyzeSyntaxRequest.newBuilder()
          .setDocument(doc)
          .setEncodingType(com.google.cloud.language.v1.EncodingType.UTF16)
          .build();
  // Analyze the syntax in the given text
  AnalyzeSyntaxResponse response = language.analyzeSyntax(request);
  // Print the response
  for (Token token : response.getTokensList()) {
    System.out.printf("\tText: %s\n", token.getText().getContent());
    System.out.printf("\tBeginOffset: %d\n", token.getText().getBeginOffset());
    System.out.printf("Lemma: %s\n", token.getLemma());
    System.out.printf("PartOfSpeechTag: %s\n", token.getPartOfSpeech().getTag());
    System.out.printf("\tAspect: %s\n", token.getPartOfSpeech().getAspect());
    System.out.printf("\tCase: %s\n", token.getPartOfSpeech().getCase());
    System.out.printf("\tForm: %s\n", token.getPartOfSpeech().getForm());
    System.out.printf("\tGender: %s\n", token.getPartOfSpeech().getGender());
    System.out.printf("\tMood: %s\n", token.getPartOfSpeech().getMood());
    System.out.printf("\tNumber: %s\n", token.getPartOfSpeech().getNumber());
    System.out.printf("\tPerson: %s\n", token.getPartOfSpeech().getPerson());
    System.out.printf("\tProper: %s\n", token.getPartOfSpeech().getProper());
    System.out.printf("\tReciprocity: %s\n", token.getPartOfSpeech().getReciprocity());
    System.out.printf("\tTense: %s\n", token.getPartOfSpeech().getTense());
    System.out.printf("\tVoice: %s\n", token.getPartOfSpeech().getVoice());
    System.out.println("DependencyEdge");
    System.out.printf("\tHeadTokenIndex: %d\n", token.getDependencyEdge().getHeadTokenIndex());
    System.out.printf("\tLabel: %s\n\n", token.getDependencyEdge().getLabel());
  }
  return response.getTokensList();
}
Node.js
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Node.js API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following lines to run this code
 */
// const bucketName = 'Your bucket name, e.g. my-bucket';
// const fileName = 'Your file name, e.g. my-file.txt';

// Prepares a document, representing a text file in Cloud Storage
const document = {
  gcsContentUri: `gs://${bucketName}/${fileName}`,
  type: 'PLAIN_TEXT',
};

// Need to specify an encodingType to receive word offsets
const encodingType = 'UTF8';

// Detects syntax in the document
const [syntax] = await client.analyzeSyntax({document, encodingType});

console.log('Parts of speech:');
syntax.tokens.forEach(part => {
  console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
  console.log('Morphology:', part.partOfSpeech);
});
Python
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Python API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
from google.cloud import language_v1

def sample_analyze_syntax(gcs_content_uri):
    """
    Analyzing Syntax in text file stored in Cloud Storage

    Args:
      gcs_content_uri Google Cloud Storage URI where the file content is located.
      e.g. gs://[Your Bucket]/[Path to File]
    """

    client = language_v1.LanguageServiceClient()

    # gcs_content_uri = 'gs://cloud-samples-data/language/syntax-sentence.txt'

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {
        "gcs_content_uri": gcs_content_uri,
        "type_": type_,
        "language": language,
    }

    # Available values: NONE, UTF8, UTF16, UTF32
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_syntax(
        request={"document": document, "encoding_type": encoding_type}
    )
    # Loop through tokens returned from the API
    for token in response.tokens:
        # Get the text content of this token. Usually a word or punctuation.
        text = token.text
        print(f"Token text: {text.content}")
        print(f"Location of this token in overall document: {text.begin_offset}")

        # Get the part of speech information for this token.
        # Part of speech is defined in:
        # http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf
        part_of_speech = token.part_of_speech
        # Get the tag, e.g. NOUN, ADJ for Adjective, et al.
        print(
            "Part of Speech tag: {}".format(
                language_v1.PartOfSpeech.Tag(part_of_speech.tag).name
            )
        )
        # Get the voice, e.g. ACTIVE or PASSIVE
        print(
            "Voice: {}".format(
                language_v1.PartOfSpeech.Voice(part_of_speech.voice).name
            )
        )
        # Get the tense, e.g. PAST, FUTURE, PRESENT, et al.
        print(
            "Tense: {}".format(
                language_v1.PartOfSpeech.Tense(part_of_speech.tense).name
            )
        )
        # See API reference for additional Part of Speech information available
        # Get the lemma of the token. Wikipedia lemma description
        # https://en.wikipedia.org/wiki/Lemma_(morphology)
        print(f"Lemma: {token.lemma}")

        # Get the dependency tree parse information for this token.
        # For more information on dependency labels:
        # http://www.aclweb.org/anthology/P13-2017
        dependency_edge = token.dependency_edge
        print(f"Head token index: {dependency_edge.head_token_index}")
        print(
            "Label: {}".format(
                language_v1.DependencyEdge.Label(dependency_edge.label).name
            )
        )

    # Get the language of the text, which will be the same as
    # the language specified in the request or, if not specified,
    # the automatically-detected language.
    print(f"Language of the text: {response.language}")
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Natural Language reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Natural Language reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Natural Language reference documentation for Ruby.