Machine learning pipeline

This example is a simplified version of the Link Prediction pipeline described in the Machine learning section.

Create the graph

The following Cypher query creates the graph of a small social network in the Neo4j database.

CREATE
 (alice:Person {name: 'Alice', age: 38}),
 (michael:Person {name: 'Michael', age: 67}),
 (karin:Person {name: 'Karin', age: 30}),
 (chris:Person {name: 'Chris', age: 52}),
 (will:Person {name: 'Will', age: 6}),
 (mark:Person {name: 'Mark', age: 32}),
 (greg:Person {name: 'Greg', age: 29}),
 (veselin:Person {name: 'Veselin', age: 3}),
 (alice)-[:KNOWS]->(michael),
 (michael)-[:KNOWS]->(karin),
 (michael)-[:KNOWS]->(chris),
 (michael)-[:KNOWS]->(greg),
 (will)-[:KNOWS]->(michael),
 (will)-[:KNOWS]->(chris),
 (mark)-[:KNOWS]->(michael),
 (mark)-[:KNOWS]->(will),
 (greg)-[:KNOWS]->(chris),
 (veselin)-[:KNOWS]->(chris),
 (karin)-[:KNOWS]->(veselin),
 (chris)-[:KNOWS]->(karin)
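
As a quick sanity check, the created data can be counted with plain Cypher. This is a minimal sketch that assumes the database was empty before the CREATE statement ran; under that assumption it should report 8 people and 12 relationships, matching the statement above.

```cypher
// Count the Person nodes and the outgoing KNOWS relationships between them.
MATCH (p:Person)
OPTIONAL MATCH (p)-[r:KNOWS]->(:Person)
RETURN count(DISTINCT p) AS people, count(r) AS knowsRelationships
```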

The graph looks as follows:

[Figure: LP example data]

The next query creates an in-memory graph called friends from the Neo4j graph. Since the Link Prediction model requires the graph to be undirected, the orientation of the :KNOWS relationship is discarded.

MATCH (source:Person)-[r:KNOWS]->(target:Person)
RETURN gds.graph.project(
  'friends',
  source,
  target,
  {
    sourceNodeProperties: source { .age },
    targetNodeProperties: target { .age },
    relationshipType: 'KNOWS'
  },
  { undirectedRelationshipTypes: ['KNOWS'] }
)
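
To confirm that the projection exists before configuring the pipeline, the graph catalog can be queried. The sketch below uses gds.graph.list, a catalog procedure that is not otherwise used on this page. The node count should be 8 for the example data. Because the projection is undirected, the relationship count may be reported as twice the 12 stored relationships (one per direction); treat that as an expectation to verify rather than a guarantee.

```cypher
// Look up the projected graph 'friends' in the graph catalog.
CALL gds.graph.list('friends')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```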

Configure the pipeline

You can configure a machine learning pipeline with a sequence of Cypher queries.

  • Create the pipeline and add it to the pipeline catalog:

    CALL gds.beta.pipeline.linkPrediction.create('pipe')

  • Add a link feature, specifying the node properties to combine (only age here) and the feature type (l2 here):

    CALL gds.beta.pipeline.linkPrediction.addFeature(
      'pipe',
      'l2',
      { nodeProperties: ['age'] }
    )

  • Configure the train-test split and the number of folds for cross-validation:

    CALL gds.beta.pipeline.linkPrediction.configureSplit(
      'pipe',
      {
        testFraction: 0.25,
        trainFraction: 0.6,
        validationFolds: 3
      }
    )

  • Add a model candidate (a logistic regression with no further configuration here):

    CALL gds.beta.pipeline.linkPrediction.addLogisticRegression('pipe')
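
To make the l2 link feature more concrete, the following sketch reproduces it in plain Cypher for the existing KNOWS pairs. It assumes that l2 is the squared element-wise difference of the two nodes' property values, which here reduces to the squared age gap. That definition is an assumption to check against the Link prediction pipelines page, and the query only illustrates the arithmetic; it does not run through the pipeline.

```cypher
// Squared age difference for every connected pair of people,
// i.e. the assumed value of the l2 feature on the 'age' property.
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN
  a.name AS person1,
  b.name AS person2,
  (a.age - b.age) ^ 2 AS squaredAgeGap
ORDER BY squaredAgeGap
```

With a single feature like this, the logistic regression candidate has only the age gap to separate linked from unlinked pairs, which is one reason the simplified pipeline is not expected to score well.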

Train a model

Once configured, the pipeline is ready to train a model. The training process returns the best-performing model candidate along with its scores for the specified evaluation metrics.

The pipeline configuration shown in the previous section is simplified for convenience; as such, the model performance is not expected to be the best. See the Link prediction pipelines page for a detailed walkthrough.

CALL gds.beta.pipeline.linkPrediction.train(
  'friends', // (1)
  {
    pipeline: 'pipe', // (2)
    modelName: 'lp-pipeline-model', // (3)
    targetRelationshipType: 'KNOWS', // (4)
    metrics: ['AUCPR'], // (5)
    randomSeed: 42 // (6)
  }
)
YIELD modelInfo
RETURN
  modelInfo.bestParameters AS winningModel, // (7)
  modelInfo.metrics.AUCPR.train.avg AS avgTrainScore, // (8)
  modelInfo.metrics.AUCPR.validation.avg AS avgValidationScore,
  modelInfo.metrics.AUCPR.outerTrain AS outerTrainScore,
  modelInfo.metrics.AUCPR.test AS testScore

(1) Name of the projected graph to use for training.
(2) Name of the configured pipeline.
(3) Name of the model to train.
(4) Relationship type to train the model on.
(5) Metrics used to evaluate the models (AUCPR here).
(6) The random seed is only needed to obtain the same results across runs.
(7) Parameters of the best-performing model returned by the training process.
(8) Evaluated metrics (here for AUCPR) of the best-performing model returned by the training process.

Table 1. Results
winningModel | avgTrainScore | avgValidationScore | outerTrainScore | testScore
{batchSize=100, classWeights=[], focusWeight=0.0, learningRate=0.001, maxEpochs=100, methodName="LogisticRegression", minEpochs=1, patience=1, penalty=0.0, tolerance=0.001} | 0.5740740741 | 0.3611111111 | 0.3784126984 | 0.3444444444

Use the model for prediction

You can use the trained model to predict the probability that a link exists between two nodes in a projected graph.

CALL gds.beta.pipeline.linkPrediction.predict.stream( // (1)
  'friends', // (2)
  {
    modelName: 'lp-pipeline-model', // (3)
    topN: 5 // (4)
  }
)
YIELD node1, node2, probability
RETURN
  gds.util.asNode(node1).name AS person1,
  gds.util.asNode(node2).name AS person2,
  probability
ORDER BY probability DESC, person1

(1) Run the prediction in stream mode (return the predicted links as query results).
(2) Name of the projected graph to run the prediction on.
(3) Name of the model to use for prediction.
(4) Maximum number of predicted relationships to output.

Table 2. Results
person1 | person2   | probability
"Karin" | "Greg"    | 0.4991379664
"Mark"  | "Karin"   | 0.4989714183
"Mark"  | "Greg"    | 0.4986938388
"Will"  | "Veselin" | 0.4986938388
"Mark"  | "Alice"   | 0.4971949275
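
The results in Table 2 can be read more easily next to the feature the model was trained on. The sketch below repeats the same prediction call and adds the squared age gap of each predicted pair, assuming (as the feature type suggests, but not confirmed on this page) that the l2 feature on age is the squared difference of the two ages.

```cypher
// Same prediction as above, with the squared age gap shown per pair.
CALL gds.beta.pipeline.linkPrediction.predict.stream(
  'friends',
  { modelName: 'lp-pipeline-model', topN: 5 }
)
YIELD node1, node2, probability
WITH gds.util.asNode(node1) AS p1, gds.util.asNode(node2) AS p2, probability
RETURN
  p1.name AS person1,
  p2.name AS person2,
  (p1.age - p2.age) ^ 2 AS squaredAgeGap,
  probability
ORDER BY probability DESC, person1
```

For the pairs in Table 2, Mark and Greg (ages 32 and 29) and Will and Veselin (ages 6 and 3) both have a squared age gap of 9 and also share the same probability, while the probability decreases as the gap grows. This is consistent with a model whose only input is the age-based feature.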

Next steps

Try to improve the model's performance by using different model candidates, adding node properties to the features, or configuring autotuning.
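
As a sketch of the first suggestion, a second logistic regression candidate with explicit parameters can be added to the same pipeline before training again. The parameter names penalty and maxEpochs are taken from the winningModel output in Table 1; the values are arbitrary illustrations, not recommendations. The model name 'lp-pipeline-model-2' is hypothetical and is used on the assumption that a new training run needs a model name that is not already in use.

```cypher
// Add a second candidate with a non-zero penalty and more epochs
// (illustrative values only).
CALL gds.beta.pipeline.linkPrediction.addLogisticRegression(
  'pipe',
  { penalty: 0.1, maxEpochs: 500 }
);

// Retrain under a new (hypothetical) model name and compare the test score.
CALL gds.beta.pipeline.linkPrediction.train(
  'friends',
  {
    pipeline: 'pipe',
    modelName: 'lp-pipeline-model-2',
    targetRelationshipType: 'KNOWS',
    metrics: ['AUCPR'],
    randomSeed: 42
  }
)
YIELD modelInfo
RETURN
  modelInfo.bestParameters AS winningModel,
  modelInfo.metrics.AUCPR.test AS testScore
```

The training process then selects the better of the two candidates, so winningModel shows which configuration won.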
