
The given dataset is a graph data structure that represents social interactions.

The nodes are represented as People{node_id, age, gender, occupation}, and the edges indicate how often individuals communicate with each other, whether through phone calls, emails, or messages. A higher weight signifies more frequent communication.

There are 10 people: 60% are female, 30% are male, and 10% are of unknown gender.

The prediction target is the gender label.


People Nodes

node_id age gender occupation
1 41 Female 0
2 40 Unclassified 3
3 21 Female 1
4 43 Female 4
5 31 Male 3
6 30 Female 0
7 28 Female 1
8 29 Male 4
9 39 Female 1
10 32 Male 0

Social Interaction Edges

from to weight
1 2 0.450499
1 3 0.833195
1 4 0.449754
1 5 0.539692
1 6 0.293488
1 7 0.496794
1 8 0.514994
1 9 0.840499
1 10 0.412794
2 3 0.684105
2 4 0.963660
2 5 0.943470
2 6 0.192717
2 7 0.924245
2 8 0.201781
2 9 0.425102
2 10 0.613922
3 4 0.749955
3 5 0.675580
3 6 0.293714
3 7 0.816202
3 8 0.043064
3 9 0.922738
3 10 0.458666
4 5 0.034314
4 6 0.840261
4 7 0.925287
4 8 0.118203
4 9 0.547889
4 10 0.779928
5 6 0.624413
5 7 0.227053
5 8 0.695268
5 9 0.318876
5 10 0.960750
6 7 0.428481
6 8 0.798711
6 9 0.543386
6 10 0.277181
7 8 0.215006
7 9 0.285211
7 10 0.772858
8 9 0.963206
8 10 0.676292
9 10 0.412905

Adjacency Matrix representation of the edges

1 2 3 4 5 6 7 8 9 10
1 0.0000 0.4505 0.8332 0.4498 0.5397 0.2935 0.4968 0.5149 0.8405 0.4128
2 0.4505 0.0000 0.6841 0.9637 0.9435 0.1927 0.9242 0.2018 0.4251 0.6139
3 0.8332 0.6841 0.0000 0.7500 0.6756 0.2937 0.8162 0.0431 0.9227 0.4587
4 0.4498 0.9637 0.7500 0.0000 0.0343 0.8403 0.9253 0.1182 0.5479 0.7799
5 0.5397 0.9435 0.6756 0.0343 0.0000 0.6244 0.2271 0.6953 0.3189 0.9608
6 0.2935 0.1927 0.2937 0.8403 0.6244 0.0000 0.4285 0.7987 0.5434 0.2772
7 0.4968 0.9242 0.8162 0.9253 0.2271 0.4285 0.0000 0.2150 0.2852 0.7729
8 0.5149 0.2018 0.0431 0.1182 0.6953 0.7987 0.2150 0.0000 0.9632 0.6763
9 0.8405 0.4251 0.9227 0.5479 0.3189 0.5434 0.2852 0.9632 0.0000 0.4129
10 0.4128 0.6139 0.4587 0.7799 0.9608 0.2772 0.7729 0.6763 0.4129 0.0000

Below is an implementation of a GNN that takes the adjacency matrix as its edge input.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
# Sample data for the nodes in the graph
nodes_data = {
    'node_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'age': [41, 40, 21, 43, 31, 30, 28, 29, 39, 32],
    'gender': [2, 0, 2, 2, 1, 2, 2, 1, 2, 1],  # 0: Unknown, 1: Male, 2: Female
    'occupation': [0, 3, 1, 4, 3, 0, 1, 4, 1, 0]
}
# Adjacency matrix representing the edges data between nodes
adjacency_matrix = np.array([
    [0.0000, 0.4505, 0.8332, 0.4498, 0.5397, 0.2935, 0.4968, 0.5149, 0.8405, 0.4128],
    [0.4505, 0.0000, 0.6841, 0.9637, 0.9435, 0.1927, 0.9242, 0.2018, 0.4251, 0.6139],
    [0.8332, 0.6841, 0.0000, 0.7500, 0.6756, 0.2937, 0.8162, 0.0431, 0.9227, 0.4587],
    [0.4498, 0.9637, 0.7500, 0.0000, 0.0343, 0.8403, 0.9253, 0.1182, 0.5479, 0.7799],
    [0.5397, 0.9435, 0.6756, 0.0343, 0.0000, 0.6244, 0.2271, 0.6953, 0.3189, 0.9608],
    [0.2935, 0.1927, 0.2937, 0.8403, 0.6244, 0.0000, 0.4285, 0.7987, 0.5434, 0.2772],
    [0.4968, 0.9242, 0.8162, 0.9253, 0.2271, 0.4285, 0.0000, 0.2150, 0.2852, 0.7729],
    [0.5149, 0.2018, 0.0431, 0.1182, 0.6953, 0.7987, 0.2150, 0.0000, 0.9632, 0.6763],
    [0.8405, 0.4251, 0.9227, 0.5479, 0.3189, 0.5434, 0.2852, 0.9632, 0.0000, 0.4129],
    [0.4128, 0.6139, 0.4587, 0.7799, 0.9608, 0.2772, 0.7729, 0.6763, 0.4129, 0.0000]
], dtype=np.float32)
# Convert the node data into a DataFrame
nodes_df = pd.DataFrame(nodes_data)
# Convert node_id to zero-based indexing (if needed for model consistency)
nodes_df['node_id'] = nodes_df['node_id'] - 1
# Extract features from the DataFrame and convert to numpy array
features = nodes_df[['age', 'gender', 'occupation']].to_numpy()
num_features = features.shape[1] # Number of features for each node
num_nodes = features.shape[0] # Number of nodes in the graph
# Target labels representing genders
target_labels = nodes_df['gender'].to_numpy()
# Define a custom Graph Convolution Layer
class GraphConvLayer(layers.Layer):
    def __init__(self, output_dim, **kwargs):
        super(GraphConvLayer, self).__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        feature_shape = input_shape[0][-1]  # Shape of the input features
        # Initialize the weights for the layer
        self.kernel = self.add_weight(
            shape=(feature_shape, self.output_dim),
            initializer='glorot_uniform',
            name='kernel'
        )

    def call(self, inputs):
        features, adj_matrix = inputs
        # Perform graph convolution by multiplying adjacency matrix with features
        output = tf.matmul(adj_matrix, features)
        # Apply the learned weights
        output = tf.matmul(output, self.kernel)
        return output

# Function to create the GNN model
def create_gnn_model(input_shape, output_dim, num_nodes):
    # Define input layers for features and adjacency matrix
    features_input = keras.Input(shape=(num_nodes, input_shape), name='features')
    adj_matrix_input = keras.Input(shape=(num_nodes, num_nodes), name='adj_matrix')
    # Apply the first Graph Convolution Layer
    x = GraphConvLayer(16)([features_input, adj_matrix_input])
    x = layers.ReLU()(x)  # Apply ReLU activation
    # Apply the second Graph Convolution Layer
    x = GraphConvLayer(output_dim)([x, adj_matrix_input])
    return keras.Model(inputs=[features_input, adj_matrix_input], outputs=x)
# Create the GNN model
gnn_model = create_gnn_model(num_features, 3, num_nodes) # 3 output classes for gender (Unknown, Male, Female)
# Compile the model with Adam optimizer and Sparse Categorical Crossentropy loss
gnn_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.01),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy(name='acc')]
)
# Data preparation for training
features_input = features.astype(np.float32) # Convert features to float32 type
adj_matrix_input = adjacency_matrix.astype(np.float32) # Convert adjacency matrix to float32 type
# Expand dimensions to match the input shape (batch size, num_nodes, num_features)
features_input = np.expand_dims(features_input, axis=0)
adj_matrix_input = np.expand_dims(adj_matrix_input, axis=0)
target_labels = np.expand_dims(target_labels, axis=0) # Expand dimensions of target_labels to match the batch size
# Print shapes of inputs and targets for verification
print("features_input shape:", features_input.shape)
print("adj_matrix_input shape:", adj_matrix_input.shape)
print("target_labels shape:", target_labels.shape)
# Train the model
history = gnn_model.fit(
    x=[features_input, adj_matrix_input],  # Inputs to the model
    y=target_labels,  # Target labels
    epochs=100,  # Number of epochs
    batch_size=1,  # Batch size
    validation_split=0  # Set validation_split to 0
)
# Plot training loss over epochs
plt.plot(history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
# Plot training accuracy over epochs
plt.plot(history.history['acc'])
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.show()
Reinderien
asked Jul 17, 2024 at 15:57

1 Answer


For the most part this is textbook, bog-standard boilerplate of uncertain original authorship.

zero origin

This doesn't seem like a convenient notation.

nodes_data = {
 'node_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 ...
# Convert node_id to zero-based indexing (if needed for model consistency)
nodes_df['node_id'] = nodes_df['node_id'] - 1

Why not just adopt sensible node identifiers from the get-go? As written, I need to keep worrying about "node 3, now is that the adjusted node 3 or the input node 3?"

Also, you appear to be offering an undirected graph using digraph notation. Consider writing down just the lower left triangle, and then have code express the notion that upper right is definitely the exact mirror image, with no typographic transcription errors.
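Something along these lines could make that concrete. This is only a sketch: the zero-based ids and the four edges shown are illustrative, not a drop-in replacement for the full edge list.

import numpy as np
import pandas as pd

# Keep only one direction of each undirected edge (the lower-left triangle,
# i.e. from > to), with zero-based node ids from the start.
edges = pd.DataFrame({
    'from':   [1, 2, 2, 3],                      # illustrative subset of the edge list
    'to':     [0, 0, 1, 0],
    'weight': [0.4505, 0.8332, 0.6841, 0.4498],
})

num_nodes = 10
adjacency_matrix = np.zeros((num_nodes, num_nodes), dtype=np.float32)
adjacency_matrix[edges['from'], edges['to']] = edges['weight']

# The upper-right triangle is generated from the lower-left one, so it is the
# exact mirror image by construction, with no hand-transcribed values.
adjacency_matrix = adjacency_matrix + adjacency_matrix.T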

redundant conversion

adjacency_matrix = np.array([ ...
], dtype=np.float32)
...
adj_matrix_input = adjacency_matrix.astype(np.float32) # Convert adjacency matrix to float32 type

We already had floats.

Also, the code already told us we were converting to type float, there's no need for an English sentence to say the exact same thing. We express "how?" in the code, and "why?" in the comments.
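Concretely, something like this would do (a sketch; the toy 2×2 matrix stands in for the real one):

import numpy as np

adjacency_matrix = np.array([
    [0.0000, 0.4505],
    [0.4505, 0.0000],
], dtype=np.float32)  # float32 from the start

# Later on, only the batch dimension needs adding; there is no .astype() pass
# and no comment restating the dtype.
adj_matrix_input = np.expand_dims(adjacency_matrix, axis=0)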

motivation

You didn't set up the problem to be solved, you cited no authors, it's unclear what the meaning of e.g. 0.8332 as an edge weight is supposed to be. If the task is to infer gender from observed occupation, then tell us that. We can only judge correctness against some written specification. As written, it's hard to say much more about the code than "it ran, and it didn't crash".

Peilonrayz
answered Jul 17, 2024 at 17:35
