160Chapter 5
Foragracefulintroductiontoboth,wewarmlyrecommendthebookfrom
Hamilton(2020).WhileitissomewhatdatedinthespecificGNNmodelsit
studies,itprovidesagentleandintuitiveoverviewofcomputingimportant
metricson agraph,aswell asmanyrelevantmodelsthat precededgraphneural
networks (such as node embedding techniques).
Then,shouldyouwishtodivedeeperintothemanyexcitingfacetsof
graphmachinelearning,thebookofL.Wuetal.(2022)isanexcellentand
comprehensive reference which still holds up very well today.
InparallelwithdivingdeeperintoGNNs,orevenbeforedoingso,we
believeitisworthwhiletogainastrongfoundationingraphtheory. Thiswill
beofsignificantvalue:bothingainingimportantintuitionsforunderstand-
ingGNNs,andforinspirationonwaystofurtherimprovethem.For this,we
recommend the foundational text from Bollobás (1998).
Exercises
1.A linear permutation-equivariant functionon a set is afunction of the form F(X)=
FXsatisfyingFPX=PFXforeverypermutationmatrixP.Showthatanylin-
earpermutation-equivariantfunctioncanbewrittenasalinearcombinationofthe
identity and average functions, i.e. FX=(I+11
⊤
)X.
2.Let Xbeafeature spacecontaining acountable setofpossible values. Thismeans
that every possible set ofvalues coming from Xcan be represented by a bitmask in
B
X
(where B={0,1} is the set of bits), and any permutation-invariant real function
over sucha set must be representable asf:B
X
→R. Show that such a function can
alwaysbeexpressedasaDeepSetsmodel,i.e.,f(X)=φ
P
i∈V
ψ
(
x
i
)
,forsome
choice of ψ:X→Rand φ:R→R, for every set V∈B
X
.
3.Considertherecipeforbuildingpermutation-equivariantgraphneuralnetworks
F(X,A) fromEquation 5.53.Recallhowit leverages a localfunction φ(x
u
,X
N
u
),
applied to theneighbourhood of eachnode u∈Vinparallel. Prove that, if φis per-
mutationinvariant, thatis,φ(x
u
,PX
N
u
)=φ(x
u
,X
N
u
),then Fwillbepermutation
equivariant, thatis, F(PX,PAP
⊤
)=PF(X,A),for every permutation matrixP.
(Note that, when N
u
⊂V, the space of Pdiffers across the two equations.)
4.ApopularwaytoimplementattentionalGNNscomputesattentionallogitsas
a(x
u
,x
v
)=σ
a
⊤
W[x
u
∥x
v
]
,wherea:R
l
,W:R
l×ばつ2k
aretheparameters,∥:R
n
×ばつ
R
m
→R
n+m
isvector concatenation,andσ:R→Risamonotonicactivation func-
tion. Another popularvariant appliestwo of these operations inreverse: a(x
u
,x
v
)=
a
⊤
σ
W[x
u
∥x
v
]
.Explain,usingaproof,whichofthetwovariantsis moreexpres-
sive(Hint:considertherankingbetweenattentionallogitsacrossallthenodesin
the neighbourhood).Which of thetwo formsis more scalable,and why?(Note that,
as the softmax function is monotonic, you may ignore it for this question.)
5.RecallthatsubgraphGNNsshouldbeequivarianttoG=S
n
×ばつS
m
,whereS
n
isthe
n-elementpermutationgroup,nisthenumberofnodesinthegraphandmisthe