Foundations of Geometric Deep Learning

•

Symmetriesaredefinedoversignalspaces,definedoveraphysical

domain.

•

Inordertobeatthecurseofdimensionality,weextendtheinvariance

prior with deformation stability and scale separation.

•

These priors can be used constructively in a generic neural architecture:

the GDL blueprint.

Intheprevious chapter, wehaveintroducedtheformalismofgrouptheory

to account for the symmetries of ageneric learning task. While this formalism

provides thebasicbackbonetothink aboutgeometricdeeplearning,weneed

to complement it with additional structuralassumptions in order to effectively

beat the curse of dimensionality.

First,theinputspaceXofmanylearningproblemsishigh-dimensional,

where symmetrystructures arechallenging todetermine in allgenerality. Our

first taskwill betodescribe insteadinputspaces assignal spaces,wherex∈X

isitselfamappingx:Ω→CfromaphysicaldomainΩtoanoutputchannel

spaceC.Insomeapplications,itmaybeconvenienttoconsiderinsteadthe

domain Ωas being the input, thus varying. Once these objects will be defined,

wewillseehowthenotionsofinvarianceandequivarianceapplytothissetting,

thus defining the familiar neural network geometric layers.

The physical domain isequipped with a metric structure,which will enable

ustogosubstantiallyfurtherthanthealgebraicnotionsofinvarianceandequiv-

ariance.Indeed, thisaddedmetric structureenablesthe notionofscale, iethe

factthatsignalmeasurements canbearrangedintoneighborhoodsofvarying

size.Asitturnsout,thismetricstructureprovidestwokeygeometricpri-

ors:stabilitytodeformationsandscaleseparation.Finally,combiningboth

algebraicand geometricpriorswillnaturally leadtoa blueprintfordesigning

neural neworks — the GDL blueprint.

102Chapter 4

4.1Geometric Domains

Weconcluded inchapter 2thatlearningin highdimensionsisintractablewith-

outinductivebiases,andwhileknowledgeaboutsymmetriescouldprovide

astronginductivebias,wesawintheprevioussectionthatitisunlikelywe

willhaveknowledgeofthefullsymmetrygroupofthelabelfunction.For-

tunatelynatureprovidesuswithasolution:inagreatmanyproblems,the

dataweobservearesignalsonalow-dimensionaldomainwithstructureand

symmetries that can be leveraged for more statistically efficient learning.

4.1.1Physical Domains

The physical domain Ωis where signal measurements take place, and depend-

ingontheapplicationitmaybedefinedwithincreasingamountofstructure.

Thefollowingchapterswilldelveonthekeyrepresentativeapplications,but

here we give a first glimpse of the diversity of physical domains that arises.

Incomputervisionapplications,imagesxcanbeviewedassignalsovera

two-dimensionalgridΩ





1≤i,j≤N

,whereeachpixelu∈Ω encodes

colorintensitiesx(u)∈{1,255}

:=C.Itwillbeconvenienttoviewthese

domainsasdiscretizationsofanunderlyingcontinuousobjectasN →∞,given

byarectangulardomain

Ω=[0,1]

⊂R

,orinsomesettingsitsunbounded

extension

Ω=R

, as well as

C=R

AnotherrepresentativesettingisgivenbyΩ={1,N},asetofNelements.

Thisrepresentsthecrudestformofstructure,absentanynotionoftopology

normetric structure,and appliestosettings suchas pointclouds,in computer

graphics.ThenextfundamentaldomainisgivenbygraphsΩ=G=(V,E),

wherenowelementscanberelatedtoeachother,thusdefiningatopology

and ultimately a metric structure.

Finally, we also encountersituations where the inputto the learning system

isthedomainΩ itself,suchas3Dmeshesincomputergraphics,orgraphsin

somealgorithmictasks.Theproblemsweconsiderthuscomeintwobroad

categories: problemswhereeachdatapointx∈Xisasignalx:Ω→Conthe

samedomainΩ(e.g.everyimagecanbeviewedasafunctionontheplane),

andproblemswhereeachdatapointcorrespondstoadifferentdomain(e.g.

each data point is a different graph or mesh).

ThekeytakeawayisthatthedomainΩ isgenerallyamuchsimplerobject

thantheinputspaceX,enablingaricherdescriptionoftherelevantsymme-

tries.Beforeexecutingthisplan,weneedfirsttounderstandhowtorelateX

to Ω, and, importantly, how symmetries defined over Ωcarry over to X.

Foundations of Geometric Deep Learning103

4.1.2Spaces of Signals on a Domain

NowthatwehaveintroducedthesignaldomainΩ,wecanrevisittheinput

space to the ML model as a vector space of functions over Ω:

The spaceof C-valued signals onΩ

(for Ω aset, possibly withadditional

structure, and Ca vector space, whose dimensions are called channels)

X(Ω,C)={x:Ω→C}(4.36)

isafunctionspacethathasavectorspacestructure.Additionandscalar

multiplication of signals is defined as:

(αx+βy)(u)=αx(u)+βy(u)for allu∈Ω,

with real scalars α,β. Given an inner product ⟨v,w⟩

on Cand ameasure

μon Ω (with respect to which wecan define an integral), we can define an

inner product on X(Ω,C) as

⟨x,y⟩=

⟨x(u),y(u)⟩

dμ(u).(4.37)

Weobservetwosimple,yetimportantpropertiesofthisspace:whileinmany

caseslinearcombinationsofpointsonΩare notwell-defined

,wecanlinearly

combinesignalson it,i.e.,thespace ofsignals formsavectorspace.Moreover,

sincewecandefineaninnerproductbetweensignals,thisspaceisaHilbert

space.

How SignalsTransform: RegularandInduced RepresentationsGiven a

symmetrygroupGactingonthesignaldomainΩ,thisactioncanbenatu-

rally upgraded toXthrough the notionofinduced representation. Let us now

describe it in simple terms.

To fix ideas, suppose first that C=R. Denoting the action of Gon Ωby

G×ばつΩ→Ω(4.38)

(g,u)7→g·u ,(4.39)

we now consider the linear representation

G×ばつX→X(4.40)

(g,x)7→ρ(g)x ,(4.41)

with(ρ(g)x)(u):=x(g

–1

·u).Weverify thatρisindeed agrouprepresentation,

whichsimplyconvertstransformationsofthedomainintotransformationsof

signals, actingby composition.Figure 4.1illustratesa simpleexamplewhenG

isthetranslationgroupinΩ(assumingforsimplicityno boundaryconditions).

104Chapter 4

Figure 4.1

Example of the translation group defining a transformation over X.

Figure 4.2

Example of a color transformation by reflection g.y=±y.

Thisformalism notonlyenables toexpressdomaintransformations,butalso

moregeneralsituationswheregroupsarenotonlyactingonthedomainΩ

butalsoonthechannel(orfiber)C.Asamotivatingexample,supposenow

thatC=R

istheRGBcolorspace(assumedcontinous),andthat,ontopof

translations/rotations over Ω,we canalso performcolortransformations inC;

seeFigure4.2.Then,if

GisnowagroupactingonC,wecandefineajoint

representation for G×ばつ

Gas follows:

(ρ(g,

g)x)(u):=

–1

·(x(g

–1

·u)) .(4.42)

Finally,anotherrelevantscenarioiswhenthesamegroupGdefinesajoint

groupactiononbothdomainandchannelspaces,inwhichcasewewrite

(ρ(g)x)(u)=g

–1

·(x(g

–1

·u)). Figure4.3 illustrates thisscenario over 2D vector

fields and the rotation group.

Insummary,wehavenowseen howan initiallow-dimensional description

ofthesymmetries–expressedoverphysicaldomains–canbenaturallybe

extended to signals on that domain.

Foundations of Geometric Deep Learning105

Figure 4.3

Therotationgroupactingbothon signaldomainandoutput channelspace,here illus-

trated over a vector field.

4.1.3Fixed domain problems & domain symmetries

When thedomain Ωis fixed andknown inadvance, wecan determineits sym-

metrygroupandusethisknowledgeformoreefficientlearning.Thisworks

when the symmetriesof the domain arealso symmetries of the labelfunction,

whichis acommonoccurrence.For example,our datamayconsistofsignals

ontheplane(e.g.cameraimages)oronthesphere(e.g.globalweathersignals).

Shifts and translationsare symmetries of theplane, and 3D rotations aresym-

metries ofthe sphere (sincethese mapsare invertibleandstructure-preserving,

mapping the domain onto itself whilepreserving the relevant structure such as

distances).

Aswehavejustseen,thesesymmetriescanbeappliednotjusttopoints

inthedomainitself,butalsotosignalsonthedomain(See4.1.2),andwill

generallynotchangethesemanticsofthesignal.Hence,ourneuralnetwork

should process inputs related by symmetries in the same way.

Unlike the fullgroup of symmetries of thelabel function, we know symme-

tries ofthe domain apriori, sowe can usethis knowledge to ouradvantage to

constraintthefunctionclassF.Howthisisdoneexactlyviainvarianceand

equivariance will be discussed in section 4.5.

Finally,animportantwordonthedefintionofsymmetry:sincewedefined

symmetriesasstructurepreservinginvertiblemapsfromthedomaintoitself,

what counts as asymmetry depends on what structure one wishesto preserve.

For instance, in Ω=R

, rigid transformations of theform φ(x)=Ax+b, where

⊤

A=I

, areoften symmetries ofimage classification problems,since theydo

notchangethesemantics.However,insomecasesthisisnottrue:the‘clas-

sic’exampleisdigitclassificationbetweendigits‘6’and‘9’.Unlikegeneric

106Chapter 4

object classification, thistask is not rotation-invariant. Wewill thus be careful

to remember that symmetries are inherently a user-specified notion.

4.1.4Variable domain problems & domain isomorphisms

When thedomain isvariable, sois thesymmetry group. There maybe ways to

dealwiththesymmetriesofeachdomaininthecategoryofdomainsundercon-

sideration, but moreimportantly we needto make surewe dealcorrectly with

domainsthatlookdifferentbutareactuallyequivalent(isomorphic)insome

sense.Ingeneral,theappropriatenotionofequivalencedependsonthecon-

text, but what we can say in general isthat two domainsare equivalent if there

isaninvertibleandstructure-preservingmap(isomorphism)betweenthem.We

willseevariousexamplesof structureddomainsandstructure-preservingmaps

throughoutthechaptersofpartII,butfornowwewillsketchtwoimportant

examples: graphs and meshes/manifolds.

Onewaytorepresentagraphisasasetofnodelabels(e.g.arbitrarybut

uniqueintegers)togetherwithasetofedges(pairsofnodelabels).Thensimply

changing thenodelabels gives usa new graphthat isneverthelessequivalent,

and the mapping that sendseach label of the original graph tothe new label is

an isomorphism.

If we havetwo objectsthat are notidentical but still equivalent (under some

notion of equivalence that depends on thecontext), we need toensure that the

functionswe learnproduceanidentical oratleastequivalentoutput forthetwo

objects. For graph classification(chapter 5), thismeans thattwographs related

by a graph isomorphism (See fig. 4.4, left) should be assigned the same label.

Anotherexampleismeshesormanifolds(chapter8),whichmaybecon-

sideredequivalentifthereexistsadistance-preservingmap(i.e.weconsider

isometriestobeisomorphisms);seeFigure4.5.Again,herewewanttwo

meshesrelatedbyanisometrytobetreated thesameway;we wantthe method

tobeintrinsic.Weshallseehowtobuildinvariantrepresentationsforthese

type of surfaces in Chapter 8.

4.2Invariant Function Classes

Atthispoint,itisusefultorecallourlearningsetupfromChapter2,andthe

symmetrypriorsfromChapter3thatpresumablyalleviatethelearningtask.

Wehavef

∗

:X(Ω)→Yanunknownfunctionthatwewishtolearnusingan

hypothesisclassF.Moreover,wealsohaveasymmetrygroupGactingon

X,andthepromisethatf

∗

isG-invariant:f

∗

(g.x)=f

∗

(x)forallx∈Xandall

g∈G. What is the easiest way to combine these objects?

Foundations of Geometric Deep Learning107

123

456

123

456

100101

010001

001110

101101

001011

110111

110010

111101

011000

010101

100011

010111

APAP

⊤

Figure 4.4

Figure of graph isomorphisms. Above: representing a graph by integer node labels and

edges (iso = relabeling), and below: representing a graph as an adjacency matrix (iso =

permutation)

Assume for simplicity that the groupGis compact, meaning that it admits a

Haar measure

μ(g). This allows us to define a group averaging operator

f(x):=

μ(G)

f(g.x)μ(dg) .(4.43)

Inwords,thegroupaveragingoperatoraveragesanyfunctionf∈Foverthe

group orbits {g.x;g∈G}; see Figure 4.6.Observethat, since weare assuming

thatf

∗

isG-invariant,itholdsS

∗

.Sinceourtargetfunctionisconstant

alongorbits,itseemslikeagoodideatoenforceourhypothesistobealso

constantalongorbits—andhencetoreplacef∈Fbyitssmoothedversion

Inother words,we cannow turnourhypothesis spaceFintoa G-invariant

space via the averaging operator: S

F:={S

f;f∈F}.

108Chapter 4

Figure 4.5

The hand shape is both the domain and the input to the learning system.

Figure 4.6

Illustrationoftheorbitsoftherotationgroup,asitactsonimagesviathestandard

representation. TheGroupaveraging operatorS

averages ahypothesisf(x) over each

orbit.

Example1ConsiderΩ={1,...,N}thediscretesetofNelementsandG=

,thesymmetricgroupofNelements,whichactsonΩbypermutingits

elements.Weconsiderahypothesis classF⊂{f:R

→R}givenbyrealpoly-

nomialsf(x)=p(x

,...,x

)ofdegreeuptok.Theassociatedaveragedclass

F={S

p(x);p∈F}correspondsinthiscasetothespaceofsymmetric

polynomialsofdegree≤k.Itisgeneratedbytheso-calledringofpowersum

polymomials {q

,...,x

):=

i=1

NowletusformalizetheintuitionthatreplacingFwithS

Fisbenefi-

cialforlearningpurposes,focusingonaregressionproblemR(f)=E

x∼ν

|f(x)–

∗

(x)|

:=∥f–f

∗

∥

,whereνisthedatadistribution.Weassumethatthedata

distributionis compatiblewith thesymmetrystructurefromG: ifwedenote by

Foundations of Geometric Deep Learning109

[x]={g.x;g∈G}theorbitassociated withx,thenthe distributionofx, condi-

tionalon [x],isthe uniformmeasureon theorbit;precisely theHaarmeasure

of the group. Inother words,the data distribution is unchangedif we replace x

by g.x for any g∈G.

RecallthedecompositionoferrorfromChapter2.Letusfirstaddressthe

approximation error. We decompose

∥f–f

∗

∥

=∥f–S

f+S

f–f

∗

∥

=∥f–S

f∥

+∥S

f–f

∗

∥

+2⟨f–S

f,S

f–f

∗

⟩

=∥f–S

f∥

+∥S

f–f

∗

∥

+2E

[x]

x|[x]

[(f(x)–S

f(x))(S

f(x)–f

∗

(x))]

=∥f–S

f∥

+∥S

f–f

∗

∥

+2E

[x]

[(S

f(x)–S

f(x))(S

f(x)–f

∗

(x))]

=∥f–S

f∥

+∥S

f–f

∗

∥

wherethe fourthequalityholdsbecause f

∗

isconstant alongorbits.Inwords,

thegroupaveragingisanorthogonalprojectioninL

(X,ν).Moreover,we

immediatelydeducethat∥f–f

∗

∥

≥∥S

f–f

∗

∥

foranyf∈F,andtherefore

thattheapproximationerrorofS

FisupperboundedbythatofF.Addi-

tionally,wewon’tformalizeithere,butitisnothardtoseethatsincegroup

averagingisaprojection,thestatisticalerrorisreducedbyaveraging.We

thushaveahypothesisclassS

Fforwhichboththeapproximationandthe

statistical errors are upper bounded by those of F.

Thissoundslikegreatnewsindeed,reaffirmingtheadvantageofusing

knownsymmetriesofthetargetfunctionintoourhypothesisclass.However,

there are(at least)twoimportantissues atthis point.First, thegroup averaging

operatordefinesaconvenientmathematicalobject,butitrequiresanaverag-

ing overgroup orbits, whichcan become prohivitively expensiveveryquickly,

forsufficientlylargegroups.Next,wehavearguedthatstatisticalerrorswill

besmallerinthesymmetrichypothesisspace,butwehaven’tactuallyquan-

tifiedbyhowmuch.Asitturnsout,thegainscanberoughylyquantifiedas

follows:given adataset of n points,the symmetric learning amountsto having

instead ≃|G|n datapoints whenever Gis discrete.In the continuoussetting, the

gains amountto using[x] instead ofx∈R

, since [x]liveson aspace oflower

dimensionality|Ω|–dim(G)thanx.Whilethismaybeasubstantialgain,itdoes

not unfortunately break thecurse of dimensionality. Indeed, inthe example of

images,theEuclideangroupinR

hasatmost6degreesoffreedom,atiny

number against the thousands or millions of pixels!

Wethusneed toaddressthesetwomajor aspectswithadditional ideas.For

thatpurpose,wewillnowintroducetwoimportantconceptsthatextendthe

notion of symmetricrepresentation and enable efficient learning in theface of

110Chapter 4

Figure 4.7

Twoobjectsmovingatdifferentvelocitiesinavideodefineatransformationoutside

the translation group.

high-dimensionaldata.Wefirstdiscussstabilitytogeneraldomaintransfor-

mationsthatare‘near’-symmetries,andthenintroducetheconceptofscale

separation,whichallowsustodecomposethelearningtaskacrossdifferent

resolutions by exploiting local interactions.

4.3Deformation Stability

The symmetryformalism introduced sofar captures anidealised world where

weknowexactlywhichtransformationsaretobeconsideredassymmetries,

andwewanttorespectthesesymmetriesexactly.Forinstanceincom-

putervision,wemightassumethatplanartranslationsareexactsymmetries.

However, the real world is noisy and this model falls short.

First,whilethesesimplegroupsprovideawaytounderstandglobalsym-

metriesofthedomainΩ(andbyextension,ofsignalsonit,X(Ω)),theydo

notcapturelocalsymmetrieswell.Forinstance,consideravideoscenewith

severalobjects,eachmovingalongitsown differentdirection.Atsubsequent

frames,theresultingscenewillcontainapproximatelythesamesemanticinfor-

mation,yetnoglobaltranslationexplainsthetransformationfromoneframe

toanother.Next,inothercases,suchasadeformable3Dobjectviewedby

acamera,itissimplyveryhardtodescribethegroupoftransformationsthat

preserve the object identity.

Theseexamplesillustratethat inrealitywe aremoreinterestedin afar larger

setoftransformationswhereglobal,exactinvarianceisreplacedbyalocal,

inexactone.Inourdiscussion,wewilldistinguishbetweentwoscenarios:

thesetting wherethedomain Ω isfixed, andsignalsx∈X(Ω) areundergoing

deformations, and the setting where the domain Ωitself may be deformed.

Foundations of Geometric Deep Learning111

Figure 4.8

Awarpingofthedomainthatisnot explainedbyanyrigidtransformationnearlypre-

serves the semantics.

4.3.1Stability to signal deformations

Inmany applications,weknow apriorithat asmalldeformation ofthesignal

x shouldnot changethe output off(x),so it istempting toconsider such defor-

mationsassymmetries.Forinstance,wecouldviewsmalldiffeomorphisms

τ∈Diff(Ω),orevensmall bijections, assymmetries. However, smalldeforma-

tionscanbecomposedtoformlargedeformations,so"smalldeformations"

donotformagroup

,andwecannotaskforinvarianceorequivarianceto

small deformations only. Sincelarge deformations can can actuallymaterially

changethesemanticcontentoftheinput,itisnotagoodideatousethefull

group Diff(Ω) as symmetry group either.

Abetterapproachistoquantifyhow"far"agivenτ∈Diff(Ω)isfroma

givensymmetrysubgroupG⊂Diff(Ω)(e.g.translations)withacomplexity

measure c(τ),so thatc(τ)=0 whenever τ∈G. We cannow replaceour previ-

ousdefinition ofexact invarianceand equivarance undergroupactions witha

‘softer’ notion of deformation stability (or approximate invariance):

∥f(ρ(τ)x)–f(x)∥≤Cc(τ)∥x∥,,∀x∈X(Ω),τ∈Diff(Ω) ,(4.44)

where ρ(τ)x(u)=x(τ

–1

u) as before,and whereC is some constantindependent

ofthesignalx.Afunctionf∈F(X(Ω))satisfyingtheaboveequationissaid

to be geometrically stable. We will seeexamples of such functionsin the next

Section 4.4.

Sincec(τ)=0 forτ∈G, thisdefinitiongeneralises theG-invarianceproperty

defined above. Itsutility inapplications dependson introducingan appropriate

deformationcost.InthecaseofimagesdefinedoveracontinuousEuclidean

plane,apopularchoiceisc

(τ):=

∥∇τ(u)∥

du,whichmeasuresthe‘elas-

ticity’ of τ,i.e., how different it isfrom the displacement bya constant vector

field. This deformation cost is in fact a norm often called the Dirichlet energy,

and can be used to quantify how far τis from the translation group.

112Chapter 4

Figure 4.9

The setofallbijective mappingsfromΩ intoitselfforms thesetautomorphismgroup

Aut(Ω),of whichasymmetrygroupG(shownas acircle)isa subgroup.GeometricSta-

bilityextendsthenotionofG-invarianceandequivarianceto‘transformationsaround

G’(shownasgrayring),quantifiedinthesenseofsomemetricbetweentransforma-

tions. In this example, a smooth distortion of the image is close to a shift.

4.3.2Stability to domain deformations

In many applications, the object being deformed is not the signal, but the geo-

metricdomainΩitself.Canonicalinstancesofthisareapplicationsdealing

withgraphsandmanifolds:agraphcanmodelasocialnetworkatdifferent

instanceoftimecontainingslightlydifferentsocialrelations(followgraph),

or a manifold can model a 3D object undergoing non-rigid deformations. This

deformationcanbe quantifiedas follows.If Ddenotes thespace ofall possible

variabledomains(suchasthespaceofallgraphs,orthespaceofRieman-

nianmanifolds), onecandefinefor Ω,

Ω∈Dan appropriatemetric(‘distance’)

d(Ω,

Ω)satisfyingd(Ω,

Ω)=0ifΩand

Ωareequivalentinsomesense:for

example,thegrapheditdistancevanisheswhenthegraphsareisomorphic,

and the Gromov-Hausdorff distance between Riemannian manifolds equipped

with geodesic distances vanishes when two manifolds are isometric

A commonconstruction ofsuch distancesbetween domainsrelies onsome

familyofinvertiblemappingη:Ω→

Ωthattryto‘align’thedomainsina

Foundations of Geometric Deep Learning113

waythatthecorrespondingstructuresarebestpreserved.Forexample, inthe

caseofgraphsorRiemannianmanifolds(regardedasmetricspaceswiththe

geodesicdistance),thisalignmentcancomparepair-wiseadjacencyordistance

structures (dand

d, respectively),

(Ω,

Ω)=inf

η∈G

∥d–

d◦(η×ばつη)∥

whereGisthegroupofisomorphismssuchasbijectionsorisometries,and

thenormisdefinedovertheproductspaceΩ×ばつΩ.Inotherwords,adistance

betweenelements ofΩ,

Ω is‘lifted’to adistancebetweenthe domainsthem-

selves, by accountingfor all thepossible alignmentsthat preserve the internal

structure

.Givenasignalx∈X(Ω)andadeformeddomain

Ω,onecanthen

consider the deformed signal

x=x◦η

–1

∈X(

Ω).

Byslightlyabusingthenotation,wedefineX(D)={(X(Ω),Ω):Ω∈D}

astheensembleofpossibleinputsignalsdefinedoveravaryingdomain.A

function f:X(D)→Yis stable to domain deformations if

∥f(x,Ω)–f(

Ω)∥≤C∥x∥d

(Ω,

Ω)(4.45)

forallΩ,

Ω∈D,andx∈X(Ω).Wewilldiscussthisnotionofstabilityinthe

contextof manifoldsin Chapter8, whereisometric deformationsplay acrucial

role. Furthermore, it can be shown that the stability to domain deformations is

a naturalgeneralisation ofthe stabilityto signaldeformations, byviewing the

latter intermsof deformationsof thevolume formGama,Ribeiro,andBruna

(2019).

4.4Scale Separation

While deformationstability substantiallystrengthens the globalsymmetry pri-

ors, itis notsufficient initself toovercome thecurse ofdimensionality,in the

sensethat,informallyspeaking,therearestill"toomany"functionsthatrespect

(4.44)asthesizeofthedomaingrows.Akeyinsighttoovercomethiscurse

istoexploitthe multiscalestructureofphysicaltasks. Letusfirstprovide the

essential idea before formalizing it with tools from harmonic analysis.

Byviewingtheinputfeaturesx

,...,x

=x(u

)∈Ctoalearningtask

as ‘particles’,one can interpretalearning systemas buildingan ‘energy’func-

tionorHamiltonianH(x

,...,x

),thatcomputesthelikelihoodofacertain

attribute,e.g. a cat detectorwould be associatedwith a ‘cat’function H so that

H(x)islargewhenevertheimagexcontainsacat.ThesimplestHamiltoni-

ans in physics are those which are separable: H(x)=

h(x

). In our example,

that wouldcorrespond to a catdetector that simplyaggregatesindividual pixel

features,irrespectiveof theirlocation.Sincehisalow-dimensionalfunction,

114Chapter 4

learningsuchseparablefunctionalsposesnostatisticalchallengence;how-

ever, this model,called amean-fieldmodel in physics,disregards any formof

interaction between features/particles,and as such isnot able to capture many

properties of interest.

Itthusseemslikeagoodideatoconsiderabroaderclassofenergyfunc-

tionsthataccountforinteractions.WhilegenericN-particleinteractions define

function spaces with the unavoidable curse of dimensionality, a natural idea is

to focus onlyon dominant interactions.As it turnsout, many physicalsystems

aredominatedbylocalinteractions,meaningthat‘nearby’particlesprovide

mostoftheinformation.Inotherwords,theenergymodelnowbecomes

H(x)=

h({x

;i∼j}),wherei∼jinformallydenotesthatparticlesiandj

areneighbors.Luckily,theseneighborhoodsareoftenofsizeonly depending

onthedimensionofthephysicaldomainΩ,andthus≪N.Backtoourcat

detector example, suchAnsatz wouldfocus on capturinginteractions between

neighboringpixels—whichmakessense,sinceshapesandtexturesform-

ingcomplexvisualconceptsaredefinedbynotionsofgeometry,andnatural

imagesareknowntohaveapower-lawcorrelationstructure(vanderSchaaf

andvanHateren1996).Theideaoffocusingonlocalinteractionshasbeen

presentinMLsinceatleasttheworkofGemanandGeman(1984)andtheir

seminal Markov Random Field models.

Wenowturntoformalizethispowerfulinductivebiasusingmultiscale

representations.Beforedescribingmultiscalerepresentations,weneedtointro-

duce themain elements ofFourier transforms, whichrely on frequency rather

than scale.

4.4.1Fourier Transform and Global invariants

ArguablythemostfamoussignaldecompositionistheFouriertransform,

thecornerstoneofharmonicanalysis.Theclassicalone-dimensionalFourier

transform

x(ξ)=

+∞

–∞

x(u)e

–iξu

expresses thefunction x(u)∈L

(Ω) onthe domainΩ=Rasa linearcombina-

tion oforthogonal oscillating basisfunctions φ

(u)=e

iξu

, indexedby theirrate

ofoscillation(orfrequency)ξ.Suchanorganisationintofrequenciesreveals

importantinformationaboutthesignal,e.g.itssmoothnessandlocalisation.

TheFourierbasisitselfhasa deepgeometric foundationand canbe interpreted

asthenaturalvibrationsofthedomain,relatedtoitsgeometricstructure(see

e.g. Berger (2012)).

Foundations of Geometric Deep Learning115

Figure 4.10

Left:Fourierbasisfunctionshaveglobalsupport.Asaresult,localsignalsproduce

energyacrossallfrequencies. Right:Waveletsψ

u,ξ

(v)=ψ



v–u



atdifferentresolutions

ξandpositionsu.ContrarytotheFourieratoms,wavelets arelocalizedbothinspace

and frequency.

TheFourier transform

playsacrucialrole insignalprocessingasitoffers

a dual formulation of convolution,

(x⋆θ)(u)=

+∞

–∞

x(v)θ(u–v)dv

a standard modelof linear signal filtering(here andin the following, x denotes

thesignaland θthefilter).As wewillshow inthefollowing,the convolution

operatorisdiagonalisedintheFourierbasis,makingitpossibletoexpress

convolution as the product of the respective Fourier transforms,

(x⋆θ)(ξ)=

x(ξ)·

θ(ξ),

a fact known in signalprocessing as theConvolutionTheorem. We have stated

the Fourier transformin theone-dimensional case for simplicity ofexposition,

butthereadershouldrestassuredthatalltheseimportantcorrespondences

remain valid in full generality.

Asitturnsout, manyfundamentaldifferentialoperatorssuch astheLapla-

cianaredescribedasconvolutionsonEuclideandomains.Sincesuchdiffer-

ential operatorscan be definedintrinsically over very general geometries,this

providesaformalproceduretoextendFouriertransformsbeyondEuclidean

domains,includinggraphs,groupsandmanifolds.Wewilldiscussthisindetail

in Chapters 6 and8.

AnessentialaspectofFouriertransformsisthattheyrevealglobalproperties

of the signal and thedomain, such as smoothnessor conductance. Such global

behaviorisconvenientinpresenceofglobalsymmetriesofthedomainsuch

astranslation,butnottostudymoregeneraldiffeomorphisms.Thisrequires

116Chapter 4

arepresentationthattradesoffspatialandfrequentiallocalisation,aswesee

next.

4.4.2Multiscale representations

Thenotionoflocal invariancecanbearticulatedbyswitchingfrom aFourier

frequency-basedrepresentationtoascale-basedrepresentation,thecorner-

stoneofmulti-scaledecompositionmethodssuchaswavelets

.Theessential

insightofmulti-scalemethodsistodecomposefunctionsdefinedoverthe

domainΩintoelementaryfunctionsthatarelocalisedbothinspaceand

frequency.ContrarytoFourier,waveletatomsarelocalisedandmulti-scale,

allowingtocapturefinedetailsofthesignalwithatomshavingsmallspatial

supportandcoarsedetailswithatomshavinglargespatialsupport.Theterm

atomhereissynonymouswith‘basiselement’inFourieranalysis,withthe

caveatthatwaveletsareredundant(over-complete).Inthecaseofwavelets,

thisisachievedbycorrelatingatranslatedanddilatedfilter(motherwavelet)

ψ,producing acombinedspatio-frequency representationcalled acontinuous

wavelet transform

x)(u,ξ)=ξ

–1/2

+∞

–∞



v–u



x(v)dv.

Thetranslated anddilatedfilters arecalledwavelet atoms;theirspatialposition

anddilationcorrespondtothecoordinatesuandξofthewavelettransform.

These coordinatesareusually sampleddyadically (ξ=2

–j

and u=2

–j

k),with j

referredtoasscale.Multi-scalesignalrepresentationsbringimportantbenefits

in termsof capturingregularity properties beyondglobal smoothness,such as

piece-wisesmoothness,whichmadethemapopulartoolinsignalandimage

processing and numerical analysis in the 90s.

4.4.3Deformation stability of Multiscale representations

ThebenefitofmultiscalelocalisedwaveletdecompositionsoverFourier

decompositions is revealed when considering the effect of small deformations

‘nearby’the underlyingsymmetrygroup.Letus illustratethisimportantcon-

ceptintheEuclideandomainandthetranslationgroup.SincetheFourier

representationdiagonalisestheshiftoperator(whichcanbethoughtofas

convolution,aswewillseeinmoredetailinSection6),itisanefficient

representationfortranslationtransformations.However,Fourierdecomposi-

tionsareunstableunderhigh-frequencydeformations.Incontrast,wavelet

decompositions offer a stable representation in such cases.

Indeed,letusconsiderτ∈Aut(Ω)anditsassociatedlinearrepresentation

ρ(τ).Whenτ(u)=u–visashift,aswewillverifyinChapter6,theoperator

Foundations of Geometric Deep Learning117

Figure 4.11

Multiscaledecomposition ofan inputimage usingwavelets: eachscale decomposesthe

existingcoarse-grainedimageintoanevencoarserversion,andthedetailsneededto

reconstruct it.

ρ(τ)=S

isashiftoperatorthatcommuteswithconvolution.Sinceconvolu-

tion operatorsarediagonalised bythe Fouriertransform, theaction ofshiftin

thefrequencydomainamountstoshiftingthecomplexphaseoftheFourier

transform,

(

x)(ξ)=e

–iξv

x(ξ).

Thus,theFouriermodulusf(x)=|

x|removingthecomplexphaseisasimple

shift-invariantfunction,f(S

x)=f(x).However,ifwehaveonlyapproximate

translation, τ(u)=u– ̃τ(u) with ∥∇τ∥

∞

=sup

u∈Ω

∥∇ ̃τ(u)∥≤ε, the situation is

entirely different: it is possible to show that

∥f(ρ(τ)x)–f(x)∥

∥x∥

=O(1)

irrespectiveofhowsmallεis(i.e.,howcloseisτtobeingashift).Conse-

quently,suchFourierrepresentationis unstableunderdeformations,however

small.Thisunstabilityismanifestedingeneraldomains andnon-rigidtransfor-

mations;wewill seeanotherinstanceofthis unstabilityintheanalysisof 3D

shapes usingthenatural extension ofFourier transformsdescribed inChapter

Waveletsofferaremedyto thisproblem thatalsorevealsthepowerofmulti-

scalerepresentations.Intheaboveexample,wecanshow(Mallat2012)that

118Chapter 4

the wavelet decomposition W

x isapproximately equivariantto deformations,

∥ρ(τ)(W

x)–W

(ρ(τ)x)∥

∥x∥

=O(ε).

In otherwords, decomposingthesignal informationinto scales usinglocalised

filtersratherthanfrequenciesturnsaglobalunstablerepresentationintoa

familyoflocallystablefeatures.Importantly,suchmeasurementsatdifferent

scalesarenotyetinvariant, andneed tobe progressivelyprocessed towards the

lowfrequencies,hintingatthedeepcompositionalnatureofmodernneuralnet-

works, and captured in our Blueprint forGeometric Deep Learning, presented

next.

4.4.4Scale Separation Prior

Wecanbuildfromthisinsightbyconsideringamultiscalecoarseningofthe

datadomainΩintoahierarchyΩ

,...,Ω

.Asitturnsout,suchcoarsening

canbedefinedonverygeneraldomains,includinggrids,graphs,andman-

ifolds.Informally,acoarseningassimilatesnearbypointsu,u

′

∈Ωtogether,

andthusonlyrequiresanappropriatenotionofmetricinthedomain.If

(Ω

):={x

:Ω

→C

}denotessignalsdefinedoverthecoarseneddomain

,weinformallysaythatafunctionf:X(Ω)→Yislocallystableatscalej

if itadmits a factorisationof theform f≈f

◦P

, whereP

:X(Ω)→X

(Ω

) isa

non-linearcoarsegraining andf

(Ω

)→Y. Inotherwords,while thetarget

function fmight depend on complex long-range interactions between features

overthewholedomain,inlocally-stablefunctionsitispossibletoseparate

theinteractionsacrossscales,byfirstfocusingonlocalisedinteractionsthat

are then propagated towards the coarse scales. Figures 4.12and 4.13 illustrate

simple examples of this scale factorization.

Suchprinciples

areoffundamentalimportanceinmanyareasofphysics

andmathematics,asmanifestedforinstanceinstatisticalphysicsintheso-

calledrenormalisationgroup,orleveraged inimportantnumericalalgorithms

suchastheFastMultipoleMethod.Inmachinelearning,multiscalerepre-

sentationsandlocalinvariancearethefundamentalmathematicalprinciples

underpinningtheefficiencyofConvolutionalNeuralNetworksandGraph

NeuralNetworksandaretypicallyimplementedintheformoflocalpooling.In

future work, we will further develop tools from computational harmonic anal-

ysisthatunifytheseprinciplesacrossourgeometricdomainsandwillshed

light onto the statistical learning benefits of scale separation.

Foundations of Geometric Deep Learning119

Figure 4.12

IllustrationofScaleSeparationfor imageclassificationtasks. Theclassifierf

′

defined

on signalson thecoarse gridX(Ω

′

) shouldsatisfy f≈f

′

◦P, whereP:X(Ω)→X(Ω

′

Here,thecoarsescalesdominate:todiscriminatebetweenthesetwoclasses,onecan

coarse grain the domain and drop all the details below a certain resolution.

Figure 4.13

IllustrationofScaleSeparationfor imageclassificationtasks. Theclassifierf

′

defined

on signalson thecoarse gridX(Ω

′

) shouldsatisfy f≈f

′

◦P, whereP:X(Ω)→X(Ω

′

Here,the finescales dominate:to discriminatebetween thesetwoclasses, itis sufficient

to focus on local statistics and then combine them with a simple averaging.

4.5The GDL Blueprint

ThegeometricprinciplesofSymmetry,GeometricStability,andScaleSep-

arationdiscussedsofarcanbecombinedtoprovideauniversalblueprintfor

learningstablerepresentationsofhigh-dimensionaldata.Theserepresentations

willbeproducedbyfunctionsfoperatingonsignalsX(Ω,C)definedonthe

domain Ω, which is endowed with a symmetry group G.

120Chapter 4

Thegeometricpriorswehavedescribedsofardonotprescribeaspecific

architectureforbuildingsuchrepresentation,butratheraseriesofnecessary

conditions. However, theyhint atanaxiomatic constructionthat provably sat-

isfiesthesegeometricpriors,whileensuringahighlyexpressive representation

that can approximate any target function satisfying such priors.

4.5.1Invariant and Equivariant Layers

Asimple initialobservationisthat,inordertoobtaina highlyexpressiverepre-

sentation, weare requiredto introducea non-linearelement, since if fis linear

and G-invariant, then for all x∈X(Ω),

f(x)=

μ(G)

f(g.x)dμ(g)=f



μ(G)

(g.x)dμ(g)



whichindicatesthatFonlydependsonxthroughtheG-averageAx=

μ(G)

(g.x)dμ(g).Inthecaseofimagesandtranslation,thiswouldentailusing

only the average RGB color of the input!

While this reasoning shows that the family of linear invariants is not a very

richobject,thefamily oflinearequivariantsprovides amuchmorepowerful

tool,sinceitenablestheconstructionofrichandstablefeaturesbycom-

positionwithappropriatenon-linearmaps,aswewillnowexplain.Indeed,

ifB:X(Ω,C)→X(Ω,C

′

)isG-equivariantsatisfyingB(g.x)=g.B(x)forall

x∈Xandg∈G,andσ:C

′

→C

′′

isanarbitrary(non-linear)map,thenwe

easily verifythat thecomposition U:=(σ◦B):X(Ω,C)→X(Ω,C

′′

) isalso G-

equivariant,whereσ:X(Ω,C

′

)→X(Ω,C

′′

)istheelement-wiseinstantiation

of σgiven as (σ(x))(u):=σ(x(u)).

ThissimplepropertyallowsustodefineaverygeneralfamilyofG-

invariants,bycomposingUwiththegroupaveragesA◦U :X(Ω,C)→C

′′

AnaturalquestionisthuswhetheranyG-invariantfunctioncanbeapprox-

imatedatarbitraryprecisionbysuchamodel,forappropriatechoicesofB

and σ. It isnothard to adapt the standard Universal Approximation Theorems

from unstructuredvectorinputs to show thatshallow ‘geometric’networks are

alsouniversalapproximators,byproperlygeneralisingthegroupaverageto

a generalnon-linear invariant

. However, asalready describedin thecase of

FourierversusWaveletinvariants,there isafundamental tensionbetweenshal-

low globalinvarianceanddeformationstability.Thismotivatesanalternative

representation, which considers instead localisedequivariant maps

AssumingthatΩisfurtherequippedwithadistancemetricd,wecallan

equivariantmapUlocalisedif(Ux)(u) dependsonlyonthe values ofx(v)for

={v:d(u,v)≤r},forsomesmallradiusr;thelattersetN

iscalledthe

receptive field.

Foundations of Geometric Deep Learning121

Figure 4.14

GeometricDeepLearningblueprint,exemplifiedonagraph.AtypicalGraphNeural

Networkarchitecturemaycontainpermutationequivariantlayers(computingnode-

wisefeatures),localpooling(graphcoarsening),andapermutation-invariantglobal

pooling layer (readout layer).

AsinglelayeroflocalequivariantmapU cannotapproximatefunctionswith

long-rangeinteractions,butacompositionofseverallocalequivariantmaps

◦U

J–1

···◦U

increasesthereceptivefield

whilepreservingthestabil-

itypropertiesof localequivariants.Thereceptivefieldisfurtherincreased by

interleavingdownsamplingoperatorsthatcoarsenthedomain(againassum-

ingametricstructure),completingtheparallelwithMultiresolution Analysis

(MRA, see e.g. Mallat (1999)).

Insummary,thegeometryoftheinputdomain,withknowledgeofan

underylingsymmetrygroup,providesthreekeybuildingblocks:(i)alocal

equivariantmap,(ii)aglobalinvariantmap,and(iii)acoarseningoperator.

Thesebuildingblocksprovidearichfunctionapproximationspacewithpre-

scribedinvarianceandstabilitypropertiesbycombiningthemtogetherina

scheme we refer to as the Geometric Deep Learning Blueprint (Figure 4.14).

122Chapter 4

Geometric Deep Learning Blueprint

LetΩ andΩ

′

bedomains,Gasymmetrygroupover Ω,andwriteΩ

′

⊆Ω

if Ω

′

can be considered a compact version of Ω.

We define the following building blocks:

LinearG-equivariantlayerB:X(Ω,C)→X(Ω

′

)satisfying

B(g.x)=g.B(x) for all g∈Gand x∈X(Ω,C).

Nonlinearity σ:C→C

′

applied element-wise as (σ(x))(u)=σ(x(u)).

Local pooling (coarsening) P:X(Ω,C)→X(Ω

′

,C), such that Ω

′

⊆Ω.

G-invariantlayer(globalpooling)A:X(Ω,C)→Ysatisfying

A(g.x)=A(x) for all g∈Gand x∈X(Ω,C).

Usingthese blocksallowsconstructing G-invariantfunctions f:X(Ω,C)→

Yof the form

f=A◦σ

◦B

◦P

J–1

◦...◦P

◦σ

◦B

wheretheblocksareselectedsuchthattheoutputspaceofeachblock

matchestheinputspaceofthenextone.Differentblocksmayexploit

different choices of symmetry groups G.

4.5.2Different settings of Geometric Deep Learning

Onecanmakeanimportantdistinctionbetweenthesettingwhenthedomain

Ωisassumedtobefixedandoneisonlyinterestedinvaryinginputsignals

definedonthatdomain,orthedomainispartoftheinputasvariestogether

withsignalsdefinedonit.Aclassicalinstanceoftheformercaseisencoun-

tered incomputer visionapplications, whereimages areassumed tobe defined

onafixeddomain(grid).Graphclassificationisanexampleofthelatterset-

ting,whereboththestructureofthegraphaswellasthesignaldefinedonit

(e.g.nodefeatures)areimportant.Inthecaseofvaryingdomain,geometric

stability(inthe senseofinsensitivity tothedeformationofΩ)playsa crucial

role in Geometric Deep Learning architectures.

Foundations of Geometric Deep Learning123

Thisblueprint hastherightlevelof generalitytobeused acrossawiderange

ofgeometric domains.DifferentGeometric DeepLearningmethodsthusdiffer

intheirchoiceofthedomain,symmetrygroup,andthespecificimplementation

details of theaforementioned building blocks. As wewill see inthe following,

alargeclassofdeeplearningarchitecturescurrentlyinusefallintothisscheme

and can thus be derived from common geometric principles.

InthefollowingChapterswewilldescribethevariousgeometricdomains

focusingonthe‘5G’,aswellasthespecificimplementationsofGeometric

Deep Learning on these domains.

ArchitectureDomain ΩSymmetry group G

CNNGridTranslation

Spherical CNNSphere / SO(3)Rotation SO(3)

Mesh CNNManifoldIsometry Iso(Ω) /

Gauge symmetry SO(2)

GNNGraphPermutation Σ

Deep SetsSetPermutation Σ

TransformerComplete GraphPermutation Σ

E(3) GNNsGeometric GraphPermutation Σ

×ばつEuclidean E(3)

LSTM1D GridTime warping

4.6Summary

Inthischapter,wehave takentheideasdeveloped inChapter3anddeployed

theminlearningproblemswithphysicalstructure.Thisphysicalstructureis

encodedinasimpledomainΩwhererichsymmetriescanbedefined,and

then extended viacomposition tothe signalspace X(Ω).In orderto makethe

symmetriclearningprincipleseffective,weneedtoaugmentthepriorofinvari-

ance/equivariance to includealso stability todeformations, aswell as thescale

separationprinciple.Takentogether,theseprinciplesdefineacomputational

blueprint(theGDLblueprint)thatenablesarich,efficientclassofinvariant

andgeometricallystablerepresentationsparametrizedbyneuralnetworks.In

the following chapters, wewill instantiate this GDLblueprint to key represen-

tative domainsand explorethe specificgeometric propertiesassociated toeach

of them.

124Chapter 4

Exercises

1.Count the number ofsymmetries of alabel function. Assumefinite input space and

label space.

2.Prove that if wehave the full symmetrygroup of a label function,we only need one

label per class to learn thetrue function. (Maybe start with a sub-question:what are

the orbits of the full symmetry group of a label function?)

3.Showbasicpropertiesofgroupactions:e.g.howtheinverseofagroupelement

yields the inverse map on Ω, how identity acts as the identity map, etc.

4.Multiplicationtables:doesanyfinitegrouphaveamultiplicationtable?Canany

table wewrite downbe interpretedas defininga group? Ifno, listthe constraintsthe

table needs to satisfy.

5.Uniquenessofgenerators.Aretheyunique?Ifnot,findanalternativesetof

generatorsforD3,differentfromtheonesinthetext(rotation,flip).Writethe

multiplication table and Cayley diagram in terms of these generators.

6.If we have a group with generators, and wetake a subset ofgenerators, do we geta

subgroup?

7.Ifwehave agroupwithgeneratorsandasubgroup,doesthesubgroupnecessarily

correspond to a subset of generators?

8.Ifwehaveagroupelement, itcanalways bewrittenasaproductofgenerators.Is

this way of writing it unique?

Notes

WhenΩhassomeadditionalstructure,wemayfurtherrestrictthekinds

ofsignalsinX(Ω,C).Forexample,whenΩisasmoothmanifold,wemay

require the signals to be smooth. Whenever possible, we will omit the range C

for brevity.

WhenthedomainΩisdiscrete,μcanbechosenasthecountingmeasure,

inwhichcasetheintegralbecomesasum.Inthefollowing,wewillomitthe

measure and use du for brevity.

Ωmust be a vector space in order for an expression αu+βv to make sense.

Think aboutit asthe generalization ofthe uniformmeasure over the group,

so that each group element has equal ‘mass’; see Appendix B.

E.g.,thecompositionoftwoε-isometriesisa2ε-isometry,violatingthe

closure property.

Thegrapheditdistancemeasurestheminimalcostofmakingtwographs

isomorphicbyasequencesofgrapheditoperations.TheGromov-Hausdorff

distancemeasures thesmallest possiblemetricdistortion ofa correspondence

between two metric spaces, see Gromov (1981).

Foundations of Geometric Deep Learning125

TwographscanbealignedbytheQuadraticAssignmentProblem(QAP),

whichconsidersin itssimplestformtwographsG,

Gofthesamesizen,and

solvesmin

P∈Σ

trace(AP

⊤

),whereA,

Aaretherespectiveadjacency

matricesandΣ

isthegroupofn×ばつnpermutationmatrices.Thegraphedit

distance can be associated with such a QAP (Bougleux et al. 2015).

In the following, we will use convolution and (cross-)correlation

(x⋆θ)(u)=

+∞

–∞

x(v)θ(u+v)dv

interchangeably, asitis commoninmachine learning:the differencebetween

thetwoiswhetherthefilterisreflected,andsincethefilteristypically

learnable, the distinction is purely notational.

See Mallat (1999) for a comperehensive introduction.

Thisnotationimpliesthatρ(τ)actsonthespatialcoordinateof(W

x)(u,ξ).

FastMultipoleMethod(FMM)isanumericaltechniqueoriginallydevel-

opedtospeedupthecalculationoflong-rangedforcesinn-bodyproblems.

FMM groups sources that lieclose together and treats themas a single source.

Here,μ(g)isknownastheHaarmeasureofthe groupG,andtheintegral

is performed over the entire group.

Such proofshave been demonstrated, forexample,for the Deep Setsmodel

by Zaheer et al. (2017).

Meaningful metricscan bedefinedon grids,graphs, manifolds,and groups.

A notable exception are sets, where there is no predefined notion of metric.

Theterm‘receptivefield’originated inthe neuroscienceliterature,referring

to the spatial domain that affects the output of a given neuron.