3
Foundations of Equivariant Deep Learning
Wedefinesymmetriesasstructure-preservingtransformationsofan
object, and study examples.
Symmetriesofalearningproblem aretransformationsoftheinputspace
that leave the output invariant.
Toexploitsymmetriesindeeplearning,weuseequivariantandinvariant
layers.
Symmetriesformgroups.Inthischapterwestudybasicnotionsfrom
group theory.
Groupsactondataviagroupactionsorrepresentations.Inthischapter
we study basic notions from representation theory.
Westudyequivariantmappingsandequivariantnetworksin general,and
provide an overview of common equivariant neural network layers.
Inthischapterwebeginourstudyofthefoundationsofgeometricdeep
learning.Thecentralconceptissymmetry:atransformationthatleavesall
relevantstructureofanobjectunchanged.Wewilllookatthewayinwhich
symmetries arisein machinelearning, andstudy thecentral algebraic structure
usedinthemathematicalstudyofsymmetries:thegroup.Wedefinegeneral
andlineargroupactions/representations,whichtellushowtheelementsof
anabstractgroupcanberepresentedbyconcrete(linear)maps.Usinggroup
representations, wecan specifyhow agroup actson various kinds ofgeomet-
ricdata,fromimagesto pointclouds totensor fields.Nextwe discussinvariant
andequivariantmaps,whicharethebuildingblocksofequivariantnetworks
and the primary toolswe use to leverage knowledge ofsymmetries to improve
thestatisticalefficiencyofdeeplearningmethods.Weendthechapterwith
theequivariancecookbook,whichprovidesanumberofcommonkindsof
equivariant maps from which equivariant networks can be constructed.
Ourfocusinthischapterisonthealgebraic(group-theoretic)aspectsof
GDL, whereas the next chapter will coverthe geometric ones.Concretely, this
meansthatinthepresentchapterwe considerdatathatlives inaplainvector
62Chapter 3
spacewithagroupofsymmetriesactingonit,butwithouttoomuchfurther
structure, while inthe nextchapter this vector spaceis taken to be thespace of
signals(functions)onaspace(e.g.graph,manifold,etc.).Thus,fornow,we
cantalkaboutsymmetries,groups,representationsandequivariance,butwe
willnot considerconceptssuch aslocalityandconvolution, whichdependon
the structure of the underlying space.
3.1Symmetry Transformations
Letusbeginbygivingageneraldefinitionofsymmetry,beforeweconsider
the role it plays in geometric deep learning. Just as how the best way to define
"vector"is"elementofavectorspace"(definingthethingbyitsrelationto
otherthingsofthesamekind
1
),onecoulddefine"symmetry"as"elementof
a symmetry group". Butalthough such a definition wouldbe general, compact
andprecise,itwouldnotbeveryintuitive.Sowebeginwithamoreinfor-
maldefinition,andthenshowhowthenotionofgroup(tobedefined)neatly
captures the idea.
Themodernconceptofsymmetryappliestoawiderangeofabstract
mathematicalstructures,butletusfirstconsiderthefamiliarexampleofan
equilateraltriangle(fig.3.1,left).Wesaythatrotationby120
isasym-
metryofthetriangle,becauseafterapplyingthistransformationthetriangle
looksexactlyasitdidbefore.Thesameistrueforreflectionaboutoneof
thethree bisectors,aswellas arbitrarysequencesofrotationsand reflections.
Furthermore, the inverse of any of these symmetries is itself also a symmetry.
Figure 3.1
Euclidean symmetries (left) and general isometries (right) of triangle.
Whendrawnonpaper,theoriginalandtransformedtrianglelookexactly
thesame.However,forotherwaysofencodingorrepresentingthetriangle,
wemayfindthatthereareactuallysomesuperficialchanges.Forinstance,if
weencodethetriangleasanorderedlistofthreecornerpointsin(x,y)coor-
dinates,thenarotationby2π/3ofeachcoordinatevectorleadstoanoverall
permutation (cyclic shift) ofthe three corners.Nevertheless, if weforget about
Foundations of Equivariant Deep Learning63
theordering,westillhavethesamesetofcorners.Sincewedonotconsider
the orderof thecorners to bepart ofthe essentialstructure of thetriangle, the
rotated triangle(list of cornercoordinates) should beconsidered equivalentto
theonewestartedwith. Thisphenomenoniskeyto geometricdeeplearning,
becauseanysensiblecomputationonsuchdata(includingforinstancethose
performedbyaneuralnetwork)shouldtreatequivalentinputsequivalently,
even if they have superficial differences.
In thisexample andalsoin general,a symmetrytransformationis aspecial
kindofstructure-preservingmap. Whatismeantby structureishighlycontext-
dependent,butinorder todiscusssymmetriesonehastochoose somenotion
ofstructureofinterest.IfwearedoingEuclideangeometry,thestructureof
interestisEuclideandistancestructure,andsoherestructure-preservingmap
meansdistance-preservingmap,akaisometryorEuclideantransformation.
More precisely, g:R
n
R
n
is an isometry wrt distance metric d, if
d(gu,gv)=d(u,v),
Forallu,vR
n
.Theisometriesoftheplaneconsistofrotations,reflections,
translations, and all possible compositions of these.
Applying an arbitrary isometry to ourtriangle, we obtain a new but congru-
ent(orisomorphic
2
)triangle(Fig.3.1,right).Whenthetransformedtriangle
isnot justcongruent butactually thesame astheone westartedwith, wecall
thetransformationasymmetryofthetriangle.Inotherwords,asymmetry
transformation is a way in which the triangle is congruent to itself.
With thisin mindwe couldsay ingeneralthat asymmetry isa wayin which
an object is equivalent (or isomorphic) to itself, or more prosaically:
A symmetry transformation (or automorphism) of an object is an
invertible and structure-preserving map from the object to itself.
Oftentimes,weareinterestedinthesymmetriesofaspace.Considerfor
exampletheplaneendowedwithaEuclideandistancemetric.Thisspacehasas
symmetriesall isometriesoftheplane, i.e.allrotations,reflections andtrans-
lations.Theyaresymmetriesbecausetheyaremapsfromtheplanetoitself,
theyareinvertible,and theypreservedistances.In thiscase ithappensto bethe
case that all structure-preserving maps are invertible, but in general one has to
add this as an additional requirement.
3
The same set may be endowed with different structures, leading to different
symmetries. For instance we could see the plane as:
1.A set, so that symmetries are bijections (invertible maps; permutations),
64Chapter 3
2.A vector space, so that symmetries are invertible linear transformations,
3.An affine space
4
, so thatsymmetries are invertible affine maps (linearmaps
plus translation),
4.Atopologicalspace,sothatsymmetriesarehomeomorphisms(continuous
maps with continuous inverse).
5.A smoothmanifold, so thatsymmetries are diffeomorphisms (smooth maps
with smooth inverse).
Wecouldcontinuethelist,regardingtheplaneeachtimeasaninnerproduct
space,Euclideanmetric space,andsoon.Thosenotfamiliarwiththese notions
shouldnotworry:theonlypointwewishtomakeisthatwhatcountsasa
symmetry depends on exactly what we consider to be the structure of interest.
3.2Symmetries of Learning Problems
Howdosymmetriesappearinmachinelearning,andwhyaretheyrelevant?
Inthissectionwelookattwokindsofsymmetriesthatariseinsupervised
learning: symmetriesof the target function (manifestedas invarianceor equiv-
ariance)andsymmetriesoftheparameterization.Theformerwillbeused
throughoutthisbooktodevelopstatisticallyefficientnetworksthatrespect
suchsymmetries,whereasthelatterhasrelevanceforoptimizationbutwill
not be the focus of this book.
3.2.1Invariance
In supervised learning (section 2.1) the goal is touse data samples to approxi-
matethetargetfunctionf:X→YmappingfrominputspaceXtooutputspace
Y. Forconcreteness we will consider theproblem of classification,where Yis
thesetofclasslabels,but wecouldjustaswellconsideracontinuousoutput
space in case of regression.
In the previous section, we defined a symmetry of an objectas an invertible
structure-preservingtransformationoftheobject.Forlearningproblemssuch
asclassification,theobjectofinterestistheinputspaceX,andweconsider
thelabelfunctionftodefinethestructureofinterest("classstructure").A
symmetryofthelearningproblemisthusaninvertibleandlabel-preserving
mapg:X→X.Tosaythatgpreservesthelabel,orleavesitinvariant,isto
saythat forall x∈X, wehave (fg)(x)=f(x). Thatis,applyingg andthen fto
anyinput x∈Xgivesthesameresult asjust applyingfdirectly. Indiagramatic
Foundations of Equivariant Deep Learning65
Figure 3.2
Rotation is a symmetry of the label function in image classification.
form, we can express this as:
Y
XX
g
ff
(3.7)
Acommutativediagramlikethis expresses thefactthat ifwe composemaps
(arrows)alonganydirectedpathwiththesamebeginandendpoints,theresult-
ing composite maps arethe same. Hence this diagram expresses the fact that g
followed by fequals f.
5
As an example of a label symmetry, consider image classification, where X
is the space of planar images and Yis a set of labels. Typically, the class label
doesnot dependonthepositionand orientationoftheobject,so atranslation
or rotation ofthe image, will bea symmetry (see fig.3.2). Similarly,if we are
classifyinga pointcloud,then Xisa spaceofpoint sequences.Typically,the
label doesnot dependon the orderof thepoints, soany permutationof points
will bea symmetry. Additionally,it maybe thatrotating and translatingall of
the points simultaneously does not change the label.
The examples of symmetries mentioned above are quite natural, but in gen-
eralthere willbemanymore. Forinstance, ifwehavetwoinputs x,x
∈Xwith
thesamelabel(i.e.f(x)=f(x
)),wecandefineatransformationgthatswaps
them, whileleaving all otherpoints unchanged.This will bea symmetryof f,
but it is unlikely we would know if we didn’t already know f!
If somehow we came to know all symmetries ofa label function f, learning
becomesalmostentirelytrivial.Thereasonisthatifweobserveadatapoint
xwith label,weimmediatelylearn thelabelof allpointsx
inthe sameclass,
becausethereexistsasymmetrythatswapsxandx
.Sointheorywewould
needonlyonelabelperclass.Thisexamplenicelydemonstrateshowknowl-
edgeofsymmetriescan inprinciplebeusedtoimprovedataefficiency,butit
also suggests thatif we knew all symmetries ofthe label functionwe wouldn’t
really need learning. In other words, for problems where we do need learning,
66Chapter 3
Figure 3.3
Oneinvariantandthreeequivariantcomputervisionproblems:imageclassification,
oriented keypoint detection, image segmentation, optical flow estimation.
weshould notexpectto beable toknowallofthesymmetries apriori. Inchap-
ter4,wewillconsideranaturalclassofsymmetriesarisingfromthedomain
on whichthedata lives,andwhich weoften know abouta priori.For now we
willsimplyassumeknowledgeofsomesymmetrieswithoutworryingabout
the specific kind or how we have come to know them.
3.2.2Equivariance
Forinvariant learningproblems, theoutput remainsunchanged when weapply
asymmetrytransformationtotheinput.Thereisanotherclassofproblems
where insteadthe target functionis equivariant, meaningthat theoutput trans-
formsalong withthe input(seefig. 3.3forexamples). Inimage segmentation
forinstance,theinputisanRGBimageandtheoutputisasegmentation
mask,whichassignsalabeltoeachpixel.Whenweshifttheinput,theout-
put should also shift. Another example is image registration, where the goal is
to output a transformation that puts the imagein a canonical pose (e.g. putting
abrainscaninuprightpositionandperhapsapplyingadeformationsothat
brain regions can be compared voxel-wise). Similarly, learned or hand-crafted
keypoint detectors(e.g.SIFT;Lowe1999; LencandVedaldi2016) shouldbe
equivariant:whentheimageistransformed,thekeypointposition,scale,and
orientation transforms along with it.
Astheseexamplesshow,theoutputwillingeneralbeadifferentkindof
geometric quantitythan the input,meaning that itlivesin a different space and
transforms ina different wayunder thesame symmetry. For instance,the same
rotationcanact onaninputsignal(e.g.brainscanwith n
3
voxels)andonthe
outputposevariable(e.g.a3×ばつ3rotationmatrix).Whatitmeansto"rotate"
avoxelgrid or3×ばつ3 matrixis different. Forthis reasonwewill formalizethe
concepts ofgroup(e.g. rotations)and group representations /actions (how to
Foundations of Equivariant Deep Learning67
applyarotationtovariouskindsofquantities)laterinthischapter.Fornow,
wewillsimplywritegforasymmetry,andwriteρ
X
(g):X→Xandρ
Y
(g):
Y→Yfor the way it acts on input and output space.
Equivariance of the target function f:X→Ycan then be stated as:
fρ
X
(g)=ρ
Y
(g)f.(3.8)
Thisisanequalityofmappings,whichmeansthatthemappingsontheleft
and right-handside shouldagree on allinputs x∈X. Hence,this equationsays
that ifwe applya symmetrytransformationg toan inputx usingρ
X
and then
apply f, we get the same outputas applying ffirst andthen applying the same
symmetry g to the output using ρ
Y
.
We can also express equivariance indiagramatic form (appendix E), so that
we can easily see the domains and codomains of the maps involved:
YY
XX
ρ
X
(g)
f
f
ρ
Y
(g)
(3.9)
To saythat this diagram commutesis to saythat starting inthe bottom leftX,
going up and then right is the same as right and then up.
Wecanobtainyetanotherwayofexpressingequivariancebymultiplying
eq. (3.8) on the left by ρ
Y
(g)
–1
and cancelling:
ρ
Y
(g)
–1
fρ
X
(g)=f.
Thisshows mostclearlythatalsointhecaseofequivariancewecanthinkof
gasasymmetry,becauseitleavesfinvariantwhenactingontheinputand
outputspacesimultaneously,similartotheinvarianceequationfρ
1
(g)=f.
Indeed onecan seethat invariance (eq.(3.7)) isa specialcase ofequivariance
(eq. (3.9)) where ρ
Y
(g)=id
Y
is the identity map for all g∈G.
3.2.2.1EquivarianceandDataEfficiencyAswithinvariance,equivari-
anceputsaconstraintonthehypothesisspaceF,therebyreducingthe
statisticalcomplexityoflearning(chapter2).Themoresymmetrieswehave,
themoreconstraintsweget,andthesmallertheremaininghypothesisspace.
Althoughwehavesofardiscussedequivarianceofthenetworkasawhole,
inpracticeonewouldbuildanequivariantnetworkbycomposinglearnable
equivariantlayers,eachofwhichissubjecttoaconstraintthatreducesthe
number of parameters in that layer.
To getter abetter understanding of therole of equivariance in pattern recog-
nition,considerfig.3.4.Inthisexample,aneuralnetworklayerlearnsa
68Chapter 3
style-agnosticrepresentationofthepattern‘A’(twoinputsx,x
∈X,repre-
senting two styles of ‘A,’ are mapped into the same point, y=f(x)=f(x
)∈Y).
However,whentheinputpatternsaretransformedbyg(inourexample,a
rotationρ
X
(g)xandρ
X
(g)x
),ageneralneuralnetworkdoesnotnecessarily
generalizeina symmetry-consistentmanner, i.e.,itmightbe thatf(ρ
X
(g)x)=
f(ρ
X
(g)x
).
Byimposing equivariance,weguarantee thatthemodel generalizesina way
that is consistent with the symmetry: from the equivariance condition,
f(ρ
X
(g)x)=ρ
Y
(g)f(x)
f(ρ
X
(g)x
)=ρ
Y
(g)f(x
);
togetherwiththeassumptionf(x)=f(x
),itfollowsthatf(ρ
X
(g)x)=
f(ρ
X
(g)x
).
A
A
A
A
X
Y
Figure 3.4
Example of the role of equivariance in pattern recognition.
Inadditiontotheseheuristicarguments,thereisagrowingbodyofwork
in learning theory that formally characterize the benefits of equivariance (Lyle
etal.2019;Sannai,Imaizumi,andKawano2021;ElesedyandZaidi2021;
Behboodi, Cesa, and Cohen 2022).
3.2.3Symmetries of the Parameterization
There isanother wayin whichsymmetries appear inmachine learning.When
we approximate the true labelfunction f:X→Yby a parameterizedfunction
f:X×ばつΘ→Y,wemaybeintroducingsymmetriesoftheparameterization.
Foundations of Equivariant Deep Learning69
Whereasinput-spacesymmetriesactonX,aparameter-spacesymmetryacts
onΘ.Amappingh:ΘΘisasymmetryoftheparameterizationifforall
u∈Xand θΘ:
f(u,hθ)=f(u,θ).(3.10)
Inotherwords,themappingsf(–,θ)andf(–,hθ)arethesame;theparameter
vectors θand hθcorrespond to the same classifier.
Inneuralnetworks, onecanusuallypermutehidden units(alongwiththeir
incoming and outgoingweights), without changingthe input-output mapping.
Furthermore, forcertain kindsofnon-linearities like ReLU(x)=max(x,0),we
canscaletheinputandinversescaletheoutputofanonlinearitywithout
changing the function: α
–1
ReLU(αx,0)=ReLU(x).
Knowledgeofparameter-spacesymmetriescanbeusefulinoptimization,
Bayesianinference,andmodelaveraging(Kuninetal.2021;Ainsworth,
Hayase,andSrinivasa2022;Weietal.2022;Kurleetal.2022;Zhaoet
al.2022;Zhao etal. 2023;Wei andLau 2023).In someapplications itisuseful
foroneneuralnetworktoprocesstheweightsofanother,inwhichcasethe
formershould respecttheweight-spacesymmetries(Navon etal.2023;Zhou
etal.2023).Intherestofthisbookwewillprimarilyfocusoninputspace
symmetries and how to use them to improve statistical efficiency.
3.3Symmetry Groups
Havingdefinedsymmetriesandseenhowtheyemergein machinelearning,we
willnowbeginourstudyofthemathematicalmachineryusedtothinkabout
symmetriesinasystematicandgeneralway.Webegininthissectionwith
theconceptofasymmetrygroup,beforestudyinggrouprepresentationsand
equivariant maps.
Thesetofallsymmetriesofanobjecthasanumberofsalientproperties,
which have beendistilled intoa few abstractgroupaxioms. Thefirstproperty
tonoteisthattheidentitytransformation("donothing")isalwaysasymme-
try,inatrivialway.Secondly,symmetriesmaybecombinedtoobtainnew
symmetries:ifgandharetwosymmetries,thentheircompositiongh(first
applyinghandtheng)isalsoasymmetry,andthiscompositionoperationis
associative.Finally,bydefinitionsymmetriesarealwaysinvertible,andthe
inverseisalsoasymmetry.Weencouragethereadertoverifyforthemselves
thatthesepropertiesindeedholdforsymmetriesaswehavedefinedthemin
section 3.1.
Itispossibletodefineasymmetrygroupasasetofsymmetrytransforma-
tionsg:X→X,satisfyingtheseaxioms(identity,closureundercomposition
andinverses).Howeveritturnsout,asitoftendoes,thatinordertostudya
70Chapter 3
groupitisnotsoimportantwhatsymmetriesare(i.e.whattheydotoele-
mentsofX),butratherhowtheycomposewithothersymmetries.Sothe
mostcommondefinitionofgroupdoesnotsaywhatthegroupelementsare
(symmetries), but only encodes the compositional structure:
AgroupisasetGalongwithabinaryoperation:G×ばつG→G
calledcompositionorthegroupoperation(forbrevity,denotedby
juxtaposition gh=gh) satisfying the following axioms:
Associativity: (gh)k=g(hk) for all g,h,k∈G.
Identity: there exists a unique e∈Gsatisfying eg=ge=g for all g∈G.
Inverse:Foreachg∈Gthereisauniqueinverseg
–1
∈Gsuchthat
gg
–1
=g
–1
g=e.
Notethataswithfunctioncompositionormatrixmultiplicationbutunlike
multiplicationofnumbers,thegroupoperationneednotbecommutative,i.e.
we mayhavegh=hg.Groups forwhich gh=hg forall g,h∈Garecalled com-
mutativeorAbelian.
6
Groupscanbefiniteorinfinite,discreteorcontinuous.
So-calledLiegroups,likethegroupofrotationsinR
n
,aresmoothmani-
folds,meaningwecandocalculusonthem(moreonthisinlaterchapters).
Section 3.3.3showsa number ofexamples of groups.It is worth checkingfor
each one that it indeed satisfies the axioms of a group.
Asmentionedbefore,wehavedefinedagroupasanabstractobject,meaning
wehavenotstatedwhatthegroupelementsare(e.g.symmetrytransfor-
mationsofsomeobject,aswedefinedtheminsection3.1),onlyhowthey
compose.Bycontrast,aconcretegroup(sometimescalledsymmetrygroup)
wouldhaveaselementssymmetrytransformations(section3.1)g:X→X,
andthegroupoperationisgivenbyfunctioncomposition.In thespecialcaseof
matrix groups, the group elements are matrices
7
A∈G⊂R
n×ばつn
, and the group
operationisgivenbymatrixmultiplication.Startingfromaconcretegroup
oftransformations,weobtainanabstractgroupbysimplyforgettingthatthe
elementsarefunctionsormatrices,whilerememberingthecompositionrule
:G×ばつG→G.Asitturnsout,knowinghowgroupelementscomposeisall
you need to study every aspect of the group itself.
In practice, almost all of the groups weconsider in this book can be defined
as matrix groups, so let us state the definition explicitly:
Foundations of Equivariant Deep Learning71
An n-dimensionalreal matrix group is a setof matricesG⊂R
n×ばつn
sat-
isfying the following axioms:
Closure: For all A,B∈G, the matrix product ABis also in G.
Identity: the identity matrix Iis in G.
Inverse: AllA∈Gareinvertible (fullrank)and thematrix inverse A
–1
is also in G.
In amatrix group,the compositionoperation is givenby matrix-matrix mul-
tiplication,whichisautomaticallyassociative,andsowedonotneedtoadd
thisaxiom.Inanabstractgroupwhereweexplicitlydefinethecomposition
rule asa map:G×ばつG→G, thecomposition of two elements inGis always in
G, but since the matrix product of two elements of G⊂R
n×ばつn
need not be in G,
we need to add this closure axiom explicitly for matrix groups.
Whereasaconcretegroupismadeupoffunctionsg:X→Xthatcanbe
appliedtoelementsx∈X,thedataspecifyinganabstract groupdoesnottellus
howto act on geometric data. This abilitycan be reintroduced by defining one
ormoregrouprepresentations(orgroupactions),asdiscussedinsection3.4.
Grouprepresentationsareusefulevenwhenwearedealingwithamatrix
group,becauseoftentimeswewantthesamegrouptoactondifferentkinds
ofgeometricquantities,suchastheinputs,outputsandinternalfeaturesofa
neural network.
3.3.1Subgroups
Wenotedin section3.1thatwhatcanbe calledasymmetrydependsonwhat
structurewewanttopreserve.Whenweaddstructure,therewillbefewer
symmetries; our group is reduced to a subgroup.
LetGbeagroup.AsubgroupH⊂GisasubsetofGthatisitselfa
group. Concretely, thismeans thatHisclosed undercomposition and
inverses:when wecompose twoelementsin Htheresultis againin H,
and the inverse of any element in His also in H.
For any group, the subsetH={e} is asubgroup, and similarlyany group is
a(non-proper)subgroupof itself.Foraslightlyless trivialexample, consider
the groupof planartranslations (R
2
,+), andthe subgroup givenby translations
along the x-axis preserving the y-axis.
72Chapter 3
AclassicalexamplecomesfromtheErlangenprogram:Kleinnotedthat
differentgeometries are associated with different groups,which in some cases
formsubgroupsthatpreserveprogressivelymoregeometricstructure.For
instance,thegroupofEuclideantransformationsisasubgroupoftheaffine
group,whichisasubgroupoftheprojectivegroup.Projective,affineand
Euclidean geometry are concernedwith the study of invariantsof their respec-
tivegroups,suchasdistancesforEuclideangeometry,ratiosofsurfaceareas
for affine geometry, and cross-ratios for Projective geometry.
3.3.2*Mappings between Groups: Homomorphisms and Isomorphisms
Whenever we study a new kind mathematicalgadget, it is a good idea to think
abouttheappropriatenotionofstructure-preservingmapping betweengadgets.
In thecaseof groups,these arecalled grouphomomorphisms.Theyaremaps
between groups that preserve the compositional structure of a group:
LetGandHbegroups. Agroup homomorphismismapping φ:G→H
such that for all g,h∈G:
φ(gh)=φ(g)φ(h).
When interpreting the equationabove, note thaton the left handside we are
composing g and h using thecomposition operation in G, whereas on the right
handsidewearecomposingφ(g)andφ(h)inH.Onecanshowthatagroup
homomorphism maps the identity of Gto the identity of H.
Ifwehavetwogrouphomomorphismsφ
1
:G→Handφ
2
:H→K,then
their composition φ
2
φ
1
:G→Kis also a homomorphism. Moreover, for any
groupGtheidentitymapid
G
:G→Gisahomomorphism,whichshowsthat
groups and group homomorphisms form a category (appendix E).
Onecanshowthattheimageofagrouphomomorphism,imφ={h
H|g∈G:φ(g)=h},isasubgroupofH.Similarly,thekernelkerφ={g
G|φ(g)=e
H
}isasubgroupofG.Theinterestedreadermaystudythe
firstisomorphismtheorem,whichrelatestheimageandkernelofagroup
homomorphism.
Group homomorphisms always preserve positive compositional relations of
theformgh=k,buttheyneednotpreservenegativerelationsoftheformgh=k.
Asanextreme example,forany groupsGandH,themappingφ(g)=e
H
that
sends every element of Gto the identity of H, is a group homomorphism.
Whenagrouphomomorphismdoespreservedistinctness,i.e. g=hφ(g)=
φ(h)itis calledaninjectivegrouphomomorphismormonomorphism.Hence
onecan thinkof aninjectivegrouphomomorphism aspointingto aG-shaped
Foundations of Equivariant Deep Learning73
subgroupofH,labellingtheelementsofthissubgroupbyelementsofG
without duplicates.
Ifamappingisbothinjective(mappingeachelementofthedomaintoa
differentelementofthecodomain)andsurjective(reachingeveryelementof
the codomain)it isbijective,or invertible.A bijectivegroup homomorphismis
calledagroupisomorphism. Isomorphicgroupsareforallintentsand purposes
thesame,becausewecantranslateanystatementaboutoneofthemtoan
equallytruestatementabouttheother,andsoitisverycommontoconsider
them as the same group.
3.3.3Examples of Groups
Throughoutthisbookwewillencountervariouskindsofsymmetrygroups;
commutativeandnon-commutative,finiteandinfinite,discreteandcontinuous.
Inordertobuildintuitionforthegeneralconceptofgroup,andtobecome
familiarwiththegroupswewillconsiderlateroninthisbook,wepresent
inthissectionanoverviewofthemostimportantones.Itisnotnecessaryto
understandandbecomefamiliarwithallofthese;itissufficientfornowto
look at a few examples to make the concept of group feel less abstract.
3.3.3.1FiniteGroupsWhenstudyinganewconceptitishelpfultolook
at simple examples first. What is the simplestexampleof a group? Looking at
the axiomsof a group, wenotice that theidentity axiomtells usthat any group
hastohaveanidentity element,sowecannothave anemptygroupwithzero
elements.Canwehaveaone-elementgroup,withjusttheidentityelement
e?WedefineT={e},compositionbyee=e(theonlypossiblechoice)and
inverse by e
–1
=e (idem).By definition we havean identity, and an inversefor
all elements (juste), so the identity andinverse axioms aresatisfied. We check
thatcompositionisassociative: (ee)e=(e)e=e=e(e)=e(ee),soindeedTisa
group,calledthetrivialgroup.Asamatrixgroup,wemayrepresenteasthe
1×ばつ1 identity matrix I
1
=[1], ora higher dimensional identitymatrix, as these
satisfyII=IandI
–1
=Ijustliketheabstractgroupelemente.Whenwehave
anobjectwhosesymmetrygroupistrivial,itmeansthattheobjectdoesnot
have any non-trivial (non-identity) symmetries.
Whatabout agroupwithtwo elements?The groupshouldhaveanidentity
e,and letscallthe otheronem. Bytheidentityaxiom, weshouldhaveee=e,
em=m,andme=m,fixing3of the4compositions.Forthelast one,wehave to
choose betweenmm=mor mm=e. However, thefirst option runsinto trouble
becauseifwemultiplybothsidesoftheequationmm=mbym
–1
(however
we choose m
–1
) we find m=e, a contradiction. Thus we must have mm=e and
m
–1
=m.Wecanrealize thisgroupasa matrixgroupwith1×ばつ1matrices(i.e.
74Chapter 3
scalars)bychoosingG={1,–1}.Thisalsorevealsthegeometricmeaningof
this group:it consists ofthe mirror reflectionm andidentity e. Like thetrivial
group, the two-element group is commutative.
Perhapsthemostfundamentalclassoffinitegroupsarethepermutation
groups.Recallthatapermutationofanorderedlistofnelementsiswayto
rearrange("shuffle")them,forexampleswapping twoelements.Wecanrep-
resentapermutationasabijective(i.e.invertible)functionσ:nn,where
n={1,...n}.Alternatively,wecanrepresentapermutationbyapermutation
matrixP,whichisamatrixwithasingle1ineachrowandcolumnandall
otherentries0.WhenactingonavectorxR
n
,thismatrixwillpermutethe
coordinates.
Thesetofallpermutationsofnelementsform agroupcalledthesymmet-
ricgroupS
n
(nottobeconfusedwiththegeneraltermsymmetrygroup).To
seethatS
n
isindeedagroup,notethat1)theidentityid
n
:nnisaper-
mutation,2)thecompositeoftwopermutationsisapermutation(closure),
and3)apermutationhasaninverseandthisinverseisitselfapermutation.
Tospecifyapermutationσ,weneedtochooseσ(1)(npossibilities),σ(2)
(n–1 possibilities),etc.,upto σ(n)(onlyonechoiceleft). Thus,thegroupS
n
has n·(n–1)·(n–2)···2·1=n! elements. We canequivalently define S
n
as a
concrete group of mappings σ, as a matrix group, or as an abstract group.
Fortheinterestedreader,wementionedafewmorefinitegroups,with-
outgoingintodetail.Therotationalsymmetriesofregularpolygonssuchas
triangles,squares,pentagonsandhexagons(andsoon)arecapturedbythe
cyclicgroupsC
n
,andtheirrotationandreflectionsymmetriesbythedihe-
dralgroupsD
n
.Finitegroupsofsymmetriesinthreedimensionsincludethe
crystallographic point groups and the groupsof symmetries of Platonic solids.
AccordingtoCayley’stheorem
8
,anyfinitegroupwithnelementscanbe
viewedasasubgroupofS
n
.Furthermore,inamajormilestoneinmathemat-
ics
9
,the finitesimple groups(fromwhich allfinitegroups canbemade) have
been classified; all such groups belong to one of several infinite families, or to
a relatively short list of sporadic groups.
3.3.3.2InfiniteDiscreteGroupsGroupsneednotbefinite.Considerfor
instancethe setof integers Z={...,–2,–1,0,1,2,...}. With compositiongiven
byadditionand0astheidentity,thissetformsagroup.Geometrically,we
canthinkofthisgroupasagroupofdiscretetranslationsinonedimension.
Similarly, we canthink of Z
2
=Z×ばつZas agroup ofdiscrete translations intwo
dimensions.Inchapter6wewillconsiderthesegroupsasthesymmetriesof
regular grids (e.g. a grid of image pixels).
Foundations of Equivariant Deep Learning75
Inadditiontodiscretetranslationalsymmetries,someregulargrids/lat-
ticesalsohavediscreterotationalandreflectionsymmetry,suchasfour-fold
rotationalsymmetryofasquaregridandsix-foldrotationalsymmetryofa
hexagonalgrid.Wewillconsidersuchgroupsinchapter7whenwestudy
groupconvolution.Arelatedclassofgroupsarethe17wallpapergroups,
whichdescribethepossiblesymmetriesofwallpaperpatterns(Schattschnei-
der 1978). Their3D analogues are called(crystallographic) space groups, and
these have been classified into 230 distinct groups (Aroyo 2017).
3.3.3.3TheClassicalMatrixGroupsClassicalgroupsareanameofan
importantclassofgroupsthatoccurthroughoutmathematics,physics,and
computerscience,andseveralofwhichwillplayanimportantroleinthis
book.Theyweresonamedby(Weyl1939)inhisfamousbook"TheClassi-
cal Groups" on invariant polynomials, representation theory,and the Erlangen
programme. Mostgenerally defined, theclassical groups aregroups ofinvert-
iblematriceswithreal,complexorquaternionic
10
entries,possiblysubjectto
additionalconstraints(e.g.,volumepreservationdet(A)=1orpreservationof
certain kinds ofgeneralized inner product). Here wewill only consider certain
importantexamplesofrealandcomplexclassicalgroups;manymoreexamples
and details may be found e.g. in Goodman and Wallach (2009).
ThequintessentialmatrixgroupsarethegenerallineargroupsGL(n,R)
andGL(n,C)ofalln×ばつnrealorcomplexinvertiblematrices.Theseform
groups,becausetheyincludetheidentitymatrix,inversesforeachelement,
andbecause theproductoftwoinvertiblematrices isagaininvertible. They can
bethoughtofasthesymmetrygroupsofthevectorspacesR
n
andC
n
,being
the invertibletransformationsthat preserve theirvector spacestructure.
11
The
classicalgroups(andindeedallmatrixgroups)aresubgroupsofthegeneral
lineargroups,whichisthe"largest"ofthematrixgroups.Inthissense,the
generallineargroupplaysthesameroleformatrixgroupsasthesymmetric
group plays for finite groups.
FromGL(n,R)wecanobtainsubgroupsthatpreservenotjustthelinear
structure of R
n
, but also lengths,angles, orientation, and volume. Theorthog-
onalgroupO(n)contains onlythosematrices thatpreserve thestandardinner
productx
y.Thatis,weshouldhave(Ax)
(Ay)=x
yforallx,yR
n
,or
equivalently, A
A=I:
O(n)={AGL(n,R):A
A=I}.(3.11)
Thus, this group consists of rotations and reflections.
Geometrically,the innerproduct representstheanglebetweentwovectors
12
,
so orthogonal matrices preserve angles. The determinant of a matrix A tells us
76Chapter 3
GroupGL(n,R)SL(n,R)O(n)SO(n)
Constraintsdet(A)=0det(A)=1A
A=I
det(A)=1
A
A=I
Dimension
n
2
n
2
–1
n(n–1)
2
n(n–1)
2
Mapping
Invariants
-
Volume,
Orientation
Volume,
Distance,
Angle
Volume,
Orientation,
Distance,
Angle
Table 3.1
Classicalrealmatrixgroupsandtheirproperties.Eachgroupisdefinedasthesetof
n×ばつnrealmatricessatisfyingaconstraint(row1).Anequalityconstraintmayelimi-
natedegreesoffreedom,leavingonlyacertainnumberofdimensions(row2).Note
thatO(n)andSO(n)havethesamedimensionality,becauseorthogonalmatriceshave
det(A)=±1sotheadditionaldet(A)=1constraintforSO(n)eliminatesonlyadiscrete
degree of freedom.In the mapping row we show for each group how a generic element
actsonthestandardbasise
1
=(1,0)(blue)ande
2
=(0,1)(red)forn=2dimensions.
Finally, we list the geometric properties left invariant by each group.All matrix groups
leaveinvariantthevectorspacestructure,meaningtheorigin0andalllinearrelations
z=αx+βy, and additionally the properties listed in the last row.
howmuch itexpands volume, with negative values indicatinga reversal ofori-
entation. Elementsof theorthogonalgroup have det(A)=±1,i.e., itpreserves
volume but not necessarily orientation (reflections have determinant –1).
ThespeciallineargroupSL(n,R)consistsofvolume-preservinglinear
maps:
SL(n,R)={AGL(n,R):det(A)=1}.(3.12)
It doesnot necessarilypreserve angles.Similarly, thespecial orthogonal group
consists of volume and inner-product preserving linear maps:
SO(n)={AGL(n,R):A
A=I,det(A)=1}.(3.13)
In table 3.1 we show these groups and some of their properties.
ThecomplexanalogsofSL(n,R),O(n)andSO(n)arecalledthecomplex
speciallineargroupSL(n,C),unitarygroupU(n)andspecialunitarygroup
SU(n).Whileinourbookwewillfocusprimarilyonrealmatrixgroups,we
should mention here thatthe unitary groups play animportant role in physics:
U(1)×ばつSU(2) isthe "electroweak symmetry" associatedwith electromagnetic
Foundations of Equivariant Deep Learning77
andweakforces,whereasSU(3)appearsinquantumchromodynamicsthat
describes strong interactions between quarks.
3.3.3.4MotiongroupsFurtherexamplesofgroupsthatareparticularly
important in applications are the Euclidean motion groups. Euclidean motions
canbedefinedasisometries(distance-preservingmaps)ofEuclideanspace
(e.g.R
n
withthestandardinnerproduct).Thisincludestranslations,rota-
tions,andreflections,andindeedonecanshowthatanyEuclideanmotion
isasequenceofthese.TheseformtheEuclideangroup,denotedE(n).When
reflections are excluded, it is called the special Euclidean group SE(n).
Itisnotpossibletoperformatranslationx7→x+t forx,tR
n
bymul-
tiplyingxbyann×ばつnmatrix,becauseA0=0foranymatrix.Forthis
reasonitiscommontoaddahomogeneouscoordinate,representingxby
̄
x=(x
1
,...,x
n
,1)R
n+1
.AEuclideantransformationcanthenberepresented
by a matrix of shape n+1×ばつn+1 of the following form:
Rt
01
,(3.14)
whereRO(n)isann×ばつnorthogonalmatrix(rotationandreflection)and
tR
n
isatranslationvector.Onemayverifythatmultiplysuchamatrixby
̄
xresultsintherequiredtransformationx7→Rx+t.Furthermore,thecom-
positionoperationofE(n)isgivenbymatrixmultiplication,andsoitcanbe
defined as a matrix group.
Byintroducinguniformscalingtransformations(replacingRbysRin
eq.(3.14)),weobtainthegroupofsimilaritytransformationsSim(n).The
groupofaffinetransformationscontainsinadditiontorotations,transla-
tions,andreflections,alsoscaleandsheartransformations.Usingthesame
homogeneouscoordinaterepresentation,wecanconvenientlyencodeaffine
transformations using the same form aseq. (3.14), but taking RGL(n,R) to
be an arbitrary invertible matrix rather than an orthogonal one.
Projectivetransformationscanalsobeencodedconvenientlyusinghomo-
geneouscoordinates.Ann-dimensionalprojectivetransformationmaybe
encodedasanon-singularn+1×ばつn+1matrix,andcompositionisgiven
bymatrixmultiplication.Inthisrepresentation,twomatricesrelatedbya
scalarmultiplicationA=λBaretobeconsideredequal.Toapplyaprojec-
tive transformationto a vector x, oneuses matrix-vector multiplication on the
homogeneouscoordinate vector
̄
x,followedby adivision bythehomogeneous
coordinate(whichisnolongerequalto1ingeneral).Projectivetransforma-
tionsarea mainstayofclassicalcomputervision(HartleyandZisserman 2004)
and graphics. Table 3.2 lists several motions groups and their properties.
78Chapter 3
GroupPGL(n+1)Aff(n)(S)Sim(n)(S)E(n))
MatrixH
At
01

sQt
01

Qt
01
Domain
HGL(n+1)
AGL(n)
tR
n
Q(S)O(n)
tR
n
sR
+
Q(S)O(n)
tR
n
Dimension
(n+1)
2
–1n
2
+n
n(n–1)
2
+n+1
n(n–1)
2
+n
Mapping
Invariants
Colinearity,
Cross-ratio
Parallelism,
Volume
Volume
Ratio,Dis-
tanceRatio,
Angle
Volume,
Distance,
Angle
Table 3.2
Motiongroupsandtheirproperties.Motionsinndimensionsarerepresentedbyn+1×ばつ
n+1dimensionalmatricesusinghomogeneouscoordinates(seetext).Foreach group
we show theform ofthematrix, thedomainof thesub-matrices,andthe dimensionof
thegroup.NotethatfortheprojectivegroupPGL(n+1)(correspondingtoprojective
transformationsofn-dimensionalspace),anymatrixHGL(n+1)representsavalid
projective transformation, but matrices related by ascalar multiple should be identified
andthusthedimensionisreducedby1.Themappingrowshowshowasquaregrid
couldbemappedbyanelementintherespectivegroups,andtheinvariantsrowlists
several invariants of the group.
3.4Group Actions And Representations
Ingeometryingeneral,andgeometricdeeplearninginparticular,oneoften
wishes toperform computationsinvolving different kindsof geometric quanti-
ties, such as scalars,vectors, andtensors, as well as fieldsof such quantities.
13
The key thing that distinguishesand in fact characterizesthese and other types
of geometric quantitiesis not their dimensionality, butrather the way in which
theytransformundersymmetrytransformations.Theideathatthetypeofa
geometricquantityisdetermined bythe wayin whichthe groupof symmetries
acts on it was championed by Weyl (1939) and is widely used in physics.
14
Asanexample,considerthespecialorthogonalgroupSO(2)consistingof
planarrotationsaroundtheorigin.Itisintuitivelyobviousthatwecanapply
thesamerotationgSO(2)toapointonthecircleS
1
,ontheplaneR
2
,orto
aplanarimage(afunctionon theplane).Wearethusdealingwithonegroup
Foundations of Equivariant Deep Learning79
actingonseveraldifferentspaces.Moreover,thegroupcouldactindifferent
ways on the same space, for instance rotating clockwise or counterclockwise.
Ingeometricdeeplearning,allfeaturespacesareassociatedwithagroup
representation,whichtellsushowto transformdata livinginthatfeaturespace.
Thisincludestheinput space,outputspace,andallintermediatefeaturespaces.
Therearetwoequivalentwaystogiveprecisemeaningtotheideaofhav-
ingagroupGactonasetX.Thefirstapproachistodefineagroupaction
as a map α:G×ばつX→Xsatisfying certain criteria. Bycurrying we can obtain
fromαanothermapρ:G→[X,X](again satisfyingcertaincriteria)calleda
grouprepresentation,where[X,X]isthesetofmappingsfromXtoitself.
The reasonfor the nameis that eachabstract group elementg isrepresented in
thespaceXbyaconcretemappingρ(g):X→X.Itshouldbenotedthatthe
term"representation" hastraditionallybeen usedmostly fortheimportant case
whereρ(g)isalinearmapactingonavectorspaceX.Wewillusetheterm
representationforthegeneralconcept,andwherenecessarymakeadistinc-
tionbetweenlinearor Vect-representationsand generalor Set-representations.
Furthermore,sincegroupactionsandrepresentationsareequivalent(upto
currying), we will also permit ourselves to use these terms interchangeably.
Let us now look atthe aforementioned "certain criteria" thata group action
mustsatisfy.Withoutfurtherrestrictions,amapofthetypeα:G×ばつX→X
coulddoratherstrangethings.For example,theidentityelemente∈Gmight
notactastheidentityonX,i.e.,onecouldhaveα(e,x)=x.Also,ifweact
firstwithhandthenwithg,theresultmightbedifferent fromactingwithgh
(composinggandhin G).Thus, wewantagroupactiontorespectthestructure
ofthegroup,whichconsistsofadesignatedidentityelementandtheruleof
composition:
LetGbeagroupandletXbeaset.AgroupactionofGonXisa
mapping α:G×ばつX→Xsatisfying:
α(e,x)=x(Identity)
α(gh,x)=α(g,α(h,x)),(Composition)
for all g,h∈G, x∈X, and where e∈Gis the identity.
When thereis littlechance ofconfusion, groupactions canalso bedenoted
as g·x=α(g,x).
Ifwehaveagroupactionα,wecanobtainagrouprepresentationρby
definingρ(g)=α(g,–).Wecanalsodefinearepresentationwithoutreference
80Chapter 3
toanaction, butagain notallassignmentswilldo; ρmustsatisfytwo criteria
equivalent to the ones for group actions:
LetGbeagroup,letXbeaset,anddenoteby[X,X]thesetof
mappingsfromXtoX.A(Set-)representationofGonXisamap-
pingρ:G→[X,X]thatassigns toeachg∈Gamappingρ(g):X→X
satisfying:
ρ(e)=id
X
(Identity)
ρ(g)ρ(h)=ρ(gh),(Composition)
for all g,h∈G, and where e∈Gis the identity.
Thefirst axiom(identity)says thattheidentity e∈Gshouldberepresented
bytheidentitymappingonX,i.e.itshoulddonothing
15
.Thesecondaxiom
(composition) ensures thatρpreserves all positive relations of theform gh=k
that hold in the group
16
. This includes in particular relationssuch as gg
–1
=e=
g
–1
gthatexpressthatg
–1
istheinverseofg,whichimpliesthatρ(g)mustbe
an invertible map for any g∈G.
Negativerelations,oftheformgh=kmaynotbepreserved,however.For
instance, forany group wecan define thetrivial representation
17
, whichmaps
allelementsgtotheidentitymaponX.Onemaycheckthatthisisindeeda
validgroupaction.Anactioniscalledfaithfulwhenρisinjective(meaning
thatitisdifference-preserving:g=hρ(g)=ρ(h)).Inthiscase,allnegative
properties are also preserved.
Inmanycases,GandXhaveadditionalstructure,suchastopologicalor
smooth(Liegroup)structure,inwhichcaseonehastoadaptthedefinition
inordertomakethemathworknicely.Forinstance,byrequiringρtobea
continuousor smoothmap,andtaking [X,X] tobethe spaceofcontinuous/
smooth maps from Xto itself
18
.
In GDL, datausually live in a vector space, andmost of the representations
weencounterinpracticearelinear,meaningthatthemapsρ(g):X→Xare
all lineartransformations ofa vector spaceX. Aswe will see inlater chapters,
thisistrueeventhoughtheneuralnetworksweworkwitharehighlynon-
linear.Suchlinear representationsinvectorspacesare definedanalogouslyto
ordinary representations in sets:
Foundations of Equivariant Deep Learning81
LetGbeagroup,letXbeavectorspace,anddenoteby[X,X]the
set oflinear mappingsfromXtoX. Alinear representation(or Vect-
representation) ofGon Xisamapping ρ:G→[X,X]that assignsto
each g∈Ga linear mapping ρ(g):X→Xsatisfying:
ρ(e)=id
X
(Identity)
ρ(g)ρ(h)=ρ(gh),(Composition)
for all g,h∈G, and where e∈Gis the identity.
We notethat onecan equivalently definea (linear) group representationas a
group homomorphism(section 3.3.2) fromGto the groupof invertible (linear)
transformations of X.
InpracticeweworkinabasisofXrelativetowhichthelinearmaps
ρ(g):X→Xcanberepresentedbymatrices
19
.Hence,wecanalsodefine
ρasamatrix-valuedfunction.Then,compositioncorrespondstomatrix
multiplication, and the application ρ(g)(x) is matrix-vector multiplication.
Linearrepresentationsareusefulbecausetheyturnabstractgrouptheory
intolinearalgebra,whichiswellunderstoodandforwhichwehavehighly
efficient specializedhardware.Evenwhenworkingwithagroupofmatrices,
it is useful to consider linear representations, and one should be careful to dis-
tinguishthe conceptsof matrixgroupsand linearrepresentations.Forinstance,
one couldconsider the groupSO(2) asa matrix group consistingof 2×ばつ2 rota-
tionmatrices,andhaveitactvian×ばつnmatricesρ(g)onthespaceofsignals
onthecircle,sampledatnpoints.Similarly,wecanconsiderthegroupof
d×ばつdpermutationmatrices,actingonthespaceofd×ばつdadjacencymatrices
by d
2
×ばつd
2
matrices ρ(g). We will cover these examples in more detail later.
Wehaveformallydefineda(linear)representationasamapρ:G→X
X
,
butitiscommontorefertotherepresentationspaceXastherepresentation
whenitisclearwhatρisassociatedwithX.Neverthelessitisimportantto
notethataccordingtoWeyl’sprinciple,thetypeofgeometricalquantityis
determinedbythe representation,notbythespacealone.Considerforexample
thegroup SO(2)of2×ばつ2rotationmatrices.An SO(2)-vectoris aquantityx
R
2
that transformsaccording tothe standard(vector) representation ρ
1
(g)=g.
Ontheotherhand,aquantityinthesamespaceR
2
iscalledapairofSO(2)-
scalars if it transformsaccording to the trivial representation ρ
0
(g)=I
2
(where
I
2
isthe2×ばつ2identitymatrix).Thustherepresentationdeterminesthetype
ofgeometricquantity,andaswewillseeamappingbetweentwotypesof
geometric quantities / representations is called an equivariant map.
82Chapter 3
3.4.1Examples of Representations
Itis hightimeto considersomeexamplesof grouprepresentationsand group
actions.Most groupsconsidered inthisbook arematrixgroups, andforthese
groupsthereisanaturalchoiceofrepresentationoftencalledthestandard
representation,whereeachgroupelement(matrix)A∈Gisrepresentedby
itself: ρ(A)=A. This definition trivially satisfies theidentity and composition
requirements to qualify as a representation.
Asanexample,considerthematrixgroupSO(2)={RR
2×ばつ2
|RR
=
I,det(R)=1}of2×ばつ2rotationmatrices.Clearly,eachrotationmatrixR
SO(2)definesalinearmappingR
2
R
2
givenbymatrixvectormultiplica-
tion: x7→Rx; thisis the standardrepresentation ρ(R)=Rof SO(2). Notethat
wecanequivalentlydefineSO(2) asthesetofrotation angles[0,2π)withcom-
positiongivenbyadditionmodulo2π.Inthatcasewecannotdefineρ(θ)=θ
becauseθ[0,2π]isanangleandnotamatrix,butwecanstilldefinean
equivalent representation:
ρ(θ)=
cos(θ)–sin(θ)
sin(θ)cos(θ)
(3.15)
Usingtherulesoftrigonometryonecanshowthatmatrixmultiplicationthen
mimics angle-addition: ρ(θ)ρ(θ
)=ρ(θ+θ
), and that ρ(0)=I.
Similarly,thestandardrepresentationofthepermutationgroupS
n
mapseach
abstract permutation to amatrix that permutes thecoefficients of the vector on
whichitacts.Forinstance,ifπ
12
isthepermutationthatswapsthefirsttwo
elements of a length-3 sequence, the corresponding permutation matrix is:
ρ(π
12
)=
010
100
001
(3.16)
ThisrepresentationisusefulinGDLapplicationsinvolvingsetsandgraphs
(chapter5),whereeachofthennodes/elementsisequippedwithafeature
x
i
Rthatwewish topermute.Indeedwe findthat(ρ(π)x)
i
=x
π(i)
.If instead
ofone scalarper elementwe havea d-dimensional vectorx
i
R
d
wecan stack
them into amatrix XR
n×ばつd
and act onit with the samematrix: X7→ρ(π)X.
Equivalently, we canstack theminto onecolumn vector xR
nd
in whichcase
we use an equivalent representation matrix with didentical blocks:
ρ
d
(π)=
ρ(π)
ρ(π)
.
.
.
ρ(π)
(3.17)
Foundations of Equivariant Deep Learning83
ThepermutationgroupS
n
canalsoacton matricesofsizen×ばつn,by simul-
taneouslypermutingtherowsandcolumns.Ifρ(π)isthestandardmatrix
representation of S
n
defined above, we get a representation on matricesvia the
tensorrepresentation(section3.6.6):(ρρ)(π)A=ρ(π)Aρ(π)
.Thisrepre-
sentationis usefulforinstance fordescribingthe transformationbehaviour of
the adjacency matrix or edge-feature matrix of a graph.
Anotherrepresentationofthepermutationgroupisthesign(orparity)
representation.Anypermutationπcanbedecomposedintoasequenceof
transpositions(swaps ofapair ofelements),and althoughthisdecomposition
is notunique, the numberof transpositions N(π) isalwaysthe same.The sign
representationisgivenbyρ(π)=(–1)
N(π)
,i.e.itis1foranevennumberof
transpositions, and –1 for an odd number of transpositions.
ForanygroupGwecandefinethetrivialrepresentationρ(g)=1,which
mapseveryelementtothe1×ばつ1identitymatrix.Onemayverifythatthis
isindeedarepresentation.Ingeometricdeeplearning,weusethetrivial
representation for features that should be invariant.
ThematrixdeterminanthasthepropertythatdetI =1anddetAB=
detAdetB foranysquare matricesAand Bof thesame size. Hence,it defines
arepresentationofthegenerallineargroupandanysubgroupthereof(and
hence,ofanymatrixgroup).Forexample,foranelementoftheorthogo-
nalgroupAO(n),thedeterminanttellsuswhetherA flipsthehandedness
(detA=–1)orleavesitunchangeddetA=1).Flippinghandednesstwiceis
the same as not changing it, and indeed (–1)·(–1)=1.
Theexamples aboveare alllinearrepresentations,soletusconsideraSet-
representation (groupaction). For any group G, the composition:G×ばつG→G
definesanactionofGonthesetX=G,i.e.onitself.Writtenasarepresen-
tation, thissend g∈Gtothe mappingρ(g):G→Gdefined byρ(g)=g–, i.e.
ρ(g)(h)=gh.
A finalexample thatwillplay animportant roleinthe next chapterandthe
restofthebook,istheactionofagrouponaspaceoffunctions.Ifwehave
agroupGactingonanyspaceΩ,wecannaturallydefineanactionofGon
thespaceofreal-valuedfunctionsx:ΩR.DenotingtheactionofGonΩ
by g·u, this is defined as: (ρ(g)f)(u)=f(g
–1
·u). For instance, if Gis the group
of rotations andtranslations of theplane Ω=R
2
, thenρisthe way in whichG
acts on the space of grayscale images on R
2
.
3.5Equivariant Maps & Equivariant Nets
Havingdefinedgroupsandrepresentations,wearefinallyreadytogivea
proper mathematical definition of equivariance:
84Chapter 3
Letf:X→Ybeamappingofsets(orvectorspaces),Gagroup,
andletρ
1
:G→[X,X]andρ
2
:G→[Y,Y]betwo(linear)represen-
tationsofGactingonthedomainXandcodomainYoff.Thenfis
G-equivariantif for all g∈G:
fρ
1
(g)=ρ
2
(g)f.(3.18)
An equivariant(linear) mapcan also becalled amapping or morphism
of representations, written f:(X,ρ
1
)(Y,ρ
2
).
Unlike our preliminary discussion of equivariance in section 3.2.2, this def-
initionmakesitclearthattheequivarianceconstraintshouldbesatisfiedfor
everyelement ina group, andthat ρ
i
are notarbitrary mapsbutshould satisfy
the requirements for being a representation (section 3.4).
If we have two equivariant maps then their composite is also equivariant:
f
2
f
1
ρ
1
(g)=f
2
ρ
2
(g)f
1
=ρ
3
(g)f
2
f
1
(3.19)
Asalwaysinmathematicsandprogramming,wecanonlycomposefunctionsif
the codomain(output space) of thefirst map matchesthe domain(input space)
ofthe second.Forexample,wecancomposef
1
:X
1
→X
2
andf
2
:X
2
→X
3
to
obtain f
2
f
1
:X
1
→X
3
(read: f
2
after f
1
), but we cannotform f
1
f
2
because f
1
expects input of type X
1
but f
2
returns output of type X
3
instead.
It isvery importantto notethat when composingequivariant maps,we must
ensurethat notonlythedomain/codomain matchassetsorvectorspaces, but
alsoas representations,orelsethe compositewillnotbe equivariant.Inother
words,weshouldviewanequivariantmapf
1
notasamappingofsetsbutof
representations. Soin the exampleabove,f
1
maps fromrepresentation (X
1
,ρ
1
)
to (X
2
,ρ
2
), andf
2
maps from(X
2
,ρ
2
) to(X
3
,ρ
3
). Sincethe output representa-
tion of f
1
matches the inputrepresentation of f
2
, we cancompose them and the
composite f
2
f
1
will be an equivariant map from (X
1
,ρ
1
) to (X
3
,ρ
3
).
Followingthisrule,wecancreatearbitrarycomputationgraphsfrom
equivariant maps, i.e. create equivariant networks:
Definition 1 (Equivariant Network)Let G bea group. A G-equivariantnet-
workisacomputationgraph(DAG),whereeverynodeisassociatedwitha
vector spaceX
i
and representationρ
i
of Gin X
i
, andall mappings are equiv-
ariant. Thatis, every nodeis computed asa functionof itsparentsin thegraph
bya(possibly parameterized)mappingthatisequivariant withrespect tothe
representations acting on the input and output space of the map.
Foundations of Equivariant Deep Learning85
Thus,whendesigninganequivariantnetwork,wenotonlyhavetochoose
the featurespaces X
i
(e.g. how manydimensions it has),but alsohow wewant
ittotransform
20
(exceptfortheinputandoutputspace,wheretherepresen-
tationisusuallydeterminedbytheproblem).Everymappinginthenetwork
shouldbeequivariant,includingnon-linearities,normalizationlayers,learn-
able linearlayers,etc., becauseeven asinglenon-equivariant layerbreaks the
equivariance of the network as a whole.
One can createan invariant network simply by makingthe last layer invari-
ant,i.e.choosingtherepresentationoftheoutputspacetobethetrivial
one.Forexampleinaclassicalconvolutionalnetwork,onecouldchoose
aglobalaveragepoolinglayer(averagingoverspace),whichproducesa
translation-invariant output
21
.
Mappings in ageneral computation graph can have multipleinputs and pro-
duce multipleoutputs. Inthis casea simultaneous transformationg appliedto
allinputsusingtheirrepresentationshouldleadtoacorrespondingtransfor-
mation ofthe outputs. Suchmulti-variablemappings can alsobe considered as
mappings witha singleinput andoutputthat isthe concatenationof in/output
vectorspaces. In anequivariant networkeach of those vectorspaces has a rep-
resentationassociated withit,andtheirconcatenation willtransformaccording
toa block-diagonalrepresentationobtainedbystacking representationsalong
the diagonal (see section 3.6.1).
Thefeaturespacesofequivariantnetworksusuallyconsistofanumbern
λ
ofcopies(calledthe multiplicity)ofafew (typicallylow-dimensional)repre-
sentationsρ
λ
actingonR
d
λ
.Asdiscussedinsection3.4,eachrepresentation
ρ
λ
correspondsto atypeofgeometricfeature (e.g.scalar,vector,tensor). Thus,
a geometric featurespace consists of a numberof copies ofgeometric features
ofvarioustypes.Thismeansthatthechannels/dimensionsofageometric
featurevectoraregroupedattwolevels:eachdimensionbelongstoapartic-
ulargeometricfeaturevectorofdimensiond
λ
(e.g.a3Dvector),andalln
λ
geometricfeaturevectorsofthistypebelongtoasingleisotypicsubspaceof
dimensionn
λ
d
λ
.Forexample,wecouldhavetwogeometricfeaturevectors
v
1,1
,v
1,2
of type ρ
1
(forming an isotypic subspace ofdimension 2d
1
), and one
feature vector v
2,1
of type ρ
2
:
v
1,1
v
1,2
v
2,1
7→
ρ
1
(g)
ρ
1
(g)
ρ
2
(g)
v
1,1
v
1,2
v
2,1
(3.20)
Inthenextchapterwewillconsiderfeaturespacesconsistingofsignalsona
domain,in whichcasethegeometric quantitiesarenotjuststacked"vertically"
as wehave done here,but also"horizontally". For example, apoint cloud(??)
86Chapter 3
consistsof3Dpointfeatures(thestandardrepresentationofSE(3))stacked
horizontally, and wehavea permutation groupacting along thehorizontal axis
bypermutingpoints.Aplanarvectorfield(chapter7)consistsof2Dvectors
transforming accordingtothe standardrepresentation ofSO(2), andaddition-
ally thegroup SE(2)acts bymoving vectors to anew positionon theplane as
well as rotating them.
Inadditiontogeometricfeatures,alearnablemappingtakesparametersas
input.Parametersshouldingeneralbeconsideredasinvariantquantities(trans-
formingaccordingtothetrivialrepresentation),andsodon’trequirespecial
treatment and don’t impose any constraints on the layer.
3.6The Equivariance Cookbook
Wehaveseenhowequivariantnetworkscanbeconstructedfromequivariant
buildingblocks,buthavenotdiscussedthebuildingblocksthemselves.Here
we considera numberof common neuralnetwork layers andtheir equivariant
counterparts.
3.6.1Concatenation, Direct Sums, and Geometric Feature Channels
A common operationin neural networksis to concatenate a number ofvectors
x
i
R
d
i
,usuallydenotedx=concat(x
1
,...,x
n
).WhenthevectorspacesR
d
i
areassociatedwithrepresentationsρ
i
ofthesamegroupG,theconcatenated
vector xtransforms according to a block diagonal representation:
ρ(g)=block-diag(ρ
1
,...,ρ
n
)=
ρ
1
(g)
.
.
.
ρ
n
(g)
(3.21)
Inotherwords,concat(ρ
1
(g)x
1
,...,ρ
n
(g)x
n
)=ρ(g)concat(x
1
,...,x
n
).Foreach
g∈G,thematrixρ(g)hasblocksofsized
i
×ばつd
i
onthediagonal,andzeros
elsewhere.
Inmore abstractand basis-freelanguage, concatenatingvectorscorresponds
tothedirectsumofvectorspaces,andblock-diagonalstackingofrepresen-
tationscorrespondstoadirectsumofrepresentations.Wewillnotgointo
thishere,butnotethattheseoperationsaredenotedby
L
i
R
d
i
R
P
i
d
i
and
ρ=
L
i
ρ
i
, respectively.
Oneexamplewhere directsums appearin applicationsis whenwe dealwith
pointcloud data.Hereour dataconsistsof anumberofpoints x
i
R
3
,which
wecanstackintoonebigvectorx=concat(x
1
,...,x
n
).Eachpointisassoci-
atedwith arepresentationof therotationgroup G=SO(3),namelythe standard
representation(section3.4.1)ρ
i
(R)x
i
=Rx
i
(whereRisarotationmatrix).
Thus, thewhole pointcloud transformsas ρ(R)=block-diag(ρ
1
,...,ρ
n
), i.e.a
Foundations of Equivariant Deep Learning87
matrix withn copiesof Ralong thediagonal. Ifwe want toaddsome rotation-
invariantfeatures,suchasatimestamp,wecandosobyaddinganumberof
copies of the trivial representationρ
0
(g)=1 along the diagonal.
Insteadof stackingthex
i
intoa flatvector,wecouldalso stackthevectors
intoamatrixXR
3×ばつn
,inwhichcasetheactionisgivenbyX7→RX.For
consistencyhowever,itismathematicallymoreconvenienttoalwayswork
withthematrix-vectorform,evenifinourimplementationwemaychoose
a different shape array for our data.
Concatenation can be performed explicitly as a computation step in the net-
work,butcanalsobeusedtodescribefeaturespacesconsistingofmultiple
geometric quantities mathematically.In conventional neural networks, feature
spacescanbedescribedasaconcatenationofscalarfeaturesorinthecase
ofconvolutionalnetworks,ofchannels.Inequivariantneuralnetworks,one
typicallyconsidersfeaturespacesconsistingofmultiplekindsofgeometric
features(representations)ofvarioustype,eachoccuringwithsomemulti-
plicity.Forinstance,SO(3)rotationequivariantnetworkmightuseacertain
number of scalar features, 3D vectors, and 3×ばつ3 tensors as features. If thei-th
featuretype hasdimensiond
i
andwe have c
i
copiesof it,thetotaldimension
of the representation space would be d=
P
i
c
i
d
i
.
3.6.2Vector Addition, Scalar Multiplication & Linear Combinations
Anothercommonoperationisvectoraddition:z=x+yR
d
.Thisoperation
isequivariant,provided thatxandynotonlyhavethesamedimension dbut
also transform according to the same representation ρ. In that case we have:
ρ(g)x+ρ(g)y=ρ(g)(x+y),(3.22)
that is, vector addition is equivariant.
Similarly,multiplyingbyaninvariantscalar(i.e.eitheraconstantoran
input-dependentbutgroupinvariantquantity)isalsoequivariant.Thereason
isthatscalarmultiplicationcommuteswithmatrixmultiplication:ρ(g)αx=
αρ(g)x.Combiningvectoradditionandscalarmultiplication,wecantake
linearcombinationsofvectors,andthisisequivariantaswellwhentherep-
resentationsarethesameandscalarsareinvariant:z=
P
i
α
i
x
i
,forscalars
α
i
R.
3.6.3Change of Basis
Ifwehavead-dimensionalmatrixrepresentationρandanarbitrarynon-
singular(invertible)matrixAR
d×ばつd
,wecancreateanewrepresentationby
changing the basis:
ρ
(g)=Aρ(g)A
–1
.
88Chapter 3
We verify that ρ
is indeed a representation:
ρ
(g)ρ
(h)=Aρ(g)A
–1
Aρ(h)A
–1
=Aρ(g)ρ(h)A
–1
=Aρ(gh)A
–1
=ρ
(gh).
Tworepresentationsρandρ
relatedinthiswayarecalledequivalentor
isomorphic representations.
Whenρandρ
areequivalentbyachangeofbasisA,thenAisan
equivariant linear map between them:
Aρ(g)=Aρ(g)A
–1
A=ρ
(g)A.(3.23)
3.6.4Permutation representations and pointwise nonlinearities
Atypicalneuralnetworkconsistsmostlyoflearnablelinearmapsandfixed
elementwisenonlinearities. Inan equivariantnetwork,allmaps mustbe equiv-
ariant,includingnonlinearities.Whereastheequivarianceconstraintapplied
to learnable linear layerssimply leads to a reduction in thenumber of parame-
ters,nonlinearities generallydon’thave anyparameters andsowe mustchoose
themtobeequivariant.Equivarianceisexpressedrelativetogrouprepresen-
tationsactingontheinputandoutputspaceofthenonlinearmap,andso
therepresentationsandnonlinearitymustbechosentogether.Inthissection
andseveralthatfollow,weconsidervariouskindsofnonlinearitiesandthe
representations to which they are equivariant.
Considera discretegroupG, afeaturespaceX=R
d
anda representationρ
ofGactingonit.Ifρisapermutationrepresentation,meaningthatρ(g)isa
permutationmatrixforallg∈G,thenanypointwisenonlinearityisequivari-
ant withrespect toρacting onthe inputand outputspace
22
. Thisis becausea
pointwisenonlinearityactsthesameway oneachcoordinate,andapermuta-
tionsimply shuffles thecoordinates,so itdoesnot matterifwe shufflebefore
or after applying the nonlinearity.
Ingraph andsetneuralnetworks (chapter5),wehaveapermutationgroup
actingbypermutationmatrices,andsowecanfreelyusearbitrarypoint-
wisenonlinearities.Similarly,discretetranslationsofgrids(chapter6)such
asimagesandaudiosignalsareanexampleofpermutationrepresentations,
becauseadiscreteshiftsimplymoveseachpixeltoanotherposition(i.e.
dimension)without changingthe numericalvalues.Thus, pointwisenonlinear-
ities in convolutional networks do not break discrete translation equivariance.
Foundations of Equivariant Deep Learning89
Moregenerally,wegetpermutationrepresentationswhenworkingwith
scalar signalson a spaceΩhaving a group action(chapter 4) suchas a homo-
geneousspace(chapter7).However,ifthedomainiscontinuousthenwe
areforcedtodiscretizeandasaresultaliasingcanoccur,whichwillbreak
equivariance tocontinuous transformations. Sincenonlinearities can introduce
introducehigh-frequencystructureintothesignal,theyarelikelytoleadto
aliasing. In order to preserve equivariance to pointwise nonlinearities it is thus
necessary to take measures to avoid aliasing (Cesa, Lang, and Weiler 2022).
3.6.5Gating and Attention
Whentherepresentationmatricesρ(g)arenot permutationmatrices, pointwise
nonlinearitieswillingeneralnotbeequivariant.Forinstance,considerthestan-
dardrepresentationof arotationθ[0,2π) asa2×ばつ2matrix(eq.(3.15))acting
on R
2
. Aπ/2 rotationmaps (x,y)7→(–y,x) forexample,and thisoperation will
notcommutewithReLUbecauseofthesignflip.Similarly,afieldofsuch
quantities (a vector field; see chapter 7) will require special nonlinearities.
In cases likethese, gating nonlinearities can be used.In conventional neural
networks,gatingreferstomultiplyingascalarorvectorfeaturebyascalar
valuecalledthegate,which istypicallysquashedintothe (0,1)or (–1,1)range
by a sigmoid or tanh nonlinearity:
gate(x,g)=sigmoid(g)x.(3.24)
Herex,gR
d
arefeaturevectors,andindicateselementwisemultiplica-
tion.Sometimes,asinglescalargategRisusedtoscaleanentirefeature
vector xR
d
orasub-vector ofit.GatingisusedinLSTMandGRUblocks
(HochreiterandSchmidhuber1997;Chungetal.2014),squeeze-and-excite
blocks (Hu et al. 2020), and many other architectures.
Inequivariantnetworks,gatingcanbeusedonequivariantfeatures,provided
thatthegateisinvariantandweuseonescalargatetomultiplyawholegeo-
metricfeaturevector(notonegate percomponentofthevector). Asnotedin
section3.6.2, multiplyingbyaninvariant scalarisequivariant. Invariantscould
be producedbyan invariant linearmap(section 3.6.8)which isa specialcase
ofanequivariantmap.Alternatively,iftherepresentationmatricesρ(g)are
orthogonal(asis oftenthe case)one couldcompute thenorm ofgeometric fea-
turevectors,or moregenerallyaninvariant polynomial(Hilbert1890; Derksen
and Kemper 2002;Villaret al.2021). Thenone canapply anonlinearity such
assigmoid ortanh(clearly,theoutput willstillbeinvariant). Finally,one can
multiplya geometricfeaturevectorx
i
(transformingaccording toarepresenta-
tion ρ
i
) bya scalargate. Since scalarand matrixmultiplication commute,this
is always equivariant.
90Chapter 3
Attention isan operationthat likegating involvesmultiplicative interactions
betweendata-dependentquantities,butadditionallyinvolvesanaveragingstep.
LetXR
N×ばつD
beamatrixofNfeaturevectorsofdimensionD,andletA
R
M×ばつN
be a matrixof logits orunnormalized attention weights. Thenattention
is computed as:
attention(X,A)
md
=
N
X
n=1
softmax(A)
mn
X
nd
,(3.25)
wherethesoftmaxisappliedalongtheNaxis,thusproducinganormalized
distribution over the Nfeature vectors X
n,:
.
There are two kinds of equivariance we may wishthe attention operation to
have.Ingraphneuralnetworksand(non-autoregressive)transformers(chap-
ter5)onwishesthisattentiontobeequivarianttopermutationsoftheNfeature
vectorsandattentionweights;thisisautomaticallysatisfied.Inapplications
involvinggeometry(e.g.geometricgraphs,??,onemaywishattentiontobe
equivariantto agroupG(e.g. rotations)actingoneachfeaturevectorX
n,:
iden-
tically. Sinceattention computes a weighted average (section 3.6.2) of vectors
withthe samerepresentation,it isequivarianttosuchtransformationsprovided
thatthelogitsA areinvariant.Inthecaseofself-attention,whereA iscom-
puted by inner products, this is guaranteed whenever the keys and values have
the same orthogonal group representation.
3.6.6Tensor product nonlinearities
Gatingnonlinearities,wherewemultiplyascalarandsomeothergeomet-
ricquantity,isaspecialcaseofamoregeneraloperationcalledthetensor
productofrepresentations.Givenvectorsx
i
R
d
i
,theirtensorproduct(or
outerproduct)T=
N
i
x
i
livesinR
d
1
×ばつd
n
.Theelementatindex(j
1
,...,j
n
)
istheproductofthecorrespondingelementsofthevectors:T
j
1
,...,j
n
=
Q
i
x
i,j
i
.
Forexample,thetensorproductoftwocolumnvectorsisamatrixgivenby
xy=xy
,i.e.(xy)
j
1
j
2
=x
j
1
y
j
2
.Wecanlinearlycombinetensors,sothe
tensors of a given shape form a vector space
23
.
Whenthevectorsx
i
haveagrouprepresentationassociatedwiththem,the
spaceoftensorsalsonaturallybecomesarepresentation.Applyingg∈Gto
x,yusing their representations ρ
1
,ρ
2
, we find:
(ρ
1
(g)x)(ρ
2
(g)y)=(ρ
1
(g)x)(ρ
2
(g)y)
=ρ
1
(g)(xy
)ρ
2
(g)
.(3.26)
Thus,T=xyR
d
1
×ばつd
2
ismappedtoρ
1
(g)Tρ
2
(g)
,andindeedthis
definesagrouprepresentationonR
d
1
×ばつd
2
.Wecouldwritethisasρ(g)T=
ρ
1
(g)Tρ
2
(g)
,butitshould benotedthatρ(g)isa linearmapfromthespace
of tensors R
d
1
×ばつd
2
to itself, i.e. as a matrix it has shape d
1
d
2
×ばつd
1
d
2
.
Foundations of Equivariant Deep Learning91
Mathematicallyitisoftenconvenienttoglossovertheparticularshape
(sincereshapingdoesn’treallydoanything),butwhenimplementingalgo-
rithms theshape doesmatter. If we flattent=vec(T)R
d
1
d
2
, wecan takeρ(g)
tobe amatrix ofsized
1
d
2
×ばつd
1
d
2
.The venerable MatrixCookbook(Petersen
and Pedersen 2012, equation 520) tells us that:
vec(ρ
1
(g)Tρ
2
(g)
)=
(
ρ
2
(g)ρ
1
(g)
)
vec(T),(3.27)
where ρ
2
(g)ρ
1
(g) denotes the tensor / Kronecker product of matrices:
AB=
a
11
B···a
1n
B
.
.
.
.
.
.
.
.
.
a
m1
B
.
.
.
a
mn
B
(3.28)
for matrices AR
m×ばつn
and BR
p×ばつq
.
In summary, we canwrite the actionof Gon thespace of matrices eitheras
T7→ρ
1
(g)Tρ
2
(g)
,orinflattenedformast7→ρ
2
(g)ρ
1
(g)t.Thisiscalled
thetensorproductofrepresentationsρ
1
andρ
2
,andcanbeextendedtoarbitrary
products
N
i
ρ
i
.
Whenpairwisetensorproductsareusedasanonlinearity,the dimensionality
isgreatlyincreasedwitheachapplication.Henceitiscommontocom-
binetensorproductswiththeClebsch-Gordandecomposition(Kondor,Lin,
andTrivedi2018),wherea(fixed)changeofbasisisappliedtothetensor
productinordertoblock-diagonalizeitintoirreduciblerepresentations(see
section 3.6.9), afterwhich only a limited numberof low-order representations
is retained.
Another way in whichthe tensor representation comes up is in graph neural
networks,whereeachgraphonnnodesisdefinedbyanadjacencymatrix
AR
n×ばつn
(inpractice,theentriesofthematrixarebinary,butonecanalso
considergeneraledgeornode-pairfeatures).Whenwepermutethenodesof
thegraph,wehavetopermuteboththerowsandcolumnsoftheadjacency
matrix.This correspondstothe tensorrepresentationρ
1
ρ
1
,where ρ
1
(P)=P
isthestandardmatrixrepresentationofthepermutationgroup.Thatis,the
adjacency matrix transforms as A7→PAP
.
3.6.7Equivariant Normalization Layers
Normalizationlayers,suchasBatch-,Layer-,Group-andInstance-
Normalization, areanimportant partof modernnetwork architecturesinclud-
ingconvolutionalandattention-basedmodels(IoffeandSzegedy2015;Ba,
Kiros,andHinton2016;WuandHe2018;Ulyanov,Vedaldi,andLempitsky
2016).Normalizingfeaturesthroughoutthe networkcan improve optimization
92Chapter 3
speedandstability,andimprovegeneralization.Generallyspeaking,normal-
izationlayerscomputeameanμandstandard deviationσoffeatures,usethem
tonormalizethefeaturessothattheyhavemeanzeroandstandarddeviation
one, andthen optionally applya learnedshift and scaleoperation. They differ
intheaxesoverwhichthemeanandvariancearecomputed,andhencethe
shape of μand σ.
Regardlessoftheaxesoverwhichnormalizationisperformed,afew
simplerulessufficetomakenormalizationlayersequivariant.Asnotedin
section3.6.2,wearealwayspermittedtolinearlycombinegeometricfea-
turevectors,providedthattheyallhavethesamerepresentationandthatthe
scalar coefficients are invariant. Thus, we cancompute the mean ofgeometric
featurevectorsofthesamerepresentation,andsubtractitfromthosefeature
vectors, andtheresultwill beequivariantwiththesamerepresentation again.
Weshouldhowevernotaverageoverthecomponentsofasinglegeometric
feature vector; the components are not in themselves geometric vectors with a
representationandsotheirmeanisnotageometricquantityeither.Thestan-
darddeviationisafunctionofthesumofsquaresofthe(mean-subtracted)
components,andthisquantity(i.e.thenorm)willbeinvariantaslongasthe
representation is orthogonal.
Letusdescribean equivariant layer-normalizationlayer.Consideranarray
XR
N×ばつC×ばつD
,whereNisthebatchsize,Cisthenumberofchannels(multi-
plicitY),andDisthedimensionalityofgeometricfeaturevectors
24
.Weassume
that eachvector X
nc
transforms accordingtothe sameorthogonalrepresenta-
tion. We will normalize over the Caxis separately for each N. Thus, our mean
has shape N ×ばつD and is computed as:
μ
nd
=
1
C
C
X
c=1
X
ncd
.
That is,we computeNmeanvectors of dimensionD byaveragingoverthe C
axis.Wesubtractthisfrom X toobtainX
ncd
=X
ncd
μ
nd
.Then,wecompute
thesquarednormforeachgeometricfeaturevector,and computetheir average:
σ
2
n
=
1
CD
C
X
c=1
D
X
d=1
X
2
ncd
.(3.29)
Finally, we divide: Y
ncd
=
X
ncd
σ
2
n
+ε
.
It is common toapply a gain and bias after normalization.As noted, scaling
byaninvariantgainisalwayspermitted,providedoneusesthesamegainforall
coefficients ofageometricfeaturevector.Applyingabiasisonly equivariant
ifthebiasvectorsatisfiesρ(g)b=bforallg∈G,i.e.ifthebiasisinvariant.
Foundations of Equivariant Deep Learning93
Dependingonρ,thismaymean thatithastobe0,whileinothercasesthere
may be a space of solutions of some dimensionality.
3.6.8Equivariant Linear Maps
One of the most common type of maps in neural networks are learnable linear
maps,suchasfullyconnectedlayers,convolutions,depth-wiseconvolutions,
and1×ばつ1convolutions.Inthelinearcaseitisoftenpossibletounderstand
thespaceofequivariantmapsandparameterizeit.Inlaterchapterswewill
considervariouscaseswithspecialstructure,suchasplanar convolution,group
convolution,andgraphconvolution.Inthissectionweconsiderthecaseofa
general linear map, also known as a fully connected layer.
Consider agroup G, matrix representationsρ
1
(g)R
n×ばつn
and ρ
2
(g)R
m×ばつm
ofG,andamatrixWR
m×ばつn
.ForW tobeequivariant,ithastosatisfythe
equationρ
2
(g)W=Wρ
1
(g)forallg∈G.Thissystemofequationsislinear
inW,andsowecansolveforWtofindabasisforthespaceofsolutions.
IfW
i
R
m×ばつn
(fori=1,...,k)isabasisforthespaceofsolutions,thenany
equivariant matrixWcan bewritten asa linearcombination ofbasis matrices:
W=
P
k
i=1
θ
i
W
i
, and any such linear combination is equivariant.
Itiscommontoaddalearnablebiasvectortotheoutput,computingthe
affine mapWx+binsteadofthelinearmapWx,butthisisnotnecessarily
equivariantevenifWis.Wecanreducetheaffinecasetothelinearoneby
appending a constant 1 dimension to x. Thisquantity is invariant, so we add a
constant1block(trivialrepresentation)ofsize1×ばつ1 toρ
1
aswell,yieldinga
n+1 dimensional matrix representation. Then, solving for Was before yields
an n+1×ばつm matrix containing the equivariant weight matrix and bias vector.
Dependingonthegroupandtherepresentations,itmightbepossibleto
solvethe equivariance constraintanalytically, and thishas been done formany
groupsofinterests.Indeed,aswewilldiscussinthenextsection,represen-
tationtheorypresentsuswithasystematicwaytounderstandthespaceof
equivariantlinearmapsbetweentworepresentationsbybreakingthemdown
intoirreduciblerepresentationsforwhichequivariant linearmapstakeavery
simple form.
Nevertheless,itcansometimesbemorepracticaltosolvetheequivariance
equationsnumerically.Todoso,we wouldlike tofirstbringthe equivariance
equationin thestandard matrix-vector formAw=0, wherew=vec(W)is the
flattenedweightmatrix andA is amatrixofcoefficients derivedfromρ
1
and
ρ
2
.We firstwrite theconstraint asρ
2
(g)Wρ
1
(g)
–1
W=0. Next, wevectorize
(flatten) usingthe vec operator. Usingthe generalfact (Petersen andPedersen
(2012),Section10.2.2.,equation520)thatvec(AXB)=(W
A)vec(X),
94Chapter 3
where is the Kronecker product eq. (3.28), we obtain:
(ρ
1
(g)
ρ
2
(g)–I
n
I
m
)vec(W)=0(3.30)
Thus,foreachgroupelementgwegetaconstraintmatrix(ρ
1
(g)
ρ
2
(g)–
I
n
I
m
)ofsizenm×ばつnm.Assumingthegroupisfinitewith|G|elements,we
can stackthese matricesinto constraintmatrix Aof size |G|nm×ばつnm. Wecan
thenuseastandardmethodsuch asSVDtosolvethelinearsystemAw=0and
obtain our (flattened)basis matrices w
i
=vec(W
i
). If thegroup is not finite,it
isoftensufficienttosamplealargenumberofgroupelementsandsolvethe
system for those.
Performingthe linearcombinationW=
P
k
i=1
θ
i
W
i
canbe expensivewhen
W
i
arelargematrices, orwhenthenumberof basismatriceskislarge. Once
training iscomplete, onecan perform thelinear combinationsonce and simply
storeWas parameters,asina conventionalneuralnetwork, butduring train-
ingthisoperationmustbeperformedduringeachforwardpass.Fortunately,
thematricesW
i
canoftenbe chosentobesparsewithnon-overlappingspar-
sitypatterns.Asaresult,thematrixW willdisplayaweight-sharingpattern
(Ravanbakhsh, Schneider, and Poczos 2017).
One exampleis convolution, which, as we discussin Chapter 6 corresponds
to a matrixwhere the entries on eachdiagonal are the same. Anotherexample,
discussedinChapter5arepermutation-equivariantlinearmaps.Considera
linear mapWfrom R
n
to itselfthat is equivariantto permutationsgS
n
acting
bypermutationofcoordinates.Onecanshowthatsuchamaphasonlytwo
parameters (Zaheer et al. 2017):
W=θ
1
W
1
+θ
2
W
2
,(3.31)
whereW
1
=I
n
istheidentitymatrix,andW
2
=11
I
n
isthematrixwith
zero diagonalandone everywhere else.Thus, insteadof computingthelinear
combination, wecould inthis casecopy theweight θ
1
across thediagonal, and
copy θ
2
to all off-diagonal entries ofW.
Asnotedinsection3.5andfurtherstudiedinsection3.6.1,itiscommon
in equivariantnetworks touse block-diagonal representations.In thiscase the
constraint for Wsplits into blocks as well:
ρ
2
1
(g)0
0ρ
2
2
(g)

W
11
W
12
W
21
W
22

ρ
1
1
(g)
–1
0
0ρ
1
2
(g)
–1
=
W
11
W
12
W
21
W
22
(3.32)
Theequivariance constraintcanthenbesolvedforeachblock ofWseparately,
sincethe(i,j)-thblockonlyneedstobeequivariantwithrespecttothei-th
block of ρ
2
and the j-th block of ρ
1
:
ρ
2
i
(g)W
ij
ρ
1
j
(g)=W
ij
.(3.33)
Foundations of Equivariant Deep Learning95
Furthermore, once we have solved these equations for all blocks, we can form
thefullmatrixW bylinearlycombiningbasisblocksandparameters,which
is more efficient than linearly combining full basis matrices.
3.6.9*Irreducible Representations
We have seenthat wecancombine representationsinto alarger oneby stack-
ingthemblock-diagonally,andwehave seenthatwecanchangethebasisof
arepresentationtoobtainanequivalentone.Usingtheseoperations,wecan
createratheralotofrepresentations.Itisnaturaltoaskwhetherwecango
in the other direction:Given some arbitrary representation,can we change the
basistomakeitblock-diagonal(withthesameblockstructureforallg∈G)?
And could we do this recursively on the blocks, until the representation matri-
ces arediagonal? These questionslead us toone ofthe most centraland useful
concepts in representationtheory, namelythe concept ofirreducible represen-
tations (irreps).Wewill seehow irreps providea systematicunderstanding of
the space representations and the equivariant linear maps between them.
Todefineirreduciblerepresentationswemustfirstdefineinvariantsub-
spaces.Letρbearepresentation inavectorspaceV.Aninvariantsubspace isa
linear subspaceW Vsuch thatfor allg∈Gand wW,we have ρ(g)wW.
So, whilevectors wWare notinvariant themselves (ρ(g)w=w in general),
they stayinsidethesubspaceWwhen actedonbyany groupelement,andso
we say that the subspace Wis invariant.
For any representation ρon V, the subspaces W ={0} andW=Vare invari-
ant,sothesearecalledtrivialinvariantsubspaces.Arepresentationwitha
non-trivialinvariantsubspaceiscalledreducible,andonewithoutsucha
subspace is called irreducible.
Ifwehaveareduciblerepresentationρwithnon-trivialinvariantsubspace
W V,wecanfind achangeofbasissothatρbecomesblockupper-triangular:
ρ
(g)=Aρ(g)A
–1
=
ρ
11
(g)ρ
12
(g)
0ρ
22
(g)
(3.34)
Avectoroftheform
̃
w=(w,0)willremainofthisform:ρ
(g)
̃
w=(ρ
11
(g)w,0),
i.e. it remains in the subspace W.
Ifitispossibletofindabasiswhereadditionallyρ
12
(g)=0,i.e.whereρ
isblock-diagonal,wesaythatρisdecomposable,andotherwiseitiscalled
indecomposable.Irreduciblerepresentationsarealwaysindecomposable,but
theconverseisnottrueingeneral.However,undersomeadditionalassump-
tions,whichholdformostcasesofpracticalinterestinGDL,theconverse
istrue,whichmeanswecankeepdecomposingarepresentationuntilallthe
blocks are irreducible (and thus indecomposable).In particular, this is true for
96Chapter 3
(complex)representationoffinitegroups,andforfinite-dimensionalunitary
representations.
Since (underbenign conditions)a representationcan bedecomposed intoa
direct sumof irreps, onewould liketo know allirreps for thegroup of interest.
Then,wewouldhavea completeunderstanding ofthe spaceof representations
ofourgroup.Fortunately,theirrepsofmostgroupswecareabouthavebeen
classified, and sousually this is amatter of looking up theirreps in a reference
work suchas VilenkinandKlimyk (1991).Computing theirrepsby handcan
for a group of interest can also be educational
25
.
Inchapter6wewillconsidergrids,andgroupsofshiftsactingonthem.
Thesegroups arecommutative,and suchgroupshaveone-dimensional(com-
plex) irreps,sothatany representationofsuchagroupcanbe fullydiagonal-
ized.Takeforinstance thegroupof 1DcyclicshiftsC
n
(section3.3.3.1), which
hasirrepsof theformρ
k
(t)=e
2πikt/n
(whereiis theimaginaryunit,tC
n
isa
translation and k{0,...,n–1}).
ThosefamiliarwithFourieranalysismayrecognizetheirrepsρ
k
asthe
basis functionsused inthe Fourier transform,and indeedrepresentation theory
revealsavastgeneralizationofFourieranalysisthatappliestoalargeclass
ofgroups.Eachgrouphasadifferentsetof(inequivalent)irrepsρ
λ
indexed
bysomesetΛcalledthespectrum.Thevaluesλcanbeinterpretedas(gen-
eralized)frequencies,anddependingonthegroup,thesetΛcouldbefinite,
countablyinfinite(e.g.Λ=Z),uncountablyinfinite(e.g.Λ=RorΛ=C).In
chapter 7 we will discuss this topic in more depth.
Thespectrumalsogivesusaconvenientwaytodescribeanarbitrary
representationinabasis-independentway.Ifwehaveafullyreduciblerep-
resentationρofagroupGwithspectrumΛ,wecandecomposeitasadirect
sumofirreps:ρ=
L
j
ρ
λ
j
forsomelistoffrequenciesλ
j
Λ.Anevenmore
compactdescriptionistocountthemultiplicitym
λ
,i.e.howofteneachirrep
ρ
λ
occursin ρ.Thelistof frequenciesormultiplicitiescan becalledthetype of
ρ.When wewish tobemore precise,wecan alsoinclude thechangeof basis
matrixAthatdecomposesρintoirrepsaspartofthetype,sothatthetype
precisely identifies one representation ρ.
3.6.10*Schur’s Lemma
Knowing how representationsdecompose intoirrepsisveryhelpful inunder-
standingthespaceofequivariantmaps.Wesawinsection3.6.8(eq.(3.32))
thatifwehavetworepresentationsρ
1
,ρ
2
thatsplitintoblocksρ
1
i
,ρ
2
j
,thena
(ρ
1
,ρ
2
)-equivariantlinearmapW splitsinto(ρ
1
i
,ρ
2
j
)-equivariantblocksW
ij
.
So by decomposing therepresentations into irreps, we canreduce the problem
ofcharacterizing thespaceofequivariantlinearmaps tothecaseofirreps. In
Foundations of Equivariant Deep Learning97
otherwords, ifweknow how toparameterizeequivariantW
ij
,allwe needto
doisstackthemtoformanequivariantW.Schur’slemmatellsushowto
parameterize the space of equivariant linear maps W
ij
between irreps i,j.
Schur’slemmahastwopartswhichdependondifferentconditions,sowe
treat them separately.
Lemma2(SchurI)LetGbeagroup,andletρ
1
,ρ
2
beirreduciblerepre-
sentationsofGonvectorspacesV
1
,V
2
overafieldK(e.g.realorcomplex
numbers). Then a(ρ
1
,ρ
2
)-equivariant linearmap is eitheran isomorphismor
the zero map.
Insimpleterms,assumingweareworkingwithfinite-dimensionalvector
spaces and concrete matrices, to say that representations ρ
1
,ρ
2
are isomorphic
or equivalent is to saythat an appropriate change of basis (section 3.6.3) turns
one into the other. That is, W
12
is invertible. Conversely if the representations
arenotequivalent,thatmeansthereisnoisomorphismbetweenthem,andso
according to Schur any equivariant linear map must bezero. Thus, already we
know thattheblocksW
ij
ofW corresponding toinequivalentirrepsmustbe
zero.
Lemma3(SchurII)letG,ρ
1
,ρ
2
beasinSchurI,andassumeadditionally
thatV
1
=V
2
andis finitedimensional, andthatKisalgebraically closed(e.g.
complexbutnotrealnumbers).Thena(ρ
1
,ρ
2
)-equivariantlinearmapisa
scalar multiple of the identity, i.e. W=cIwhere cK.
Thus,atleastinthecaseofcomplexrepresentations,weknowhowto
parameterizetheequivariant linearmaps W
ij
forirrepsi,j bya singlecomplex
parameter!Thesituationforrealrepresentationsisalittlebitmoreinvolved,
but one can show thatthe space of equivariant linear maps between real irreps
iseither 1,2or 4dimension, dependingonwhether thereal representationisof
real,complexorquaternionictype.Thiscanbeascertainedviaamapcalledthe
Frobenius-Schurindicator.MoremathematicaldetailscanbefoundinSerre
1977;Etingofetal.2011.Inpractice,itmaybemostconvenienttocompute
theequivariantlinearmapsforgivenirrepsnumericallyusingthetechnique
presented insection 3.6.8,or usea librarythat hasalready implementedthem
forthegroupofinterest(GeigerandSmidt2022;Geigeretal.2022;Cesa,
Lang, and Weiler 2022).
3.7Summary
To be expanded.
Introduced the notion of symmetry in the context of learning problems.
98Chapter 3
Symmetriesformagroup,actondataviarepresentations.Algebraicprop-
ertiesofthisgroupcanbeusedtodefineafirst‘blueprint’forequivariant
networks.
Nextsteps:‘augment’symmetrieswith‘near-symmetries’leveragingfurther
physical stucture in the problem.
Next Chapter augments the algebraic priors with geometric priors.
Exercises
1.Consider a function f:X→Yand assume that it is equivariant, i.e. that:
f(ρ
X
(g)x)=ρ
Y
(g)f(x),(3.35)
for g any element of some group Gand actions ρ
X
and ρ
Y
of G.
Provethatiftwoinputshavethesameoutputf(x)=f(x
),thenthetransformed
inputs ρ
X
(g)x and ρ
X
(g)x
also have the same output.
2.Consideralabelfunctionf:X→YbetweenfinitesetsX,Y.Howmanysymme-
tries (invariances) does fhave?
3.Consider an unknownlabel function f:X→Ybetween finite sets, and assume that
we know the full symmetry group Gof f. What is the minimum numberof labelled
examples (x,y) we need to determine f?
4.Consideragroupactionα(g,x)andshowthatthemapf
1
(x)=α(g,x)isinverseto
f
2
(x)=α(g
–1
,x).
5.Consider thecyclic groupC
4
=e,r,r
2
,r
3
,where eis theidentityandr isa90-degree
rotation.Createatablewiththe4groupelementsalongtherowsandcolumns,
andfillentryi,jwiththeproduct(groupcomposition)ofthecorrespondinggroup
elements. You have written the multiplication table for C
4
.
6.Givenapermutationσ:[1,...,n][1,...,n],how doweproducethe correspond-
ingn×ばつnpermutationmatrixP
σ
?Showthatthecompositionofmapsσσ
corresponds to matrix multiplication P
σ
P
σ
.
7.Showthatgivenmatrixrepresentationsρ
1
,...,ρ
n
ofG,themapρ(g)=
block-diag(ρ
1
,...,ρ
n
) is also a representation. (Sec. 3.6.1).
8.ConsiderG-invariantfunctionf:X→Y.Wesaythatfiscompleteifforanyx,x
forwhichthereexistsnog∈Gsuchthatgx=x
,wehavef(x)=f(x
).Showthatif
f:X→Yisacompleteinvariantmapping,anyinvariantmappingf
:X→Zcan
be written as f
(x)=f
′′
(f(x)) for some mapping f
′′
:Y→Z.
9.Show that a group homomorphism maps the identity to the identity.
10.ShowthatthenormalizationoperationY
ncd
=
X
ncd
σ
2
n
+ε
definedinSec3.6.7is
equivariant.
Foundations of Equivariant Deep Learning99
Notes
1
Morespecifically,vectorsareabstractobjectsthancanbeadded(vector
addition)andscaled(scalarmultiplication)accordingtocertainrules.One
oftenhearsvectorsdefinedas"arrows"or"arraysofnumbers";mathematically
speaking, noneof thesedefinitions is correct:the formerrequires an additional
structurecalledinnerproduct(whichdefinesdirection),thelatterrequiresan
additionalassumptionofabasis(inwhichavectorcanberepresentedusing
coordinates).
2
InthecontextofEuclideangeometry,congruentandisomorphicmeanthe
samething,butcongruentismorecommoninelementarycourseswhereas
isomorphism is applied throughout mathematics.
3
For example,alllinearmaps preservethestructureof alinearspaceinthat
theypreserve linearrelations: αx+βy=zα(Ax)+β(Ay)=Az. However,
onlytheinvertiblelinearmapsfromavectorspacetoitselfcouldbecalled
symmetries (automorphisms) of that space.
4
Anaffinespaceislikeavectorspace,exceptthatitdoesnothavea
designated origin 0 that must be preserved by symmetries
5
Thiskindofnotationiscommonincategorytheoryandisexplainedin
Appendix E.
6
After the Norwegian mathematician Niels Henrik Abel (1802–1829).
7
We mostly work finite-dimensional real matrices, but one may also consider
groups of complex matrices, and infinite-dimensional matrices.
8
Cayley’s theoremis a special caseof a deep result incategory theory called
the Yoneda Lemma.
9
Theclassificationoffinitesimplegroupsspanstensofthousandofpages
and was completed largely between 1955 and 2004.
10
Quaternionsarenon-commutativehypercomplexnumbersoftheforma+
bi+cj+dk. They are often used as a description of 3D rotations.
11
By"preserving vector spacestructure"we meanthatlinear mapsApreserve
linearrelations:αx+βy=zα(Ax)+β(Ay)=Azforscalarsα,βandvectors
x,y,z.
12
More precisely, (x,y)=cos
–1
(x
y/x∥∥y).
13
Afieldinthiscontextisascalar-,vector-,ortensor-valuedfunctiononsome
geometric domain.
14
Forexample,physicistswouldoftensaythat"atensorisanobjectthat
transformsasatensor".Inrepresentation-theoreticterms,onecouldsaythat
tensors arequantities that transformaccording to atensor representationof the
general linear group.
100Chapter 3
15
Recallthatmappingslikeρ(e)andid
X
areequalifandonlyiftheygive
equaloutputsforallinputs,sotheIdentityaxiomcouldalsobewrittenas
ρ(e)(x)=x for all x∈X.
16
Moreprecisely, topreservetheserelationsmeans thatgh=kρ(g)ρ(h)=
ρ(k).
17
Despiteits name,thetrivialrepresentationhas manyimportant uses,since
it is the representation we use for any invariant quantity.
18
One cangive ageneral definitionusing thecategory-theoreticnotion ofan
"action of a group object internal to a category".
19
Whenthevectorspaceisinfinitedimensional,asitwillbewhenweconsider
spacesofsignals/functionsonacontinuousdomain,the"matrix"willbe
infinite dimensional as well.
20
Learningthegrouporitsrepresentationisaninterestingandimportant
active area of research.
21
In/equivariance inCNNs holdsonly upto bordereffectsand aliasingunless
specifically addressed;see Azulay andWeiss 2019;Zhang 2019; Vasconcelos
et al.2021; Karras etal. 2021; Weiler and Cesa2019a; Cesa, Lang,and Weiler
2022
22
Onecanalsoconsiderrepresentationsbysignedpermutationmatri-
ces,whicharematriceswithasingle1or–1ineachrowandcolumn.
Forsuchrepresentations,onecanusetheCReLUnonlinearityCReLU(x)=
(ReLU(x),ReLU(–x)).SeeTacoSCohenandWelling(2017a)andShanget
al. (2016)
23
Inmathematics andphysics, oneusuallyconsiderstensor productsofvec-
torsanddualvectors(whichhavedifferentrepresentationsassociatedwith
them), leadingto tensors ofmixed type,with somenumber of upperand lower
indices. Here we will not consider dual vectors, though.
24
Hereweassumethatallgeometricfeaturevectorsareofthesamedimen-
sion and have the samerepresentation. The approach can be generalizedto the
casewherewehaveC
λ
copiesofvariousrepresentationsρ
λ
byapplyingthe
calculation for each λseparately.
25
Oneapproachwouldbetoexplicitlyconstructandthendecomposethe
regular representation (Gacting onthe space of functions on G). According to
thePeter-Weyltheorem,undersuitableconditions,theregularrepresentation
decomposesintoadirectsumofallirreps,withmultiplicityequaltotheir
dimension.

AltStyle によって変換されたページ (->オリジナル) /