Siamese neural network
A Siamese neural network (sometimes called a twin neural network) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors.[1][2][3][4] Often one of the output vectors is precomputed, thus forming a baseline against which the other output vector is compared. This is similar to comparing fingerprints, but can be described more technically as a distance function for locality-sensitive hashing.[citation needed]
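The shared-weight idea can be sketched in a few lines of numpy. This is illustrative only: a random linear map followed by tanh stands in for a trained network, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))  # ONE weight matrix, shared by both branches

def embed(x):
    # Both inputs pass through the same weights: the "twin" property.
    # A random linear map + tanh stands in for a trained network here.
    return np.tanh(W @ x)

x_query = rng.standard_normal(8)     # e.g. a fresh input to compare
x_baseline = rng.standard_normal(8)  # e.g. a stored reference input

e_baseline = embed(x_baseline)       # often precomputed and cached
distance = np.linalg.norm(embed(x_query) - e_baseline)
print(distance)  # comparable output vectors, compared via a distance
```

Because both branches share `W`, the baseline embedding can be computed once and reused against many queries, which is the precomputation pattern described above.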
It is possible to build an architecture that is functionally similar to a twin network but implements a slightly different function. This is typically used for comparing similar instances in different type sets.[citation needed ]
Twin networks are useful wherever a similarity measure is needed, for example recognizing handwritten checks, detecting faces in camera images, and matching queries against indexed documents. Perhaps the best-known application of twin networks is face recognition, where known images of people are precomputed and compared to an image from a turnstile or similar source. It is not obvious at first, but there are two slightly different problems. One is recognizing a person among a large number of other persons, that is, the facial recognition problem; DeepFace is an example of such a system.[4] In its most extreme form this is recognizing a single person at a train station or airport. The other is face verification, that is, for example, verifying whether a photo in a passport matches the face of the passport's owner. The twin network might be the same, but the implementation can be quite different.
Learning
Learning in twin networks can be done with triplet loss or contrastive loss. For learning by triplet loss, a baseline vector (anchor image) is compared against a positive vector (a matching image) and a negative vector (a non-matching image). The negative vector forces learning in the network, while the positive vector acts like a regularizer. For learning by contrastive loss there must be a weight decay to regularize the weights, or some similar operation such as normalization.
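Both losses can be sketched in numpy on single examples (the margins, example vectors, and squared-Euclidean choice are illustrative, not from a cited implementation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # max(0, d(a, p) - d(a, n) + margin), with squared Euclidean d:
    # the negative drives learning, the positive acts like a regularizer.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def contrastive_loss(a, b, same, margin=1.0):
    # Similar pairs are pulled together; dissimilar pairs are pushed
    # apart until they clear the margin.
    d = np.linalg.norm(a - b)
    return d ** 2 if same else max(0.0, margin - d) ** 2

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])   # a matching example
negative = np.array([2.0, 0.0])   # a non-matching example

print(triplet_loss(anchor, positive, negative))  # 0.0 (negative already clears the margin)
print(contrastive_loss(anchor, positive, True))  # ~0.01 (small for a similar pair)
```

Note that once the negative is far enough past the margin, the triplet loss is zero and contributes no gradient, which is why hard-negative selection matters in practice.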
A distance metric for a loss function may have the following properties:[5]
- Non-negativity: δ(x, y) ≥ 0
- Identity of indiscernibles: δ(x, y) = 0 ⇔ x = y
- Symmetry (commutativity): δ(x, y) = δ(y, x)
- Triangle inequality: δ(x, z) ≤ δ(x, y) + δ(y, z)
In particular, the triplet loss algorithm is often defined with squared Euclidean distance at its core, which, unlike ordinary Euclidean distance, does not satisfy the triangle inequality.
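A one-dimensional counterexample shows the violation directly:

```python
def sq_euclid(x, y):
    # Squared Euclidean distance in one dimension, for clarity.
    return (x - y) ** 2

x, y, z = 0.0, 1.0, 2.0
print(sq_euclid(x, z))                    # 4.0
print(sq_euclid(x, y) + sq_euclid(y, z))  # 2.0 < 4.0, so the triangle inequality fails
```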
Predefined metrics, Euclidean distance metric
The common learning goal is to minimize a distance metric for similar objects and maximize it for distinct ones. This gives a loss function like
- δ(x^(i), x^(j)) = min ‖f(x^(i)) − f(x^(j))‖ if i = j, and max ‖f(x^(i)) − f(x^(j))‖ if i ≠ j
- i, j are indexes into a set of vectors
- f(·) is the function implemented by the twin network
The most common distance metric used is Euclidean distance, in case of which the loss function can be rewritten in matrix form as
- δ(x^(i), x^(j)) ≈ (x^(i) − x^(j))^T (x^(i) − x^(j))
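The matrix form is just the squared Euclidean norm of the difference vector, which a quick numpy check confirms (the example vectors are arbitrary):

```python
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([0.0, 2.0, 5.0])

diff = xi - xj
quad = diff.T @ diff  # (x_i - x_j)^T (x_i - x_j)

# The quadratic form equals the squared Euclidean norm of the difference.
print(quad)                       # 5.0
print(np.linalg.norm(diff) ** 2)  # also ~5.0, up to floating point
```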
Learned metrics, nonlinear distance metric
A more general case is where the output vector from the twin network is passed through additional network layers implementing non-linear distance metrics.
- if i = j, then δ[f(x^(i)), f(x^(j))] is small; otherwise, δ[f(x^(i)), f(x^(j))] is large
- i, j are indexes into a set of vectors
- f(·) is the function implemented by the twin network
- δ(·) is the function implemented by the network joining outputs from the twin network
In matrix form, the above is often approximated as a Mahalanobis distance for a linear space:[6]
- δ(x^(i), x^(j)) ≈ (x^(i) − x^(j))^T M (x^(i) − x^(j))
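A minimal numpy sketch of the Mahalanobis form, with an arbitrary positive semi-definite M (the construction M = AᵀA is an illustrative choice, not prescribed by the source):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
M = A.T @ A  # positive semi-definite, so the quadratic form is non-negative

xi = rng.standard_normal(3)
xj = rng.standard_normal(3)

diff = xi - xj
d2 = diff @ M @ diff  # (x_i - x_j)^T M (x_i - x_j)

# Equivalent view: this Mahalanobis form is Euclidean distance after the
# linear map A, since diff^T A^T A diff = ||A diff||^2.
print(d2 >= 0)                                        # True
print(np.isclose(d2, np.linalg.norm(A @ diff) ** 2))  # True
```

Learning M (or equivalently the map A) from data is what distinguishes a learned metric from the predefined Euclidean case, where M is the identity.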
This can be further subdivided into at least unsupervised learning and supervised learning.
Learned metrics, half-twin networks
This form also allows the twin network to be more of a half-twin, with the two branches implementing slightly different functions:
- if i = j, then δ[f(x^(i)), g(x^(j))] is small; otherwise, δ[f(x^(i)), g(x^(j))] is large
- i, j are indexes into a set of vectors
- f(·), g(·) are the functions implemented by the half-twin network
- δ(·) is the function implemented by the network joining outputs from the twin network
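A half-twin can be sketched as two branches with different weights, and even different input sizes, that map into a common embedding space joined by one distance function (all shapes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
Wf = rng.standard_normal((4, 8))  # branch f: its own weights, 8-dim inputs
Wg = rng.standard_normal((4, 6))  # branch g: different weights, 6-dim inputs

def f(x):
    return np.tanh(Wf @ x)

def g(y):
    return np.tanh(Wg @ y)

# The two branches differ, but both map into the same 4-dim space,
# so a single distance function delta can join their outputs.
item_a = rng.standard_normal(8)
item_b = rng.standard_normal(6)
delta = np.linalg.norm(f(item_a) - g(item_b))
print(delta >= 0)  # True
```

This is the pattern used when comparing instances of different types, e.g. a text query against an indexed document, where one shared network cannot process both inputs.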
Twin networks for object tracking
Twin networks have been used in object tracking because of their two tandem inputs and similarity measurement. In object tracking, one input of the twin network is a user-preselected exemplar image; the other input is a larger search image. The twin network's job is to locate the exemplar inside the search image. By measuring the similarity between the exemplar and each part of the search image, the twin network produces a map of similarity scores. Furthermore, using a fully convolutional network, the process of computing each sector's similarity score can be replaced with a single cross-correlation layer.[7]
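The similarity map can be sketched as a plain cross-correlation in numpy. In real trackers the correlation runs on learned feature maps from the twin branches, not raw pixels as here; the image sizes and target location are made up for illustration:

```python
import numpy as np

def similarity_map(exemplar, search):
    # Slide the exemplar over the search image; each output entry is an
    # inner-product similarity score. This whole loop is exactly one
    # cross-correlation, which a single network layer can compute.
    eh, ew = exemplar.shape
    sh, sw = search.shape
    out = np.empty((sh - eh + 1, sw - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(exemplar * search[i:i + eh, j:j + ew])
    return out

search = np.zeros((8, 8))
search[3:5, 4:6] = 1.0      # a bright 2x2 target at rows 3-4, cols 4-5
exemplar = np.ones((2, 2))  # exemplar matching that target
score = similarity_map(exemplar, search)
loc = np.unravel_index(score.argmax(), score.shape)
print(loc == (3, 4))  # True: the peak of the score map locates the target
```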
Since being first introduced in 2016, twin fully convolutional networks have been used in many high-performance, real-time object tracking neural networks, including CFNet,[8] StructSiam,[9] SiamFC-tri,[10] DSiam,[11] SA-Siam,[12] SiamRPN,[13] DaSiamRPN,[14] Cascaded SiamRPN,[15] SiamMask,[16] SiamRPN++,[17] and Deeper and Wider SiamRPN.[18]
References
- ^ Chicco, Davide (2020), "Siamese neural networks: an overview", Artificial Neural Networks, Methods in Molecular Biology, vol. 2190 (3rd ed.), New York City, New York, USA: Springer Protocols, Humana Press, pp. 73–94, doi:10.1007/978-1-0716-0826-5_3, ISBN 978-1-0716-0826-5, PMID 32804361, S2CID 221144012
- ^ Bromley, Jane; Guyon, Isabelle; LeCun, Yann; Säckinger, Eduard; Shah, Roopak (1994). "Signature verification using a "Siamese" time delay neural network" (PDF). Advances in Neural Information Processing Systems. 6: 737–744.
- ^ Chopra, S.; Hadsell, R.; LeCun, Y. (June 2005). "Learning a Similarity Metric Discriminatively, with Application to Face Verification". 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. pp. 539–546. doi:10.1109/CVPR.2005.202. ISBN 0-7695-2372-2. S2CID 5555257.
- ^ a b Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. (June 2014). "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 1701–1708. doi:10.1109/CVPR.2014.220. ISBN 978-1-4799-5118-5. S2CID 2814088.
- ^ Chatterjee, Moitreya; Luo, Yunan. "Similarity Learning with (or without) Convolutional Neural Network" (PDF). Retrieved 7 December 2018.
- ^ Mahalanobis, P. C. (1936). "On the generalized distance in statistics" (PDF). Proceedings of the National Institute of Sciences of India. 2 (1): 49–55.
- ^ Bertinetto, Luca; Valmadre, Jack; Henriques, João F.; Vedaldi, Andrea; Torr, Philip H. S. (2016). "Fully-Convolutional Siamese Networks for Object Tracking". arXiv:1606.09549 [cs.CV].
- ^ "End-to-end representation learning for Correlation Filter based tracking".
- ^ "Structured Siamese Network for Real-Time Visual Tracking" (PDF).
- ^ "Triplet Loss in Siamese Network for Object Tracking" (PDF).
- ^ "Learning Dynamic Siamese Network for Visual Object Tracking" (PDF).
- ^ "A Twofold Siamese Network for Real-Time Object Tracking" (PDF).
- ^ "High Performance Visual Tracking with Siamese Region Proposal Network" (PDF).
- ^ Zhu, Zheng; Wang, Qiang; Li, Bo; Wu, Wei; Yan, Junjie; Hu, Weiming (2018). "Distractor-aware Siamese Networks for Visual Object Tracking". arXiv:1808.06048 [cs.CV].
- ^ Fan, Heng; Ling, Haibin (2018). "Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking". arXiv:1812.06148 [cs.CV].
- ^ Wang, Qiang; Zhang, Li; Bertinetto, Luca; Hu, Weiming; Torr, Philip H. S. (2018). "Fast Online Object Tracking and Segmentation: A Unifying Approach". arXiv:1812.05050 [cs.CV].
- ^ Li, Bo; Wu, Wei; Wang, Qiang; Zhang, Fangyi; Xing, Junliang; Yan, Junjie (2018). "SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks". arXiv:1812.11703 [cs.CV].
- ^ Zhang, Zhipeng; Peng, Houwen (2019). "Deeper and Wider Siamese Networks for Real-Time Visual Tracking". arXiv:1901.01660 [cs.CV].