Learning vector quantization

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems. LVQ can be understood as a special case of an artificial neural network; more precisely, it applies a winner-take-all, Hebbian-learning-based approach. It is a precursor to self-organizing maps (SOM) and is related to neural gas and the k-nearest neighbor algorithm (k-NN). LVQ was invented by Teuvo Kohonen.^[1]

Definition

An LVQ system is represented by prototypes $W=(w(1),\dots ,w(n))$, which are defined in the feature space of the observed data. In winner-take-all training algorithms, one determines, for each data point, the prototype closest to the input according to a given distance measure. The position of this so-called winner prototype is then adapted: the winner is moved closer if it classifies the data point correctly, and moved away if it classifies the data point incorrectly.
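The winner-take-all rule also gives the classification step: a new point receives the label of its closest prototype. The following is a minimal sketch, assuming Euclidean distance and NumPy arrays; the function name `predict` and the toy prototypes are illustrative and not taken from any particular library.

```python
import numpy as np

def predict(prototypes, prototype_labels, x):
    """Return the label of the prototype closest to x (Euclidean distance assumed)."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    return prototype_labels[np.argmin(distances)]

# Two hypothetical prototypes, one per class.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
prototype_labels = np.array([0, 1])

label = predict(prototypes, prototype_labels, np.array([0.9, 0.8]))
```

Any other distance measure could be substituted for the norm; as discussed below, the choice of distance is a central design decision in LVQ.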

An advantage of LVQ is that it creates prototypes that are easy to interpret for experts in the respective application domain.^[2] LVQ systems can be applied to multi-class classification problems in a natural way.

A key issue in LVQ is the choice of an appropriate measure of distance or similarity for training and classification. Techniques have been developed that adapt a parameterized distance measure in the course of training the system; see, for example, Schneider, Biehl, and Hammer (2009)^[3] and references therein.

LVQ has also been applied to the classification of text documents.^[citation needed]

Algorithm

The algorithms below follow the presentation in Kohonen (2001).^[4]

Set up:

Let the data be denoted by $x_{i}\in \mathbb {R} ^{D}$, and their corresponding labels by $y_{i}\in \{1,2,\dots ,C\}$.
The complete dataset is $\{(x_{i},y_{i})\}_{i=1}^{N}$.
The set of code vectors is $w_{j}\in \mathbb {R} ^{D}$.
The learning rate at iteration step $t$ is denoted by $\alpha _{t}$.
The hyperparameters $w$ and $\epsilon$ are used by LVQ2 and LVQ3. The original paper suggests $\epsilon \in [0.1,0.5]$ and $w\in [0.2,0.3]$.

LVQ1

Initialize several code vectors per label. Iterate the following steps until a convergence criterion is reached.

Sample a datum $x_{i}$, and find the code vector $w_{j}$ such that $x_{i}$ falls within the Voronoi cell of $w_{j}$.
If its label $y_{i}$ is the same as that of $w_{j}$, then $w_{j}\leftarrow w_{j}+\alpha _{t}(x_{i}-w_{j})$; otherwise, $w_{j}\leftarrow w_{j}-\alpha _{t}(x_{i}-w_{j})$.
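A single LVQ1 iteration can be sketched as follows. This is a minimal illustration, assuming Euclidean distance (so that the Voronoi cell containing $x_{i}$ belongs to the nearest code vector) and a floating-point NumPy array of code vectors; the name `lvq1_step` and the toy values are hypothetical.

```python
import numpy as np

def lvq1_step(W, W_labels, x, y, alpha):
    """Apply one LVQ1 update to W in place and return the index of the winner."""
    # Winner: the code vector whose Voronoi cell contains x,
    # i.e. the nearest one under the (assumed) Euclidean distance.
    j = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Attract the winner if its label matches y, repel it otherwise.
    sign = 1.0 if W_labels[j] == y else -1.0
    W[j] += sign * alpha * (x - W[j])
    return j

# Two hypothetical code vectors, one per class.
W = np.array([[0.0, 0.0], [1.0, 1.0]])
W_labels = np.array([0, 1])

# The datum (0.2, 0) with label 0 is closest to W[0], which has the same label,
# so W[0] moves halfway toward it with alpha = 0.5.
winner = lvq1_step(W, W_labels, x=np.array([0.2, 0.0]), y=0, alpha=0.5)
```

In a full training loop, this step would be repeated over sampled data with a learning rate $\alpha _{t}$ that typically decreases over time; the schedule is left to the caller here.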

LVQ2

LVQ2 is the same as LVQ3, except that the update for the case in which both code vectors share the class of the datum is omitted: if $w_{j}$, $w_{k}$, and $x_{i}$ all have the same class, then nothing happens.

LVQ3

[Figure: Some Apollonian circles. Every blue circle intersects every red circle at a right angle. Every red circle passes through the two points C and D, and every blue circle separates the two points.]

Initialize several code vectors per label. Iterate the following steps until a convergence criterion is reached.

Sample a datum $x_{i}$, and find the two code vectors $w_{j},w_{k}$ closest to it.
Let $d_{j}:=\|x_{i}-w_{j}\|$ and $d_{k}:=\|x_{i}-w_{k}\|$.
If $\min \left({\frac {d_{j}}{d_{k}}},{\frac {d_{k}}{d_{j}}}\right)>s$, where $s={\frac {1-w}{1+w}}$, then
- If $w_{j}$ and $x_{i}$ have the same class, and $w_{k}$ and $x_{i}$ have different classes, then $w_{j}\leftarrow w_{j}+\alpha _{t}(x_{i}-w_{j})$ and $w_{k}\leftarrow w_{k}-\alpha _{t}(x_{i}-w_{k})$.
- If $w_{k}$ and $x_{i}$ have the same class, and $w_{j}$ and $x_{i}$ have different classes, then $w_{j}\leftarrow w_{j}-\alpha _{t}(x_{i}-w_{j})$ and $w_{k}\leftarrow w_{k}+\alpha _{t}(x_{i}-w_{k})$.
- If $w_{j}$, $w_{k}$, and $x_{i}$ all have the same class, then both code vectors move toward the datum with a reduced step: $w_{j}\leftarrow w_{j}+\epsilon \alpha _{t}(x_{i}-w_{j})$ and $w_{k}\leftarrow w_{k}+\epsilon \alpha _{t}(x_{i}-w_{k})$.
- If neither $w_{j}$ nor $w_{k}$ has the same class as $x_{i}$, the original paper does not specify what happens; presumably no update is made.
Otherwise, skip.

Note that the condition $\min \left({\frac {d_{j}}{d_{k}}},{\frac {d_{k}}{d_{j}}}\right)>s$, where $s={\frac {1-w}{1+w}}$, means precisely that the point $x_{i}$ falls between two Apollonian spheres, namely the surfaces on which $d_{j}/d_{k}=s$ and $d_{k}/d_{j}=s$.
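The LVQ3 iteration, including the window test, can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: Euclidean distance is used, the window hyperparameter $w$ is named `window` to avoid confusion with the code vectors, the same-class case is assumed to move both code vectors toward the datum (as in Kohonen's formulation), and no update is made when neither code vector matches the label. The name `lvq3_step` and the toy values are hypothetical.

```python
import numpy as np

def lvq3_step(W, W_labels, x, y, alpha, window=0.3, epsilon=0.2):
    """Apply one LVQ3 update to W in place; return True if an update rule fired."""
    d = np.linalg.norm(W - x, axis=1)
    j, k = np.argsort(d)[:2]  # indices of the two closest code vectors
    s = (1.0 - window) / (1.0 + window)
    # Clamp distances to avoid division by zero if x coincides with a code vector.
    dj, dk = max(d[j], 1e-12), max(d[k], 1e-12)
    if min(dj / dk, dk / dj) <= s:
        return False  # x lies outside the window: skip
    j_ok, k_ok = W_labels[j] == y, W_labels[k] == y
    if j_ok and k_ok:
        # Same class for both (assumption: both are attracted, scaled by epsilon).
        W[j] += epsilon * alpha * (x - W[j])
        W[k] += epsilon * alpha * (x - W[k])
    elif j_ok or k_ok:
        # Exactly one matches: attract the matching one, repel the other.
        W[j] += (1.0 if j_ok else -1.0) * alpha * (x - W[j])
        W[k] += (1.0 if k_ok else -1.0) * alpha * (x - W[k])
    else:
        return False  # neither matches: assumed to be a no-op
    return True

W = np.array([[0.0, 0.0], [1.0, 0.0]])
W_labels = np.array([0, 1])
# x = (0.45, 0) lies near the midplane, so it is inside the window.
updated = lvq3_step(W, W_labels, x=np.array([0.45, 0.0]), y=0, alpha=0.1)
```

Calling the same function with `epsilon=0` leaves the code vectors unchanged in the same-class case, which matches the description of LVQ2 as LVQ3 without that update.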

References

1. T. Kohonen. Self-Organizing Maps. Springer, Berlin, 1997.
2. T. Kohonen (1995), "Learning vector quantization", in M.A. Arbib (ed.), The Handbook of Brain Theory and Neural Networks, Cambridge, MA: MIT Press, pp. 537–540.
3. P. Schneider; B. Hammer; M. Biehl (2009). "Adaptive Relevance Matrices in Learning Vector Quantization". Neural Computation. 21 (10): 3532–3561. CiteSeerX 10.1.1.216.1183. doi:10.1162/neco.2009.10-08-892. PMID 19635012. S2CID 17306078.
4. Kohonen, Teuvo (2001), "Learning Vector Quantization", Self-Organizing Maps, vol. 30, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 245–261, doi:10.1007/978-3-642-56927-2_6, ISBN 978-3-540-67921-9.

External links

lvq_pak official release (1996) by Kohonen and his team
