Minkowski distance
The Minkowski distance or Minkowski metric is a family of distance functions on a real coordinate space, defined by taking the $p$-th power along each coordinate, given a positive parameter $p$. That is, it is the distance function associated to a $p$-norm. It is a generalization of both the Euclidean distance ($p = 2$) and the Manhattan distance ($p = 1$). It is named after the mathematician Hermann Minkowski.
Definition
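The Minkowski distance of order $p$ (where $p \geq 1$) between two points $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$ is defined as:
$$D(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}.$$
This is the metric associated to the norm
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$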
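The definition above can be sketched directly in code. This is a minimal illustration, not a reference implementation; the function name `minkowski_distance` and the sample points are assumptions chosen for the example.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p >= 1 between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    # Sum the p-th powers of the absolute coordinate differences,
    # then take the p-th root of the total.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x = (0.0, 0.0)
y = (3.0, 4.0)
print(minkowski_distance(x, y, 1))  # Manhattan distance: 7.0
print(minkowski_distance(x, y, 2))  # Euclidean distance: 5.0
```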
The Minkowski distance is a metric as a result of the Minkowski inequality.[1] Geometrically, this is a norm because the unit balls are bounded and open in the standard topology, contain the origin, and are convex and balanced (invariant under $x \mapsto -x$).
When $0 < p < 1$, the same formula does not define a metric because it fails to satisfy the triangle inequality. For example, the distance between $(0, 0)$ and $(1, 1)$ is $2^{1/p} > 2$, but the point $(0, 1)$ is at a distance $1$ from both of these points.
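The failure of the triangle inequality can be checked numerically. The sketch below assumes $p = 0.5$ and uses a hypothetical helper `minkowski_formula` that applies the formula without restricting $p$.

```python
def minkowski_formula(x, y, p):
    """Apply the Minkowski formula for any p > 0 (not a metric when p < 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


p = 0.5
a, b, c = (0, 0), (1, 1), (0, 1)

# Going straight from a to b costs 2 ** (1 / p).
direct = minkowski_formula(a, b, p)
# Going via c costs 1 + 1.
detour = minkowski_formula(a, c, p) + minkowski_formula(c, b, p)

print(direct, detour)    # 4.0 2.0
print(direct <= detour)  # False: the triangle inequality is violated
```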
However, a metric can be obtained for these values by removing the exponent of $1/p$:
$$D(X, Y) = \sum_{i=1}^{n} |x_i - y_i|^p.$$
The resulting metric is also an F-norm, but does not come from a norm: the unit balls are not convex.
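As a rough check, the variant without the outer root can be tested against the triangle inequality on the points $(0,0)$, $(1,1)$, $(0,1)$ and on random triples. The helper names, the value $p = 0.5$, and the random sampling setup are assumptions for illustration; a numerical spot check is not a proof.

```python
import random


def root_free_distance(x, y, p):
    """Sum of |x_i - y_i|^p with no outer 1/p root (intended for 0 < p < 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))


p = 0.5
a, b, c = (0, 0), (1, 1), (0, 1)
direct = root_free_distance(a, b, p)
detour = root_free_distance(a, c, p) + root_free_distance(c, b, p)
print(direct <= detour)  # True: equality holds for these three points


def triangle_holds(trials=1000, dim=3, p=0.5, seed=0):
    """Spot-check d(u, w) <= d(u, v) + d(v, w) on random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v, w = ([rng.uniform(-10, 10) for _ in range(dim)] for _ in range(3))
        lhs = root_free_distance(u, w, p)
        rhs = root_free_distance(u, v, p) + root_free_distance(v, w, p)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point rounding
            return False
    return True


print(triangle_holds())
```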
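Minkowski distance is often used with $p = 1$ or $p = 2$, which correspond to the Manhattan distance and the Euclidean distance, respectively.[2] In the limiting case of $p$ approaching infinity, we obtain the Chebyshev distance:
$$\lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \max_{i=1}^{n} |x_i - y_i|.$$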
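The convergence toward the Chebyshev distance can be observed by evaluating the Minkowski distance for increasing $p$. The sample points and function names below are assumptions for illustration.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev_distance(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


x = (1.0, 5.0, 2.0)
y = (4.0, 1.0, 2.5)  # absolute differences: 3.0, 4.0, 0.5

# The values decrease toward the largest coordinate difference as p grows.
for p in (1, 2, 4, 16, 64):
    print(p, minkowski_distance(x, y, p))

print(chebyshev_distance(x, y))  # 4.0
```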
The limit as $p \to 0$ of the metric $\sum_{i=1}^{n} |x_i - y_i|^p$ is sometimes called the 0-distance or Hamming distance; it counts the number of coordinates $x_i$ that differ from the coordinates $y_i$. This distance function is a metric, but like the case $0 < p < 1$, it does not arise from a norm.
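A small numerical sketch of this limit: each nonzero difference raised to a tiny power $p$ approaches 1, while zero differences contribute 0, so the sum approaches the count of differing coordinates. The function names and sample vectors are assumptions for illustration.

```python
def root_free_distance(x, y, p):
    """Sum of |x_i - y_i|^p with no outer root, for p > 0."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))


def hamming_distance(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


x = (1.0, 2.0, 3.0, 4.0)
y = (1.0, 5.0, 3.0, 4.5)  # differs from x in two coordinates

for p in (1.0, 0.1, 0.001, 0.000001):
    print(p, root_free_distance(x, y, p))

print(hamming_distance(x, y))  # 2
```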
The Minkowski distance can be viewed as a multiple of the power mean of the component-wise differences between $X$ and $Y$.
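Concretely, pulling a factor of $n$ out of the sum gives $D(X, Y) = n^{1/p} \cdot M_p(|x_1 - y_1|, \ldots, |x_n - y_n|)$, where $M_p$ denotes the power mean with exponent $p$. The sketch below checks this identity numerically; the helper names, the choice $p = 3$, and the sample points are assumptions for illustration.

```python
import math


def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def power_mean(values, p):
    """Power (generalized) mean of non-negative values with exponent p."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


x = (1.0, 5.0, 2.0)
y = (4.0, 1.0, 2.5)
diffs = [abs(a - b) for a, b in zip(x, y)]
n = len(diffs)
p = 3

lhs = minkowski_distance(x, y, p)
rhs = n ** (1.0 / p) * power_mean(diffs, p)
print(math.isclose(lhs, rhs))  # True
```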
[Figure: unit circles (the level sets of the distance function where all points are at unit distance from the center) for various values of $p$.]
Applications
The Minkowski distance is used in machine learning.[3] Many popular machine learning algorithms use specific distance metrics, such as those described above, to compare the similarity of two data points. Depending on the nature of the data being analyzed, various metrics can be used. The Minkowski metric is most useful for numerical datasets where one wants to determine how similar in magnitude multiple data point vectors are.
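As an illustration of how the choice of order can matter in a similarity search, the following sketch does a brute-force nearest-neighbor lookup under two values of $p$. The function `nearest_neighbor`, the labels, and the toy data are hypothetical and not taken from any particular library.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest_neighbor(query, points, p):
    """Return the label of the point closest to query under order-p distance."""
    return min(points, key=lambda label: minkowski_distance(query, points[label], p))


# Hypothetical labeled data points.
points = {"A": (3.0, 3.0), "B": (0.0, 5.0)}
query = (0.0, 0.0)

print(nearest_neighbor(query, points, 1))  # B (distances: A = 6.0, B = 5.0)
print(nearest_neighbor(query, points, 2))  # A (distances: A is about 4.24, B = 5.0)
```

The same query returns a different neighbor depending on $p$, which is one reason the order is usually treated as a parameter to be chosen for the data at hand.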
See also
- Generalized mean – N-th root of the arithmetic mean of the given numbers raised to the power n
- Lp space – Function spaces generalizing finite-dimensional p norm spaces
- Norm (mathematics) – Length in a vector space
References
- ^ Şuhubi, Erdoğan S. (2003), "Chapter V: Metric Spaces", Functional Analysis, Springer Netherlands, pp. 261–356, doi:10.1007/978-94-017-0141-9_5, ISBN 9789401701419
- ^ Zezula, Pavel; Amato, Giuseppe; Dohnal, Vlastislav; Batko, Michal (2006), "Chapter 1, Foundations of Metric Space Searching, Section 3.1, Minkowski Distances", Similarity Search: The Metric Space Approach, Advances in Database Systems, Springer, p. 10, doi:10.1007/0-387-29151-2, ISBN 9780387291512
- ^ Duda, Richard O.; Hart, Peter E. (2006). Pattern Classification. Wiley-India. p. 31. ISBN 978-81-265-1116-7.