# Kernel trick

### From Wikipedia, the free encyclopedia

In machine learning, the **kernel trick** is a method for using a linear classifier algorithm to solve a non-linear problem by mapping the original non-linear observations into a higher-dimensional space, where the linear classifier is subsequently used; this makes a linear classification in the new space equivalent to non-linear classification in the original space.

This is done using Mercer's theorem, which states that any continuous, symmetric, positive semi-definite kernel function *K*(*x*, *y*) can be expressed as a dot product in a high-dimensional space.

More specifically, if the arguments to the kernel are in a measurable space *X*, and if the kernel is positive semi-definite — i.e.

for any finite subset {*x*_{1}, ..., *x*_{n}} of *X* and any real numbers {*c*_{1}, ..., *c*_{n}} — then there exists a function φ(*x*) whose range is in an inner product space of possibly high dimension, such that

The kernel trick transforms any algorithm that solely depends on the dot product between two vectors. Wherever a dot product is used, it is replaced with the kernel function. Thus, a linear algorithm can easily be transformed into a non-linear algorithm. This non-linear algorithm is equivalent to the linear algorithm operating in the range space of φ. However, because kernels are used, the φ function is never explicitly computed. This is desirable, because the high-dimensional space may be infinite-dimensional (as is the case when the kernel is a Gaussian).

The kernel trick was first published by Aizerman et al.^{[1]}

It has been applied to several kinds of algorithm in machine learning and statistics, including:

- Perceptrons
- Support vector machines
- Principal components analysis
- Canonical correlation analysis
- Fisher's linear discriminant analysis
- Clustering

The origin of the term *kernel trick* is not known.^{[citation needed]}

## [edit] References

**^**M. Aizerman, E. Braverman, and L. Rozonoer (1964). "Theoretical foundations of the potential function method in pattern recognition learning".*Automation and Remote Control***25**: 821–837.