Multivariate statistics

From Wikipedia, the free encyclopedia

Jump to: navigation, search

Multivariate statistics or multivariate analysis in statistics describes a collection of procedures which involve observation and analysis of more than one statistical variable at a time. The word multivariate is defined as: "having or involving a number of independent mathematical or statistical variables" [1]. Sometimes a distinction is made between univariate (e.g., ANOVA, t-tests) and multivariate statistics. The special case of two variables is called bivariate.

There are many different models, each with its own type of analysis:

  1. Clustering systems assign objects into groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters.
  2. Hotelling's T-square is a generalization of Student's t statistic that is used in multivariate hypothesis testing.
  3. Multivariate analysis of variance (MANOVA) methods extend analysis of variance methods to cover cases where there is more than one dependent variable and where the dependent variables cannot simply be combined.
  4. Discriminant function or canonical variate analysis attempt to establish whether a set of variables can be used to distinguish between two or more groups.
  5. Regression analysis attempts to determine a linear formula that can describe how some variables respond to changes in others. Regression analyses are based on forms of the general linear model.
  6. Principal components analysis attempts to determine a smaller set of synthetic variables that could explain the original set.
  7. Redundancy analysis a canonical (constrained) variant of Principal components analysis; the synthetic variables are a linear combination of a set of explaining variables.
  8. Correspondence analysis or Reciprocal Averaging also attempts to determine a smaller set of synthetic variables that could explain the original set.
  9. Linear discriminant analysis (LDA) computes a linear predictor from two sets of normally distributed data to allow for classification of new observations.
  10. Logistic regression allows regression analysis to estimate and test the influence of covariates on a binary response.
  11. Artificial neural networks extend regression methods to non-linear multivariate models.
  12. Multidimensional scaling covers various algorithms to determine a set of synthetic variables that best represent the pairwise distances between records. The original method is principal coordinate analysis.
  13. Canonical correlation analysis tries to establish whether or not there are linear relationships among two sets of variables (covariates and response).
  14. Recursive partitioning creates a decision tree that strives to correctly classify members of the population based on a dichotomous dependent variable

[edit] See also

[edit] References

  1. ^ Multivariate - Definition from the Merriam-Webster Online Dictionary


[edit] External links

Personal tools