Introduction:
Multivariate analysis refers to statistical methods that allow the simultaneous investigation of more than two variables. Multivariate techniques are classified into two basic groups: dependence methods and interdependence methods.
Analysis of dependence:
It is a collective
term to describe any multivariate statistical technique that attempts to
explain or predict the dependent variable on the basis of two or more
independent variables.
Influence of measurement scales:
The nature of the measurement scale determines which multivariate technique is appropriate; selecting a technique requires consideration of the types of measures for both the independent and dependent sets of variables.
Analysis of interdependence:
It is a collective
term to describe any multivariate statistical technique that attempts to give
meaning to a set of variables or seeks to group things together.
Applications of Multivariate analysis:
The multiple regression equation is
Y = a + b1X1 + b2X2 + b3X3 + … + bnXn
The coefficients b1, b2, … are called coefficients of partial regression because the independent variables are usually correlated with other independent variables. Thus the correlation between Y and X1, with the correlation that X1 and X2 have in common with Y held constant, is a partial correlation.
The coefficient of multiple determination (or index of determination) indicates the percentage of variation in Y explained by the variation in the independent variables. Two independent variables in the same equation usually explain more variation in Y than either one explains separately.
A dependent variable is required in multiple regression, as it is in bivariate regression. Interval scaling is also a requirement for the independent variables; however, a dummy variable, that is, a variable with two (or more) distinct levels coded 0 and 1, may be used to represent a categorical predictor. There are several other assumptions for multiple regression (and other multivariate techniques).
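As a concrete illustration of the equation above, the following is a minimal sketch, in Python with NumPy, of fitting a multiple regression by ordinary least squares. All data values, including the 0/1 dummy variable X3, are invented purely for illustration.

import numpy as np

# Three independent variables; the third column is a dummy variable coded 0/1.
X = np.array([
    [110, 3.2, 0],
    [125, 4.1, 1],
    [118, 3.7, 0],
    [132, 4.5, 1],
    [121, 3.9, 1],
    [115, 3.4, 0],
])
y = np.array([62, 78, 70, 85, 74, 66])

# Prepend a column of ones so the intercept a is estimated along with b1..b3.
X_design = np.column_stack([np.ones(len(y)), X])
coef, *rest = np.linalg.lstsq(X_design, y, rcond=None)

print("intercept a:", coef[0])
print("partial regression coefficients b1, b2, b3:", coef[1:])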
Multiple Correlation
It is used for finding the joint influence of several variables.
Example
If you want to study the intelligence, self-confidence and academic achievement of students, the multiple correlation measures the joint effect of the independent variables on the dependent variable.
Multiple Regression
It relates one dependent variable to two or more independent variables and predicts the value of the dependent variable.
Example
With academic achievement as the dependent variable and intelligence and self-esteem as the independent variables, we can predict the value of the dependent variable from the independent variables.
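A minimal sketch of both ideas together, with invented scores: the multiple correlation R is the simple correlation between observed achievement and the achievement predicted jointly from intelligence and self-esteem, and R squared is the coefficient of multiple determination mentioned earlier.

import numpy as np

intelligence = np.array([100, 110, 95, 120, 105, 115])
self_esteem = np.array([3.1, 3.8, 2.9, 4.2, 3.5, 4.0])
achievement = np.array([60, 72, 58, 80, 68, 75])

# Regress achievement on both predictors (with an intercept column).
X = np.column_stack([np.ones(6), intelligence, self_esteem])
coef, *rest = np.linalg.lstsq(X, achievement, rcond=None)
predicted = X @ coef

# R is the correlation between observed and predicted values.
R = np.corrcoef(achievement, predicted)[0, 1]
print("multiple correlation R:", R, "coefficient of determination R^2:", R ** 2)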
IMPORTANT METHODS OF FACTOR ANALYSIS
There are several methods of factor analysis, but they do not necessarily give the same results. As such, factor analysis is not a single unique method but a set of techniques. Important methods of factor analysis are:
(i) the centroid method
(ii) the principal
components method
(iii) the maximum likelihood
method
Before we describe these different methods of factor
analysis, it seems appropriate that some basic terms relating to factor
analysis be well understood.
(i) Factor: A factor is an underlying dimension that accounts for several observed variables. There can be one or more factors, depending upon the nature of the study and the number of variables involved in it.
(ii) Factor-loadings: Factor loadings are the values which explain how closely the variables are related to each of the factors discovered. They are also known as factor-variable correlations. In fact, factor loadings work as a key to understanding what the factors mean. It is the absolute size (rather than the sign, plus or minus) of a loading that is important in the interpretation of a factor.
(iii) Communality (h2): Communality, symbolized as h2, shows how much of each variable is accounted for by the underlying factors taken together. A high value of communality means that not much of the variable is left over after whatever the factors represent is taken into consideration. It is worked out in respect of each variable as under (see the worked sketch after this list of terms):
h2 of the ith variable = (ith variable's loading on factor A)2 + (ith variable's loading on factor B)2 + …
(iv) Eigenvalue (or latent root): When we take the sum of squared values of the factor loadings relating to a factor, that sum is referred to as the eigenvalue or latent root. The eigenvalue indicates the relative importance of each factor in accounting for the particular set of variables being analysed.
(v) Total sum of squares: When the eigenvalues of all factors are totalled, the resulting value is termed the total sum of squares. This value, when divided by the number of variables (involved in a study), results in an index that shows how well the particular solution accounts for what all the variables taken together represent.
(vi) Rotation: Rotation, in the context of factor analysis, is something like
staining a microscope slide. Just as different stains on it reveal different
structures in the tissue, different rotations reveal different structures in
the data.
(vii) Factor scores: A factor score represents the degree to which each respondent gets high scores on the group of items that load high on each factor. Factor scores can help explain what the factors mean. With such scores, several other multivariate analyses can be performed.
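The quantities defined above can all be read off a factor-loading matrix. The following is a minimal sketch in Python, with a made-up loadings matrix and made-up item responses (none of these numbers come from the text), of how communalities, eigenvalues, the total sum of squares, an orthogonal rotation, and crude factor scores are worked out.

import numpy as np

# Rows = variables (items), columns = factors A and B (illustrative values).
loadings = np.array([
    [0.75, 0.20],
    [0.68, 0.25],
    [0.30, 0.80],
    [0.25, 0.72],
])

# (iii) Communality h2: sum of each variable's squared loadings.
h2 = (loadings ** 2).sum(axis=1)

# (iv) Eigenvalue of each factor: sum of squared loadings down its column.
eigenvalues = (loadings ** 2).sum(axis=0)

# (v) Total sum of squares, divided by the number of variables, gives the
# proportion of total variance the solution accounts for.
total_ss = eigenvalues.sum()
proportion = total_ss / loadings.shape[0]

# (vi) An orthogonal rotation changes the loading pattern that is revealed
# but leaves every communality unchanged.
theta = np.radians(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
rotated = loadings @ rot
assert np.allclose((rotated ** 2).sum(axis=1), h2)

# (vii) Crude factor scores: loading-weighted sums of standardized item
# responses (exact scoring methods differ; this is one simple approximation).
raw = np.array([[4, 5, 2, 1],
                [5, 4, 1, 2],
                [2, 1, 5, 4],
                [1, 2, 4, 5]], dtype=float)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
scores = z @ loadings

print("communalities:", h2)
print("eigenvalues:", eigenvalues)
print("proportion of variance accounted for:", proportion)
print("factor scores:\n", scores)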
We can now take up
the important methods of factor analysis.
(A) Centroid method of factor analysis
This method of factor analysis, developed by L.L. Thurstone, was quite frequently used until about 1950, before the advent of large-capacity, high-speed computers. The centroid method tends to maximize the sum of loadings, disregarding signs; it is the method which extracts the largest sum of absolute loadings for each factor in turn.
(i) This method starts with the computation of a matrix of correlations, R, wherein unities are placed in the diagonal spaces. The product-moment formula is used for working out the correlation coefficients.
(ii) If the correlation matrix so obtained happens to be a positive manifold (i.e., disregarding the diagonal elements, each variable has a larger sum of positive correlations than of negative correlations), the centroid method requires that the weights for all variables be +1.0. In other words, the variables are not weighted; they are simply summed. But in case the correlation matrix is not a positive manifold, then reflections must be made before the first centroid factor is obtained.
(iii) The first centroid factor is determined as under:
(a) The sum of the coefficients (including the diagonal unity) in each column of the correlation matrix is worked out.
(b) Then the sum of these column sums (T) is obtained.
(c) The loading of each variable on the first centroid factor is then obtained by dividing its column sum by the square root of T.
(iv) To obtain the second centroid factor (B), one must first obtain a matrix of residual coefficients. For this purpose, the loadings of each pair of variables on the first centroid factor are multiplied, and the product is subtracted from the corresponding correlation coefficient.
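A minimal sketch of steps (i) to (iv), assuming a small made-up correlation matrix that is already a positive manifold (so no reflections are needed):

import numpy as np

# Correlation matrix R with unities placed in the diagonal spaces.
R = np.array([
    [1.00, 0.55, 0.43],
    [0.55, 1.00, 0.50],
    [0.43, 0.50, 1.00],
])

col_sums = R.sum(axis=0)   # step (a): column sums
T = col_sums.sum()         # step (b): grand total T

# Step (c): first centroid factor loadings = column sums / sqrt(T).
loadings_A = col_sums / np.sqrt(T)
print("first centroid factor loadings:", loadings_A)

# Step (iv): residual matrix for extracting the second centroid factor,
# obtained by subtracting products of first-factor loadings from R.
residual = R - np.outer(loadings_A, loadings_A)
print("residual matrix:\n", residual)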
(B) Principal-components method of factor analysis
The principal components method (or simply the P.C. method) of factor analysis, developed by H. Hotelling, seeks to maximize the sum of squared loadings of each factor extracted in turn. Accordingly, a PC factor explains more variance than would the loadings obtained from any other method of factoring.
The aim of the principal components method is the construction, out of a given set of variables Xj (j = 1, 2, …, k), of new variables pi, called principal components, which are linear combinations of the Xs:
pi = ai1X1 + ai2X2 + … + aikXk
The method is applied mostly using standardized variables, i.e., zj = (Xj − X̄j)/σj.
The aij are called loadings and are worked out in such a way that the extracted principal components satisfy two conditions: (i) the principal components are uncorrelated (orthogonal), and (ii) the first principal component (p1) has the maximum variance, the second principal component (p2) has the next maximum variance, and so on.
The following steps are usually involved in the principal components method:
(i) Estimates of the aij are obtained, with which the X's are transformed into orthogonal variables, i.e., the principal components. A decision is also taken with regard to how many of the components to retain in the analysis.
(ii) We then proceed with the regression of Y on these principal components, i.e., Y = γ1p1 + γ2p2 + … + γmpm (m < k).
(iii) From the aij and the γi we may find the bj of the original model, transferring back from the p's into the standardized X's.
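A minimal sketch of the extraction itself, with made-up data: standardize the variables, form their correlation matrix, and take its eigen-decomposition. The eigenvectors supply the aij and the eigenvalues give each component's variance, so the components come out uncorrelated with p1 having maximum variance.

import numpy as np

# Made-up observations on k = 3 variables.
X = np.array([
    [110, 3.2, 60],
    [125, 4.1, 75],
    [118, 3.7, 70],
    [132, 4.5, 84],
    [121, 3.9, 73],
    [115, 3.4, 65],
], dtype=float)

# Standardize: z = (X - mean) / standard deviation, column by column.
z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the correlation matrix (eigh suits symmetric matrices).
R = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# Sort components so p1 has the maximum variance, p2 the next, and so on.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores: linear combinations of the standardized X's.
p = z @ eigvecs
print("component variances (eigenvalues):", eigvals)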
R-TYPE AND Q-TYPE FACTOR ANALYSIS
Factor analysis may be R-type factor analysis or it may be Q-type factor analysis. In R-type factor analysis, high correlations occur when respondents who score high on variable 1 also score high on variable 2, and respondents who score low on variable 1 also score low on variable 2. Factors emerge when there are high correlations within groups of variables. In Q-type factor analysis, the correlations are computed among respondents rather than variables, so that factors group together respondents with similar profiles.
Merits: The main merits of factor analysis can be stated thus:
(i) The technique of factor analysis is quite
useful when we want to condense and simplify the multivariate data.
(ii) The technique is helpful in pointing out important and interesting relationships among observed data that were there all the time, but not easy to see from the data alone.
(iii) The technique can reveal the latent factors (i.e., underlying factors not directly observed) that determine relationships among several variables concerning a research study. For example, if people are asked to rate different cold drinks (say, Limca, Nova-cola, Gold Spot and so on) according to preference, a factor analysis may reveal some salient characteristics of cold drinks that underlie the relative preferences.
(iv) The technique may be used in the context of empirical clustering of products, media or people, i.e., for providing a classification scheme when data scored on various rating scales have to be grouped together.
Limitations: One should also be aware of several limitations of factor analysis. Important ones are as follows:
(i) Factor analysis, like all multivariate techniques, involves laborious computations entailing a heavy cost burden. With computer facilities available these days, there is no doubt that factor analysis has become relatively faster and easier, but the cost factor remains significant, i.e., large factor analyses are still bound to be quite expensive.
(ii) The results of a single factor analysis are generally considered less reliable and dependable, for very often a factor analysis starts with a set of imperfect data. "The factors are nothing but blurred averages, difficult to be identified." To overcome this difficulty, it has been realized that the analysis should be done at least twice. If we get more or less similar results from both rounds of analysis, our confidence concerning such results increases.
(iii) Factor analysis is a complicated decision tool that can be used only when one has thorough knowledge and enough experience of handling it. Even then, at times it may not work well and may even disappoint the user.
Cluster Analysis
Cluster analysis consists of methods of classifying variables into clusters. Technically, a cluster consists of variables that correlate highly with one another and have comparatively low correlations with variables in other clusters. The basic objective of cluster analysis is to determine how many mutually exclusive and exhaustive groups or clusters, based on the similarities of profiles among entities, really exist in the population, and then to state the composition of such groups. The various groups to be determined in cluster analysis are not predefined, as happens to be the case in discriminant analysis.
In general, cluster analysis contains the following steps to be performed (a code sketch of the first steps follows the list):
(i) First of all, if some variables have a negative sum of correlations in the correlation matrix, one must reflect variables so as to obtain a maximum sum of positive correlations for the matrix as a whole.
(ii) The second step consists of finding out the highest correlation in the correlation matrix; the two variables involved (i.e., having the highest correlation in the matrix) form the nucleus of the first cluster.
(iii) Then one looks for those variables that
correlate highly with the said two variables and includes them in the cluster.
This is how the first cluster is formed.
(iv) To obtain the nucleus of the second cluster, we find two variables that correlate highly with each other but have low correlations with members of the first cluster. Variables that correlate highly with the said two variables are then found. Such variables, along with the said two variables, thus constitute the second cluster.
(v) One proceeds on similar
lines to search for a third cluster and so on.
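A minimal sketch of steps (ii) and (iii), assuming a made-up correlation matrix and an arbitrary cut-off of 0.6 for "correlates highly" (both are assumptions made only for illustration):

import numpy as np

R = np.array([
    [1.00, 0.80, 0.70, 0.10, 0.05],
    [0.80, 1.00, 0.65, 0.15, 0.10],
    [0.70, 0.65, 1.00, 0.05, 0.00],
    [0.10, 0.15, 0.05, 1.00, 0.75],
    [0.05, 0.10, 0.00, 0.75, 1.00],
])
np.fill_diagonal(R, 0)  # ignore self-correlations

# Step (ii): the highest correlation forms the nucleus of the first cluster.
i, j = np.unravel_index(np.argmax(R), R.shape)
cluster = {i, j}

# Step (iii): add variables that correlate highly with both nucleus members.
THRESHOLD = 0.6
for v in range(R.shape[0]):
    if v not in cluster and R[v, i] > THRESHOLD and R[v, j] > THRESHOLD:
        cluster.add(v)

print("first cluster (variable indices):", sorted(cluster))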