
Sunday, October 11, 2015

MULTIVARIATE ANALYSIS


Introduction :
            Multivariate analysis refers to statistical methods that allow the simultaneous investigation of more than two variables. Multivariate techniques fall into two basic groups: dependence methods and interdependence methods.

Analysis of dependence :
            It is a collective term to describe any multivariate statistical technique that attempts to explain or predict the dependent variable on the basis of two or more independent variables.
Influence of measurement scales :
            The nature of the measurement scales determines which multivariate technique is appropriate; choosing a technique requires consideration of the types of measures used for both the independent and the dependent sets of variables.
Analysis of interdependence :
            It is a collective term to describe any multivariate statistical technique that attempts to give meaning to a set of variables or to group things together.


Applications of Multivariate analysis :
            The multiple regression equation is
                        Y = a + b1 X1 + b2 X2 + b3 X3 + ............ + bn Xn
            The coefficients b1, b2, ......... and so on are called coefficients of partial regression because the independent variables are usually correlated with other independent variables. Thus the correlation between Y and X1, with whatever X1 and X2 have in common with Y held constant, is a partial correlation.
            The coefficient of multiple determination (or index of determination) indicates the percentage of variation in Y explained by the variation in the independent variables. Two independent variables in the same equation usually explain more of the variation in Y than either one explains separately.
            An interval-scaled dependent variable is required in multiple regression, as it is in bivariate regression. Interval scaling is normally also a requirement for the independent variables; however, a dummy variable, one that has two (or more) distinct levels coded 0 and 1, may be used. There are several other assumptions for multiple regression (and other multivariate techniques).
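            To make the formula concrete, here is a minimal sketch in Python/NumPy; the small data set is invented purely for illustration, and an ordinary least-squares fit stands in for whatever estimation routine one actually uses.

```python
# A minimal sketch of fitting Y = a + b1*X1 + b2*X2 by ordinary least squares
# and computing the coefficient of multiple determination (R^2).
# The data values below are purely illustrative.
import numpy as np

X1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])     # first independent variable
X2 = np.array([1.0, 3.0, 2.0, 5.0, 6.0, 8.0])      # second independent variable
y  = np.array([5.0, 9.0, 10.0, 15.0, 17.0, 21.0])  # dependent variable

# Design matrix with a leading column of 1s for the intercept a
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares estimates of a, b1, b2 (the partial regression coefficients)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

# R^2: proportion of the variation in Y explained by X1 and X2 together
y_hat = X @ coef
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2,  R^2 = {r_squared:.3f}")
```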

Multiple Correlation
            It is used to find the joint influence of two or more independent variables on a dependent variable.
Example
            Suppose we want to study the intelligence, self-confidence and academic achievement of students: multiple correlation measures the joint effect of the independent variables (intelligence and self-confidence) on the dependent variable (academic achievement).
Multiple Regression
            It relates one dependent variable to two or more independent variables and is used to predict the value of the dependent variable.
Example
            With intelligence and self-esteem as the independent variables, we can predict the value of the dependent variable, academic achievement, as in the sketch below.
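            The following hedged sketch uses invented intelligence, self-confidence and achievement scores; the multiple correlation R is computed as the correlation between the observed and the jointly predicted values of the dependent variable.

```python
# A sketch of multiple correlation and prediction: the joint effect of two
# independent variables (hypothetical "intelligence" and "self-confidence"
# scores) on one dependent variable ("academic achievement").
import numpy as np

intelligence    = np.array([95, 110, 100, 120, 105, 130, 115, 125], float)
self_confidence = np.array([40,  55,  45,  60,  50,  70,  58,  65], float)
achievement     = np.array([52,  63,  55,  72,  60,  80,  68,  76], float)

# Fit achievement on both predictors jointly
X = np.column_stack([np.ones_like(intelligence), intelligence, self_confidence])
coef, *_ = np.linalg.lstsq(X, achievement, rcond=None)
predicted = X @ coef

# The multiple correlation R is the correlation between the observed and
# the predicted values of the dependent variable.
R = np.corrcoef(achievement, predicted)[0, 1]
print(f"multiple correlation R = {R:.3f}, R^2 = {R**2:.3f}")

# Predicting achievement for a new student (illustrative values)
new_student = np.array([1.0, 112, 57])
print(f"predicted achievement = {new_student @ coef:.1f}")
```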


IMPORTANT METHODS OF FACTOR ANALYSIS
            There are several methods of factor analysis, but they do not necessarily give the same results. As such, factor analysis is not a single unique method but a set of techniques. Important methods of factor analysis are:
(i)      the centroid method
(ii)     the principal components method
(iii)    the maximum likelihood method
Before we describe these different methods of factor analysis, it seems appropriate that some basic terms relating to factor analysis be well understood.
(i) Factor : A factor is an underlying dimension that accounts for several observed variables. There can be one or more factors, depending upon the nature of the study and the number of variables involved in it.
(ii) Factor-loadings : Factor loadings are those values which explain how closely the variables are related to each one of the factors discovered. They are also known as factor-variable correlations. In fact, factor-loadings work as a key to understanding what the factors mean. It is the absolute size (rather than the sign, plus or minus) of the loadings that is important in the interpretation of a factor.
(iii) Communality (h²): Communality, symbolized as h², shows how much of each variable is accounted for by the underlying factors taken together. A high value of communality means that not much of the variable is left over after whatever the factors represent is taken into consideration. It is worked out in respect of each variable as under (see the sketch after this list):
h² of the ith variable = (ith factor loading of factor A)² + (ith factor loading of factor B)² + …..
(iv) Eigenvalue (or latent root): When we take the sum of squared values of the factor loadings relating to a factor, that sum is referred to as the eigenvalue or latent root. The eigenvalue indicates the relative importance of each factor in accounting for the particular set of variables being analysed.
(v) Total sum of squares: When the eigenvalues of all factors are totalled, the resulting value is termed the total sum of squares. This value, when divided by the number of variables (involved in a study), results in an index that shows how well the particular solution accounts for what all the variables taken together represent.
(vi) Rotation: Rotation, in the context of factor analysis, is something like staining a microscope slide. Just as different stains on it reveal different structures in the tissue, different rotations reveal different structures in the data.
(vii) Factor scores: Factor score represents the degree to which each respondent gets high scores on the group of items that load high on each factor. Factor scores can help explain what the factors mean. With such scores, several other multivariate analyses can be performed.
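            The following short sketch, built on an invented loadings matrix, shows how the communalities (h²), the eigenvalues and the total sum of squares defined above are computed from factor loadings.

```python
# A small sketch showing how communalities (h^2), eigenvalues and the total
# sum of squares follow from a factor-loadings matrix. The loadings below are
# made up for illustration: rows are variables, columns are factors A and B.
import numpy as np

loadings = np.array([
    [0.80,  0.10],
    [0.75, -0.20],
    [0.20,  0.85],
    [0.15,  0.70],
    [0.60,  0.30],
])

# Communality of each variable: sum of its squared loadings across the factors
communalities = np.sum(loadings ** 2, axis=1)

# Eigenvalue (latent root) of each factor: sum of squared loadings down a column
eigenvalues = np.sum(loadings ** 2, axis=0)

# Total sum of squares and the proportion of total variance accounted for
total_ss = eigenvalues.sum()
proportion = total_ss / loadings.shape[0]   # divide by the number of variables

print("h^2 per variable :", np.round(communalities, 3))
print("eigenvalues      :", np.round(eigenvalues, 3))
print("total SS         :", round(total_ss, 3), " proportion:", round(proportion, 3))
```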
            We can now take up the important methods of factor analysis.
(A) Centroid Method of factor analysis
            This method of factor analysis, developed by L.L. Thurstone, was quite frequently used until about 1950, before the advent of large-capacity, high-speed computers. “The centroid method tends to maximize the sum of loadings, disregarding signs; it is the method which extracts the largest sum of absolute loadings for each factor in turn.” Its main steps are as follows:
(i)      This method starts with the computation of a matrix of correlations, R, wherein unities are placed in the diagonal spaces. The product-moment formula is used for working out the correlation coefficients.
(ii)     If the correlation matrix so obtained happens to be a positive manifold (i.e., disregarding the diagonal elements, each variable has a larger sum of positive correlations than of negative correlations), the centroid method requires that the weights for all variables be +1.0. In other words, the variables are not weighted; they are simply summed. But in case the correlation matrix is not a positive manifold, then reflections must be made before the first centroid factor is obtained.
(iii)    The first centroid factor is determined as under:
         (a)     The sum of the coefficients (including the diagonal unity) in each column of the correlation matrix is worked out.
         (b)     Then the sum of these column sums (T) is obtained.
         (c)     Each column sum is divided by the square root of T; the resulting values are the loadings of the variables on the first centroid factor.
(iv)    To obtain the second centroid factor (B), one must first obtain a matrix of residual coefficients. For this purpose, the loadings of the variables on the first centroid factor are multiplied pairwise, and the resulting cross products are subtracted from the corresponding original correlations; factor B is then extracted from this residual matrix in the same way (see the sketch below).
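            As a rough illustration of steps (i)–(iv), the sketch below extracts the first centroid factor from a small, invented correlation matrix (assumed to be a positive manifold, so no reflection is needed) and then forms the residual matrix used for the second factor.

```python
# A rough sketch of extracting the first centroid factor from a correlation
# matrix with unities in the diagonal, assuming a positive manifold (so no
# reflection of variables is needed). The matrix R below is illustrative.
import numpy as np

R = np.array([
    [1.00, 0.60, 0.50, 0.30],
    [0.60, 1.00, 0.40, 0.35],
    [0.50, 0.40, 1.00, 0.45],
    [0.30, 0.35, 0.45, 1.00],
])

# (a) column sums (including the diagonal unity)
column_sums = R.sum(axis=0)

# (b) grand total T of the column sums
T = column_sums.sum()

# (c) first centroid factor loadings: column sums divided by the square root of T
loadings_A = column_sums / np.sqrt(T)

# Residual matrix for extracting the second centroid factor (B): subtract the
# cross products of the first-factor loadings from the original correlations.
residual = R - np.outer(loadings_A, loadings_A)

print("first centroid loadings:", np.round(loadings_A, 3))
print("residual matrix:\n", np.round(residual, 3))
```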
(B) Principal-components method of factor analysis
            Principal components method (or simply P.C. method) of factor analysis, developed by H. Hotelling, seeks to maximize the sum of loadings of each factor extracted in turn. Accordingly, the PC factor explains more variance than would the loadings obtained from any other method of factoring.
            The aim of the principal components method is the construction, out of a given set of variables Xj (j = 1, 2, ......, k), of new variables (pi), called principal components, which are linear combinations of the X's:
                        p1 = a11 x1 + a12 x2 + ...... + a1k xk
                        p2 = a21 x1 + a22 x2 + ...... + a2k xk
                        ...............................................
                        pk = ak1 x1 + ak2 x2 + ...... + akk xk
The method is mostly applied using standardized variables, i.e.,
                        xj = (Xj − X̄j) / σj
The aij's are called loadings and are worked out in such a way that the extracted principal components satisfy two conditions: (i) the principal components are uncorrelated (orthogonal) and (ii) the first principal component (p1) has the maximum variance, the second principal component (p2) has the next maximum variance, and so on.
The following steps are usually involved in the principal components method:
(i)      Estimates of the aij are obtained, with which the X's are transformed into the orthogonal variables, i.e., the principal components. A decision is also taken on how many of the components to retain in the analysis.
(ii)     We then proceed with the regression of Y on these principal components, i.e.,
                        ŷ = γ1 p1 + γ2 p2 + ...... + γm pm      (m < k)
(iii)    From the aij and the γi we may find the bij of the original model, transferring back from the p's into the standardized X's.
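            A hedged sketch of these steps is given below: it standardizes an invented data matrix, takes the eigenvectors of its correlation matrix as the weights aij, and checks that the resulting components are uncorrelated. It illustrates the arithmetic only, not any particular software's factor routine.

```python
# A sketch of the principal components method: standardize the X's, form
# their correlation matrix, and take its eigenvectors as the weights a_ij of
# the uncorrelated components, ordered by decreasing variance (eigenvalue).
# The data matrix (rows = observations, columns = X1..X3) is illustrative.
import numpy as np

X = np.array([
    [2.0,  7.0,  4.0],
    [4.0,  6.0,  5.0],
    [6.0,  8.0,  7.0],
    [8.0,  5.0,  8.0],
    [10.0, 9.0, 11.0],
    [12.0, 7.0, 12.0],
])

# Standardize each variable: x_j = (X_j - mean) / standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Correlation matrix of the standardized variables
Rm = np.corrcoef(Z, rowvar=False)

# Eigen-decomposition: eigenvalues are component variances, eigenvectors the loadings a_ij
eigvals, eigvecs = np.linalg.eigh(Rm)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores p1, p2, ... as linear combinations of the standardized X's
P = Z @ eigvecs

print("component variances:", np.round(eigvals, 3))
print("scores are uncorrelated:\n", np.round(np.corrcoef(P, rowvar=False), 3))
```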
R-TYPE AND Q-TYPE FACTOR ANALYSIS
            Factor analysis may be R-type factor analysis or it may be Q-type factor analysis. In R-type factor analysis, high correlations occur when respondents who score high on variable 1 also score high on variable 2, and respondents who score low on variable 1 also score low on variable 2; factors emerge when there are high correlations within groups of variables. In Q-type factor analysis, the correlations are computed between pairs of respondents rather than pairs of variables, so factors emerge when groups of respondents share similar patterns of responses.
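            A minimal sketch of the distinction, using an invented ratings matrix: R-type analysis starts from correlations between variables (columns of the data matrix), while Q-type starts from correlations between respondents (rows), which amounts to working on the transposed matrix.

```python
# R-type vs Q-type: correlate the columns (variables) or the rows (respondents)
# of the same data matrix. The ratings below are illustrative.
import numpy as np

# rows = respondents, columns = variables (e.g. rating items)
data = np.array([
    [5, 4, 2, 1],
    [4, 5, 1, 2],
    [2, 1, 5, 4],
    [1, 2, 4, 5],
], dtype=float)

r_type = np.corrcoef(data, rowvar=False)  # variable x variable correlations
q_type = np.corrcoef(data, rowvar=True)   # respondent x respondent correlations

print("R-type correlation matrix:\n", np.round(r_type, 2))
print("Q-type correlation matrix:\n", np.round(q_type, 2))
```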
Merits : The main merits of factor analysis can be stated thus
(i)      The technique of factor analysis is quite useful when we want to condense and simplify the multivariate data.
(ii)     The technique is helpful in pointing out important and interesting relationships among observed data that were there all the time but not easy to see from the data alone.
(iii)    The technique can reveal the latent factors (i.e. underlying factors not directly observed) that determine relationships among several variables concerning a research study. For example, if people are asked to rate different cold drinks (say, Limca, Nova-cola, Gold Spot and so on) according to preference, a factor analysis may reveal some salient characteristics of cold drinks that underline the relative preferences.
(iv)    The technique may be used in the context of empirical clustering of products, media or people, i.e., for providing a classification scheme when data scored on various rating scales have to be grouped together.
Limitations : One should also be aware of several limitations of factor analysis. The important ones are as follows.
(i)      Factor analysis, like all multivariate techniques, involves laborious computations and a heavy cost burden. With computer facilities available these days, factor analysis has no doubt become relatively faster and easier, but the cost factor remains: large factor analyses are still bound to be quite expensive.
(ii)     The results of a single factor analysis are generally considered less reliable and dependable, for very often a factor analysis starts with a set of imperfect data. “The factors are nothing but blurred averages, difficult to be identified.” To overcome this difficulty, it has been realized that the analysis should be done at least twice. If we get more or less similar results from all rounds of analysis, our confidence concerning such results increases.
(iii)    Factor analysis is a complicated decision tool that can be used only when one has thorough knowledge and enough experience of handling this tool. Even then, at times it may not work well and may even disappoint the user.
Cluster Analysis
            Cluster analysis consists of methods of classifying variables into clusters. Technically, a cluster consists of variables that correlate highly with one another and have comparatively low correlations with variables in other clusters. The basic objective of cluster analysis is to determine how many mutually exclusive and exhaustive groups or clusters, based on the similarities of profiles among entities, really exist in the population and then to state the composition of such groups. The various groups to be determined in cluster analysis are not predefined, as happens to be the case in discriminant analysis.
In general, cluster analysis involves the following steps:
(i)      First of all, if some variables have a negative sum of correlations in the correlation matrix, one must reflect those variables so as to obtain a maximum sum of positive correlations for the matrix as a whole.
(ii)     The second step consists in finding out the highest correlation in the correlation matrix; the two variables involved (i.e., having the highest correlation in the matrix) form the nucleus of the first cluster.
(iii)    Then one looks for those variables that correlate highly with the said two variables and includes them in the cluster. This is how the first cluster is formed.
(iv)    To obtain the nucleus of the second cluster, we find the two variables that correlate highly with each other but have low correlations with the members of the first cluster. Variables that correlate highly with the said two variables are then found. Such variables, along with the said two variables, constitute the second cluster.
(v)     One proceeds on similar lines to search for a third cluster and so on.  
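            The sketch below is a loose, illustrative implementation of these steps; the correlation matrix and the 0.60 "high correlation" threshold are invented, and the average-correlation rule for admitting a variable to a cluster is an assumption rather than a prescribed formula.

```python
# A rough sketch of the clustering-by-correlation steps described above:
# take the most highly correlated pair of variables as the nucleus of a
# cluster, then pull in any remaining variable whose average correlation
# with the nucleus exceeds a threshold. Matrix and threshold are illustrative.
import numpy as np

var_names = ["V1", "V2", "V3", "V4", "V5", "V6"]
R = np.array([
    [1.00, 0.82, 0.75, 0.12, 0.08, 0.15],
    [0.82, 1.00, 0.70, 0.10, 0.14, 0.11],
    [0.75, 0.70, 1.00, 0.18, 0.09, 0.12],
    [0.12, 0.10, 0.18, 1.00, 0.78, 0.72],
    [0.08, 0.14, 0.09, 0.78, 1.00, 0.80],
    [0.15, 0.11, 0.12, 0.72, 0.80, 1.00],
])

threshold = 0.60
unassigned = set(range(len(var_names)))
clusters = []

while len(unassigned) >= 2:
    # nucleus: the highest-correlated pair among the unassigned variables
    pairs = [(i, j) for i in unassigned for j in unassigned if i < j]
    i, j = max(pairs, key=lambda p: R[p[0], p[1]])
    if R[i, j] < threshold:
        break
    cluster = {i, j}
    # add variables that correlate highly (on average) with the nucleus
    for k in unassigned - cluster:
        if np.mean([R[k, i], R[k, j]]) >= threshold:
            cluster.add(k)
    clusters.append(sorted(cluster))
    unassigned -= cluster

for n, c in enumerate(clusters, 1):
    print(f"cluster {n}:", [var_names[k] for k in c])
print("left over:", [var_names[k] for k in sorted(unassigned)])
```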
References:
i)      Hair, Joseph F. et al. (1996), Multivariate Data Analysis (5th ed.), New Jersey: Prentice-Hall International, Inc.
ii)     Cohen, Louis et al. (2008), Research Methods in Education, London: Routledge.
iii)    Winkler, Robert L. and Hays, William L. (1991), Statistics: Probability, Inference and Decision (2nd ed.).




