The hat matrix is a matrix used in regression analysis and analysis of variance. It is defined as the matrix that converts the vector of observed values into the vector of estimates obtained with the least-squares method; the present article defines the hat matrix and derives its basic properties. In statistics it is also called the projection matrix $\mathbf{P}$ or the influence matrix: it maps the vector of response values (dependent variable values) $\mathbf{y}$ to the vector of fitted values (or predicted values) $\hat{\mathbf{y}}$.

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\mathbf{X}$ is the $n \times p$ matrix of explanatory variables (the design matrix), $\boldsymbol{\beta}$ is a vector of unknown parameters to be estimated, and $\boldsymbol{\varepsilon}$ is the error vector. The OLS estimator is the $p \times 1$ vector
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y},$$
and the least-squares estimators of the mean response are the fitted values
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y} =: \mathbf{H}\mathbf{y},$$
where $\mathbf{H} := \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}$ is an $n \times n$ matrix. We call $\mathbf{H}$ the "hat matrix" because it turns $y$'s into $\hat{y}$'s. (The term "hat matrix" is due to John W. Tukey.) In the language of linear algebra, the hat matrix is the orthogonal projection onto the column space of the design matrix $\mathbf{X}$. The covariance matrix of $\hat{\boldsymbol{\beta}}$ is $\operatorname{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^{2}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}$.

Writing $\mathbf{H} = [h_{ij}]_{i,j=1}^{n}$, the diagonal elements $h_{ii} = \mathbf{x}_{i}^{\mathsf{T}}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_{i}$ are called leverages, and the element $h_{ij}$ describes the influence the response value $y_{j}$ has on the fitted value $\hat{y}_{i}$.

The singular value decomposition $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf{T}}$ can be used to unveil the properties of the hat matrix. In particular, the columns of $\mathbf{U}$ are a set of eigenvectors for $\mathbf{X}\mathbf{X}^{\mathsf{T}}$, the columns of $\mathbf{V}$ are a set of eigenvectors for $\mathbf{X}^{\mathsf{T}}\mathbf{X}$, and the non-zero singular values of $\mathbf{X}$ are the square roots of the non-zero eigenvalues of both $\mathbf{X}\mathbf{X}^{\mathsf{T}}$ and $\mathbf{X}^{\mathsf{T}}\mathbf{X}$.

Exercise: prove that if $A$ is idempotent, then $\det(A)$ is equal to either 0 or 1. Solution: by the definition of eigenvectors, if $A\mathbf{x} = \lambda\mathbf{x}$, then since $A$ is idempotent,
$$A\mathbf{x} = A^{2}\mathbf{x} = \lambda A\mathbf{x} = \lambda^{2}\mathbf{x},$$
so $\lambda^{2} = \lambda$ and hence $\lambda \in \{0, 1\}$. For every $n \times n$ matrix $A$, the determinant of $A$ equals the product of its eigenvalues, so $\det(A)$ is a product of zeros and ones.
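As a concrete check on these definitions, here is a minimal sketch in Python/numpy (the design matrix and response are made-up illustrative data, not from the text): it forms $\mathbf{H}$, verifies that it is symmetric and idempotent, and confirms the SVD and eigenvalue facts above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3

# Made-up design matrix with an intercept column of ones.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(H, H.T)        # symmetric
assert np.allclose(H @ H, H)      # idempotent: HH = H

y_hat = H @ y                     # fitted values, same as X @ beta_hat
leverages = np.diag(H)            # h_ii

# SVD connection: with X = U S V^T of full column rank, H = U U^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(H, U @ U.T)

# Eigenvalues of the idempotent H are all 0 or 1 (p ones, n - p zeros).
eigvals = np.linalg.eigvalsh(H)   # returned in ascending order
assert np.allclose(eigvals[-p:], 1.0) and np.allclose(eigvals[:-p], 0.0)
```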
The residual maker. Define $\mathbf{M} \equiv \mathbf{I} - \mathbf{H}$, where $\mathbf{I}$ is the identity matrix; $\mathbf{M}$ is the projection onto the orthogonal complement of the linear space spanned by the columns of $\mathbf{X}$. The vector of residuals is
$$\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I}_{n} - \mathbf{H})\mathbf{y} = \mathbf{M}\mathbf{y},$$
with variance-covariance matrix $\mathbf{V} = (\mathbf{I}_{n} - \mathbf{H})\sigma^{2}$, which is positive semi-definite. Equivalently, $\hat{\mathbf{y}} = \mathbf{y} - \mathbf{r} = \mathbf{H}\mathbf{y}$. (Greene calls the hat matrix $\mathbf{P}$, but he is alone.) The covariance matrix of the fitted values, by error propagation, equals $\mathbf{H}\boldsymbol{\Sigma}\mathbf{H}^{\mathsf{T}}$, where $\boldsymbol{\Sigma}$ is the covariance matrix of the error vector (and, by extension, of the response vector as well).

Since our model will usually contain a constant term, one of the columns in the $\mathbf{X}$ matrix will contain only ones. This column should be treated exactly the same as any other column in the $\mathbf{X}$ matrix; including it allows one to analyze the effects of adding an intercept term to a regression. Exercise: show that $\mathbf{H}\mathbf{1} = \mathbf{1}$ in the multiple linear regression case ($p - 1 > 1$). The column of ones lies in the column space of $\mathbf{X}$, and $\mathbf{H}$ leaves every vector in that space unchanged.

Leverages are largest for observations far from the center of the design. In a small simple-regression example, the two $x$ values furthest away from the mean have the largest leverages (0.176 and 0.163), while the $x$ value closest to the mean has a smaller leverage (0.048). The minimum value of $h_{ii}$ is $1/n$ for a model with a constant term.

Effective degrees of freedom. For linear models, the trace of the projection matrix is equal to the rank of $\mathbf{X}$, which is the number of independent parameters of the linear model; the projection matrix can therefore be used to define the effective degrees of freedom of the model. Similarly, the effective degrees of freedom of a spline model is estimated by the trace of its smoother matrix $\mathbf{S}$ in $\hat{\mathbf{Y}} = \mathbf{S}\mathbf{Y}$. Generalized degrees of freedom (GDF) are thus defined to be the sum of the sensitivities of each fitted value $\hat{Y}_{i}$ to perturbations in its corresponding output $Y_{i}$.

The geometry behind these formulas is that of orthogonal projection. The closest point to a vector $\mathbf{b}$ in the column space of a matrix $\mathbf{A}$ is the point $\mathbf{A}\mathbf{x}$ for which $\mathbf{b} - \mathbf{A}\mathbf{x}$ is orthogonal to the column space. A vector that is orthogonal to the column space of a matrix is in the null space of the matrix transpose, so $\mathbf{A}^{\mathsf{T}}(\mathbf{b} - \mathbf{A}\mathbf{x}) = \mathbf{0}$, giving $\mathbf{x} = (\mathbf{A}^{\mathsf{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{b}$ and the closest point $\mathbf{A}(\mathbf{A}^{\mathsf{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{b}$.

Theorem (solutions of a linear system). Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m}$, and suppose that $AA^{+}b = b$, where $A^{+}$ is the pseudoinverse. Then any vector of the form
$$x = A^{+}b + (I - A^{+}A)y, \qquad y \in \mathbb{R}^{n} \text{ arbitrary},$$
is a solution of $Ax = b$.
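The residual maker and the trace identities can be checked the same way. The sketch below (again with made-up data) also shows fitted values computed without explicitly forming $\mathbf{H}$, via a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                  # residual maker M = I - H

# M is symmetric, idempotent, and annihilates the column space of X.
assert np.allclose(M, M.T) and np.allclose(M @ M, M)
assert np.allclose(M @ X, 0.0)

# With an intercept, the ones vector is fixed by H: H @ 1 = 1.
assert np.allclose(H @ np.ones(n), np.ones(n))

# Effective degrees of freedom: trace(H) = p; every leverage is >= 1/n.
assert np.isclose(np.trace(H), p)
assert np.all(np.diag(H) >= 1.0 / n - 1e-12)

# Fitted values without explicitly forming H (it may be too large to store).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(H @ y, X @ beta_hat)
```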
Derivation. The matrix $\mathbf{X}$ is called the design matrix; additional information on the samples is available in the form of the response vector $\mathbf{y}$. The aim of regression analysis is to explain $Y$ in terms of $X$ through a functional relationship like $Y_{i} = f(X_{i}) + \varepsilon_{i}$. The least-squares objective is $\lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert^{2}$; taking the first derivative of this objective function in matrix form gives $-2\mathbf{X}^{\mathsf{T}}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$, and setting it to zero yields the normal equations $\mathbf{X}^{\mathsf{T}}\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^{\mathsf{T}}\mathbf{y}$, whose solution is the least-squares estimate $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}$.

Estimated covariance matrix of $\hat{\boldsymbol{\beta}}$. The vector $\hat{\boldsymbol{\beta}}$ is a linear combination of the elements of $\mathbf{y}$; it is an unbiased estimator of $\boldsymbol{\beta}$, and these estimates are normal if $\mathbf{y}$ is normal, in which case $\hat{\boldsymbol{\beta}}$ has a multivariate normal distribution. A useful identity when comparing $\hat{\boldsymbol{\beta}}$ with another linear estimator $\mathbf{k}^{\mathsf{T}}\mathbf{y}$ of $\mathbf{q}^{\mathsf{T}}\boldsymbol{\beta}$: since $\mathbf{q}^{\mathsf{T}}\hat{\boldsymbol{\beta}}$ is a scalar and $\mathbf{k}^{\mathsf{T}}\mathbf{y}$ is a scalar, by the definition of variance
$$\operatorname{Var}(\mathbf{k}^{\mathsf{T}}\mathbf{y} - \mathbf{q}^{\mathsf{T}}\hat{\boldsymbol{\beta}}) = \operatorname{Var}(\mathbf{k}^{\mathsf{T}}\mathbf{y}) + \operatorname{Var}(\mathbf{q}^{\mathsf{T}}\hat{\boldsymbol{\beta}}) - 2\operatorname{Cov}(\mathbf{q}^{\mathsf{T}}\hat{\boldsymbol{\beta}}, \mathbf{k}^{\mathsf{T}}\mathbf{y}),$$
where the covariance factors out with a factor of two because the quantities involved are scalars.

An idempotent matrix $M$ is a matrix such that $M^{2} = M$; the hat matrix satisfies $\mathbf{H}^{2} = \mathbf{H} \cdot \mathbf{H} = \mathbf{H}$. A small proof in the other direction: we prove that if $A^{\mathsf{T}}A = A$, then $A$ is a symmetric idempotent matrix. Taking transposes, $A^{\mathsf{T}} = (A^{\mathsf{T}}A)^{\mathsf{T}} = A^{\mathsf{T}}A = A$, so $A$ is symmetric; and then $A^{2} = A^{\mathsf{T}}A = A$, so $A$ is idempotent.

$\mathbf{H}$ plays an important role in regression diagnostics. Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e., observations which have a large effect on the results of a regression. The basic idea is to use the hat matrix to identify outliers in $X$. Properties of the leverages $h_{ii}$ (see Kutner et al.):
1. $0 \le h_{ii} \le 1$ (can you show this?);
2. $\sum_{i=1}^{n} h_{ii} = p$;
3. $h_{ii}$ is a measure of the distance between the $X$ values of the $i$th observation and the mean of the $X$ values over all $n$ observations.
These properties of the hat matrix are of importance in, for example, assessing the amount of leverage or influence that $y_{j}$ has on $\hat{y}_{i}$, which is related to the $(i,j)$-th entry of the hat matrix.
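To make the diagnostic use concrete, the following sketch plants one extreme $x$ value and flags it by its leverage. The Cook's distance formula $D_i = e_i^2 h_{ii} / \big(p\, s^2 (1 - h_{ii})^2\big)$ and the $2p/n$ cutoff are standard choices not spelled out in the text above, so treat them as assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 2

# Simple regression with one extreme x value: an obvious leverage point.
x = np.append(rng.normal(size=n - 1), 8.0)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                        # leverages
e = (np.eye(n) - H) @ y               # residuals
s2 = e @ e / (n - p)                  # estimate of sigma^2

# Cook's distance (assumed standard formula, see lead-in).
cooks_d = e**2 * h / (p * s2 * (1 - h) ** 2)

flagged = np.where(h > 2 * p / n)[0]  # rule-of-thumb cutoff 2p/n
print("flagged high-leverage points:", flagged)           # expect index 19
print("largest Cook's distance at:", int(np.argmax(cooks_d)))
```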
Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself, and the points farther away at the extremes of the design pull hardest.

Hat matrix properties:
1. the hat matrix is symmetric, $\mathbf{H}^{\mathsf{T}} = \mathbf{H}$;
2. the hat matrix is idempotent, i.e. $\mathbf{H}\mathbf{H} = \mathbf{H}$;
3. $\mathbf{H}$ depends only on $\mathbf{X}$, not on $\mathbf{y}$.
An important idempotent-matrix property: for a symmetric and idempotent matrix $A$, $\operatorname{rank}(A) = \operatorname{trace}(A)$, the number of non-zero eigenvalues of $A$. The residuals, like the fitted values $\hat{y}_{i}$, can be expressed as linear combinations of the observations, via $\mathbf{M} = \mathbf{I} - \mathbf{H}$, which is likewise symmetric ($\mathbf{M}^{\mathsf{T}} = \mathbf{M}$) and idempotent ($\mathbf{M}^{2} = \mathbf{M}$); symmetry follows from the rule $(A + B)^{\mathsf{T}} = A^{\mathsf{T}} + B^{\mathsf{T}}$, the transpose of a sum being the sum of the transposes.

Many types of models and techniques are subject to this formulation: a few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering. For other models, such as LOESS (locally weighted scatterplot smoothing), that are still linear in the observations $\mathbf{y}$, the fitted values can likewise be written $\hat{\mathbf{y}} = \mathbf{S}\mathbf{y}$ for a smoother matrix $\mathbf{S}$, although $\mathbf{S}$ is then not a projection matrix in general. The ANOVA hat matrix, similarly, is not a projection matrix, but it shares many of the same geometric properties as its parametric counterpart. Another use is in the fixed effects model, where the estimator can be expressed compactly using $\mathbf{M}$ and applied without explicitly forming the matrix, which might be too large to fit into computer memory.

Weighted and generalized least squares. When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters and the projection (hat) matrix are those given above. The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\boldsymbol{\Psi}$. Then the estimator and hat matrix become
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{y}, \qquad \mathbf{H} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Psi}^{-1},$$
and again it may be seen that $\mathbf{H}^{2} = \mathbf{H}$, though now the hat matrix is no longer symmetric.
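A quick numerical check of the generalized case; the AR(1)-style error covariance $\boldsymbol{\Psi}$ below is a made-up example, chosen only because it is symmetric positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Made-up AR(1)-style error covariance, symmetric positive definite.
rho = 0.5
Psi = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

Psi_inv = np.linalg.inv(Psi)
H_gls = X @ np.linalg.inv(X.T @ Psi_inv @ X) @ X.T @ Psi_inv

assert np.allclose(H_gls @ H_gls, H_gls)          # still idempotent
print("symmetric?", np.allclose(H_gls, H_gls.T))  # False in general
```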
A symmetric idempotent matrix such as $\mathbf{H}$ is called a perpendicular projection matrix. Theorem: let $\mathbf{H}$ be a symmetric idempotent real-valued matrix; then the eigenvalues of $\mathbf{H}$ are all either 0 or 1. By the properties of a projection matrix, $\mathbf{H}$ has $p = \operatorname{rank}(\mathbf{X})$ eigenvalues equal to 1, and all other eigenvalues are equal to 0. Since the trace of a matrix is equal to the sum of its characteristic values, $\operatorname{tr}(\mathbf{H}) = \sum_{i=1}^{n} h_{ii} = p$; this trace identity is what makes the $n \times n$ projection/hat matrix appear in the degrees of freedom of test statistics under the null hypothesis.

Blockwise formula. In some derivations we may need different projection matrices that depend on different sets of variables, and matrix operations on block matrices can be carried out by treating the blocks as matrix entries. Suppose the design matrix is partitioned as $\mathbf{X} = [\mathbf{A} ~~ \mathbf{B}]$, and write $\mathbf{P}\{\mathbf{A}\}$ for the projection onto the column space of $\mathbf{A}$ and $\mathbf{M}\{\mathbf{A}\} = \mathbf{I} - \mathbf{P}\{\mathbf{A}\}$. Then the projection matrix can be decomposed as
$$\mathbf{P}\{\mathbf{X}\} = \mathbf{P}\{\mathbf{A}\} + \mathbf{P}\{\mathbf{M}\{\mathbf{A}\}\mathbf{B}\}:$$
the projection onto the column space of $[\mathbf{A} ~~ \mathbf{B}]$ is the projection onto the columns of $\mathbf{A}$ plus the projection onto what remains of $\mathbf{B}$ after the columns of $\mathbf{A}$ have been partialled out.
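The blockwise decomposition is easy to verify numerically. In the sketch below, `proj` is a hypothetical helper (not from the text) implementing $\mathbf{P}\{\cdot\}$.

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

rng = np.random.default_rng(4)
n = 12
A = np.column_stack([np.ones(n), rng.normal(size=n)])  # first block
B = rng.normal(size=(n, 2))                            # second block
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)  # M{A} = I - P{A}

# Blockwise decomposition: P{X} = P{A} + P{M{A} B}.
assert np.allclose(proj(X), proj(A) + proj(M_A @ B))
```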