We have come a long way this semester (in spite of weather cancellations!) and each of you should be congratulated on your QBA performance to date. Just one more hurdle to go – Business Statistics Forum #6!
The last section of Part II (see “BSF #6 Guidelines” on the left side of this site) asks you to analyze the potential for employment discrimination at IBM, using hypothetical employee data and an SPSS procedure that produces a multiple regression model from those data. In class we have been reviewing the relationship between correlation (r and r²) and regression (R, R² and 1 − R²) through lectures, the blog PDFs, and a PowerPoint presentation (also found on the left side of the site), and we have integrated SPSS procedures into that discussion. Since an important part of the discussion has centered on what to do once the numbers are in, I thought I would continue that part of our in-class dialogue here to help sharpen your data-analytic skills.
Specifically, you are to test and analyze the following regression equation and include the implications in the recommendations section written for your boss at IBM:
[Figure: the regression equation to be tested for BSF #6]
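The image of the equation is not reproduced here, but in general form what you are testing looks like the sketch below, where X1 through X7 are placeholders for the seven predictors named in the assignment figure (substitute the actual variable labels from the guidelines):

    \hat{Y}_{\mathrm{salary}} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_7 X_7

Your job in Part II is to estimate the b coefficients from the employee data and decide which of them differ significantly from zero.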
Since your task is to interpret MRA output, I will assume you have already reviewed the assigned textbook readings, the class notes and PDFs, and the PowerPoint presentation, and are ready to roll up your sleeves and get to work. For the purposes of this discussion, I will not run and analyze all seven independent variables on the dependent variable (salary) that the MRA requires. Instead, I will run only four: age, education, previous experience in months, and months since being hired (see Figure 1: Explaining Salary from Four Predictors using MRA). Notice that there are three tables in this figure: the Model Summary, the ANOVA table, and the Coefficients table. I will discuss the role each table plays in your MRA in turn.
The Model Summary output stems from SPSS’s default forced-entry procedure (i.e., METHOD=ENTER), rather than FORWARD, STEPWISE, or some other useful method. The implication of METHOD=ENTER is that all predictor variables are entered into the regression equation at one time, and the analysis then proceeds from that single model. Note: the decision to choose a particular method depends on the theoretical interests and understandings of the researcher. As an example, refer to our fifth BSF article (Burney, L., & Swanson, N. (2010). The relationship between balanced scorecard characteristics and managers’ job satisfaction. Journal of Managerial Issues, 22(2), 166–181), which used MRA to explain managers’ job satisfaction from multiple sets of predictor variables.
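Whichever method you settle on, here is a minimal syntax sketch of the kind of run that produces the output discussed below; the variable names are hypothetical placeholders, so substitute the names in your own data file:

    * The employee data file should already be open; variable names are placeholders.
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT salary
      /METHOD=ENTER age educ prevexp jobtime.
    * To let SPSS choose the order of entry instead, replace ENTER with STEPWISE.

If you prefer the menus (Analyze > Regression > Linear), clicking Paste will generate essentially the same command for you, with a few extra default subcommands.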
There are two essential pieces of information in the Model Summary table: R and R². The multiple correlation coefficient (R) measures the strength of the relationship between Y (salary) and the four predictor variables included in the equation. In this case, R = .667, which tells us there is a moderate-to-strong relationship. Squaring R gives the coefficient of multiple determination (R²), which tells us how much of the variation (variance) in Y is explained by the four predictors, on a scale from 0 to 100 percent. Thus, we can say that 44.5 percent of the variation in Y (salary) is accounted for by the combined linear effects of the predictor variables. This is potentially misleading, however, because we do not yet know which of the predictors contribute significantly to our understanding of Y and which do not. We will address this important issue in the last table (Coefficients) when we explore each predictor’s beta (i.e., standardized regression coefficient) and its level of significance.
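As a quick arithmetic check on the Model Summary values just cited:

    R = .667, \qquad R^2 = (.667)^2 \approx .445, \qquad 1 - R^2 \approx .555

In other words, a little more than half of the variation in salary is still left unexplained by these four predictors, which is worth remembering when you write your recommendations.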
Question: how do we know whether the MRA model itself is statistically significant, or whether we are just wasting our time staring at non-significant output? The answer is found in the ANOVA table. Because R² is not a test of statistical significance (it only measures the variation in Y explained by the predictor Xs), the F-ratio is used to test whether an R² this large could have occurred by chance alone. In short, the F-ratio in the ANOVA table tests whether the regression model as a whole explains a statistically significant share of the variation in Y. On review of the ANOVA output, we can answer the question in one sentence: the overall equation was found to be statistically significant (F = 93.94, p < .001).
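For reference, the F-ratio SPSS reports can be written in terms of quantities we have already met, where k is the number of predictors (here 4) and n is the number of employees in the data file; you do not need to compute this by hand, since the ANOVA table reports it for you:

    F = \frac{MS_{\mathrm{regression}}}{MS_{\mathrm{residual}}} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}

A large F paired with a small Sig. value tells us the explained variation per predictor is large relative to the unexplained variation per remaining degree of freedom, which is exactly the pattern we see here.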
Figure 1: Explaining Salary from Four Predictors using MRA
Finally, as I pointed out above, we need to identify which predictors are significant contributors to the 44.5 percent of explained variance in Y (i.e., R² = .445), which are not, and in what way(s) the significant ones help us explain Y. The answers to these questions are found in the Coefficients table shown above. For our purposes, note that for each predictor variable in the equation we are concerned only with its (1) standardized beta and (2) the significance level (Sig.) of its t-test. As always, whenever p < .05 we call the result statistically significant. For the purpose of reading MRA output, this means that when a predictor’s p-value (Sig.) falls below .05, its beta is a significant contributor to the equation. So what did we find?
From this equation, level of education was the only independent variable with a significant impact on employee salary (beta = .673, p < .001) when all of the variables were entered into the regression equation: the higher an IBM employee’s level of education, the greater the salary, holding the other predictors constant. The employee’s age, previous experience in months, and time on the job since being hired did not meet the significance criterion, so they play no role at this stage of the analysis. We should note, however, that previous experience (measured in months) did approach significance (beta = .107, p = .06).
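To see why education clears the bar and the other three predictors do not, recall the test SPSS applies to each row of the Coefficients table; in sketch form, with b_j the coefficient for predictor j and SE(b_j) its standard error:

    t_j = \frac{b_j}{SE(b_j)}

A predictor is flagged as significant whenever the two-tailed Sig. value for this t falls below .05. SPSS prints the t and its Sig. for every predictor, so in practice you simply scan the Sig. column, as we did above.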
I hope this helps!
Professor Ziner
Thank you very, very much! I found this information very helpful in completing my assignment interpreting the SPSS output tables.
Dear Professor, there is too much contrast between the background and the text, which strains the eyes. I would be grateful if you could change the style of the blog.
Dear Professor, kindly advise on factor analysis. I'm confused about exploratory factor analysis and factor loadings. Please explain it like you did for regression on your blog.
I think I just don't know what's happening!
Regards,
Bianca
Thank you very, very much. This is very helpful.
What should you do if the ANOVA is not statistically significant? How do you report it?