SPSS/PASW 18 Core System and Statistics User’s Guides are Available to Download

There are many good reasons to obtain a user's guide for a software package, not the least of which is being able to identify and access each available software option and, once accessed, understand exactly how to put it to use in practice. This is especially true when opening the Statistical Package for the Social Sciences, or SPSS (now rebranded by IBM as PASW). SPSS helps researchers address a series of unavoidable challenges when they conduct quantitative research (e.g., how to build a versatile database, enter and define data, transform/recode variables, merge external data into an existing data set, run a series of statistical tests, generate tables and charts, develop reports, and so on). The problem is that student researchers often have procedural and data analysis-oriented questions about SPSS that arise outside of the time spent on in-class SPSS assignments.

To address this concern, the SPSS/PASW 18 Core System User's Guide and SPSS/PASW 18 Statistics Base User's Guide are now available for you to download to your hard drive. The documents range from 3 to 5 MB, so allow a little time to download them if your internet access is not high speed. There are three ways to access the user's guides:
  1. Locate “SPSS 18 User’s Guides” on the left side of Broken Pencils, right-click over the User's Guide of interest, click on SAVE TARGET AS and save it to the location of your choice (such as your hard drive or a flash drive).
  2. Right-click on this link: SPSS 18 Core System User’s Guide, click on SAVE TARGET AS and save it to the location of your choice.
  3. Right-click on this link: SPSS 18 Statistics Base User’s Guide, click on SAVE TARGET AS and save it to the location of your choice.
Once they're downloaded, be sure to put them in a location that is easily accessible. When you open a user's guide in Adobe Reader (i.e., in .pdf format), use the tree structure on the left side of the screen to scroll down to the procedure you want explained. Did you forget how to RECODE a variable into a new one? Just open the SPSS/PASW 18 Core System User's Guide, scroll down to Data Transformations (Chapter 8), click on the RECODE sections and apply the procedure to your current needs.

Note: I have also provided the SPSS/PASW 18 Brief Guide on this site's menu of SPSS user's guides. This document usually accompanies the SPSS Student Version of the software program so, while it is more limited in scope, it can provide you with quick answers to your SPSS-based research needs.

I hope these documents are helpful.

Professor Ziner

How To Study Statistics

Since the outset of the term, I have been contacted by many students who seem to be asking the same question: "How do I study for your class?" Relax. You are not alone. Many students have trouble learning statistics because they never develop the particular study habits that are conducive to success in my statistics classes. Pay close attention to the following suggestions; they should prove invaluable to you.
  1. READ CAREFULLY AND DELIBERATELY. The way in which you should read in statistics is quite different from the way you may read a history book, newspaper, or a novel. In statistics you must read slowly, absorbing each word. It is sometimes necessary to read a textbook discussion or problem many times before it begins to "make sense" to you. In some types of reading, such as a novel, it is desirable to skim and read rapidly, because there are usually a few thoughts "sprinkled" among many words. However, in reading statistics each word or symbol is important because there are many thoughts condensed into a few statements. Keep in mind that the little words mean a lot in statistics.

  2. THINK WITH PENCIL AND SCRATCH PAPER. Always have pencil in hand and scratch paper ready and use them when you read and study statistics. Test out the ideas on paper that the authors are discussing. When they propose a question, try to answer it before going on. Even though an example may be worked out completely in the text, work it out for yourself on scratch paper. This will help to clinch the ideas and procedures in your mind before starting the exercises. After you have read and reread a problem carefully, if you still don't see what to do, don't just sit and look at it. Get your pencil going on scratch paper and try to "dig it out.” If, in attempting to solve a problem, you have nothing written on paper, then you have not yet exerted enough effort to justify seeking help.

  3. BE INDEPENDENT. Try to complete each lesson without assistance. If you seek help needlessly, either from me, a classmate, this Broken Pencils’ blog, or a math tutor, you will not gain the maximum benefit from your work. It takes exercise, as you know, to become strong.  You cannot learn statistics through someone else's exercise.  However, you must ask questions when necessary.  That’s what this blog is for.  Sometimes little things cause considerable confusion. Do not be afraid that your question may sound "dumb."  The only "dumb" action is to fail to ask about a topic that you have really tried to grasp and still do not get.  Some people seek help too soon and some wait too long.  You will have to use good common sense in this matter.

  4. LISTEN IN CLASS AND READ MY BLOG (Broken Pencils). Many of the finer points, fundamental principles and modes of thought will be developed in class. You must pay careful attention to these educational objectives to really understand what is going on. Take notes! Also, many of these points are (or will be) available at this course blog. Join it to receive notification of new posts in your inbox as soon as they are posted. I've already had students ask basic questions that would have been fully addressed if they had tuned in to the blog. Note that Broken Pencils is growing weekly (first launched on 8/28/2010), so join to receive updates and benefit from the discussion.

How to Find the Upper and Lower Limits of Class Intervals

J.D. asked about the way statisticians identify the upper and lower limits of class intervals found in a variable's frequency distribution. When data consist of interval/ratio numbers or class intervals, e.g., (20-29), (30-39), (40-49) and so on, the limits of such numbers or class intervals are understood in terms of "true (real) limits." True/real limits are defined by the highest possible value (the upper limit) and the lowest possible value (the lower limit). The general rules for calculating the true limits of class intervals represented by numbers are:

Upper True Limit: Add a 5 in the decimal place to the right of the last digit of the highest value specified in the class interval.

Lower True Limit: Subtract a 5 in the decimal place to the right of the last digit of the lowest value specified in the class interval.

If the class intervals of a variable are defined by whole numbers, to find the upper limit we add .5 to the highest value specified by the category, and to find the lower limit we subtract .5 from the lowest value specified in the category.  The limits of other numbers could be similarly determined.  The table below provides some illustrations.

[Table: true limits of example numbers and class intervals]
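The rules above can also be sketched in a few lines of Python (not SPSS; the helper name `true_limits` and its `decimals` parameter are my own, invented purely for illustration):

```python
# Hypothetical helper: compute the true (real) limits of a value or
# class interval. `decimals` is the number of decimal places the values
# are stated in (0 for whole numbers, 1 for values like 2.5, etc.).
def true_limits(lowest, highest, decimals=0):
    half_unit = 0.5 * 10 ** (-decimals)  # a "5" one place past the last digit
    return (lowest - half_unit, highest + half_unit)

print(true_limits(20, 29))       # whole-number interval 20-29 -> (19.5, 29.5)
print(true_limits(2.5, 3.4, 1))  # one-decimal interval 2.5-3.4
```

For the whole-number interval 20-29, the true limits come out to 19.5 and 29.5, matching the rule of adding and subtracting .5.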

Exporting SPSS Graphs and Tables

After generating output in SPSS and deciding to export some of the charts and tables into MS Word, several students found that some tables "don't quite fit" the (default) margins of the page. In this brief discussion, you will learn to export output (e.g., graphs and tables) from SPSS into Microsoft Word and other programs such as PowerPoint.

Procedure

You can export a graph or table by right-clicking it to bring up a pop-up menu. You have several choices in how to save the file and how to import it into another software program. While there are many choices, I have found that exporting tables and graphs into Microsoft Word (.doc) format works well. Use of the .pdf format may be a bit premature, since statistical output represents the building blocks of analysis and report writing in various word-processing and presentation programs, not the final product. You are encouraged to experiment with alternatives to see what best meets your needs.

Once a statistical procedure is performed in SPSS, we'll need to export a table, chart or text into MS Word so the output can be analyzed and a report developed. Just right-click over the object(s) -- a chart or table -- you want to export. An "Export Output" window will appear (see inset image below). The default format, Word/RTF (.doc), is likely the best format for course-based assignments.



[Image: the SPSS "Export Output" window]

Under "File Name," you can choose where the Word/RTF document will be saved, including a specific subdirectory on your hard drive, a flash drive, or a disc. From within the newly saved document, you can change the appearance of the table, including its color, font style, size, text direction and many more options. Word interprets each SPSS table much as it does a table you create within the program itself. Just highlight an imported table and the "Design" tab among Word's Table Tools appears. It's your choice from there.

Sizing your Chart in MS Word

You may find that an imported table or chart is wider than the margins of your document. To address this, in the Word document, go to the Layout tab, click "Cell Size" and then follow the chart below to shrink the chart to the width of the page. This feature addresses the width of the output object.


Another way to shrink a table so it fits better is to click to highlight the table in Word and then single-space it. Single spacing is found under the Paragraph options on the Home ribbon in MS Word. You'll see the table shrink in height.

Happy Exporting!

Properties of Data in Statistical Analysis: Three Levels of Measurement (Nominal, Ordinal and Interval/Ratio)

S.B. asked me to comment on what are the “levels of measurement” I have been referring to in class, and importantly, what they have to do with our statistical forums conducted using SPSS. Good question, grasshopper. Let me try to put it in perspective.

So you plan to conduct a statistical analysis on a dependent variable (Y) and several independent variables (Xs) in your statistical forum due shortly. Where do you start? To be able to determine what statistical test is most appropriate for your task at hand, first you must assign a level of measurement to your study’s dependent variable: nominal, ordinal or interval/ratio. Your choice will depend on the type of variable that’s involved in your analysis. You should keep in mind that not only are variables measured differently but many variables can be measured at more than one level (see “Note” at the end of Pros and Cons to a Univariate Analysis). Although levels of measurement differ in many ways, they have certain similarities as well and can be classified using a few basic principles.

What do I mean by “measurement”?  Here are a few orienting points:
  • Measurement can be defined as the assignment of numbers to a variable according to sets of predetermined rules
  • The things we observe (gender or race, e.g.) are variables; any particular observation of that variable is an assigned number (e.g., "1" = male)
  • We examine the properties of the numbers that define the values of a variable

Here are the three fundamentally different ways in which numbers are used in statistical research:

  1. Nominal Level Data: To name or identify
  2. Ordinal Level Data: To represent position in a series or scale
  3. Interval/Ratio: To represent quantity
Data Measured at the Nominal Level

When measuring a variable at the nominal level, the properties of the variable you're working with are categories. A number is then assigned to each category (e.g., for the variable "sex," 1=male and 2=female). Race, region, and religion are additional examples of the numerous variables measured at the nominal level (sometimes referred to as the nominal scale). The main principle underlying nominal data is that they do not imply any ordering among the responses. Using "Party Affiliation" as an example (see inset chart below), the value of "1" for Republican is no more or less of the property of party affiliation than the value of "2" for Independent or "3" for Democrat. These numeric values are simply categories of the variable "Party Affiliation." Data measured at the nominal level represent the lowest level of measurement.


Principles of Nominal Data: The measurement tells only what class a case (e.g., a person) falls into with respect to the variable. Categories of nominal data are mutually exclusive and exhaustive.

Examples of Nominal Data:

1. Are you: __ (1) Male __ (2) Female
2. Are you: __ (1) Protestant __ (2) Catholic __ (3) Jewish __ (4) Muslim __ (5) Other

Second Class Poll Ends: A Statistic is to a Sample what a Parameter is to a Population

I’m not surprised that most (84%) who participated in the second class poll got it right (see the inset chart below). Over the first few class periods, my focus was to link the following four concepts: statistic, sample, parameter and population. Any discussion in a statistics class begins with this conceptual framework.


Since we seldom have complete knowledge of the parameters of a population, such as the mean of the population (µ) and its standard deviation (σ), we must obtain a representative (aka, random) sample from that population. In doing so, we are then working with statistics that describe the sample, not the population. Statistics calculated from sample data will, in turn, serve two broad purposes:

1. Descriptive Analysis. There are four ways that statistics describe sample data. They allow the researcher to examine the skewness (degree of asymmetry) and kurtosis (degree of peakedness) of a variable's distribution. They also quantify the central tendency (mode, median and mean) and dispersion (range, standard deviation and variance) of the sample data that comprise a variable's distribution. Refer to my "Weekly PowerPoints" for details on the role of descriptive statistics.

2. Inferential Analysis. An important second purpose of sample statistics is to generalize outcomes found in sample data to the population from which the sample was drawn. The two broad roles for inferential analyses were reviewed in a previous blog which examined the outcome of our first class poll: Estimating Population Parameters and Testing Hypotheses.

The upshot is that for every statistic that describes some characteristic of a sample there is a corresponding parameter that describes the same characteristic of a population. We analyze sample statistics so that we may generalize statistical outcomes from the sample to the population of interest.
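The statistic-to-parameter correspondence can be sketched in Python using the standard library. Everything here is invented for illustration: in practice the population (and therefore µ and σ) is unknown, which is precisely why we compute statistics from a random sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 exam scores centered near 70.
# In real research these parameters are unknown and must be estimated.
population = [random.gauss(70, 10) for _ in range(10_000)]
mu = statistics.mean(population)       # parameter: population mean (mu)
sigma = statistics.pstdev(population)  # parameter: population SD (sigma)

# A representative (random) sample yields statistics, not parameters.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)        # statistic: sample mean (x-bar)
s = statistics.stdev(sample)           # statistic: sample SD (s, n - 1)

print(f"mu = {mu:.2f} vs. x-bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f} vs. s = {s:.2f}")
```

Running this shows the sample statistics landing close to, but not exactly on, the corresponding parameters, which is the whole logic of generalizing from sample to population.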

First Class Poll Ends: Inferential Analysis Involves …

The results of our first class poll (see inset chart below) reveal that most participants correctly identified the two important forms of statistical research that inferential analysis involves. The first is estimating population (N) parameters. Parameters describe populations of interest and include the mean of the population (µ) and its standard deviation (σ). Since researchers do not always know these characteristics of a population under study, they must be estimated from sample data. In the case of µ, we calculate confidence intervals (usually based on 95% or


99% confidence) to assess the range within which a true population parameter will fall. If you provided a single estimate of 190 lbs. for my weight, for example, how confident would you be that it’s correct? If you then widened your point estimate to an interval estimate of 180-200 lbs., wouldn’t you be more confident that my true weight is in this range? How about 160-220 lbs.? The point is that the wider the interval, the greater our confidence that the true µ falls within that interval.
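To make the interval-width point concrete, here is a sketch in Python on an invented sample of 25 weight guesses (the 1.96 and 2.576 z-values for 95% and 99% confidence are standard, but the data are made up):

```python
import statistics
from math import sqrt

# Hypothetical sample of 25 weight guesses (lbs.), invented for illustration.
sample = [182, 195, 188, 176, 203, 191, 185, 179, 198, 187,
          193, 181, 200, 184, 190, 186, 197, 178, 192, 189,
          183, 196, 180, 194, 185]

n = len(sample)
x_bar = statistics.mean(sample)
se = statistics.stdev(sample) / sqrt(n)  # standard error of the mean

# The higher the confidence level, the wider the interval around x-bar.
for label, z in [("95%", 1.96), ("99%", 2.576)]:
    low, high = x_bar - z * se, x_bar + z * se
    print(f"{label} CI: {low:.1f} to {high:.1f} lbs.")
```

The 99% interval always comes out wider than the 95% interval for the same data: greater confidence that the interval captures the true µ is bought with a wider range.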

Second, inferential analysis also involves hypothesis testing, a building block of research. A hypothesis is a statement of the relationship between two (or more) variables in which one is seen as the independent variable (symbolized as X, the "cause") and the other as the dependent variable (symbolized as Y, the "effect"). What is the effect of cigarette smoking (X) on rates of cancer (Y)? Of hours spent studying statistics (X) on course grade performance (Y)? Of the size of a company (X) on employee job satisfaction (Y)? When we are interested in determining the statistical significance of these relationships by testing the effects of one variable on the other, we test hypotheses. Note that we cannot determine causality through hypothesis testing alone. Causality involves demonstrating statistical covariation, determining time-order (via variable manipulation procedures), and establishing statistical control (removing the possibility that some third factor can explain away X and Y's significant covariation). Of the three elements of causality (covariation, manipulation and control), hypothesis testing provides the means to demonstrate statistical covariation of a bivariate (two-variable or "X on Y") relationship, an important first step in understanding and explaining the world around us.
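As a sketch of demonstrating covariation (only the first element of causality, not a full test of significance), here is Pearson's r computed by hand in Python on invented study-hours and grade data:

```python
import statistics
from math import sqrt

# Hypothetical data: hours spent studying statistics (X) and course
# grade (Y) for ten students -- invented purely for illustration.
hours  = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12]
grades = [55, 60, 62, 70, 68, 75, 80, 78, 85, 90]

def pearson_r(x, y):
    """Quantify the covariation (linear association) of X and Y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

print(f"r = {pearson_r(hours, grades):.2f}")
```

An r near +1 quantifies strong positive covariation of X and Y; whether that covariation is statistically significant, and whether X causes Y, are separate questions.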

The last two choices, “Generalizing to Samples” and “Heavy Drinking on Weekends,” are simply wrong. While conducting inferential analysis may lead some people to drink heavily on weekends (an interesting study in its own right), the process does not involve such behavior any day of the week.

I encourage you to participate in future class polls found at Broken Pencils! I will always analyze the results when the polling period expires and examine the answers to the questions I pose.

NOTE: Percentages do not add up to 100. This poll gave students the option of choosing more than one response. Future polls will be limited to one choice.

Pros and Cons to a Univariate Analysis

One purpose of our SPSS statistics forums is to effectively communicate quantitative information about sample data to your audience (e.g., your client, boss or, in your case, professor). We have already discussed how studies conducted on sample data involve a combination of descriptive and inferential statistics. The first task of a researcher is to examine what has been collected using one or more descriptive statistics. Often referred to as a “univariate” (one-variable) analysis, the goal is to describe each variable of interest in your database using the best possible statistic(s). The usual suspects include measures of central tendency (mean, median and mode) and dispersion (range and standard deviation). To visually display the distribution of a variable, a percentage distribution, bar chart or histogram is useful. The question is whether the descriptive statistics chosen fit the level of measurement that characterizes a variable (i.e., whether X or Y is nominal, ordinal or interval/ratio). For example, if you create the variable “sex” using 1=male and 2=female in SPSS, reporting its mean would be meaningless (a mean of 1.5 tells us what?). Consequently, the only useful statistic is the frequency or percentage of each sex in the sample (e.g., 60 percent of the sample were female). If you want, you can add a bar chart to graphically depict the distribution.
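A small Python sketch (with invented data) makes the point about fitting statistics to the level of measurement:

```python
from collections import Counter
import statistics

# Hypothetical sample: "sex" coded 1 = male, 2 = female (invented data).
sex = [1, 2, 2, 1, 2, 2, 1, 2, 2, 2]

# The mean of nominal codes is meaningless...
print(statistics.mean(sex))  # a number like 1.7 tells us nothing about "sex"

# ...but a frequency/percentage distribution is the right univariate summary.
counts = Counter(sex)
for code, label in [(1, "male"), (2, "female")]:
    pct = 100 * counts[code] / len(sex)
    print(f"{label}: {pct:.0f}%")
```

The percentage distribution ("30% male, 70% female" for these invented codes) communicates something; the mean of the codes does not.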

So how do I evaluate a good univariate analysis? The answer is straightforward: I look for two fundamentals. First, given the variable you selected to describe, did the chosen statistics fit the variable's level of measurement (mentioned above and covered in class)? Second, how effective was your presentation of the variable? By the latter, I'm referring to the information you provided to your audience and how you presented it (visually). The following is a good example of a univariate analysis of the GSS "Political Outlook" scale (i.e., polviews):

Differences Between a Distribution of Scores and Dispersion of Scores

R.W. asked me to distinguish between a distribution and dispersion. When we speak of a distribution, we normally refer to its shape (e.g., normal, skewed, bimodal, leptokurtic) to describe the nature of the respondents' scores. Using one of the examples I just gave (and, for PS 205, covered in class), a leptokurtic distribution indicates uniformity or homogeneity of scores with respect to the variable under study (such as test scores in a class). Nothing exact here; we're just eyeballing the shape of the distribution. However, if we want to specify exactly how much variability around a mean there is in any distribution of scores, we speak of dispersion as measured or quantified by the range or, better yet, the standard deviation.

Example: Three different classes took my statistics exam. Improbably, each class earned the same mean, say 70, but their score distributions were shaped quite differently:


While their respective shapes tell us that one class performed more uniformly than the other two (as evinced by a leptokurtic distribution), and another class revealed a lack of uniformity in their scores (a platykurtic distribution), we need a measure of variability to quantify precisely how much dispersion exists in each distribution. That’s where the standard deviation (s) comes in. The smaller the s, the smaller the dispersion around the mean and the more likely its shape will tend towards being leptokurtic (or homogeneous). Naturally, the opposite holds true. The larger the s, the larger the dispersion around the mean and the more likely its shape will tend towards being platykurtic (or heterogeneous).
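A sketch in Python, with three invented classes each constructed to have a mean of exactly 70, shows how s separates the shapes:

```python
import statistics

# Three hypothetical classes of ten scores, each with mean 70 but
# different spread (all scores invented for illustration).
lepto  = [68, 69, 70, 70, 70, 70, 70, 70, 71, 72]   # homogeneous scores
normal = [55, 62, 66, 69, 70, 70, 71, 74, 78, 85]
platy  = [40, 50, 58, 65, 70, 70, 75, 82, 90, 100]  # heterogeneous scores

for name, scores in [("leptokurtic", lepto),
                     ("roughly normal", normal),
                     ("platykurtic", platy)]:
    m = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation
    print(f"{name}: mean = {m:.0f}, s = {s:.1f}")
```

All three means are identical, yet s grows from the leptokurtic class to the platykurtic one, quantifying exactly the differences in shape that the eye can only approximate.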

In short, three classes with the same mean will reveal three different distributions (shapes) only when their respective s values (i.e., dispersion measured by the standard deviation) differ sizably in the ways I just described.