Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in people's weights is related to their heights.
Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect that correlations exist but not know which are the strongest. A careful correlation analysis can lead to a greater understanding of your data.
There are several different correlation techniques. Reportal uses the most common type, called the Pearson or product-moment correlation. The full formula is included in Appendix A.
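As a rough illustration of what a Pearson calculation involves, the following Python sketch computes the coefficient from the standard textbook definition (the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations). It is not Reportal's implementation; the function name `pearson_r` and the height and weight figures are made up for this example.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson product-moment correlation of two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs of values are needed")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each value from its own mean
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]

    sum_products = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_sq_x = sum(a * a for a in dev_x)
    sum_sq_y = sum(b * b for b in dev_y)

    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical heights (inches) and weights (pounds), for illustration only
heights = [65, 66, 67, 68, 69]
weights = [140, 155, 150, 165, 170]

r = pearson_r(heights, weights)
print(round(r, 2))
```

With these invented figures the result is strongly positive but below 1, which matches the idea that taller people tend to be heavier without the relationship being perfect.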
Correlation Coefficient
The result of a correlation calculation is called the correlation coefficient. It ranges from +1, indicating a perfect positive linear relationship, to -1, indicating a perfect negative linear relationship.
The closer this value is to +1 or -1, the more closely the two variables are related.
If the value is close to 0, it means there is little or no linear relationship between the variables.
If the correlation coefficient is positive, it means that as one variable gets larger, the other also gets larger. If the value is negative, this means that as one variable gets larger, the other gets smaller (often called an "inverse" correlation).
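The three cases described above can be sketched in a few lines of Python. The helper `pearson_r` and the small data sets are assumptions made for this example, chosen so that the results land on +1, -1, and 0; real survey data will rarely be this clean.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from the textbook definition (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


x = [1, 2, 3, 4, 5]

rising = [2, 4, 6, 8, 10]    # grows as x grows
falling = [10, 8, 6, 4, 2]   # shrinks as x grows ("inverse" correlation)
unrelated = [3, 1, 4, 1, 3]  # no linear trend with x

r_positive = pearson_r(x, rising)     # close to +1
r_negative = pearson_r(x, falling)    # close to -1
r_none = pearson_r(x, unrelated)      # close to 0

for label, value in [("rising", r_positive),
                     ("falling", r_negative),
                     ("unrelated", r_none)]:
    print(f"{label}: {value:+.2f}")
```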
Considerations when using Correlations
Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color. The question must therefore be a Numeric, or a Single with a score.
Another key thing to remember when working with correlations is that correlation does not show causation; never assume a correlation means that a change in one variable causes a change in another. Sales of computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).
The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health care use. They are related, but the relationship does not follow a straight line: young children and older people both tend to use much more health care than teenagers or young adults.
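To see why a curvilinear relationship can mislead, consider this Python sketch. The ages and visit counts are invented and deliberately symmetric so that the effect is easy to see; they are not real health care data, and `pearson_r` is an illustrative helper rather than a Reportal function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from the textbook definition (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical U-shaped data: high use at both ends of the age range
ages = [5, 15, 25, 35, 45, 55, 65, 75, 85]
visits_per_year = [18, 11, 6, 3, 2, 3, 6, 11, 18]

# Across the whole age range the rising and falling halves cancel out,
# so the coefficient is close to 0 even though the variables are clearly related.
r_all = pearson_r(ages, visits_per_year)

# Within the younger half alone the relationship is strongly negative.
r_younger = pearson_r(ages[:5], visits_per_year[:5])

print(f"all ages:     {r_all:+.2f}")
print(f"ages 5 to 45: {r_younger:+.2f}")
```

A coefficient near 0 therefore does not rule out a relationship; it only indicates that a straight line describes the data poorly. Plotting the data before relying on the coefficient is one way to catch this.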
Correlation Header Properties
To access the correlation header properties, double-click the header object, or right-click it and select Edit. The properties page opens.
Figure 1 - The Correlation header properties
The properties are as follows:
- Decimals - sets the number of decimal places to display for the correlation coefficient.
- Hide Data - if selected, all data below this header is hidden from the table. The data is removed from the table after formulas are calculated. This setting only applies to leaf headers, that is, headers without other headers nested inside. This setting is useful if you wish to use some data in a formula calculation but do not want to include the data in the table. In the designer, you can choose to see the hidden data by selecting "Show Hidden Data".
- Hide Header - if selected, the header cells are collapsed in the table if possible. This setting is useful if, for example, the header is only to be used to filter a column.
- Label - allows a label to be set for the correlation column to display on the table.