Correlation and Significance
(OP)
Hi all,
I have not studied statistics in a long time and could use some advice. I have carried out six numerical analyses to investigate the deformation of six similar devices. Now I wish to determine whether there is a relationship between the geometrical properties of the device, such as its width/thickness, and the stress that is calculated within the device.
I have plotted the device thickness against the maximum stress predicted within the vessel and noticed a linear(ish) relationship. As my sample size is quite small I am not sure which correlation coefficient I should use to quantify the correlation between the two variables. If I have done the maths correctly, I get r = 0.88 and p < 0.05 using Pearson's coefficient and r = 0.97 and p < 0.002 using Spearman's coefficient.
Based upon what I have read I am more inclined to use Spearman's coefficient but I am not very confident.
Any advice would be appreciated!
Dave
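For readers following along, here is a minimal sketch of how the two coefficients in the question could be computed for a six-point sample, together with an exact permutation p-value that needs no distributional assumption. The `thickness` and `stress` values are hypothetical placeholders (the thread does not give Dave's data), and the helper names are illustrative only.

```python
import math
from itertools import permutations

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

def exact_p_value(x, y, stat):
    """Two-sided exact permutation p-value (feasible for tiny samples)."""
    observed = abs(stat(x, y))
    count = total = 0
    for perm in permutations(y):
        total += 1
        if abs(stat(x, perm)) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical data, NOT the values from the original analyses.
thickness = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
stress = [100.0, 140.0, 135.0, 180.0, 230.0, 320.0]

r_pearson = pearson(thickness, stress)
rho = spearman(thickness, stress)
p_rho = exact_p_value(thickness, stress, spearman)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {rho:.3f}, exact p = {p_rho:.4f}")
```

With only six untied points there are 720 possible orderings, so an exact two-sided p-value cannot fall below 2/720 (about 0.0028); smaller reported values come from large-sample approximations.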
RE: Correlation and Significance
http://en.wikipedia.org/wiki/Spearman%27s_rank_cor...
http://en.wikipedia.org/wiki/Pearson_product-momen...
TTFN

FAQ731-376: Eng-Tips.com Forum Policies
Need help writing a question or understanding a reply? forum1529: Translation Assistance for Engineers
RE: Correlation and Significance
Thanks for the quick reply. I understand the coefficients are designed to measure the strength of the linear/monotonic correlation. I'm just not sure when one should be employed over the other. In a number of articles it is stated that Pearson's coefficient assumes bivariate normal distribution (or something relatively close to bivariate normal distribution). This can be hard to verify when the sample size is small.
As such, should the Spearman coefficient be employed to be safe?
Also, you state that my Pearson coefficient implies that there is not a linear relationship. Wouldn't this depend upon the choice of statistical significance? i.e. Assuming a 5% level of significance wouldn't my coefficient indicate a strong, statistically-significant linear correlation?
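The significance question can be checked with the usual t-test for a Pearson coefficient. The sketch below assumes the figures quoted in the thread (r = 0.88, n = 6) and takes the two-sided 5% critical value for 4 degrees of freedom from standard Student's t tables; the function name is illustrative. Note that this test itself relies on the bivariate-normal assumption under discussion.

```python
import math

def pearson_t_statistic(r, n):
    """t statistic for H0 'no linear correlation', with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

r, n = 0.88, 6                       # values quoted in the thread
t = pearson_t_statistic(r, n)

# Two-sided 5% critical value of Student's t for 4 degrees of freedom (from tables).
T_CRIT_5PCT_DF4 = 2.776
significant = abs(t) > T_CRIT_5PCT_DF4

print(f"t = {t:.3f}, significant at 5%: {significant}")
```

Under these assumptions the statistic exceeds the critical value, which is consistent with the p < 0.05 reported in the first post. Significance here only says that zero correlation is unlikely; it says nothing about whether a straight line is the right shape.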
Thanks for the advice,
Dave
RE: Correlation and Significance
They are not mutually exclusive, i.e., you do not "use" one instead of the other, since they are not equivalent measures. A Spearman score of 1 only tells you that the variables are monotonically correlated, and tells you absolutely nothing about the shape of the curve. A Pearson score of 1 tells you explicitly that you have a straight-line correlation.
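A small sketch of that distinction, using a made-up exponential curve rather than anyone's real data: the relationship is perfectly monotonic, so Spearman's coefficient is 1, while Pearson's coefficient falls well short of 1 because the curve is not a straight line. The ranking helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks starting at 1; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Illustrative, perfectly monotonic but strongly curved relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
rho_spearman = pearson(simple_ranks(x), simple_ranks(y))

print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {rho_spearman:.3f}")
```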
Your specific example is telling you that while your variables are correlated, they are NOT obviously "linear(ish)". This could mean that there's too much noise, or that the relationship only vaguely resembles a linear one.
TTFN

RE: Correlation and Significance
I understand the Pearson and Spearman coefficients are designed to measure different correlations (linear and monotonic, respectively). As I have a small sample size, however, I do not know whether my data is parametric (i.e. normally distributed) or non-parametric (i.e. not normally distributed). The Pearson coefficient assumes the data is parametric whilst the Spearman coefficient does not.
I am trying to figure out whether I should assume my distribution is parametric and use the Pearson coefficient to analyse the correlation, or play it safe, assume my distribution is non-parametric and use the Spearman coefficient instead.
Also, I don't understand how Pearson's r = 0.882 does not indicate a strong linear correlation. A value of 0 indicates no linear correlation and a value of +1 or -1 indicates a perfect positive or perfect inverse linear correlation, respectively. Shouldn't a value of 0.88 thus indicate a strong positive linear correlation?
Thanks again,
Dave
RE: Correlation and Significance
No, a power-law curve, x^2.535, will produce a correlation coefficient of 0.882, and x^2 produces a correlation coefficient of 0.94.
As for Spearman's, the Wiki article on the subject shows similar examples of clearly non-linear distributions that produce very high (>0.9) coefficients.
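The point about curved data still scoring a high Pearson coefficient is easy to try out. The sketch below evaluates two power curves on an assumed grid of six x values (1 to 6); the exact coefficient depends on the range and spacing of x, which the post does not state, so these numbers will not necessarily match the ones quoted above.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Assumed sample points; the range used in the post is not given.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r_by_exponent = {}
for exponent in (2.0, 2.535):
    y = [v ** exponent for v in x]      # noise-free, clearly non-linear curve
    r_by_exponent[exponent] = pearson(x, y)
    print(f"y = x^{exponent}: Pearson r = {r_by_exponent[exponent]:.3f}")
```

Both curves give coefficients above 0.9 on this grid even though neither is a straight line, which is the sense in which a high r alone does not establish linearity.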
TTFN

RE: Correlation and Significance
The Pearson correlation coefficient requires that the underlying relationship is assumed to be linear. Having plotted the variables against each other and noted a distinct linear correlation I have adopted Pearson's correlation to quantify the linear correlation. Based on these assumptions, my Pearson r = 0.88 indicates a strong positive linear correlation.
I have also looked into the alternative scenario of quantifying the monotonic correlation between my variables using the Spearman coefficient. The Spearman coefficient only assumes that the relationship is monotonic, so I don't see any problem with nonlinear distributions producing high correlation coefficients. My Spearman r = 0.97 indicates a strong positive monotonic correlation.
My problem is that I do not know if my data is parametric (normally-distributed) or non-parametric (not normally distributed). As such, I am looking for advice on whether to go with the Pearson coefficient (which assumes that the data is parametric) or the Spearman coefficient (which does not assume that the data is parametric)?
I am thinking the Spearman coefficient is the safer bet?
Thanks,
Dave
RE: Correlation and Significance
You continue to attempt, in my opinion, to squeeze a square peg down the round hole of "linear fit." Again, not seeing your data, I don't know that's what you're doing, obviously, but the fact that you seem to want a coefficient that supports your conclusion, rather than finding something that demonstrates that the data is indeed linear, raises all sorts of warning flags. Your quest, in my view, should be to find a fitting function that minimizes the mean square error (MSE), and that is not a brute-force spline or polynomial fit that would set the MSE to 0. Perhaps you need to revisit the actual physics of your problem and look at the equations from Roark's.
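One way to act on the suggestion of comparing fitting functions by mean square error is sketched below. It fits a straight line and a two-parameter power law to hypothetical data (not the values from the thread) and reports the MSE of each. The power law is fitted as a straight line in log-log space, which is a convenience that minimises error in the logs rather than in the original units; a physically motivated form, for example from Roark's, would be the better candidate where one is available.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def mse(y, y_hat):
    """Mean square error between observations and model predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Hypothetical data, for illustration only.
thickness = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
stress = [100.0, 140.0, 135.0, 180.0, 230.0, 320.0]

# Candidate 1: straight line, stress = a + b * thickness.
a, b = fit_line(thickness, stress)
line_pred = [a + b * t for t in thickness]
mse_line = mse(stress, line_pred)

# Candidate 2: power law, stress = c * thickness**k, fitted in log-log space.
log_c, k = fit_line([math.log(t) for t in thickness],
                    [math.log(s) for s in stress])
power_pred = [math.exp(log_c) * t ** k for t in thickness]
mse_power = mse(stress, power_pred)

print(f"line:  a = {a:.2f}, b = {b:.2f}, MSE = {mse_line:.1f}")
print(f"power: c = {math.exp(log_c):.2f}, k = {k:.2f}, MSE = {mse_power:.1f}")
```

Both candidates have two free parameters, so comparing their MSE is a like-for-like comparison; with six points, adding more parameters would drive the error towards zero without telling you anything.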
TTFN

RE: Correlation and Significance
I am not trying to prove that there is a linear dependence between my two variables. I simply want to comment on the fact that there appears to be a general dependence between the two variables and to quantify the strength of this dependence.
Having plotted the data, the dependence between the two variables appears to be linear. Unfortunately, I am a bit concerned over the normality of my data (i.e. whether it is parametric or non-parametric). As I see it I now have two options:
1) I can assume that my data is parametric and that the dependence between the two variables is linear. If I make these assumptions I may quantify the strength of the assumed linear dependence using the Pearson coefficient (A parametric measure of linear dependence).
2) I can assume that my data is non-parametric and that the dependence between the two variables is monotonic. If I make these assumptions I may quantify the strength of the assumed monotonic dependence using the Spearman coefficient (A non-parametric measure of monotonic dependence).
I am not asking which coefficient is better or safer. I am asking whether, in my situation, it would be better/safer to simply assume that the data is non-parametric and to adopt option number 2 above.
Sorry if I am not communicating my problem well, I appreciate your time!
Dave
RE: Correlation and Significance
TTFN

RE: Correlation and Significance
I am not trying to measure the same thing using the two different coefficients. I am asking, since I don't know whether my data is parametric or non-parametric, if I should make one assumption (data=parametric / dependence=linear) and measure linear dependence or make a completely different assumption (data=non-parametric / dependence=monotonic) and, instead, measure the monotonic dependence. In either case I'm measuring two different things - the strength of either the linear or the monotonic dependence.
My problem is I don't know if I can make the assumption that allows me to measure the linear dependence, as I don't know if my data is parametric.
I found an article that suggests that you can evaluate the normality of the data by comparing the skewness of each variable to its standard error. If the skewness is less than twice the standard error it is reasonable to assume that the data is parametric. Otherwise, it is likely that your data is not normally distributed and, as such, it should be assumed that your data is non-parametric (i.e. you can't use Pearson's parametric coefficient to measure linear dependence and should instead use Spearman's non-parametric coefficient to measure monotonic dependence).
Here is the link:
http://www.statstutor.ac.uk/resources/uploaded/spe...
I am no expert so I am not sure if this is reliable?
Dave
RE: Correlation and Significance
To further clarify, neither measure is telling you that you have proper fitting function, so the question of parametric vs. nonparametric is completely moot.
TTFN

RE: Correlation and Significance
I just want to comment on the strength of the perceived linear/monotonic dependence.
You state that my data does not fit a line as the Pearson's coefficient is not an absolute value. To use Pearson's coefficient I must already assume that there is a linear dependence. The coefficient only measures the strength of the assumed linear dependence. If I assume that there is a linear dependence between my variables - Pearson's r = 0.88 implies that this assumed linear dependence is strong (not perfect).
You also state that my data does not fit a line as the Spearman's coefficient is not an absolute value. To use Spearman's coefficient I only assume that there is a monotonic dependence. The coefficient only measures the strength of the assumed monotonic dependence. If I assume that there is a monotonic dependence between my variables - Spearman's r = 0.97 implies that this assumed monotonic dependence is very strong (not perfect).
Following the guidelines given in the article that I linked (comparing the skewness of my variables to the standard error) I believe that my data may be skewed (non-parametric). As such, there is good reason not to assume Gaussian statistics.
As I believe some of my data may be skewed (non-parametric), I have decided to employ the non-parametric Spearman rank correlation coefficient to measure the strength of the perceived/assumed monotonic dependence between my variables.
RE: Correlation and Significance
TTFN

RE: Correlation and Significance
I have identified similar dependencies between the geometrical properties of the device and my other variables of interest. The other variables of interest describe the impact of the device upon its surroundings and these are what I am primarily interested in. It is not possible to intuitively identify dependencies between the geometric properties of the device and these other variables. As such, a strong positive/negative monotonic correlation is not irrelevant and may eventually prove useful in optimising the performance of the device.
My problem was that, because I only have a small sample size, I could not tell whether my variables were parametric (normally-distributed) or non-parametric (not normally distributed) by inspection. I was asking advice on whether to assume that the data was parametric or not. As it turns out you can determine whether your data is parametric or not by calculating the skewness and kurtosis ratio as follows:
Skewness ratio = skewness / standard error of skewness
Kurtosis ratio = kurtosis / standard error of kurtosis
If the magnitudes of both the skewness ratio and the kurtosis ratio are less than 2, it is safe to assume that your data is parametric (normally distributed). Conversely, if the magnitude of either ratio is greater than 2, you should assume that your data is non-parametric (not normally distributed). You can then decide whether or not to use a parametric correlation coefficient.
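A rough sketch of the two ratios follows. It assumes the bias-adjusted sample skewness and excess kurtosis, with the standard-error formulas commonly used by spreadsheet and statistics packages; the linked article may define them slightly differently. The sample values and function names are hypothetical.

```python
import math

def standard_errors(n):
    """Standard errors of sample skewness and excess kurtosis for sample size n."""
    ses = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    sek = 2.0 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return ses, sek

def normality_ratios(data):
    """Return (skewness ratio, kurtosis ratio); needs at least 4 values."""
    n = len(data)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    g1 = m3 / m2 ** 1.5                 # population-style skewness
    g2 = m4 / m2 ** 2 - 3.0             # population-style excess kurtosis
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    ses, sek = standard_errors(n)
    return skewness / ses, kurtosis / sek

# Hypothetical sample of six values, for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
skew_ratio, kurt_ratio = normality_ratios(sample)
looks_normal = abs(skew_ratio) < 2 and abs(kurt_ratio) < 2

print(f"skewness ratio = {skew_ratio:.3f}, kurtosis ratio = {kurt_ratio:.3f}")
```

For n = 6 both standard errors are large (roughly 0.85 and 1.74), so the ratios stay below 2 for most samples and the check has little power to detect non-normality at this sample size.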
With a small sample size this is still not really ideal and I will bear that in mind when interpreting any strong correlations but I think it is the best solution to my problem? Sorry again if I wasn't clearly communicating my problem. I really do appreciate you taking the time to respond.
Many thanks,
Dave
RE: Correlation and Significance
TTFN
