Linear regression analysis for y = ax +b
Linear regression analysis for y = ax +b
(OP)
I need to know if I can use the linear regression analysis function in Excel with relevanct to comparing two values which although are not related, should be equally paired.
More specifically in-situ pH determined valueas are being compared to lab determined values from sample analysis. These values should be the same (x=y), however haveing plotted a scatter graph in excel of one set of values verses their paired other value, and fitting a line of best fit (through least squares). Is this applicable when looking for a bias and also when comparing the R-squared values of the line of best fit when removing out-lier points.
Thanks all,
Dan ;)
More specifically in-situ pH determined valueas are being compared to lab determined values from sample analysis. These values should be the same (x=y), however haveing plotted a scatter graph in excel of one set of values verses their paired other value, and fitting a line of best fit (through least squares). Is this applicable when looking for a bias and also when comparing the R-squared values of the line of best fit when removing out-lier points.
Thanks all,
Dan ;)





RE: Linear regression analysis for y = ax +b
I have seen a method for ignoring "rogue" points from your data but I'd be suspicious of it unless there were some reason in the experiment for excluding them.
RE: Linear regression analysis for y = ax +b
On the first point, Excel's built-in regression function will not directly give your the confidence bands around your regression line, but it will give you the various statistical measures that you will need to calculate it. (The method is described in most introductory statistical texts.)
One his second point, the one concerning outliers, he is absolutely correct. You should not even start to consider any statistical testing for outliers unless you have strong a priori reasons for suspecting that your data might contain them. Otherwise, based on a whim, you will end up distorting your data.
If you do have genuine reasons, then there are many papers around that will give you a method for detecting an outlier in a single sample. (For example Dixon's Test as described in the Statistics text book by Sokal & Rohlf, or Grubbs's method whose reference I could find if pressed.) But you have two samples, supposedly tightly correlated. An extremely high value of y might look odd if examined only in the context of the set of y-observations, but it might not be odd at all if it was paired with a high value of x. The variable that you test for outliers will have to be some sort of composite variable, probably the deviation of the y variable from the calculated regression line.
But even then you are not out of the woods, because the variance of this variable is not constant. (Horrible sentence, I know, but I cannot think of a better way to express it.) Technically, the variable you are testing is not homoskedastic, and this invalidates the methodology of the test. One way around this is to "normalise" the variable before you test it, by dividing each value by the variance of its expected value. (This "expected value" is what comes out of the original regression, and the formula for its variance should be given in whatever text book you use to establish the confidence bands around your regression line as discussed in my second paragraph.)
HTH
RE: Linear regression analysis for y = ax +b
Can I use the equation y = ax + b (or y = mx + c, for engineers) with regards to comparing the in-situ pH and the lab determined pH, as they aren't really related to each other by y = mx + c.
The intercept was set at zero (origin).
The line of best fit fell pretty much on gradient = 1. However I wish to use the gradient and R-squared value to illustrate errors between the two, but the points were scattered both sides, so no real bias.
Dan
RE: Linear regression analysis for y = ax +b
RE: Linear regression analysis for y = ax +b
RE: Linear regression analysis for y = ax +b
If Dandalf wants to be able to predict his in-situ pH from his lab-measured pH, then the former is Y and the latter is X. If all he wants to do is see how well the measurement methods compare, then he is not positting a functional relationship, and so perhaps he would be better calculating a correlation coefficient rather than establishing a regression relationship.
Almost as a passing commment, Dandalf states that he forced his intercept to be zero. When I was framing my first post I thought about that issue. If he is trying to establish a functional relationship between the two measurements, then he should NOT force the intercept to be zero. We all know that it should be zero in an ideal world, but that is not the point: he is trying (say) to establish a systematic "error" in the use of the in-situ test as a substitute for a full lab test, and there is absolutely no reason to assume that this error will be so considerate as to have a zero intercept. (If you consider that mandating a zero intercept is acceptable because the ideal relationship would have a zero intercept, then you would have to accept also that a slope of exactly unity should be mandated for similar reasons. Then you would have totally castrated your regression.)
RE: Linear regression analysis for y = ax +b
The correct method for comparing two sets of values that should be the same on a point-by-point basis is with a paired T-test. This can be accomplished in Excel through Tools-->Data Analysis. Comparing the two columns on an X-Y plot is very useful for flagging outliers, and for developing a calibration curve once you have proven that the values are not statistically different.
Regards,
Joe