Standard Deviation question
Standard Deviation question
(OP)
I have two plots with a fairly scattered data points. I created a trend line through both plots to obtained a general trend line. I would also like to create another set of trendlines for +1 standard deviation, and -1 standard deviation, like you see in a lot of published sources. How can I do it on excel?
Attached is one of the plots I am talking about.
htt p://files. engineerin g.com/getf ile.aspx?f older=8332 0669-25ce- 40f6-a61f- 295fcd4254 8c&fil e=Fig3.JPG
Attached is one of the plots I am talking about.
htt





RE: Standard Deviation question
RE: Standard Deviation question
Create your x-y scatter plot. Right click on the series and select "format data series". Then click on the "Y error bars" tab. Then you have several options for seting up the bar. One is a fixed absolute, one is a fixed percentage, and one is custom which allows you to select a colulmn of your spreadsheet containing the standard deviation amount for each x point (needed if it's different at each point).
=====================================
Eng-tips forums: The best place on the web for engineering discussions.
RE: Standard Deviation question
The curved lines can represent either:
(1) the bounds within which there is a 95% (say) probability that the EXPECTED value of Y will lie for a given value of X; or
(2) the bounds within which there is a 95% (say) probability that an INDIVIDUAL value of Y will lie for a given value of X.
Obviously the latter bounds lie further from the trendline than the former.
I do not believe Excel directly offers this sort of plot, or even the underlying calculations to produce the plot. However the necessary formulae can be found in any good Stats reference book, and are easily implemented.
RE: Standard Deviation question
Excel linest function does return some statistical parameters like the standard deviation of the estimate for the slope and standard deviation of the estimate for the offset.
Let me try a simplistic stab at how we would proceed to use those to build confidenc eintervals.
Let's say we are fitting y = mx+b based on a set {xi,yi}.
Assume the independent variables xi are known exactly and there is an error in the dependent variables with a normal distribution.
The linest function will return estimates of m and b (call them mhat and bhat) along with their standard deviations (call them sm and sb).
We compute our estimate of y (the line) by yhat=mhat x + bhat
What is the standard deviation of yhat (call it sy) ?
I think it is sy = x sm + sb
Knowing the standard deviation, I think we can set up the confidence interval under an assumption of normal distribution. 95% confidence interval is two standard deviations on either side of the line. 99% confidence interval is three standard deviations on either side of the line. The bounds would look like a cone which is tightest around the line where x=0 and gets wider as x increases.
No guarantees I haven't missed something. But it sounds to me like it would get you pretty close if those assumptions are correct. If I remember right, maybe student's t needs to be used instead of a normal distribution since we are using estimated standard deviations rather than known standard deviations. And of course if you have error in your x values, that would make it more complicated as well.
=====================================
Eng-tips forums: The best place on the web for engineering discussions.
RE: Standard Deviation question
Be that as it may, you've made a good start. But I can see three immediate points to be made.
(1) The variable "b" has nothing to do with anything, and is merely a consequence of someone's arbitrary choice of an axis system. You'll get further if you express the regression equation in the form
(Y-Ybar) = m*(X-Xbar)
(2) You have added the standard deviations. If variables are independent, you add variances, not standard deviations. Thus your equation
sy = x sm + sb
if it were relevant (which it isn't) would be
V(y) = x V(m) + V(b)
(3) Your comment about possible inaccuracies in the measurement of the X values opens a whole new can, even a whole new barrel, of worms. It is a fundamental requirement of the so-called "simple linear regression model" that the explanatory variable (stats-speak for "X") is non-stochastic (stats-speak for "known exactly", or at least known to a higher order of accuracy than Y).
RE: Standard Deviation question
Have a total of n data points.
Model is Y = a + b*X + e where e is the error, and the error is assumed to be distributed normally with mean of zero and variance of s^2.
Let E() represent expected value
Let V() represent variance
Let S() represent summation over the n points
Let Xb represent S(X)/n and Yb represent S(Y)/n
Then
E(b) = [n*S(XY)-S(X)*S(Y)]/[n*S(X^2)-(S(X))^2]
E(a) = Yb - E(b)*Xb
both of which are calculated by Excel.
Also
E(s^2) = [S((Y-Yb)^2)-E(b)*S((X-Xb)*(Y-Yb))]/(n-2)
V(b) = E(s^2)/S((X-Xb)^2)
V(a) = E(s^2)*[1/n + (Xb)^2/S((X-Xb)^2)]
Finally the results we are after
E(Y) = E(a) + X*E(b)
V(E(Y)) = E(s^2)*[1/n + (X-Xb)^2/S((X-Xb)^2)]
V(Y) = V(E(Y)) + E(s^2)
Confidence intervals follow, using Student's-T with (n-2) degrees of freedom.
(These formulae are correct on the page in front of me. At the end of a hard day, they might not be correct as I have typed them. No responsibility accepted. Check with a text book.)
RE: Standard Deviation question
Cheers
Greg Locock
Please see FAQ731-376: Eng-Tips.com Forum Policies for tips on how to make the best use of Eng-Tips.
RE: Standard Deviation question
Julian's looks much more rigorous and correct. If I read it right, it tells us that if we draw lines at two standard deviations out... they will be parallel to the original line, Right? I remember somewhere way back in my stats class a result where the confidence interval was tightest in the middle of the range of x and wider towards the ends which I think is what Greg suggested. That must have been a different type of analysis than a simple fit of a line.
=====================================
Eng-tips forums: The best place on the web for engineering discussions.
RE: Standard Deviation question
Formulae are different for CI's and CL's, CL's (percentiles)are parallel to the regression lines, CL's are tracts of hyperbolic curves (the rationale being that, as Greg says,the uncertainty is materialize by rotating the regression lien about its mean value, so taht teh freedom is greater near the endpoints.
Should have a file built a while back, see if I can retrieve it and post it
Merry Xmas!!
RE: Standard Deviation question
CL = what?
=====================================
Eng-tips forums: The best place on the web for engineering discussions.
RE: Standard Deviation question
Sorry have been relaxing during the holydays, I'm going to look for that file, and eventual references, tonight!
RE: Standard Deviation question
h
RE: Standard Deviation question
http
I am only a little bit familiar with the particulars of the derivation--this site says that "Linear regression assumes that scatter of points around the best-fit line has the same standard deviation all along the curve...." Then uses some term I have not heard before: homoscedasticity.
To calculate the standard deviation for the entire curve, calculate something called the standard deviation of the residuals, sxy:
sxy=Sqrt(SSeq/(N-2)), N is number of pairs. SSeq is the sum of the squares of the vertical distance of the points relative to the line y=mx+b (the result of your curve fit).
This website is put together by a company that sells software to do this curve fitting. I don't know the software personally. I would be curious though why this software doesn't seem to have much interest in 'sxy' to get an idea of error in the curve fit, but seems more interested in confidence and/or prediction intervals. All these quantities are useful of course.
RE: Standard Deviation question
http
RE: Standard Deviation question
Found at last the spreadsheet, it is on another offline PC, it's going to take some time to transfer it...
RE: Standard Deviation question
please find attached the .xls file: conf_bands,
where signals from 5 ground penetrometers are analyzed.
Data are taken along a depth interval from 5 different verticals in the same location.
The data display a trend. So I wanted to build, for inference purposes, a linear approximation of the trend of ground resistance (Qc) versus depth for the whole location, with its CI's and CL's.
Please note that the predictive value of the confidence levels (percentiles) tends to be poor: the 5% line, for example, rules out or nearly out all data, not just 95% of them.
This because data are clustered in particular regions of the spae of variables.
I believe a polynomial regression would be far more efficient.
Only, I could not find anything about confidence intervals and levels of a polynimial curve approximation.
Do you have any clues??
P.S.: I usually work on a black screen background, so maybe you'll need to optimize the graph colours in the attached file.
RE: Standard Deviation question
your link on regression is pretty good. It turns out you can download from the graphpad site their 351-pages manual on linear and nonlinear regression:
http://
A lot of material on nonlinear regression and tests on curve fitting. I browsed it. Needs to be studied. Very interesting stuff. It remains to be seen its ease of applicability!!