Trendline problems
Trendline problems
(OP)
I have been bashing around with Excel again and have a peculiar situation, in my opinion I'm doing everything right but still Excel makes a difference. Welle here's what I want:
I have about 150 files with data. In these files there are about 50000 samples. In this data there are peaks, I want to detect the height of these peaks. Major problem is the trend that's underneath the data. I want this trend to be removed, so called baseline adjustment.
Just to try and solve the problem I did the following:
I copied the data to a new sheet, manually erased the peaks from the data and plotted the data. then I let Excel create a trend line through this dataset. A sixth order polynom gave me the best result.
I picked the formula and entered it in Excel because I want the values and not the description of the polynom so I can subtract the polynom results from the actual data leaving the peaks 'untouched'.
Using the same x-axis data the formula Excel produces very different results from what I expected when I calculated the values of the polynom. The graph does not in any way look like the trendline Excel suggested.
Excel gave me the following trendline:
y = -0.0023x6 + 0.1582x5 - 4.3364x4 + 61.02x3 - 463.93x2 + 1806.9x - 3319.1
which I entered in the first cell like this and than copied all the way down... the cell as A1 here changes needly for evey new row so that's not the problem.
= -0.0023*A1^x6 + 0.1582*A1^5 - 4.3364*A1^4 + 61.02*A1^3 - 463.93*A1^2 + 1806.9*A1 - 3319.1
The first 1500 samples it looks OK, but than the difference between my line (based on the Excel trendline equation) and the Excel trendline explodes to infinity...
Can some one give me a suggestion what might be wrong? Excel giving the wrong formula or is this some kind of a bug in Excel??
Cactus
I have about 150 files with data. In these files there are about 50000 samples. In this data there are peaks, I want to detect the height of these peaks. Major problem is the trend that's underneath the data. I want this trend to be removed, so called baseline adjustment.
Just to try and solve the problem I did the following:
I copied the data to a new sheet, manually erased the peaks from the data and plotted the data. then I let Excel create a trend line through this dataset. A sixth order polynom gave me the best result.
I picked the formula and entered it in Excel because I want the values and not the description of the polynom so I can subtract the polynom results from the actual data leaving the peaks 'untouched'.
Using the same x-axis data the formula Excel produces very different results from what I expected when I calculated the values of the polynom. The graph does not in any way look like the trendline Excel suggested.
Excel gave me the following trendline:
y = -0.0023x6 + 0.1582x5 - 4.3364x4 + 61.02x3 - 463.93x2 + 1806.9x - 3319.1
which I entered in the first cell like this and than copied all the way down... the cell as A1 here changes needly for evey new row so that's not the problem.
= -0.0023*A1^x6 + 0.1582*A1^5 - 4.3364*A1^4 + 61.02*A1^3 - 463.93*A1^2 + 1806.9*A1 - 3319.1
The first 1500 samples it looks OK, but than the difference between my line (based on the Excel trendline equation) and the Excel trendline explodes to infinity...
Can some one give me a suggestion what might be wrong? Excel giving the wrong formula or is this some kind of a bug in Excel??
Cactus
RE: Trendline problems
There are a few ways I have been able to get around this problem. Sometimes it helps to use more decimal places (click on the equation and then on the "move decimal places right" button, or go into "format data labels" by right clicking on the equation and go to the number tab). Another thing I have sometimes had to do is to break the data into pieces and fitting a trend line to each data set. It's not very convenient, but it does help.
If you need any help, or if I misinterpreted your question, let me know.
Good Luck!
jproj
RE: Trendline problems
You should set the decimal places to a very large number by formatting the data label of the equation.
It looks that 6 significant digits such as "-0.0023" is not enough, you need more significant digits like "-0.0023******..", the more the better.
Losing significant digits should be paid attention when using higher order polynomial for trendline.
JZ
RE: Trendline problems
An alternative to copying the trendline formula (coefficients) is to use the "linest" function and "ARRAY ENTER" the formula. ARRAY ENTER means to press ctrl+shft+enter after completing typing the equation. Also, the trick is to define the equation type in the linest function. pay careful attention to syntax used in procedures. also, i'm writing this from memory, so if an error occurs, please advise and i'll check my notes.
procedures are:
1) below the data, select (highlight) 4 rows and n+1 columns, where n = polynominial degree. since you stated that you used six (6), hightlight 7 columns (4 rows x 7 columns).
2) type: =linest(known-y's,known-x's^{1,2,3,4,5,6},,true) and then press ctrl+shft+enter. NOTE: the known-y's are all y-values to be curve fitted and known-x's are all x-values to be curve fitted. The numberof values for each x and y data must identical (cannot have 40 x-values and 45 y-values).
3) the first row contains the trendline coefficients. row 3, column 1 contains the R-squared value (1 is a perfect fit).
4) instead of using the coefficients from the chart, uses the coefficients in the first row.
i use this technique myself with satisfactory results. saves significant time in doing several computations for many data sets.
summarizing:
1) ARRAY ENTER the linest function using the "special technique" described in step 2.
2) use the coefficients in row 1 of results to check analysis. i believe the coefficients are listed (left to right) as A6, A5, A4, A3, A2, A1, A0 for the six-degree polynominial.
It might be best to test these procedures on a known data set, that way you can be satisfied with results and confirm procedure. good luck.
-pmover
RE: Trendline problems
Going to a higher order polynomial than is necessary is at best inefficient, and at worst will give the wrong answer eg suppose you have a staright line going to the origin at one end.
To remove the DC one would subtract the average level, ie half the maximum in this case.
If you were to do an order 1 fit, ie y=mx+c, and then used the c value, it would be 0, and have no effect when subtracted from the data.
Is that clear? or do we need a whiteboard?
Cheers
Greg Locock
RE: Trendline problems
This includes "regression". If you make a collum with you X values and nex to these square them and next ^3 and so on you can input these and let excel do the regression. You also get the actual statistical information so that you can evaluate the correctness of your fit. I think a 5' order polun. is very high!
Best Regards
Morten
RE: Trendline problems
RE: Trendline problems
A special thanks I would like to go out to pmover for his neat solution for calculating the trendline, this makes it possible for me to automate the whole process of analysing 150 files of more than 50000 samples each.
A small remark on the posts by jproj and GregLocock. First jproj indeed the ricks you suggested are possible but makes it more work and for 150 files like above mentioned I don't like that very much.
GregLocock ofcourse you could use a linear fit like you suggested, but I did not choose a 6th order polynom because it is not neccesary, it is the best fit possible for my data.
Again, thank you for the inputs!
RE: Trendline problems
Your solution sounded very nice, but so far it's not really working. Maybe you can help me solving the problem.
Like you suggested i started with a test using a few simple values (y=x2) like below:
_|__A__B___C__D_
1| x y x x2
2| -3 9 -3 9
3| -2 4 -2 4
4| -1 1 -1 1
5| 0 0 0 0
6| 1 1 1 1
7| 2 4 2 4
8| 3 9 3 9
I added the columns for the x and x2 next to it (not really neccesary in this case because of the repaeating of values).
In the cells underneath the data I select 4 rows and n+1 columns, so that will be 4 rows and 3 columns considering this simple second order polynom and added the following code using the array add method you described:
=LINEST(B2:B8, D2:E8, , true)
Because I use a dutch Excel at my office I changed LINEST into which I think is the dutch equivalent called TREND (I don't know for sure if this is right, but the help suggested this one).
The problem is I get a #NAME! error afterwards. Am I using the wrong dutch equivalent perhaps or is there something else I do wrong?
RE: Trendline problems
Sorry if my first post was confusing. What I meant by "move the decimal place to the right" was to show more significant digits (as jackzhong pointed out).
For better understanding of pmover's post, I'll use a third order polynomial: y = 3x^3 - 5x^2 + 2
The LINEST function has output in the form of A1, A2, ... An-1, An, A0 for an equation of y = AnX^n + An-1X^n-1 + .... + A1X + A0
Lets say you had the following data (y values from the above equation):
_|__A____B__
1| x y
2| -3 -124
3| -2 -42
4| -1 -6
5| 0 2
6| 1 0
7| 2 6
8| 3 38
You would want to highlight cells A10:D10 then type =LINEST(B2:B8,A2:A8^{3,2,1},TRUE) then press Ctrl+Shift+Enter. This will yield the following:
__|__A___B___C___D___
10| 0 -5 3 2
Which corresponds to an equation of y=3X^2-5X^2+0X+2, which is what we started with. If you wish to have all the statistical information as well, you will need to highlight cells A10:D16 and type =LINEST(B2:B8,A2:A8^{3,2,1},TRUE,TRUE) followed by Ctrl+Shift+Enter.
Test it out against the trendline equation given to you in in the chart. The results should be the same.
Good Luck!
jproj
RE: Trendline problems
The english version uses the following statement:
=LINEST(B2:B8,A2:A8^{3,2,1},TRUE,TRUE)
The dutch uses:
=LIJNSCH(B2:B8;A2:A8^{3;2;1};WAAR;WAAR)
I simply forgot to change the colon to a semicolon (list seperator) in the {3,2,1} part that's why everything didn't work, must be due to the stress
Thanks now everything will work I guess!