Trendline problems 2

cactus13 · May 15, 2002

I have been bashing around with Excel again and have a peculiar situation, in my opinion I'm doing everything right but still Excel makes a difference. Welle here's what I want:

I have about 150 files with data. In these files there are about 50000 samples. In this data there are peaks, I want to detect the height of these peaks. Major problem is the trend that's underneath the data. I want this trend to be removed, so called baseline adjustment.

Just to try and solve the problem I did the following:

I copied the data to a new sheet, manually erased the peaks from the data and plotted the data. then I let Excel create a trend line through this dataset. A sixth order polynom gave me the best result.

I picked the formula and entered it in Excel because I want the values and not the description of the polynom so I can subtract the polynom results from the actual data leaving the peaks 'untouched'.
Using the same x-axis data the formula Excel produces very different results from what I expected when I calculated the values of the polynom. The graph does not in any way look like the trendline Excel suggested.

Excel gave me the following trendline:

y = -0.0023x⁶ + 0.1582x⁵ - 4.3364x⁴ + 61.02x³ - 463.93x² + 1806.9x - 3319.1

which I entered in the first cell like this and than copied all the way down... the cell as A1 here changes needly for evey new row so that's not the problem.

= -0.0023*A1^x6 + 0.1582*A1^5 - 4.3364*A1^4 + 61.02*A1^3 - 463.93*A1^2 + 1806.9*A1 - 3319.1

The first 1500 samples it looks OK, but than the difference between my line (based on the Excel trendline equation) and the Excel trendline explodes to infinity...

Can some one give me a suggestion what might be wrong? Excel giving the wrong formula or is this some kind of a bug in Excel??

Cactus

jproj · May 15, 2002

In my experience, excel trend lines (and the associated equation) is only a good estimate for the bulk (muddle) of the data. When you try to use the equation to solve for data points on either end of your data, the results are often very different than what your original data suggest. The reason is that the equation you are using. Excel uses the trend line to "best-fit" an equation to the majority of your data. Excel is not smart enough to recognize any trends at either end of your curve. The data at one end may be trending down, while the equation excel gave you is actually trending up.

There are a few ways I have been able to get around this problem. Sometimes it helps to use more decimal places (click on the equation and then on the "move decimal places right" button, or go into "format data labels" by right clicking on the equation and go to the number tab). Another thing I have sometimes had to do is to break the data into pieces and fitting a trend line to each data set. It's not very convenient, but it does help.

If you need any help, or if I misinterpreted your question, let me know.

Good Luck!

jproj

jackzhong · May 15, 2002

When you use very high order polynomial trendline, you should be careful about all the constants

You should set the decimal places to a very large number by formatting the data label of the equation.

It looks that 6 significant digits such as "-0.0023" is not enough, you need more significant digits like "-0.0023******..", the more the better.

Losing significant digits should be paid attention when using higher order polynomial for trendline.

JZ

pmover · May 15, 2002

Cactus13,
An alternative to copying the trendline formula (coefficients) is to use the "linest" function and "ARRAY ENTER" the formula. ARRAY ENTER means to press ctrl+shft+enter after completing typing the equation. Also, the trick is to define the equation type in the linest function. pay careful attention to syntax used in procedures. also, i'm writing this from memory, so if an error occurs, please advise and i'll check my notes.
procedures are:
1) below the data, select (highlight) 4 rows and n+1 columns, where n = polynominial degree. since you stated that you used six (6), hightlight 7 columns (4 rows x 7 columns).
2) type: =linest(known-y's,known-x's^{1,2,3,4,5,6},,true) and then press ctrl+shft+enter. NOTE: the known-y's are all y-values to be curve fitted and known-x's are all x-values to be curve fitted. The numberof values for each x and y data must identical (cannot have 40 x-values and 45 y-values).
3) the first row contains the trendline coefficients. row 3, column 1 contains the R-squared value (1 is a perfect fit).
4) instead of using the coefficients from the chart, uses the coefficients in the first row.
i use this technique myself with satisfactory results. saves significant time in doing several computations for many data sets.
summarizing:
1) ARRAY ENTER the linest function using the "special technique" described in step 2.
2) use the coefficients in row 1 of results to check analysis. i believe the coefficients are listed (left to right) as A6, A5, A4, A3, A2, A1, A0 for the six-degree polynominial.
It might be best to test these procedures on a known data set, that way you can be satisfied with results and confirm procedure. good luck.
-pmover

GregLocock · May 15, 2002

Also, if you just want to remove the "DC" component of the signal then you should just subtract the average of the samples, or if you want to remove a rising baseline you should use a linear fit.

Going to a higher order polynomial than is necessary is at best inefficient, and at worst will give the wrong answer eg suppose you have a staright line going to the origin at one end.

To remove the DC one would subtract the average level, ie half the maximum in this case.

If you were to do an order 1 fit, ie y=mx+c, and then used the c value, it would be 0, and have no effect when subtracted from the data.

Is that clear? or do we need a whiteboard?

Cheers

Greg Locock

MortenA · May 15, 2002

You could also make the fit your self using the analysis toolpack

This includes "regression". If you make a collum with you X values and nex to these square them and next ^3 and so on you can input these and let excel do the regression. You also get the actual statistical information so that you can evaluate the correctness of your fit. I think a 5' order polun. is very high!

Best Regards

Morten

Kiranpatel · May 15, 2002

I had a similar problem and, for me it was for the reasons that "jackzhong" stated above. If you notice, the constants in the polynomials change sign, C6 is -ve, C5 is positive, etc. As a result, you end up having subtraction of two large number to obtain a relatively small number (Always prone to large error). I solved my problem by changing the format of the trendline calculated by Excel to display it as SCIENTIFIC instead of FIXED format and also increasing the number of digits after the decimal point to 8. You could try to do the same : get Excel to derive the trendline, right click on it, select Format>Number>Scientific, etc.

cactus13 · May 16, 2002

First of all, thank you very much for your quick repsonses... adding more digits was indeed the solution for my problem. Learned something new, because I had no idea you could change the number of digits for a trendline.

A special thanks I would like to go out to pmover for his neat solution for calculating the trendline, this makes it possible for me to automate the whole process of analysing 150 files of more than 50000 samples each.

A small remark on the posts by jproj and GregLocock. First jproj indeed the ricks you suggested are possible but makes it more work and for 150 files like above mentioned I don't like that very much.
GregLocock ofcourse you could use a linear fit like you suggested, but I did not choose a 6th order polynom because it is not neccesary, it is the best fit possible for my data.

Again, thank you for the inputs!

cactus13 · May 16, 2002

pmover,

Your solution sounded very nice, but so far it's not really working. Maybe you can help me solving the problem.

Like you suggested i started with a test using a few simple values (y=x²) like below:

[tt]
_|__A__B___C__D_
1| x y x x²
2| -3 9 -3 9
3| -2 4 -2 4
4| -1 1 -1 1
5| 0 0 0 0
6| 1 1 1 1
7| 2 4 2 4
8| 3 9 3 9
[/tt]

I added the columns for the x and x² next to it (not really neccesary in this case because of the repaeating of values).

In the cells underneath the data I select 4 rows and n+1 columns, so that will be 4 rows and 3 columns considering this simple second order polynom and added the following code using the array add method you described:

=LINEST(B2:B8, D2:E8, , true)

Because I use a dutch Excel at my office I changed LINEST into which I think is the dutch equivalent called TREND (I don't know for sure if this is right, but the help suggested this one).

The problem is I get a #NAME! error afterwards. Am I using the wrong dutch equivalent perhaps or is there something else I do wrong?

jproj · May 17, 2002

cactus13:

Sorry if my first post was confusing. What I meant by "move the decimal place to the right" was to show more significant digits (as jackzhong pointed out).

For better understanding of pmover's post, I'll use a third order polynomial: y = 3x^3 - 5x^2 + 2

The LINEST function has output in the form of A1, A2, ... An-1, An, A0 for an equation of y = AnX^n + An-1X^n-1 + .... + A1X + A0

Lets say you had the following data (y values from the above equation):

_|__A____B__
1| x y
2| -3 -124
3| -2 -42
4| -1 -6
5| 0 2
6| 1 0
7| 2 6
8| 3 38

You would want to highlight cells A10

10 then type =LINEST(B2:B8,A2:A8^{3,2,1},TRUE) then press Ctrl+Shift+Enter. This will yield the following:

__|__A___B___C___D___
10| 0 -5 3 2

Which corresponds to an equation of y=3X^2-5X^2+0X+2, which is what we started with. If you wish to have all the statistical information as well, you will need to highlight cells A10

16 and type =LINEST(B2:B8,A2:A8^{3,2,1},TRUE,TRUE) followed by Ctrl+Shift+Enter.

Test it out against the trendline equation given to you in in the chart. The results should be the same.

Good Luck!

jproj

cactus13 · May 18, 2002

Thanks, just noticed the problem was in the difference between the dutch and the englisch version. I did not convert eveything...

The english version uses the following statement:

=LINEST(B2:B8,A2:A8^{3,2,1},TRUE,TRUE)

The dutch uses:

=LIJNSCH(B2:B8;A2:A8^{3;2;1};WAAR;WAAR)

I simply forgot to change the colon to a semicolon (list seperator) in the {3,2,1} part that's why everything didn't work, must be due to the stress ;-).

Thanks now everything will work I guess!

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Trendline problems 2

cactus13

Automotive

jproj

Chemical

jackzhong

Structural

pmover

Mechanical

GregLocock

Automotive

MortenA

Chemical

Kiranpatel

Petroleum

cactus13

Automotive

cactus13

Automotive

jproj

Chemical

cactus13

Automotive

Similar threads

Part and Inventory Search

Sponsor