PhD Maths Question - Any Math Whizzes Out There? [Resolved]

Hi Guys,

I am having some problems using a polynomial equation to estimate what y will equal at x.

When I fit a 3rd order polynomial regression line, I receive the following equation:

y = 2E-06x^3 - .0007x^2 + .0984x - .424

Which, as I understand it, when substituting 196 for x is the same as

y = 2x10^(-6)*196^3 - .0007*196^2 + .0984*196 - .424

Based on the trendline, I expect the data point at 196 to be approximately 4.26 to 4.28; however, I get 7.03.

The data is based on oxygen uptake values:

Time (s) | Oxygen Uptake
15 | 0.95
30 | 1.74
45 | 2.79
60 | 3.56
75 | 3.59
90 | 3.81
105 | 4.03
120 | 4.04
135 | 4.04
150 | 4.13
165 | 4.17
180 | 4.11
195 | 4.26

I'm not sure what I'm doing wrong and any help would be greatly appreciated!

Thank you!

Damon

[Resolved Update]

The equation I was using didn't show its coefficients to enough decimal places.

Thank you for all your help!
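To see how much the rounding matters, here is a minimal sketch in Python. It assumes the rounded coefficients are those in the corrected equation from the comments and the precise ones are the 10-decimal values posted later in the thread; the function name `cubic` is just an illustrative helper.

```python
# Evaluate a*x^3 + b*x^2 + c*x + d using Horner's method.
def cubic(coeffs, x):
    a, b, c, d = coeffs
    return ((a * x + b) * x + c) * x + d

# Coefficients as displayed on the chart (rounded), per the corrected equation.
rounded = (2e-06, -0.0007, 0.0984, -0.424)
# Coefficients reported to 10 decimal places later in the thread.
precise = (1.6835016835e-06, -7.0863580864e-04, 9.8339660340e-02, -4.2160839161e-01)

x = 196
y_rounded = cubic(rounded, x)
y_precise = cubic(precise, x)
print(f"rounded coefficients: y = {y_rounded:.2f}")
print(f"precise coefficients: y = {y_precise:.2f}")

# Share of the gap that comes from rounding the x^3 coefficient alone.
cubic_term_error = (rounded[0] - precise[0]) * x**3
print(f"error from the x^3 term alone: {cubic_term_error:.2f}")
```

The x^3 coefficient is multiplied by 196^3 (about 7.5 million), so even a change in its seventh decimal place moves the result by more than 2.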

Comments

  • +4

    Think you got the wrong site, buddy

    • Ozbargainers can do anything.

      • Solved, after 12 hours. Incredible.

        Thank you all!

  • +1

    What software are you using to get the equation? Is it Excel?

    If so you are correct that 2E-06 is 2*10^-6 = 0.000002:

    Computing (2 X 10^-6) 196^3+0.0007 X 196^2-0.424 gives 41.526272.

    However, I noticed there is some inconsistency between the two equations you have provided, i.e. the missing x term. I doubt its coefficient would be zero if you were using the polynomial fit function in Excel.

    http://imgur.com/2PTWCqT

    Anyway, bit late, I just saw your post on the side of new forum topics… I'm off to sleep and won't be able to reply until tomorrow, but you'll probably figure it out.

  • Not a math genius, but I think your 7.03 is wrong. Firstly, you have two different equations there:
    eq1: y = 2E-06x^3 + .0007x^2 - .424

    and

    eq 2: y = 2x10^(-6)*196^3 + .0007*196^2 + .0984*196 - .424

    If you notice, the second one has the additional element of 0.0984x which you don't have in your original polynomial fitting (I'm assuming you missed this when writing the equation here).

    Secondly, if you evaluate each equation at x=196, then the answers you get for y are as follows.

    eq 1: y=41.52627

    eq 2: y=60.81267

    So I'm wondering where you got the 7.03 from. Is there a chance that you might have missed a subtraction and included it as an addition?

    • +1

      Well explained.

      I just realised I typed in 0.984 instead of 0.0984 in my example. ;-(

      Fatigue…

      • Hehe yeah, I was going to comment on that but you've already figured it out. I'm pretty bummed out right now too, off to sleep. Stuff for tomorrow, maybe ;)

    • PS: I notice that if you make the second sign - instead of +, you get 7.03 as your answer, so I guess you need to double check the signs first.

      eq 2: y = 2x10^(-6)*196^3 - .0007*196^2 + .0984*196 - .424 = 7.03
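The three results quoted in this sub-thread can be checked term by term. A small Python sketch, assuming the rounded coefficients quoted above (the variable names are only for illustration):

```python
x = 196

# Magnitude of each term at x = 196, using the rounded coefficients.
cubic_term = 2e-06 * x**3     # about 15.06
quad_term = 0.0007 * x**2     # about 26.89
linear_term = 0.0984 * x      # about 19.29
const_term = 0.424

eq1 = cubic_term + quad_term - const_term                      # no x term
eq2 = cubic_term + quad_term + linear_term - const_term        # + on the x^2 term
eq2_minus = cubic_term - quad_term + linear_term - const_term  # - on the x^2 term

print(f"eq 1:               {eq1:.5f}")
print(f"eq 2:               {eq2:.5f}")
print(f"eq 2 with - on x^2: {eq2_minus:.5f}")
```

Only the version with a minus on the x^2 term lands on 7.03, which points to the sign as the typo in the post rather than an arithmetic slip.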

  • Neither of the equations provided is a complete 3rd degree polynomial.
    Should be ax^3 + bx^2 + cx + d.

    I'm guessing it should be 0.000002(196)^3 - 0.0007(196)^2 + 0.0984(196) - 0.424 = 7.03

    • My bad, yeah it was late and that is the correct equation. I will update my original post to this equation.

  • Now that durd0008 has determined what we think is the equation, I think I can ignore the red herring bits.

    "Based on the trendline I expect the datapoint at 196 to be approximately 4.26 to 4.28, however I get 7.03"

    Without seeing the rest of the data or knowing exactly what level of precision you are looking for, it is difficult to tell whether the polynomial is a good fit or not.

    Things to check:

    Have you looked at a scatter plot of the data with the polynomial function plotted on top of it?
    Otherwise an estimate of ~7 when the actual data shows 4.26 is quite a good fit, considering at values of x above 400 or so the graphed function starts to shoot up really fast. At higher values on the function, there is probably going to be a larger difference than just 3 between "what the function estimates and what the actual data is". If so, then it's a pretty good fit.

    Are the large values of X important? There must be at least one Y value where X>400 which is forcing the software to fit a curve and give it that specific hockey stick/exponential feel. Check if this data point is an outlier or possibly even an error.

    Depending on what it is, you might need to censor the data. Say the values of x above 500 are useless and/or somehow impossible in practical situations, but your simulation machine in your lab is giving you results like this. Then you probably want to censor the data and fit the polynomial again, but make note of what you have done because of the limitations of the machine. It all depends on if and why you need a certain level of precision. If you don't, it looks relatively okay.

    Other things you can do include calculating the standard error and the R^2 and using those to justify whether the polynomial you fitted is a good fit or not. However, I always look at the data visually first.

    It also depends on the field you are in. Is this a Maths/Stats paper? If so, you'll need to go into more detail and use more formulas to check the fit, depending on how you have run the study. If engineering, then I would assume the standards are lower and more emphasis is placed on the engineering aspects, in which case an R^2 is usually alright as a bare minimum.


    Did some scrolling through your OzBargain history and found sports science posts. With that in mind, you might have someone in your sample who is performing much better than the rest, which is pulling up the polynomial (or the exact opposite, someone really slow if Y is time). In other words, think of the function as the guesstimated average of how others would perform based on what you actually collected. In which case you can fit two different polynomial functions, one with those ironman-like people included and one without, i.e. there might be some "statistical clustering" happening. Look at the data; you might find there are two or more different groups of participants that you can categorise people into (might be weight related?).

    Anyway, good luck.

    TLDR:
    In summary, the thing you are probably looking for is that the 3rd order polynomial that you fitted is just an estimate of the actual data. By fitting a curve, you are using the data from your study to make inferences about the target group (which you should have determined when you selected your participants). The data is not going to match perfectly and whether it is a good estimate is up to you to decide.
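The R^2 and standard error suggested above could be computed along these lines. This is only a sketch: it assumes NumPy is available, uses the time and oxygen uptake values the original poster lists in this thread, and takes "standard error" to mean the residual standard error of the cubic fit.

```python
import numpy as np

# Time (s) and oxygen uptake values as posted in the thread.
x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195], dtype=float)
y = np.array([0.95, 1.74, 2.79, 3.56, 3.59, 3.81, 4.03, 4.04, 4.04, 4.13, 4.17, 4.11, 4.26])

# Least-squares cubic fit; coefficients come back highest power first.
coeffs = np.polyfit(x, y, 3)
fitted = np.polyval(coeffs, x)

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Residual standard error with n - 4 degrees of freedom (4 fitted coefficients).
std_err = np.sqrt(ss_res / (len(x) - 4))

print(f"R^2 = {r_squared:.4f}")
print(f"residual standard error = {std_err:.4f}")
```

A high R^2 here only says the cubic tracks the 13 sampled points well; it says nothing about how the curve behaves beyond 195 seconds.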

    • +1

      Wow thanks for your extensive response and your sleuthing is correct, I am indeed a sport science PhD student.

      The 3rd order polynomial is based on oxygen uptake data obtained during an exercise test, which is sampled every 15 seconds. The participant, however, cycled for 196 seconds before fatigue, and I would therefore like to use a polynomial curve (supervisor suggestion) to estimate what the oxygen uptake would likely be after 196 seconds. The data that I have is

      Time (s) | Oxygen Uptake
      15 | 0.95
      30 | 1.74
      45 | 2.79
      60 | 3.56
      75 | 3.59
      90 | 3.81
      105 | 4.03
      120 | 4.04
      135 | 4.04
      150 | 4.13
      165 | 4.17
      180 | 4.11
      195 | 4.26

      (Apologies for the table, I can't get spacing any good in ob)

      Considering that the new data point is only 1 second after the 195-second sampling point, I would expect it to be very close to the 195 value, and visual inspection of the curve supports this. The value therefore can't logically be 7.03.

      For this individual the impact of the extra second is small. However, some participants have cycled for 205 seconds, and the extra 10 seconds is important in those cases, so the principles matter.

      Thank you for your help!

      Damon

  • You just forgot to carry the 1.

  • I think you just need to use more decimal places for the multipliers in the equation.

    i.e. if I use the multipliers in your post I also get 7.03.
    If I get Excel to report the multipliers in the polynomial fit to 10 decimal places (see below; perhaps 10 is a little excessive, you could play around with that) I get 4.31 as the answer.

    y = 1.6835016835E-06 x^3 - 7.0863580864E-04 x^2 + 9.8339660340E-02 x - 4.2160839161E-01

    When x = 196, y = 4.31

    • So the graph displayed by excel wasn't sensitive enough! I've had that problem before, thank you for your help!

      Also, how did you calculate the regression line to that sensitivity?
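Outside Excel, one way to get the coefficients at full precision is to run the least-squares fit directly. The sketch below assumes NumPy and the data posted in this thread; it is not necessarily how the commenter obtained the numbers, but an ordinary cubic least-squares fit should land on the same values.

```python
import numpy as np

# Time (s) and oxygen uptake values as posted in the thread.
x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195], dtype=float)
y = np.array([0.95, 1.74, 2.79, 3.56, 3.59, 3.81, 4.03, 4.04, 4.04, 4.13, 4.17, 4.11, 4.26])

# Cubic least-squares fit; coefficients are returned highest power first.
coeffs = np.polyfit(x, y, 3)
for name, c in zip(("x^3", "x^2", "x", "const"), coeffs):
    print(f"{name:>5}: {c:.10E}")

# Evaluate with the unrounded coefficients at the times of interest.
y_196 = np.polyval(coeffs, 196)
y_205 = np.polyval(coeffs, 205)
print(f"y(196) = {y_196:.2f}")
print(f"y(205) = {y_205:.2f}")
```

Both 196 and 205 seconds lie just past the last sample at 195, so these are extrapolations that lean on the shape of the cubic; the further out, the less weight the estimate deserves.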

  • For future reference, math.stackexchange.com is a great place to have in your bookmarks for questions like this..

    • Cheers. I am needing a resource like that. brilliant!

  • +1

    Your data looks logarithmic (increases rapidly at first, and then plateaus).
    Fitting a logarithmic regression line to it would be painful though.

    • The data will consistently increase rapidly before plateauing, so a logarithmic regression line is a possibility. Thank you for the suggestion. I'll look into it and might take the idea to my supervisor.
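A logarithmic fit need not be painful if the model is taken to be y = a + b*ln(x), because that is a straight-line regression of y on ln(x). A minimal sketch under that assumption, using NumPy and the data posted in this thread (other "logarithmic" model forms would need a different approach):

```python
import numpy as np

# Time (s) and oxygen uptake values as posted in the thread.
x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195], dtype=float)
y = np.array([0.95, 1.74, 2.79, 3.56, 3.59, 3.81, 4.03, 4.04, 4.04, 4.13, 4.17, 4.11, 4.26])

# Assumed model: y = a + b * ln(x). Fit a straight line to (ln x, y).
b, a = np.polyfit(np.log(x), y, 1)  # slope first, then intercept

# Predicted uptake at 196 seconds under the logarithmic model.
y_196_log = a + b * np.log(196)

# R^2 of the logarithmic fit, for comparison against the cubic.
fitted = a + b * np.log(x)
r_squared_log = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"y = {a:.4f} + {b:.4f} * ln(x)")
print(f"y(196) = {y_196_log:.2f}, R^2 = {r_squared_log:.4f}")
```

Comparing this R^2 with the cubic's gives a rough sense of which shape suits the data, though the cubic has two extra parameters to play with.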

  • +1

    Lol. Okay, now that I have the data, I can see nothing wrong with the fit or the data itself.

    I made a short video of what to do to get more decimal points, along with option to choose logarithmic or order 3 polynomial.

    http://sendvid.com/iku6cxjk

    I haven't used Excel for a while so you'll see me trying to figure out why it didn't change to the extended Trendline Options (click the 3 vertical bars icon).

    • Sweet, that's awesome! I really appreciate the effort.

    • Just watched the video, some impressive excelling!
