Polynomials: do not accept them!

A great many agricultural ‘response’ experiments, whether they measure responses to fertilisers or to seeding rates, have perhaps 6 treatments at most; the trial area is gobbled up by the (generally 3) replicates of each treatment.

The few treatments and the inherent statistical errors (noise) very often generate a pretty murky picture. To clear things up a bit, smooth curves are often put through the data points.

Sadly, and all too often, the curves found in spreadsheet programs are used. No one can blame field workers: everyone is in a hurry these days, everyone knows how to use a spreadsheet, and not many people have the time, training or finances to put their results into a curve-fitting program.

The problem: most crops respond to increases in the quantity supplied in a diminishing way. The first amount produces half the full response, the next amount a quarter of the response, the next an eighth, and so on. This is often known as the ‘Law of Diminishing Returns’. Unfortunately the ‘trend line’ functions given to us in spreadsheets cannot easily replicate this.
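The halving pattern described above is easy to write out. This is only a sketch: the function name and the idea of counting inputs in equal ‘doses’ are my own illustrative assumptions, not taken from any particular trial.

```python
def cumulative_response(n_doses: int) -> float:
    """Fraction of the full response reached after n equal doses,
    assuming each dose adds half of the response that is still missing."""
    return 1.0 - 0.5 ** n_doses


for n in range(1, 6):
    gain = cumulative_response(n) - cumulative_response(n - 1)
    print(f"dose {n}: adds {gain:.4f} of the full response, "
          f"total so far {cumulative_response(n):.4f}")
```

Each extra dose adds half as much as the one before, so the total creeps towards the full response without ever turning back down.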

The most commonly fitted spreadsheet trend line is the ‘polynomial order 2’. This curve is the trajectory of cannonballs flying through a vacuum (a parabola): it is symmetrical and always wishes to return to the ground, i.e. zero yield.

In the graph below, a (perfect!) Law of Diminishing Returns crop response (red points and curve) is fitted by a ‘polynomial order 2’, i.e. a parabola (the black curve):

[Figure: Parabola fit to simple exponential]

The actual data suggest that the crop does not respond in any meaningful way to more than 200 units of inputs, yet the polynomial suggests that ~325 units are needed, i.e. over a third of the expensive inputs applied would be wasted 😦
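For anyone who wants to try this without the graph, here is a minimal sketch of the same kind of exercise. The plateau yield, the curvature and the six input rates are numbers I have made up for illustration (they are not the data behind the figure), and the least-squares parabola is fitted by hand with the standard library rather than with a spreadsheet trend line.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def parabola_peak(xs, ys):
    """Least-squares fit of y = p0 + p1*t + p2*t**2, with t = x - mean(x).
    Returns the x position of the fitted parabola's turning point."""
    n = len(xs)
    x_mean = sum(xs) / n
    ts = [x - x_mean for x in xs]
    s1, s2, s3, s4 = (sum(t ** k for t in ts) for k in (1, 2, 3, 4))
    rhs = [sum(t ** k * y for t, y in zip(ts, ys)) for k in (0, 1, 2)]
    normal = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    d = det3(normal)

    def coefficient(j):
        # Cramer's rule: swap column j of the normal matrix for the right-hand side.
        swapped = [row[:] for row in normal]
        for r in range(3):
            swapped[r][j] = rhs[r]
        return det3(swapped) / d

    p1, p2 = coefficient(1), coefficient(2)
    return x_mean - p1 / (2 * p2)


# Hypothetical, noise-free diminishing-returns crop (assumed values).
PLATEAU, CURVATURE = 10.0, 0.02
rates = [0, 100, 200, 300, 400, 500]
yields = [PLATEAU * (1 - math.exp(-CURVATURE * x)) for x in rates]

peak = parabola_peak(rates, yields)
rate_for_99_percent = math.log(100) / CURVATURE  # true curve is within 1% of plateau here

print(f"true response is 99% complete at about {rate_for_99_percent:.0f} units")
print(f"parabola says maximum yield needs about {peak:.0f} units")
```

With these assumed numbers the parabola's turning point lands well beyond the rate at which the true response has effectively flattened out, which is the same trap as in the graph.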

The ‘Law of Diminishing Returns’ (LODR) was elucidated over 100 years ago, and today we realise that a (linearly) increasing or decreasing amount often has to be added to it to match the crop responses that experimenters see.
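One common way to write this down, assuming the exponential form of the LODR (the symbols A, k and c are simply my labels for the plateau, the curvature and the added linear slope):

```latex
y(x) = A\left(1 - e^{-kx}\right) + c\,x ,
\qquad
\frac{dy}{dx} = A k\, e^{-kx} + c ,
\qquad
x^{*} = \frac{1}{k}\,\ln\!\left(\frac{A k}{-c}\right)
\quad \text{(only when } c < 0 \text{ and } A k > -c\text{)}
```

When c is negative the yield has a genuine maximum at x*; when c is zero or positive the slope never reaches zero, so the yield has no maximum at all.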

The example below shows a crop that has perhaps fallen flat when seeded too thickly. The data follow the LODR but have an amount removed from the yield that is proportional to the seeding rate:

[Figure: declining lin exp]

In this case the polynomial order 2 overestimates the seeding requirement for maximum yield by a factor of about two (data: maximum yield at ~125 seeds; polynomial fit: maximum yield at ~250 seeds). The symmetrical cannonball trajectory is to blame.

In other cases the yield can keep increasing with increasing seeding rate or nutrient supply, perhaps as a result of weeds or a lack of foresight on the part of the experimenter:

[Figure: increasing exp lin]

In this case the ‘polynomial order 2’ underestimates the required seeding rate.
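The declining and increasing cases can both be sketched with one made-up crop: an exponential LODR with a linear amount added or removed. All the numbers here (plateau, curvature, the slopes of ±0.01 and the six rates) are assumptions for illustration, not the data behind the figures, and the parabola is again a hand-rolled least-squares fit.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def parabola_peak(xs, ys):
    """Turning point (in x) of the least-squares parabola through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    ts = [x - x_mean for x in xs]  # centre x to keep the sums well behaved
    s1, s2, s3, s4 = (sum(t ** k for t in ts) for k in (1, 2, 3, 4))
    rhs = [sum(t ** k * y for t, y in zip(ts, ys)) for k in (0, 1, 2)]
    normal = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    d = det3(normal)

    def coefficient(j):
        # Cramer's rule: swap column j of the normal matrix for the right-hand side.
        swapped = [row[:] for row in normal]
        for r in range(3):
            swapped[r][j] = rhs[r]
        return det3(swapped) / d

    return x_mean - coefficient(1) / (2 * coefficient(2))


PLATEAU, CURVATURE = 10.0, 0.02           # assumed LODR parameters
rates = [0, 100, 200, 300, 400, 500]      # assumed seeding rates


def crop_yield(x, slope):
    """Exponential LODR plus a linear term (negative slope = crop falls flat)."""
    return PLATEAU * (1 - math.exp(-CURVATURE * x)) + slope * x


def true_optimum(slope):
    """Rate giving maximum yield, or None when the yield never peaks.
    Assumes PLATEAU * CURVATURE > -slope, so the peak is at a positive rate."""
    if slope >= 0:
        return None
    return math.log(PLATEAU * CURVATURE / -slope) / CURVATURE


for label, slope in [("declining", -0.01), ("increasing", 0.01)]:
    ys = [crop_yield(x, slope) for x in rates]
    print(label, "true optimum:", true_optimum(slope),
          "parabola peak:", round(parabola_peak(rates, ys)))
```

For the declining crop the parabola's peak sits far to the right of the true optimum; for the increasing crop the parabola invents a maximum inside the tested range even though the yield is still climbing at the highest rate.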

I hope that you can see from these examples that the polynomial fit gives highly misleading results. Sadly it is all too easy to use 😦

What you may also notice is that the polynomial’s maximum hardly changes across these three very contrasting crops!!

NB: higher-order polynomials are sometimes used, but with sparse and noisy agricultural data these often behave erratically and lead to visibly objectionable fits.


Reversion to the mean – it IS magic

I take a cube and a set of identical rulers into a classroom of 5-year-olds. I ask each child, in turn, to measure the cube and write down the measurement.

I then take the rulers into the corridor and sprinkle them with Fairy Dust, in front of all the children.

I then go into the classroom on the opposite side of the corridor and ask those 5-year-olds to measure the cube again and record these measurements.

What has the Fairy Dust done?

Quite a lot, as it turns out! Here are the changes in measurement for each ruler, plotted against the initial size measurements:

[Figure: Effect of Fairy Dust application]

An apparently highly significant and dramatic effect from the application of Fairy Dust 🙂

OK: so we don’t believe in Fairy Dust, and I will tell you that the measurements were generated by Excel (simply as random errors added to the size of the 6 inch cube). So what has happened?

[Figure: fairy dust scatter with text]

Look at the little inset scatter plot in the top right: nothing going on, just as suspected. Now look at the larger scatter plot: the most extremely erroneous (and unlikely) large first results (say around 7.5 inches) tend not to be so ‘jammy’ second time around and so tend to be smaller (in fact they tend to be average!). The same thing happens to the most extreme small first measurements too.
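The mechanism is easy to reproduce outside Excel. Here is a minimal sketch, assuming normally distributed measurement errors with a spread of half an inch and 30 rulers; both of those figures are my assumptions, as are the function names.

```python
import random

TRUE_SIZE = 6.0   # inches: the cube never changes
NOISE_SD = 0.5    # assumed spread of the children's measurement errors


def measure_twice(n_rulers, seed=0):
    """Two independent noisy measurements of the same cube per ruler."""
    rng = random.Random(seed)
    first = [TRUE_SIZE + rng.gauss(0, NOISE_SD) for _ in range(n_rulers)]
    second = [TRUE_SIZE + rng.gauss(0, NOISE_SD) for _ in range(n_rulers)]
    return first, second


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


first, second = measure_twice(30)
change = [s - f for f, s in zip(first, second)]

print("slope of second measurement vs first:", round(ols_slope(first, second), 2))
print("slope of change vs first measurement:", round(ols_slope(first, change), 2))
```

Because the change is just (second minus first), its slope against the first measurement is always exactly one less than the slope of second against first. With pure noise the second slope hovers around zero, so the ‘Fairy Dust effect’ has a slope close to -1, with no fairies required.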

This is a simple example: one fixed-size cube, and you didn’t believe in fairies anyway. But in the real world, outside the office, it is a lot more complicated, and quite often WE DO want to believe in stuff, perhaps because the changes fit in with our preconceptions.

So where might this occur in practice? Where do we have noisy repeated measurements? A possible ‘high risk’ case is soil sampling. You go out one year and take soil samples in your fields, you find that you have low and high indices, and you apply your fertiliser using the latest tech. You come back a few years later and measure the indices in roughly the same areas again. If you then sort the results by the initial soil indices, guess what? The very high indices have got lower and the lowest indices have got higher.
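The soil sampling case differs a little from the cube, because the fields really do differ from one another. Here is a hedged sketch in which nothing at all is done to the fields between the two surveys; the average level, the field-to-field spread and the sampling noise are made-up numbers, and ‘level’ is just a stand-in for a soil index.

```python
import random


def survey_twice(n_fields, seed=0, mean_level=2.0, field_sd=1.0, noise_sd=1.0):
    """Each field has a fixed true level; both surveys add fresh sampling noise.
    No fertiliser effect of any kind is simulated."""
    rng = random.Random(seed)
    true_levels = [rng.gauss(mean_level, field_sd) for _ in range(n_fields)]
    first = [t + rng.gauss(0, noise_sd) for t in true_levels]
    second = [t + rng.gauss(0, noise_sd) for t in true_levels]
    return first, second


def mean_change_low_and_high(first, second):
    """Sort fields by the FIRST survey, then average the change (second - first)
    in the lowest quarter and in the highest quarter."""
    pairs = sorted(zip(first, second))
    quarter = len(pairs) // 4

    def mean_change(group):
        return sum(s - f for f, s in group) / len(group)

    return mean_change(pairs[:quarter]), mean_change(pairs[-quarter:])


first, second = survey_twice(40)
low, high = mean_change_low_and_high(first, second)
print(f"lowest quarter changed by {low:+.2f}, highest quarter by {high:+.2f}")
```

The fields picked out as ‘low’ tend to come back higher and the ‘high’ ones lower, purely because they were selected on a noisy first measurement.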

Your expensive GPS Variable Rate Technology worked (NOT).

And just in case you thought that ‘Statistics’ would automatically sort this out, here are the stats for the effect in the fairy dust data:

[Figure: fairy dust scatter]

p = 0.0001! This analysis, by the way, was generated by ‘PAST’, a very nice standalone stats package, which is free: http://folk.uio.no/ohammer/past/

So: do not be fooled by plots of change when the X axis has been sorted by the initial measurement.

For a real-world example of the trap have a look at these charts:


If you want to know more, search for ‘regression to the mean’ rather than ‘reversion to the mean’, as this is the commonest name for the problem.
