http://qs321.pair.com?node_id=1159826


in reply to Re^4: How would you code this?
in thread How would you code this?

<Insert obligatory critique of cubic spline interpolation here />

Are you confident that the badly behaved regions of the dataset are bad discrete measurements as opposed to something that should be treated with a noise model? Do you have a physical understanding of what happens when the sensor generates an aberrant series?


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re^6: How would you code this?
by BrowserUk (Patriarch) on Apr 07, 2016 at 18:01 UTC
    Do you have a physical understanding of what happens when the sensor generates an aberrant series?

    Okay. This is my understanding, which may not be perfect, but isn't too far wrong.

    The curves are B-H magnetisation curves produced by Remagraph C hardware. These are renowned as the best available -- they essentially set the standards by which this is done.

    The X values are the controlled variable; the Y-values the dependent. Both are digitised quanta of continuously variable analogue properties -- magnetic field strength and magnetic flux density. The measured values are magnetic flux density, induced within the samples under test, as extrapolated from those sensed by external sensing coils.

    Because the magnetic samples, as well as the inducing & sensing coils, have reluctance (thus hysteresis) and are subject to eddy currents under the influence of changing magnetic fields, the software that performs the test cycle has a feedback loop and 'goal seeking' criteria: it both varies the speed of sampling and actively adjusts (steps back) the input field values in order to detect when and where the induced flux density stabilises, so as to eliminate the transient effects caused by the changing field.

    The exact algorithm used by the software to produce the graphs is not publicly documented -- presumably it is valuable proprietary IP -- but the equipment isn't just good; it's the best.
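
    Purely to illustrate the kind of 'goal seeking' I mean -- to be clear, this is not the vendor's algorithm, every name and number below is invented for the sketch, and I've left out the step-back half entirely; it only shows the 'wait for the flux to settle before recording' part:

    #! perl -slw
    use strict;

    # Invented stand-ins for the instrument I/O -- purely illustrative.
    my $B = 0;
    sub set_field    { my( $h ) = @_; $B = $h }         # pretend the flux tracks the field instantly
    sub read_flux    { $B + rand( 1e-5 ) }              # a little noise, so 'settled' means something
    sub record_point { printf "%.3f  %.6f\n", @_ }

    my $TOL      = 1e-4;    # successive reads closer than this count as settled
    my $SETTLE_N = 3;       # require this many settled reads in a row

    for my $H ( map { $_ * 0.1 } 0 .. 50 ) {             # the swept (controlled) field values
        set_field( $H );
        my( $prev, $stable ) = ( read_flux(), 0 );
        until ( $stable >= $SETTLE_N ) {
            my $b = read_flux();
            if ( abs( $b - $prev ) < $TOL ) { ++$stable }
            else                            { $stable = 0 }   # still slewing; keep waiting
            $prev = $b;
        }
        record_point( $H, $prev );                        # log only once the transient has died away
    }

    The real software is presumably far more sophisticated than that, but the point is that each recorded (H, B) pair is the end state of a small control loop rather than a raw instantaneous sample.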

    Perhaps the most important thing I can say at this point is that these discontinuities are tiny aberrations within the overall dataset. To give a sense of just how tiny, I've uploaded this image, which shows that you need to zoom in (twice) a long way before you can even see them in relation to the overall B-H hysteresis loop.

    Thus, throwing them away has little or no effect upon the integrity of the overall information the datasets contain. It just has to be done; and I was looking for a clean way to do it.

    (In the end the code is probably going to be rewritten in something akin to matlab, so avoiding Perlish niceties is one goal of my question here.)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Most informative, particularly the image. I'd definitely chalk this up as artifacts of the control software -- station keeping is noisy. I could ask some colleagues who would likely have some pretty wonky comments about the system, but I don't think deeper physics would really impact the signal processing (deeper physics available upon request).

      The problem statement, as I now interpret it, is: generate a strictly monotonic series of X values while eliminating the smallest number of points. I agree, given the presented dataset, that dropping points is more appropriate than smoothing. The controller overshot, overcompensated, and ultimately returned to the correct value. I would read this as wanting to drop the whole vee, and, if graff's observation is correct, keep the top corner point and drop the lower corner point. I would implement this as

      #! perl -slw
      use strict;
      use 5.10.0;

      printf "%s", scalar <DATA>;
      my @points = map[ split ' ', $_ ], <DATA>;

      my @modified;
      push @modified, shift @points; # Kickstart

      while ( @points ) {
          my $cmp = ($points[0][0] <=> $modified[-1][0]) * ($points[0][1] <=> $modified[-1][1]);
          if    ($cmp ==  1) { push @modified, shift @points; }
          elsif ($cmp == -1) { shift @points; }
          else               { pop @modified; }
      }
      which underperforms on the keep-the-most-points metric. Note the y-value switch so you can descend the loop with the same code. It also consistently deviates low, since it presumes that only the part after the first regression of the curve is wrong. Similar results are obtained with
      while ( @points ) {
          if ( ($points[0][0] > $modified[-1][0]) ^ ($points[0][1] < $modified[-1][1]) ) {
              push @modified, shift @points;
          }
          else {
              shift @points;
              pop @modified;
          }
      }
      which drops a slightly larger number of points, and I think produces a slightly cleaner curve because of it. My best idea is that we are trying to correct for S's in the data. For minimal impact on the curve, we probably want to hold onto the start of the S, the end of the S, and the median of the retrograde component. Note that by incorporating the median, we may be adding a fictitious point.
      while ( @points ) {
          if ( ($points[0][0] - $modified[-1][0]) / ($points[0][1] - $modified[-1][1]) > 0 ) {
              push @modified, shift @points;
          }
          else { # Retrograde!
              my @retro = (pop @modified, shift @points);
              push @retro, shift @points
                  while ($retro[-1][0] - $points[0][0]) / ($retro[-1][1] - $points[0][1]) < 0;
              shift @points
                  while ($retro[0][0] - $points[0][0]) / ($retro[0][1] - $points[0][1]) < 0;
              pop @modified
                  while ($retro[-1][0] - $modified[-1][0]) / ($retro[-1][1] - $modified[-1][1]) < 0;
              push @modified, [
                  0.5 * ($retro[@retro/2 - .5][0] + $retro[@retro/2][0]),
                  0.5 * ($retro[@retro/2 - .5][1] + $retro[@retro/2][1]),
              ];
          }
      }
      With this, I finally beat graff's point count, though one might argue his result is a little better on the wiggly bits since I bias high.

      I realize that the above implementations are written in a very Perly way, but they can certainly be changed to more traditional structures after a best algorithm has been suggested.
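
      For anyone who wants to run the later loops standalone, here is the scaffolding I use (the same as in the first snippet) wrapped around the second (xor) variant; note that the points under __DATA__ are made up purely to exercise the code and are not from the real dataset:

      #! perl -slw
      use strict;
      use 5.10.0;

      printf "%s", scalar <DATA>;                 # echo the header line
      my @points = map[ split ' ', $_ ], <DATA>;

      my @modified;
      push @modified, shift @points;              # Kickstart

      while ( @points ) {                         # the xor variant from above
          if ( ($points[0][0] > $modified[-1][0]) ^ ($points[0][1] < $modified[-1][1]) ) {
              push @modified, shift @points;
          }
          else {
              shift @points;
              pop @modified;
          }
      }

      say "@$_" for @modified;                    # the cleaned, strictly monotonic series

      __DATA__
      H B
      0.0 0.00
      1.0 0.10
      2.0 0.25
      2.6 0.36
      2.3 0.37
      2.8 0.42
      3.5 0.55
      4.0 0.60

      With this toy input, both corners of the little vee around H = 2.6 get dropped and the X series comes out strictly increasing; either of the other two loop bodies can be swapped in without any other changes.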

      Finally, I should note that given the characteristic uncertainties of these measurements, this whole discussion is academic since none of these curves are significantly different from each other from a measurement perspective.


      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        Kenneth, thanks for your continued explorations. I've been looking at this most of the day, and I think I finally made a decision.

        Looking in detail at the differences between the original data (red), my original algorithm (blue), graff's algorithm (black), and your latest algorithm (green), there is one place where I think your algorithm is clearly superior: the bottom left corner of the top right zoom, where your algorithm splits the difference between mine and graff's, and is, I think, clearly a better reflection of the underlying trend of the original data.

        However, as you pointed out, it does 'invent' points, which neither of the other two do. I also agree that in most other cases yours has a tendency to bias high.

        So after much umming and aahing, I'm going with graff's algorithm. Thanks.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.