PerlMonks  

Re^3: How would you code this?

by graff (Chancellor)
on Apr 07, 2016 at 14:04 UTC (#1159813)


in reply to Re^2: How would you code this?
in thread How would you code this?

I noticed that some x values are repeated up to five times, and that the selection of a trajectory through such a region (depending on what the algorithm does) could vary between:
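[diagram: three alternative trajectories through points a, b, c, differing in where the middle point b sits; original ASCII layout lost]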
(i.e. after dropping some number of points from the input series, the middle point either falls midway between the other two, or creates an "elbow" close to the previous or following point).

I thought about working out a way to maintain a stack, such that from the first (bottom) to the last (top) stack element, there would be at most three distinct x values (that seems to be the extent of the "oscillation" you're seeing).

Once a fourth x value appears in the next point, review the stack and output the points that comprise the smoothest trajectory. But that's a lot of work for a potentially insignificant gain in "accuracy".
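A minimal Python sketch of the stack idea above, under stated assumptions: the series trends upward in x, and "smoothest" is approximated by keeping, for each distinct x in the buffered region, the original point whose y is the low median of the y values seen at that x. The names `resolve` and `windowed_cleanup` are made up for illustration, and the median rule is only one possible stand-in for a real smoothness criterion.

```python
from statistics import median_low


def resolve(buffer):
    """Pick one original point per distinct x from the buffered points.

    Assumption: the low median of the y values at each x stands in for
    "smoothest"; median_low always returns a value that is in the data.
    """
    by_x = {}
    for x, y in buffer:
        by_x.setdefault(x, []).append(y)
    return [(x, median_low(ys)) for x, ys in sorted(by_x.items())]


def windowed_cleanup(points, max_distinct=3):
    """Buffer points until a fourth distinct x shows up, then settle the buffer."""
    out = []
    buffer = []
    distinct = set()

    def flush():
        # Emit the chosen points, skipping any x not beyond the last emitted x.
        for px, py in resolve(buffer):
            if not out or px > out[-1][0]:
                out.append((px, py))

    for x, y in points:
        if x not in distinct and len(distinct) == max_distinct:
            flush()
            buffer.clear()
            distinct.clear()
        buffer.append((x, y))
        distinct.add(x)

    flush()  # settle whatever is left at the end of the series
    return out


sample = [(1, 10), (2, 20), (1, 11), (2, 21), (3, 30),
          (2, 22), (3, 31), (4, 40), (3, 32), (5, 50)]
cleaned = windowed_cleanup(sample)
```

Every emitted point is one of the input points, so nothing is adjusted, only dropped. A different selection rule can be swapped into `resolve` without touching the buffering logic.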

UPDATE: As for your "primary" goal of "preserving the original data" as much as possible, as opposed to creating a smoothed sequence with adjustments affecting all data points: given that the input appears to be quantized, preserving that original data actually preserves the nature of the measuring device, rather than the nature of the thing being measured. Certainly there's value in that, but depending on what your downstream processes are supposed to do with the data, you might rather let those processes get a "fully smoothed" version of the data, because this represents a presumably reasonable fiction, as opposed to the quantized, jagged fiction being created by your measuring device.

Another update: obviously, by preserving the original data - but not necessarily using it as-is for certain downstream processes - you'll get to do comparisons if/when you use a different device (or a different configuration of the current device).

Re^4: How would you code this?
by BrowserUk (Pope) on Apr 07, 2016 at 15:25 UTC
    given that the input appears to be quantized,

    I agree the data is quantized. It appears to be (I'm still waiting for confirmation from the equipment manufacturer) an artifact of the digitisation of the analogue values produced by the sensing device.

    depending on what your downstream processes are supposed to do with the data,

    The cleanup is required because the downstream processing -- FEM software -- interpolates between the supplied values using a cubic spline interpolation, as the simulation converges. Thus, it requires that the input data be monotonic in order that it can produce a 'single-valued cubic spline fit'.
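The monotonicity requirement described above can be illustrated with a short Python sketch. It assumes the x values should be strictly increasing and simply drops any point whose x does not exceed the last kept x. The function name `keep_strictly_increasing` is hypothetical, and this is only the simplest possible filter, not necessarily the cleanup actually used in this thread.

```python
def keep_strictly_increasing(points):
    """Drop every point whose x is not greater than the last kept x.

    Assumption: x is the controlled variable and should rise monotonically;
    surviving points are passed through unchanged.
    """
    kept = []
    for x, y in points:
        if not kept or x > kept[-1][0]:
            kept.append((x, y))
    return kept


raw = [(0.0, 0.0), (1.0, 0.5), (1.0, 0.6), (0.9, 0.7), (2.0, 1.0)]
monotonic = keep_strictly_increasing(raw)
```

The output contains only original points, with the repeated x (1.0) and the step back (0.9) removed, so a single-valued fit through the result is possible.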

    I chose to avoid producing a "fully smoothed" fit because I've seen bad interactions between pre-processed fitting and the fitting done internally by the software. These manifest themselves as interminable oscillations in the Newton-Raphson iterations, resulting in extremely extended run times.

    you'll get to do comparisons if/when you use a different device (or a different configuration of the current device).

    The data is supplied to me; I only get one dataset per sample, but lots of (physically and chemically) different samples. I have no control over how it is produced.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      <Insert obligatory critique of cubic spline interpretation here />

      Are you confident that the badly behaved regions of the dataset are bad discrete measurements as opposed to something that should be treated with a noise model? Do you have a physical understanding of what happens when the sensor generates an aberrant series?


      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        Do you have a physical understanding of what happens when the sensor generates an aberrant series?

        Okay. This is my understanding; which may not be perfect, but isn't too far wrong.

        The curves are B-H magnetisation curves produced by Remagraph C hardware. These are renowned as the best available -- they essentially set the standards by which this is done.

        The X values are the controlled variable; the Y-values the dependent. Both are digitised quanta of continuously variable analogue properties -- magnetic field strength and magnetic flux density. The measured values are magnetic flux density, induced within the samples under test, as extrapolated from those sensed by external sensing coils.

        As the magnetic samples and the inducing & sensing coils both have reluctance (thus hysteresis) and are subject to eddy currents under the influence of changing magnetic fields, the software that performs the test cycle has a feedback loop and 'goal seeking' criteria to both vary the speed of sampling, and to actively adjust (step back) the input field values in order to detect when and where the induced flux density stabilises, so as to eliminate the transient effects caused by the changing field.

        The exact algorithm used by the software to produce the graphs is not publicly documented -- presumably it is valuable proprietary IP -- but the equipment isn't just good; it's the best.

        Perhaps the most important thing I can say at this point is that these discontinuities are tiny aberrations within the overall dataset. To show how tiny, I've uploaded this image, which shows that you need to zoom in a long way (twice) to see them at all in relation to the overall B-H hysteresis loop.

        Thus, throwing them away has little or no effect upon the integrity of the overall information the datasets contain. It just has to be done; and I was looking for a clean way to do it.

        (In the end the code is probably going to be rewritten in something akin to MATLAB, so avoiding Perlish niceties is one goal of my question here.)


