in reply to Re^2: Anyone with XS experience willing to create a high performance data type for Perl?
in thread Anyone with XS experience willing to create a high performance data type for Perl?

So again, tell us what you're trying to achieve.

It is exceedingly unlikely that the outcome of this thread is that someone will take it upon themselves to put a fast btree implementation on CPAN which meets all your requirements. What might reasonably be an outcome is that someone will advise you of a way of achieving your requirement in a reasonably speedy way, e.g. by using CPAN module X or perl technique Y or interesting algorithm Z. But only if we know what you want.


  • Comment on Re^3: Anyone with XS experience willing to create a high performance data type for Perl?

Replies are listed 'Best First'.
Re^4: Anyone with XS experience willing to create a high performance data type for Perl?
by beautyfulman (Sexton) on Nov 10, 2021 at 23:44 UTC
        I will try one last time. What I mean by "tell us what you want to achieve" is a description like the following. Talking about inserts and deletes is describing an implementation of what you're trying to achieve. At the moment we have ABSOLUTLY NO IDEA what your problem is, because you haven't even hinted at it yet.

        So here's a compelely hypothetical example just to give you an idea of what sort of description I'm after.

        "The program must process a large set of alphanumeric identifier tags, each mapped to a name. The program runs entirely in memory; it doesn't store permanent state to a file or database. The program reads in identifier / name pairs from a file or network socket; on startup there will be an initial "surge" of about a million records, then new records will appear at a slower rate. The program keeps the identifier/name pairs in memory. From time to time, queries will be received on a second socket; this may involve looking up a particular identifier, or returning a range of identifier values (the identifiers have batch numbers incorporated into them and sometimes you want to return all identifiers from that batch). Once an hour, the program will delete all identifiers whose batch number is more than 100 behind the most recent batch number."


        > I can switch languages if i need to. If you know of something Perl that can do it as fast as Ruby/C rbtree gem i will be very glad to use Perl.

        I have to admit your overall approach feels completely alien to me. :) Is this a personal hobby project? Or for work? - if so, how many programmers at your company?

        I don't switch languages lightly because it's not practical to maintain sharp skills across many different programming languages at the same time. In any case, I'd need to get approval from the other programmers in my team before doing so.

        And I don't introduce dependencies lightly. What if your dependent module (or any of its dependencies) has a security vulnerability? What if its author abandons it? What if its license restricts you? How quickly can you isolate/troubleshoot a bug in its code? What if you later need to port your software to a platform (or a different Perl version) that your chosen 3rd party library does not support?

        I tend to take a much more conservative approach. I'd start by solving the problem, in an easy to understand and maintain style, in plain Perl. If that was fast enough, problem solved. If not, I'd benchmark it to find where the bottleneck is and consider the options, including introducing third-party dependencies. Since both Perl and C++ are approved languages at work, I'd even consider writing the whole thing in plain portable ANSI C++ (again without third-party dependencies). Note, btw, that C++ std::map is usually implemented using red-black trees.

        For some background on where I'm coming from see: Why Create Coding Standards and Perform Code Reviews?