Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Representing all data as Lists (Perl7?)

by rje (Deacon)
on Sep 20, 2005 at 15:13 UTC ( [id://493478]=perlmeditation: print w/replies, xml ) Need Help??

I was re-reading Paul Graham's web articles (http://paulgraham.com/hundred.html) and thinking about how Perl might adapt for computers with processing power many orders of magnitude better than what we have. I wasn't thinking practically, nor particularly critically, but I was thinking about it with Paul's LISPoCentric view.

Imagine the runtime inefficiency (yet potential elegance) of treating strings as lists of characters -- or references to lists of characters -- and treating hashes as lists of list-pairs. You know how, in Perl, you declare a hash as a flat array, and Perl sorts it out for you? What if, in the future, Perl actually stores all data internally as lists?

Something strange I've just realized is how completely I've forgotten that strings really are "arrays" in C. "Modern" languages have taught me to forget the internal representation of strings and treat them as atomic.

One odd thing about strings being represented internally as lists is that the quote characters become an alias for list declaration.

Another odd thing is that sigils might converge.

Imagine my blurry view of Perl7. Pardon my lack of semicolons, I'm seeing what it would feel like to not use them all the time in Perl.

It turns out there are lots of things that might cause problems (maybe because I'm too used to thinking of scalars as strings) that might be remedied by throwing out the $ sigil and using the @, or, alternately, trying to remember that everything's a reference to a list. Or something.

It occurs to me that this behavior might be implemented today with a module. I'll have to think about this a bit. I've never written a module that fundamentally changes the look (or behavior) of the language.

It would feel strange calling split() on a list.
use strict my @baz = 'hello there' # this is a list print @baz[0] # => h print @baz[6] # => t my @foo = [ 'hello', 'there' ] # this is a list of 2 lists @foo = split ' ', @baz # so is this? print @foo # => "hellothere" print len @foo # => 2 print @foo[0][1,3,4] # => "elo" print @foo[1][3..5] # => "ere" push @foo, 'how', 'are', 'you?' @foo[0,3] = [ 'goodbye', 'were' ] # # Why would you "emulate" a hashmap using lists? # Would a list of pairs work? # Sorting a hash automagically sorts by keys? # my @bar = { 'a' => 1, 'b' => 2, 'c' => 3 } # # is perhaps translated to the access-inefficient: # # my @bar = [ [ 'a', 1 ], [ 'b', 2 ], [ 'c', 3 ] ] # # do we assume hash access looks the same? print @bar{a} # => 1 @foo = @bar{a,b,c} # => [ 1, 2, 3 ] print @foo # => 123 print join ',', @foo # => 1,2,3 print join ',', @bar # => ???

Edit by castaway - swapped pre tags for code tags

Replies are listed 'Best First'.
Re: Representing all data as Lists
by Anonymous Monk on Sep 20, 2005 at 17:03 UTC
    Why stop with lists?

    You can make your data type even more elegant and inefficient by constructing everything from sets. The mathematics of set theory is rock solid, and it's what mathematicians actually use as their theoretical underpinnings for almost all of mathematics.

    Sets will give you what you seem to be looking for; maximum theoretical abstraction, with a minimum of real world application. As a mathematician, I'm not fond of excessive abstraction. You just have to abstract back up the chain to build up useful abstractions again, like complex numbers, addition, subtraction, and so forth. Few mathematicians bother to describe addition in terms of the underlying set theory, except to prove that it can actually be done.

    Real world applications need useful data types. Strings should always be a built in data type; any language that lacks strings as a built-in data type ends up building them right back into the language again using some sort of standard library. From a practical standpoint, the only real difference between a feature that's in the core language and a feature that's in a core library is the extra commands required to include the library, and how awkward the syntax is.

    --
    Ytreq Q. Uiop

      It seems like you're talking about LISP. Isn't a Cons a simple pair set? And wasn't LISP originally simply a mathematical proof? (Forgive my naivetie, I don't know much about LISP, and have used it only in mutant forms).

      "From a practical standpoint, the only real difference between a feature that's in the core language and a feature that's in a core library is the extra commands required to include the library, and how awkward the syntax is."

      I'm not exploring what happens to the language when things are removed from the core, but rather if anything useful can happen to programmers and the way they program when the internal representation changes.

      But actually, I'm not even exploring that. I'm trying to figure out what Perl programmers can do if Perl represented scalars and hashes as lists.
Re: Representing all data as Lists
by revdiablo (Prior) on Sep 20, 2005 at 16:07 UTC

    Your main supporting premise here is that it would add some magical degree of elegance. To be honest, I just don't see it. Different things are elegant in different situations, and a list is not always the Right Thing. I guess you could say it's elegant to form all tools into the shape of a hammer, but I disagree.

    For what it's worth, you could probably emulate most of the interface you have here using tied arrays. Perhaps if I played around with something like that, I might see your point of view. But currently, I just don't.

      If the implementation is the only thing that changes, then the programmer may simply see a Perl6 look and feel. I'm exploring whether or not the implementation of Perl would become simpler if there was one datatype used to represent the things we see as strings, lists, and hashes. Granted, I'm also exploring what a Perl would look like which blurred the distinctions between scalars, lists, and hashes as well.
        I'm exploring whether or not the implementation of Perl would become simpler if there was one datatype used to represent the things we see as strings, lists, and hashes.
        It might be simpler, but it would never happen. You seem to view this as just a "matter of time" since computers will be so fast they will make up for the "inefficiency" of this LISP-y implementation. The speed of a computer is a factor, but it is probably the least important factor when it comes to "efficiency."

        Constant speedups are one thing, but if you have a bad algorithm (like blindly searching a list to implement an associative array), the problem will scale poorly no matter how fast your computer is. The only way a futuristic computer running a linear-time algorithm could keep up with another computer running a constant-time algorithm would be if it could somehow compress time.

        If I had to choose between a fast computer and fast algorithms, I'd choose fast algorithms ;)

        You'd have a better chance of convincing me that this is a good idea if the implementation stays efficient and the everything-is-a-list is just a unifying interface to whatever the underlying structure is. Although I still don't see what this does that LISP doesn't already do.

        blokhead

        It's kind of funny, because my feelings are the opposite of what blokhead describes in the last paragraph of Re^3: Representing all data as Lists; I'd actually be more inclined to support unifying the internals than changing the interface, though I don't really think either is a very good idea.

        Like I said (or maybe I should have said more clearly) in my original reply, I think different things should look different. You may say the corollary is that alike things should like similar -- and I'd agree -- but in this case I dont' think a string and a list are things that are really all that alike. I can understand why someone would say they are, but I think it's a contrived likeness.

        I strongly agree with Perl's current notion that a string is an individual thing (i.e. a scalar). I don't often have the need to break that thing apart and operate on its individual characters (Update: a clarification of what I meant). Perhaps that's a learned trait, but I tend to think it's more likely that instinctively seeing a string as a list is less natural.

        In the case of hashes, I don't feel as strongly as I do about strings. Hashes are very list-like already, and as you pointed out, you can already coerce a list into a hash. That's a pretty natural-feeling thing, in my eyes. I still think a hash is different enough from a plain list that it deserves its own syntax and sigil, though. And Perl 6 will make it pretty easy to get a list of pairs, or a list of interwoven keys/values:

        my %hash = <one 1 two 2 three 3>; say $_.join(" ") for %hash.pairs; # output: # one 1 # three 3 # two 2 say for %hash.kv; # output: # one # 1 # three # 3 # two # 2
Re: Representing all data as Lists (Perl7?)
by Roy Johnson (Monsignor) on Sep 20, 2005 at 19:35 UTC
    One interesting side-effect to cogitate on is that regular expressions would then be applied to lists. It seems like that might be useful, but would require some expansion of the atomic matchers (what currently match individual characters).

    Caution: Contents may have been coded under pressure.

      If you like that, you could play with Mathematica. It has a poweful pattern-matching engine that works with any kind of data, like lists of lists of etc, not only strings. (From version 5.1, it also has a separate regexp matcher for strings only.)

Re: Representing all data as Lists
by VSarkiss (Monsignor) on Sep 20, 2005 at 16:44 UTC
      Actually, we're about 50 years too late (LISP).
Re: Representing all data as Lists
by itub (Priest) on Sep 20, 2005 at 17:07 UTC
    I think some of the success of Perl is due to its complex and quirky but expressive syntax. Sure, Lisp is logical and elegant, but all those parentheses can get painful after a while (at least to me). In a similar way, having "redundant" ways to access things in Perl, such as $, @, substr, etc. makes code easier to read to people who like Perl (obviously, Perl detractors who say that it's line noise won't agree ;-).
      Yes, that's quite possible. Sigils are clues to the human interpreter as well as the machine interpreter. But what are they clues of? Don't they tell you what you can and cannot do with that piece of data?

      In some respect, perhaps sigils represent the contract between Perl and the programmer, just like the rest of Perl grammar.

      Perl does have a rich syntax. Would it be richer if you could operate on scalars in ways you can't currently? Or is the operation set not orthogonal enough -- is this a lose because of context ambiguity? That's sort of what I'm working through.

      substr() exists. Every C-based language gets it for free. But I think the reason Perl doesn't have strcat() is because the dot-operator is better. In other words, subroutines lose if there's a better metaphor (for some value of "better").

      So, assuming Perl6 makes @foo[4] the proper way to index lists, why can't we say $foo[4] is the 'shorthand' for substr()? No, no, I'm not submitting an RFC... I'm just asking, "does it make sense?" and "is it better than substr($foo,4,1)"? And then the next step is to ask, if some PMC of the future stores strings as lists, has Perl therefore gained quite a bit of power while, at the same time, streamlined its interior logic?

      I'm working through these thoughts. I don't know the answer.

        You can’t say that because in Perl6, $foo[4] is what you’d write $foo->[4] in Perl5.

        Makeshifts last the longest.

Re: Representing all data as Lists
by diotalevi (Canon) on Sep 20, 2005 at 16:14 UTC
Re: Representing all data as Lists (Perl7?)
by pg (Canon) on Sep 20, 2005 at 23:01 UTC
    "strings really are "arrays" in C."

    I know that you put the word arrays in quotes... the proper thinking is that "strings really are continuous memery space in c".

    Any way, the real question is what is list/array. First, how Perl stores those different data structure internally (or more precisely c, not Perl) does not matter to Perl programmers, and more importantly, there is no structure in storage/memory, structure is just the way to interprete the stored bitmap. The concept of list bares no value at the bottom level, and expose no impact to consumers of the language.

    There are at least three levels, and we have talked about the first two: the memory and the Perl internal (c). Now talk about the level you actually referred to - Perl. One sentence, doesn't matter how Perl organize things internally, you need to expose rich data structures to Perl programmers, and things like hash is not replaceable.

      the proper thinking is that “strings really are continuous memery space in c”.

      So are arrays.

      (Although in an architecture with an MMU, they might not be. So too need Perl arrays and strings not be contiguous as far as perl is concerned.)

      Makeshifts last the longest.

Re: Representing all data as Lists
by chester (Hermit) on Sep 20, 2005 at 16:27 UTC
    substr and/or split can do most of that already. I don't think accessing the individual letters of strings by index is important enough to be the focus of how to lay out Perl's syntax. Things that are done most often should be easiest to do (and shortest to type), other things can be happily relegated to function or module status.
Re: Representing all data as Lists (Perl7?)
by ambrus (Abbot) on Sep 21, 2005 at 10:49 UTC

    I think you are mistaken about Lisp.

    While this may not be true for the very first versions of lisp, data in current lisp systems aren't all represented as lists. It's source code that's written as only lists. Actual aggregate data can be lists, vectors, functions, strings, structures, hash tables, and lots of other things.

    There are such simplistic languages where most things are of the same type: LOGO has lists only (even functions are lists); most shells have strings only (this is not exactly true, bash can have integers and sparse arrays too); and I don't want to say anything wrong about TCL, awk or sed. These are not powerful languages. (Let's not mention obfu-languages like BF, unlambda, befunge etc, each of which have only one type of data.)

    Update: I must admit that Mathematica is somewhat close to your description: every kind of data appear as an array (but not a list; and strings, bigfloats, bigrats, bigints, complex numbers, symbols etc are real atoms, one can only go so far). (Some arrays are internally stored in a compact form though, although they appear to be normal arrays.) This is advanatgeous for a symbolic algebra system, as the pattern matching rules can be simple and fast, and expressions can be transformed dynamically. Also, being a symbolic algebra system implies that the boundary between code and data is not very clear, and this can be made easier with this uniform representation of all data (and code). To some amount, the same is true to Maxima, which is incidentally written in lisp.

    Note however, that this is not neccessarily true for all symbolic algebra systems. Maple, for example, has various types of compound expressions, including wierd double arrays for sums and products.

Re: Representing all data as Lists (Perl7?)
by tomazos (Deacon) on Sep 22, 2005 at 19:20 UTC
    Having written a perl poem Pound New to Perl 9.1 about a data type that combines scalars, lists and hashes into one unified type - and having read the hundred article you quoted, let me just say this...

    The author of that article doesn't have much of a clue about efficiency.

    Some algorithms scale well, some don't. The ones that don't can scale really, really badly. So much so that even a speed increase of 2^30 isn't going to help.

    Failing some fundamental shift in the nature of reality, thinking about O(f(n)) will always be important - even if Moores law continues until the end of time.

    This kind of shallow technobabble irks me. As a computer scientist you have to read things from a position of skepticism. There are good reasons for this.

    -Andrew.


    Andrew Tomazos  |  andrew@tomazos.com  |  www.tomazos.com

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://493478]
Approved by jdporter
Front-paged by jdporter
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-04-23 09:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found