Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

Re^14: World's shortest intro to function programming

by kelan (Deacon)
on Jun 21, 2005 at 19:45 UTC ( #468798=note: print w/replies, xml ) Need Help??

in reply to Re^13: World's shortest intro to function programming
in thread Thread on Joel on software forum : "I hate Perl programmers."

So, how about dealing with the memory issue? I thought the beauty of Haskell was that it was non-strict or lazy. That it dealt with infinite lists. Then why does getContents insist on loading the whole darn file?

I was curious about why this happened, so I tried my own variations to see if I could overcome it. Nothing I tried worked; Hugs ran out of memory with a large file every time. So I did some more searching online and through the Haskell mailing list. There's a thread that talks about the 'wc' program, and gives the reasons why it uses so much memory.

In essence, it's not that Haskell is not being lazy enough and reading in the entire file. The problem is that it's being too lazy, and not evaluating the '+1's in the recursive function call. So it keeps all these '+1's around in memory, waiting to be evaluated until it hits the base case, but you run out of memory before it gets there.

The solution they came up with on the mailing list was to create a data type to hold the stats, and put strictness markers on the data type so that the '+1's would get evaluated immediately. Below is your example converted to that style. The strictness markers are the exclamation points before the parametric types in the data constructor Stats. They tell Haskell to evaluate whatever value is about to be put into that slot in the Stats value being created, instead of keeping the "thunks" around to evaluate later. This version does run in constant memory space with a huge file.

data Stats = Stats !Int !Int !Int wcf :: Stats -> String -> Stats wcf stats [] = stats wcf (Stats cc w lc) (' ' : xs) = wcf (Stats (cc+1) (w+1) (lc ) ) xs wcf (Stats cc w lc) ('\t' : xs) = wcf (Stats (cc+1) (w+1) (lc ) ) xs wcf (Stats cc w lc) ('\n' : xs) = wcf (Stats (cc+1) (w+1) (lc+1) ) xs wcf (Stats cc w lc) ( x : xs) = wcf (Stats (cc+1) (w ) (lc ) ) xs wc :: IO() wc = do putStr ( "Filename: " ) name <- getLine contents <- readFile name let (Stats cc w lc) = wcf (Stats 0 0 0) contents putStrLn ( "The file " ++ name ++ " has " ) putStrLn ( show cc ++ " chars " ) putStrLn ( show w ++ " words " ) putStrLn ( show lc ++ " lines." ) main = wc

Replies are listed 'Best First'.
Re^15: World's shortest intro to function programming
by BrowserUk (Patriarch) on Jun 21, 2005 at 20:11 UTC

    Thankyou! You have no idea how much that will help me with my chosen 'first program'.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://468798]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (5)
As of 2022-10-05 08:34 GMT
Find Nodes?
    Voting Booth?
    My preferred way to holiday/vacation is:

    Results (21 votes). Check out past polls.