PerlMonks  

split() Problem

by Omukawa (Acolyte)
on Jul 21, 2008 at 21:49 UTC ( [id://699163]=perlquestion )

Omukawa has asked for the wisdom of the Perl Monks concerning the following question:

I think I don't quite understand how split() works. I'm currently working on a script that reads a database full of raw dictionary entries, sorts them, and converts them into XeTeX format. I use hashes of hashes to store the entries.

The following part of the script should check whether there are two words separated by ";", split them, and put them in an array @engdict together with all of the information that belongs to them.

for $_ (%edict) {
    # for entries with more than one keyword
    if ($edict{$count}{english} =~ /;/) {
        @foo = split (/; /, $edict{$count}{english});
        for $_ (@foo) {
            $engdict[$no] = "\\textbf{".$foo[$_]."}";
            print $foo[1];
            $engdict[$no] = $engdict[$no]."\\textmd{$dict{$count}{puma}}";
            $engdict[$no] = $engdict[$no]."\\textit{$dict{$count}{ps}}";
            $no++;
        };
        delete $edict{$count};
    } else {
        $count++;
    };
};

Replies are listed 'Best First'.
Re: split() Problem
by Joost (Canon) on Jul 21, 2008 at 22:01 UTC
    You're not asking anything, and there are so many red flags and other things that are probably mistakes here I don't know what the hell this code is supposed to do. Just the first line, then:

    for $_ (%edict) { # for entries with more than one keyword
    That iterates over the hash keys AND values, storing each of them in $_

    in other words, given the hash %edict = ( a => 1, b => 2); $_ would contain 'a', then 1, then 'b' and then 2.
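
    A minimal sketch of that behaviour, using the same two-entry hash as above. The variable names @flat and @keys are only for illustration and are not from the original script:

```perl
use strict;
use warnings;

# The two-entry hash from the example above.
my %edict = ( a => 1, b => 2 );

# A hash in list context flattens to key, value, key, value, ...
my @flat = %edict;
print scalar(@flat), " items\n";    # prints "4 items": two keys and two values

# To visit each entry exactly once, loop over the keys instead.
my @keys = sort keys %edict;
for my $key (@keys) {
    print "$key => $edict{$key}\n";
}
```

    Looping over keys %edict (sorted here only to get a stable order) gives one iteration per entry, which is probably what the outer loop was meant to do.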

    Also you're splitting whatever it is you're splitting on "; " (that's semicolon followed by a space), instead of a semicolon as you claim you want.

    Update: as a first attempt at clarifying this, please explain what the top-level for loop is supposed to do and how that loop's body is affected by the loop (and especially, what you're doing with $_).

Re: split() Problem
by moritz (Cardinal) on Jul 21, 2008 at 21:56 UTC
    The following part of the script should check whether there are two words separated by ";", split them,
    And then
    @foo = split (/; /, $edict{$count}{english});

    Note the extra space in the regex - it doesn't do what you described it should. If there's no space after the semicolon, the pattern will not match and split returns the whole string as a single element.

    Anyway, your example uses quite a few nested data structures of which we know nothing. We also don't know what the string looks like that you want to split. If you want more or better help, give us some data (and a simpler piece of code that does the same thing). See also How (Not) To Ask A Question.
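
    To illustrate the difference, here is a small sketch. The sample strings are assumptions, since the real data was not posted:

```perl
use strict;
use warnings;

# Hypothetical sample strings; the real dictionary data was not shown.
my $with_space    = "office; court";
my $without_space = "office;court";

my @a = split /; /,   $with_space;      # ("office", "court")
my @b = split /; /,   $without_space;   # ("office;court"): the pattern never matches
my @c = split /;\s*/, $without_space;   # ("office", "court"): whitespace is optional

print scalar(@a), " ", scalar(@b), " ", scalar(@c), "\n";   # prints "2 1 2"
```

    If the data might or might not have whitespace after the semicolon, a pattern like /;\s*/ handles both cases.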

Re: split() Problem
by CountZero (Bishop) on Jul 21, 2008 at 22:34 UTC
    It looks as if $edict{$count} uses a counter as its key. Perhaps it is more logical to use an array rather than a hash here?

    When you do for $_ (@foo), $_ will sequentially contain the content of each element of the array @foo. Next you do $foo[$_], in other words you index into the array with the value of the element. This is almost certainly wrong, unless you are trying to implement some sort of linked list. I think you just want to use the content of each element and that is already in $_.
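
    A short sketch of the two correct ways to walk the array, assuming @foo holds the words produced by the split (the sample values and the names @direct and @by_index are made up for illustration):

```perl
use strict;
use warnings;

my @foo = ("office", "court");   # hypothetical result of the split

# $_ holds the element itself, so use it directly.
my @direct;
for (@foo) {
    push @direct, $_;
}

# If the indexes are really needed, loop over them explicitly.
my @by_index;
for my $i (0 .. $#foo) {
    push @by_index, $foo[$i];
}

print "@direct | @by_index\n";   # prints "office court | office court"
```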

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: split() Problem
by Omukawa (Acolyte) on Jul 22, 2008 at 09:51 UTC

    Sorry for the obscurities. As I said, I use hashes of hashes to store dictionary entries. So $edict{0} contains the first entry (0 is the key to the embedded hash) and $edict{0}{english} contains the English translation. That's mostly one word like "house" etc. but sometimes there are entries like "office; court; ....". That's why the split looks for "; ", since there is always a whitespace after the semicolon.

    What I'm trying to do is to split the entry with more than one English word, put the words in the array @foo and then take the first word from @foo and put it in another array @engdict with the other information that belongs to it, so that I will have two separate scalars in @engdict: one is office : xyz, and the other is court : xyz.

    So that's what the code must do:

    1. Find the entry whose English translation contains more than one word, separated by "; ".
    2. Put them separately in the array @foo.
    3. Take the first word from @foo and put it in $engdict[$no] with the other information.
    4. Add one to $no.
    5. Take the second word and put it in $engdict[$no+1], and so on.
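
    The steps above can be sketched roughly as follows. This is only one possible shape, not the original script: the sample entries and their values are made up, the field names english, puma and ps are taken from the posted code, and push replaces the manual $no counter.

```perl
use strict;
use warnings;

# Hypothetical entries; field names follow the posted code, values are invented.
my %edict = (
    0 => { english => "house",         puma => "xyz", ps => "n." },
    1 => { english => "office; court", puma => "abc", ps => "n." },
);

my @engdict;
for my $key (sort { $a <=> $b } keys %edict) {
    my $entry = $edict{$key};
    # One output item per English word; a single word yields a one-element list.
    for my $word (split /; /, $entry->{english}) {
        push @engdict,
              "\\textbf{$word}"
            . "\\textmd{$entry->{puma}}"
            . "\\textit{$entry->{ps}}";
    }
}

print "$_\n" for @engdict;
```

    Because split returns a one-element list when the pattern does not match, single-word entries need no separate branch in this version.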

    The problem is this is not working as I thought. The $foo[$_] is always empty so I think I'm doing something wrong with the split() command but I can't see the mistake. So here is my question: Can you tell me the mistake here?

    Thank you

      I think your problem is that you do too much at once without testing the result from the intermediate steps.

      Most of my scripts start with

      # always use those:
      use strict;
      use warnings;
      use Data::Dumper;

      When you suspect that split isn't working the way you think it is, try adding the line print Dumper \@foo; immediately after the split, and check whether the results are what you expected them to be.
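
      For instance, a minimal check along those lines could look like this, with a made-up sample value standing in for $edict{$count}{english}:

```perl
use strict;
use warnings;
use Data::Dumper;

my $english = "office; court";   # hypothetical sample value
my @foo = split /; /, $english;

# Dump the array right after the split to see exactly what it produced.
print Dumper \@foo;
```

      If the dump shows the expected words, the split is fine and the problem lies in the code that follows it.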

      The $foo[$_] is always empty so

      for iterates over the values in the array, not the indexes. So instead of $foo[$_] just write $_.

        Replacing $foo[$_] with $_ solved the problem. Thanks a lot. Also, thank you for the advice about Data::Dumper.
