http://qs321.pair.com?node_id=383095

chiburashka has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality.

Replies are listed 'Best First'.
Re: makeing refering faster ?
by perldeveloper (Scribe) on Aug 15, 2004 at 15:20 UTC
    I'd say from your code that you are trying to build, for every sentence Si, a list containing all sentences Si1, Si2, ..., Sik that start with any of the words belonging to Si. However, your code keeps overwriting this list with the sentences that start with the last word of each sentence -- making the code, if not incorrect, at least suspicious and inefficient. If you are trying to do what I think you are, here is how I'd do it for you:
    use strict;
    use warnings;

    my $dat0 = 'a.txt';
    open (DAT, "$dat0") or die "Could not open file `$dat0'.\n";
    my @all = <DAT>;
    close (DAT);

    my @words = ();
    my $sentences = {};    # 'word' => [ sentences that start with `word' ]
    foreach my $sentence (@all) {
        chomp ($sentence);
        push (@words, [ split (/[ \t]+/, $sentence) ]);
        my $firstWord = $words[-1]->[0];
        $sentences->{$firstWord} = [] if not exists $sentences->{$firstWord};
        push (@{$sentences->{$firstWord}}, $#words);
    }

    my @temp = ('', '');
    for (my $i = 0; $i <= $#words; $i++) {
        push (@temp, $all[$i]);
        my @referencedSentences = ();
        foreach my $j (@{$words[$i]}) {
            if (($j ne "$j") || ($j ne "v")) {    # I don't get this so I leave it intact
                if (exists $sentences->{$j}) {
                    push (@referencedSentences, $sentences->{$j});
                }
            }
        }
        push (@temp, \@referencedSentences);
    }
    print "Done.\n";
    # ...

    As you can see, I first build a hash indexed by the first word of every sentence, where the values are references to arrays holding the indices of the sentences which start with that word. Then, for every sentence, I build an array of these hash values, one for every word which happens to start any of the sentences (including the one under scrutiny). I believe this code is a better starting point for optimization -- my code ran within a second on a 3,000-line file.
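    To make that concrete, here is a small self-contained sketch of what the $sentences hash ends up holding (the three input lines are made up, and Data::Dumper is only used for inspection):

    use strict;
    use warnings;
    use Data::Dumper;

    my @all = (
        "foo bar baz",
        "bar quux",
        "foo again",
    );

    my $sentences = {};    # 'word' => [ indices of sentences that start with 'word' ]
    for my $i (0 .. $#all) {
        my ($firstWord) = split /[ \t]+/, $all[$i];
        push @{ $sentences->{$firstWord} }, $i;    # autovivifies the array ref
    }

    print Dumper($sentences);
    # prints (roughly): $VAR1 = { 'foo' => [ 0, 2 ], 'bar' => [ 1 ] };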

    A few other remarks:
    • Initialize an empty list with my @list = ();, and not @list = '', which actually creates a one-element list containing an empty string (see the short example after this list).
    • Always use my, always stick to warnings and strict.
    • Avoid recalculating the same values more than a couple of times by caching them (like split in your code).
    • Use descriptive names and comments, especially when asking for assistance :)
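    To illustrate the first and third remarks, here is a tiny sketch (the example line is made up):

    use strict;
    use warnings;

    my @empty     = ();      # really empty: scalar(@empty) == 0
    my @one_empty = ('');    # one element, the empty string: scalar(@one_empty) == 1
    print scalar(@empty), " vs ", scalar(@one_empty), "\n";    # prints "0 vs 1"

    # Cache the result of split instead of re-splitting the same line repeatedly:
    my $line  = "the quick brown fox";
    my @words = split /[ \t]+/, $line;    # split once ...
    print scalar(@words), " words\n";     # ... then reuse the cached @words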
    A reply falls below the community's threshold of quality.
Re: makeing refering faster?
by Zero_Flop (Pilgrim) on Aug 15, 2004 at 17:18 UTC
    chiburashka-

    You really need to get a good book on Perl. Start on page one and work your way through.
    Posting bad code over and over again is not going to get you anywhere!
    There are numerous problems with this code: no strict, no warnings.

    It all boils down to:
    $dat0 = 'a.txt';
    open( DAT, "$dat0" ) || die ("Could not open file!");
    @all = <DAT>;

    # I'm not even sure what perl will do when you open the same file handle
    # twice without closing it; I would assume it would close the first for you,
    # but it's just bad.
    open( DAT, ">$dat0" ) || die;

    # You never write back to @all so you are doing nothing.
    print DAT @all;
    close(DAT);
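    For comparison, a minimal sketch (not your code; the filename is just carried over from the snippet above) of the usual read/modify/write pattern, with lexical filehandles and three-argument open:

    use strict;
    use warnings;

    my $dat0 = 'a.txt';

    open my $in, '<', $dat0 or die "Could not open '$dat0' for reading: $!";
    my @all = <$in>;
    close $in;

    # ... actually change @all here, otherwise writing it back achieves nothing ...

    open my $out, '>', $dat0 or die "Could not open '$dat0' for writing: $!";
    print {$out} @all;
    close $out;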
    If you want to learn, we can help you, but you have to learn to walk before you run.
    Zero

    janitored by ybiC: Moved from reaped parent thread into this'un, for better site searching

    A reply falls below the community's threshold of quality.
Re: makeing refering faster ?
by wfsp (Abbot) on Aug 15, 2004 at 14:24 UTC
    You should consider declaring all your variables and adding:
    use strict;
    use warnings;
    I suspect there may be some typos.

    If you did that, it would be easier to help.

Re: makeing refering faster ?
by CountZero (Bishop) on Aug 15, 2004 at 19:34 UTC
    Consider not using $1 as an ordinary variable. It is a special variable used in regular expressions and once you start using regular expressions, the (bad) habit of (ab)using these special variables will turn around and bite you.
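    A tiny sketch (the match is made up) of how that bites:

    use strict;
    use warnings;

    # $1 is read-only outside a regex, so you cannot even assign to it:
    # $1 = 'my value';       # dies: Modification of a read-only value attempted

    "foo123" =~ /(\d+)/;     # any capturing match silently overwrites $1
    print "$1\n";            # prints "123" -- whatever you expected in $1 is gone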

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      He's not using $1 (one), he's using $l (lower case L). Granted, it looks very confusing. Confusing enough to discourage its use.
Re: makeing refering faster ?
by graff (Chancellor) on Aug 17, 2004 at 08:23 UTC
    It's a good thing you provided this information:
    ps : this script is supposed to get all the lines from a file and refer each word to the sentence that starts with that word (and, there aren't 2 sentences that start identically).
    Without that, there'd be no hope of helping with the problem. But even with that, there's still not quite enough to go on. (Looks like perldeveloper made a lucky guess, but I confess that I am still confused.)

    Does the input data file really contain exactly one "sentence" per line? Are you certain that the "words" in each sentence are always separated by exactly a single space character? Are the words in "mixed case", and do they include punctuation marks? (And does this have an effect on what you are trying to do?) Why should it matter if a sentence contains a "word" that consists of the single letter "v"?

    Let's suppose a particular word (e.g. "bar") occurs at the beginning of one sentence (e.g. sentence #23), and also occurs in the middle or at the end of 4 other sentences (e.g. #5, #12, #47, #69). What do you want to accomplish with regard to this word? Locate just the one sentence that begins with "bar"? Locate just the other four sentences that contain "bar"? Locate all five sentences (and identify the one that begins with "bar")? What do you want to do with words that only occur in the middle or at the end of sentences but never at the beginning of any sentence? Ignore them?

    How you answer those questions will determine how you should read through the sentences and words, what sort of data structure you should create from the input data, and how you would use that data structure after you've built it.
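    To make those questions concrete, here is a hedged sketch of one possible index (the indices are taken from the "bar" example above; the structure itself is purely illustrative) that could answer all of them:

    use strict;
    use warnings;

    my %index = (
        bar => {
            starts_sentence => [ 23 ],                   # the one sentence beginning with "bar"
            appears_in      => [ 5, 12, 23, 47, 69 ],    # every sentence containing "bar"
        },
        # ... one entry per distinct word ...
    );

    # Which sentence starts with "bar"?
    my @starters = @{ $index{bar}{starts_sentence} };

    # Which sentences contain "bar" but do not start with it?
    my %is_starter = map { $_ => 1 } @starters;
    my @others = grep { !$is_starter{$_} } @{ $index{bar}{appears_in} };

    print "starts: @starters; contains only: @others\n";    # starts: 23; contains only: 5 12 47 69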

    As for the code you posted at the start of this thread, the reason it takes so long for more sentences is the nesting of your "for" loops:

    foreach sentence in the file {
        ...
        foreach word in the sentence {
            ...
            foreach sentence in the file {
                ...
                # given n sentences with an average of m words each,
                # this block has to execute n*m*n times
            }
        }
    }
    As you have learned from experience, this sort of approach "does not scale well" to large numbers of sentences. But to work out a good approach, you need to clarify your goals. You seem to be content with perldeveloper's solution (assuming his additional reply makes sense to you), but it's not clear to me that it is the best approach, or that it does what you really want -- mostly because you haven't provided a clear description of what you really want.
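    For what it's worth, here is a hedged sketch (with a made-up three-line input) of how a first-word hash, built once, removes that innermost loop -- which is essentially what perldeveloper's code does:

    use strict;
    use warnings;

    my @sentences = ("foo bar", "bar baz", "baz foo qux");    # made-up input

    # One pass: map each sentence's first word to its index.
    my %starts_with;
    for my $i (0 .. $#sentences) {
        my ($first) = split /[ \t]+/, $sentences[$i];
        $starts_with{$first} = $i;    # the OP says no two sentences start identically
    }

    # n sentences times m words each, with no third loop over all sentences:
    for my $i (0 .. $#sentences) {
        for my $word (split /[ \t]+/, $sentences[$i]) {
            next unless exists $starts_with{$word};
            print "sentence $i refers to sentence $starts_with{$word} (via '$word')\n";
        }
    }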