comment on

For bulk processing like this, I'd sidestep DBI and use command lines tools.

Create a file called: getnames.sql containing:

select UserName from table where UserName <> '' and UserName not like 
+'deleted%'
[download]

And two perl scripts. 1) partNames.pl:

#! perl -w
use strict;

my %fhs;

while( <> ) {
    my $char =  substr $_, 0, 1;
    if( not exists $fhs{ $char } ) {
        open $fhs{ $char }, '>', "names.$char" or die $!;
    }
    print { $fhs{ $char } } $_;
}
[download]

And 2) uniqNames. pl:

#! perl -w
use strict;

for my $fname ( glob 'names.*' ) {
    my %uniq;
    open my $fh, '<', $fname or die $!;
    while( <$fh> ) {
        print unless exists $uniq{ $_ };
        $uniq{ $_ } =  1;
    }
}
[download]

Then run the following pair of commands:

mysql --quick < getnames.sql | perl partNames.pl

perl uniqNames.pl > uniqedNames.txt
[download]

The first line dumps the names and pipes them to the perl script that then writes them to a set of files (called names.?, where ? is the first character of the name).

The second script reads those files one at a time and writes the unique names to stdout where they are directed into a file called uniqedNames.txt.

The whole (untested) process should take less time than sorting the 34 million names -- inside the DB or with your system sort utility. I'd hazard a guess that it would take less than 20 minutes.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Fetch Mysql huge database by BrowserUk
in thread Fetch Mysql huge database by ym_chaitu

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Come for the quick hacks, stay for the epiphanies.
	PerlMonks