PerlMonks  

Re: Re: Re: Sorting URLs on domain/host: sortkeys generation

by tachyon (Chancellor)
on Mar 30, 2003 at 09:40 UTC


in reply to Re: Re: Sorting URLs on domain/host: sortkeys generation
in thread Sorting URLs on domain/host: sortkeys generation

Actually, it does not matter that there are \w\.\w... sequences: since . sorts before any \w character, you still get the desired result. The http:// prefix is also immaterial, provided all entries either have it or don't. cmp sorting does not stop at the first non-word character; it simply compares the strings character by character in ASCII order.

```perl
print "$_\n" for sort qw(
    http://.
    http://www.google.com
    http://www.google.co.uk
    http://au.google.com
    http://au.goo.com
    http://au.goop.com
);

__DATA__
http://.
http://au.goo.com
http://au.google.com
http://au.goop.com
http://www.google.co.uk
http://www.google.com
```

This looks appropriately sorted to me. The IP code will capture the domain (or IP) in $1 regardless, so you can easily modify it; but as this shows, you don't really need to unless you want to trim off the ftp://, http://, or https:// part and thus lump those into one group. The only other modification you can make to the domain name is to chop off a leading www. (trying to guess other subdomains is a hopeless task). Otherwise the default cmp should work fine. Perhaps you could post an example where it does not?
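A minimal sketch of the two optional tweaks just mentioned: drop the scheme and a leading www. when building the sort key, then leave everything else to plain cmp. The helper name scheme_free_key and the exact regexes are assumptions for illustration; this is not the "IP code" from earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build a sort key by dropping the scheme
# (ftp://, http://, https://) and a leading "www." so those variants
# group together; the rest is compared by plain cmp (ASCII order).
sub scheme_free_key {
    my ($url) = @_;
    ( my $key = $url ) =~ s{^(?:ftp|https?)://}{}i;
    $key =~ s{^www\.}{}i;
    return $key;
}

my @urls = qw(
    https://www.google.com/stuff
    ftp://au.goo.com/pub
    http://au.google.com/
    http://google.com/other
);

# Schwartzian transform: compute each key once, sort on it, keep the URL.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ scheme_free_key($_), $_ ] } @urls;

print "$_\n" for @sorted;
```

With the scheme and www. stripped, https://www.google.com/stuff lands next to http://google.com/other instead of at the end of the list.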

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Re: Re: Sorting URLs on domain/host: sortkeys generation
by dakkar (Hermit) on Mar 30, 2003 at 10:06 UTC

    I'm quite sure the OP wanted something like:

    http://.
    http://au.goo.com
    http://au.google.com
    http://www.google.com
    http://au.goop.com
    http://www.google.co.uk

    That is, sorted by TLD, then by second-level domain, then third-level, and so on. At least, that's what his code does (notice the reverse).
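A minimal sketch of that interpretation: split the host on dots, reverse the labels, and compare label by label. The helpers host_labels and by_labels are hypothetical, and the host-extraction regex assumes every entry looks like scheme://host/...; this is not the OP's code.

```perl
use strict;
use warnings;

# Hypothetical helper: the host's labels, right to left (TLD first).
# Assumes the host is whatever follows "scheme://" up to the next "/".
sub host_labels {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/]*)}i;
    $host = '' unless defined $host;
    # split drops trailing empty fields, so "." yields no labels at all
    return [ reverse split /\./, lc $host ];
}

# Compare two label lists one label at a time with cmp;
# if one is a prefix of the other, the shorter (parent) domain wins.
sub by_labels {
    my ( $x, $y ) = @_;
    for my $i ( 0 .. ( @$x < @$y ? $#$x : $#$y ) ) {
        my $c = $x->[$i] cmp $y->[$i];
        return $c if $c;
    }
    return @$x <=> @$y;
}

my @urls = qw(
    http://.
    http://www.google.com
    http://www.google.co.uk
    http://au.google.com
    http://au.goo.com
    http://au.goop.com
);

# Schwartzian transform: split each host once, sort on the label lists.
my @sorted = map  { $_->[1] }
             sort { by_labels( $a->[0], $b->[0] ) }
             map  { [ host_labels($_), $_ ] } @urls;

print "$_\n" for @sorted;
```

Because labels are compared whole, the second-level domains order as goo, google, goop, so both google.com hosts come before au.goop.com.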

    -- 
            dakkar - Mobilis in mobile
    
Re: Re: Re: Re: Sorting URLs on domain/host: sortkeys generation
by parv (Parson) on Mar 30, 2003 at 11:31 UTC
    http://aset.its.psu.edu/announcements/newsgroup_changes.html
    http://aset.its.psu.edu/unix_group/
    http://aset.its.psu.edu/unix_group/unixaccounts.html
    http://aset.psu.edu/ait/
    http://aset.psu.edu/ait/filesys.html
    http://aset.psu.edu/unix_group/lsfaqs.html
    http://aset.psu.edu/unix_group/quickunix.html
    http://cac.psu.edu/
    http://cac.psu.edu/publish/htpasswd/alternate.html
    http://clc.its.psu.edu/
    http://clc.its.psu.edu/Labs/
    http://clc.its.psu.edu/Labs/Mac/
    http://clc.its.psu.edu/labs/Mac/software/all.aspx
    http://clc.its.psu.edu/labs/Mac/software/default.aspx
    http://css.its.psu.edu/internet/
    http://css.its.psu.edu/internet/unix.html
    http://css.its.psu.edu/news/alerts/
    http://css.its.psu.edu/news/alerts/K4notice.html
    http://its.psu.edu/
    http://its.psu.edu/computing.html
    http://its.psu.edu/learning.html
    http://search.psu.edu/query.html
    

    ...is sorted by the current algorithm (in the OP) into the following, desired order...

    http://aset.psu.edu/ait/
    http://aset.psu.edu/ait/filesys.html
    http://aset.psu.edu/unix_group/lsfaqs.html
    http://aset.psu.edu/unix_group/quickunix.html
    http://cac.psu.edu/
    http://cac.psu.edu/publish/htpasswd/alternate.html
    http://aset.its.psu.edu/announcements/newsgroup_changes.html
    http://aset.its.psu.edu/unix_group/
    http://aset.its.psu.edu/unix_group/unixaccounts.html
    http://clc.its.psu.edu/
    http://clc.its.psu.edu/Labs/
    http://clc.its.psu.edu/Labs/Mac/
    http://clc.its.psu.edu/labs/Mac/software/all.aspx
    http://clc.its.psu.edu/labs/Mac/software/default.aspx
    http://css.its.psu.edu/internet/
    http://css.its.psu.edu/internet/unix.html
    http://css.its.psu.edu/news/alerts/
    http://css.its.psu.edu/news/alerts/K4notice.html
    http://its.psu.edu/
    http://its.psu.edu/computing.html
    http://its.psu.edu/learning.html
    http://search.psu.edu/query.html
    

    ...that is, sorting is done first on the second-level domain plus TLD (psu.edu here), then on the hostname, if any, then on the remaining string, if any. (I thought I had already written that in the OP; perhaps it was not clear...)

    Lest we forget the question: is there a less verbose way (than the one in the OP) to sort the URLs on the criteria just presented?
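One possibly shorter approach, sketched under assumptions: build a single string key from the host labels in reverse order followed by the rest of the URL, then sort with one cmp. Because "/" (0x2F) sorts just after "." (0x2E), a parent host such as its.psu.edu falls after its own subdomains (aset.its.psu.edu and so on), which reproduces the order listed above. The helper name reversed_host_key is hypothetical, and the sample is a subset of the URLs in the list.

```perl
use strict;
use warnings;

# Hypothetical helper: reversed host labels (edu.psu.its.aset) followed
# by the rest of the URL. "/" sorts right after ".", so its.psu.edu/
# lands after aset.its.psu.edu/, css.its.psu.edu/, etc.
sub reversed_host_key {
    my ($url) = @_;
    my ( $host, $rest ) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/]*)(.*)}is
        or return $url;                # not scheme://host...: sort as-is
    $rest = '/' unless length $rest;   # "http://host" acts like "http://host/"
    return join( '.', reverse split /\./, lc $host ) . $rest;
}

my @urls = qw(
    http://aset.its.psu.edu/unix_group/
    http://search.psu.edu/query.html
    http://its.psu.edu/
    http://cac.psu.edu/
    http://clc.its.psu.edu/Labs/
    http://aset.psu.edu/ait/
);

# Schwartzian transform around a single cmp.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ reversed_host_key($_), $_ ] } @urls;

print "$_\n" for @sorted;
```

The trailing-"/" default matters: without it, a bare http://its.psu.edu would sort before its subdomains rather than after them.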

    (Long) Side note: FWIW, as an exercise I converted the given Schwartzian transform to a Guttman-Rosler Transform, which was around 14-16% faster (benchmarked with Perl 5.8, merge and quick sorts, FreeBSD 4.7/386). That is not much of a difference (to me, in this case, unless I am missing something).
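For comparison, a minimal Guttman-Rosler Transform sketch: prepend the key and a separator to each URL, run the built-in sort with no comparison block, then strip the key off again. The helper host_key is hypothetical and it assumes no key ever contains "\0"; this is not the converted code described above, which isn't shown.

```perl
use strict;
use warnings;

# Hypothetical key builder: host labels right to left, then the rest
# of the URL (a bare host is treated as if it ended in "/").
sub host_key {
    my ($url) = @_;
    my ( $host, $rest ) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/]*)(.*)}is
        or return $url;
    $rest = '/' unless length $rest;
    return join( '.', reverse split /\./, lc $host ) . $rest;
}

my @urls = qw(
    http://aset.its.psu.edu/unix_group/
    http://search.psu.edu/query.html
    http://its.psu.edu/
    http://cac.psu.edu/
    http://clc.its.psu.edu/Labs/
    http://aset.psu.edu/ait/
);

# Guttman-Rosler Transform:
#   1. decorate: "key\0url" (assumes "\0" never appears in a key),
#   2. sort the decorated strings with the default string sort,
#      so no Perl comparison block is called per comparison,
#   3. undecorate: keep everything after the first "\0".
my @sorted = map  { substr( $_, 1 + index( $_, "\0" ) ) }
             sort
             map  { host_key($_) . "\0" . $_ } @urls;

print "$_\n" for @sorted;
```

Since "\0" sorts below every printable character, a key that is a prefix of another still sorts first, so the decorated strings order exactly as the keys alone would.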

Re: Re: Re: Re: Sorting URLs on domain/host: sortkeys generation
by Anonymous Monk on Mar 30, 2003 at 10:48 UTC

    I beg to differ: simply sorting the URLs will put http://www.google.com/stuff nowhere near http://google.com/stuff or http://translate.google.com/stuff.
