by cajun (Chaplain)
on Jun 16, 2005 at 03:44 UTC (#467158, perlquestion)

cajun has asked for the wisdom of the Perl Monks concerning the following question:

I have several bookmark.xml files that have been saved over the years. I'm looking for a way to extract the bookmarks from these files, nuke the duplicates and end up with one 'master' bookmark file.

The only related node I've located so far is Re: Uses for Perl & XML. I've searched CPAN for bookmark-type modules as well as XML modules. There are quite a few XML modules. I've never dealt with XML before, so I have no working knowledge of any of them.

Any suggestions about which module you feel might be best suited for this, or pointers in the right direction, would be most helpful.


Update: The bookmarks all come from Konqueror and, later, Firefox. My assumption is that they are all XML for two reasons: they all have xml filename extensions, and if I open them with a browser, it says at the top of the page "This XML file does not appear to have any style information associated with it. The document tree is shown below." Hint - I know nothing about XML.

Replies are listed 'Best First'.
Re: Bookmark.xml
by dorward (Curate) on Jun 16, 2005 at 07:05 UTC

    If I look at my Konqueror bookmarks.xml file, I see:

    <!DOCTYPE xbel>

    One quick Google later and I've found documentation for the XBEL bookmark format. A scan of CPAN and I've found XML::XBEL.

    On the other hand, if I poke at my Firefox bookmarks.html file, I see:

    <!DOCTYPE NETSCAPE-Bookmark-file-1>

    So Netscape::Bookmarks is probably better for reading and writing in this format.

Re: Bookmark.xml
by Cody Pendant (Prior) on Jun 16, 2005 at 04:47 UTC
    Where do they come from? What's the format? Are they all the same? The best way to get started would be to fire up XML::Simple and see what Data::Dumper shows you as the output.

    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
Re: Bookmark.xml
by BaldPenguin (Friar) on Jun 16, 2005 at 16:05 UTC
    You could load the data using XML::Simple (or your favorite XML parser), which will load all the data into hashes as it understands it. Then just process the hash and write it out to a new XML file in whatever format you want. That works.

    I could also see using XSLT to convert one of the two formats into the other and going from there. That would only be advantageous if you were going to do this often enough to justify the time.

    If you are just doing this once, the XML::Simple route would probably work better.
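A minimal sketch of the XML::Simple route described above, not a finished tool. It assumes an XBEL-like layout where each bookmark carries an href attribute; the sample XML and the helper name collect_hrefs are made up for illustration, and real files would be passed to XMLin by filename instead of as a string.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);
use Data::Dumper;

# Hypothetical XBEL-style sample standing in for a real bookmarks file.
my $xml = q{<xbel>
  <folder>
    <title>Perl</title>
    <bookmark href="http://www.perlmonks.org/"><title>PerlMonks</title></bookmark>
    <bookmark href="http://search.cpan.org/"><title>CPAN</title></bookmark>
  </folder>
  <bookmark href="http://www.perlmonks.org/"><title>Monks again</title></bookmark>
</xbel>};

# ForceArray => 1 and KeyAttr => [] keep the structure predictable:
# every child element becomes an array ref and nothing is folded by name.
my $tree = XMLin($xml, ForceArray => 1, KeyAttr => []);

# Walk the nested hashes and arrays, collecting every href attribute.
sub collect_hrefs {
    my ($node, $found) = @_;
    if (ref $node eq 'HASH') {
        push @$found, $node->{href}
            if exists $node->{href} && !ref $node->{href};
        collect_hrefs($_, $found) for values %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        collect_hrefs($_, $found) for @$node;
    }
    return $found;
}

# Hash order is not stable, so sort before printing.
my @hrefs = sort @{ collect_hrefs($tree, []) };

print Dumper($tree);        # shows how XML::Simple understood the data
print "$_\n" for @hrefs;
```

The Dumper output is the useful part on a first run: it shows where the URLs actually sit in your own files, which may differ from this sample.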

    FWIW, Konqueror can export its bookmark files to Netscape/Mozilla format as well.


    Edit by castaway: Closed small tag in signature

Re: Bookmark.xml
by duelafn (Parson) on Jun 16, 2005 at 18:19 UTC
Re: Bookmark.xml
by chanio (Priest) on Jun 17, 2005 at 03:58 UTC
    If you are asking for a good approach to aim for with this problem, I would recommend the following:
    • Newer bookmark parsers (like Galeon's) load every address as a hash key and remember its branch position in each tree. Then, when building the final, unique tree, they place each address on its most popular branch and put links to that address on the other branches where it was previously found.
    • Treat XML files like any other text files, with the advantage that you can find similar tags in different files. Make use of that.
    You might want to share your final masterpiece. In any case, consider it practice for future work with XML files.
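The first point above could be sketched roughly like this, using only core Perl. The records, branch names and the helpers count_branches and place_url are all hypothetical, and this is one reading of the idea rather than how any particular browser implements it.

```perl
use strict;
use warnings;

# Each record is [ url, branch ]; made-up data standing in for
# bookmarks gathered from several files.
my @records = (
    [ 'http://www.perlmonks.org/', 'Perl'   ],
    [ 'http://www.perlmonks.org/', 'Perl'   ],
    [ 'http://www.perlmonks.org/', 'Forums' ],
    [ 'http://www.kde.org/',       'Linux'  ],
);

# Count, per URL, how often it appears under each branch.
sub count_branches {
    my %count;
    $count{ $_->[0] }{ $_->[1] }++ for @_;
    return \%count;
}

# Pick the most frequent branch as the home; the rest would get links.
# Ties are broken alphabetically so the result is repeatable.
sub place_url {
    my ($branches) = @_;
    my ($home, @others) = sort {
        $branches->{$b} <=> $branches->{$a} || $a cmp $b
    } keys %$branches;
    return ($home, \@others);
}

my $count = count_branches(@records);
for my $url (sort keys %$count) {
    my ($home, $others) = place_url($count->{$url});
    print "$url lives in '$home'";
    print ", linked from: @$others" if @$others;
    print "\n";
}
```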

    .{\('v')/}   C H E E R   U P !
     _`(___)' ___a_l_b_e_r_t_o_________
    Wherever I lay my KNOPPIX disk, a new FREE LINUX nation could be established.
Re: Bookmark.xml
by cajun (Chaplain) on Jun 17, 2005 at 06:09 UTC
    Thanks to all for your comments / suggestions. I've learned a few things about XML during the process. Still much more to learn about it though in my spare time. As this is likely a one-time shot, I'm not sure of the value of getting very deeply involved in this process.

    For now, using XML::Simple and Data::Dumper, I have parsed the six bookmark files into one large file filled with URLs. I then took this file, put it in a hash, thereby deleting the duplicates, then wrote it out to a file of 'unique' URLs.
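For anyone following along, the hash step can be sketched like this. The sample URLs and the name unique_urls are made up for illustration, and the comparison is an exact string match, so two spellings of the same site (with and without a trailing slash, say) would both survive.

```perl
use strict;
use warnings;

# Return the URLs with duplicates dropped, keeping first-seen order.
sub unique_urls {
    my @urls = @_;
    my %seen;
    return grep { !$seen{$_}++ } @urls;
}

# Hypothetical sample data standing in for the URLs pulled from the files.
my @all = (
    'http://www.perlmonks.org/',
    'http://search.cpan.org/',
    'http://www.perlmonks.org/',
    'http://www.kde.org/',
    'http://search.cpan.org/',
);

my @unique = unique_urls(@all);
print "$_\n" for @unique;
```

Using grep with a %seen hash keeps the original order, which a plain `keys %hash` would not.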

    My plan now is to convert this file into HTML, then at my leisure, visit each of the sites (if they still exist) and bookmark them. Hmmm.... Maybe LWP::Simple can sort out the dead ones without me visiting each one. Another quest....
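A rough sketch of that LWP::Simple idea, assuming a HEAD request is a good enough test of whether a site is still there (some servers refuse HEAD, so a "dead" result may deserve a second look). The names head_ok and sort_live_dead are hypothetical, and the demonstration uses a stub check with made-up URLs so it runs without touching the network.

```perl
use strict;
use warnings;

# Default check: an HTTP HEAD request via LWP::Simple, loaded only when needed.
sub head_ok {
    my ($url) = @_;
    require LWP::Simple;
    return LWP::Simple::head($url) ? 1 : 0;
}

# Split URLs into live and dead lists. The optional second argument is a
# code ref used instead of head_ok, so the logic can be tried offline.
sub sort_live_dead {
    my ($urls, $check) = @_;
    $check ||= \&head_ok;
    my (@live, @dead);
    for my $url (@$urls) {
        if ($check->($url)) { push @live, $url }
        else                { push @dead, $url }
    }
    return (\@live, \@dead);
}

# Offline demonstration with a stub check and hypothetical URLs.
my %pretend_alive = ('http://www.perlmonks.org/' => 1);
my ($live, $dead) = sort_live_dead(
    [ 'http://www.perlmonks.org/', 'http://example.invalid/' ],
    sub { $pretend_alive{ $_[0] } },
);
print "live: @$live\n";
print "dead: @$dead\n";
```

Calling sort_live_dead(\@urls) with no second argument would use the real HEAD check.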

    Thanks again to all,
