Combining Excel Parser with Google Scholar Scraper

by ochez (Initiate)
on Apr 14, 2009 at 14:57 UTC ( [id://757408]=perlquestion )

ochez has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

For a task at work, I am trying to write a program that 1.) parses a list of paper titles from an Excel spreadsheet, 2.) scrapes Google Scholar to get the "cited by:" number for each title, and 3.) puts these numbers in the spreadsheet column next to the titles.


Basically, I am trying to combine a simple spreadsheet parser that I wrote (a nested for loop) with fetch.pl, found at this site: http://davide.eynard.it/cgi-bin/perlcode.pl?file=scholar.pl


It all seems simple enough, but I just can't get them to work together. At this point I'm just trying to have the fetch.pl program return the "cited by:" numbers, but I also found a module that would probably be more useful in my case: Spreadsheet::ParseExcel::SaveParser


If anyone could help me out real quick, I'd be very grateful. I feel like an experienced programmer could do this in five minutes if they wanted.


My Excel Parser looks like this:



#!/usr/bin/perl -w
use strict;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'Microsoft Excel';

$Win32::OLE::Warn = 3; # die on errors...

# get already active Excel application or open new
my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
    || Win32::OLE->new('Excel.Application', 'Quit');

# open Excel file
my $Book = $Excel->Workbooks->Open("C:/Documents and Settings/rto5u/My Documents/CV.xls");

# select worksheet number 1 (you can also select a worksheet by name)
my $Sheet = $Book->Worksheets(1);

foreach my $row (2..4)
{
    foreach my $col (1)
    {
        # skip empty cells
        next unless defined $Sheet->Cells($row,$col)->{'Value'};

        # print out the contents of a cell
        print "At ($row, $col) the value is: \n",
            $Sheet->Cells($row,$col)->{'Value'};
        print "\n";
    }
}
print "\n";

# clean up after ourselves
$Book->Close;

The spreadsheet just has titles of papers in the first column...I'd like to eventually have the program write in "cited by" results in the column next to it.

My apologies for the bad formatting, I'm a little rusty with my HTML

Replies are listed 'Best First'.
Re: Combining Excel Parser with Google Scholar Scraper
by kennethk (Abbot) on Apr 14, 2009 at 15:22 UTC
    First, please read Writeup Formatting Tips and/or Markup in the Monastery for how to properly format postings in the monastery. In particular, wrapping your code in <code> tags will maintain proper formatting and allow for easy download by those folks who wish to help you.

    What exactly are you looking for? Is there some way in which the above code does not work? You seem to have all the elements necessary to meet your design goals, so where is the disconnect? You clearly state you are having difficulties, but what those are is unclear. See How (Not) To Ask A Question. Perhaps you could simplify by first modifying each script to perform its task with a fixed input, and then merging the two?

      The problem is that I have a Google Scholar scraper, and I have the Excel parser, but for the life of me, I am having trouble combining the two. I want the scraper to go through the paper titles in the spreadsheet one by one and return the "cited by:" results for each paper.
        Again, where is the disconnect? What have you tried to do to combine the two? Is the interpreter outputting errors? Is the issue just that you are not familiar with Perl syntax? Post what you've done (even if it is just pseudocode) and we can work through it together. If you expect a monk to take two scripts you found on the internet and merge them for you, then the Your Work section of How (Not) To Ask A Question is very relevant.
Re: Combining Excel Parser with Google Scholar Scraper
by Nkuvu (Priest) on Apr 14, 2009 at 18:41 UTC

    One thing I'm seeing in your Excel-handling Perl is that you never write to the spreadsheet, nor do you save the Excel file. When using Win32::OLE, opening a file opens it for read/write, so you don't have to do anything differently there. The other thing I'm not sure about is your inner foreach loop -- why is it there at all if you only have one column? Not wrong per se, just curious.

    Saving to Excel is really simple. Using kennethk's code (in this node), just add one line (with some surrounding lines to show location):

    if ($citedby) {
        print "\"$title\"\nCited by: $citedby\n\n";
        # Assuming you want to save the "cited" info in the
        # column just to the right of the current cell, adjust
        # as desired.
        $Sheet->Cells($row,$col+1)->{Value} = $citedby;
    }

    Then, of course, you'll want to call $Book->Save(); before you close the workbook.
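
    Putting the pieces together, here is a minimal sketch of the read, look up, write-back loop. It is not the scraper's actual code: cited_by_count and fill_citations are hypothetical helper names, $lookup stands in for whatever sub wraps the scraper and returns the results-page HTML for a title, and the "Cited by N" pattern is an assumption about what that page contains.

```perl
use strict;
use warnings;

# Pull the number out of a "Cited by N" label; returns undef if absent.
# ASSUMPTION: the fetched page contains the literal text "Cited by 123".
sub cited_by_count {
    my ($html) = @_;
    return undef unless defined $html;
    return $html =~ /Cited by\s+(\d+)/ ? $1 : undef;
}

# Walk rows $first..$last, read the title in $title_col, and write the
# count into the column just to its right. Returns how many cells were
# written. $lookup is a hypothetical code ref: title in, page HTML out.
sub fill_citations {
    my ($sheet, $lookup, $first, $last, $title_col) = @_;
    $title_col ||= 1;    # titles are in the first column by default
    my $written = 0;
    for my $row ($first .. $last) {
        my $title = $sheet->Cells($row, $title_col)->{Value};
        next unless defined $title && length $title;    # skip empty cells
        my $count = cited_by_count($lookup->($title));
        next unless defined $count;                     # nothing found
        $sheet->Cells($row, $title_col + 1)->{Value} = $count;
        $written++;
    }
    return $written;
}
```

    With the $Sheet and $Book from the original script, a call such as fill_citations($Sheet, \&fetch_results, 2, 4) followed by $Book->Save and $Book->Close would do the write-back, where fetch_results is a placeholder for your own wrapper around the scraper. Passing the lookup in as a code ref also lets you try the loop with a fixed input before touching the network.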
