PerlMonks  

Re: REGEX for url

by james28909 (Deacon)
on Apr 25, 2016 at 20:42 UTC ( [id://1161478] )


in reply to REGEX for url

my $line = '<td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>';
$line =~ s/.*a href="(.*)".*/$1/;
print $line;

Replies are listed 'Best First'.
Re^2: REGEX for url
by wrkrbeee (Scribe) on Apr 25, 2016 at 20:52 UTC

    Thank you for your help! That expression does not seem to bind to anything for me; perhaps there is something else I'm doing wrong? Below is a small part of the code. Thanks again!

    $/="</html>";
    while (my $line = <$FH_IN>) {
        chomp $line;    #removes line break or new line;
        my $url_sub = "";
        my $data="";
        $url_sub =~ s/.*a href="(.*)".*/$1/;
        print $url_sub;
      This works for me:
      use strict;
      use warnings;

      for (<DATA>) {
          print if s/.*a href="(.*)".*/$1/;
      }

      __DATA__
      <td scope="row">9</td>
      <td scope="row">SUBSIDIARIES OF THE REGISTRANT</td>
      <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
      <td scope="row">EX-21.1</td>

      Output:

      C:\Users\James\Desktop\perlmonks>perlmonks.pl
      /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt

      EDIT: It seems that $/ = "</html>"; changes the input record separator in such a way that it completely breaks the simple regex. Do you have any links to documentation on this $/ = "</html>"; ?
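
On the documentation question: `$/` is described in perlvar under `$INPUT_RECORD_SEPARATOR`. The sketch below is only an illustration of what changes once it is set to "</html>"; the two-page sample input and the `read_records` helper are made up for the example and are not from the original posts. Each read then returns a whole page, newlines included, and because `.` does not match a newline without /s, the substitution rewrites only the first line that contains a link.

```perl
use strict;
use warnings;

# Hypothetical input holding two small pages; in the thread the data
# comes from $FH_IN instead.
my $input = <<'END';
<html>
<a href="/one.txt">one</a>
<a href="/two.txt">two</a>
</html>
<html>
<a href="/three.txt">three</a>
</html>
END

# Read $text record by record, where a record ends at "</html>".
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "</html>";            # records now end at "</html>", not "\n"
    my @records = <$fh>;
    chomp @records;                  # chomp strips the current $/, i.e. "</html>"
    close $fh;
    return grep { /\S/ } @records;   # drop the whitespace-only tail after the last page
}

my @records = read_records($input);
print scalar(@records), " records\n";

# Without /s, "." never matches "\n", so only the first line containing a
# link is rewritten; every other line of the record is left as it was.
(my $copy = $records[0]) =~ s/.*a href="(.*)".*/$1/;
print $copy;
```

So in record mode the regex still matches, but it sees a multi-line string and touches only one line of it, which is different from the line-by-line `for (<DATA>)` loop above.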

        Not sure if this helps, but the full text block, from <html> through </html> appears below. Just using $/ as a way to indicate the end of a record. I apologize for wasting your time.

        <html>
        <head>
          <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
          <title>EDGAR Filing Documents for 0000927356-01-000365</title>
          <link rel="stylesheet" type="text/css" href="/include/interactive.css" />
        </head>
        <body style="margin: 0">
        <noscript><div style="color:red; font-weight:bold; text-align:center;">This page uses Javascript. Your browser either doesn't support Javascript or you have it turned off. To see this page as it is meant to appear please use a Javascript enabled browser.</div></noscript>
        <!-- BEGIN BANNER -->
        <div id="headerTop">
          <div id="Nav"><a href="http://www.sec.gov/index.htm">Home</a> | <a href="/cgi-bin/browse-edgar?action=getcurrent">Latest Filings</a> | <a href="javascript:history.back()">Previous Page</a></div>
          <div id="seal"><a href="http://www.sec.gov/index.htm"><img src="/images/sealTop.gif" alt="SEC Seal" border="0" /></a></div>
          <div id="secWordGraphic"><img src="/images/bannerTitle.gif" alt="SEC Banner" /></div>
        </div>
        <div id="headerBottom">
          <div id="searchHome"><a href="/edgar/searchedgar/webusers.htm">Search the Next-Generation EDGAR System</a></div>
          <div id="PageTitle">Filing Detail</div>
        </div>
        <!-- END BANNER -->
        <!-- BEGIN BREADCRUMBS -->
        <div id="breadCrumbs">
          <ul>
            <li><a href="http://www.sec.gov/">SEC Home</a> &#187;</li>
            <li><a href="/edgar/searchedgar/webusers.htm">Search the Next-Generation EDGAR System</a> &#187;</li>
            <li><a href="/edgar/searchedgar/companysearch.html">Company Search</a> &#187;</li>
            <li class="last">Current Page</li>
          </ul>
        </div>
        <!-- END BREADCRUMBS -->
        <div id="contentDiv">
        <div id="formDiv">
        <!-- START FILING DIV -->
          <div id="formHeader">
            <div id="formName">
              <strong>Form 10-K</strong> - Annual report [Section 13 and 15(d), not S-K Item 405]
            </div>
            <div id="secNum">
              <strong><acronym title="Securities and Exchange Commission">SEC</acronym> Accession <acronym title="Number">No.</acronym></strong> 0000927356-01-000365
            </div>
          </div>
          <div class="formContent">
            <div class="formGrouping">
              <div class="infoHead">Filing Date</div>
              <div class="info">2001-03-30</div>
              <div class="infoHead">Accepted</div>
              <div class="info">1995-09-28 00:00:00</div>
              <div class="infoHead">Documents</div>
              <div class="info">10</div>
            </div>
            <div class="formGrouping">
              <div class="infoHead">Period of Report</div>
              <div class="info">2000-12-30</div>
            </div>
            <div style="clear:both"></div>
          </div>
        <!-- END FILING DIV -->
        <!-- START DOCUMENT DIV -->
          <div style="padding: 0px 0px 4px 0px; font-size: 12px; margin: 0px 2px 0px 5px; width: 100%; overflow:hidden">
            <p>Document Format Files</p>
            <table class="tableFile" summary="Document Format Files">
              <tr>
                <th scope="col" style="width: 5%;"><acronym title="Sequence Number">Seq</acronym></th>
                <th scope="col" style="width: 40%;">Description</th>
                <th scope="col" style="width: 20%;">Document</th>
                <th scope="col" style="width: 10%;">Type</th>
                <th scope="col">Size</th>
              </tr>
              <tr>
                <td scope="row">1</td>
                <td scope="row">ANNUAL REPORT</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt">0001.txt</a></td>
                <td scope="row">10-K</td>
                <td scope="row">194594</td>
              </tr>
              <tr class="blueRow">
                <td scope="row">2</td>
                <td scope="row">EMPLOYMENT AGREEMENT</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt">0002.txt</a></td>
                <td scope="row">EX-10.6</td>
                <td scope="row">18708</td>
              </tr>
              <tr>
                <td scope="row">3</td>
                <td scope="row">CHANGE IN TERMS AGREEMENT</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt">0003.txt</a></td>
                <td scope="row">EX-10.9</td>
                <td scope="row">24380</td>
              </tr>
              <tr class="blueRow">
                <td scope="row">4</td>
                <td scope="row">FIRST AMENDMENT TO LEASE AGREEMENT</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt">0004.txt</a></td>
                <td scope="row">EX-10.12</td>
                <td scope="row">15945</td>
              </tr>
              <tr>
                <td scope="row">5</td>
                <td scope="row">THIRD AMENDMENT TO LEASE</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt">0005.txt</a></td>
                <td scope="row">EX-10.19</td>
                <td scope="row">3127</td>
              </tr>
              <tr class="blueRow">
                <td scope="row">6</td>
                <td scope="row">FOURTH AMENDMENT TO LEASE</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt">0006.txt</a></td>
                <td scope="row">EX-10.20</td>
                <td scope="row">3887</td>
              </tr>
              <tr>
                <td scope="row">7</td>
                <td scope="row">FIFTH AMENDMENT TO LEASE</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt">0007.txt</a></td>
                <td scope="row">EX-10.21</td>
                <td scope="row">3980</td>
              </tr>
              <tr class="blueRow">
                <td scope="row">8</td>
                <td scope="row">SIXTH AMENDMENT TO LEASE</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt">0008.txt</a></td>
                <td scope="row">EX-10.22</td>
                <td scope="row">4017</td>
              </tr>
              <tr>
                <td scope="row">9</td>
                <td scope="row">SUBSIDIARIES OF THE REGISTRANT</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
                <td scope="row">EX-21.1</td>
                <td scope="row">700</td>
              </tr>
              <tr class="blueRow">
                <td scope="row">10</td>
                <td scope="row">CONSENT OF INDEPENDENT PUBLIC ACCOUNTANTS</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt">0010.txt</a></td>
                <td scope="row">EX-23.1</td>
                <td scope="row">346</td>
              </tr>
              <tr>
                <td scope="row">&nbsp;</td>
                <td scope="row">Complete submission text file</td>
                <td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt">0000927356-01-000365.txt</a></td>
                <td scope="row">&nbsp;</td>
                <td scope="row">272254</td>
              </tr>
            </table>
          </div>
        <!-- END DOCUMENT DIV -->
        </div>
        <!-- START FILER DIV -->
        <div id="filerDiv">
          <div class="mailer">Mailing Address
            <span class="mailerAddress">13751 S WADSWORTH PARK DR SUITE D-140</span>
            <span class="mailerAddress"> DRAPER UT 84020 </span>
          </div>
          <div class="mailer">Business Address
            <span class="mailerAddress">13751 S WADSWORTH PARK DR SUITE D-140</span>
            <span class="mailerAddress"> DRAPER UT 84020 </span>
            <span class="mailerAddress">8015728225</span>
          </div>
          <div class="companyInfo">
            <span class="companyName">1 800 CONTACTS INC (Filer) <acronym title="Central Index Key">CIK</acronym>: <a href="/cgi-bin/browse-edgar?CIK=0001050122&amp;action=getcompany">0001050122 (see all company filings)</a></span>
            <p class="identInfo"><acronym title="Internal Revenue Service Number">IRS No.</acronym>: <strong>870571643</strong> | State of Incorp.: <strong>DE</strong> | Fiscal Year End: <strong>1231</strong><br />Type: <strong>10-K</strong> | Act: <strong>34</strong> | File No.: <a href="/cgi-bin/browse-edgar?filenum=000-23633&amp;action=getcompany"><strong>000-23633</strong></a> | Film No.: <strong>1587687</strong><br /><acronym title="Standard Industrial Code">SIC</acronym>: <b><a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=3827&amp;owner=include">3827</a></b> Optical Instruments &amp; Lenses<br />Assistant Director 10</p>
          </div>
          <div class="clear"></div>
        </div>
        <!-- END FILER DIV -->
        </div>
        </body>
        </html>
        Any possibilities for why that would not work on my end? Maybe something that a rookie would do that an expert would not, or vice versa? Thank you for your time!
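
One thing that stands out in the posted loop: `$url_sub` is set to an empty string and the substitution is then bound to `$url_sub` rather than to `$line`, so the regex has nothing to match against. Below is a minimal sketch of one way around that, assuming the goal is to list every link in a page-sized record. The `extract_hrefs` name and the two-row sample record are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Return every href value found in one record (a whole page when
# $/ = "</html>"). [^"]* stops at the closing quote, so the capture
# cannot run past the attribute the way a greedy (.*) can.
# Call this in list context; in scalar context m//g returns only true/false.
sub extract_hrefs {
    my ($record) = @_;
    return $record =~ /<a\s+href="([^"]*)"/g;
}

# Hypothetical record standing in for one page read from $FH_IN.
my $record = <<'END';
<td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
<td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt">0010.txt</a></td>
END

# Print one link per line.
print "$_\n" for extract_hrefs($record);
```

Inside the original while loop this would be called as `extract_hrefs($line)`. A regex like this is fine for quick jobs on pages with a regular layout; for anything less predictable, an HTML parsing module such as HTML::LinkExtor is usually the safer choice.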
