Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Regular Expression help, MS Access Pipe delimited export gone bad

by silent11 (Vicar)
on Nov 06, 2001 at 06:20 UTC ( [id://123484]=perlquestion: print w/replies, xml ) Need Help??

silent11 has asked for the wisdom of the Perl Monks concerning the following question:

I was given an Access database which I am trying to get into MySQL, I read through the documentation on mysqlimport and it said that it was only capable of importing text files. So I export my Access DB into a pipe delimited file, pipes delimiting the fields, and newlines delimiting the records. Well, to make a long story short, M$ Access's Export feature sucks, many of my rows were divided on whitespaces which resulted single lines looking like the following
3884|MC1|GW|Graphworks/Mirandas Needle|kit |Mice Skating|EACH|$1.50|$0.75|MC1.JPG

My question is, how do I remove the newline '\n' on a line if the next line doesn't start with a 3-5 digit number and then print that new complete line to a seperate file?

This is what I have so far, and it doesn't work.
open (PROD,"<products.txt"); open (OUT,">out.txt"); @array = <PROD>; foreach $line (@array){ unless ($line =~/JPG?/){ chomp; } print OUT "$line"; }

any help would be appreciated.

Replies are listed 'Best First'.
Re: Regular Expression help, MS Access Pipe delimited export gone bad
by hawson (Monk) on Nov 06, 2001 at 07:21 UTC
    How about this:
    #!/usr/bin/perl -w use strict; my ($line, $spacer); while(defined($_=<>)) { s/\n//g; if ($_ =~ /^[0-9]{3,5}/) { print "$line\n" if defined($line); $line=$_; } else { $spacer = (/^\|/ || $line =~ /\|$/) ? "" : " "; $line.="$spacer$_"; } } print "$line\n"; #get last one...
    If a line starts or ends with a pipe ("|"), then it assumes you do not want extra whitespace around it. Otherwise, it should honor any leading and trailing whitespace on each line (so if one line ends with a tab, and the next starts with one, the code will spit out two tabs in a row).
Re: Regular Expression help, MS Access Pipe delimited export gone bad
by jlongino (Parson) on Nov 06, 2001 at 12:05 UTC
    I use MS Access for importing/exporting files all the time and can tell you that in all likelihood the problem is not with Access per se. More likely is that whoever designed the application allowed users to embed hard returns into text fields (not necessarily a bad thing, depending on what they were doing with the data). This is common in memo fields but less so with regular text fields.

    The solution is not as simple as looking for a number at the beginning of each line. What if the text that follows the hard return begins with a number? You can't count the number of fields because your records don't end with a delimiter (like  1|abc d|2|xxx|).

    I think the best solution is to strip the carriage return/line feed pairs out of your Access table text fields before you export the data. Either that or add a field to the end of each Access table and put a fake end-of-record character in it (one that isn't used in the data). Then export and use change the input separator in Perl to the fake character.

    Not pretty, any way you look at it. Good luck and HTH,

    --Jim

Re: Regular Expression help, MS Access Pipe delimited export gone bad
by boo_radley (Parson) on Nov 06, 2001 at 07:35 UTC
    My answer assumes that the first fields will never be empty (that is, a pipe being the first character in a line is a sign that there's been a break in the record.
    do { local $/=undef; $_=<DATA>; ~s~\n\|~\|~g; print; }
Re: Regular Expression help, MS Access Pipe delimited export gone bad
by gt8073a (Hermit) on Nov 06, 2001 at 12:38 UTC
    3884|MC1|GW|Graphworks/Mirandas Needle|kit
    |Mice Skating|EACH|$1.50|$0.75|MC1.JPG

    it looks like you are not using a text qualifier( double quotes for example ) when you export your table(s). mysqlimport will know what to do with the exported file if you do( at least, that has been my experience ). this is not a perl solution, but it may help you your stated problem: I was given an Access database which I am trying to get into MySQL.

    Will perl for money
    JJ Knitis
    (901) 756-7693
    gt8073a@industrialmusic.com

Re: Regular Expression help, MS Access Pipe delimited export gone bad
by RhetTbull (Curate) on Nov 06, 2001 at 18:42 UTC
    You may want to take a look at Text::xSV by our own tilly. This works like Text::CSV but handles embedded newlines. If you import your data into perl with Text::xSV you could then export it in a format more suitable for MySQL (perhaps using quoted fields as gt8073a already pointed out.
Re: Regular Expression help, MS Access Pipe delimited export gone bad
by CubicSpline (Friar) on Nov 06, 2001 at 18:55 UTC
    From my testing, your code would work fine if you changed the line:

    unless ($line =~/JPG?/){ chomp; }

    to be:

    unless ($line =~/JPG?/){ chomp( $line ); }

    Seeing other ways to do what you want is a great way to learn Perl, but it's also nice to know what is going wrong in the code that you write.

    HTH.
    ~CS

Re: Regular Expression help, MS Access Pipe delimited export gone bad
by jlongino (Parson) on Nov 06, 2001 at 23:49 UTC
    If you are dealing with a very small data file, or this is just a one-shot program than you can skip the rest of this post with my apologies.

    I stress again that you should really focus on correcting the problem from the source and then generating an usable export file. Important issues still hold true:

    • What if the text that follows the hard return begins with a number?
    • What if two separate fields in the same record have hard returns? (a three line record).
    • Can you enumerate every possible scenario of this sort?
    Here is MS Access VB code that will allow you to create a global function that can be called anywhere from within your application and strip the hard returns. It is certainly not optimized, but I'm at work and had to do a quick and dirty. Just be sure that you make a copy of your Access database to work/export from. Create a MS Access VB module:
    Option Compare Database Option Explicit Public Function Strip_crlf(mStr) Dim mNewStr As String mNewStr = mStr While (InStr(mNewStr, Chr(13))) Mid(mNewStr, InStr(mNewStr, Chr(13)), 1) = " " Wend While (InStr(mNewStr, Chr(10))) Mid(mNewStr, InStr(mNewStr, Chr(10)), 1) = " " Wend Strip_crlf = mNewStr End Function
    Once you create this module you can use the function to either write a sub that will process all text fields in the database, more simply create a Update Query that does the module call:  Strip_crlf([fieldname]) for each "Update to:" text field. Then export the table as you originally wanted it.

    HTH,

    --Jim

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://123484]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (6)
As of 2024-03-28 08:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found