first character is the record type. there are 4 types. T-W-Q-C I usually load into a staging and set rec type = substring(data,1) and then do the
insert-select substring where rec_typ= into other tables. file is 1017 characters per line and the first two lines are always T-W, then Q's (anywhere from a few dozen, to a few million) and then a single C (which is the reconciliation of the Q records).
So the speed is needed for the Q records. (The fake Q records below show what data each fields have. e.g. numeric(always integers with no decimals), and text)
<c>
T2022TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
+WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
Q0001012345678900012345678900012345678900000000GB00000000 01234567890
+001234567890123456789 ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEF
+GHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJ
+KLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWX
+YABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOXXYYZZ1234567
+89123456789 012345678912012345678912012345678912 Q00010123456789
+0001234567890001234567890Q000101234567890001234567890001234567890Q000
+101234567890001234567890001234567890 Q000101234567890001234567890001
+234567890Q000101234567890001234567890001234567890Q0001012345678900012
+34567890001234567890XXYYGB123456789123456789ABCDEFGHIJKLMNOPQRSTUVWXY
+ABCDEFGHIJKLM4012345678912345678912312-3456789STABCDEFGHIJKLMNOPQRSTU
+ ABCDEFGHIJKLMNOPQRSTU 3 0115 9TS7QT.99999.S1.092 9999999901234567891
+29TS7QT.99999.SL.0920101ABCDEFGHIJKLMNOPQR20229TS7QT.99999.SL.092ABCD
+EFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLM40ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGH
+IJKLM40999999999 0123456789 0000001
Q0001012345678900012345678900012345678900000000GB00000000 01234567890
+001234567890123456789 ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEF
+GHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJ
+KLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWX
+YABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOXXYYZZ1234567
+89123456789 012345678912012345678912012345678912 Q00010123456789
+0001234567890001234567890Q000101234567890001234567890001234567890Q000
+101234567890001234567890001234567890 Q000101234567890001234567890001
+234567890Q000101234567890001234567890001234567890Q0001012345678900012
+34567890001234567890XXYYGB123456789123456789ABCDEFGHIJKLMNOPQRSTUVWXY
+ABCDEFGHIJKLM4012345678912345678912312-3456789STABCDEFGHIJKLMNOPQRSTU
+ ABCDEFGHIJKLMNOPQRSTU 3 0115 9TS7QT.99999.SL.092 9999999901234567891
+29T57QT.99999.SL.0920101ABCDEFGHIJKLMNOPQR20229TS7QT.99999.SL.092ABCD
+EFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLM40ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGH
+IJKLM40999999999 0123456789 0000001
Q0001012345678900012345678900012345678900000000GB00000000 01234567890
+001234567890123456789 ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEF
+GHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJ
+KLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWX
+YABCDEFGHIJKLMNOABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLMNOXXYYZZ1234567
+89123456789 012345678912012345678912012345678912 Q00010123456789
+0001234567890001234567890Q000101234567890001234567890001234567890Q000
+101234567890001234567890001234567890 Q000101234567890001234567890001
+234567890Q000101234567890001234567890001234567890Q0001012345678900012
+34567890001234567890XXYYGB123456789123456789ABCDEFGHIJKLMNOPQRSTUVWXY
+ABCDEFGHIJKLM4012345678912345678912312-3456789STABCDEFGHIJKLMNOPQRSTU
+ ABCDEFGHIJKLMNOPQRSTU 3 0115 9TS7QT.99999.SL.092 9999999901234567891
+29TS7QT.99999.SL.0920101ABCDEFGHIJKLMNOPQR20229TS7Q1.99999.SL.092ABCD
+EFGHIJKLMNOPQRSTUVWXYABCDEFGHIJKLM40ABCDEFGHIJKLMNOPQRSTUVWXYABCDEFGH
+IJKLM40999999999 0123456789 0000001
| [reply] [d/l] |
Please wrap that data in <code>...</code> tags.
As it stands, your post has made viewing the entire thread extremely difficult:
on my 27" screen, I need to scroll horizontally about a dozen times;
this will likely be even more difficult for those with laptops or other smaller devices.
For future posts, kindly ensure you "Preview" first,
edit to fix any problems such as this,
repeat as necessary,
and only "Create" when all is well.
Thankyou.
| [reply] [d/l] |
Ugh.. I just fixed it. (I think). Is there a guide on how to make readable posts?
| [reply] |
so to condense my prior post.
How do I convert the following DBI instructions for mariadb to sqlite
print "Enter filename";
$filename=<STDIN>;
chomp $filename;
$path='/media/usb-drive';
$file=$path.$filename;
$dbh->do("LOAD DATA INFILE '$file' into TABLE STAGE SET REC_TYP=SUBSTRING(DATA,1)");
$dbh->do("INSERT INTO A(A,B,C) SELECT SUBSTRING(DATA,2,10,SUBSTRING(DATA,13,1),SUBSTRING(DATA,14,20) FROM STAGE WHERE REC_TYP='q'");
| [reply] |
I did some more investigation and contrary to previous reports, there are SQLite equivalents to these commands. And there are even different variations on the theme (there is more than one way to do this). However, AFIK the relevant commands are only available via SQLite.exe, the command line interface to SQLite. In your example the input file is read into a table STAGE, then each row in that table STAGE is split out via SUBSTRING into a new table A.
Because at least my Perl doesn't come with the command line installed, I didn't pursue this idea further and instead would like to show you how to use what you already have via the Perl DBI program I/F without needing the command line I/F. As a DB browser, I use a separate GUI that I like so I don't even have the SQLite command line I/F on my computer. SQLite essentially comes with your Perl distribution (can't imagine a distribution that did not include the DBI). "use DBI;" is all you need - there is nothing else that is necessary to install.
I really didn't understand what your text format meant. So to write some demo code, I just picked 3 substrings that I defined from a Q line. You will need to define others. I hope you can understand the concept and then extend it further. I took your example data and put it in a file called "testImportFile.txt". The lines were 1019 chars so I shortened them to the claimed 1017 characters. The first Q line was truncated to produce an error condition of malformed input record.
Here is the code. Then I will discuss further...
use strict;
use warnings;
use DBI;
my $dbName = "YourDBname.sqlite";
my $import_file = "testImportFile.txt";
my %attr = ( RaiseError => 1); # auto die with error printout
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbName","","",\%attr)
or die "Couldn't connect to database $dbName: " . DBI->errstr;
open (my $fh_in, '<', $import_file) or die "cannot access file:\'$impo
+rt_file\' $!\n"; #suppress Perl line number
$dbh->do ("DROP TABLE IF EXISTS Data");
$dbh->do ("CREATE TABLE Data (fieldA TEXT, fieldB TEXT, fieldC TEXT)")
+;
my $insertRow = $dbh->prepare ("INSERT INTO Data (fieldA, fieldB, fiel
+dC)
VALUES (?, ?, ?) ");
+
$dbh->begin_work; ### Starts a transaction!!
while (<$fh_in>)
{
next unless /^Q/;
chomp;
my $len = length($_);
if ($len != 1017)
{
print "*** ERROR FOLLOWING LINE IS $len CHARS, not 1017! THIS
+LINE IS REJECTED\n";
print "$_\n\n";
next;
}
# The number of fields must match the number of columns in CREATE T
+ABLE
# or the program will stop with a fatal error!
my @fields = (substr($_,1,57), substr($_,59,32), substr($_,93,25));
$insertRow->execute(@fields);
print "$_\n" for @fields; print "\n"; # Just to show data for demo
}
$dbh->commit; ### Completes a transaction!!
__END__
*** ERROR FOLLOWING LINE IS 76 CHARS, not 1017! THIS LINE IS REJECTED
Q0001012345678900012345678900012345678900000000GB00000000 01234567890
+001234
0001012345678900012345678900012345678900000000GB00000000
01234567890001234567890123456789
ABCDEFGHIJKLMNOPQRSTUVWXY
0001012345678900012345678900012345678900000000GB00000000
01234567890001234567890123456789
ABCDEFGHIJKLMNOPQRSTUVWXY
I called my data table "Data" instead of "A". Use whatever name you want. In the CREATE TABLE statement, I used the SQLite data type of TEXT instead of char() or varchar(). In SQLite, char(n) is always n chars in width. varchar(n) can contain more than n characters unlike other DB's where this is limited to 1..n chars. If you have those designations in your MySQL code, by all means, use them. Just be aware that you can stick 100 chars into a varchar(5)! Here TEXT is pretty generic and works just fine.
SQLite is a full ACID database. This is a good thing for data consistency, but the bookkeeping involved really slows down an insert. The thing to do is to bunch a whole bunch of inserts together into what is called a transaction. According to ACID principles, a transaction either fully succeeds or fully fails - there is no "in-between". Finalizing a transaction with 100 inserts takes about as long as a transaction with 100,000 inserts. You can do maybe 10 transactions per second - computers vary... I have made transactions with 1 million inserts...works fine. If you try one insert per transaction, performance will auger into the ground!
Note that for Perl substring, the index starts at "0", not "1". I think for SQL the first index is "1". So be aware of that and adjust for the dreaded "off by one" accordingly.
The demo code does not make any intermediate table and goes directly from input file to DB. This may even run faster than your import utilities for Maria. In general, you will find SQLite to be quite performant. The big limitation is that there can only be one writer at a time and acquiring an exclusive file lock for that is slow. There can be many simultaneous readers. This looks perfect for your application. Also note that SQLite now has many "big boy" features like stored procedures, etc.
I assumed that all of these fields were actually stored as characters. If some are binary integers, then there are a few "yeah, but's" which can be discussed if you have trouble.
Have fun! Hope this was helpful...
Update:
To get an idea of performance, I ran one of my applications and timed it. It uses 12,400 input data files, creates a
main table with 3.6 million rows, and executes in 182 seconds (3 minutes). My machine was a used, refurbished i5 when it was bought 8-9 years ago - so not a screamer by any means. More than a million rows a minute is a fine data import speed for this application. It is possible to go faster, by defeating more of the ACID features and journaling, but that is not worth it to me.
| [reply] [d/l] |
<c> <----
print "Enter filename";
$filename=<STDIN>;
...
... FROM STAGE WHERE REC_TYP='q'");
</c> <----
| [reply] [d/l] |