Yeah, agreed on the database-can-read-CSV issue. That eliminates this overhead.
But then, tybalt98's code example (I had prepared something very similar to run benchmarks) doesn't swap, regardless of how big the dataset is. Run time is more or less linear in the number of records. My (not very up-to-date) system processes about 20000 records per minute, which means I wouldn't stand a chance of processing 14M records in four hours. NYTProf shows that most of the time goes into preparing and printing the output file. Sending the output to an SSD doesn't help much either.
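For what it's worth, here is a quick back-of-the-envelope check of those figures. It is only a sketch of the arithmetic (Python used purely as a calculator; the constant and function names are mine, not from anyone's benchmark code), and it assumes the rate stays constant over the whole run:

```python
# Back-of-the-envelope throughput check using the figures quoted above.
# Assumption: the measured rate holds for the entire run (time is linear).
RECORDS_PER_MINUTE = 20_000    # rate observed on my (older) system
TOTAL_RECORDS = 14_000_000     # the 14M records under discussion
BUDGET_MINUTES = 4 * 60        # the four-hour window


def records_within_budget(rate, minutes):
    """Records that can be processed in `minutes` at `rate` records/minute."""
    return rate * minutes


def minutes_needed(total, rate):
    """Minutes required to process `total` records at `rate` records/minute."""
    return total / rate


def required_rate(total, minutes):
    """Records/minute needed to finish `total` records within `minutes`."""
    return total / minutes


if __name__ == "__main__":
    print("records done in budget:", records_within_budget(RECORDS_PER_MINUTE, BUDGET_MINUTES))
    print("minutes needed for all:", minutes_needed(TOTAL_RECORDS, RECORDS_PER_MINUTE))
    print("rate needed (rec/min): ", round(required_rate(TOTAL_RECORDS, BUDGET_MINUTES)))
```

In other words, at that rate the four-hour window covers about 4.8M records, the full run would take roughly 700 minutes (close to 12 hours), and the program would have to be about three times faster to fit.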
What indexing would you apply to the problem at hand? If you can provide an example, I'd be happy to run it against my SQLite or PostgreSQL server on the same system for comparison.
I don't mind working with databases at all (how could I? I've been working as a product manager for database engines for some years). But in this case the suggestions to use a database (or MCE) all came with little concrete help for the OP and his program. tybalt98 and I found an actual performance issue which, when fixed, gives a speedup of several orders of magnitude. How much gain do you expect from switching to a database?
How much familiarity with SQL and database functions do the database aficionados expect from the OP? Is this actually helping or is this saying "look how smart I am!"?
Also, when your management likes the output you just produced, they're going to ask for more and more analytics.
I can confirm that from my own experience. But then, management doesn't ask for a 260GB CSV file; they usually want "two or three slides". One of my most successful Perl programs fell into that category. The evaluation ran once per week for several years. It could have used a database, but it didn't. Actually, no one cared.