
remove non ascii characters in a whole corpus

by Anonymous Monk
on Oct 25, 2014 at 09:10 UTC ( [id://1104951]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a program that will remove all non-ASCII characters from a single file. However, what I need is a program that will remove all non-ASCII characters from over 400 separate files. And I need the new, clean files (still separate) in a new folder. Thanks.


Replies are listed 'Best First'.
Re: remove non ascii characters in a whole corpus
by hippo (Bishop) on Oct 25, 2014 at 09:36 UTC

    When a piece of code must perform the same or a similar operation several times in succession, the programmer uses a construct called a loop. This is one of the three main types of control flow in computer programming, so, as a novice programmer, you will find it well worth your time to read up on loops in the literature.

    As a rich and diverse language, Perl provides a number of different loop mechanisms. Among the simplest of these are for, while and until which have analogues in other 3rd and 4th generation languages. Becoming familiar with these basic constructs will be time well spent.
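    As a rough illustration of the three loop forms named above, here is a minimal sketch. The file names and counter variables are made up for the example and are not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical file names, used only to illustrate the loop forms.
my @files = ('a.txt', 'b.txt', 'c.txt');

# foreach: run the body once per list element.
my @seen;
foreach my $file (@files) {
    push @seen, $file;
}

# while: repeat for as long as the condition is true.
my @queue       = @files;
my $while_count = 0;
while (@queue) {
    shift @queue;
    $while_count++;
}

# until: repeat for as long as the condition is false.
my $until_count = 0;
until ($until_count >= @files) {
    $until_count++;
}

print "foreach saw: @seen\n";
print "while ran $while_count times, until ran $until_count times\n";
```

    For the task in the question, the foreach form is probably the most natural fit, since the work is "do the same thing once per file".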

    Update: improved links.

Re: remove non ascii characters in a whole corpus
by jonadab (Parson) on Oct 25, 2014 at 13:30 UTC

    There are several ways to approach that. Here are a couple of the easiest:

    • opendir will let you read the list of files in the source directory. You could then use a foreach loop to open each of them in turn, read the contents, do your transformation, open a corresponding file in the output directory, and write your transformed content to it.
    • You could also just use a pattern glob, perhaps something like this:
      foreach my $file (</path/to/sourcedir/*>) {
          # ...
      }
      This is slightly less portable (because the syntax of your glob is platform-dependent), and less flexible (because the glob is hardcoded, so it's harder to do things like read the source directory path from a config file), but for a quick-and-dirty hack, it's easier to get working quickly.
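    The opendir approach described in the first bullet could be sketched roughly as follows. This is only one possible shape: the sub name strip_non_ascii_dir, its two directory arguments, and the regex used for the transformation are assumptions, since the original single-file program was not posted.

```perl
use strict;
use warnings;
use File::Spec;

# Copy every plain file from $src_dir to $dst_dir, dropping any byte
# outside the 7-bit ASCII range. Returns the number of files written.
# The sub name and both arguments are hypothetical.
sub strip_non_ascii_dir {
    my ($src_dir, $dst_dir) = @_;

    # Create the output directory if it does not exist yet.
    unless (-d $dst_dir) {
        mkdir $dst_dir or die "Cannot create $dst_dir: $!";
    }

    # Read the directory listing and keep only plain files
    # (this also skips the "." and ".." entries).
    opendir(my $dh, $src_dir) or die "Cannot open $src_dir: $!";
    my @names = grep { -f File::Spec->catfile($src_dir, $_) } readdir $dh;
    closedir $dh;

    my $count = 0;
    foreach my $name (@names) {
        my $in_path  = File::Spec->catfile($src_dir, $name);
        my $out_path = File::Spec->catfile($dst_dir, $name);

        # Work on raw bytes so that no encoding layer gets in the way.
        open(my $in,  '<:raw', $in_path)  or die "Cannot read $in_path: $!";
        open(my $out, '>:raw', $out_path) or die "Cannot write $out_path: $!";

        while (my $line = <$in>) {
            # Assumed transformation: delete every byte above 0x7F.
            $line =~ s/[^\x00-\x7F]//g;
            print {$out} $line;
        }

        close $in;
        close $out or die "Cannot close $out_path: $!";
        $count++;
    }
    return $count;
}

# Example call when run as: perl strip.pl SOURCE_DIR CLEAN_DIR
strip_non_ascii_dir(@ARGV) if @ARGV == 2;
```

    Because the files are read as raw bytes, a multi-byte UTF-8 character is removed entirely rather than replaced. ASCII control characters are left in place; if those should go too, the character class would need to be narrowed.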

    HTH. HAND.

Re: remove non ascii characters in a whole corpus
by Anonymous Monk on Oct 25, 2014 at 09:14 UTC

    and then what happened?

    Why can't you take the program you have, and improve it?

    perlintro, Path::Tiny

      Yes, I am trying to improve the program, but am new to programming. If I knew how to improve the program, I would not have written the question.

        Yes, I am trying to improve the program, but am new to programming. If I knew how to improve the program, I would not have written the question.

        What are you trying exactly?

Node Type: perlquestion [id://1104951]
Approved by igelkott