
create array of empty files and then match filenames

by angela2 (Sexton)
on Jan 07, 2016 at 15:38 UTC ( [id://1152200]=perlquestion )

angela2 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I am so happy to have come across this website, you might be able to tell me what I'm doing wrong. I haven't coded much, maybe written 5 tiny scripts, and I'm struggling a bit.

So I have some chemical structure files in a directory and I want to process them. What I'm trying to do now is look for empty files whose filename is a single number, such as 4, 10 and so on. So I thought I would put them all in an array, called @emptyfiles, and each element of this array I would call $emptyfile. As you can see, I have commented out this attempt as I couldn't get it to work, but I had used push.

Anyway my second attempt is to set the array again (@emptyfiles) and then use foreach to loop through the elements of the array, and then say "match filenames that only have numbers as their name", if this match is true I move forward to an "if file is empty" check and then I ask for it to be printed. However it doesn't work. The best I have managed to do is get the script to return nothing, so it's not finding an empty "4" file I have in the directory. Can anybody tell me what I'm doing wrong???

#/bin/perl/
use strict;
use warnings;

my @emptyfiles;
#my $emptyfile =~ m/(\d+)/;
#push (@emptyfiles, $emptyfile);
my $digit;
my $emptyname;
foreach (@emptyfiles) {
    /(\d+)/;
    $emptyname = $digit;
    if (-z "$emptyname") {
        print "found empty file $emptyname \n";
    }
}

Replies are listed 'Best First'.
Re: create array of empty files and then match filenames
by toolic (Bishop) on Jan 07, 2016 at 15:53 UTC
    The following code creates a list of all files in your current directory which have a number in the name (such as 1.txt, 22, file345.doc):
    use warnings;
    use strict;
    my @emptyfiles = grep { /\d/ } grep { -f } glob '*';
    To restrict to file names which only have digits (such as 22, 4, 99999, 0):
    my @emptyfiles = grep { /^\d+$/ } grep { -f } glob '*';
    If that is not what you are looking for, you need to be more specific.
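The difference between the two patterns above can be seen without touching the file system. This is only a minimal sketch; the names in @names are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical file names, chosen only to illustrate the two patterns.
my @names = ('4', '22', '1.txt', 'file345.doc', 'notes');

# /\d/ matches if a digit appears anywhere in the name.
my @any_digit = grep { /\d/ } @names;

# /^\d+$/ matches only if the whole name is one or more digits:
# ^ anchors the start, \d+ means "one or more digits", $ anchors the end.
my @only_digits = grep { /^\d+$/ } @names;

print "contains a digit: @any_digit\n";
print "only digits:      @only_digits\n";
```

Note that $ also matches just before a trailing newline; \z is the stricter "very end of string" anchor if that ever matters.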
      my @emptyfiles = grep { /\d/ } grep { -f } glob '*';

      Why two greps?

      I don't think it makes a big difference for a small set of files, but this solution runs through two loops (implicitly in grep) where one loop is sufficient:

      my @files = grep { -f && /\d/ } glob '*';

      Also, -f hides a system call (stat), which is quite expensive. Swapping the order avoids the system call for all directory entries whose names do not match the regular expression:

      my @files = grep { /\d/ && -f } glob '*';

      Or, if you want to keep the two greps:

      my @files = grep { -f } grep { /\d/ } glob '*';

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Hi!

      This works! I also tried it against a few files with numbers in their filename but that are not empty, and it's still successful, only finding the empty ones, so my choice for using if -z was correct (I hope).

      Can you tell me what {grep -f} and {glob '*'} do? Also, can you tell me a couple of things that were wrong with my code??? I assume I have many mistakes but if you could mention one or two it'd still be good for me

      Thank you very very much

        my @emptyfiles = grep { /^\d+$/ } grep { -f } glob '*';

        Let's look at this right to left. glob '*' returns a list of files, the same list you'd get if you use * as the argument to an ls or dir command, except it's just the file names, not the dates, sizes, etc that those commands also return. This list will be the input to the grep command.

        my @emptyfiles = grep { /^\d+$/ } grep { -f } [ 1 temp.dat testdirectory 10 resume.wd ]

        grep { something } list returns the elements of list for which something is true: each element is tested in turn and, if it passes, it is included in the result. glob returns both file and directory names in its list. Applying the -f test removes the directories. The middle grep returns a list also.

        my @emptyfiles = grep { /^\d+$/ } [1 temp.dat 10 resume.wd ]

        we can use another grep to further filter the list to just the files we want, those whose name consists of 1 or more digits, and nothing else.

        my @emptyfiles = [1 10]
        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
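The right-to-left walk-through above can be reproduced one stage at a time. The sketch below assumes a scratch directory created with File::Temp and populated with the same example names; the variable names (@all, @files, @numeric) are illustrative only.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Build a scratch directory with hypothetical entries, then work inside it.
my $orig = getcwd();
my $tmp  = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";
mkdir 'testdirectory' or die "mkdir: $!";
for my $name (qw(1 10 temp.dat resume.wd)) {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

my @all = glob '*';                      # every entry, files and directories
@all = sort @all;                        # fix the order for display
my @files   = grep { -f } @all;          # plain files only
my @numeric = grep { /^\d+$/ } @files;   # names made only of digits

print "glob:   @all\n";
print "-f:     @files\n";
print "digits: @numeric\n";

# Leave the scratch directory so it can be cleaned up at exit.
chdir $orig or die "chdir $orig: $!";
```

Each grep only narrows the list it is given, which is why the two filters can be written in either order or merged into one block.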

        Can you tell me what {grep -f} and {glob '*'} do?
        If you have the "Function Nodelet" displayed, there are links to these Perl built-in functions. Otherwise... glob, grep
        Ooops sorry I missed your updated post. I know I'm annoying but can you tell me why the second line only matches file names which only have digits? Is it this +$ thing? I recognise the ^, this means "that starts with", if I understood correctly.
Re: create array of empty files and then match filenames
by kcott (Archbishop) on Jan 07, 2016 at 17:34 UTC

    G'day angela2,

    Welcome to the Monastery.

    "Can anybody tell me what I'm doing wrong???"

    There are many issues with the code you've posted:

    • For a start, you don't get any filenames to process. How does your script know which directory to look in? Where does it read the filenames (i.e. get a directory listing)? How does it differentiate between filenames that are plain files, those that are directories, and those that are something else (e.g. symbolic links)?
    • Your regex, /(\d+)/, is matching strings containing one or more digits (and capturing the digits into $1 which you don't use); you want strings containing only digits (i.e. /^\d+$/). Furthermore, that regex by itself is effectively a no-op: something like 'next unless /^\d+$/' would perhaps be more meaningful.
    • You declare both $digit and $emptyname but don't initialise them. That's absolutely fine: they both start as undef. You never change $digit: it remains undef throughout. You set $emptyname to $digit on every iteration: so it too remains undef throughout.
    • You never populate @emptyfiles: it has zero elements. Consequently, your foreach loop has zero iterations, i.e. that loop is never entered.
    • You also have a problem with what I assume was intended to be a shebang line, i.e. #/bin/perl/. That's just a comment! Shebang lines start with #!. Also, /bin/perl/ looks like a directory and is probably wrong.
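The points about the regex and the never-assigned $digit can be illustrated with a small loop. This is a sketch under assumptions: @names is a hypothetical stand-in for a directory listing, and @digits is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical names standing in for a directory listing.
my @names = ('4', 'abc', '10', 'x9');

my @digits;
for my $name (@names) {
    # Skip anything that is not made up entirely of digits.
    next unless $name =~ /^(\d+)$/;

    # The capture lands in $1; it has to be copied explicitly,
    # because nothing assigns to $digit automatically.
    my $digit = $1;
    push @digits, $digit;
}

print "kept: @digits\n";
```

Here the match result is actually used (by next unless), and the captured text is stored before the next match can overwrite $1.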

    To be honest, and I don't mean this in any sort of nasty way, I rather think you just threw code at the problem and hoped it would work instead of having any real idea of what was going on. Accordingly, I think you'd be well served by reading "perlintro -- a brief introduction and overview of Perl": it's not particularly long and should really help you to understand the code you're writing.

    The next step is how to resolve these issues.

    Take a look at the readdir function to "get a directory listing". You'll see in the first example it uses the -f file test (to check for plain files); like the -z you used in your code (to check for files of zero size). [For reference, here's all the unary file test operators.] That example also uses a regex: I addressed your regex above (i.e. /^\d+$/).

    Making a small modification to that example, and putting it in a script:

    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use autodie;

    my $dir = '.';
    opendir(my $dh, $dir);
    my @emptyfiles = grep { -f && /^\d+$/ && -z } readdir $dh;
    closedir $dh;
    print for @emptyfiles;

    With these files available (and 12 being the only one with any content):

    $ ls -l 1 12 123
    -rw-r--r-- 1 ken staff 0 8 Jan 03:00 1
    -rw-r--r-- 1 ken staff 7 8 Jan 03:00 12
    -rw-r--r-- 1 ken staff 0 8 Jan 03:00 123

    Running that script, gives this output:

    1
    123

    — Ken

      I feel a bit bad about myself now lol! Anyway, I hope I'll learn. At some point. In all honesty yes I did hope it would work, but I didn't just throw it in, I spent quite a lot of time on this rubbish code I wrote. I have completed a Perl tutorial and I have read perlintro, so obviously I'm just not very talented in coding if I manage to make one million mistakes in 10 lines. I'll try to study all your suggestions now, thank you very much Ken.

        Don't feel bad about yourself or think you're no good at coding. None of us were born knowing Perl; you've just started learning whereas I've been coding in Perl for over two decades: it's not unreasonable to expect that I might be a bit better at it than you. :-)

        — Ken

        Yes, as Ken posted, we all started out as n00bs. Never let that concern you, the goal is learning and one never really stops learning coding, unless they give up on coding outright. Just keep at it and you'll have many a 'lightbulb' moment where what you thought was a huge hurdle suddenly makes perfect sense and you've conquered it. Apply that to everything you write from then on out and keep searching for those lightbulb moments.

        There's a saying floating around the internet somewhere, I can't remember where I first saw it, but, regardless of the language, it's gospel truth:
        If you can't look at code you wrote 6 months ago and be completely terrified at what you wrote, you did not learn enough.

        That's the beauty and the curse of programming, there's always something new to learn and conquer

        --- Where Earth and Spirit Unite.

Re: create array of empty files and then match filenames
by stevieb (Canon) on Jan 07, 2016 at 16:01 UTC

    Welcome to the Monastery, angela2!

    The following example looks in the directory specified by $dir, checks whether each file name consists only of digits, and adds the path ($dir/$file) to the @empty_files array, but only if the file is actually empty.

    use strict;
    use warnings;

    my $dir = 'test';
    opendir my $dir_handle, $dir or die $!;
    my @files = readdir $dir_handle;
    my @empty_files;

    for my $file (@files){
        my $path = "$dir/$file";
        next if -d $path; # skip if file is a directory
        if ($file =~ /^(\d+)$/){
            next if ! -z $path; # skip if file is not empty
            push @empty_files, $path;
        }
    }

    for my $file (@empty_files){
        print "$file\n";
    }
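The example above builds "$dir/$file" before testing because readdir returns bare names, not paths; file tests on a bare name look in the current directory instead. A compact sketch of the same idea follows, assuming a scratch directory from File::Temp with made-up files ("7" empty, "8" not empty).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with hypothetical files: "7" is empty, "8" is not.
my $dir = tempdir(CLEANUP => 1);
for my $spec (['7', ''], ['8', "data\n"], ['notes', '']) {
    my ($name, $content) = @$spec;
    open my $fh, '>', "$dir/$name" or die "open $dir/$name: $!";
    print {$fh} $content;
    close $fh or die "close $dir/$name: $!";
}

opendir my $dh, $dir or die "opendir $dir: $!";
# readdir yields bare names, so test "$dir/$_" rather than $_ itself.
my @empty = grep { /^\d+$/ && -f "$dir/$_" && -z "$dir/$_" } readdir $dh;
closedir $dh;
@empty = sort @empty;

print "empty numeric files: @empty\n";
```

When $dir is '.', the bare name and the path refer to the same file, which is why a version without the prefix also works in that one case.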
Re: create array of empty files and then match filenames
by mr_mischief (Monsignor) on Jan 07, 2016 at 17:04 UTC

    I don't see in your original where you're interacting with the file system to find these files and put them into the array. The usual advice at this point is to use File::Find or File::Find::Rule. However it's not that hard for a simple case as other monks have shown. Keep the modules in mind for when you need to find things across nested subdirectories and such though.
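For the nested-subdirectory case mentioned above, a minimal File::Find sketch might look like the following. The directory tree is hypothetical and is built in a File::Temp scratch directory purely for demonstration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: empty digit-named files at two depths, plus noise.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir $top/sub: $!";
for my $spec (['4', ''], ['sub/10', ''], ['sub/11', "x\n"], ['sub/a1', '']) {
    my ($rel, $content) = @$spec;
    open my $fh, '>', "$top/$rel" or die "open $top/$rel: $!";
    print {$fh} $content;
    close $fh or die "close $top/$rel: $!";
}

my @found;
find(
    sub {
        # Inside the callback, $_ is the bare name (find has changed into
        # the containing directory) and $File::Find::name is the full path.
        return unless /^\d+$/;
        return unless -f $_ && -z _;   # _ reuses the stat from -f
        push @found, $File::Find::name;
    },
    $top,
);
@found = sort @found;

print "$_\n" for @found;
```

For a single flat directory, glob or readdir as shown in the other replies stays simpler.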

Re: create array of empty files and then match filenames
by angela2 (Sexton) on Jan 07, 2016 at 17:47 UTC

    Can I also ask one more thing that I will need later? I was just trying to sketch on a paper how it would go but didn't manage to do it. If I have two files in my directory, the same way I described, that have only numbers in their filenames, for example 8 and 9, this line

    my @emptyfiles = grep { /^\d+$/ } grep { -f } glob '*';

    would match both of them. How could I match them separately? As in, put the "biggest number" in a variable and the "biggest number minus 1" in another variable?

    I tried this:
    my $emptyfile; #structure file, new.
    my $previous; #structure file, alternative conformation.
    my @emptyfiles = grep { /^\d+$/ } grep { -f } glob '*';
    $previous = ($emptyfile)-1); #set that $previous is the previous alternative conformation.
    #print "previous is $previous \n";
    #push (@emptyfiles, $emptyfile);
    #push (@emptyfiles, $previous);
    foreach (@emptyfiles) {
        if (-z "$emptyfile") {
            print "found previous conformation: $previous \n";
            print "found new conformation: $emptyfile \n";
        } # if -z loop.
    } #foreach loop.

    But this obviously doesn't work. It keeps saying that $previous is -1, although in my example I have used files by the number of 8 and 9. So it should say yes, I found 9 and then I found 8.

    When I use the push command again it doesn't work, so I obviously don't know how to use it and have to do some homework.

    I found this http://stackoverflow.com/questions/10701210/how-to-find-maximum-and-minimum-value-in-an-array-of-integers-in-perl that mentions the module "List::Util", does that sound correct for my purpose?

    Also sorry for the extensive commenting on my script, I need it so that I won't get lost.

      You probably want something like this

      #!/usr/bin/perl
      use strict;
      my @emptyfiles = grep { /^\d+$/ } grep { -f } glob '*';
      foreach my $emptyfile (@emptyfiles) {
          my $previous = $emptyfile - 1;
          if ((-z $emptyfile) and (-f $previous)) {
              print "found previous conformation: $previous \n";
              print "found new conformation: $emptyfile \n";
          } # if -z loop.
      } #foreach loop.

      Have a look at scalars and arrays in Perl variable types.

      poj
        I wish I had seen this earlier! I tried "and" but I got compilation errors (again one more mistake lol) so in the end I did this:
        use List::Util qw( min max );
        my @numbers;
        @numbers = grep { /^\d+$/ } grep { -f } glob '*';
        my $max = max @numbers;
        my $previous = ($max)-1;
        foreach (@numbers) {
            if (-z "$max") {
                if (-z "$previous") {
                    print "previous conf: $previous \n";
                    print "Latest conf: $max \n";
                    printf "Ready for conf %d \n",++$max;
                }
            }
        }
        Now I need to think about "else" messages/actions. Sorry, I changed my array name halfway through the thread, I just used directly the "@numbers" example from List::Util that I found at stack overflow
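Since the checks in the code above do not depend on the loop variable, one possible variant drops the foreach and adds the "else" branches. This is only a sketch: it assumes a File::Temp scratch directory holding empty files named 8 and 9, and the messages in the else branches are made up.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);
use List::Util qw(max);

# Scratch directory with hypothetical empty files named 8 and 9.
my $orig = getcwd();
my $tmp  = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";
for my $name (qw(8 9)) {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

my @numbers = grep { /^\d+$/ } grep { -f } glob '*';
my ($max, $previous, $next);
if (@numbers) {
    $max      = max @numbers;
    $previous = $max - 1;
    # -z is false for a missing file, so this also covers "no previous file".
    if (-z $max && -z $previous) {
        $next = $max + 1;
        print "previous conf: $previous\n";
        print "Latest conf: $max\n";
        print "Ready for conf $next\n";
    }
    else {
        print "conf $max or conf $previous is missing or not empty\n";
    }
}
else {
    print "no numbered files found\n";
}

# Leave the scratch directory so it can be cleaned up at exit.
chdir $orig or die "chdir $orig: $!";
```

Computing $next separately, rather than with ++$max, keeps $max unchanged for any later use.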

      If you numerically sort your filenames in descending order, you will have the biggest in the first element (index 0) and the next biggest in the next element.

      Modifying the code I provided earlier:

      #!/usr/bin/perl -l
      use strict;
      use warnings;
      use autodie;

      my $dir = '.';
      opendir(my $dh, $dir);
      my @emptyfiles = sort { $b <=> $a } grep { -f && /^\d+$/ && -z } readdir $dh;
      closedir $dh;
      print for @emptyfiles;
      print "Biggest: $emptyfiles[0]";
      print "Next biggest: $emptyfiles[1]";

      Gives this output:

      123
      1
      Biggest: 123
      Next biggest: 1

      So, they're effectively in separate variables already (i.e. $emptyfiles[0] and $emptyfiles[1]). There's probably no need to use other variables but, if you really need to, just do something like this (untested):

      my ($biggest, $next_biggest) = @emptyfiles[0, 1];

      Regarding your problem with "It keeps saying that $previous is -1", there are at least two problems. Firstly, the code you posted

      $previous = ($emptyfile)-1);

      won't compile. You'll get something like: "syntax error at ... line ..., near "1)""

      So, the code you posted isn't the code you're running; however, given you don't initialise $emptyfile, subtracting 1 from it will evaluate to -1 and should also give you a warning (assuming you still have use warnings; at the top of your code). Here's a minimal example:

      $ perl -wE 'my $x; say($x - 1)'
      Use of uninitialized value $x in subtraction (-) at -e line 1.
      -1

      — Ken

        Hi! I tried the "sort" function for a different thing I wanted to do, but it doesn't work as expected.
        my @confs = ();
        push (@confs, $confs);
        my @sortedconfs;
        @sortedconfs = sort {$a cmp $b} @confs;
        print @sortedconfs;
        print "\n";

        So I put all my conformations ($confs) in an array (@confs) and then I want to sort them numerically. However the output I get is only one file, the biggest number one, not a row of numbers sorted numerically. What do you think might be going wrong?

        I'm also concerned by the fact that if I do "sort {$b cmp $a}" I get the same result as with {$a cmp $b}, shouldn't I get the reversed order? The variable $confs is my numbered chemical conformations, which is always 4 characters long in the style of B001, B002, B003 and so on. This variable is in my filenames along with other similar ones (similar to a filename like MOLEC1-B001-OPT-FREQ2), so I want to sort my files, according to their chemical conformations, numerically. But if I do "sort {$a <=> $b}" it complains that "it's not a numerical value", that's why I tried with cmp as seen here http://www.perlmonks.org/?node_id=259465. Any thoughts?

        edit: I read a bit more and found that in mixed strings I should extract the numerical value and sort the substring, so I changed the previous "sort" line with "{ substr($a, 1) <=> substr($b, 1) }" but I get the same output :(
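Going only by the snippet shown, one possible explanation is that @confs holds a single element when sort runs (for example, if "my @confs = ();" sits inside the loop that does the push), in which case every sort order prints the same thing. The sketch below assumes the file names follow the MOLEC1-B001-OPT-FREQ2 pattern from the post; the @files list itself is made up.

```perl
use strict;
use warnings;

# Hypothetical file names in the MOLEC1-B001-OPT-FREQ2 style from the post.
my @files = qw(
    MOLEC1-B010-OPT-FREQ2
    MOLEC1-B002-OPT-FREQ2
    MOLEC1-B001-OPT-FREQ2
);

# Declare the array once, outside the loop, so each push adds to it.
my @confs;
for my $file (@files) {
    # Capture the conformation label, e.g. B001.
    next unless $file =~ /-(B\d{3})-/;
    push @confs, $1;
}

# Sort on the numeric part after the leading letter.
my @sortedconfs = sort { substr($a, 1) <=> substr($b, 1) } @confs;

print "@sortedconfs\n";
```

Printing "@sortedconfs" in double quotes separates the elements with spaces, whereas print @sortedconfs runs them together, which can make a sorted list hard to read. With fixed-width, zero-padded labels like these, a plain cmp sort gives the same order.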
