PerlMonks  

regex for identifying encrypted text

by skendric (Novice)
on May 16, 2018 at 10:06 UTC (#1214624)

skendric has asked for the wisdom of the Perl Monks concerning the following question:

I write scripts which compare two text files and then do interesting things if they are different.
    use Text::Diff qw(diff);
    [...]
    $diff = diff "$config_dir/$config_old", "$config_dir/$config_new", { STYLE => "OldStyle" };
    @diff = split '\n', $diff;
    [...]
Typically, I want to ignore certain changes ... in the example below, I am uninterested in lines which contain the string 'set password ENC'. I end up writing code like:
    LINE:
    for my $line (@diff) {
        next LINE if $line =~ /set password ENC/;
        [...]
    }
Now, I'm discovering that I am uninterested in changes to private keys ... a typical line in a file might look like this:
    set private-key "-----BEGIN ENCRYPTED PRIVATE KEY-----
    MIIFDjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQInXCep+2zzpgCAggA
    MBQGCCqGSIb3DHMHBAiSZZZ3CUL1cQSCBNhxHiU0wI3XOMU05aVZybU6OOJOJBa/
    M+b28ad6P8VZiN+eToUfs3pTg+VqzAc273fdnZPZFMClXpJk8kQZv0ruEoA99RqE
    pgsnYGVxzZNmDy5HT3yBDGjRCssDnQ8QUBqabFCpW6d7fzilw9PnoHjFRmLxKnNE
    [...]
I'm struggling to figure out how to ignore such lines. My brain wants to construct a regex which identifies "random strings", so that I could write a line like:
    next LINE if $line =~ /{looks like random stuff to me}/;
(1) Suggestions on how to construct such a regex?
(2) Suggestions on how to tackle the problem differently?

--sk

Replies are listed 'Best First'.
Re: regex for identifying encrypted text
by Eily (Monsignor) on May 16, 2018 at 10:22 UTC

    That looks like Base64 encoding. There are clues that can help you identify those lines: they are 64 characters wide (except maybe the last one), and they contain only the characters allowed by Base64 (so no spaces). But since you have "BEGIN ENCRYPTED PRIVATE KEY", I'm guessing you might also have an END. If that's the case, the better solution might be to ignore all the lines between those two tokens. Using the "from..to" version of the .. operator, this could be something like:

        next LINE if ($line=~/BEGIN ENCRYPTED/)..($line=~/END ENCRYPTED/);
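A minimal sketch of the range-operator idea, applied to the raw file lines rather than to diff output (a diff may not contain the unchanged BEGIN and END lines). The helper name `skip_key_blocks`, the `set hostname` line, and the presence of an END marker are assumptions made for illustration; only the BEGIN line and the first key line come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines outside BEGIN ... END key blocks.
sub skip_key_blocks {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        # In scalar context the range operator acts as a flip-flop: it becomes
        # true on the BEGIN line and stays true up to and including the END line.
        # Its state persists between calls, so an unterminated block in one
        # call would carry over into the next.
        next if $line =~ /BEGIN ENCRYPTED/ .. $line =~ /END ENCRYPTED/;
        push @kept, $line;
    }
    return @kept;
}

# Sample input; the END line is assumed, it is not shown in the question.
my @sample = (
    'set hostname fw01',
    'set private-key "-----BEGIN ENCRYPTED PRIVATE KEY-----',
    'MIIFDjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQInXCep+2zzpgCAggA',
    '-----END ENCRYPTED PRIVATE KEY-----"',
    'set password ENC abc',
);

print "$_\n" for skip_key_blocks(@sample);
```

With this sample only the two lines outside the key block are printed. The `set password ENC` filter from the question would still be applied separately.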

Re: regex for identifying encrypted text
by hippo (Chancellor) on May 16, 2018 at 10:28 UTC
    (1) Suggestions on how to construct such a regex?

    Very difficult to avoid false positives because the last line of the key may only be a few characters.

    (2) Suggestions on how to tackle the problem differently?

    Text::Diff will happily work on arrays/scalars as well as files, so pre-process your inputs to remove any private keys before doing the diff - that way they are easy to identify.
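One way the pre-processing could look, as a sketch only: each file is held in a scalar, the key blob is replaced with a placeholder, and the cleaned scalars are passed to Text::Diff by reference. The helper name `strip_private_keys`, the placeholder text, the sample config, and the existence of a matching END marker are assumptions, not something the thread confirms.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every private-key blob with a fixed placeholder.
sub strip_private_keys {
    my ($text) = @_;
    # /s lets . match newlines, so one match spans the whole multi-line blob;
    # .*? is non-greedy, so each BEGIN pairs with the nearest END.
    $text =~ s/-----BEGIN ENCRYPTED PRIVATE KEY-----.*?-----END ENCRYPTED PRIVATE KEY-----/<private key removed>/gs;
    return $text;
}

# Sample config; the END line is assumed.
my $old = <<'EOT';
set hostname fw01
set private-key "-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFDjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQInXCep+2zzpgCAggA
-----END ENCRYPTED PRIVATE KEY-----"
EOT

# Simulate a config whose only change is a re-encrypted key.
(my $new = $old) =~ s/MIIFDj/XXXXXX/;

my ($old_clean, $new_clean) = map { strip_private_keys($_) } $old, $new;

# Text::Diff is not a core module, so only call it when it is installed.
if (eval { require Text::Diff; 1 }) {
    print Text::Diff::diff(\$old_clean, \$new_clean, { STYLE => 'OldStyle' });
}
print $old_clean eq $new_clean ? "no relevant changes\n" : "configs differ\n";
```

Because the placeholder is identical in both inputs, a key change alone no longer shows up in the diff, while the `set private-key` line itself is still compared.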

Re: regex for identifying encrypted text
by QM (Parson) on May 16, 2018 at 10:18 UTC
    I can think of 2 options:

    1) Write a regex that recognizes the full encryption multiline blob. This will require modifying how lines are defined, etc.

    2) That's pretty much it, unless you want to have the odd false positive, and decide that any string longer than 30 chars without whitespace or certain punctuation is actually part of a key. And also, not catch certain fumble finger changes where some such string was accidentally introduced in a comment or other free-form, non-parsed section. (I'm assuming that such a change in code would fail to parse, and would be caught fairly quickly.)

    Do you have a list of valid characters in private keys and such?
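For PEM-style keys the body is Base64, whose alphabet is A-Z, a-z, 0-9, + and /, with = used only as trailing padding. A rough sketch of the "long run of key-like characters" heuristic follows; the 30-character threshold is the one suggested above, while the name `looks_like_key_material` and the optional `<` or `>` prefix (as emitted by an old-style diff) are assumptions.

```perl
use strict;
use warnings;

# Heuristic: an optional old-style diff prefix, then a line consisting solely
# of 30 or more Base64 characters and at most two '=' padding characters.
my $BASE64_LINE = qr{^(?:[<>]\s)?\s*[A-Za-z0-9+/]{30,}={0,2}\s*$};

# Hypothetical helper: true if the line looks like a full line of key material.
sub looks_like_key_material {
    my ($line) = @_;
    return $line =~ $BASE64_LINE ? 1 : 0;
}

for my $line (
    'MIIFDjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQInXCep+2zzpgCAggA',
    '> MIIFDjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQInXCep+2zzpgCAggA',
    'set password ENC',
    'Xy9=',    # made-up short final line of a key
) {
    printf "%-5s %s\n", (looks_like_key_material($line) ? 'skip' : 'keep'), $line;
}
```

The last sample shows the weakness already noted in the thread: a short final line of a key falls under the threshold and is kept, and lowering the threshold would start matching ordinary words.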

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: regex for identifying encrypted text
by cavac (Curate) on May 16, 2018 at 12:22 UTC

    So the problematic block of lines starts with set private ", then some junk that doesn't contain quotes, and then it ends with a quote, right?

    You seem to be parsing this line-by-line, correct? So, modifying your code to make a very rudimentary state machine, I'd guess something like this would do:

        my $isprivatekey = 0;
        LINE:
        for my $line (@diff) {
            if ($isprivatekey) {
                if ($line =~ /\"/) {
                    # last line of private key
                    $isprivatekey = 0;
                }
                next;
            }
            if ($line =~ /set\ private\-key/) {
                # uh, get some useless stuff here
                $isprivatekey = 1;
                next;
            }
            next LINE if $line =~ /set password ENC/;
            [...]
        }

    This should skip the whole private key block altogether.

    "For me, programming in Perl is like my cooking. The result may not always taste nice, but it's quick, painless and it get's food on the table."

      The problem with this approach is that the line with "set private" is the same in both files and therefore will not be featured in the diff. The data between these markers must be removed before the diff is performed.

        Would diff with context help? Might make the processing simpler in the long run.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: regex for identifying encrypted text
by james28909 (Deacon) on May 16, 2018 at 16:44 UTC
    Can you give an example of a file that you parse? I would say you could just change the input record separator to paragraph mode (e.g. local $/ = "";) when you find "set private-key ", then read one more time with <DATA>. I am not sure whether the encrypted data ends with '[...]' or "\n\n" or '"' or something else.

    Please post an example file.
        # here is a small test I did with the examples given in the OP
        use strict;
        use warnings;

        while (<DATA>) {
            if (/set private-key/) {
                print "\nsetting input separator to paragraph mode\n";
                # record separator will change itself back to default
                # AFTER leaving the if block
                local $/ = "";
                print "skipping encrypted data\n";
                <DATA>;
                print "encrypted data skipped and should not be printed !\n\n";
                next;
            }
            print if /\w+/;
        }

        __DATA__
        random data
        more random data
        blah blah
        one last test BEFORE! encrypted data!
        set private-key "-----BEGIN ENCRYPTED PRIVATE KEY-----
        sdfkjghsdlkhfgldkfjghldkfjgh
        sdflkjgdfgl;kd;lfkgjdlfkgjd;l
        dlkjfghlkdfjghldskfjhgldskfjhg

        this is AFTER encrypted data
        more rand0m data
        this is the last test AFTER! encrypted data.
    The idea here is to read the file(s) one line at a time until you find the desired string. Once you find that string, change the input record separator so that the next read stops at the end of the encrypted data (if possible). If the encrypted data is always the same length, you could instead find the desired string and then read that fixed length every time.
    EDIT: Cleaned up post... a little.
Re: regex for identifying encrypted text
by skendric (Novice) on Jun 29, 2018 at 22:57 UTC
    I ended up:

    (a) Pre-parsing both files into arrays

    (b) Using a crude state machine to skip the BEGIN ... END sections

    (c) Then handing the shrunken arrays to Text::Diff

    Thank you all for helping me to think through this.

    P.S. This script now allows me to checkpoint my firewall config files into a config database, so that I can do things like report on how frequently we change firewall configuration.
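The three steps (a) to (c) might be sketched roughly as follows. This is not skendric's actual script, which was not posted; the helper name `drop_key_sections`, the sample lines, and the marker patterns are assumptions, and the arrays here are filled in by hand instead of being read from files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the lines, dropping everything from a BEGIN marker
# through the matching END marker (a two-state machine: inside or outside a key).
sub drop_key_sections {
    my ($lines) = @_;    # array ref of config lines
    my $in_key = 0;
    my @kept;
    for my $line (@$lines) {
        if ($in_key) {
            # Still inside a key; leave the state once the END marker appears.
            $in_key = 0 if $line =~ /-----END [A-Z ]*PRIVATE KEY-----/;
            next;
        }
        if ($line =~ /-----BEGIN [A-Z ]*PRIVATE KEY-----/) {
            # Enter the key state unless the key ends on the same line.
            $in_key = 1 unless $line =~ /-----END [A-Z ]*PRIVATE KEY-----/;
            next;
        }
        push @kept, $line;
    }
    return \@kept;
}

# (a) Both configs as arrays of lines; in practice these would be read from files.
my @old = (
    "set hostname fw01\n",
    "set private-key \"-----BEGIN ENCRYPTED PRIVATE KEY-----\n",
    "AAAA\n",
    "-----END ENCRYPTED PRIVATE KEY-----\"\n",
    "set timeout 30\n",
);
my @new = @old;
$new[2] = "BBBB\n";              # key changed: should be ignored
$new[4] = "set timeout 60\n";    # real change: should be reported

# (b) Skip the BEGIN ... END sections.
my ($old_kept, $new_kept) = map { drop_key_sections($_) } \@old, \@new;

# (c) Hand the shrunken arrays to Text::Diff, if that module is installed.
if (eval { require Text::Diff; 1 }) {
    print Text::Diff::diff($old_kept, $new_kept, { STYLE => 'OldStyle' });
}
```

Dropping the BEGIN line as well means the `set private-key` statement itself is no longer compared; keeping that line and dropping only the key body would be a small variation.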

      I have to write a config diff script with exactly the same requirement. Would it be possible for you to share this part of your script? With diff -I I'm able to filter lines like 'set password ENC' etc., but I wasn't able to filter the certificate part.
