 
PerlMonks  

Binary Comparision

by punklrokk (Scribe)
on Feb 27, 2007 at 23:36 UTC ( [id://602412]=perlquestion )

punklrokk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I'm trying to write a basic virus scanner, and I'm unable to get binary strings to match. Is there anything obvious I'm missing in my code? Thanks!

    $x=$y=0;
    $data_file='C:\scripts\sample_virus';
    open(FH, "<:raw", $data_file) or die("Could not open file dumbass!");
    @raw_data=<FH>;
    close(FH);

    $virus_defs='c:\scripts\signatures'; #puts signatures into array @def_input
    open(FH, "<:raw", $virus_defs) or die ("Could not open $virus_defs");
    @def_input=<FH>;
    close(FH);

    $fileLen=scalar(@raw_data); #find number of file lines
    print "Array length: $fileLen\n";

    foreach $line (@def_inputs) { #puts text file into a stepped array
        #print "x= $x\n";
        @defs[$x]=$line;
        #print "@defs[$x]\n";
        $x+=1;
    }
    print "\n";
    $x=0; #reset counter

    foreach $line (@raw_data) { #compares line by line for virus
        if ($line eq @virus[$x]) {
            print "Infection line $x \n";
            $x+=1;
        }
        {
            print "$line\n";
            print "@virus[$x]\n";
        }
    }

JP Bourget (punklrokk) MS Information and Security Rochester Institute of Technology Rochester, NY

Replies are listed 'Best First'.
Re: Binary Comparision
by samtregar (Abbot) on Feb 27, 2007 at 23:52 UTC
    You're missing a call to binmode(), which is essential for reading and writing binary files under Windows. You're also trying to read a binary file into an array of lines. Binary files don't really have lines, so this won't work. Even if they did, you can't be sure your checker can identify a virus by looking at just one line of a file, so you'll have to be able to match across arbitrary sections of the file.

    On a higher level, a virus checker can't really work by loading the entire file into memory - some files you'll have to check won't fit in memory!

    -sam

      You're missing a call to binmode(),

      The OP used :raw in the open, a suitable alternative.

      #!perl -l

      open(my $fh, '<', $0);
      print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt');   # txt

      open(my $fh, '<', $0);
      binmode($fh);
      print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt');   # bin

      open(my $fh, '<', $0);
      binmode($fh, ':raw');
      print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt');   # bin

      open(my $fh, '<:raw', $0);
      print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt');   # bin

      How would I read the file other than into an array? Is there a way to access it a bit at a time? JP

        How would I read the file other than into an array?

        You can read the entire file into a scalar using the following snippet:

        my $raw_data;
        {
            local $/;
            $raw_data = <FH>;
        }
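
For illustration, here is a minimal sketch of the slurp-then-search idea applied to bytes that contain CR/LF and NUL characters. The in-memory filehandle and the sample bytes are assumptions made up for this example (a real scanner would open a path on disk); the signature string is the one quoted later in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a file on disk: an in-memory filehandle
# holding bytes that include CR/LF and NUL characters.
my $bytes = "MZ\x90\x00\r\nsome\x00payload,4044484<4\@4D4H4\r\ntail";
open(my $fh, '<', \$bytes) or die "Cannot open in-memory file: $!";
binmode($fh);

# Slurp everything into one scalar by undefining the record separator.
my $raw_data = do { local $/; <$fh> };
close($fh);

# The signature is treated as a literal byte string, not as a regex.
my $signature = ',4044484<4@4D4H4';
my $pos = index($raw_data, $signature);

print $pos >= 0 ? "Signature found at byte offset $pos\n"
                : "Signature not found\n";
```

Because the whole file sits in one scalar, "line" boundaries no longer matter, at the cost of holding the entire file in memory.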

        Is there a way to access it a bit at a time?

        Yes, using read.

        You can set $/ to a reference to a literal number. Then the file will be read in chunks of that size.

        {
            local $/ = \1024;
            while (<$fh>) {
                # do things with 1k chunks in $_
            }
        }
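
As a small runnable sketch of both suggestions (fixed-size records via $/ and explicit read calls), the following reads a made-up 10-byte "file" in 4-byte pieces. The in-memory filehandle and the tiny chunk size are assumptions chosen only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical 10-byte "file" held in memory; a real scanner would open a path.
my $bytes = "ABCDEFGHIJ";

# Variant 1: a reference to an integer in $/ makes <$fh> return fixed-size records.
open(my $fh, '<', \$bytes) or die "Cannot open in-memory file: $!";
binmode($fh);
my @chunks;
{
    local $/ = \4;
    while (my $chunk = <$fh>) {
        push @chunks, $chunk;
    }
}
close($fh);

# Variant 2: read() fills a buffer with up to the requested number of bytes
# and returns 0 at end of file.
open($fh, '<', \$bytes) or die "Cannot open in-memory file: $!";
binmode($fh);
my @read_chunks;
while (read($fh, my $buf, 4)) {
    push @read_chunks, $buf;
}
close($fh);

printf "%d chunks via \$/: %s\n",   scalar(@chunks),      join('|', @chunks);
printf "%d chunks via read: %s\n", scalar(@read_chunks), join('|', @read_chunks);
```

Either way the last chunk may be shorter than the requested size, so code that works on the chunks should not assume a fixed length.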

        After Compline,
        Zaxo

Re: Binary Comparision
by ikegami (Patriarch) on Feb 28, 2007 at 00:16 UTC

    if ($line eq @virus[$x])
    should be
    if (index($line, $virus[$x]) >= 0)
    because the signature might not occupy the entire "line".

    Furthermore, the virus could span more than one "line", so
    if (index($line, $virus[$x]) >= 0)
    should be
    if (index($raw_data, $virus[$x]) >= 0)

    If you start using read, be careful not to reintroduce the problem. The signature could span more than one block of data.

    Next, you should optimize your algo to search for more than one virus at once. For example, if Virus.A has signature "asdfgjkl" and Virus.B has signature "asdfhjkl", it could be faster to search using /asdf[gh]jkl/.
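
A rough sketch of that idea, assuming a hypothetical %signatures table that maps virus names to literal byte strings: rather than hand-writing /asdf[gh]jkl/, the pattern is built as an alternation of quotemeta-escaped signatures. Perl 5.10 and later can optimize alternations of literal strings internally, though whether this beats repeated index calls is something to measure, not assume.

```perl
use strict;
use warnings;

# Hypothetical signature table: virus name => literal byte string.
my %signatures = (
    'Virus.A' => 'asdfgjkl',
    'Virus.B' => 'asdfhjkl',
);

# quotemeta escapes regex metacharacters so each signature matches literally.
my $alternation = join '|', map { quotemeta } sort values %signatures;
my $any_sig     = qr/($alternation)/;

# Reverse lookup from the matched bytes back to the virus name.
my %name_for = reverse %signatures;

# Made-up sample data containing one of the two signatures.
my $data = "\x00\x01junk asdfhjkl more\x00junk";
my @found;
while ($data =~ /$any_sig/g) {
    push @found, $name_for{$1};
}
print "Found: @found\n";
```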

      When I've built these sorts of byte and bit scanners in the past I've tended to use two buffers to read into. I make one buffer the size of the largest search string and the other twice that size. Then I scan across the larger buffer to the halfway point, move the 2nd half of the data in the buffer up and stick a new chunk on it.

      Later, I gave up on that and just used a single 4k or 8k buffer, loading new data into its second half and scanning in place. This strategy beats the file-size limit and the span issue at the same time.

      Bit-wise scanning sucks, tho. That just isn't fun in Perl 5.

      --
      $you = new YOU;
      honk() if $you->love(perl)

        Thanks, but I've already implemented that in my other post. I even optimized it to limit the number of bytes that are scanned twice.

        $block = substr($block, -($longuest_sig_len-1)) . $_;
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^
                 Minimal portion of last block            New block
Re: Binary Comparision
by graff (Chancellor) on Feb 28, 2007 at 01:28 UTC
    What do you mean by "binary string"? Are your input files actually line-oriented text data, as implied by your use of while (<>), or are they streams of non-text byte values (e.g. machine instructions, values to be loaded into registers, etc)? If the latter, is there any sort of structure for either of the inputs (fixed record length, or some sort of record delimiter)?

    I expect the information given in earlier replies will be helpful, but based on the information you haven't given (sample data, or some description of it), I can't really tell what should be changed in your code, let alone how to change it.

      Here is the example binary; it's streams of machine code. It's actually in Portable Executable (PE) format. I'm not sure how to figure out whether there is structure or not (in the context of line breaks, etc.).

      All that being said, I just realized my approach needs to be modified to basically take my defs (say we are looking for ,4044484<4@4D4H4) and look for them within streams, which do have an end; I just haven't figured out how to find it yet.

      I see I actually want to do a match /,4044484<4@4D4H4/ on each line as opposed to comparing them, which I can handle. The part I'm still having trouble grasping is how to treat this binary data. I think I'm just used to always working with text, so this binary is throwing me off.

      Hope this helps you guys help me... (See "virus.a" below, cut wayyy short.)

      JP

      [Binary dump of "virus.a" omitted: the pasted bytes did not survive as text. Recognizable fragments include the "MZP" header, the stub message "This program must be run under Win32", the "PE" signature, the section names .text, .data, .tls, .rdata, .idata, .edata, .rsrc and .reloc, and class-name strings such as Sysutils::Exception, System::TObject, System::AnsiString, Forms::TForm, TForm1, Controls::TWinControl, Classes::TComponent, THintAction, TCustomImageList and TCustomActionList.]

        There are no differences[*] between text and binary files except how you open them. Your plan would fail for text too. Consider trying to match "def\nghi" in a file whose content is "abcdef\nghijkl". You have the same problem whether the file is text (lines) or binary (blocks). The problem you really have is not text vs binary. If you solve this problem for text files, you also solve it for binary files.

        If you know the length of the longest signature, you could use

        my $longuest_sig_len = ...;
        my $block_size = 4096;
        # If a signature is longer than one block, round the block size
        # up to the next multiple of 1024 bytes.
        $block_size = int(($longuest_sig_len + 1023) / 1024) * 1024
            if $block_size < $longuest_sig_len;

        local $/ = \$block_size;
        my $block = '';
        while (<$fh>) {
            # Keep the tail of the previous block so that a signature
            # spanning two reads is still seen in full.
            $block = substr($block, -($longuest_sig_len-1)) . $_;
            ... search for signature in $block ...
        }

        That's the approach I'd take if I was looking for one string. There are surely algorithms that are more efficient at searching for a number of strings.

        * — You can even use while (<FILE>) on a binary file, but it might read more than you expect. Setting $/ to a reference to a number (e.g. $/ = \1024; and $block_size = 1024; $/ = \$block_size;) solves that.
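
To make the overlapping-block idea concrete, here is a self-contained sketch. The helper name scan_handle, the sample signatures, and the deliberately tiny 8-byte block size are all assumptions for illustration; it also tracks byte offsets and skips matches that lie wholly inside the carried-over tail, which is an addition beyond what the thread describes.

```perl
use strict;
use warnings;

# Scan a filehandle for any of @$sigs in fixed-size chunks, carrying over
# the last (longest signature - 1) bytes so a match can span two chunks.
# Returns a list of [byte offset, signature] pairs.
sub scan_handle {
    my ($fh, $sigs, $block_size) = @_;

    my $longest = 0;
    for my $sig (@$sigs) {
        $longest = length($sig) if length($sig) > $longest;
    }
    my $overlap = $longest - 1;
    $block_size = $longest if $block_size < $longest;

    my @hits;
    my $tail     = '';   # bytes carried over from the previous chunk
    my $consumed = 0;    # bytes read from the file so far
    local $/ = \$block_size;
    while (defined(my $chunk = <$fh>)) {
        my $block = $tail . $chunk;
        my $base  = $consumed - length($tail);   # file offset of $block's first byte
        for my $sig (@$sigs) {
            my $pos = -1;
            while (($pos = index($block, $sig, $pos + 1)) >= 0) {
                # A match lying entirely in the tail was reported last time round.
                next if $pos + length($sig) <= length($tail);
                push @hits, [$base + $pos, $sig];
            }
        }
        $consumed += length($chunk);
        $tail = length($block) > $overlap
              ? substr($block, length($block) - $overlap)
              : $block;
    }
    return @hits;
}

# Hypothetical test data: the signature straddles the 8-byte chunk boundary.
my $data = "\x00" x 5 . "EVILSIG" . "\x00" x 5;
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
binmode($fh);
my @hits = scan_handle($fh, ['EVILSIG', 'NOPE'], 8);
close($fh);
print "offset $_->[0]: $_->[1]\n" for @hits;
```

In real use the block size would be something like 4096 or larger; the small value here only forces the boundary case so that the overlap logic is exercised.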

        MZPÿÿ¸@º´
Re: Binary Comparision
by GrandFather (Saint) on Feb 28, 2007 at 00:16 UTC

    You are missing use strict; use warnings; and a bunch of declarations.

    Probably also you intend $defs[$x]=$line where you have written @defs[$x]=$line (and similarly for @virus).
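
A short sketch of what those two points look like in practice, using made-up signature lines in place of the real definitions file. Under use strict, a misspelled array name such as @def_inputs (the original reads into @def_input) is reported at compile time instead of silently yielding an empty list. The chomp is an assumption that the trailing newline is not meant to be part of a signature.

```perl
use strict;
use warnings;

# Hypothetical signature lines, as they might be read from a definitions file.
my @def_input = ("asdfgjkl\n", "asdfhjkl\n");

my @defs;
for my $line (@def_input) {
    chomp(my $sig = $line);   # drop the trailing newline so it is not part of the signature
    push @defs, $sig;         # push avoids a hand-maintained index counter
}

# $defs[0] is a single element; @defs[0] would be a one-element slice.
my $first = $defs[0];
print "Loaded ", scalar(@defs), " signatures; first is '$first'\n";
```

Note that a file read with :raw on Windows keeps "\r\n" line endings, so chomp alone may leave a trailing "\r" on each signature.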


    DWIM is Perl's answer to Gödel
Re: Binary Comparision
by rhesa (Vicar) on Feb 28, 2007 at 00:31 UTC
    Have you considered using an existing virus scanner? Clam AntiVirus comes to mind...
Re: Binary Comparision
by hangon (Deacon) on Feb 28, 2007 at 09:21 UTC

    Binary files don't necessarily have a structure similar to line breaks. The easiest way to deal with them is to read the entire file into a variable, but file size and available memory can limit this method. Reading blocks of data solves that problem, but you need to overlap on each read in case what you're looking for spans the boundary between two blocks.

    Here's another approach using read.

    open(FH, '<', $data_file) or die "Could not open $data_file: $!";
    binmode FH;

    # set the block size as large as practical
    # absolutely must be larger than any signature
    my $blocksize = 65536;

    # get length of largest signature
    my $signaturelength = ...;

    my $offset = 0;
    my $block;
    seek FH, $offset, 0;
    while (read FH, $block, $blocksize) {
        # check each signature against block
        for my $signature (@def_inputs) {
            if (index($block, $signature) >= 0) {
                # do signature found stuff
            }
        }
        # seek to next block, backing up by one signature length so the blocks overlap
        $offset += ($blocksize - $signaturelength);
        seek FH, $offset, 0;
    }
Re: Binary Comparision
by zentara (Archbishop) on Feb 28, 2007 at 13:08 UTC
    Although virus scanning can be done with Perl, you will be unsatisfied with the speed. You should look at ClamAV. It has a utility for making a binary fingerprint of any virus and adding it to its database of what to scan for. There are Perl modules available to use ClamAV.

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
