punklrokk has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks!
So I'm trying to write a basic virus scanner, and I'm unable to get binary strings to match. Is there anything obvious I'm missing in my code? Thanks!
$x=$y=0;
$data_file='C:\scripts\sample_virus';
open(FH, "<:raw", $data_file) or die("Could not open file dumbass!");
@raw_data=<FH>;
close(FH);
$virus_defs='c:\scripts\signatures'; #puts signatures into array @def_input
open(FH, "<:raw", $virus_defs) or die ("Could not open $virus_defs");
@def_input=<FH>;
close(FH);
$fileLen=scalar(@raw_data); #find number of file lines
print "Array length: $fileLen\n";
foreach $line (@def_inputs) { #puts text file into a stepped array
    #print "x= $x\n";
    @defs[$x]=$line;
    #print "@defs[$x]\n";
    $x+=1;
}
print "\n";
$x=0; #reset counter
foreach $line (@raw_data) { #compares line by line for virus
    if ($line eq @virus[$x]) {
        print "Infection line $x \n";
        $x+=1;
    } {
        print "$line\n";
        print "@virus[$x]\n";
    }
}
JP Bourget (punklrokk)
MS Information and Security
Rochester Institute of Technology
Rochester, NY
Re: Binary Comparision
by samtregar (Abbot) on Feb 27, 2007 at 23:52 UTC
|
You're missing a call to binmode(), which is essential for reading and writing binary files under Windows. You're also trying to read a binary file into an array of lines. Binary files don't really have lines, so this won't work. Even if they did, you can't be sure your checker can identify a virus by looking at just one line of a file, so you'll have to be able to match across arbitrary sections of the file.
On a higher level, a virus checker can't really work by loading the entire file into memory - some files you'll have to check won't fit in memory!
-sam
|
#!perl -l
open(my $fh, '<', $0);
print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt'); # txt
open(my $fh, '<', $0);
binmode($fh);
print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt'); # bin
open(my $fh, '<', $0);
binmode($fh, ':raw');
print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt'); # bin
open(my $fh, '<:raw', $0);
print(substr(<$fh>, -2) eq "\r\n" ?'bin':'txt'); # bin
|
How would I read the file other than into an array? Is there a way to access it a bit at a time?
JP
|
my $raw_data; { local $/; $raw_data = <FH>; }
Is there a way to access it a bit at a time?
Yes, using read.
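A minimal sketch of the read approach, assuming a 1024-byte chunk size and an in-memory filehandle standing in for the real file. The names read_chunks and $chunk_size are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Read a binary filehandle in fixed-size chunks and return them as a list.
# $chunk_size is an arbitrary choice for this sketch.
sub read_chunks {
    my ($fh, $chunk_size) = @_;
    binmode $fh;                       # no newline translation on Windows
    my @chunks;
    while (1) {
        my $got = read($fh, my $buf, $chunk_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;             # end of file
        push @chunks, $buf;
    }
    return @chunks;
}

# Demo on an in-memory "file" so the sketch runs without any real sample.
my $data = join '', map { chr($_ % 256) } 0 .. 2499;
open my $fh, '<', \$data or die "Could not open in-memory file: $!";
my @chunks = read_chunks($fh, 1024);
close $fh;
printf "%d chunks, last one is %d bytes\n", scalar @chunks, length $chunks[-1];
```

Collecting every chunk in a list is only for the demo; a real scanner would inspect each chunk as it arrives and then let it go.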
|
{
    local $/ = \1024;
    while (<$fh>) {
        # do things with 1k chunks in $_
    }
}
Re: Binary Comparision
by ikegami (Patriarch) on Feb 28, 2007 at 00:16 UTC
|
if ($line eq @virus[$x])
should be
if (index($line, $virus[$x]) >= 0)
because the signature might not occupy the entire "line".
Furthermore, the virus could span more than one "line", so
if (index($line, $virus[$x]) >= 0)
should be
if (index($raw_data, $virus[$x]) >= 0)
If you start using read, be careful not to reintroduce the problem. The signature could span more than one block of data.
Next, you should optimize your algo to search for more than one virus at once. For example, if Virus.A has signature "asdfgjkl" and Virus.B has signature "asdfhjkl", it could be faster to search using /asdf[gh]jkl/.
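As a rough sketch of the "search for more than one virus at once" idea, the signatures can be joined into a single alternation. The %signatures table and the find_signatures helper are hypothetical; only the two example strings come from the post above.

```perl
use strict;
use warnings;

# Hypothetical signature table for illustration: name => raw byte string.
my %signatures = (
    'Virus.A' => "asdfgjkl",
    'Virus.B' => "asdfhjkl",
);

# Build one alternation. quotemeta keeps bytes such as "." or "(" literal,
# and longer signatures go first so a short one cannot shadow a longer one.
my %name_for = reverse %signatures;
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } values %signatures;
my $sig_re = qr/($alternation)/;

# Return the names of all signatures found in $data, in order of appearance.
sub find_signatures {
    my ($data) = @_;
    my @found;
    while ($data =~ /$sig_re/g) {
        push @found, $name_for{$1};
    }
    return @found;
}

my @hits = find_signatures("\x00\x01asdfhjkl\xff\xfeasdfgjkl\x00");
print "Found: @hits\n";
```

Note that a /g scan resumes after each match, so two signatures that overlap in the data would be reported as one hit. For a yes/no "is this file infected" check that is usually acceptable.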
|
When I've built these sorts of byte and bit scanners in the past I've tended to use two buffers to read into. I make one buffer the size of the largest search string and the other twice that size. Then I scan across the larger buffer to the halfway point, move the 2nd half of the data in the buffer up and stick a new chunk on it.
Later, I gave up on that and just used a single 4 or 8k buffer: I'd load new data into the second half and work in it. This strategy beats the file-size limit and the span issue at the same time.
Bit-wise scanning sucks, tho. That just isn't fun in Perl 5.
--
$you = new YOU;
honk() if $you->love(perl)
|
Thanks, but I've already implemented that in my other post. I even optimized it to limit the number of bytes that are scanned twice.
$block = substr($block, -($longuest_sig_len-1)) . $_;
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^
         Minimal portion of last block            New block
Re: Binary Comparision
by graff (Chancellor) on Feb 28, 2007 at 01:28 UTC
|
What do you mean by "binary string"? Are your input files actually line-oriented text data, as implied by your reading them line by line with <FH>, or are they streams of non-text byte values (e.g. machine instructions, values to be loaded into registers, etc.)? If the latter, is there any sort of structure for either of the inputs (fixed record length, or some sort of record delimiter)?
I expect the information given in earlier replies will be helpful, but based on the information you haven't given (sample data, or some description of it), I can't really tell what should be changed in your code, let alone how to change it.
|
Here is the example binary; it's streams of machine code. It's actually in Portable Executable (PE) format. I'm not sure how to figure out whether there is structure or not (in the context of line breaks, etc.).
All that being said, I just realized my approach needs to be modified to take my defs (say we are looking for ,4044484<4@4D4H4) and find them within streams, which do have an end; I just haven't figured out how to find them yet.
I see I actually want to do a match /,4044484<4@4D4H4/ on each line as opposed to comparing them. I can handle that. The part I'm still having trouble grasping is how to treat this binary data. I think I'm just used to always working with text, so this binary is throwing me off.
Hope this helps you guys help me... (See "virus.a" below, cut wayyy short.)
JP
MZP ... !L!This program must be run under Win32
PE L ...
.text .data .tls .rdata .idata .edata .rsrc .reloc
fb:C++HOOK ...
[rest of the dump omitted: unprintable machine code interleaved with readable strings such as Sysutils::Exception, System::TObject, System::AnsiString, TForm1, Forms::TForm, Forms::TCustomForm, Controls::TWinControl, Classes::TComponent, THintAction, TCustomImageList, TCustomActionList, TTextLayout]
|
There are no differences[*] between text and binary files except how you open them. Your plan would fail for text too. Consider trying to match "def\nghi" in a file whose content is "abcdef\nghijkl". You have the same problem whether the file is text (lines) or binary (blocks). The problem you really have is not text vs binary. If you solve this problem for text files, you also solve it for binary files.
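To make the "def\nghi" example concrete, here is a small sketch that runs the same search line by line and then over the whole content. It uses an in-memory filehandle so nothing needs to exist on disk; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $content   = "abcdef\nghijkl";
my $signature = "def\nghi";

# Line by line: the signature straddles the newline, so no single line contains it.
open my $fh, '<', \$content or die "Could not open in-memory file: $!";
my $found_by_line = 0;
while (my $line = <$fh>) {
    $found_by_line = 1 if index($line, $signature) >= 0;
}
close $fh;

# Whole content at once: the same search succeeds.
my $found_in_whole = index($content, $signature) >= 0 ? 1 : 0;

print "line by line: $found_by_line, whole content: $found_in_whole\n";
```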
If you know the length of the longest signature, you could use
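my $longuest_sig_len = ...;
my $block_size = 4096;
# round the block size up to a whole number of KB if a signature is longer than 4096 bytes
$block_size = 1024 * int(($longuest_sig_len + 1023) / 1024)
    if $block_size < $longuest_sig_len;
local $/ = \$block_size;
my $block = '';
while (<$fh>) {
    # keep the tail of the previous block so a signature spanning two reads is still found
    $block = substr($block, -($longuest_sig_len-1)) . $_;
    ... search for signature in $block ...
}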
That's the approach I'd take if I was looking for one string. There are surely algorithms that are more efficient at searching for a number of strings.
* — You can even use while (<FILE>) on a binary file, but it might read more than you expect. Setting $/ to a reference to a number (e.g. $/ = \1024; and $block_size = 1024; $/ = \$block_size;) solves that.
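One way the block-plus-tail idea might be filled in end to end, offered as a sketch rather than a finished scanner. The scan_handle function, its arguments, and the tiny 8-byte block size are all assumptions chosen so the demo visibly crosses a block boundary.

```perl
use strict;
use warnings;

# Scan a filehandle for any of @$sigs, reading $block_size bytes at a time.
# Returns a hash ref whose keys are the signatures that were found.
sub scan_handle {
    my ($fh, $sigs, $block_size) = @_;
    binmode $fh;

    my $longest = 0;
    for my $sig (@$sigs) {
        $longest = length($sig) if length($sig) > $longest;
    }
    die "block size must be at least the longest signature\n"
        if $block_size < $longest;

    local $/ = \$block_size;           # <$fh> now returns fixed-size records
    my $tail = '';
    my %found;
    while (defined(my $chunk = <$fh>)) {
        my $block = $tail . $chunk;
        for my $sig (@$sigs) {
            $found{$sig} = 1 if index($block, $sig) >= 0;
        }
        # Keep only the last ($longest - 1) bytes for the next round.
        my $keep = $longest - 1;
        $keep = length($block) if $keep > length($block);
        $tail = $keep > 0 ? substr($block, -$keep) : '';
    }
    return \%found;
}

# Demo: "virus" straddles the boundary between the first two 8-byte blocks.
my $data = "AAAAAAvirusBBBBBBBB";
open my $fh, '<', \$data or die "Could not open in-memory file: $!";
my $found = scan_handle($fh, ["virus", "absent"], 8);
close $fh;
print "virus found\n" if $found->{virus};
```

Keeping one byte less than the longest signature is enough: any match that needs more of the previous block than that would have fit entirely inside it and been caught on the previous pass.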
Re: Binary Comparision
by GrandFather (Saint) on Feb 28, 2007 at 00:16 UTC
|
You are missing use strict; use warnings; and a bunch of declarations.
You probably also intend $defs[$x]=$line where you have written @defs[$x]=$line (and similarly for @virus).
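A small sketch of what the loading loop could look like with strict and warnings turned on. The two sample lines are stand-ins for the real signature file. Under use strict, the misspelled @def_inputs and the never-declared @virus from the original post would be rejected at compile time instead of silently being empty.

```perl
use strict;
use warnings;

my @def_input = ("sig one\n", "sig two\n");   # stand-in for the lines read from the file
my @defs;

my $x = 0;
foreach my $line (@def_input) {
    $defs[$x] = $line;    # scalar element: $defs[$x], not the one-element slice @defs[$x]
    $x += 1;
}

# Equivalent and shorter: copy the whole array in one go.
my @copy = @def_input;

print scalar(@defs), " definitions loaded\n";
```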
DWIM is Perl's answer to Gödel
Re: Binary Comparision
by rhesa (Vicar) on Feb 28, 2007 at 00:31 UTC
|
Have you considered using an existing virus scanner? Clam AntiVirus comes to mind...
Re: Binary Comparision
by hangon (Deacon) on Feb 28, 2007 at 09:21 UTC
|
Binary files don't necessarily have a structure similar to line breaks. The easiest way to deal with them is to read the entire file into a variable, but file size and available memory can limit this method. Reading blocks of data solves that problem, but you need to overlap on each read in case what you're looking for spans the boundary between two blocks.
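For files that do fit in memory, the read-everything approach might look like the sketch below. The helper names slurp_handle and all_offsets and the sample bytes are invented for illustration.

```perl
use strict;
use warnings;

# Slurp a whole filehandle into one scalar (fine for files that fit in memory).
sub slurp_handle {
    my ($fh) = @_;
    binmode $fh;
    local $/;                  # undef: read to end of file in a single <$fh>
    my $data = <$fh>;
    return defined $data ? $data : '';
}

# Return every byte offset at which $sig occurs in $data.
sub all_offsets {
    my ($data, $sig) = @_;
    return () unless length $sig;      # an empty signature would match everywhere
    my @offsets;
    my $pos = index($data, $sig);
    while ($pos >= 0) {
        push @offsets, $pos;
        $pos = index($data, $sig, $pos + 1);
    }
    return @offsets;
}

my $sample = "MZ\x90\x00EVIL\r\n\x00\x01EVIL\xff";
open my $fh, '<', \$sample or die "Could not open in-memory file: $!";
my $data = slurp_handle($fh);
close $fh;
my @offsets = all_offsets($data, "EVIL");
print "EVIL at offsets: @offsets\n";
```

Because the data is treated as one string, the embedded "\r\n" and NUL bytes need no special handling.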
Here's another approach using read.
open(FH, '<', $data_file) or die "Could not open $data_file: $!";
binmode FH;
# set the block size as large as practical
# absolutely must be larger than any signature
my $blocksize = 65536;
# get length of largest signature
my $signaturelength = ...;
my $offset = 0;
my $block;
seek FH, $offset, 0;
while (read FH, $block, $blocksize) {
    # check each signature against block
    for my $signature (@def_inputs) {
        if (index($block, $signature) >= 0) {
            # do signature found stuff
        }
    }
    # seek to next block, stepping back so consecutive blocks overlap
    $offset += ($blocksize - $signaturelength);
    seek FH, $offset, 0;
}
close FH;
Re: Binary Comparision
by zentara (Archbishop) on Feb 28, 2007 at 13:08 UTC
|
Although virus scanning can be done with Perl, you will be unsatisfied with the speed. You should look at ClamAV. It has a utility for making a binary fingerprint of any virus and adding it to its database of what to scan for. There are Perl modules available to use ClamAV.