C vs perl

mandog has asked for the wisdom of the Perl Monks concerning the following question:

Re: C vs perl by Ovid (Cardinal) on Apr 28, 2002 at 06:16 UTC
Well, I can't comment on the C code, but I think you can make the Perl a bit clearer (or at least more correct). The following assumes that you want each `<p>...</p>` section on a new line. If not, the final join should be on an empty string.

```perl
sub MakePtag{
    my $fixme = shift;          # take in our parameters
    return join "\r\n",         # join with newlines
        map  { "<p>$_</p>" }    # wrap each line in <p></p> tags
        grep { /\S/ }           # must have at least one non-whitespace character
        split "\r\n", $fixme;   # break apart on the newlines
}
```

Of course, I'd be shot for suggesting:

```perl
sub MakePtag { join "\r\n", map {"<p>$_</p>"} grep {/\S/} split "\r\n", $_[0] } # :)
```

If you stick with your solution, you'll want to chomp `$fixme` to avoid wasted tags on the end. Oh, and you had the slash on the trailing paragraph tag backwards :)

```perl
sub MakePtag{
    chomp(my $fixme = shift);       # take in our parameters
    $fixme =~ s|(\r\n)|</p><p>|g;   # replace all \r\n with </p><p>
    $fixme = "<p>$fixme</p>";       # Add beginning and ending tags
    return $fixme;
}
```

Interesting question: what hoops would you have to jump through with C to duplicate the grep functionality that I tossed in?

Cheers, Ovid
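One possible answer to that question, as a minimal C sketch of the split/grep/map/join pipeline. It assumes a NUL-terminated input with `\r\n` line endings; the names `make_ptag_grep` and `has_non_space` are hypothetical, and `isspace()` in the C locale only approximates Perl's `\S` test. Note that the grep step alone needs its own helper loop.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if s[0..n) contains a non-whitespace char: roughly grep { /\S/ } */
static int has_non_space(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!isspace((unsigned char)s[i]))
            return 1;
    return 0;
}

/* Split on "\r\n", drop whitespace-only pieces, wrap the rest in <p></p>
   and join them with "\r\n". Returns a malloc()ed string or NULL. */
char *make_ptag_grep(const char *in)
{
    /* Kept pieces are non-empty and separated by 2 input bytes, so the
       output never exceeds 4 * len + 7 bytes including the NUL. */
    char *out = malloc(4 * strlen(in) + 16), *w = out;
    const char *p = in;

    if (out == NULL)
        return NULL;
    for (;;) {
        const char *eol = strstr(p, "\r\n");              /* split */
        size_t n = eol ? (size_t)(eol - p) : strlen(p);
        if (has_non_space(p, n)) {                        /* grep */
            if (w != out) { memcpy(w, "\r\n", 2); w += 2; } /* join */
            memcpy(w, "<p>", 3);  w += 3;                 /* map */
            memcpy(w, p, n);      w += n;
            memcpy(w, "</p>", 4); w += 4;
        }
        if (eol == NULL)
            break;
        p = eol + 2;
    }
    *w = '\0';
    return out;
}
```

For `"one\r\n  \r\ntwo\r\n"` this produces `"<p>one</p>\r\n<p>two</p>"`: the whitespace-only line and the empty trailing piece are both filtered out.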
Re: C vs perl by samtregar (Abbot) on Apr 28, 2002 at 07:56 UTC
Well, that is some aggressively ugly C code. If one objective of your presentation is to convince competent C coders to try Perl, then I suggest you take another run at it. With that code, the first thing they'll think is that you just don't know enough C to know how much it rocks. Specific comments:

- Do it in one pass. No one likes to see an algorithm that has to scan through the input text more than once. That might mean realloc()ing memory as you run short, but you should be able to take a good guess based on the input text length.
- Consider making use of the `str*()` library routines. Perl has much better string support, but it's not as though C is totally lacking!
- Think about building your parser around a switch-driven state machine. This is how C parsers are commonly built.
- Maybe do it with YACC instead? No one should build parsers in C by hand once they learn YACC! I bet the YACC implementation would compare favorably with Perl.

-sam
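A sketch of the single-pass, switch-driven approach described above, assuming the goal is the same as the Perl `s/\r\n/</p><p>/g` wrapped in `<p>...</p>`. The function and state names are hypothetical. Instead of realloc()ing, it allocates a worst-case buffer up front: each 2-byte `\r\n` grows to 7 bytes, so four times the input length plus 8 is always enough.

```c
#include <stdlib.h>
#include <string.h>

enum scan_state { TEXT, SAW_CR };

/* One pass: copy bytes, turn each "\r\n" into "</p><p>", keep lone '\r'.
   Returns a malloc()ed string or NULL. */
char *crlf_to_ptag(const char *in)
{
    char *out = malloc(strlen(in) * 4 + 8), *w = out;
    enum scan_state st = TEXT;

    if (out == NULL)
        return NULL;
    memcpy(w, "<p>", 3);
    w += 3;
    for (const char *p = in; *p != '\0'; p++) {
        switch (st) {
        case TEXT:
            if (*p == '\r')
                st = SAW_CR;          /* hold the \r until we see what follows */
            else
                *w++ = *p;
            break;
        case SAW_CR:
            if (*p == '\n') {         /* a complete \r\n pair */
                memcpy(w, "</p><p>", 7);
                w += 7;
                st = TEXT;
            } else {
                *w++ = '\r';          /* the held \r was a lone one */
                if (*p != '\r') {
                    *w++ = *p;
                    st = TEXT;
                }                     /* another \r: stay in SAW_CR */
            }
            break;
        }
    }
    if (st == SAW_CR)                 /* input ended with a lone \r */
        *w++ = '\r';
    memcpy(w, "</p>", 5);             /* also copies the terminating NUL */
    return out;
}
```

For `"a\r\nb\rc\r"` this gives `"<p>a</p><p>b\rc\r</p>"`, matching what the Perl substitution would do with the same input.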
Re: Re: C vs perl by abstracts (Hermit) on Apr 28, 2002 at 08:06 UTC
I totally agree with samtregar that this code really convinces nobody that you know enough C to compare the two languages. What I don't agree with is using realloc instead of scanning the string twice, as realloc will have to copy the string to a new location if it cannot grow the existing block in place. I might be wrong, and it might even be implementation-dependent (what malloc library guarantees giving you the same location if you realloc to a larger size?). Hope this helps...
Re: Re: Re: C vs perl by samtregar (Abbot) on Apr 28, 2002 at 19:04 UTC
Do you have a better idea? It's pretty hard to know how much memory to allocate when you don't know how big your results will grow! Perl realloc()s SVs all the time for just this reason. I suppose he could build a linked list of text blocks and then reassemble them into a single contiguous block at the end. I doubt that would perform better than realloc(), though.

-sam
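One common way to keep realloc() cheap even when it has to move the block is geometric growth: doubling the capacity whenever the buffer fills keeps the total number of bytes copied by realloc() proportional to the final length. A minimal sketch; `struct strbuf` and `sb_append` are hypothetical names, not a standard API.

```c
#include <stdlib.h>
#include <string.h>

struct strbuf {
    char  *data;
    size_t len;   /* bytes used, not counting the NUL */
    size_t cap;   /* bytes allocated */
};

/* Append n bytes of s, doubling the capacity as needed.
   Returns 0 on success, -1 if realloc() fails (the buffer is left intact). */
int sb_append(struct strbuf *sb, const char *s, size_t n)
{
    if (sb->len + n + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (sb->len + n + 1 > cap)
            cap *= 2;
        char *p = realloc(sb->data, cap);   /* realloc(NULL, ...) acts like malloc */
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, s, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}
```

Start with `struct strbuf sb = {NULL, 0, 0};`, append pieces as they are produced, and `free(sb.data)` when done.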
Re: Re: Re: Re: C vs perl by abstracts (Hermit) on Apr 29, 2002 at 02:09 UTC
Re: Re: Re: Re: Re: C vs perl by samtregar (Abbot) on Apr 29, 2002 at 03:22 UTC
Re: Re: C vs perl by John M. Dlugosz (Monsignor) on Apr 29, 2002 at 15:42 UTC
What about using strtok in C? That will give you something similar to `foreach (split)`.
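A caution if you go that route: strtok() treats its second argument as a set of delimiter characters rather than a string to match, skips runs of delimiters (so empty lines vanish), and writes `\0` into the string it scans, so it needs a writable copy. A small sketch; the name `split_lines` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Split text in place on any run of '\r' and/or '\n' characters, storing
   up to max token pointers in out; returns the number stored. */
size_t split_lines(char *text, char **out, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(text, "\r\n"); tok != NULL && n < max;
         tok = strtok(NULL, "\r\n"))
        out[n++] = tok;   /* points into text, which strtok() has modified */
    return n;
}
```

For `"one\r\n\r\ntwo\rthree"` this yields `one`, `two`, `three`, whereas Perl's `split "\r\n"` would give `one`, an empty string, and `two\rthree`.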
Re: C vs perl by abstracts (Hermit) on Apr 28, 2002 at 07:56 UTC
Hello,

Your C code is somewhat broken. Let's see:

```c
extern char * danCGIReplacedCRLF(char * szFixMe){
    int i,j,iLen=0,iNewLen=0;
    int iCRLFcount=0;
    char* szResult=NULL;
    iLen=strlen(szFixMe);
    for (i=0;i<=iLen;i++){ /* this should be i < len */
        if (szFixMe[i]==(int) '\n' || szFixMe[i]==(int) '\r')
            iCRLFcount++; /* and here you're counting how many \r or \n you see, not how many "\r\n" sequences */
    }
    /* you don't need the <p>+NULL+text, you need <p>+text */
    // need space for first <p> +NULL+ text
    // + replace each CR &LF + last " <\p>"
    iNewLen=3+1+strlen(szFixMe)+(iCRLFcount*8)+5;
    szResult = malloc(iNewLen);
    if (szResult){
        /* what's wrong with strcpy(szResult, "<p>"); ? */
        szResult[0]='<';
        szResult[1]='p';
        szResult[2]='>';
        szResult[3]='\0';
        i=0;
        j=3;
        while (i<=iLen)
            if (szFixMe[i]==(int) '\n' || szFixMe[i]==(int) '\r'){
                /* here again, you're checking for \r or \n, not the pair */
                strcat(szResult," </p><p>");
                j+=8;
                /* and now I'm completely lost! Why are you looking for \n\r? I thought we were looking for \r\n. */
                // deal with Newline CR pairs
                if(szFixMe[i]==(int) '\n' && szFixMe[i+1]==(int) '\r' ||
                   szFixMe[i]==(int) '\r' || szFixMe[i+1]==(int) '\n')
                    i+=2;
                else
                    i++;
            }else{
                szResult[j]=szFixMe[i];
                j++;
                szResult[j]='\0'; //dest string in strcat needs terminating \0
                i++;
            } //outer loop & if statement
        //end of new document needs </p> to match at start
        strcat(szResult," </p>");
    } // if(szResult !=NULL )
    return szResult;
}
```

Hmmm, I don't think this C code comes even close to the Perl code you gave. Here is one way to do it. This code aims to be simple and fast, rather than optimally fast. Enjoy :-)
```c
/*******************************************************/
#include <stdlib.h>
#include <string.h>

#if 0
we want to do the following:
sub MakePtag{
    my ($fixme)=@_;               # take in our parameters
    $fixme='<p>'.$fixme;          # Prepend a <p> tag to our string
    $fixme=~s|(\r\n)|</p><p>|g;   # replace all \r\n with <\p><p>
    $fixme.='</p>';               # Append closing </p> tag
    return $fixme;
}
#endif

char *make_p_tag(char *str){
    int len = 0;
    int cr = 0;       /* this will tell you if we found \r */
    char *s = str;
    char *newstr, *s2;
    /* s#\r\n#</p><p>#, increases len by 5 (7-2) */
    while(*s != '\0'){                  /* while we have more data */
        ++len;
        if(*s == '\r'){
            cr = 1;                     /* we found one */
        } else if(*s == '\n' && cr){    /* we found \n and \r in the prev step */
            cr = 0;                     /* forget we found a \r */
            len += 5;
        } else {
            cr = 0;                     /* also forget we found an isolated \r */
        }
        ++s;
    }
    s = str;
    cr = 0;   /* reset: a trailing \r from the first pass must not pair with a leading \n */
    s2 = newstr = calloc(len+8, sizeof(char));   /* "<p>" + "</p>" + NUL = 8 */
    if(newstr == NULL)
        return NULL;
    strcpy(s2,"<p>");
    s2 += 3;
    while(*s != '\0'){
        *s2 = *s;                       /* copy this char to the output string */
        if(*s == '\r'){
            cr = 1;                     /* mark that we put a \r */
        } else if(*s == '\n' && cr){    /* now we placed \r\n */
            cr = 0;
            strcpy(s2-1,"</p><p>");     /* put </p><p> over the \r\n */
            s2 += 5;                    /* and advance */
        } else {
            cr = 0;                     /* forget it */
        }
        ++s;
        ++s2;
    }
    strcpy(s2,"</p>");                  /* put terminating </p> */
    s2+=4;
    *s2 = '\0';
    return newstr;
}
```

Hope this helps.

PS: You need `</P>`, not `<\P>`.
PS2: I know this is the Perl monastery, but I thought this shouldn't hurt :-)

Update: meant to say strcpy instead of strcat: `/* what's wrong with strcpy(szResult, "<p>"); ? */`
Re: Re: C vs perl by mandog (Curate) on Apr 29, 2002 at 01:14 UTC
Thanks for the corrections. My C code is pretty ugly, especially the \r\n vs. \n\r thing. For what it's worth, I do need to allocate another byte for the terminating \0: strlen does not include the terminating null (see man strlen).
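To make that off-by-one concrete: `strlen("<p>")` is 3, while `sizeof "<p>"` is 4 because the array includes the `\0`. A minimal copy routine (the name `dup_string` is hypothetical; it is similar in spirit to POSIX `strdup()`) has to add that byte itself:

```c
#include <stdlib.h>
#include <string.h>

/* Copy s into a fresh buffer; strlen() does not count the '\0'. */
char *dup_string(const char *s)
{
    size_t n = strlen(s);          /* "<p>" -> 3 */
    char *copy = malloc(n + 1);    /* +1 for the terminating '\0' */
    if (copy != NULL)
        memcpy(copy, s, n + 1);    /* copies the '\0' as well */
    return copy;
}
```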
Re: Re: Re: C vs perl by abstracts (Hermit) on Apr 29, 2002 at 01:52 UTC
My comment was regarding the following statement: `szResult[3]='\0';` So basically, you were putting '<', 'p', '>', '\0' at the beginning of `szResult`. You need not put the null there.
null bytes (was Re: C vs perl) by mandog (Curate) on Apr 29, 2002 at 02:19 UTC
Re: null bytes (was Re: C vs perl) by abstracts (Hermit) on Apr 29, 2002 at 02:25 UTC
Re: C vs perl by broquaint (Abbot) on Apr 28, 2002 at 14:20 UTC
That is some mighty fat C. While I am all for people using Perl in situations like this, I have to respect my elders and come in to defend C's ability to be compact when it wants to be.

```c
#include <stdlib.h>
#include <string.h>

char* crlf2ptag(char* str)
{
    char *ret, *pstr, *pret;

    /* ok, this could be done better but it's only a string ;-) */
    ret = (char *) malloc((strlen(str) << 2) + 8);
    strncpy(ret, "<p>", 3);
    pstr = str;
    pret = &ret[3];
    while(*pstr != '\0')
        if(*pstr == '\r' && *(pstr+1) != '\0' && *(pstr+1) == '\n') {
            strncpy(pret, "</p><p>", 7);
            pret += 7;
            pstr += 2;
        } else
            *pret++ = *pstr++;
    *pret = '\0';                  /* malloc() memory is not zeroed: terminate before strlen() */
    ret[strlen(ret) - 3] = '\0';   /* assumes str ends in \r\n: drops the final "<p>" */
    return ret;
}
```

This compiles under Cygwin's gcc 2.95 on Win98, and I'd be surprised if there were problems elsewhere (make sure you include `string.h`, though!). I'm sure this could be golfed to heck with even more pointer trickery, but my point is that as long as you are a competent craftsman, it doesn't really matter what tools you use. HTH

Update: now checks for \r and \n to appease abstracts, and should allocate enough space for all but the most \r\n-ridden strings.
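A detail worth checking in allocation expressions like `strlen(str) << 2 + 8`: in C, `+` binds more tightly than `<<`, so that expression is parsed as `strlen(str) << (2 + 8)`, i.e. 1024 times the length, not 4 times plus 8. The two parses, written out explicitly (function names hypothetical):

```c
#include <stddef.h>

/* How "len << 2 + 8" is actually parsed */
size_t size_as_parsed(size_t len)
{
    return len << (2 + 8);    /* len * 1024 */
}

/* What was presumably intended */
size_t size_intended(size_t len)
{
    return (len << 2) + 8;    /* len * 4 + 8 */
}
```

gcc's `-Wall` (via `-Wparentheses`) warns about the unparenthesized form for exactly this reason.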
Re: C vs perl by Dominus (Parson) on Apr 28, 2002 at 15:04 UTC
Says broquaint:

```c
/* ok, this could be done better but it's only a string ;-) */
ret = (char *) malloc(strlen(str) + strlen(str));
```

String or not, it's clear that if you were going to do that, you should have done this instead: `ret = (char *) malloc(strlen(str) * 2);`

Also, your function does not work. (I think you forgot to test it before you posted.) You need to have `pret += 7;` in the `if` block.

-- Mark Dominus
Re: Re: C vs perl by broquaint (Abbot) on Apr 28, 2002 at 15:08 UTC
"Also, your function does not work. (I think you forgot to test it before you posted.)"

I'd walked away from the computer and was going about my rainy afternoon when a little light bulb appeared over my head, and I ran back to add the offending line in the hope that no one would notice. But the second I saw you in Other Users, I knew it was coming ;-) A lesson learned for the day: think before posting, kids.
Re: Re: C vs perl by meonkeys (Chaplain) on Apr 28, 2002 at 18:45 UTC
++broquaint (after fixes, of course). A mighty phat response. One question, since I am still learning C: when you do `ret = (char*) malloc(strlen(str) + strlen(str));`, are `ret[0], ret[1], ret[2], ... ret[n]` all set to '\0'? My manual for malloc() says "The memory is not cleared", but in my debugger it looks like all these values are zero. Also, I don't think you need to cast malloc(). In fact, it might hide errors.
Re: Re: Re: C vs perl by Elian (Parson) on Apr 28, 2002 at 19:21 UTC
malloc just hands you back some memory that it has handy. If it's newly allocated from the system, odds are that it'll be zeroed. (On most OSes these days, freshly minted process memory comes pre-zeroed, since it's no more expensive to hand you zeroed memory than any other kind.) malloc keeps a free list, though, and when you free() some memory, you may get that same chunk back later, with whatever gook might be in it. If you want guaranteed zeroed memory, use calloc() instead. It zeroes the memory before it's handed to you, generally in the way that's most efficient for the OS you're on.
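A minimal illustration of that point: calloc() returns zero-filled memory, whereas with malloc() you have to clear it yourself, for example with memset(). Both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* calloc() returns n bytes guaranteed to be zero (or NULL on failure). */
char *zeroed_calloc(size_t n)
{
    return calloc(n, 1);
}

/* malloc() makes no such promise, so clear the bytes explicitly. */
char *zeroed_malloc(size_t n)
{
    char *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}
```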
Re: Re: Re: C vs perl by h.toothrot (Initiate) on Apr 29, 2002 at 10:33 UTC
"Also, I don't think you need to cast malloc(). In fact, it might hide errors."

The return type of malloc is `void *`. I think I remember a compiler telling me that `char *` is not the same as `void *`. Consequently, I had to cast malloc to `char *`.
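For reference: in C, a `void *` converts implicitly to any object pointer type, so the cast is not required. A compiler that insists on it is usually compiling the code as C++, where that implicit conversion is not allowed. The "hides errors" argument is about C89: if `<stdlib.h>` is forgotten, malloc() is implicitly declared to return `int`, and the cast can silence the diagnostic that would otherwise point at the missing header. A small sketch of the usual cast-free idiom (`make_ints` is a hypothetical name):

```c
#include <stdlib.h>

/* Allocate n ints without casting malloc()'s void * result. */
int *make_ints(size_t n)
{
    int *v = malloc(n * sizeof *v);   /* sizeof *v follows v's type automatically */
    if (v != NULL)
        for (size_t i = 0; i < n; i++)
            v[i] = (int)i;
    return v;
}
```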
Re: Re: C vs perl by abstracts (Hermit) on Apr 28, 2002 at 21:35 UTC
First of all, this does NOT replace CRLF pairs with `</p><p>`. Period. Second, there is a buffer overflow, as you `malloc(2*strlen(str))`. Please pass your routine `"1\r2\r3\r"` and see how it gives you a nice SEGFAULT (which you might not even see, since you're on a Win98 box).
Re: Re: C vs perl by czrdup (Acolyte) on Apr 29, 2002 at 17:42 UTC
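I did some thinking, and the best I could come up with in a short amount of time is this:

```c
#include <stdlib.h>
#include <string.h>

char* destroycrlf(char* original)
{
    char *ret, *token, sep[] = "\r";

    ret = (char *)calloc(strlen(original) * 7 + 8, sizeof(char));
    if (ret == NULL)
        return NULL;
    /* strtok() modifies original in place and drops any lone \r,
       since \r is the delimiter */
    token = strtok(original, sep);
    strcpy(ret, "<p>");
    while (token) {
        if (token[0] == '\n') {   /* \n at token start: second half of a \r\n pair */
            strcat(ret, "</p><p>");
            strcat(ret, &token[1]);
        } else
            strcat(ret, token);
        token = strtok(NULL, sep);
    }
    /* strcat() instead of sprintf(ret, "%s...", ret, ...): sprintf with
       overlapping source and destination is undefined behavior */
    strcat(ret, "</p>");
    return ret;
}
```

Maybe not the shortest, but it should work and is fairly quick.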
Re: C vs perl by Juerd (Abbot) on Apr 28, 2002 at 12:36 UTC
Your example is not fair. In Perl, you use a regex to substitute, but in the C example, you don't use any regex at all. This isn't really a C-versus-Perl example, but more of a brute-force-versus-regex one. If you want to compare languages, use a regex engine in your C example, and use the same substitution. Or, alternatively, write the Perl code without regexes. Of course, you can comment on Perl having a very fast built-in regex engine, and you could give a long and a short example of Perl code. Please use equivalent examples. Regexes are in no way unique to Perl.
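Taking that suggestion literally, here is a minimal sketch of the same substitution done with a regex engine in C, assuming a POSIX system (`<regex.h>`, regcomp()/regexec()). `regex_ptag` is a hypothetical name; PCRE would be another option, with its own API.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* The C analogue of: $s =~ s!\r\n!</p><p>!g; return "<p>$s</p>";
   Returns a malloc()ed string, or NULL on failure. */
char *regex_ptag(const char *in)
{
    regex_t re;
    regmatch_t m;
    int flags = 0;
    const char *p = in;
    /* each 2-byte match becomes 7 bytes, so 4x plus the tags is enough */
    char *out = malloc(strlen(in) * 4 + 8);

    if (out == NULL)
        return NULL;
    /* the C string holds a real CR and LF, which a POSIX ERE treats as
       ordinary characters */
    if (regcomp(&re, "\r\n", REG_EXTENDED) != 0) {
        free(out);
        return NULL;
    }
    strcpy(out, "<p>");
    while (regexec(&re, p, 1, &m, flags) == 0) {
        strncat(out, p, (size_t)m.rm_so);   /* text before the match */
        strcat(out, "</p><p>");             /* the replacement */
        p += m.rm_eo;                       /* resume after the match */
        flags = REG_NOTBOL;                 /* p is no longer the start of the input */
    }
    strcat(out, p);                         /* text after the last match */
    strcat(out, "</p>");
    regfree(&re);
    return out;
}
```

Even in this short form, the compile/execute/free steps and the manual copying around each match show where the extra C code goes.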
(All roads lead to Rome) Re: Re: C vs perl by Ovid (Cardinal) on Apr 28, 2002 at 19:22 UTC
Juerd wrote: "If you want to compare languages, use a regex engine in your C example, and use the same substitution. Or, alternatively, write the Perl code without regexes."

While I understand your point of view, I have to disagree. Different languages naturally lead to different solutions. A program is correct if it does exactly what it is supposed to do and nothing more. The implementation is almost an afterthought. Consider the following snippet:

```perl
foreach my $alpha ( @alphas ) {
    foreach my $beta ( @betas ) {
        if ( $alpha eq $beta ) {
            # do something
        }
    }
}
```

A friend of mine uses that in job interviews. He tells the programmer that a project was moved into production only to discover that this snippet was consuming 20% of the run time, a fact not discovered in testing. My friend has two questions. First, what is the problem? Second, how do you fix it? Interestingly, he says, most programmers that he interviews do not see the problem. Of those who do, many cannot fix it. Here was my initial reaction:

```perl
my %alpha;
@alpha{ @alphas } = undef; # suppress void context warnings
foreach my $beta ( @betas ) {
    if ( exists $alpha{ $beta } ) {
        # do something
    }
}
```

Now, that seems all well and good, but I am informed that many C programmers look at the problem and say "sort one list and do a binary search". It turns out not to be as fast as the above method, but it's much faster than the original code. Personally, I never would have thought of the sort-and-binary-search method. I don't think like a C programmer anymore (if I ever did). Someone from another language altogether might say "sort both lists and look for intersections." Depending on what you're really doing with the data, that answer might be acceptable too.

To really compare languages, you have to let them show how a problem would be solved "their way". If you tried to shoehorn C or Perl into the programming style of the other, what are you really comparing? Speed?
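For comparison, a minimal C sketch of the "sort one list and do a binary search" approach, using the standard qsort() and bsearch() on arrays of strings. `count_common` and `cmp_str` are hypothetical names, counting matches stands in for the unspecified `# do something`, and the function sorts `alphas` in place. This takes roughly O((n + m) log n) time instead of the nested loops' O(n * m).

```c
#include <stdlib.h>
#include <string.h>

/* qsort()/bsearch() comparator for an array of char pointers */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Count the betas that also appear in alphas (alphas is sorted in place). */
size_t count_common(const char **alphas, size_t n,
                    const char **betas, size_t m)
{
    size_t hits = 0;

    qsort(alphas, n, sizeof *alphas, cmp_str);
    for (size_t i = 0; i < m; i++) {
        /* the key must point to an element, i.e. be a char ** */
        if (bsearch(&betas[i], alphas, n, sizeof *alphas, cmp_str) != NULL)
            hits++;   /* "do something" would go here */
    }
    return hits;
}
```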
Perl doesn't have pointers (shhh... no one mention Devel::Pointer), so some of the weird things that C programmers come up with using pointers just don't translate. C doesn't (natively) have regular expressions or hashes, so typical C programmers probably wouldn't look for those solutions. That doesn't make them invalid, but contrasting the inherent abilities of different languages means they shouldn't be reduced to the lowest common denominator. If all roads lead to Rome, who cares what road I take?

Cheers, Ovid
Re: (All roads lead to Rome) Re: Re: C vs perl by belg4mit (Prior) on Apr 28, 2002 at 20:18 UTC
UPDATED

```perl
foreach my $alphabeta ( do { my $z; grep { $z = $_; grep { $z eq $_ } @alphas } @betas } ) {
    # do something
}
```

Granted, this still has the same problem as the initial code, but it's twice as fast and saves memory compared with your speedier (20x original) solution. It also has the side effect of effectively uniq-ing @betas. If you don't like that, replace the outer grep with map.
Re: Re: C vs perl by belg4mit (Prior) on Apr 28, 2002 at 18:50 UTC
They are not unique to Perl; however, they are a core feature of the language, which is a bit of a difference. I think the better option would be to use the cleaned-up Perl and C provided by others, and implement a "dumb" Perl version as well, perhaps to demonstrate how "awkward" it would be.
C vs perl followup by mandog (Curate) on Apr 29, 2002 at 01:06 UTC
Thanks to everyone who replied. I'll post an update when I have final presentation notes. I'll be using the corrections from Anonymous Monk and Ovid to the Perl version of the function, and broquaint's version of the C function with corrections by Dominus and abstracts. Time permitting, I'll do up a Flex and Bison version and a PCRE (Perl Compatible Regular Expressions) version.

Juerd makes the good point that it would be fairer to compare a C regex solution to a Perl regex solution. However, my preliminary research and very small experience suggest that C regexes have a fair amount of overhead that wouldn't be recovered in my toy project. Juerd will no doubt produce elegant, working code and correct me if I err, but...

To use the PCRE library, I'll need to:
- Allocate memory
- Compile the regular expression
- Execute the regular expression
- Play with a special substr function, a loop, and maybe some strcat()

To use Flex/Bison, I'll have to:
- Build a .l file
- Build a .y file (or write a concat loop in C??)
- Run Flex
- Run Yacc

In Perl I just use: `$fixme =~ s|\r\n|</p><p>|g`

Time permitting, I'll also do an "ugly" version in Perl (thanks, belg4mit).

A couple of other thoughts:
- It took me about 5 hours to write the C version of the function and about 50 minutes to write the Perl version.
- At no time during my Perl coding did I dump core...
- My Perl version was much closer to the more ideal Perl version created by Ovid than my C version was to the more ideal C version created by broquaint.

My guess is that, as an average Perl & C programmer, I'm closer to being a great Perl programmer than I am to being a great C programmer. (Not that I'm especially close...)
Re: C vs perl by Anonymous Monk on Apr 28, 2002 at 10:43 UTC
I know nothing about C, but a little Perl. The first thing I noticed was that you capture `\r\n` but don't use `$1`. Besides that, I'd like to remove some unnecessary code.

```perl
sub MakePtag {
    my ($s) = @_;
    $s =~ s!\r\n!</p><p>!g;
    return "<p>$s</p>";
}
```
