PerlMonks  

C vs perl

by mandog (Curate)
on Apr 28, 2002 at 05:23 UTC

mandog has asked for the wisdom of the Perl Monks concerning the following question:

I'm preparing a 30-minute presentation comparing C to Perl for my club. The audience is fellow undergraduates with some exposure to C and no exposure to Perl. This is not official homework for a grade. I plan to include a pointer to this thread in my presentation.

Below are two functions, one in Perl, one in C. Both are supposed to replace Microsoft \n\r with html <p></p> tags. Both functions were kludged up by banging randomly on the problem until I more or less got the right answer. For what it is worth, I wrote the C function two years ago because I was too lazy to tackle perl.

I would probably benefit from suggestions like "just open the file in text mode". However, for my immediate purpose I'm most interested in knowing:
  1. Do these functions do approximately the same thing?
  2. Is there a cleaner way to do this in C?

C

extern char * danCGIReplacedCRLF(char * szFixMe){
    int i,j,iLen=0,iNewLen=0;
    int iCRLFcount=0;
    char* szResult=NULL;
    iLen=strlen(szFixMe);
    for (i=0;i<=iLen;i++){
        if (szFixMe[i]==(int) '\n' || szFixMe[i]==(int) '\r')
            iCRLFcount++;
    }
    // need space for first <p> +NULL+ text
    // + replace each CR &LF + last " <\p>"
    iNewLen=3+1+strlen(szFixMe)+(iCRLFcount*8)+5;
    szResult = malloc(iNewLen);
    if (szResult){
        szResult[0]='<';
        szResult[1]='p';
        szResult[2]='>';
        szResult[3]='\0';
        i=0;
        j=3;
        while (i<=iLen)
            if (szFixMe[i]==(int) '\n' || szFixMe[i]==(int) '\r'){
                strcat(szResult," </p><p>");
                j+=8;
                // deal with Newline CR pairs
                if(szFixMe[i]==(int) '\n' && szFixMe[i+1]==(int) '\r' ||
                   szFixMe[i]==(int) '\r' || szFixMe[i+1]==(int) '\n')
                    i+=2;
                else
                    i++;
            }else{
                szResult[j]=szFixMe[i];
                j++;
                szResult[j]='\0'; //dest string in strcat needs terminating \0
                i++;
            } //outer loop & if statement
        //end of new document needs </p> to match at start
        strcat(szResult," </p>");
    } // if(szResult !=NULL )
    return szResult;
}

perl

sub MakePtag{
    my ($fixme)=@_;               # take in our parameters
    $fixme='<p>'.$fixme;          # Prepend a <p> tag to our string
    $fixme=~s|(\r\n)|<\\p><p>|g;  # replace all \r\n with <\p><p>
    $fixme.='<\p>';               # Append closing <\p> tag
    return $fixme;
}


email: mandog

Re: C vs perl
by Ovid (Cardinal) on Apr 28, 2002 at 06:16 UTC

    Well, I can't comment on the C code, but I think you can make the Perl a bit clearer (or at least more correct). The following assumes that you want each <p>...</p> section on a new line. If not, the final join should be on an empty string.

    sub MakePtag{
        my $fixme = shift;          # take in our parameters
        return join "\r\n",         # join with newlines
            map  { "<p>$_</p>" }    # wrap each line in <p></p> tags
            grep { /\S/ }           # must have at least one non-whitespace character
            split "\r\n", $fixme;   # break apart on the newlines
    }

    Of course, I'd be shot for suggesting:

    sub MakePtag { join "\r\n", map {"<p>$_</p>"} grep {/\S/} split "\r\n", $_[0] } # :)

    If you stick with your solution, you'll want to chomp $fixme to avoid wasted tags on the end. Oh, and you had the slash on the trailing paragraph tag backwards :)

    sub MakePtag{
        chomp(my $fixme = shift);     # take in our parameters
        $fixme=~s|(\r\n)|</p><p>|g;   # replace all \r\n with </p><p>
        $fixme = "<p>$fixme</p>";     # Add beginning and ending tags
        return $fixme;
    }

    Interesting question: what hoops would you have to jump through with C to duplicate the grep functionality that I tossed in?
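One possible answer to that question, as a rough sketch rather than anyone's posted code: the function name `make_p_tags_skip_blank` and the helper `has_nonspace` are made up for illustration. It walks the CRLF-separated lines with strstr(), keeps only lines containing a non-whitespace byte (the C stand-in for `grep { /\S/ }`), and joins the wrapped lines with "\r\n". The buffer size is a deliberately generous worst-case bound, not a tuned one.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the n bytes at s contain a non-whitespace character,
   roughly what Perl's /\S/ test checks. */
static int has_nonspace(const char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (!isspace((unsigned char)s[i]))
            return 1;
    return 0;
}

/* Hypothetical helper: wraps every non-blank CRLF-separated line of
   text in <p>...</p> and joins the results with "\r\n".
   Returns a malloc()ed string the caller must free(), or NULL. */
char *make_p_tags_skip_blank(const char *text)
{
    size_t n = strlen(text);
    /* A kept line of L >= 1 bytes costs at most L + 7 (tags) + 2 (joiner),
       and there are at most n such lines, so 10 * n + 1 always fits. */
    char *out = malloc(n * 10 + 1);
    char *w = out;
    const char *line = text;

    if (out == NULL)
        return NULL;

    while (*line != '\0') {
        const char *eol = strstr(line, "\r\n");
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (has_nonspace(line, len)) {
            if (w != out) {            /* separator between kept lines */
                memcpy(w, "\r\n", 2);
                w += 2;
            }
            memcpy(w, "<p>", 3);   w += 3;
            memcpy(w, line, len);  w += len;
            memcpy(w, "</p>", 4);  w += 4;
        }
        line += len + (eol ? 2 : 0);   /* step past the line and its CRLF */
    }
    *w = '\0';
    return out;
}
```

So the hoops are mostly bookkeeping: sizing the buffer, tracking the write position, and spelling out the whitespace test by hand.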

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: C vs perl
by samtregar (Abbot) on Apr 28, 2002 at 07:56 UTC
    Well, that is some aggressively ugly C code. If one objective of your presentation is to convince competent C coders to try Perl then I suggest you take another run at it. With that code the first thing they'll think is that you just don't know enough C to know how much it rocks.

    Specific comments:

    • Do it in one pass. No one likes to see an algorithm that has to scan through the input text more than once. That might mean realloc()ing memory as you run short, but you should be able to take a good guess based on the input text length.
    • Consider making use of the str*() library routines. Perl has much better string support but it's not as though C is totally lacking!
    • Think about building your parser around a switch-driven state machine. This is how C parsers are commonly built.
    • Maybe do it with YACC instead? No one should build parsers in C by hand once they learn YACC! I bet the YACC implementation would compare favorably with Perl.
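A minimal sketch of the first and third suggestions combined, under stated assumptions: `struct buf`, `buf_append` and `crlf_to_ptags` are illustrative names, not code from this thread. It makes a single pass over the input with a two-state switch (plain text vs. "just saw a CR") and grows the output with realloc(), doubling the capacity whenever it runs short.

```c
#include <stdlib.h>
#include <string.h>

/* Growable output buffer; all names here are illustrative. */
struct buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Appends n bytes and keeps the buffer NUL-terminated, doubling the
   capacity when needed. Returns 0 on success, -1 if realloc() fails. */
static int buf_append(struct buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        char *p;
        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);      /* may move the block */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

enum state { TEXT, SEEN_CR };

/* One pass over the input: "\r\n" becomes "</p><p>", everything else
   (including a lone \r) is copied through unchanged. */
char *crlf_to_ptags(const char *in)
{
    struct buf b = { NULL, 0, 0 };
    enum state st = TEXT;
    int rc = buf_append(&b, "<p>", 3);

    for (; rc == 0 && *in != '\0'; in++) {
        switch (st) {
        case TEXT:
            if (*in == '\r')
                st = SEEN_CR;           /* hold the CR until we see what follows */
            else
                rc = buf_append(&b, in, 1);
            break;
        case SEEN_CR:
            if (*in == '\n') {
                rc = buf_append(&b, "</p><p>", 7);
                st = TEXT;
            } else if (*in == '\r') {
                rc = buf_append(&b, "\r", 1);   /* flush the earlier lone CR */
            } else {
                rc = buf_append(&b, "\r", 1);
                if (rc == 0)
                    rc = buf_append(&b, in, 1);
                st = TEXT;
            }
            break;
        }
    }
    if (rc == 0 && st == SEEN_CR)
        rc = buf_append(&b, "\r", 1);   /* input ended in a lone CR */
    if (rc == 0)
        rc = buf_append(&b, "</p>", 4);
    if (rc != 0) {
        free(b.data);
        return NULL;
    }
    return b.data;
}
```

Doubling keeps the number of realloc() calls logarithmic in the output size, which is the usual answer to the worry about repeated copying.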

    -sam

      I totally agree with samtregar that this code really convinces nobody that you know enough C to compare the 2 languages.

      What I don't agree with is using realloc instead of scanning the string twice, as realloc will have to copy the string over to a new location if it fails to allocate a larger size of contiguous memory at the same location. I might be wrong and it might even be implementation-dependent (what malloc library guarantees giving you the same location if you realloc to a larger size?).

      Hope this helps...

        Do you have a better idea? It's pretty hard to know how much memory to allocate when you don't know how big your results will grow! Perl realloc()s on SVs all the time for just this reason.

        I suppose he could build a linked-list of text blocks and then reassemble them into a single contiguous block at the end. I doubt that would perform better than realloc() though.

        -sam

      What about using strtok in C? That will give you something similar to foreach (split).
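A rough sketch of the strtok() idea, with a hypothetical function name (`ptags_via_strtok`). Two caveats are worth stating up front: strtok() treats its second argument as a set of delimiter characters, so this splits on any run of \r and \n rather than on the exact "\r\n" pair, and it silently drops empty fields. It also writes NUL bytes into the string it scans and keeps hidden static state, so the sketch works on a private copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wraps each CR/LF-delimited piece of text in
   <p>...</p>. Returns a malloc()ed string, or NULL on failure. */
char *ptags_via_strtok(const char *text)
{
    size_t n = strlen(text);
    char *copy = malloc(n + 1);
    /* Each token has at least one byte and gains 7 bytes of tags,
       so 8 * n + 1 is always enough. */
    char *out  = malloc(n * 8 + 1);
    char *w = out;
    char *tok;

    if (copy == NULL || out == NULL) {
        free(copy);
        free(out);
        return NULL;
    }
    memcpy(copy, text, n + 1);          /* strtok() modifies its input */

    for (tok = strtok(copy, "\r\n"); tok != NULL; tok = strtok(NULL, "\r\n")) {
        size_t len = strlen(tok);
        memcpy(w, "<p>", 3);   w += 3;
        memcpy(w, tok, len);   w += len;
        memcpy(w, "</p>", 4);  w += 4;
    }
    *w = '\0';
    free(copy);
    return out;
}
```

So it is similar to foreach (split), but closer to splitting on /[\r\n]+/ than on "\r\n".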
Re: C vs perl
by abstracts (Hermit) on Apr 28, 2002 at 07:56 UTC
    Hello

    Your C code is somewhat broken, let's see:

    extern char * danCGIReplacedCRLF(char * szFixMe){
        int i,j,iLen=0,iNewLen=0;
        int iCRLFcount=0;
        char* szResult=NULL;
        iLen=strlen(szFixMe);
        for (i=0;i<=iLen;i++){ /* this should be i < iLen */
            if (szFixMe[i]==(int) '\n' || szFixMe[i]==(int) '\r')
                iCRLFcount++;
            /* and here you're counting how many \r or \n you see,
               not how many "\r\n" sequences */
        }
        /* you don't need the <p>+NULL+text, you need <p>+text */
        // need space for first <p> +NULL+ text
        // + replace each CR &LF + last " <\p>"
        iNewLen=3+1+strlen(szFixMe)+(iCRLFcount*8)+5;
        szResult = malloc(iNewLen);
        if (szResult){
            /* what's wrong with strcpy(szResult, "<p>"); ? */
            szResult[0]='<';
            szResult[1]='p';
            szResult[2]='>';
            szResult[3]='\0';
            i=0;
            j=3;
            while (i<=iLen)
                if (szFixMe[i]==(int) '\n' || szFixMe[i]==(int) '\r'){
                    /* here again, you're checking for \r or \n, not the pair */
                    strcat(szResult," </p><p>");
                    j+=8;
                    /* and now I'm completely lost!, why are you looking
                       for \n\r? I thought we were looking for \r\n. */
                    // deal with Newline CR pairs
                    if(szFixMe[i]==(int) '\n' && szFixMe[i+1]==(int) '\r' ||
                       szFixMe[i]==(int) '\r' || szFixMe[i+1]==(int) '\n')
                        i+=2;
                    else
                        i++;
                }else{
                    szResult[j]=szFixMe[i];
                    j++;
                    szResult[j]='\0'; //dest string in strcat needs terminating \0
                    i++;
                } //outer loop & if statement
            //end of new document needs </p> to match at start
            strcat(szResult," </p>");
        } // if(szResult !=NULL )
        return szResult;
    }
    Hmmm, I don't think this C code even comes close to the Perl code given by you. Here is one way to do it. This code aims to be simple and fast, rather than optimally fast. Enjoy :-).
    /*********************************************************/
    #include <stdlib.h>
    #include <string.h>

    #if 0
    we want to do the following:
    sub MakePtag{
        my ($fixme)=@_;               # take in our parameters
        $fixme='<p>'.$fixme;          # Prepend a <p> tag to our string
        $fixme=~s|(\r\n)|</p><p>|g;   # replace all \r\n with </p><p>
        $fixme.='</p>';               # Append closing </p> tag
        return $fixme;
    }
    #endif

    char* make_p_tag(char* str){
        int len = 0;
        int cr = 0; /* this will tell you if we found \r */
        char *s = str;
        char *newstr, *s2;
        /* s#\r\n#</p><p>#, increases len by 5 (7-2) */
        while(*s != '\0'){ /* while we have more data */
            ++len;
            if(*s == '\r'){
                cr = 1; /* we found one */
            } else if(*s == '\n' && cr){ /* we found \n and \r in the prev step */
                cr = 0; /* forget we found a \r */
                len += 5;
            } else {
                cr = 0; /* also forget we found an isolated \r */
            }
            ++s;
        }
        s = str;
        cr = 0; /* start the second pass clean in case the input ended in \r */
        s2 = newstr = calloc(len+8, sizeof(char));
        if(newstr == NULL) return NULL;
        strcpy(s2,"<p>");
        s2 += 3;
        while(*s != '\0'){
            *s2 = *s; /* copy this char to the output string */
            if(*s == '\r'){
                cr = 1; /* mark that we put a \r */
            } else if(*s == '\n' && cr){ /* now we placed \r\n */
                cr = 0;
                strcpy(s2-1,"</p><p>"); /* put </p><p> over the \r\n */
                s2 += 5; /* and advance */
            } else {
                cr = 0; /* forget it */
            }
            ++s;
            ++s2;
        }
        strcpy(s2,"</p>"); /* put terminating </p> */
        s2+=4;
        *s2 = '\0';
        return newstr;
    }
    Hope this helps.

    PS: You need </P> not <\P>
    PS2: I know this is the *Perl* monastery, but I thought this shouldn't hurt :-)

    Update: meant to say strcpy instead of strcat:

    /* what's wrong with strcpy(szResult, "<p>"); ? */

      Thanks for the corrections. My C code is pretty ugly, especially the \r\n vs \n\r thing.

      For what it is worth, I **do** need to allocate another byte for the terminating \0.

      strlen does **not** count the terminating null; see man strlen.
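To make the byte counting concrete, here is a small sketch; `wrap_in_p` is a made-up name, not a function from either version above. The only point is the `+ 1`: strlen() reports the number of characters before the terminator, so the allocation has to add one byte for it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of text wrapped in
   <p>...</p>, or NULL if the allocation fails. */
char *wrap_in_p(const char *text)
{
    size_t n = strlen(text);            /* does not count the '\0' */
    char *out = malloc(3 + n + 4 + 1);  /* "<p>" + text + "</p>" + '\0' */

    if (out == NULL)
        return NULL;
    memcpy(out, "<p>", 3);
    memcpy(out + 3, text, n);
    memcpy(out + 3 + n, "</p>", 5);     /* 5 bytes: the tag plus its '\0' */
    return out;
}
```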



      email: mandog
        My comment was regarding the following statement:
        szResult[3]='\0';
        So, basically, you were putting '<', 'p', '>', '\0' at the beginning of szResult. You need not put the null there.
Re: C vs perl
by broquaint (Abbot) on Apr 28, 2002 at 14:20 UTC
    That is some mighty fat C. While I am all for people using perl in situations like this I have to respect my elders and come in to defend C's ability to be compact when it wants to be.
    #include <stdlib.h>
    #include <string.h>

    char* crlf2ptag(char* str) {
        char *ret, *pstr, *pret;
        /* ok, this could be done better but it's only a string ;-) */
        ret = (char*) malloc((strlen(str) << 2) + 8);
        if(ret == NULL) return NULL;
        strncpy(ret, "<p>", 3);
        pstr = str;
        pret = &ret[3];
        while(*pstr != '\0')
            if(*pstr == '\r' && *(pstr+1) != '\0' && *(pstr+1) == '\n') {
                strncpy(pret, "</p><p>", 7);
                pret += 7;
                pstr += 2;
            } else
                *pret++ = *pstr++;
        strcpy(pret, "</p>"); /* close the last paragraph and terminate the string */
        return ret;
    }
    This compiles under cygwin's gcc 2.95 in win98 and I'd be surprised if there were problems elsewhere (make sure you include string.h though!). I'm sure this could be golfed to heck with even more pointer trickery, but my point is that as long as you are a competent craftsman it doesn't really matter what tools you use.
    HTH

    _________
    broquaint

    update: now checks for \r and \n to appease abstracts and should allocate enough space for all but the most \r\n ridden strings.

      Says broquaint:
      /* ok, this could be done better but it's only a string ;-) */
      ret = (char*) malloc(strlen(str) + strlen(str));
      String or not, it's clear that if you were going to do that, you should have done this instead:
      ret = (char*) malloc(strlen(str) * 2);
      Also, your function does not work. (I think you forgot to test it before you posted.) You need to have
      pret += 7;
      in the if block.

      --
      Mark Dominus
      Perl Paraphernalia

        Also, your function does not work. (I think you forgot to test it before you posted.)
        I'd walked away from the computer and was going about my rainy afternoon before a little light-bulb appeared over my head and I ran back to add the offending line in hope that no one would notice. But the *second* I saw you in Other Users I knew it was coming ;-)

        A lesson learned for the day: think before posting, kids.

        _________
        broquaint

      ++broquaint (after fixes, of course). A mighty phat response. One question since I am still learning C... when you do
      ret = (char*) malloc(strlen(str) + strlen(str));
      Are ret[0], ret[1], ret[2], ... ret[n] all set to '\0'? My manual for malloc() says "The memory is not cleared", but in my debugger it looks like all these values are zero.

      Also, I don't think you need to cast malloc(). In fact, it might hide errors.

      ---
      "A Jedi uses the Force for knowledge and defense, never for attack."
        malloc just hands you back some memory that it has handy. If it's newly allocated from the system, odds are that it'll be zeroed. (On most OSes these days freshly minted process memory comes pre-zeroed, since it's no more expensive to hand you zeroed memory than any other kind.) malloc keeps a free list, though, and when you free() some memory, you may get that same chunk back later, with whatever gook might be in it.

        If you want guaranteed zeroed memory, use calloc() instead. It zeroes the memory before it's handed to you, and generally in the way most efficient for the OS you're on.
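A small sketch of the difference, with illustrative function names only: calloc() returns memory that is already zero-filled, while memory from malloc() has to be cleared by hand (here with memset()) if you need it zeroed.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if all n bytes starting at p are zero, else 0. */
int all_zero(const unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* calloc() hands back memory that is already zero-filled. */
unsigned char *zeroed_with_calloc(size_t n)
{
    return calloc(n, 1);
}

/* malloc() makes no promise about contents, so clear it explicitly. */
unsigned char *zeroed_with_memset(size_t n)
{
    unsigned char *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}
```

Seeing zeros in a debugger after a plain malloc() is luck, not a guarantee, which is why code that later calls strcat() or strlen() on the buffer needs one of these two forms (or an explicit terminator).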

        Also, I don't think you need to cast malloc(). In fact, it might hide errors.

        The type of malloc is void*, I think I remember a compiler telling me that char* is not the same as void*. Consequently I had to cast malloc to char*.
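For what it's worth, in C (as opposed to C++) a void* converts to any other object pointer type without a cast, so a complaint like that usually means the file was being compiled as C++, or that <stdlib.h> was not included. In the second case an old compiler would assume malloc() returns int, and the cast would silence the very warning that points at the missing header. A minimal sketch, with `dup_string` as a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of s, or NULL. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;   /* + 1 for the terminating '\0' */
    char *copy = malloc(n);     /* no cast: void * converts implicitly in C */

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```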
      First of all, this does NOT replace crlf pairs with </p><p>. Period. Second of all, there is a buffer overflow as you malloc(2*strlen(str)). Please pass your routine "1\r2\r3\r" and see how it will give you a nice SEGFAULT (which you might not even see since you're on a Win98 box).
      I did some thinking and the best I could come up with in a short amount of time is this:
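      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      char* destroycrlf(char* original)
      {
          char *ret, *end, *token, sep[] = "\r";
          ret = (char *)calloc(strlen(original) * 7 + 8, sizeof(char));
          if (ret == NULL)
              return NULL;
          /* end tracks the write position, so sprintf never reads
             from the buffer it is writing to */
          end = ret + sprintf(ret, "<p>");
          token = strtok(original, sep);  /* note: strtok modifies original */
          while (token) {
              if (token[0] == '\n')
                  end += sprintf(end, "</p><p>%s", &token[1]);
              else
                  end += sprintf(end, "%s", token);
              token = strtok(NULL, sep);
          }
          sprintf(end, "</p>");
          return ret;
      }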
      Maybe not the shortest, but it should work and is fairly quick.
Re: C vs perl
by Juerd (Abbot) on Apr 28, 2002 at 12:36 UTC

    Your example is not fair. In Perl, you use a regex to substitute, but in the C example, you don't use any regex at all. This isn't really a C versus Perl example, but more like a brute-force versus regex one.

    If you want to compare languages, use a regex engine in your C example, and use the same substitution. Or, alternatively, write the Perl code without regexes. Of course you can comment on Perl having a very fast built-in regex engine, and you could give a long and short example of Perl code.

    Please use equivalent examples. Regexes are in no way unique to Perl.

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      Juerd wrote:

      If you want to compare languages, use a regex engine in your C example, and use the same substitution. Or, alternatively, write the Perl code without regexes.

      While I understand your point of view, I have to disagree. Different languages naturally lead to different solutions. A program is correct if it does exactly what it is supposed to do and nothing more. The implementation is almost an afterthought. Consider the following snippet.

      foreach my $alpha ( @alphas ) {
          foreach my $beta ( @betas ) {
              if ( $alpha eq $beta ) {
                  # do something
              }
          }
      }

      A friend of mine uses that in job interviews. He tells the programmer that a project was moved into production only to discover that this snippet was consuming 20% of the run time, a fact not discovered in testing. My friend has two questions. First, what is the problem? Second, how do you fix it? Interestingly, he says, most programmers that he interviews do not see the problem. Of those who do, many cannot fix it.

      Here was my initial reaction.

      my %alpha;
      @alpha{ @alphas } = undef; # suppress void context warnings
      foreach my $beta ( @betas ) {
          if ( exists $alpha{ $beta } ) {
              # do something
          }
      }

      Now, that seems all well and good, but I am informed that many C programmers look at the problem and say "sort one list and do a binary search". It turns out to not be as fast as the above method, but it's much faster than the original code. Personally, I never would have thought of the sort and binary search method. I don't think like a C programmer anymore (if I ever did). Someone from another language altogether might say "sort both lists and look for intersections." Depending on what you're really doing with the data, that answer might be acceptable too.
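For readers who have not seen the C programmer's answer spelled out, here is a minimal sketch using the standard qsort() and bsearch(). The function name `count_common` and the choice to merely count matches are assumptions for illustration; "do something" could be anything. Sorting one list costs O(n log n) and each lookup O(log n), versus O(n * m) comparisons for the nested loops.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for arrays of const char *. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Counts how many entries of betas also occur in alphas.
   Note that alphas is sorted in place as a side effect. */
size_t count_common(const char **alphas, size_t na,
                    const char **betas, size_t nb)
{
    size_t i, hits = 0;

    qsort(alphas, na, sizeof *alphas, cmp_str);
    for (i = 0; i < nb; i++)
        if (bsearch(&betas[i], alphas, na, sizeof *alphas, cmp_str) != NULL)
            hits++;     /* "do something" would go here */
    return hits;
}
```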

      To really compare languages, you have to let them show how a problem would be solved "their way". If you tried to shoehorn C or Perl into the programming style of the other, what are you really comparing? Speed? Perl doesn't have pointers (shhh... no one mention Devel::Pointer), so some of the weird things that C programmers come up with using pointers just don't translate over. C doesn't (natively) have regular expressions or hashes. Typical C programmers probably wouldn't look for those solutions. That doesn't make them invalid, but trying to contrast the inherent abilities of different languages means they shouldn't be reduced to the lowest common denominator. If all roads lead to Rome, who cares what road I take?

      Cheers,
      Ovid


        UPDATED
        my $z;
        foreach my $alphabeta ( grep { $z = $_; grep { $z eq $_ } @alphas } @betas ) {
            # do something
        }
        Granted this still has the same problem as the initial code, but it's twice as fast and saves memory over your speedier (20xOriginal) solution. It also has the side-effect of effectively uniq-ing @betas. If you don't like that replace the outer grep with map.

        --
        perl -pew "s/\b;([mnst])/'$1/g"

      They are not unique to Perl, however they are a core feature of the language, bit of a difference there.

      I think the better option would be to instead use the suggested cleaned up Perl and C provided by others, and implement a "dumb" Perl version as well. Perhaps as if to demonstrate how "awkward" it would be.

      --
      perl -pew "s/\b;([mnst])/'$1/g"

C vs perl followup
by mandog (Curate) on Apr 29, 2002 at 01:06 UTC

    Thanks to everyone who replied. I'll post an update when I have final presentation notes.

    I'll be using corrections from Anonymous Monk and Ovid to the perl version of the function, and broquaint's version of the C function with corrections by Dominus and abstracts.

    Time permitting, I'll do up a Flex and Bison version and a pcre (perl compatible regular expressions) version. Juerd makes the good point that it would be fairer to compare a C regex to a perl regex solution. However, my preliminary research and very small experience suggest C regexs have a fair amount of overhead that wouldn't be recovered in my toy project.

    Juerd will no doubt produce elegant, working code and correct me if I err, but...

    To use the pcre library, I'll need to:
    • Allocate Memory
    • Compile the Regular Expression
    • Execute the regular expression
    • Play w/ a special substr function, a loop and maybe some strcat()
    To use Flex/Bison I'll have to:
    • Build a .l file
    • Build a .y file (or write a concat loop in C ??)
    • Run Flex
    • Run Yacc
    In Perl I just use:
     $fixme=~s|\r\n|</p><p>|g;
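To give a feel for the C side of that comparison, here is a rough sketch of the allocate / compile / execute / loop steps. It uses the POSIX <regex.h> calls regcomp(), regexec() and regfree() rather than PCRE, so it is a stand-in for the pcre version described above, not that version itself; the function name `regex_crlf_to_ptags` is made up for illustration.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: replace every match of the pattern "\r\n" with "</p><p>" and
   wrap the result in <p>...</p>, using POSIX regular expressions.
   Returns a malloc()ed string the caller must free(), or NULL. */
char *regex_crlf_to_ptags(const char *text)
{
    regex_t re;
    regmatch_t m;
    size_t n = strlen(text);
    const char *p = text;
    char *out, *w;

    /* Compile the regular expression. */
    if (regcomp(&re, "\r\n", REG_EXTENDED) != 0)
        return NULL;

    /* Allocate memory: each 2-byte match grows to 7 bytes, so the
       result is at most 3.5 * n plus the outer tags and the '\0'. */
    out = malloc(n * 4 + 8);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }
    w = out;
    memcpy(w, "<p>", 3);
    w += 3;

    /* Execute repeatedly, copying the text before each match. */
    while (regexec(&re, p, 1, &m, 0) == 0) {
        memcpy(w, p, (size_t)m.rm_so);
        w += m.rm_so;
        memcpy(w, "</p><p>", 7);
        w += 7;
        p += m.rm_eo;               /* resume after the match */
    }
    strcpy(w, p);                   /* tail after the last match */
    strcat(w, "</p>");
    regfree(&re);
    return out;
}
```

Even in this small form the loop, the offset arithmetic and the buffer sizing are all the caller's job, which is the overhead the one-line substitution hides.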

    Time permitting, I'll also do an "ugly" version in Perl (thanks belg4mit)

    A couple of other thoughts. It took me about 5 hours to write the C version of the function and about 50 minutes to write the Perl version of the function. At no time during my perl coding did I dump core... My perl version was much closer to the more ideal perl version created by Ovid than my C version was to the more ideal C version created by broquaint.

    My guess is that as an average Perl & C programmer, I'm closer to being a great perl programmer than I am to being a great C programmer. (not that I'm especially close...)



    email: mandog
Re: C vs perl
by Anonymous Monk on Apr 28, 2002 at 10:43 UTC
    I know nothing about C, but a little Perl. The first thing I noticed was that you capture \r\n but don't use $1. Besides that I'd like to remove some unnecessary code.

    sub MakePtag { my ($s) = @_; $s =~ s!\r\n!</p><p>!g; return "<p>$s</p>"; }
