PerlMonks  

truncating form field input to 4000 characters

by emilford (Friar)
on Nov 10, 2005 at 15:24 UTC ( [id://507401]=perlquestion )

emilford has asked for the wisdom of the Perl Monks concerning the following question:

I have a form with a couple of fields that have a 4000-character limit. I put in place some JS code that blocks form submission if any of the fields contains 4001 or more characters.

The problem is that some strings the JS code accepts as within the 4000-character limit are rejected by Oracle as too long, and everything comes crashing down. I decided to add an extra check within my Perl code that does a substr($x, 0, 4000) on all values, just to double check.

However, I found that a string that JS and wc think is 4000 characters, Perl thinks is 4052. I'm assuming that this has something to do with line breaks, etc. Perl truncates the string down to what it counts as 4000 characters, but in doing so it cuts off a chunk of the user's input.

So, my question is in regard to matching up what JS and Oracle think is 4000 characters to what Perl thinks is 4000 characters. How do I get Perl to recognize this difference?

Replies are listed 'Best First'.
Re: truncating form field input to 4000 characters
by JediWizard (Deacon) on Nov 10, 2005 at 15:49 UTC

    This appears to be related to a CGI script, right? Are you using the CGI module to fetch the parameter values from the form? I have seen people try to read the parameters themselves (without CGI.pm) and get bitten by URI escape sequences in their input. Without seeing your code, this would be my first guess.


    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol
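
To illustrate the URI-escape pitfall mentioned above, here is a minimal sketch. It assumes a hand-rolled parser that looks at the raw urlencoded request body; the sample value and the `url_decode` helper are hypothetical and not taken from the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper: decode application/x-www-form-urlencoded data by hand.
sub url_decode {
    my ($raw) = @_;
    $raw =~ tr/+/ /;                               # '+' encodes a space
    $raw =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX encodes one byte
    return $raw;
}

# Hypothetical field value as it travels in the request body.
my $raw     = 'line+one%0D%0Aline+two';
my $decoded = url_decode($raw);

printf "raw length:     %d\n", length $raw;       # counts the escape sequences too
printf "decoded length: %d\n", length $decoded;   # counts the decoded bytes only
```

Measuring the raw value overstates the length, because each escaped byte occupies three characters until it is decoded. CGI.pm does this decoding for you, which is why the question about CGI.pm matters.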

      Yes, I am using CGI.pm to read in the parameters. After a bit more testing, I've found that JS and MS Word's count show the text as < 4000 characters, but doing a wc -m from the command line shows > 4000. I'm assuming this has something to do with line breaks, etc. Would the text be manipulated in any way between input into the form field and pulling it out with CGI.pm?
Re: truncating form field input to 4000 characters
by kwaping (Priest) on Nov 10, 2005 at 15:42 UTC
    What's different or special about that string that's giving you problems? Does it contain non-printing characters or something else out of the ordinary?
Re: truncating form field input to 4000 characters
by Skeeve (Parson) on Nov 10, 2005 at 17:56 UTC
    Line breaks may be one reason; non-ASCII characters may be another, since in UTF-8 each of them is encoded as 2 or more bytes.

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
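
A small sketch of the character-versus-byte difference raised above, using the core Encode module. The sample string is hypothetical, and whether the Oracle column limit is counted in bytes or characters is an assumption here, since the thread does not say how the column is declared.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample input: "naive cafe" with two accented letters,
# written with escapes so the source file itself stays plain ASCII.
my $text  = "na\x{ef}ve caf\x{e9}";
my $bytes = encode('UTF-8', $text);

printf "characters: %d\n", length $text;    # logical characters
printf "bytes:      %d\n", length $bytes;   # octets after UTF-8 encoding
```

If the database limit is byte-based, the value to check against 4000 would be the encoded length, not the character count.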
Re: truncating form field input to 4000 characters
by ikegami (Patriarch) on Nov 10, 2005 at 16:28 UTC
    Did you try to send non-ASCII characters? They may have gotten converted to HTML entities (&...;). HTML::Entities might be useful here.
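
As a rough sketch of how entity conversion could inflate the length, assuming the non-core HTML::Entities module is installed and that the browser really did send entities (which this thread never confirms). The sample value is hypothetical.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical submitted value in which non-ASCII characters arrived as entities.
my $submitted = 'caf&eacute; &amp; cr&egrave;me';
my $decoded   = decode_entities($submitted);

printf "as submitted: %d characters\n", length $submitted;
printf "decoded:      %d characters\n", length $decoded;
```

Each entity collapses to a single character once decoded, so the same text can measure very differently before and after decoding.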
      I found the problem to be the difference in line feeds. The testers were cutting and pasting from a Word document into the HTML form. I believe the line feeds from Windows is a "\r\n". JavaScript doesn't count the \r as an extra character, whereas Perl and Oracle do. Removing the \r seemed to solve the problem.
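
The fix described above might look something like this sketch. The helper name `normalize_and_truncate` and the sample text are hypothetical; the idea is simply to collapse "\r\n" to "\n" before calling substr, so that Perl counts the same characters the JS check counted.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse CRLF (and any lone CR) to LF, then truncate.
sub normalize_and_truncate {
    my ($text, $limit) = @_;
    $text =~ s/\r\n?/\n/g;
    return substr($text, 0, $limit);
}

# Hypothetical form value with CRLF line endings, as submitted by a browser.
my $input = "first line\r\nsecond line\r\nthird";
my $limit = 28;   # stands in for the real 4000-character limit

my $naive = substr($input, 0, $limit);              # counts each \r, so it cuts real text
my $fixed = normalize_and_truncate($input, $limit);

printf "raw length:        %d\n", length $input;
printf "naive truncation:  %s\n", $naive =~ /third\z/ ? 'complete' : 'cut short';
printf "normalized result: %s\n", $fixed =~ /third\z/ ? 'complete' : 'cut short';
```

Normalizing first means the truncation only removes text when the user genuinely exceeded the limit.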
        I believe the line feeds from Windows is a "\r\n"

        That also happens to be the line feed used by the HTTP protocol (officially, anyway -- most HTTP servers and clients will accept simple \n line feeds, but it's not 100% correct). Your web client is probably where the \r\n line feeds are coming from in this case, not necessarily Windows.

        I believe the line feeds from Windows is a "\r\n"

        This is (sometimes) a misbelief.

        perldoc perlport clearly states:

        Newlines
        
               In most operating systems, lines in files are terminated by newlines.
               Just what is used as a newline may vary from OS to OS.  Unix
               traditionally uses "\012", one type of DOSish I/O uses "\015\012",
               and Mac OS uses "\015".
        
               Perl uses "\n" to represent the "logical" newline, where what is
               logical may depend on the platform in use.  In MacPerl, "\n" always
               means "\015".  In DOSish perls, "\n" usually means "\012", but when
               accessing a file in "text" mode, STDIO translates it to (or from)
               "\015\012", depending on whether you're reading or writing.  Unix
               does the same thing on ttys in canonical mode.  "\015\012" is
               commonly referred to as CRLF.
        

        So, to be picky, a "\r\n" written to a file in text mode on Windows should give you "\015\015\012" and not "\015\012", because the "\n" is itself translated to "\015\012".


        s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
        +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
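
The text-mode translation described above can be imitated on any platform with PerlIO's `:crlf` layer. This is only a sketch: it writes to an in-memory filehandle rather than a real file on Windows, on the assumption that the layer behaves the same way there.

```perl
use strict;
use warnings;

# Write "\r\n" through the :crlf layer into an in-memory buffer.
my $buffer = '';
open my $fh, '>:crlf', \$buffer or die "cannot open in-memory handle: $!";
print {$fh} "\r\n";
close $fh or die "cannot close in-memory handle: $!";

# Show each resulting byte as a three-digit octal number.
my $octal = join ' ', map { sprintf '%03o', ord } split //, $buffer;
print "$octal\n";
```

The layer leaves the explicit "\r" alone and expands the "\n", which is where the extra "\015" comes from.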
Re: truncating form field input to 4000 characters
by kulls (Hermit) on Nov 11, 2005 at 05:38 UTC
    Hi,
    Are you using any templates for handling the UI (HTML)?
    If so, you can add escape=html and escape=js to the form fields in order to control the special characters. I guess the value gets truncated due to special characters.

    -Kulls

Node Type: perlquestion [id://507401]
Approved by Roy Johnson