
Untaint IP address/hostname question

by jcpunk (Friar)
on Mar 08, 2004 at 17:06 UTC (#334876)

jcpunk has asked for the wisdom of the Perl Monks concerning the following question:

I have a form which will (once it is actually written) take either an IP address or a hostname and begin doing things with it. But before I can do anything with it, it has to be untainted. I have this check:
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\w+\.\w{2,3})
but it only works on some hosts (host.com or host.ca, but not thing.host.com or thing.org.uk). Any thoughts on how to clean this up a bit, or a way to tighten down the check?

jcpunk
all code is tested, and doesn't work so there :p (variant on common PM sig for my own amusement)

Replies are listed 'Best First'.
Re: Untaint IP address/hostname question
by Juerd (Abbot) on Mar 08, 2004 at 17:15 UTC

    Regexp::Common's $RE{net}{IPv4} and $RE{net}{domain}{-nospace}

    Note that 2130706433 is in fact a valid IP address (equal to 127.0.0.1) and that you might just want to try inet_ntoa inet_aton $ip instead. (These can be found in the standard module Socket).

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
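
A minimal sketch of the inet_ntoa inet_aton round trip suggested above, using only the core Socket module. The helper name canonical_ipv4 is made up for illustration. Two assumptions worth stating: inet_aton also accepts host names and may perform a DNS lookup for them, and the round trip by itself does not untaint anything, so the result is still run through a capturing regex.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: normalize whatever inet_aton accepts to dotted-quad
# form, then capture it with a regex so the result is untainted.
# Returns undef (empty list in list context) when the input is rejected.
sub canonical_ipv4 {
    my ($input) = @_;
    return unless defined $input && length $input;

    # inet_aton returns undef when it cannot parse or resolve the input.
    my $packed = inet_aton($input);
    return unless defined $packed;

    # inet_ntoa always produces the familiar a.b.c.d form.
    my $dotted = inet_ntoa($packed);

    # Untainting in Perl only happens through a regex capture.
    return $dotted =~ /\A(\d{1,3}(?:\.\d{1,3}){3})\z/ ? $1 : ();
}

my $ip = canonical_ipv4('127.0.0.1');
print defined $ip ? "$ip\n" : "rejected\n";
```

Because inet_aton may resolve names, this is better suited to input that is expected to be an address; whether a lookup at validation time is acceptable depends on the application.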

      Remember that umlauts (ä, ö, ü) have been valid in German URLs (.de) since 1 March.
      I don't think Regexp::Common covers that.
        Regexp::Common covers what's in RFC 2396 and RFC 2616 when it comes to HTTP URIs. If those RFCs are superseded, I'd be interested in hearing about them.

        Abigail

      2130706433 is not a valid IP address as most people think of them. It is the decimal integer corresponding to the binary IP address for 127.0.0.1. The Unix inet_ntoa accepts all kinds of non-standard forms for IP addresses. Everyone else thinks that IP addresses are represented as four decimal numbers separated by periods. Using anything else will confuse people and programs that expect the standard form.

        2130706433 is not a valid IP address as most people think of them.

        Likewise, "login=juerd" is not a valid cookie as most people think of them. They expect them to be edible. What most people think and what is technically correct isn't always the same.

        The Unix inet_ntoa accepts all kinds of non-standard forms for IP addresses.

        Yes, like the ones formed like "127.0.0.1". This is only a de-facto standard, not an official one. It happens to be accepted by almost everything that takes an IP address. Decimal numbers like "2130706433" are also a de-facto standard; they are just not used as much. The libraries found in Unix, Linux, Windows and Mac OS all think "2130706433" and "127.0.0.1" are the same address.

        Everyone else thinks that IP addresses are represented as four decimal numbers separated by periods. Using anything else will confuse people and programs that expect the standard form.

        We could argue about the meaning of "everyone else" or about "anything else", or even about who you think "people" are. Or we could just stick to your point and discuss the "standard" status of dotted decimal IP addresses. That some applications and even some protocols require IP addresses to be stringified like that does not mean that it is the only standard - or that it even is a standard.

        Should you have an STD, RFC or another official document that says more on this subject, I'll be happy to hear about it.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Untaint IP address/hostname question
by UnderMine (Friar) on Mar 08, 2004 at 17:39 UTC
    (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|(?:\w+\.)+\w{2,3})
    Solves the above hostname issue but misses a good few invalid values.
    (\d+|(?:\d{1,3}\.){3}\d{1,3}|(?:\w+\.)+\w{2,3})
    Is better, but you are far better off using the Regexp::Common patterns.

    Hope it helps
    UnderMine
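
A sketch of how the first pattern above might be used as an untainter. The anchors (\A and \z) are an addition not present in the post; without them the pattern can match a fragment of a longer string, and the captured fragment would be treated as clean. The helper name untaint_host_or_ip is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured (untainted) value when the whole
# input matches, otherwise undef. The alternation is the pattern from the
# reply above; \A and \z are added so nothing may precede or follow it.
sub untaint_host_or_ip {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ /\A((?:\d{1,3}\.){3}\d{1,3}|(?:\w+\.)+\w{2,3})\z/) {
        return $1;
    }
    return;
}

for my $candidate ('host.com', 'thing.host.com', 'thing.org.uk',
                   '10.0.0.1', 'host.com; rm -rf /', 'my-host.example.com') {
    my $clean = untaint_host_or_ip($candidate);
    printf "%-22s %s\n", $candidate, defined $clean ? 'accepted' : 'rejected';
}
```

Note that \w does not match a hyphen, so hyphenated names are rejected by this pattern, and \w does match digits, so out-of-range addresses still get through as "host names".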

Re: Untaint IP address/hostname question
by fokat (Deacon) on Mar 09, 2004 at 05:04 UTC

    <PLUG CLASS="shameless">
    Consider using NetAddr::IP, as it recognizes most IP address formats in common (and not so common) use.
    </PLUG>

    Best regards

    -lem, but some call me fokat

Re: Untaint IP address/hostname question
by imcsk8 (Pilgrim) on Mar 08, 2004 at 19:53 UTC
    If you really want to use a regexp, you could rewrite your current one to look like this:
    (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|(\w+\.)+\w{2,3}$)
    ignorance, the plague is everywhere
    --guttermouth
      Not sure that regex above works all that well as an untainter:)... it allows:

      999.000.999.000

      as an IP address and look what it does to the legal domain name neonutt.firstpart-secondpart.co.uk

      Just for your IP addresses (not for your domain names), maybe something like this regex gets closer to what you need?

      /((\d | [01]?\d\d | 2[0-4]\d | 25[0-5] )\.){3}(\d | [01]?\d\d | 2[0-4]\d | 25[0-5] )/x

      Do people really test for the binary representation of the address too? I haven't seen it that often... but, then again, I don't get out often.

      -hsinclai
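
A minimal sketch of a range-checked IPv4 untainter along the lines of the regex above. It differs from the posted pattern in two ways, both assumptions of this example: the match is anchored, and octets with leading zeros (such as 000 or 01) are rejected, since some parsers read a leading zero as octal. The names $octet, $ipv4 and untaint_ipv4 are made up for illustration.

```perl
use strict;
use warnings;

# One octet in the range 0..255, without leading zeros.
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;

# Exactly four octets separated by dots, with nothing before or after.
my $ipv4 = qr/\A($octet(?:\.$octet){3})\z/;

# Hypothetical helper: return the untainted address, or undef if it does
# not look like a dotted-quad IPv4 address.
sub untaint_ipv4 {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ $ipv4) {
        return $1;
    }
    return;
}

for my $candidate ('192.168.0.1', '255.255.255.255', '999.000.999.000', '256.1.1.1') {
    my $clean = untaint_ipv4($candidate);
    printf "%-16s %s\n", $candidate, defined $clean ? 'accepted' : 'rejected';
}
```
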
Re: Untaint IP address/hostname question
by ambrus (Abbot) on Mar 09, 2004 at 16:31 UTC

    It depends on how you will use the host name. If you convert it directly with gethostbyname (which is the safest solution), you can probably accept any hostname. If you pass it to some external program or shell, you'll have to check what characters that program accepts. The important point here is not to check that the hostname is a valid hostname, but rather that using it won't do something bad. That is, even if a hostname is valid, it can break your program if whatever you pass it to misinterprets it. If the hostname, for example, starts with a hyphen (I don't know if that can be valid or not), and you call a program with it and it interprets it as a switch, that's bad, even though the user gave you a valid hostname.
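
A sketch of the "check that using it is safe" idea for the external-program case. Everything named here is hypothetical: the helper names, the allowed character set (ASCII letters, digits, dots and hyphens), the 253-character limit, and the tool path. The check refuses a leading hyphen so the value cannot be read as a switch, and the list form of system passes the argument directly to the program without involving a shell.

```perl
use strict;
use warnings;

# Hypothetical check: allow only letters, digits, dots and hyphens, and
# require the first character to be a letter or digit so the value can
# never look like a command-line switch. Returns the untainted capture.
sub safe_host_argument {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ /\A([A-Za-z0-9][A-Za-z0-9.-]{0,252})\z/) {
        return $1;
    }
    return;
}

# Hypothetical wrapper around an external tool; the path is a placeholder.
# With the list form of system, no shell interprets the argument.
sub run_lookup {
    my ($host) = @_;
    my $clean = safe_host_argument($host);
    return unless defined $clean;
    return system('/path/to/lookup-tool', $clean) == 0;
}

for my $candidate ('thing.host.com', '-rf', 'host.com; rm -rf /') {
    my $clean = safe_host_argument($candidate);
    printf "%-20s %s\n", $candidate, defined $clean ? 'accepted' : 'rejected';
}
```

Under taint mode, the environment (notably $ENV{PATH}) also has to be cleaned before system will run anything, which this sketch leaves out.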
