Re: Re: Untaint IP address/hostname question

by iburrell (Chaplain)
on Mar 08, 2004 at 19:51 UTC ( [id://334908]=note )


in reply to Re: Untaint IP address/hostname question
in thread Untaint IP address/hostname question

2130706433 is not a valid IP address as most people think of them. It is the decimal integer corresponding to the binary IP address for 127.0.0.1. The Unix inet_aton accepts all kinds of non-standard forms for IP addresses. Everyone else thinks that IP addresses are represented as four decimal numbers separated by periods. Using anything else will confuse people and programs that expect the standard form.
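
The correspondence between the integer and the dotted form can be checked directly. A minimal sketch in Python (used here only for illustration, since the thread itself contains no code), relying on the standard library `ipaddress` module; the variable names are arbitrary:

```python
import ipaddress

# 127.0.0.1 packed as a 32-bit big-endian integer:
# 127 * 2**24 + 0 * 2**16 + 0 * 2**8 + 1
as_int = (127 << 24) | (0 << 16) | (0 << 8) | 1

# Convert the integer back to the conventional dotted form.
dotted = str(ipaddress.IPv4Address(as_int))

print(as_int)
print(dotted)
```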

Re: Re: Re: Untaint IP address/hostname question
by Juerd (Abbot) on Mar 08, 2004 at 20:30 UTC

    2130706433 is not a valid IP address as most people think of them.

    Likewise, "login=juerd" is not a valid cookie as most people think of them. They expect them to be edible. What most people think and what is technically correct aren't always the same.

    The Unix inet_aton accepts all kinds of non-standard forms for IP addresses.

    Yes, like the ones formed like "127.0.0.1". This is only a de-facto standard, not an official one. It happens to be accepted by almost everything that takes an IP address. Decimal numbers like "2130706433" are also a de-facto standard; they are just not used as much. The libraries found in Unix, Linux, Windows and Mac OS all think "2130706433" and "127.0.0.1" are the same address.

    Everyone else thinks that IP addresses are represented as four decimal numbers separated by periods. Using anything else will confuse people and programs that expect the standard form.

    We could argue about the meaning of "everyone else" or about "anything else", or even about who you think "people" are. Or we could just stick to your point and discuss the "standard" status of dotted decimal IP addresses. That some applications and even some protocols require IP addresses to be stringified like that does not mean that it is the only standard - or that it even is a standard.

    Should you have an STD, RFC or another official document that says more on this subject, I'll be happy to hear about it.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      It is a historical standard because it was implemented in the BSD inet_aton and copied into other implementations. It may even be standardized in POSIX.

      No RFC describes the long form IP address. The RFCs I know that describe grammars for IPv4 addresses only support dotted quad form. This includes URLs.

      You can see a few places where differences between expectations create problems. For example, most web browsers parse out the host portion of the http URL and pass it to inet_aton. So they accept "long form" addresses even when the RFCs say they shouldn't. This is seen with scammers writing URLs like: http://www.example.com@0x7F000001/. They use the username and unexpected IP address syntax to hide the destination.
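
To make the disguise concrete, the example URL can be taken apart with a generic URL parser. This is a sketch using Python's standard `urllib.parse` and `ipaddress` modules, not a description of what any particular browser does; treating the host as a single hexadecimal number is an assumption based on the inet_aton behaviour discussed in this thread.

```python
import ipaddress
from urllib.parse import urlsplit

url = "http://www.example.com@0x7F000001/"
parts = urlsplit(url)

# Everything before '@' in the authority is user info, not the host.
print(parts.username)
print(parts.hostname)

# Read the hex host as a 32-bit integer to reveal the real destination.
real = ipaddress.IPv4Address(int(parts.hostname, 16))
print(real)
```

The part that looks like a site name is only the username; the request would actually go to the address encoded in the hex number.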

      Including the long form IP addresses in a regular expression makes them much more complicated. The regex has to match one to three components that could be decimal, hex, or octal numbers. Just to accept a format that is only used by a few people.
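
To give a sense of what such a pattern would have to encode, here is a rough sketch of the classic parsing rules as described in the inet(3) manual page, written as a small Python parser rather than a regex. The function names `parse_part` and `parse_classic` are hypothetical, and edge cases (such as a bare "0x") are handled more strictly than some C libraries do.

```python
import string

def parse_part(part):
    """Parse one component as hex (0x...), octal (0...), or decimal."""
    if part[:2].lower() == "0x":
        digits, base, allowed = part[2:], 16, string.hexdigits
    elif len(part) > 1 and part[0] == "0":
        digits, base, allowed = part[1:], 8, string.octdigits
    else:
        digits, base, allowed = part, 10, string.digits
    if not digits or any(ch not in allowed for ch in digits):
        return None
    return int(digits, base)

def parse_classic(text):
    """Return the address as a 32-bit integer, or None if it is rejected."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = [parse_part(p) for p in parts]
    if any(v is None for v in values):
        return None
    *leading, last = values
    # Leading components are one byte each.
    if any(v > 0xFF for v in leading):
        return None
    # The last component fills all remaining bytes.
    remaining_bits = 8 * (4 - len(leading))
    if last >= 1 << remaining_bits:
        return None
    result = 0
    for v in leading:
        result = (result << 8) | v
    return (result << remaining_bits) | last

for text in ["127.0.0.1", "2130706433", "0x7F000001", "127.1", "0177.0.0.1"]:
    print(text, parse_classic(text))
```

All five inputs in the loop denote the same address under these rules, which is the variety a single regular expression would have to cover.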

        It is a historical standard because it was implemented in the BSD inet_aton and copied into other implementations.

        As was the long decimal format.

        No RFC describes the long form IP address.

        As none describes the dotted quad form IP address.

        The RFCs I know that describe grammars for IPv4 addresses only support dotted quad form. This includes URLs.

        They are all protocols. Protocols using IP don't define IP. Note by the way that the RFC for URLs (1630) defines host as digits . digits . digits . digits, thus allowing 999.123.0.12345. They require a quad dotted decimal address, but it doesn't say anywhere that that address is an IPv4 address. (Or perhaps I missed that specification)
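
The gap between that grammar and a real IPv4 address is easy to demonstrate. In this sketch the pattern `HOSTNUMBER` is a hypothetical transcription of the "digits . digits . digits . digits" production quoted above, and Python's standard `ipaddress` module stands in for a strict dotted-quad check.

```python
import ipaddress
import re

# Hypothetical transcription of "digits . digits . digits . digits".
HOSTNUMBER = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")

def matches_grammar(host):
    """True if the whole string fits the four-groups-of-digits production."""
    return HOSTNUMBER.fullmatch(host) is not None

def is_ipv4(host):
    """True if the string is a strict dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False

for host in ["127.0.0.1", "999.123.0.12345"]:
    print(host, matches_grammar(host), is_ipv4(host))
```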

        most web browsers parse out the host portion of the http URL and pass it to inet_aton.

        That is exactly what I suggest everyone should do. I've found it hard to find a tool on my Linux box that thinks 0x7F000001 is invalid. You talk about expectations. I think doing what other tools do lives up to people's expectations.

        Including the long form IP addresses in a regular expression makes them much more complicated.

        I'm suggesting that no regex be used.
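
A regex-free check along these lines might look like the following sketch. It uses Python's `socket.inet_aton` and `socket.inet_ntoa`, which wrap the platform's C library, so exactly which forms are accepted depends on that library; the helper name `normalize_ipv4` is hypothetical.

```python
import socket

def normalize_ipv4(text):
    """Return the dotted-quad form of text, or None if inet_aton rejects it."""
    try:
        packed = socket.inet_aton(text)
    except (OSError, ValueError):
        # OSError: not an address; ValueError: e.g. embedded NUL or non-ASCII.
        return None
    # Re-format the 4 packed bytes in the conventional notation.
    return socket.inet_ntoa(packed)

for candidate in ["127.0.0.1", "2130706433", "not an address"]:
    print(candidate, normalize_ipv4(candidate))
```

Converting back with `inet_ntoa` means whatever form was accepted, the rest of the program only ever sees the dotted form that people and other programs expect.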

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
