Re: Untaint IP address/hostname question

Re: Re: Untaint IP address/hostname question by Taulmarill (Deacon) on Mar 08, 2004 at 17:26 UTC
remember that umlauts (äöü) are valid for german urls (.de) since 1 March. i don't think Regexp::Common covers that.
Re: Untaint IP address/hostname question by Abigail-II (Bishop) on Mar 08, 2004 at 17:38 UTC
Regexp::Common covers what's in RFC 2396 and RFC 2616 when it comes to HTTP URIs. If those RFCs have been superseded, I'd be interested in hearing about it. Abigail
Re: Re: Untaint IP address/hostname question by sri (Vicar) on Mar 08, 2004 at 18:11 UTC
It's RFC 3490, Internationalizing Domain Names in Applications (IDNA).
Re: Re: Untaint IP address/hostname question by iburrell (Chaplain) on Mar 08, 2004 at 19:38 UTC
Internationalized Domain Names add a mechanism to encode non-ASCII characters using the characters already allowed in domain names. The question is whether Regexp::Common should match the encoded or the non-encoded domain names. I would say that the RE should not be changed and that the higher-level code needs to do the translation.
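To illustrate the "translate first, then match" approach, here is a minimal sketch in Python (not Perl, and not what Regexp::Common does). It assumes Python's built-in "idna" codec, which implements the RFC 3490 conversion, and a simplified letters-digits-hyphens hostname pattern; HOSTNAME_RE and to_ascii_hostname are hypothetical names made up for this example.

```python
import re

# Plain letters/digits/hyphens hostname pattern. This is an illustrative
# stand-in, NOT the pattern Regexp::Common actually uses.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)*"
    r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$"
)


def to_ascii_hostname(name):
    """Return the IDNA (RFC 3490) ASCII form of name, or None if it is unusable."""
    try:
        # The higher-level code does the translation to the encoded form...
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return None
    # ...and the unchanged ASCII-only pattern validates the result.
    return ascii_name if HOSTNAME_RE.match(ascii_name) else None


print(to_ascii_hostname("müller.de"))
print(to_ascii_hostname("bad_host.de"))
```

The pattern itself never has to know about umlauts: it only ever sees the "xn--" encoded labels.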
Re: Re: Untaint IP address/hostname question by Taulmarill (Deacon) on Mar 08, 2004 at 17:49 UTC
just look for "Internationalized Domain Names" in your favorite internet search engine. i'm too lazy to look for rfcs right now.
Re: Re: Untaint IP address/hostname question by iburrell (Chaplain) on Mar 08, 2004 at 19:51 UTC
2130706433 is not a valid IP address as most people think of them. It is the decimal integer corresponding to the binary IP address for 127.0.0.1. The Unix inet_aton accepts all kinds of non-standard forms for IP addresses. Everyone else thinks that IP addresses are represented as four decimal numbers separated by periods. Using anything else will confuse people and programs that expect the standard form.
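The correspondence between the two notations is plain base-256 arithmetic. A small Python sketch, using the standard ipaddress module (the helper names int_to_dotted and dotted_to_int are made up for this example):

```python
import ipaddress


def int_to_dotted(number):
    """Render a 32-bit integer as a dotted-quad string."""
    return str(ipaddress.IPv4Address(number))


def dotted_to_int(text):
    """Return the 32-bit integer behind a dotted-quad string."""
    return int(ipaddress.IPv4Address(text))


# The same value by hand: each of the four parts is one base-256 digit.
by_hand = 127 * 256**3 + 0 * 256**2 + 0 * 256 + 1

print(int_to_dotted(2130706433))
print(dotted_to_int("127.0.0.1"))
print(by_hand)
```

Note that ipaddress.IPv4Address itself is strict about strings: it takes the integer as an int, but does not accept the string "2130706433" as an address.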
Re: Re: Re: Untaint IP address/hostname question by Juerd (Abbot) on Mar 08, 2004 at 20:30 UTC
"2130706433 is not a valid IP address as most people think of them."

Likewise, "login=juerd" is not a valid cookie as most people think of them. They expect them to be edible. What most people think and what is technically correct aren't always the same.

"The Unix inet_aton accepts all kinds of non-standard forms for IP addresses."

Yes, like the ones formed like "127.0.0.1". This is only a de facto standard, not an official one. It happens to be accepted by almost everything that takes an IP address. Decimal numbers like "2130706433" are also a de facto standard; they are just not used as much. The libraries found in Unix, Linux, Windows and Mac OS all think "2130706433" and "127.0.0.1" are the same address.

"Everyone else thinks that IP addresses are represented as four decimal numbers separated by periods. Using anything else will confuse people and programs that expect the standard form."

We could argue about the meaning of "everyone else" or about "anything else", or even about who you think "people" are. Or we could just stick to your point and discuss the "standard" status of dotted decimal IP addresses. That some applications and even some protocols require IP addresses to be stringified like that does not mean that it is the only standard, or that it even is a standard. Should you have an STD, RFC or another official document that says more on this subject, I'll be happy to hear about it.

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Re: Re: Re: Re: Untaint IP address/hostname question by iburrell (Chaplain) on Mar 08, 2004 at 23:02 UTC
It is a historical standard because it was implemented in the BSD inet_aton and copied into other implementations. It may even be standardized in POSIX. No RFC describes the long form IP address. The RFCs I know that describe grammars for IPv4 addresses only support dotted quad form. This includes URLs.

You can see a few places where differences between expectations create problems. For example, most web browsers parse out the host portion of the http URL and pass it to inet_aton. So they accept "long form" addresses even when the RFCs say they shouldn't. This is seen with scammers writing URLs like: http://www.example.com@0x7F000001/. They use the username and unexpected IP address syntax to hide the destination.

Including the long form IP addresses in a regular expression makes it much more complicated. The regex has to match one to three components that could be decimal, hex, or octal numbers, all to accept a format that is used by only a few people.
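To make the scam URL concrete, here is a rough Python sketch that pulls the real host out of such a URL and normalises an inet_aton-style address to dotted quad. It is an approximation of the traditional BSD parsing rules written for illustration, not the code any browser or libc uses; parse_part, loose_ipv4 and real_destination are hypothetical names.

```python
from urllib.parse import urlsplit


def parse_part(text):
    """Parse one component: 0x prefix is hex, a leading 0 is octal, else decimal."""
    lower = text.lower()
    if lower.startswith("0x"):
        digits, base = lower[2:], 16
    elif len(lower) > 1 and lower.startswith("0"):
        digits, base = lower[1:], 8
    else:
        digits, base = lower, 10
    # int() alone would also accept signs, underscores and whitespace,
    # so check the digits explicitly first.
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError("bad address component: %r" % text)
    return int(digits, base)


def loose_ipv4(text):
    """Normalise a one- to four-part 'long form' address to dotted quad."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("too many components: %r" % text)
    *head, last = [parse_part(part) for part in parts]
    # Every part except the last is a single byte; the last part fills
    # however many bytes are left over.
    if any(value > 255 for value in head) or last >= 256 ** (4 - len(head)):
        raise ValueError("component out of range: %r" % text)
    number = last
    for index, value in enumerate(head):
        number |= value << (8 * (3 - index))
    return ".".join(str((number >> shift) & 255) for shift in (24, 16, 8, 0))


def real_destination(url):
    """Return the dotted-quad host of url, ignoring any user@ prefix."""
    return loose_ipv4(urlsplit(url).hostname)


print(urlsplit("http://www.example.com@0x7F000001/").username)
print(real_destination("http://www.example.com@0x7F000001/"))
```

Even this small parser is noticeably longer than a dotted-quad check, which gives a feel for how much a single regular expression would have to grow to cover the same forms.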
Re: Re: Re: Re: Re: Untaint IP address/hostname question by Juerd (Abbot) on Mar 09, 2004 at 00:00 UTC

