Reference: New Module Consideration?
Well, I've decided to go through with my idea, so now I'm wondering which types of tests the community would find most useful. The tests currently being considered:
- Credit Card - verify that it LOOKS like one
- Date - DD/MM/YYYY etc... perhaps check that it's a possible date
- E-Mail - basic syntax check, might attempt to use Email::Valid
- INT - integer vs float
- IP - could it be an IP address?
- Time - obvious...
- URI - parseable by URI
- Year - similar to int probably
- HCOLOR - a valid hex color definition
- HTML - has valid *looking* syntax
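Several of the shape-only tests in the list above reduce to plain regexes. The following is a minimal illustration, not the planned module: the `%looks_like` table, the `looks_like` function and the exact patterns are assumptions, and the IP test covers dotted-quad IPv4 only.

```perl
use strict;
use warnings;

# Hypothetical "does it LOOK like one" checks; they test shape only.
my %looks_like = (
    INT    => sub { $_[0] =~ /\A[+-]?[0-9]+\z/ },
    # any integer also passes as a float here
    FLOAT  => sub { $_[0] =~ /\A[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?\z/ },
    YEAR   => sub { $_[0] =~ /\A[0-9]{4}\z/ },
    HCOLOR => sub { $_[0] =~ /\A#?(?:[0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\z/ },
    IP     => sub {
        # dotted-quad IPv4 only; every octet must be 0..255
        my @octets = $_[0] =~ /\A([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\z/
            or return 0;
        return !grep { $_ > 255 } @octets;
    },
);

sub looks_like {
    my ($type, $value) = @_;
    my $test = $looks_like{$type} or die "unknown type '$type'\n";
    return defined $value && $test->($value) ? 1 : 0;
}
```

Keeping the tests in a table keyed by type name makes adding another type a one-line change.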
Also, as requested:
- Sum = specified #
- Sum < specified #
- Sum > specified #
- Sum <= specified #
- Sum >= specified #
- Sum != specified #
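All six comparison tests can come from one generator that curries the operator and the target number into a closure. This is a sketch under that assumption; `make_sum_test` is a hypothetical name.

```perl
use strict;
use warnings;

# Each operator from the "Sum <op> specified #" list as a two-argument test.
my %compare = (
    '='  => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
);

# Curry the operator and target into a closure that takes the values to sum.
sub make_sum_test {
    my ($op, $target) = @_;
    my $cmp = $compare{$op} or die "unsupported operator '$op'\n";
    return sub {
        my $total = 0;
        $total += $_ for @_;
        return $cmp->($total, $target) ? 1 : 0;
    };
}
```

For example, `make_sum_test('<=', 10)->(3, 4, 3)` returns 1, since the sum is exactly 10.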
Any requests?
My code doesn't have bugs, it just develops random features.
Flame ~ Lead Programmer: GMS | GMS
Re: Data Validation Tests
by Aristotle (Chancellor) on Jan 26, 2003 at 00:14 UTC
Re: Data Validation Tests
by dempa (Friar) on Jan 25, 2003 at 23:29 UTC
Re: Data Validation Tests
by IlyaM (Parson) on Jan 26, 2003 at 09:23 UTC
Make API open so anybody can add new test types with plugin modules without modifying source code of your module.
BTW, have you looked at Data::Verify? It seems to be a similar project. Have you considered cooperating?
--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support
UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org
An open API is already in the design. Just load the module and refer to it as a 'custom' test. I already have a test module (count and compare) that uses that mechanism because it needed to be able to curry the function.
As for Data::Verify, I have looked at it, but I'm not sure it's really practical to try to combine them directly. It's still on my "To Be Considered" list.
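The "custom" test mechanism could look roughly like a name-to-coderef table, with parameterised tests curried before they are registered. This is a hypothetical sketch: `register_test`, `run_test` and `count_at_least` are invented names, not the module's API.

```perl
use strict;
use warnings;

# Hypothetical registry: built-in and plugin ("custom") tests share one
# table, so a plugin module only needs to call register_test().
my %TESTS;

sub register_test {
    my ($name, $code) = @_;
    die "test '$name' must be a code reference\n" unless ref $code eq 'CODE';
    $TESTS{$name} = $code;
    return;
}

sub run_test {
    my ($name, @args) = @_;
    my $code = $TESTS{$name} or die "no test registered as '$name'\n";
    return $code->(@args) ? 1 : 0;
}

# A test that needs configuration is curried first: "at least $min items".
sub count_at_least {
    my ($min) = @_;
    return sub { scalar(@_) >= $min };
}

register_test(not_empty     => sub { defined $_[0] && length $_[0] });
register_test(three_or_more => count_at_least(3));
```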
Flame ~ Lead Programmer: GMS | GMS
Use "Data::Type" instead (was Data::Verify).
- Now "quite" stable api (alpha).
- 90% of your requested "value types"
- Documentation extended.
- Added more tests.
I am always happy about new ideas and contributions via
http://www.sf.net/projects/datatype
or directly to me.
Greetings,
Murat (murat.uenalan@cpan.org)
Re: Data Validation Tests
by Aragorn (Curate) on Jan 26, 2003 at 11:42 UTC
Maybe it's just me, but I like to have validation routines that come with the modules which actually deal with the data type at hand. For example, Business::CreditCard has a validate function. The documentation of URI shows the "official" regex to match a URI. Date::Calc functions will return an error when fed an invalid date.
The terms strong cohesion and loose coupling spring to mind. Functions for validating data should be part of the module processing that data. Having a separate module with all kinds of unrelated validation routines increases the risk of (possibly) not keeping up with changes in the format, and I don't really need credit card number validation routines in my logfile processing script which extracts IP addresses.
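To illustrate the URI point: RFC 2396 (Appendix B) gives a regex that splits a URI reference into scheme, authority, path, query and fragment, and the URI module's documentation shows a version of it. A small wrapper is sketched below (`split_uri` is a hypothetical name). Note that the pattern matches any string at all, so it parses URIs rather than validating them.

```perl
use strict;
use warnings;

# Split a URI reference into its five generic components with the
# RFC 2396 Appendix B pattern. It never fails to match, so it is a
# parser for well-formed URIs, not a validator.
sub split_uri {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) =
        $uri =~ m{\A(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?\z}s;
    return {
        scheme    => $scheme,
        authority => $authority,
        path      => $path,
        query     => $query,
        fragment  => $fragment,
    };
}
```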
Arjen
While your concern is valid, the idea is to have a way to summarize all validation in a declaration. The benefit is that you avoid manually writing the validation logistics, which is incredibly boring work. Something like
    my @_id = $cgi->param('id');    # CGI's method is param(); it returns all values in list context
    error_exit("You must select at least one ID.")
        unless @_id;
    my @id;
    for (@_id) {
        my ($match) = /\A(\d{5,9}[ABC])\z/;
        error_exit("Invalid ID: $_")
            unless defined $match;
        push @id, $match;
    }
Even if you replace the regex with ID::validate(), the whole thing remains rather clunky, especially if you imagine having to write such a snippet for 25 different parameters: pure drone work. It is much easier to use a Data::FormValidator-esque declaration like
    my $validator = Data::FormValidator->new({
        delete_items_page => {
            required    => [qw(id .. ..)],
            constraints => {
                id => '/\A(\d{5,9}[ABC])\z/',
                # ..
            },
        },
    });
While this doesn't compare favourably so far, adding more parameters to validate would be trivial and quick with the latter code. Adding another two dozen checks is easy and doesn't result in an unmanageable amount of code.
I do agree that the validator should not contain its own actual validation routines. That's why I don't like Data::FormValidator as is; I would prefer there being plugin modules that integrate other modules' own supplied validation routines - much the way File::Find::Rule works.
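The declaration-driven approach can be illustrated without any CPAN module. The sketch below is hypothetical (`validate_profile` and its profile layout only loosely mimic Data::FormValidator) and assumes each constraint is a `qr//` with exactly one capture group, whose captured value is kept as the untainted result.

```perl
use strict;
use warnings;

# $profile: { required => [fields], constraints => { field => qr/(...)/ } }
# $input:   { field => [values] }  (a field may carry several values)
# Returns (\%valid, \@errors); %valid holds the captured (untainted) values.
sub validate_profile {
    my ($profile, $input) = @_;
    my (%valid, @errors);
    for my $field (@{ $profile->{required} || [] }) {
        push @errors, "missing: $field" unless @{ $input->{$field} || [] };
    }
    my $constraints = $profile->{constraints} || {};
    for my $field (sort keys %$constraints) {
        for my $value (@{ $input->{$field} || [] }) {
            # list-context match returns the single capture group on success
            if (my ($clean) = $value =~ $constraints->{$field}) {
                push @{ $valid{$field} }, $clean;
            }
            else {
                push @errors, "invalid $field: $value";
            }
        }
    }
    return (\%valid, \@errors);
}
```

Adding a parameter then means adding one entry to the profile, which is the point made above about two dozen extra checks.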
Makeshifts last the longest.
Re: Data Validation Tests
by Abigail-II (Bishop) on Jan 26, 2003 at 23:56 UTC
I would like to have all of those tests in Regexp::Common.
Some of the proposed tests are already part of Regexp::Common, like the numeric tests (integers, reals) and the IP addresses. Some URI classes are in there as well (http, ftp, tel, fax and tv at the moment).
But doing validation is a lot harder than you think. You need to find authoritative documentation. There are many, many URI schemes, and only a few of them have an RFC that is neither ambiguous, nor unclear about which of several conflicting RFCs it imports its terms from, nor defined in a superseded RFC but not in the RFC that supersedes it. A lot of schemes are only documented in Internet drafts, the latest of which expired years ago. And regexes are hard to test right. You have to consider lots of cases, and combinations of cases, and also a lot of cases where the regex should fail. Two weeks ago, I redid the test suite for http URIs, which is actually one of the better-defined URI schemes, and it took me two full nights to get it all working. It did turn up a few bugs as well.
I've wanted to add dates to Regexp::Common for quite some time as well, but where do you start? There are so many forms to choose from. Perhaps start with dates in ISO format? It sounds simple, until you actually read the 33-page specification.
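As an example of how narrow a first step could be, the sketch below accepts only the ISO 8601 extended calendar-date form YYYY-MM-DD and checks that the day exists, including Gregorian leap years. It deliberately ignores every other form the standard allows (week dates, ordinal dates, the basic format without hyphens, times). `is_iso_date` is a hypothetical name.

```perl
use strict;
use warnings;

# Gregorian leap-year rule.
sub is_leap_year {
    my ($y) = @_;
    return (($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0) ? 1 : 0;
}

# Only the YYYY-MM-DD form; the day must exist in that month and year.
sub is_iso_date {
    my ($s) = @_;
    my ($y, $m, $d) = $s =~ /\A([0-9]{4})-([0-9]{2})-([0-9]{2})\z/ or return 0;
    return 0 if $m < 1 || $m > 12 || $d < 1;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    $days[1] = 29 if is_leap_year($y);
    return $d <= $days[$m - 1] ? 1 : 0;
}
```

In the spirit of the advice about failing cases, a suite for it should include strings that must be rejected (2003-02-29, 1900-02-29, 2003-04-31) next to ones that must pass (2000-02-29).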
Email addresses... One day, they will be part of Regexp::Common. I've done them using Parse::RecDescent (in RFC::RFC822::Address), and it won't be a pretty regex, as it will be recursive. I haven't had the guts to tackle this beast yet.
I don't think valid credit card numbers would be hard, but I lack their specification. If you can provide me with it, I'll add it to Regexp::Common (but the spec should be better than "a 14 digit number").
Send me regexes and specifications, preferably with an extensive test suite, and I'll add them to Regexp::Common (current version: 2.104, 87 patterns in 11 classes, 156778 tests in 30 files).
Abigail
>the spec should be better than "a 14 digit number"
Actually even that would be wrong.
The spec for credit cards includes 13- to 16-digit numbers as well.
The spec would be, roughly, "a 13-to-16-digit Luhn number beginning with one of a list of prefixes."
There's an article about it here and a Perl implementation for checking here
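The Luhn part of that description is short enough to show. This is a sketch assuming spaces and dashes are allowed as separators; the per-brand prefix table is left out, and `luhn_ok` and `looks_like_card` are hypothetical names.

```perl
use strict;
use warnings;

# Luhn (mod 10) check: starting from the rightmost digit, double every
# second digit, subtract 9 from any doubled value above 9, and require
# the total to be a multiple of 10.
sub luhn_ok {
    my ($number) = @_;
    (my $digits = $number) =~ s/[\s-]//g;    # drop spaces and dashes
    return 0 unless $digits =~ /\A[0-9]+\z/;
    my ($sum, $double) = (0, 0);
    for my $ch (reverse split //, $digits) {
        my $v = $ch;
        if ($double) {
            $v *= 2;
            $v -= 9 if $v > 9;
        }
        $sum += $v;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

# Shape check: 13 to 16 digits that pass Luhn (issuer prefixes not checked).
sub looks_like_card {
    my ($number) = @_;
    (my $digits = $number) =~ s/[\s-]//g;
    return 0 unless $digits =~ /\A[0-9]{13,16}\z/;
    return luhn_ok($digits);
}
```

The classic Luhn example 79927398713 passes `luhn_ok` but fails `looks_like_card`, because it has only 11 digits.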
--
Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer. M-J D
Why reinvent the wheel?
by autarch (Hermit) on Jan 27, 2003 at 04:29 UTC
Please, before you release yet another redundant module (YARM), consider whether what you want wouldn't be better done as patches to existing modules, including, but not limited to:
Params::Validate
Data::Verify
Data::FormValidator
Data::Validator::Item
Class::ParamParser & Class::ParmList
Getargs::Long
Plus related modules like Regexp::Common, Email::Valid, Business::CreditCard, and more.
Yes, your proposed API is somewhat different from the existing offerings. But is it so different that it offers completely unique functionality? I don't think so. In fact, it seems fairly close to Data::Validator::Item.
One of the big problems with CPAN is that people just go ahead and upload more and more modules that basically do the same thing as everything else, just with a different API. Some categories are overwhelmingly full (DBI wrappers, for example). Parameter validation isn't quite at the point yet, but it's getting close.
If there's something you really want that doesn't exist, pick the existing module you like best, and offer the author patches. If that doesn't work out, then consider creating your own module.
You have a valid argument, and I haven't ruled out simply taking what I have and converting it into a patch. Then again, I am also looking at becoming 'glue' by combining the best elements I can find in each, as well as using other modules to perform tests. For example, I hope to do the date tests with Date::Parse, part of the TimeDate package. I admit, though, that this does mean there would be a great many 'recommended' modules to go along with D::V::O.
Thank you for the advice. Even if this is eventually rejected by the community as a whole, it will still be practice, and another opportunity to expand my skill with Perl. (Flame: remembers two "Ooh, what does THIS do?" moments that occurred during development so far)
Edit: Oh, I would like to know what similarities you see between this and Data::Validator::Item.
Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)
I'm repeating myself, but I invested a lot of effort into "Data::Type" (was Data::Verify), which actually does everything you are talking about. It already encapsulates many CPAN "value type" checking modules:
Business::ISSN 0.90 by ISSN
Locale::SubCountry by LOCALE::COUNTRYCODE, LOCALE::COUNTRYNAME, LOCALE::REGIONCODE, LOCALE::REGIONNAME
Net::IPv6Addr 0.2 by IP
Locale::Language 2.02 by LOCALE::LANGCODE, LOCALE::LANGNAME
Business::CINS 1.13 by CINS
Email::Valid 0.14 by EMAIL
Date::Parse 2.23 by DATE
Business::CreditCard 0.27 by CREDITCARD
Regexp::Common 2.104 by INT, IP, QUOTED, REAL, URI
Business::UPC 0.02 by UPC
An update will hit CPAN/SF.net soon.
From the pod:
Data::Type x.x.x supports 44 types:
BINARY - binary code
BOOL - a true or false value
CINS 0.1.3 - a CUSIP International Numbering System Number
BIO::CODON 0.1.3 - a DNA (default) or RNA nucleoside triphosphates triplet
LOCALE::COUNTRYCODE 0.1.5 - country code
LOCALE::COUNTRYNAME 0.1.5 - country name
CREDITCARD - is one of a set of creditcard type (DINERS, BANKCARD, VISA, ..
DATE 0.1.1 - a date (mysql or Date::Parse conform)
DATETIME - a date and time combination
DEFINED 0.1.4 - a defined (not undef) value
DK::YESNO - a simple answer (ja, nein)
BIO::DNA 0.1.3 - a dna sequence
DOMAIN 0.1.4 - a network domain name
EMAIL - an email address
ENUM - a member of an enumeration
GENDER - a gender male, female
HEX - hexadecimal code
INT - an integer
IP 0.1.4 - an IP (V4, V6, MAC) network address
ISSN 0.1.3 - an International Standard Serial Number
LOCALE::LANGCODE 0.1.3 - a Locale::Language language code
LOCALE::LANGNAME 0.1.3 - a language name
LONGTEXT - text with a max length of 4294967295 (2^32 - 1) characters (..
MEDIUMTEXT - text with a max length of 16777215 (2^24 - 1) characters (al..
NUM - a number
OS::PATH 0.1.6 - a path string (not really functional)
PORT 0.1.4 - a network port number
QUOTED - a quoted string
REAL - a real
REF - a reference to a variable
LOCALE::REGIONCODE 0.1.5 - region code
LOCALE::REGIONNAME 0.1.5 - region name
BIO::RNA 0.1.3 - a rna sequence
SET - a set (can have a maximum of 64 members (mysql))
TEXT - blob with a max length of 65535 (2^16 - 1) characters (alias..
TIME - a time (mysql)
TIMESTAMP - a timestamp (mysql)
TINYTEXT - text with a max length of 255 (2^8 - 1) characters (alias my..
UPC 0.1.3 - standard (type-A) Universal Product Code
URI - an http uri
VARCHAR - a string with limited length of choice (default 60)
WORD - a word (without spaces)
YEAR - a year in 2- or 4-digit format
YESNO - a simple answer (yes, no)
And 4 filters:
chomp - chomps
lc - lower cases
strip - strip
uc - upper cases
TYPES BY GROUP
Locale
LOCALE::COUNTRYCODE, LOCALE::COUNTRYNAME, LOCALE::LANGCODE, LOCALE::LANGNAME, LOCALE::REGIONCODE, LOCALE::REGIONNAME
Logic
BIO::CODON, BIO::DNA, BIO::RNA, DEFINED, DOMAIN, EMAIL, IP, OS::PATH, PORT, REF, URI
Database
Logic
ENUM, SET
Time or Date related
DATE, DATETIME, TIME, TIMESTAMP, YEAR
String
LONGTEXT, MEDIUMTEXT, TEXT, TINYTEXT
Business
CINS, CREDITCARD, ISSN, UPC
W3C
String
BINARY, HEX
Numeric
BOOL, INT, NUM, REAL
String
DK::YESNO, GENDER, QUOTED, VARCHAR, WORD, YESNO
GROUP "Database"
These are types identical to mysql database builtin types.
CREDITCARD
This type isn't tested at all and nobody should rely on it without rigorous testing.
Supported are: 'Diners Club', 'Australian BankCard', 'VISA', 'Discover/Novus', 'JCB', 'MasterCard', 'Carte Blache', 'American Express'.
They are parameterised as: DINERS, BANKCARD, VISA, DISCOVER, JCB, MASTERCARD, BLACHE, AMEX.
CONTRIBUTIONS
The author is happy to receive more types (formats) and add them to this library. If you have an algorithm/regex for validating it, even better. Just email me.
PREREQUISITES
Class::Maker (0.05.10),
Error (0.15),
IO::Extended (0.05),
Tie::ListKeyedHash (0.41),
Iter (0)
and for types
Business::ISSN 0.90 by ISSN
Locale::SubCountry by LOCALE::COUNTRYCODE, LOCALE::COUNTRYNAME, LOCALE::REGIONCODE, LOCALE::REGIONNAME
Net::IPv6Addr 0.2 by IP
Locale::Language 2.02 by LOCALE::LANGCODE, LOCALE::LANGNAME
Business::CINS 1.13 by CINS
Email::Valid 0.14 by EMAIL
Date::Parse 2.23 by DATE
Business::CreditCard 0.27 by CREDITCARD
Regexp::Common 2.104 by INT, IP, QUOTED, REAL, URI
Business::UPC 0.02 by UPC
Re: Data Validation Tests
by demerphq (Chancellor) on Jan 28, 2003 at 14:58 UTC
Date - DD/MM/YYYY etc... perhaps check that's its possible
You should be aware that this date format is particularly nonstandard and difficult to check. Use of the ISO date format YYYY-MM-DD (or a similar ordering such as YYYY/MM/DD) should be standard practice for all professional programmers. In fact, I would even go so far as to suggest that a programmer who blindly allows DD/MM/YYYY, even at the request of a client, is doing the client a disservice. ISO standards and their relatives (DIN, for example, specifies the same format) are there for a reason.
Incidentally, to back this claim up, consider that MM/DD/YYYY and DD/MM/YYYY are both popular written date formats. Unfortunately, there is no way to determine whether 02/03/2003 is the third of February or the second of March. There is no such ambiguity in YYYY/MM/DD. (Or if there is, I would argue it is significantly less likely to occur.)
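The ambiguity is easy to demonstrate by listing every reading of an NN/NN/YYYY string that is a real calendar date, under both DD/MM/YYYY and MM/DD/YYYY. A hypothetical sketch (`readings` and `days_in_month` are invented names) that reports the results in YYYY-MM-DD form:

```perl
use strict;
use warnings;

sub days_in_month {
    my ($y, $m) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    return $m == 2 && $leap ? 29 : $days[$m - 1];
}

# Every valid reading of "NN/NN/YYYY": the DD/MM reading first, then MM/DD.
sub readings {
    my ($s) = @_;
    my ($first, $second, $y) = $s =~ m{\A([0-9]{2})/([0-9]{2})/([0-9]{4})\z}
        or return ();
    my @out;
    for my $pair ([$second, $first], [$first, $second]) {    # [month, day]
        my ($m, $d) = @$pair;
        next if $m < 1 || $m > 12 || $d < 1 || $d > days_in_month($y, $m);
        my $iso = sprintf '%04d-%02d-%02d', $y, $m, $d;
        push @out, $iso unless grep { $_ eq $iso } @out;    # 05/05 reads the same both ways
    }
    return @out;
}
```

For 02/03/2003 it returns both 2003-03-02 and 2003-02-03, while 25/12/2003 has only one valid reading.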
--- demerphq
my friends call me, usually because I'm late....
That was primarily an example. I'll be employing Date::Parse in the test, though, which I believe supports that format...
Thanks for the correction though.
Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)
Re: Data Validation Tests
by shotgunefx (Parson) on Jan 27, 2003 at 01:30 UTC
I built something similar for a project once. One thing I would suggest is optional (min, max) parameters, whose meaning would depend on the context:
For int, float and other numbers, they could be used to make sure the value is within a given range.
For date types, that the date is within a given period.
For text, that the length is within the allowed limits.
Also, I would add TEXT and ALPHANUMERIC (/^[\w-]+$/) to your type list.
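The context-dependent (min, max) idea could be sketched like this. The type names and the `within` function are assumptions; dates are assumed to be in YYYY-MM-DD form, which sorts chronologically as a plain string. Either bound may be left undef.

```perl
use strict;
use warnings;

sub within {
    my ($type, $value, $min, $max) = @_;
    if ($type eq 'DATE') {
        # YYYY-MM-DD strings sort chronologically, so compare as strings
        return 0 if defined $min && $value lt $min;
        return 0 if defined $max && $value gt $max;
        return 1;
    }
    # numbers are measured by value, text by its length
    my $measure =
          $type eq 'NUM'  ? $value
        : $type eq 'TEXT' ? length $value
        : die "no range rule for type '$type'\n";
    return 0 if defined $min && $measure < $min;
    return 0 if defined $max && $measure > $max;
    return 1;
}
```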
-Lee
"To be civilized is to deny one's nature."
Re: Data Validation Tests
by hsmyers (Canon) on Jan 29, 2003 at 19:39 UTC
Given the amount of time I hang out in book space, I would have liked to see something along these lines for ISBN numbers. When I last checked CPAN there was at least one solution (more, actually), but most required something along the lines of use Kitchen::Sink; so after a bit of looking around I came up with:
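    sub checkISBN {
        # Returns (is_valid, expected_check_digit) for an ISBN-10, or
        # (0, '-') / (0, '+') when there are too few / too many characters.
        (my $isbn = uc(shift)) =~ tr/- //d;    # ignore hyphens and spaces
        my @digits = split(//, $isbn);
        my $n = scalar(@digits);
        my $sum = 0;
        my $m = 10;
        if ($n != 10) {
            return (0, ($n < 10 ? '-' : '+'));
        }
        # (0, '?'): a non-digit where none is allowed (only the last may be X)
        return (0, '?') unless $isbn =~ /\A[0-9]{9}[0-9X]\z/;
        # weights 10 down to 2 over the first nine digits
        for (0 .. $#digits - 1) {
            $sum += $digits[$_] * $m--;
        }
        # the check digit makes the weighted sum divisible by 11 (10 is written X)
        my $cd = (qw(0 X 9 8 7 6 5 4 3 2 1))[$sum % 11];
        return ($cd eq $digits[-1], $cd);
    }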
Don't know if this is what you had in mind, but I found it useful...
--hsm
"Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Hmm, ISBN. Sounds reasonable. Thanks for the suggestion, and the sample. I'll see if I can work it into the plan.
Flame ~ Lead Programmer: GMS (DOWN) | GMS (DOWN)