I'm wondering how you would validate an email address. Basically, I want to know what regular expression you would use to check for a valid format. I've seen many different ways of doing it, and some of them definitely produce wrong results. I'm not saying mine is perfect either; I want to perfect it.
Here's what I have. I use PHP's eregi function for this, so case doesn't matter.
What I wanted to achieve: the first character of any email address should be a letter, if I'm not mistaken. I probably should go back and read RFC 2822, as that should have been my first stop for understanding how emails are handled and what is and isn't accepted. Next I allow an alphanumeric word with underscores, dots and hyphens (minus). Then comes the @ sign, after which I again check that the first character is a letter or number, followed by another alphanumeric word which also allows hyphens (I don't think you can have an underscore in a domain name?). Then comes a dot, then a word with only letters. I'm still wondering whether to include numbers, but so far I've seen no reason to. The last bit was the trickiest to get working: it checks whether another dot follows, and if so that a letter comes next and then a word of only letters, so at least 2 letters must follow the dot or there is no match. It then repeats that check for each additional dot.
Which means something like a2z-abc_def@a2z-abc.com.net.co.uk is valid, even if it does not exist, but that's where checking the host comes into play.
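For reference, the rules described above could be written roughly like this with preg_match (eregi was removed in PHP 7). This is only a sketch built from the description, not the original expression, and the function name looks_like_email is made up for illustration.

```php
<?php
// Sketch of the rules described in the post; not the original eregi pattern.
// looks_like_email() is a hypothetical name used only for this example.
function looks_like_email(string $email): bool
{
    $pattern = '/^[a-z][a-z0-9_.-]*'  // local part: a letter, then letters, digits, "_", "." or "-"
             . '@'
             . '[a-z0-9][a-z0-9-]*'   // first domain label: letter or digit, then letters, digits or "-"
             . '\.[a-z]+'             // a dot followed by a letters-only word
             . '(?:\.[a-z]{2,})*'     // any number of further dotted parts, at least 2 letters each
             . '$/i';                 // the "i" flag stands in for eregi's case-insensitivity

    return preg_match($pattern, $email) === 1;
}
```

Under these rules a2z-abc_def@a2z-abc.com.net.co.uk passes, while an address starting with a digit or containing two dots in a row in the domain does not.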
It's that end part that worries me a bit, though, as it can accept a large number of dots and words over and over again. I may be able to find a limit in the RFCs, in which case I'll fix it accordingly; otherwise it may be a problem. Other than that, after verifying that the email format is valid, I would check DNS records or the hostname to see whether the email's domain exists. I could query the server to see if the user exists, but most servers claim the user exists even when they don't, so I'd rather just check that the host exists.
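A rough sketch of that host check, assuming PHP's checkdnsrr() is available. The function name domain_can_receive_mail and the $lookup parameter are made up for this example; $lookup is only there so the logic can be tried without touching the network. The length limits come from the DNS rules (RFC 1035: 63 octets per label, 255 octets per name, which works out to roughly 253 characters in dotted form), and they also put a ceiling on how many dotted parts can follow the @.

```php
<?php
// Sketch only: domain_can_receive_mail() and $lookup are hypothetical names.
// $lookup must accept (host, record type) and return bool, like checkdnsrr().
function domain_can_receive_mail(string $email, ?callable $lookup = null): bool
{
    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return false; // no domain part to look up
    }
    $domain = substr($email, $at + 1);

    // DNS limits: about 253 characters per name in dotted form, 63 per label.
    if (strlen($domain) > 253) {
        return false;
    }
    foreach (explode('.', $domain) as $label) {
        if ($label === '' || strlen($label) > 63) {
            return false; // empty label means two dots in a row, or a leading/trailing dot
        }
    }

    // Default to the real DNS lookup when no stand-in is supplied.
    $lookup = $lookup ?? 'checkdnsrr';

    // Prefer an MX record; fall back to an A record for hosts without one.
    return $lookup($domain, 'MX') || $lookup($domain, 'A');
}
```

This only says the domain exists; as noted above, it says nothing about whether the mailbox itself does.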
So what can you suggest to improve it? I know there are shorthand ways of writing it, but this is how eregi accepts it. I will now go off and read the RFCs on it; I might look at the earlier one first before the newest.
You can send them a link to validate their address. You put the email address in your db in a table that has two fields, the address and a boolean value (default 0), and send them a link like yourhost.com/validate?mail=theiremail. When they follow it, you just set the boolean to 1.
For the regular expression, I can't really think of anything better than the one you have. Oh yeah, by the way, thanks for the Linux boot thingy: I inserted my 1st CD, entered repair on boot and was given the option to repair my bootloader! Hurray! Thanks mate.
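A minimal sketch of that confirmation flow. One assumption added here that is not in the post: the link also carries a random token, because a link containing only the address could be confirmed by anyone who can guess it. The array stands in for the database table, and the function names and URL layout are made up for illustration.

```php
<?php
// Sketch of the confirmation-link idea. The token, the function names and the
// URL layout are assumptions; the array stands in for the two-field db table.
function start_confirmation(array &$table, string $email): string
{
    $token = bin2hex(random_bytes(16)); // 32 hex characters, hard to guess
    $table[$email] = ['token' => $token, 'confirmed' => 0];

    return 'https://yourhost.com/validate?mail=' . rawurlencode($email)
         . '&token=' . $token;
}

function confirm(array &$table, string $email, string $token): bool
{
    if (!isset($table[$email])) {
        return false; // unknown address
    }
    // hash_equals() compares the two strings in constant time.
    if (!hash_equals($table[$email]['token'], $token)) {
        return false; // wrong token
    }
    $table[$email]['confirmed'] = 1; // flip the boolean described in the post
    return true;
}
```

In a real form handler the table would be a database table and the link would be sent with whatever mail function the site already uses.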
I understand the confirmation email side of things, but this is not a membership system, just an ordinary fill-out form where I want to eliminate any errors. I could do multiple checks on it, splitting it and so on, but I wanted a surefire, basically one-line way of checking it. Now that I've thought about multiple checks, though, maybe I can look at how many checks I could possibly do to make sure it's valid, then build from that and simplify it down to a one-liner.
So you want to validate email addresses? I would advise using client-side scripting for that, which includes JavaScript and VBScript. This way the validation is done on the client only, which removes unnecessary load from the server and minimizes your bandwidth use.
I think I will settle on this as the method although I will work out the minimum values, as that's my last resort for this, far from perfect but it does meet most problems I have seen. I will also try and see if I can find any performance difference in converting to lowercase and removing the case insensitive bit, escaping the characters or enclosing them in braces as it's just a small thing for now.
If you look at cryptwizard's code, the only problem I can see is at the end: \.[a-z0-9\-\.]
which accepts blah_this@abc-def.com.au as valid, but also accepts blah_this@abc-def.....com....au.. as valid.
After every dot, it's probably a good idea to check for a letter or number, but not if it's in the first part before the @ sign.
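To make the difference concrete, here is a small sketch. Both patterns are assumptions built around the fragment quoted above, not cryptwizard's full expression: the first keeps the dot inside the character class, the second requires a letter or number after every dot in the domain.

```php
<?php
// Sketch only: both patterns are hypothetical reconstructions for comparison.
// Dot inside the character class: any run of dots is accepted.
$loose  = '/^[a-z0-9_.-]+@[a-z0-9-]+\.[a-z0-9\-\.]+$/i';
// Dot outside the class: every dot must be followed by letters or digits.
$strict = '/^[a-z0-9_.-]+@[a-z0-9-]+(?:\.[a-z0-9]+)+$/i';

$good = 'blah_this@abc-def.com.au';
$bad  = 'blah_this@abc-def.....com....au..';

$looseAcceptsBad  = preg_match($loose, $bad) === 1;   // true: the run of dots slips through
$strictAcceptsBad = preg_match($strict, $bad) === 1;  // false: a dot without a following letter or digit fails
```

Both patterns still accept the well-formed address in $good.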
The last part of an email address is usually something like net, biz, com or org, but you can also get com.au, co.uk and so on. I believe there are no dashes in this part, so I assume it's safe to leave them out. If a problem with this were ever brought to my attention I would fix it promptly, but for now I find my code fit for its purpose.
I have thought about client-side checking, but I would still be using the server to verify the rest of the data, so I decided to do the whole thing server side. It's all built into my validator class, so it's probably best to keep the code all together rather than split it between client and server. I also trust client checking less than server checking: if I did check on the client, I'd still use the server to verify it again, which means double the effort.
Short answer: it's impossible. Even if you used the 2000+ character regular expression built from the RFC, it would still fail some perfectly valid, deliverable addresses. As for checking the host, it can still be thrown off by gobbleDgook@hotmail.com. In response to the suggestion of using client-side scripts, lots of people (myself included) have them disabled most of the time. So the only way you can truly validate an e-mail address is to actually send an e-mail to it.
Use this code to test an email address for proper syntax. It will return '1' if the email address passes or a '0' if it fails. You can use it in conjunction with the Ace Perl Popup Window to let the user know they entered an invalid email address.
For example: if(!&check_email($email)){ &PError("Error. Invalid email address."); }
Usage (Step 1):
Use this code to check the email address. $valid will be '1' or true if it passes or a '0' or false if it fails. Replace [EMAIL] with the variable name you want to check
CODE
$valid=&check_email("[EMAIL]");
Code (Step 2):
Add the following code somewhere within your script
CODE
sub check_email {
    # Initialize local email variable with input to subroutine.
    my $email = $_[0];

    # If the e-mail address contains:
    if ($email =~ /(@.*@)|(\.\.)|(@\.)|(\.@)|(^\.)/ ||

        # the e-mail address contains an invalid syntax. Or, if the
        # syntax does not match the following regular expression pattern
        # it fails basic syntax verification.
        # (The pattern itself is missing from the post; this one is an
        # assumption rebuilt from the description below.)
        $email !~ /^.+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$/) {

        # Basic syntax requires: one or more characters before the @ sign,
        # followed by an optional '[', then any number of letters, numbers,
        # dashes or periods (valid domain/IP characters) ending in a period
        # and then 2 or 3 letters (for domain suffixes) or 1 to 3 numbers
        # (for IP addresses). An ending bracket is also allowed as it is
        # valid syntax to have an email address like: user@[255.255.255.0]

        # Return a false value, since the e-mail address did not pass valid
        # syntax.
        return 0;
    }

    # Return a true value, since the e-mail address passed the checks.
    return 1;
}
Yes, I know this is Perl and CGI, but PHP also runs server side like CGI, and client-side checks such as JavaScript will not work when the user doesn't have it enabled, while the above will. Plus it takes care of .de, .tk and anything else with a two-character TLD.
I'm just revisiting this old thread of mine. A lot has changed, but I never gave up on email validation.
People always ask why validate this way when you could just send an email and expect a confirmation. But if I were creating email addresses for people myself, how could I validate a username that doesn't exist yet?
I can't, so I must write my own rules as to what can and can't be used, and I'm basing them on the RFCs.
Now the code:
CODE
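<?php
function validate_email($email)
{
    $err = '';

    // Local part: dot-separated atoms or a quoted string.
    $local = '(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[^.][a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")';

    // Domain: dot-separated labels or a bracketed address literal.
    $domain = '(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.[^.])+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.[^.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])';

    if (preg_match('/^(' . $local . ')@(' . $domain . ')$/', $email, $matches)) {
        // strpos() returns the position of the first "@", i.e. the length of the local part.
        if (($pos = strpos($matches[0], '@')) > 64) {
            $err .= 'The username (local part) of your email address is too long.<br />' . "\n";
        }
        // Everything after the "@" counts as the domain.
        if (strlen($matches[0]) - $pos - 1 > 255) {
            $err .= 'The domain name for your email address is too long.<br />' . "\n";
        }
    } else {
        // Without this branch a non-matching address would be reported as correct.
        $err .= 'Your email address is not in a valid format.<br />' . "\n";
    }

    if ($err === '') {
        echo 'it\'s correct';
        return true;
    }

    echo $err;
    return false;
}

validate_email($email = 'test@home.com');
?>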
Some things in this code are only there to aid visual representation. Basically I rely only on whether true or false is returned, and on applying the error messages to the correct locations of the form.
I don't actually recommend using this. It's based on the RFC and allows far more than it should, but I have my own personal reasons for it. It does help you understand what's valid and what's not, though, and as always, if there are any mistakes or problems with it, let me know and I'll try to sort them out.
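For comparison, PHP also ships its own syntax-only check, filter_var() with FILTER_VALIDATE_EMAIL. Below is a minimal sketch of using it in a way that returns messages for the form instead of echoing them; the function name email_errors and the message text are made up for this example.

```php
<?php
// Sketch only: email_errors() is a hypothetical helper, not part of the code above.
// It returns a list of messages, so an empty array means the format was accepted.
function email_errors(string $email): array
{
    $errors = [];

    // filter_var() returns the address on success and false on failure.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'Your email address is not in a valid format.';
    }

    return $errors;
}
```

A form handler could then show the returned messages next to the email field, and treat an empty array as a pass.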
To validate an email I was using this method. Some years ago I did some research on the best method and found that this one was the "BEST". Maybe everything has changed today, but I guess I don't care, since the usual approach now is to check the format with preg_match and then send a confirmation email to make sure it really is a good address.
CODE
$email = $_POST['email'];
if (isset($email[64])) { // true when the string has at least 65 characters
echo 'Your Email address is over 64 characters long';
}
if (!preg_match("/^([a-z0-9._-](\+[a-z0-9])*)+@[a-z0-9.-]+\.[a-z]{2,6}$/i", $email)) {