GNOME Bugzilla – Bug 350621
the new bugbuddy doesn't allow for .cat email addresses
Last modified: 2006-12-14 03:13:57 UTC
The new bug buddy program has a place for you to enter your email address. It seems to have some vague intelligence: 1) It will grey out the "Send" button until it detects a valid email address. 2) Validity is determined by looking at the last bit (like the "ca" in desrt.ca) 3) All two letter (presumed country codes) are permitted, even if they're not valid countries. 4) Other top-level domains are special cased (.com, .biz, .info, etc) 5) .cat is not included. 6) People with .cat email addresses cannot submit bugs using these addresses. Cheers
The list at http://www.iana.org/domain-names.htm should be used.
I meant: http://data.iana.org/TLD/tlds-alpha-by-domain.txt (should be quite easy to automatically generate code from this file (which would be in cvs) and to add a "make update" target to download the latest version of this file)
this should be an easy piece of code for anyone. Marking as gnome-love
Can anyone give me a help to fix this? Where to see?
Hi Can we not throw this kind of crap into a general purpose library? How often are applications validating email addresses?
weej; I doubt it would end up in a library. Note: a valid resolution to this bug is to simply remove the checking code or reduce its complexity. For the case of 'reduced complexity' we should probably just accept any address that matches: /^[-_.a-z0-9]+@([-a-z0-9]+\.)+[-a-z0-9]{2,}$/i Another approach (which sort of sucks a bit) is to just have the thing to a DNS lookup to see if it has a valid hostname or not. The solution to this bug doesn't have to be difficult.
MX-lookup would block for a while, so I vote for using a regex. To be totally honest, the server should be verifying this kind of thing, anyway, as the client code can be bypassed trivially. But you knew that.
We could use the regexpr approach as soon as we get it in glib.
Created attachment 78110 [details] [review] Patch to email address syntax checking Please review this.
Created attachment 78111 [details] [review] Less rubbish version Sorry guys, forgot to make my functions static!
Created attachment 78113 [details] [review] Third time lucky! As per mariano's feedback, adjusted a few of the character checks to use g_ascii_isalnum.
some comments about the patch. a lot of these are excessive nit-picks. some of these aren't even adhered to by the other authors of the file... but you can be better than they are :) k. here goes: - most typically the comments about what a function does preceed the function entirely rather than being included just prior to the variable declarations. - if you put your auxiliary functions above your main function you won't need to use prototypes. - the indentation at the end of your final function is messed up. - your final function could be modified to not scan the string twice (as it currently does). just do a for loop once, but increment an integer instead of incrementing the string pointer. then inspect the integer at the end to reason about the length. you could even have an extra '64 and over' termination condition if you really want. - you make inconsistent use of 'function (arg)' vs 'function(arg)'. the first is right. - casting to unsigned char isn't required - you should make your functions take (const char *) where possible. i think this is everywhere. some of your local variables are even good candidates for const char *. - loops like: for() if(this) continue; if (that) continue; return FALSE; are much easier to read if you write: for() if (!this && !that) return FALSE; - "Split the on the dot" has an extra 'the' in it (see? nit-pick!) - indentation and the end of the 3rd last and 2nd last function is also messed up. final } should rest in column 0. - assignment of the result of a logical and to a variable is a little bit strange here but there's nothing wrong with it per se. it should definitely be wrapped to be fewer than 80 characters on a line, though - you have a memory leak which will occur quite a lot as the user is entering their email. in the (!local_part) and (!domain) cases you return FALSE without first freeing the parts. a better approach might be to leave the is-null checking to the subroutines, thus reducing the number of branches that you have to worry about in your main routine. if you continue to use && like you are then this approach won't access parts[1] without first knowing that parts[0] is non-null. - you have a 'return FALSE' at the start of email_local_part_is_valid that's indented 4 tabs when it should only be 2. another case of this exists for the other return FALSE in that function. - the trick with strchr is less readable and less efficient than just doing it with a sequence of cases like (x != 1 && x != 2 && x != 3)... note: != and &&. (see point #8 above :) - some of your comments are also longer than 80 characters to a line. in general, always wrap. - again with the leaks in email_domain_is_valid. if your for loop returns FALSE then g_strfreev is not called. - the return FALSE in said for loop is not indented far enough :) that is all for now. the functionality of the patch itself looks pretty much correct (except for the memory leaks). thanks for working on this -- cheers.
I'll have another play this evening :) Thanks
(In reply to comment #12) > some comments about the patch. a lot of these are excessive nit-picks. some > of these aren't even adhered to by the other authors of the file... but you can > be better than they are :) > > k. here goes: > > - most typically the comments about what a function does preceed the function > entirely rather than being included just prior to the variable declarations. Sorted > - if you put your auxiliary functions above your main function you won't need > to use prototypes. Again, sorted. I knew this, but I thought the function order made more sense, logically. If you'd prefer not to have prototypes then fair enough. > - the indentation at the end of your final function is messed up. Hmm... Maybe a dodgy patch or something. It seems fine now, at least. > - your final function could be modified to not scan the string twice (as it > currently does). just do a for loop once, but increment an integer instead of > incrementing the string pointer. then inspect the integer at the end to reason > about the length. you could even have an extra '64 and over' termination > condition if you really want. Done. > - you make inconsistent use of 'function (arg)' vs 'function(arg)'. the first > is right. Sorted. I normally use function(arg) but realised it was wrong and changed all that I could see. I obviously missed a couple. > - casting to unsigned char isn't required Fixed > - you should make your functions take (const char *) where possible. i think > this is everywhere. some of your local variables are even good candidates for > const char *. Gotta figure this one out first > - loops like: > for() > if(this) > continue; > if (that) > continue; > > return FALSE; > > are much easier to read if you write: > for() > if (!this && !that) > return FALSE; Unless it is suboptimal to do it the first way, I'd rather not cram all the checks into one line. When the conditions are "this" and "that", yeah, it looks reasonable, but I was coding it how I think the process through. i.e., "is this character one of these? OK, next one.". Maybe I'm just not used to it, but looking at a load of boolean operators is less readable to me than basic flow. Let me know what you think of the new patch anyway, I'm not that passionate about it. > - "Split the on the dot" has an extra 'the' in it (see? nit-pick!) Fixed > - indentation and the end of the 3rd last and 2nd last function is also messed > up. final } should rest in column 0. No idea what happened here. Again, I can't see it now so whatever! > - assignment of the result of a logical and to a variable is a little bit > strange here but there's nothing wrong with it per se. it should definitely be > wrapped to be fewer than 80 characters on a line, though What else can I do? I need to free before I return. I've left it as is for now, but I'd like to know what else I can do that's more accepted. > - you have a memory leak which will occur quite a lot as the user is entering > their email. in the (!local_part) and (!domain) cases you return FALSE without > first freeing the parts. a better approach might be to leave the is-null > checking to the subroutines, thus reducing the number of branches that you have > to worry about in your main routine. if you continue to use && like you are > then this approach won't access parts[1] without first knowing that parts[0] is > non-null. Hmm... I've changed this around a bit. Not sure if it's better or worse. Again, there's just a load of operators everywhere now, but if that's the way we want to dance the dance then sobeit. :) > - you have a 'return FALSE' at the start of email_local_part_is_valid that's > indented 4 tabs when it should only be 2. another case of this exists for the > other return FALSE in that function. Really not seeing this... I'm consistently using tabs with width 8. Maybe it's an editor setting making it look strange? > - the trick with strchr is less readable and less efficient than just doing it > with a sequence of cases like (x != 1 && x != 2 && x != 3)... note: != and &&. > (see point #8 above :) Hmm, mariano recommended I did it this way instead. I've commented it. > - some of your comments are also longer than 80 characters to a line. in > general, always wrap. Done. > - again with the leaks in email_domain_is_valid. if your for loop returns > FALSE then g_strfreev is not called. Fixed. > - the return FALSE in said for loop is not indented far enough :) !? :P See if the indentation is still borked, I've no idea what went wrong. > that is all for now. > > the functionality of the patch itself looks pretty much correct (except for the > memory leaks). > > thanks for working on this -- cheers. > NP :D
Created attachment 78151 [details] [review] Patch with changes recommended by desrt I had a go at flagging some consts. Let me know if I've missed anything I should have done.
replies: - you still have whitespace trouble, most likely caused by using diff -w. don't do that. - your use of the for/continue construct is a bit strange because the end of the for loop is never ever reached, and therefore it's not really acting much like a loop. there's absolutely nothing technically wrong with this, and without using a long sequence of boolean operators (as you noted) i'm not sure of a better solution. - i doubt you meant to do this: /* If character is "-", "_" or "." it is valid. */ if (strchr("-_.", *character)) continue; if (strchr("-_.", *character) != NULL) continue; - also note (as seen in above code) you still have some cases of function() - this is extremely strange: for (i = 0; character = domain_label + i; i++) { this loop will not terminate until the 'domain_label' pointer wraps all the way around to NULL (but your program will crash much sooner). ((well, actually, that's not entirely true. since '\0' will be detected as an "invalid character" the loop will exit reporting that the address contains invalid characters)) perhaps you want for (i = 0; *(character = domain_label + i); i++) or for (i = 0; *(character = &domain_label[i]); i++) or const char character; // not const char * ... for (i = 0; character = domain_label[i]; i++) or for (i = 0; domain_label[i]; i++) { const char *character = domain_label[i]; or just use the last form of the loop and abandon the use of 'character' entirely. with optimisation the compiler will cache the value of domain_label[i] anyway. you have other loops like this. my personal preference is that using an integer index is more readable but others might disagree. - this is nice (much better than my proposed solution): /* Check we have the 2 parts */ if (!(local_part = parts[0]) || !(domain = parts[1])) { that's all for now :)
(In reply to comment #16) > replies: > > - you still have whitespace trouble, most likely caused by using diff -w. > don't do that. Can't remember where I picked that habit up, but at least we've found out what was wrong. :) > - your use of the for/continue construct is a bit strange because the end of > the for loop is never ever reached, and therefore it's not really acting much > like a loop. there's absolutely nothing technically wrong with this, and > without using a long sequence of boolean operators (as you noted) i'm not sure > of a better solution. OK > - i doubt you meant to do this: > /* If character is "-", "_" or "." it is valid. */ > if (strchr("-_.", *character)) > continue; > > if (strchr("-_.", *character) != NULL) > continue; > > - also note (as seen in above code) you still have some cases of function() Sorted (I've actually dropped the strchr call in favour of the simple == checks (again)) > - this is extremely strange: > > for (i = 0; character = domain_label + i; i++) { > *snip* > perhaps you want > > for (i = 0; *(character = domain_label + i); i++) Yeah. > > - this is nice (much better than my proposed solution): > /* Check we have the 2 parts */ > if (!(local_part = parts[0]) || !(domain = parts[1])) { > > that's all for now :) > Great, I'll attach a new patch! :D
Created attachment 78170 [details] [review] Running out of things to type here
after a quick reading (mind you, 4:00am), nothing obviously wrong. i still think this: *(character = domain_label + i) is a little more grotesque than the alternatives but that's a matter of personal style :) fer: ping.
Comment on attachment 78170 [details] [review] Running out of things to type here There are some issues with the patch: It allows "fer@onirica.a" it allows "fernando.herrera@k" it allows "fherrera@onirica.c" I guess that the code looking for the dot should look for it after the @ part. Also should allown only domains ending in strlen(end) >= 2
mud on my face, now :) the problem is that it doesn't check the length of the array of fragments split on "."
All of those domains ("onirica.a", "k", and "onirica.c") are technically valid domains, and there's nothing to stop any of them from coming into existance tomorrow. I'm not sure we should be second guessing IANA's policy. If we want something truly robust, we could add an MX/A record check to the routine as we earlier discussed but that introduces latency. Alternatively, we could add the check as a later step, after the user hits the button. With that in mind, what do you think? We could special case the last domain label and look it up against the list of known IANA TLDs, as was almost the case previously.
Also, bear in mind that something like "n@ai" *is* a valid address. [1] Hell, if you were totally 1337, you could have the address "god@.", but that's another story. As long as the bit after the @ makes up a valid domain, it is good to go. [1] <http://lists.ximian.com/pipermail/evolution/2002-January/016633.html>
mud off of my face. we had discussed this earlier on irc and agreed that this was the correct behaviour. there is at least one country (i forget which) which has their top-level domain resolvable.
ok, my fault then :) I've committed to HEAD. Thanks Alex and Ryan for the work on this!. 2006-12-14 Fernando Herrera <fherrera@onirica.com> * src/bug-buddy.c: (email_local_part_is_valid), (email_domain_label_is_valid), (email_domain_is_valid), (email_is_valid): Rework the email address checking. Patch by Alex Jones. Fix bug #350621