After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 350621 - the new bugbuddy doesn't allow for .cat email addresses
the new bugbuddy doesn't allow for .cat email addresses
Status: RESOLVED FIXED
Product: bug-buddy
Classification: Deprecated
Component: general
2.15.x
Other All
: Normal trivial
: ---
Assigned To: Bug-buddy Maintainers
Bug-buddy Maintainers
Depends on:
Blocks:
 
 
Reported: 2006-08-09 17:43 UTC by Allison Karlitskaya (desrt)
Modified: 2006-12-14 03:13 UTC
See Also:
GNOME target: ---
GNOME version: 2.15/2.16


Attachments
Patch to email address syntax checking (4.51 KB, patch)
2006-12-11 03:17 UTC, Alexander “weej” Jones
none Details | Review
Less rubbish version (4.55 KB, patch)
2006-12-11 03:24 UTC, Alexander “weej” Jones
none Details | Review
Third time lucky! (4.13 KB, patch)
2006-12-11 04:02 UTC, Alexander “weej” Jones
none Details | Review
Patch with changes recommended by desrt (4.50 KB, patch)
2006-12-11 19:30 UTC, Alexander “weej” Jones
none Details | Review
Running out of things to type here (4.33 KB, patch)
2006-12-11 22:21 UTC, Alexander “weej” Jones
needs-work Details | Review

Description Allison Karlitskaya (desrt) 2006-08-09 17:43:00 UTC
The new bug buddy program has a place for you to enter your email address.  It seems to have some vague intelligence:

1) It will grey out the "Send" button until it detects a valid email address.
2) Validity is determined by looking at the last bit (like the "ca" in desrt.ca)
3) All two letter (presumed country codes) are permitted, even if they're not valid countries.
4) Other top-level domains are special cased (.com, .biz, .info, etc)
5) .cat is not included.
6) People with .cat email addresses cannot submit bugs using these addresses.

Cheers
Comment 1 Vincent Untz 2006-08-23 13:18:33 UTC
The list at http://www.iana.org/domain-names.htm should be used.
Comment 2 Vincent Untz 2006-08-23 13:19:53 UTC
I meant: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
(should be quite easy to automatically generate code from this file (which would be in cvs) and to add a "make update" target to download the latest version of this file)
Comment 3 Fernando Herrera 2006-12-02 15:44:17 UTC
this should be an easy piece of code for anyone. Marking as gnome-love
Comment 4 Bruno Boaventura 2006-12-02 19:54:00 UTC
Can anyone give me a help to fix this? Where to see?
Comment 5 Alexander “weej” Jones 2006-12-10 03:34:52 UTC
Hi

Can we not throw this kind of crap into a general purpose library?

How often are applications validating email addresses?
Comment 6 Allison Karlitskaya (desrt) 2006-12-10 03:59:20 UTC
weej; I doubt it would end up in a library.

Note: a valid resolution to this bug is to simply remove the checking code or reduce its complexity.

For the case of 'reduced complexity' we should probably just accept any address that matches:

/^[-_.a-z0-9]+@([-a-z0-9]+\.)+[-a-z0-9]{2,}$/i

Another approach (which sort of sucks a bit) is to just have the thing to a DNS lookup to see if it has a valid hostname or not.

The solution to this bug doesn't have to be difficult.
Comment 7 Alexander “weej” Jones 2006-12-10 04:34:25 UTC
MX-lookup would block for a while, so I vote for using a regex. To be totally honest, the server should be verifying this kind of thing, anyway, as the client code can be bypassed trivially.

But you knew that.
Comment 8 Fernando Herrera 2006-12-10 15:31:28 UTC
We could use the regexpr approach as soon as we get it in glib.
Comment 9 Alexander “weej” Jones 2006-12-11 03:17:28 UTC
Created attachment 78110 [details] [review]
Patch to email address syntax checking

Please review this.
Comment 10 Alexander “weej” Jones 2006-12-11 03:24:30 UTC
Created attachment 78111 [details] [review]
Less rubbish version

Sorry guys, forgot to make my functions static!
Comment 11 Alexander “weej” Jones 2006-12-11 04:02:10 UTC
Created attachment 78113 [details] [review]
Third time lucky!

As per mariano's feedback, adjusted a few of the character checks to use g_ascii_isalnum.
Comment 12 Allison Karlitskaya (desrt) 2006-12-11 05:50:56 UTC
some comments about the patch.  a lot of these are excessive nit-picks.  some of these aren't even adhered to by the other authors of the file... but you can be better than they are :)

k.  here goes:

- most typically the comments about what a function does preceed the function entirely rather than being included just prior to the variable declarations.

- if you put your auxiliary functions above your main function you won't need to use prototypes.

- the indentation at the end of your final function is messed up.

- your final function could be modified to not scan the string twice (as it currently does).  just do a for loop once, but increment an integer instead of incrementing the string pointer.  then inspect the integer at the end to reason about the length.  you could even have an extra '64 and over' termination condition if you really want.

- you make inconsistent use of 'function (arg)' vs 'function(arg)'.  the first is right.

- casting to unsigned char isn't required

- you should make your functions take (const char *) where possible.  i think this is everywhere.  some of your local variables are even good candidates for const char *.

- loops like:
  for()
    if(this)
      continue;
    if (that)
      continue;

    return FALSE;

are much easier to read if you write:
  for()
    if (!this && !that)
      return FALSE;

- "Split the on the dot" has an extra 'the' in it (see?  nit-pick!)

- indentation and the end of the 3rd last and 2nd last function is also messed up.  final } should rest in column 0.

- assignment of the result of a logical and to a variable is a little bit strange here but there's nothing wrong with it per se.  it should definitely be wrapped to be fewer than 80 characters on a line, though

- you have a memory leak which will occur quite a lot as the user is entering their email.  in the (!local_part) and (!domain) cases you return FALSE without first freeing the parts.  a better approach might be to leave the is-null checking to the subroutines, thus reducing the number of branches that you have to worry about in your main routine.  if you continue to use && like you are then this approach won't access parts[1] without first knowing that parts[0] is non-null.

- you have a 'return FALSE' at the start of email_local_part_is_valid that's indented 4 tabs when it should only be 2.  another case of this exists for the other return FALSE in that function.

- the trick with strchr is less readable and less efficient than just doing it with a sequence of cases like (x != 1 && x != 2 && x != 3)...  note: != and &&. (see point #8 above :)

- some of your comments are also longer than 80 characters to a line.  in general, always wrap.

- again with the leaks in email_domain_is_valid.  if your for loop returns FALSE then g_strfreev is not called.

- the return FALSE in said for loop is not indented far enough :)

that is all for now.

the functionality of the patch itself looks pretty much correct (except for the memory leaks).

thanks for working on this -- cheers.
Comment 13 Alexander “weej” Jones 2006-12-11 09:03:04 UTC
I'll have another play this evening :)

Thanks
Comment 14 Alexander “weej” Jones 2006-12-11 19:11:19 UTC
(In reply to comment #12)
> some comments about the patch.  a lot of these are excessive nit-picks.  some
> of these aren't even adhered to by the other authors of the file... but you can
> be better than they are :)
> 
> k.  here goes:
> 
> - most typically the comments about what a function does preceed the function
> entirely rather than being included just prior to the variable declarations.

Sorted

> - if you put your auxiliary functions above your main function you won't need
> to use prototypes.

Again, sorted. I knew this, but I thought the function order made more sense, logically. If you'd prefer not to have prototypes then fair enough.

> - the indentation at the end of your final function is messed up.

Hmm... Maybe a dodgy patch or something. It seems fine now, at least.
 
> - your final function could be modified to not scan the string twice (as it
> currently does).  just do a for loop once, but increment an integer instead of
> incrementing the string pointer.  then inspect the integer at the end to reason
> about the length.  you could even have an extra '64 and over' termination
> condition if you really want.

Done.
 
> - you make inconsistent use of 'function (arg)' vs 'function(arg)'.  the first
> is right.

Sorted. I normally use function(arg) but realised it was wrong and changed all that I could see. I obviously missed a couple.

> - casting to unsigned char isn't required

Fixed
 
> - you should make your functions take (const char *) where possible.  i think
> this is everywhere.  some of your local variables are even good candidates for
> const char *.

Gotta figure this one out first
 
> - loops like:
>   for()
>     if(this)
>       continue;
>     if (that)
>       continue;
> 
>     return FALSE;
> 
> are much easier to read if you write:
>   for()
>     if (!this && !that)
>       return FALSE;

Unless it is suboptimal to do it the first way, I'd rather not cram all the checks into one line. When the conditions are "this" and "that", yeah, it looks reasonable, but I was coding it how I think the process through. i.e., "is this character one of these? OK, next one.". Maybe I'm just not used to it, but looking at a load of boolean operators is less readable to me than basic flow. Let me know what you think of the new patch anyway, I'm not that passionate about it.
 
> - "Split the on the dot" has an extra 'the' in it (see?  nit-pick!)

Fixed

> - indentation and the end of the 3rd last and 2nd last function is also messed
> up.  final } should rest in column 0.

No idea what happened here. Again, I can't see it now so whatever!
 
> - assignment of the result of a logical and to a variable is a little bit
> strange here but there's nothing wrong with it per se.  it should definitely be
> wrapped to be fewer than 80 characters on a line, though

What else can I do? I need to free before I return. I've left it as is for now, but I'd like to know what else I can do that's more accepted.

> - you have a memory leak which will occur quite a lot as the user is entering
> their email.  in the (!local_part) and (!domain) cases you return FALSE without
> first freeing the parts.  a better approach might be to leave the is-null
> checking to the subroutines, thus reducing the number of branches that you have
> to worry about in your main routine.  if you continue to use && like you are
> then this approach won't access parts[1] without first knowing that parts[0] is
> non-null.

Hmm... I've changed this around a bit. Not sure if it's better or worse. Again, there's just a load of operators everywhere now, but if that's the way we want to dance the dance then sobeit. :)
 
> - you have a 'return FALSE' at the start of email_local_part_is_valid that's
> indented 4 tabs when it should only be 2.  another case of this exists for the
> other return FALSE in that function.

Really not seeing this... I'm consistently using tabs with width 8. Maybe it's an editor setting making it look strange?
 
> - the trick with strchr is less readable and less efficient than just doing it
> with a sequence of cases like (x != 1 && x != 2 && x != 3)...  note: != and &&.
> (see point #8 above :)

Hmm, mariano recommended I did it this way instead. I've commented it.

> - some of your comments are also longer than 80 characters to a line.  in
> general, always wrap.

Done.
 
> - again with the leaks in email_domain_is_valid.  if your for loop returns
> FALSE then g_strfreev is not called.

Fixed.
 
> - the return FALSE in said for loop is not indented far enough :)

!? :P See if the indentation is still borked, I've no idea what went wrong.
 
> that is all for now.
> 
> the functionality of the patch itself looks pretty much correct (except for the
> memory leaks).
> 
> thanks for working on this -- cheers.
> 

NP :D
Comment 15 Alexander “weej” Jones 2006-12-11 19:30:36 UTC
Created attachment 78151 [details] [review]
Patch with changes recommended by desrt

I had a go at flagging some consts. Let me know if I've missed anything I should have done.
Comment 16 Allison Karlitskaya (desrt) 2006-12-11 20:07:06 UTC
replies:

- you still have whitespace trouble, most likely caused by using diff -w.  don't do that.

- your use of the for/continue construct is a bit strange because the end of the for loop is never ever reached, and therefore it's not really acting much like a loop.  there's absolutely nothing technically wrong with this, and without using a long sequence of boolean operators (as you noted) i'm not sure of a better solution.


- i doubt you meant to do this:
                /* If character is "-", "_" or "." it is valid. */
                if (strchr("-_.", *character))
                        continue;

                if (strchr("-_.", *character) != NULL)
                        continue;

- also note (as seen in above code) you still have some cases of function()

- this is extremely strange:

        for (i = 0; character = domain_label + i; i++) {

this loop will not terminate until the 'domain_label' pointer wraps all the way around to NULL (but your program will crash much sooner).  ((well, actually, that's not entirely true.  since '\0' will be detected as an "invalid character" the loop will exit reporting that the address contains invalid characters))

perhaps you want

        for (i = 0; *(character = domain_label + i); i++)

or

        for (i = 0; *(character = &domain_label[i]); i++)

or

        const char character; // not const char *

        ...

        for (i = 0; character = domain_label[i]; i++)

or

        for (i = 0; domain_label[i]; i++) {
                const char *character = domain_label[i];

or just use the last form of the loop and abandon the use of 'character' entirely.  with optimisation the compiler will cache the value of domain_label[i] anyway.  you have other loops like this.  my personal preference is that using an integer index is more readable but others might disagree.

- this is nice (much better than my proposed solution):
        /* Check we have the 2 parts */
        if (!(local_part = parts[0]) || !(domain = parts[1])) {

that's all for now :)
Comment 17 Alexander “weej” Jones 2006-12-11 22:15:18 UTC
(In reply to comment #16)
> replies:
> 
> - you still have whitespace trouble, most likely caused by using diff -w. 
> don't do that.

Can't remember where I picked that habit up, but at least we've found out what was wrong. :)
 
> - your use of the for/continue construct is a bit strange because the end of
> the for loop is never ever reached, and therefore it's not really acting much
> like a loop.  there's absolutely nothing technically wrong with this, and
> without using a long sequence of boolean operators (as you noted) i'm not sure
> of a better solution.

OK

> - i doubt you meant to do this:
>                 /* If character is "-", "_" or "." it is valid. */
>                 if (strchr("-_.", *character))
>                         continue;
> 
>                 if (strchr("-_.", *character) != NULL)
>                         continue;
> 
> - also note (as seen in above code) you still have some cases of function()

Sorted (I've actually dropped the strchr call in favour of the simple == checks (again))
 
> - this is extremely strange:
> 
>         for (i = 0; character = domain_label + i; i++) {
> 

*snip*

> perhaps you want
> 
>         for (i = 0; *(character = domain_label + i); i++)

Yeah.

> 
> - this is nice (much better than my proposed solution):
>         /* Check we have the 2 parts */
>         if (!(local_part = parts[0]) || !(domain = parts[1])) {
> 
> that's all for now :)
> 

Great, I'll attach a new patch! :D
Comment 18 Alexander “weej” Jones 2006-12-11 22:21:59 UTC
Created attachment 78170 [details] [review]
Running out of things to type here
Comment 19 Allison Karlitskaya (desrt) 2006-12-12 09:32:37 UTC
after a quick reading (mind you, 4:00am), nothing obviously wrong.

i still think this:

  *(character = domain_label + i)

is a little more grotesque than the alternatives but that's a matter of personal style :)




fer: ping.
Comment 20 Fernando Herrera 2006-12-14 02:52:09 UTC
Comment on attachment 78170 [details] [review]
Running out of things to type here

There are some issues with the patch:

It allows "fer@onirica.a"
it allows "fernando.herrera@k"
it allows "fherrera@onirica.c"

I guess that the code looking for the dot should look for it after the @ part.
Also should allown only domains ending in strlen(end) >= 2
Comment 21 Allison Karlitskaya (desrt) 2006-12-14 03:01:34 UTC
mud on my face, now :)

the problem is that it doesn't check the length of the array of fragments split on "."
Comment 22 Alexander “weej” Jones 2006-12-14 03:04:16 UTC
All of those domains ("onirica.a", "k", and "onirica.c") are technically valid
domains, and there's nothing to stop any of them from coming into existance
tomorrow.

I'm not sure we should be second guessing IANA's policy. If we want something
truly robust, we could add an MX/A record check to the routine as we earlier
discussed but that introduces latency. Alternatively, we could add the check as
a later step, after the user hits the button.

With that in mind, what do you think?

We could special case the last domain label and look it up against the list of
known IANA TLDs, as was almost the case previously.
Comment 23 Alexander “weej” Jones 2006-12-14 03:06:39 UTC
Also, bear in mind that something like "n@ai" *is* a valid address. [1] Hell, if you were totally 1337, you could have the address "god@.", but that's another story. As long as the bit after the @ makes up a valid domain, it is good to go.

[1] <http://lists.ximian.com/pipermail/evolution/2002-January/016633.html>
Comment 24 Allison Karlitskaya (desrt) 2006-12-14 03:08:12 UTC
mud off of my face.

we had discussed this earlier on irc and agreed that this was the correct behaviour.

there is at least one country (i forget which) which has their top-level domain resolvable.
Comment 25 Fernando Herrera 2006-12-14 03:13:57 UTC
ok, my fault then :)

I've committed to HEAD.

Thanks Alex and Ryan for the work on this!.

2006-12-14  Fernando Herrera  <fherrera@onirica.com>

        * src/bug-buddy.c: (email_local_part_is_valid),
        (email_domain_label_is_valid), (email_domain_is_valid),
        (email_is_valid): Rework the email address checking.
        Patch by Alex Jones. Fix bug #350621