Bug 419376 - Functions using named subpatterns behave inconsistently when G_REGEX_DUPNAMES is used
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: 2.13.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks: 434358
 
 
Reported: 2007-03-17 15:03 UTC by Marco Barisione
Modified: 2007-05-29 09:40 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Marco Barisione 2007-03-17 15:03:31 UTC
It's not clear how g_regex_fetch_named(), g_regex_fetch_named_pos() and g_regex_get_string_number() should behave when G_REGEX_DUPNAMES is used, and PCRE's behavior seems inconsistent.
Comment 1 Matthias Clasen 2007-03-17 18:31:01 UTC
The way things currently are, you get one of the matches, and which one is
undefined according to the PCRE docs. I think this is fine, but the
documentation needs to point it out.

It would also be very good to include an example of appropriate uses of
DUPNAMES in the docs, e.g. a(?'middle'c+)b|b(?'middle'd+)a.
Comment 2 Yevgen Muntyan 2007-03-17 19:24:24 UTC
From man pcrepattern:
---------------------
By default, a name must be unique within a pattern, but it is possible to relax this constraint by setting the PCRE_DUPNAMES option at compile time. This can be useful for patterns where only one instance of the named parentheses can match.
...
The convenience function for extracting the data by name returns the substring for the first, and in this example, the only, subpattern of that name that matched.
---------------------

In any case, the current GRegex problem is (or was) that it retrieves named subpatterns incorrectly.

Matthias, what do you mean by "this is fine"? I.e. what exactly is fine?
Comment 3 Marco Barisione 2007-03-17 20:18:29 UTC
Both g_regex_fetch_named() and g_regex_fetch_named_pos() are broken because they were written before PCRE 6.7, the version that added PCRE_DUPNAMES.

pcre_get_stringtable_entries() can be used to retrieve every subpattern with a given name; I'm not sure we need to wrap it, but we can use it internally.

pcre_get_stringnumber() returns one of the numbers that are associated with the name, but it is undefined which it is.

The man page says that pcre_get_named_substring() and pcre_copy_named_substring() call pcre_get_stringnumber(), and if it succeeds, they then call pcre_copy_substring() or pcre_get_substring(), as appropriate.
If the return value of pcre_get_stringnumber() is undefined, then the return values of pcre_get_named_substring() and pcre_copy_named_substring() are undefined too.

We could just say that the return value of the functions using named subpatterns is undefined, but I don't like that, so I'm going to look for a nicer solution (maybe using pcre_get_stringtable_entries()), but only after fixing bug #419368.
Comment 4 Matthias Clasen 2007-03-18 03:14:41 UTC
> Matthias, what do you mean by "this is fine"? I.e. what exactly is fine?

From my reading of the PCRE docs, DUPNAMES is only intended to be used if the
pattern is such that only one of the identically named matches can happen at
a time.  If you have multiple matches with the same name, you must have
violated that constraint, therefore it is fine to return an undefined result.
Comment 5 Yevgen Muntyan 2007-03-18 04:05:31 UTC
(In reply to comment #4)
> > Matthias, what do you mean by "this is fine"? I.e. what exactly is fine?
> 
> From my reading of the PCRE docs, DUPNAMES is only intended to be used if the
> pattern is such that only one of the identically named matches can happen at
> a time.  If you have multiple matches with the same name, you must have
> violated that constraint, therefore it is fine to return an undefined result.

OK, I thought you were talking about the general case, because what Marco said wasn't clear either:

> pcre_get_stringnumber() returns one of the numbers that are associated with 
> the name, but it is undefined which it is.

It does indeed look so (I bet it's the first subpattern with this name, and that it's just undocumented rather than deliberately left undefined as something evil or wrong).

> The man page says that pcre_get_named_substring() and
> pcre_copy_named_substring() call pcre_get_stringnumber(), and if it succeeds,
> they then call pcre_copy_substring() or pcre_get_substring(), as appropriate.
> If the return value of pcre_get_stringnumber() is undefined, then also the
> return value of pcre_get_named_substring() and pcre_copy_named_substring() is
> undefined.

This is not so: the docs explicitly say that get_named_substring() returns the
first one that matched.

There is indeed something strange in the man page: it says
"If the name is known to be unique (PCRE_DUPNAMES was not set), you can find 
the number from the name by calling pcre_get_stringnumber()." 
and it says 
"These functions call pcre_get_stringnumber(), and if it succeeds, they then
call pcre_copy_substring() or pcre_get_substring(), as appropriate."

The last sentence is probably just a leftover from an old version.
get_named_substring() does the right thing (I tested it), and it is clearly
intended to do the right thing.

Finally, using DUPNAMES when the named matches are not unique (i.e. when several
identically named subpatterns can match at once) is not illegal, and the documentation doesn't say it is. The docs say when it could be useful, as they are nice docs; they don't say "it is for this case", they say "it can be useful in this case".
Comment 6 Yevgen Muntyan 2007-03-18 04:18:34 UTC
Oops, pcre_get_stringnumber() actually returns a more or less arbitrary subpattern, because it uses a binary search to find the name. But get_named_substring() indeed does not use get_stringnumber().
Comment 7 Matthias Clasen 2007-03-18 04:35:34 UTC
Just to clarify: 

What I wanted to say is that if you have a pattern of

 a(?'m'b)c(?'m'd)e

I think it is fine for g_regex_fetch_named (regex, "m", "abcde")
to return either "b" or "d". The documentation should clearly
indicate that this pattern violates the constraints of DUPNAMES.
Comment 8 Yevgen Muntyan 2007-03-18 05:08:31 UTC
(In reply to comment #7)
> Just to clarify: 
> 
> What I wanted to say is that if you have a pattern of
> 
>  a(?'m'b)c(?'m'd)e
> 
> I think it is fine for g_regex_fetch_named (regex, "m", "abcde")
> to return either "b" or "d". 

No, it's not fine. PCRE returns the first subpattern that matched, so GLib should do the same.

> The documentation should clearly
> indicate that this pattern violates the constraints of DUPNAMES.

There is no constraint like that (unless GLib introduces its own constraints). That's what I was trying to say.
Comment 9 Yevgen Muntyan 2007-05-01 20:20:21 UTC
The patch at #419376 fixes this; everything is consistent now (== works as PCRE does).