Bug 795299 - Inconsistent results when using a predicate on a node set from a union
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: xpath
Version: 2.9.4
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Nick Wellnhofer
Reported: 2018-04-16 13:45 UTC by Pieter Frenssen
Modified: 2018-04-17 09:33 UTC



Description Pieter Frenssen 2018-04-16 13:45:31 UTC
I have an HTML page with two tables, one of which has a row with mixed header and data cells, the second having only data cells:

```
<html>
  <head>
    <title>Table test page</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Cell 1</th>
        <td>Cell 2</td>
      </tr>
    </table>
    <table>
      <tr>
        <th>Cell 1</th>
        <th>Cell 2</th>
      </tr>
    </table>
  </body>
</html>
```

I use the following XPath expression to select the cells in the first row of the first table by taking the union of the `<td>` and `<th>` results. Note that the first table mixes `<th>` and `<td>` in a single row. This works as expected and returns both cells:

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)' test.html
<th>Cell 1</th><td>Cell 2</td>
```

When I append `[1]` to the expression to get only the first cell, I get the second cell back instead. The expected result of this expression is `<th>Cell 1</th>`:

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[1]' test.html
<td>Cell 2</td>
```

Appending `[2]` correctly returns the second cell, so only `[1]` gives the wrong result:

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[2]' test.html
<td>Cell 2</td>
```

Similarly, I can get all cells of the second table with the same expression and the table index changed to `[2]`. The first row of the second table contains only `<th>` cells.

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)' test.html
<th>Cell 1</th><th>Cell 2</th>
```

However, when I append `[1]` to get only the first cell back, I get an empty result. The expected result for this expression is `<th>Cell 1</th>`.

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[1]' test.html
XPath set is empty
```

The second cell can be retrieved successfully by appending `[2]`, so again only `[1]` fails:

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[2]' test.html
<th>Cell 2</th>
```

Note that after replacing the leading `descendant-or-self::html` step with `//html` in both branches of the union, the expression works as expected in all cases, for both the first and the second table:

```
$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)' test.html
<th>Cell 1</th><td>Cell 2</td>

$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[1]' test.html
<th>Cell 1</th>

$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[2]' test.html
<td>Cell 2</td>

$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)' test.html
<th>Cell 1</th><th>Cell 2</th>

$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[1]' test.html
<th>Cell 1</th>

$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[2]' test.html
<th>Cell 2</th>
```
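
All of the commands above differ only in the leading step, the table index, and the trailing predicate. As a convenience for re-running the cases, here is a minimal Python sketch that builds the expression strings. The function name `first_row_cells_expr` and its parameters are hypothetical and not part of the original report; the script only prints `xmllint` command lines and does not run them.

```python
def first_row_cells_expr(table, position=None, root_step="descendant-or-self::html"):
    """Build the union expression used in the report.

    table     -- 1-based index of the table to inspect
    position  -- optional 1-based predicate appended to the union, e.g. [1]
    root_step -- leading step; the report compares 'descendant-or-self::html'
                 with '//html'
    """
    # First row of the chosen table, exactly as written in the report.
    row = f"(({root_step}/descendant-or-self::table)[{table}]//tr)[1]"
    # Union of the td and th children of that row.
    expr = f"({row}/td|{row}/th)"
    if position is not None:
        expr += f"[{position}]"
    return expr


if __name__ == "__main__":
    # Print one xmllint command per case from the report.
    for root_step in ("descendant-or-self::html", "//html"):
        for table in (1, 2):
            for position in (None, 1, 2):
                expr = first_row_cells_expr(table, position, root_step)
                print(f"xmllint --xpath '{expr}' test.html")
```

Each printed line can be pasted into a shell next to `test.html` to compare the output of a given libxml2 build with the results listed above.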

Here is an interactive version of the test case implemented in PHP: https://3v4l.org/WuPSt

I'm not sure whether this is a bug or an incorrect usage of XPath on my end. I looked through the documentation and my understanding is that the result of a union is a node-set, and predicates can be used on node-sets. Since both parts of the union are using `descendant-or-self` I assume the forward document order should be used.
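
To make the expected behaviour concrete, here is a small Python sketch of the semantics described above: a union is a node-set, and a positional predicate on a parenthesised expression counts nodes in document order. It does not use libxml2 or an XPath engine; it models the union by hand with the standard library's `xml.etree.ElementTree`. The helper names are hypothetical, and the cell text is taken from the `xmllint` output shown earlier.

```python
import xml.etree.ElementTree as ET

# Test page from the report (cell text as shown in the xmllint output).
PAGE = """<html>
  <head><title>Table test page</title></head>
  <body>
    <table><tr><th>Cell 1</th><td>Cell 2</td></tr></table>
    <table><tr><th>Cell 1</th><th>Cell 2</th></tr></table>
  </body>
</html>"""

ROOT = ET.fromstring(PAGE)
# Position of every element in document order.
DOC_ORDER = {el: i for i, el in enumerate(ROOT.iter())}


def first_row_cells(table_index):
    """Model (row/td | row/th): merge both sets, then sort in document order."""
    table = list(ROOT.iter("table"))[table_index - 1]
    row = next(table.iter("tr"))
    union = set(row.findall("td")) | set(row.findall("th"))
    return sorted(union, key=DOC_ORDER.__getitem__)


def describe(el):
    """Render a cell the way xmllint prints it."""
    return f"<{el.tag}>{el.text}</{el.tag}>"


if __name__ == "__main__":
    for table_index in (1, 2):
        cells = first_row_cells(table_index)
        # XPath positions are 1-based, so [1] is cells[0] and [2] is cells[1].
        print(table_index, "[1]", describe(cells[0]))
        print(table_index, "[2]", describe(cells[1]))
```

Under this model, `[1]` selects `<th>Cell 1</th>` for both tables, regardless of which branch of the union the node came from. That matches the results obtained with `//html`.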
Comment 1 Nick Wellnhofer 2018-04-16 18:56:27 UTC
This turned out to be a long-standing bug related to XPATH_OP_RESET. Should be fixed with the following commit:

https://git.gnome.org/browse/libxml2/commit/?id=e22a83b1d095dac25ce05e1a2d9f263f41d11c68
Comment 2 Pieter Frenssen 2018-04-17 09:33:49 UTC
Thanks!!