GNOME Bugzilla – Bug 795299
Inconsistent results when using a predicate on a node set from a union
Last modified: 2018-04-17 09:33:49 UTC
I have an HTML page with two tables. The first row of the first table mixes a header cell and a data cell; the first row of the second table has only header cells:

```
<html>
  <head>
    <title>Table test page</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Header 1</th>
        <td>Header 2</td>
      </tr>
    </table>
    <table>
      <tr>
        <th>Header 1</th>
        <th>Header 2</th>
      </tr>
    </table>
  </body>
</html>
```

I am using the following XPath expression to select the table cells from the first row of the first table, by combining the results for `<th>` and `<td>`. This works fine and I get both cells back. Note that the first table combines `<th>` and `<td>` in a single row.

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)' test.html
<th>Cell 1</th><td>Cell 2</td>
```

When I try to get the first cell by appending `[1]` to the expression, I get the second cell back instead of the first. Why? The expected result of this expression is `<th>Cell 1</th>`:

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[1]' test.html
<td>Cell 2</td>
```

Appending `[2]` yields the second cell successfully. But why not the first?

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[2]' test.html
<td>Cell 2</td>
```

Similarly, for the second table I can get all cells with this expression. The second table only contains `<th>` cells in the first row.

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)' test.html
<th>Cell 1</th><th>Cell 2</th>
```

However, when I append `[1]` to get only the first cell back, I get an empty result. The expected result for this expression is `<th>Cell 1</th>`.

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[1]' test.html
XPath set is empty
```

The second cell can be retrieved successfully by appending `[2]`. But why not the first?

```
$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[2]' test.html
<th>Cell 2</th>
```

Note that after replacing the first `descendant-or-self` with `//`, the expression seems to work as expected in all cases, for both the first and the second table:

```
$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)' test.html
<th>Cell 1</th><td>Cell 2</td>
$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[1]' test.html
<th>Cell 1</th>
$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[2]' test.html
<td>Cell 2</td>
$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)' test.html
<th>Cell 1</th><th>Cell 2</th>
$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[1]' test.html
<th>Cell 1</th>
$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[2]' test.html
<th>Cell 2</th>
```

Here is an interactive version of the test case implemented in PHP: https://3v4l.org/WuPSt

I'm not sure whether this is a bug or incorrect usage of XPath on my end. I looked through the documentation, and my understanding is that the result of a union is a node-set and that predicates can be applied to node-sets. Since both parts of the union use `descendant-or-self`, I assume forward document order should be used.
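For reference, the result expected from the `[1]` and `[2]` predicates can be sketched without libxml2. The following is a minimal Python illustration using only the standard library; it assumes that a predicate on a parenthesized union indexes the node-set in document order, which is my reading of XPath 1.0. It does not evaluate the XPath expressions themselves: `first_row_cells` and `pick` are hypothetical helpers that emulate the union and the trailing predicate by hand, and the page content is the test page as listed above.

```python
import xml.etree.ElementTree as ET

# Test page as listed in the report (whitespace condensed).
PAGE = """<html>
<head><title>Table test page</title></head>
<body>
<table><tr><th>Header 1</th><td>Header 2</td></tr></table>
<table><tr><th>Header 1</th><th>Header 2</th></tr></table>
</body>
</html>"""


def first_row_cells(root, table_index):
    """Hypothetical helper: td/th children of the first row of the
    table_index-th table (1-based), returned in document order."""
    table = list(root.iter("table"))[table_index - 1]
    row = next(table.iter("tr"))
    # Iterating an element yields its children in document order,
    # which is the order a union of td and th is assumed to have.
    return [cell for cell in row if cell.tag in ("td", "th")]


def pick(cells, position):
    """Emulate a trailing [position] predicate; XPath positions are 1-based."""
    if 1 <= position <= len(cells):
        return cells[position - 1]
    return None


root = ET.fromstring(PAGE)
for table_index in (1, 2):
    cells = first_row_cells(root, table_index)
    first = pick(cells, 1)
    second = pick(cells, 2)
    print(table_index, (first.tag, first.text), (second.tag, second.text))
# prints:
# 1 ('th', 'Header 1') ('td', 'Header 2')
# 2 ('th', 'Header 1') ('th', 'Header 2')
```

Under that assumption, `[1]` selects the `<th>` cell for both tables, which matches the output of the `//html/...` variant and not the output of the `descendant-or-self::html/...` variant.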
This turned out to be a long-standing bug related to `XPATH_OP_RESET`. It should be fixed by the following commit: https://git.gnome.org/browse/libxml2/commit/?id=e22a83b1d095dac25ce05e1a2d9f263f41d11c68
Thanks!!