GNOME Bugzilla – Bug 668245
Problems with directory names containing spaces
Last modified: 2021-07-05 13:22:54 UTC
readXmlFile is given a local path name (something like "/Users/eric/Library/Application Support/Llamabama/prefs.xml") and, in the docptr value that it returns, docptr->URL is set to "/Users/eric/Library/Application%20Support/Llamabama/prefs.xml".
Created attachment 205672
Testcase: C file with compile and run instructions

The comments in the C file explain how to compile the code (on OSX and Linux; on Windows you'll need to jump through the usual hoops) and how to run it.
A URI can't contain spaces, so the space character must be escaped.
(In reply to Nick Wellnhofer from comment #2)
> A URI can't contain spaces, so the space character must be escaped.

However, the CLI commands xmllint and xsltproc do not take URIs as arguments, but pathnames.
> However, the CLI commands xmllint and xsltproc do not take URI's as arguments, but pathnames.

xmllint does take URIs as arguments:

$ xmllint http://www.xmlfiles.com/examples/note.xml
<?xml version="1.0" encoding="ISO8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

It also seems to handle spaces in filenames correctly when printing error messages:

$ xmllint 'file 1.xml'
file 1.xml:1: parser error : Opening and ending tag mismatch: a line 1 and b
<a></b>
       ^

You can also call xmlURIUnescapeString to unescape the URI from an xmlDoc.
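The round trip between a path and the escaped form seen in docptr->URL can be illustrated outside libxml2. The following is only a sketch: it uses Python's urllib.parse as a stand-in for libxml2's escaping and for xmlURIUnescapeString, and libxml2's exact escaping rules may differ from what quote() does.

```python
from urllib.parse import quote, unquote

# Path from the original report; the space is not a legal URI character.
path = "/Users/eric/Library/Application Support/Llamabama/prefs.xml"

# Percent-escape the path (quote() keeps "/" unescaped by default).
escaped = quote(path)
print(escaped)
# /Users/eric/Library/Application%20Support/Llamabama/prefs.xml

# Unescaping recovers the original path, which is what a caller
# would do with the URL stored in the document.
restored = unquote(escaped)
print(restored == path)
# True
```

The point is that the escaped URL is lossless: a caller that needs the filesystem path can always get it back by unescaping.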
But xmllint also takes pathnames as arguments. Actually, its behavior is really strange: the current working directory matters when an argument is a pathname with spaces in a directory name, even though an absolute pathname is provided!

An example (first without spaces in pathnames):

zira:~> cat /tmp/foo/book.xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE book [
<!ENTITY preface SYSTEM "preface.xml">
]>
<book>
&preface;
</book>
zira:~> cat /tmp/foo/preface.xml
<?xml version="1.0" encoding="utf-8"?>
<preface>
<title>About this document</title>
</preface>
zira:~> xmllint --noent /tmp/foo/book.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book [
<!ENTITY preface SYSTEM "preface.xml">
]>
<book>
<preface>
<title>About this document</title>
</preface>
</book>
zira:~> cd /tmp/foo
zira:/tmp/foo> xmllint --noent /tmp/foo/book.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book [
<!ENTITY preface SYSTEM "preface.xml">
]>
<book>
<preface>
<title>About this document</title>
</preface>
</book>

So far, everything is fine. But now, let's rename "foo" to "a b".

zira:/tmp> mv foo "a b"

Then, from my home directory, I now get an error:

zira:~> xmllint --noent /tmp/a\ b/book.xml
warning: failed to load external entity "preface.xml"
/tmp/a b/book.xml:6: parser error : Failure to process entity preface
&preface;
         ^
/tmp/a b/book.xml:6: parser error : Entity 'preface' not defined
&preface;
         ^

But note that from "/tmp/a b", with the same command line, everything is fine:

zira:~> cd /tmp/a\ b
zira:/tmp/a b> xmllint --noent /tmp/a\ b/book.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book [
<!ENTITY preface SYSTEM "preface.xml">
]>
<book>
<preface>
<title>About this document</title>
</preface>
</book>

The bug (with fewer tests) was initially reported here in the Debian BTS: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516916
An strace run shows that when the directory is named "foo", xmllint tries to read "/tmp/foo/preface.xml", which is correct. But when the directory is named "a b", xmllint tries to read "preface.xml", which fails when the current directory is not the one containing the XML files.

This is even potentially a vulnerability: if the user has a preface.xml in their home directory, and the command is run from there, then it is this file that is read:

zira:~> cat preface.xml
<?xml version="1.0" encoding="utf-8"?>
<preface>
<title>Confidential information</title>
</preface>
zira:~> xmllint --noent /tmp/a\ b/book.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book [
<!ENTITY preface SYSTEM "preface.xml">
]>
<book>
<preface>
<title>Confidential information</title>
</preface>
</book>
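For comparison, here is a sketch of the resolution one would expect: the relative system identifier "preface.xml" is resolved against the location of book.xml, regardless of the current working directory. This is not libxml2's code. The helper name resolve_entity is hypothetical, and Python's urllib.parse is used only to illustrate the expected result.

```python
from urllib.parse import quote, unquote, urljoin

def resolve_entity(document_path, system_id):
    """Resolve a relative system identifier against the document's location.

    Hypothetical helper for illustration: the document path is escaped
    into a URI reference first, so the space in "a b" cannot confuse the
    relative-reference resolution, and the result is unescaped again to
    obtain a filesystem path.
    """
    base_uri = quote(document_path)          # "/tmp/a b/..." -> "/tmp/a%20b/..."
    resolved = urljoin(base_uri, system_id)  # replaces the last path segment
    return unquote(resolved)

print(resolve_entity("/tmp/foo/book.xml", "preface.xml"))
# /tmp/foo/preface.xml
print(resolve_entity("/tmp/a b/book.xml", "preface.xml"))
# /tmp/a b/preface.xml
```

With this behavior the file read would always be the one next to book.xml, never a preface.xml that happens to sit in the current directory.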
OK, but this issue has very little to do with the original bug report. Anyway, since this is the "upstream" bug for the Debian bug, I'm changing the title and reopening.

As the Debian bug mentions, if you replace spaces with '%20', it works as expected. The thing is that libxml2 and xmllint actually work on URIs. This isn't properly documented in the man page and the API documentation, which mostly talk about "files". My guess is that there are some internal tweaks to make filenames with non-URI characters work. That's why "xmllint a\ b.xml" works and doesn't even replace the space with %20 in error messages. But when dealing with directories these tweaks fail to work.

I think that xmllint should handle command line arguments as follows:

- Try to parse it as a URI. If this works and the URI has one of the supported schemes ("http" or "ftp", maybe also "file") then use the URI.
- Otherwise, treat the argument as a filename and escape it before passing it on.

This means that spaces and other characters will appear in their escaped form like '%20' in error messages.

Public API functions like xmlReadFile should use the same approach. Internally, libxml2 should always deal with URIs without having to guess what the string actually represents.

If you look at xmlFileOpen, for example, it first tries the original URI, then the unescaped URI.

https://git.gnome.org/browse/libxml2/tree/xmlIO.c?id=8bbe4508ef2a97110eac02f16782678c38ea97af#n950

This is unpredictable. If xmlFileOpen knows that it only receives URIs, it can always unescape.
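The proposed argument handling can be sketched as follows. This is only an illustration of the two rules above, not libxml2 or xmllint code. The names argument_to_uri and uri_to_local_path are hypothetical, the scheme list is taken from the proposal, and Python's urllib.parse stands in for libxml2's URI parser and escaping.

```python
from urllib.parse import quote, unquote, urlsplit

# Schemes named in the proposal ("file" was only a "maybe" there).
SUPPORTED_SCHEMES = {"http", "ftp", "file"}

def argument_to_uri(arg):
    """Turn a command line argument into a URI (hypothetical helper).

    Rule 1: if the argument parses as a URI with a supported scheme,
            use it unchanged.
    Rule 2: otherwise treat it as a filename and escape it.
    """
    try:
        scheme = urlsplit(arg).scheme
    except ValueError:
        scheme = ""  # not parseable as a URI, so treat it as a filename
    if scheme in SUPPORTED_SCHEMES:
        return arg
    return quote(arg)

def uri_to_local_path(uri):
    """Unescape a scheme-less URI produced by argument_to_uri().

    Because the input is known to be a URI, unescaping is unconditional;
    there is no need to try the escaped and unescaped forms in turn.
    """
    return unquote(uri)

print(argument_to_uri("http://www.xmlfiles.com/examples/note.xml"))
# http://www.xmlfiles.com/examples/note.xml
print(argument_to_uri("/tmp/a b/book.xml"))
# /tmp/a%20b/book.xml
print(uri_to_local_path(argument_to_uri("/tmp/a b/book.xml")))
# /tmp/a b/book.xml
```

One consequence of always escaping is that a filename containing a literal '%' (for example "100%.xml") becomes "100%25.xml" and unescapes back to the same name, so the open step never has to guess which form it was given.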
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.