GNOME Bugzilla – Bug 760836
Phantom variables/functions in XML, created from non-code files
Last modified: 2016-09-05 13:46:04 UTC
Checked in version 1.8.12-GIT (<doxygen version="1.8.12">). The XML output contains memberdefs of kind function/variable that come from non-code files. For example, the following snippet from all.xml was produced while generating XML output for the package https://github.com/waylan/Python-Markdown/releases/tag/2.6.5-final:

    <memberdef kind="function" id="extra_8txt_1afc80a6c723fc08b438df6fb016fc461e" prot="public" static="no" const="no" explicit="no" inline="no" virt="non-virtual">
      <type>title There may</type>
      <definition>title There may be</definition>
      <argsstring>(index.html) that are distributed with Python-Markdown that are not included here in Extra.The features of those extensions are not part of PHP Markdown Extra</argsstring>
      <name>be</name>
      <param>
        <type>index.</type>
        <declname>html</declname>
      </param>
      <briefdescription>
      </briefdescription>
      <detaileddescription>
      </detaileddescription>
      <inbodydescription>
      </inbodydescription>
      <location file="Python-Markdown-2.6.5-final/docs/extensions/extra.txt" line="37" column="1"/>
    </memberdef>
    <memberdef kind="function" id="extra_8txt_1aee61e6bce6c1e5b8c0802804ce9a805e" prot="public" static="no" const="no" explicit="no" inline="no" virt="non-virtual">
      <type>title There may and not part of Python Markdown Extra If you really would like Extra to include additional we suggest creating your own clone of Extra under a different</type>
      <definition>title There may and not part of Python Markdown Extra If you really would like Extra to include additional we suggest creating your own clone of Extra under a different name</definition>
      <argsstring>(see the[Extension API](api.html)).Markdown Inside HTML Blocks---------------------------Unlike the other Extra features</argsstring>
      <name>name</name>
      <param>
        <type>see </type>
        <declname>the</declname>
        <array>(api.html)[Extension API]</array>
      </param>
      <briefdescription>
      </briefdescription>
      <detaileddescription>
      </detaileddescription>
      <inbodydescription>
      </inbodydescription>
      <location file="Python-Markdown-2.6.5-final/docs/extensions/extra.txt" line="42" column="1"/>
    </memberdef>

This bug does not occur in a previous version of Doxygen; however, I'm unsure exactly which version that was (VERSION and the XML say 1.8.11, but I hadn't updated it in around 6 months, i.e. since before the 1.8.11 release date of 30-12-2015). From the looks of it, doxygen crawls and auto-documents `.txt` files, as seen with `Python-Markdown-2.6.5-final/docs/extensions/extra.txt`. If you need any more data or examples, please let me know. Thx.
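A quick way to spot such phantom members is to scan the generated XML for memberdef elements whose <location file="..."> points at a prose file. The Python sketch below does that with the standard-library ElementTree. It is a heuristic, not something doxygen provides: the set of "suspect" extensions (.txt, .md) is an assumption to adjust per project, and `phantom_members` / `scan_xml_dir` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: these extensions hold prose, not code. Adjust for your project.
SUSPECT_EXTENSIONS = (".txt", ".md")


def phantom_members(root, suspect_exts=SUSPECT_EXTENSIONS):
    """Return (kind, name, file, line) for each <memberdef> located in a suspect file."""
    hits = []
    for member in root.iter("memberdef"):
        location = member.find("location")
        if location is None:
            continue
        path = location.get("file", "")
        if path.lower().endswith(suspect_exts):
            hits.append((member.get("kind"), member.findtext("name"),
                         path, int(location.get("line", "0"))))
    return hits


def scan_xml_dir(xml_dir="xml", suspect_exts=SUSPECT_EXTENSIONS):
    """Check every per-compound XML file in doxygen's XML output directory."""
    hits = []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        hits.extend(phantom_members(ET.parse(xml_file).getroot(), suspect_exts))
    return hits


if __name__ == "__main__":
    # Trimmed-down version of the first memberdef reported above.
    sample = """<doxygen version="1.8.12">
  <compounddef id="extra_8txt" kind="file">
    <sectiondef kind="func">
      <memberdef kind="function" id="extra_8txt_1afc80a6c723fc08b438df6fb016fc461e">
        <name>be</name>
        <location file="Python-Markdown-2.6.5-final/docs/extensions/extra.txt" line="37" column="1"/>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>"""
    for hit in phantom_members(ET.fromstring(sample)):
        print(hit)
```

For the sample it prints `('function', 'be', 'Python-Markdown-2.6.5-final/docs/extensions/extra.txt', 37)`. With the configuration posted in this thread (OUTPUT_DIRECTORY = pakgdoxy, XML_OUTPUT = xml), `scan_xml_dir("pakgdoxy/xml")` would be the call to run on a full build.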
I checked previous versions: the bug was introduced after Aug 23, 2015. The version of Aug 23, 2015 (https://github.com/doxygen/doxygen/tree/663544cc0caf9109ea10c33f38b1e07e7a01a575) doesn't seem to have this bug. Hope this helps.
Which steps did you take to get the XML output? Please reduce the input, indicate in which output file the problem occurs, and attach a self-contained example (source + config file in a tar or zip) that allows us to reproduce the problem.
The config I'm using is the following:

    # Doxyfile 1.8.8
    #---------------------------------------------------------------------------
    # Project related configuration options
    #---------------------------------------------------------------------------
    DOXYFILE_ENCODING = UTF-8
    PROJECT_NAME = "My Project"
    PROJECT_NUMBER =
    PROJECT_BRIEF =
    PROJECT_LOGO =
    OUTPUT_DIRECTORY = pakgdoxy
    CREATE_SUBDIRS = NO
    ALLOW_UNICODE_NAMES = NO
    OUTPUT_LANGUAGE = English
    BRIEF_MEMBER_DESC = YES
    REPEAT_BRIEF = YES
    ABBREVIATE_BRIEF =
    ALWAYS_DETAILED_SEC = NO
    INLINE_INHERITED_MEMB = NO
    FULL_PATH_NAMES = YES
    STRIP_FROM_PATH =
    STRIP_FROM_INC_PATH =
    SHORT_NAMES = NO
    JAVADOC_AUTOBRIEF = NO
    QT_AUTOBRIEF = NO
    MULTILINE_CPP_IS_BRIEF = NO
    INHERIT_DOCS = YES
    SEPARATE_MEMBER_PAGES = NO
    TAB_SIZE = 4
    ALIASES =
    TCL_SUBST =
    OPTIMIZE_OUTPUT_FOR_C = NO
    OPTIMIZE_OUTPUT_JAVA = YES
    OPTIMIZE_FOR_FORTRAN = NO
    OPTIMIZE_OUTPUT_VHDL = NO
    EXTENSION_MAPPING =
    MARKDOWN_SUPPORT = YES
    AUTOLINK_SUPPORT = YES
    BUILTIN_STL_SUPPORT = NO
    CPP_CLI_SUPPORT = NO
    SIP_SUPPORT = NO
    IDL_PROPERTY_SUPPORT = YES
    DISTRIBUTE_GROUP_DOC = NO
    SUBGROUPING = YES
    INLINE_GROUPED_CLASSES = NO
    INLINE_SIMPLE_STRUCTS = NO
    TYPEDEF_HIDES_STRUCT = NO
    LOOKUP_CACHE_SIZE = 0
    #---------------------------------------------------------------------------
    # Build related configuration options
    #---------------------------------------------------------------------------
    EXTRACT_ALL = YES
    EXTRACT_PRIVATE = YES
    EXTRACT_PACKAGE = YES
    EXTRACT_STATIC = YES
    EXTRACT_LOCAL_CLASSES = YES
    EXTRACT_LOCAL_METHODS = NO
    EXTRACT_ANON_NSPACES = NO
    HIDE_UNDOC_MEMBERS = NO
    HIDE_UNDOC_CLASSES = NO
    HIDE_FRIEND_COMPOUNDS = NO
    HIDE_IN_BODY_DOCS = NO
    INTERNAL_DOCS = NO
    CASE_SENSE_NAMES = YES
    HIDE_SCOPE_NAMES = NO
    SHOW_INCLUDE_FILES = YES
    SHOW_GROUPED_MEMB_INC = NO
    FORCE_LOCAL_INCLUDES = NO
    INLINE_INFO = YES
    SORT_MEMBER_DOCS = YES
    SORT_BRIEF_DOCS = NO
    SORT_MEMBERS_CTORS_1ST = NO
    SORT_GROUP_NAMES = NO
    SORT_BY_SCOPE_NAME = NO
    STRICT_PROTO_MATCHING = NO
    GENERATE_TODOLIST = YES
    GENERATE_TESTLIST = YES
    GENERATE_BUGLIST = YES
    GENERATE_DEPRECATEDLIST= YES
    ENABLED_SECTIONS =
    MAX_INITIALIZER_LINES = 30
    SHOW_USED_FILES = YES
    SHOW_FILES = YES
    FILE_VERSION_FILTER =
    LAYOUT_FILE =
    CITE_BIB_FILES =
    #---------------------------------------------------------------------------
    # Configuration options related to warning and progress messages
    #---------------------------------------------------------------------------
    QUIET = YES
    WARNINGS = YES
    WARN_IF_UNDOCUMENTED = YES
    WARN_IF_DOC_ERROR = YES
    WARN_NO_PARAMDOC = NO
    WARN_FORMAT = "$file:$line: $text"
    WARN_LOGFILE =
    #---------------------------------------------------------------------------
    # Configuration options related to the input files
    #---------------------------------------------------------------------------
    INPUT =
    INPUT_ENCODING = UTF-8
    FILE_PATTERNS =
    RECURSIVE = YES
    EXCLUDE =
    EXCLUDE_SYMLINKS = NO
    EXCLUDE_PATTERNS = */tests/* */test/*
    EXCLUDE_SYMBOLS =
    EXAMPLE_PATH =
    EXAMPLE_PATTERNS =
    EXAMPLE_RECURSIVE = NO
    IMAGE_PATH =
    INPUT_FILTER =
    FILTER_PATTERNS =
    FILTER_SOURCE_FILES = NO
    FILTER_SOURCE_PATTERNS =
    USE_MDFILE_AS_MAINPAGE =
    #---------------------------------------------------------------------------
    # Configuration options related to source browsing
    #---------------------------------------------------------------------------
    SOURCE_BROWSER = NO
    INLINE_SOURCES = NO
    STRIP_CODE_COMMENTS = YES
    REFERENCED_BY_RELATION = NO
    REFERENCES_RELATION = NO
    REFERENCES_LINK_SOURCE = YES
    SOURCE_TOOLTIPS = YES
    USE_HTAGS = NO
    VERBATIM_HEADERS = YES
    #---------------------------------------------------------------------------
    # Configuration options related to the alphabetical class index
    #---------------------------------------------------------------------------
    ALPHABETICAL_INDEX = YES
    COLS_IN_ALPHA_INDEX = 5
    IGNORE_PREFIX =
    #---------------------------------------------------------------------------
    # Configuration options related to the HTML output
    #---------------------------------------------------------------------------
    GENERATE_HTML = NO
    HTML_OUTPUT = html
    HTML_FILE_EXTENSION = .html
    HTML_HEADER =
    HTML_FOOTER =
    HTML_STYLESHEET =
    HTML_EXTRA_STYLESHEET =
    HTML_EXTRA_FILES =
    HTML_COLORSTYLE_HUE = 220
    HTML_COLORSTYLE_SAT = 100
    HTML_COLORSTYLE_GAMMA = 80
    HTML_TIMESTAMP = YES
    HTML_DYNAMIC_SECTIONS = NO
    HTML_INDEX_NUM_ENTRIES = 100
    GENERATE_DOCSET = NO
    DOCSET_FEEDNAME = "Doxygen generated docs"
    DOCSET_BUNDLE_ID = org.doxygen.Project
    DOCSET_PUBLISHER_ID = org.doxygen.Publisher
    DOCSET_PUBLISHER_NAME = Publisher
    GENERATE_HTMLHELP = NO
    CHM_FILE =
    HHC_LOCATION =
    GENERATE_CHI = NO
    CHM_INDEX_ENCODING =
    BINARY_TOC = NO
    TOC_EXPAND = NO
    GENERATE_QHP = NO
    QCH_FILE =
    QHP_NAMESPACE = org.doxygen.Project
    QHP_VIRTUAL_FOLDER = doc
    QHP_CUST_FILTER_NAME =
    QHP_CUST_FILTER_ATTRS =
    QHP_SECT_FILTER_ATTRS =
    QHG_LOCATION =
    GENERATE_ECLIPSEHELP = NO
    ECLIPSE_DOC_ID = org.doxygen.Project
    DISABLE_INDEX = NO
    GENERATE_TREEVIEW = NO
    ENUM_VALUES_PER_LINE = 4
    TREEVIEW_WIDTH = 250
    EXT_LINKS_IN_WINDOW = NO
    FORMULA_FONTSIZE = 10
    FORMULA_TRANSPARENT = YES
    USE_MATHJAX = NO
    MATHJAX_FORMAT = HTML-CSS
    MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest
    MATHJAX_EXTENSIONS =
    MATHJAX_CODEFILE =
    SEARCHENGINE = YES
    SERVER_BASED_SEARCH = NO
    EXTERNAL_SEARCH = NO
    SEARCHENGINE_URL =
    SEARCHDATA_FILE = searchdata.xml
    EXTERNAL_SEARCH_ID =
    EXTRA_SEARCH_MAPPINGS =
    #---------------------------------------------------------------------------
    # Configuration options related to the LaTeX output
    #---------------------------------------------------------------------------
    GENERATE_LATEX = NO
    LATEX_OUTPUT = latex
    LATEX_CMD_NAME = latex
    MAKEINDEX_CMD_NAME = makeindex
    COMPACT_LATEX = NO
    PAPER_TYPE = a4
    EXTRA_PACKAGES =
    LATEX_HEADER =
    LATEX_FOOTER =
    LATEX_EXTRA_FILES =
    PDF_HYPERLINKS = YES
    USE_PDFLATEX = YES
    LATEX_BATCHMODE = NO
    LATEX_HIDE_INDICES = NO
    LATEX_SOURCE_CODE = NO
    LATEX_BIB_STYLE = plain
    #---------------------------------------------------------------------------
    # Configuration options related to the RTF output
    #---------------------------------------------------------------------------
    GENERATE_RTF = NO
    RTF_OUTPUT = rtf
    COMPACT_RTF = NO
    RTF_HYPERLINKS = NO
    RTF_STYLESHEET_FILE =
    RTF_EXTENSIONS_FILE =
    #---------------------------------------------------------------------------
    # Configuration options related to the man page output
    #---------------------------------------------------------------------------
    GENERATE_MAN = NO
    MAN_OUTPUT = man
    MAN_EXTENSION = .3
    MAN_SUBDIR =
    MAN_LINKS = NO
    #---------------------------------------------------------------------------
    # Configuration options related to the XML output
    #---------------------------------------------------------------------------
    GENERATE_XML = YES
    XML_OUTPUT = xml
    XML_PROGRAMLISTING = NO
    #---------------------------------------------------------------------------
    # Configuration options related to the DOCBOOK output
    #---------------------------------------------------------------------------
    GENERATE_DOCBOOK = NO
    DOCBOOK_OUTPUT = docbook
    DOCBOOK_PROGRAMLISTING = NO
    #---------------------------------------------------------------------------
    # Configuration options for the AutoGen Definitions output
    #---------------------------------------------------------------------------
    GENERATE_AUTOGEN_DEF = NO
    #---------------------------------------------------------------------------
    # Configuration options related to the Perl module output
    #---------------------------------------------------------------------------
    GENERATE_PERLMOD = NO
    PERLMOD_LATEX = NO
    PERLMOD_PRETTY = YES
    PERLMOD_MAKEVAR_PREFIX =
    #---------------------------------------------------------------------------
    # Configuration options related to the preprocessor
    #---------------------------------------------------------------------------
    ENABLE_PREPROCESSING = YES
    MACRO_EXPANSION = NO
    EXPAND_ONLY_PREDEF = NO
    SEARCH_INCLUDES = YES
    INCLUDE_PATH =
    INCLUDE_FILE_PATTERNS =
    PREDEFINED =
    EXPAND_AS_DEFINED =
    SKIP_FUNCTION_MACROS = YES
    #---------------------------------------------------------------------------
    # Configuration options related to external references
    #---------------------------------------------------------------------------
    TAGFILES =
    GENERATE_TAGFILE =
    ALLEXTERNALS = NO
    EXTERNAL_GROUPS = YES
    EXTERNAL_PAGES = YES
    PERL_PATH = /usr/bin/perl
    #---------------------------------------------------------------------------
    # Configuration options related to the dot tool
    #---------------------------------------------------------------------------
    CLASS_DIAGRAMS = YES
    MSCGEN_PATH =
    DIA_PATH =
    HIDE_UNDOC_RELATIONS = YES
    HAVE_DOT = NO
    DOT_NUM_THREADS = 0
    DOT_FONTNAME = Helvetica
    DOT_FONTSIZE = 10
    DOT_FONTPATH =
    CLASS_GRAPH = YES
    COLLABORATION_GRAPH = YES
    GROUP_GRAPHS = YES
    UML_LOOK = NO
    UML_LIMIT_NUM_FIELDS = 10
    TEMPLATE_RELATIONS = NO
    INCLUDE_GRAPH = YES
    INCLUDED_BY_GRAPH = YES
    CALL_GRAPH = NO
    CALLER_GRAPH = NO
    GRAPHICAL_HIERARCHY = YES
    DIRECTORY_GRAPH = YES
    DOT_IMAGE_FORMAT = png
    INTERACTIVE_SVG = NO
    DOT_PATH =
    DOTFILE_DIRS =
    MSCFILE_DIRS =
    DIAFILE_DIRS =
    PLANTUML_JAR_PATH =
    DOT_GRAPH_MAX_NODES = 50
    MAX_DOT_GRAPH_DEPTH = 0
    DOT_TRANSPARENT = NO
    DOT_MULTI_TARGETS = NO
    GENERATE_LEGEND = YES
    DOT_CLEANUP = YES

The XML example I attached is just a snippet of all.xml. The source zip can be downloaded from https://github.com/waylan/Python-Markdown/releases/tag/2.6.5-final. I can attach the full all.xml if needed. I constantly monitor the XML output of various open-source packages, and this bug occurred in multiple packages, not only this one. Again, it was introduced after Aug 23, 2015. Hope this helps; tell me if there's anything more you need.
I didn't find all.xml when running doxygen directly; as far as I remember, it is the result of a post-processing step (with XSLT etc.). I searched for this text:

    <definition>title There may be</definition>
    <argsstring>(index.html) that are distributed with Python-Markdown that are not included here in Extra.The features of those extensions are not part of PHP Markdown Extra</argsstring>

and found it in extra_8txt.xml produced by version 1.8.11. This file was not present with version 1.8.10. In version 1.8.10 the txt files were not processed by default; in version 1.8.11 they are. When the txt files are enabled in version 1.8.10, extra_8txt.xml appears too and has the same content (except for the version number) as in version 1.8.11. You might want to disable the *.txt (and *.md) files, or even, in this case, the complete docs directory.
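To check what such a pattern change would drop before rerunning doxygen, the file-selection step can be approximated in a few lines of Python. This is a rough model, not doxygen's code: it assumes FILE_PATTERNS is tested against the file's base name and EXCLUDE_PATTERNS against the full path (which is why the config above writes */tests/*), and it uses case-sensitive `fnmatch` wildcards. `is_input_file` is a hypothetical name.

```python
import os
from fnmatch import fnmatchcase


def is_input_file(path, file_patterns, exclude_patterns=()):
    """Approximate whether doxygen would read `path` during a RECURSIVE scan.

    Assumption: FILE_PATTERNS is matched against the base name,
    EXCLUDE_PATTERNS against the full path.
    """
    name = os.path.basename(path)
    if not any(fnmatchcase(name, pattern) for pattern in file_patterns):
        return False
    return not any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


if __name__ == "__main__":
    doc = "/src/Python-Markdown-2.6.5-final/docs/extensions/extra.txt"
    code = "/src/Python-Markdown-2.6.5-final/markdown/extensions/extra.py"
    # Illustrative subset of a pattern list that includes *.txt and *.md.
    patterns = ["*.py", "*.txt", "*.md"]
    excludes = ["*/tests/*", "*/test/*"]

    print(is_input_file(doc, patterns, excludes))                  # True: extra.txt is parsed
    print(is_input_file(doc, patterns, excludes + ["*/docs/*"]))   # False: docs directory excluded
    print(is_input_file(doc, ["*.py"], excludes))                  # False: *.txt not in the patterns
    print(is_input_file(code, ["*.py"], excludes + ["*/docs/*"]))  # True: code is still read
```

Excluding the docs directory is the narrower fix here, since it leaves other .txt or .md files in the tree untouched.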
Thank you for your comments. Why were txt files enabled after 1.8.10? What was the feature request? I think it's a bad idea that, by default, non-code files get recognized as proper functions in the XML, with argsstrings, params, code body, etc. As a user, I see doxygen's strength in distinguishing actual code from everything else; by default it should only read comments in actual code. Wouldn't it be better if, by default, doxygen parsed only a specific set of extensions known to be code, like .h, .cpp, etc.? Moreover, I think it breaks doxygen, because the list of text file types a user can have is endless, so there's no way to exclude all of them. As the bug title says, a user who runs doxygen on a given repository has a high chance of getting phantom functions. The list of code file extensions, on the other hand, is finite. If the feature request for parsing txt files is important, an easy fix might be to introduce an additional config option listing extra non-code file extensions that doxygen should parse. Hope my comment is helpful; if I can provide any more feedback, please let me know.
Regarding my statement in comment 4 ("In version 1.8.10 the txt files were not processed by default, in the 1.8.11 version they are. When enabling the txt files in the 1.8.10 version, extra_8txt.xml will appear too and has the same content (except for the version number) as in the 1.8.11 version."): I now think it is not true, as I haven't yet found any reason why the file is present now and was not present in the past. It looks like txt files are processed as C files (for historical reasons, still present for compatibility). Mapping txt files to Markdown in EXTENSION_MAPPING might help. In principle, I think txt files are still useful, as they might contain background information (like md and dox files, etc.).
Hi, I don't think it's there for historical reasons; it's new. It did not occur in the Aug 25, 2015 version on GitHub (https://github.com/doxygen/doxygen/tree/663544cc0caf9109ea10c33f38b1e07e7a01a575) with the attached config file, as described in comment 2. I've looked at the git commit messages since then and couldn't find any that directly explains why it was changed. I'm not familiar enough with the Doxygen source code to git blame the specific line that defines which extensions are read by default, but I can do more tests and find the specific commit that changed it.

Think of users running doxygen out of the box on a GitHub repository that contains both source files and non-source files. If Doxygen creates functions/namespaces/classes that simply are not there, they will not give Doxygen a second chance, and that would be a shame. The problem grows with the number of non-source files. I personally don't use the HTML output; will these "phantom" functions/classes/namespaces be created there as well?

Looking at any GitHub repository, I agree that there is useful information in these files, but I think the easiest way to view them is to look directly at the files and directories of the repository; a simple link to the repository gives you all that info. I don't think Doxygen should crawl these files. One question, just to be clear: will the new default read only the .txt and .md extensions, or everything? What about .rst, .doc, .docx, .xls, .xml, .html, etc.?
I did some further research and found pull request 383 (https://github.com/doxygen/doxygen/pull/383), opened on August 16, 2015 and merged on August 31, 2015, with the comment:

    Make list of default extensions consistent with language mapping list
    In the forum a discrepancy was noted between config.l and config.xml (http://doxygen.10944.n7.nabble.com/FILE-PATTERNS-one-custom-plus-all-defaults-td7308.html). The config.xml list was based on the function initDefaultExtensionMapping in util.cpp and the list in the function Config::check in config.l. Now both routines use the same list.

One of the file extensions added by default was 'txt', and by default it is interpreted as C code. In the past it was already possible to specify 'txt' files, and they would be interpreted as C code (the main usage was probably to write some text between C comment markers so the default parser could be used). The 'txt' files mentioned in this bug report are more Markdown-style, so using an EXTENSION_MAPPING would be beneficial. The config.xml file has not been updated, so there is now an inconsistency between the documentation and the implementation. This inconsistency should be removed.
List of supported extensions, grouped by the parser that is used:

    c            .dox .txt .doc .c .C .cc .CC .cxx .cpp .c++ .ii .ixx .ipp .i++ .inl .h .H .hh .HH .hxx .hpp .h++ .mm
    idl          .idl .ddl .odl
    java         .java
    javascript   .as .js
    csharp       .cs
    d            .d
    php          .php .php4 .php5 .inc .phtml
    objective-c  .m .M
    python       .py .pyw
    fortran      .f .for .f90
    vhdl         .vhd .vhdl .ucf .qsf
    tcl          .tcl
    md           .md .markdown
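The effect of this table, and of the EXTENSION_MAPPING override suggested earlier in this thread, can be modelled as a dictionary lookup on the file extension. The Python below is a simplified model under stated assumptions, not doxygen's C++ implementation: it copies only a subset of the table, uses the text after the last dot as the key, and assumes EXTENSION_MAPPING holds whitespace-separated ext=parser pairs that replace the built-in entry. The function names are hypothetical.

```python
import os

# Subset of the built-in table listed above (extension -> parser).
DEFAULT_MAPPING = {
    ".dox": "c", ".txt": "c", ".doc": "c",
    ".c": "c", ".cpp": "c", ".h": "c",
    ".java": "java", ".py": "python",
    ".md": "md", ".markdown": "md",
}


def parse_extension_mapping(value):
    """Turn an EXTENSION_MAPPING value such as 'txt=md inc=Fortran' into overrides."""
    overrides = {}
    for item in value.split():
        ext, sep, parser = item.partition("=")
        if sep and ext and parser:
            overrides["." + ext.lstrip(".")] = parser.lower()
    return overrides


def parser_for(filename, extension_mapping=""):
    """Return the parser name for `filename`, or None if the extension is unknown."""
    table = {**DEFAULT_MAPPING, **parse_extension_mapping(extension_mapping)}
    return table.get(os.path.splitext(filename)[1])


if __name__ == "__main__":
    print(parser_for("docs/extensions/extra.txt"))            # c: source of the phantom functions
    print(parser_for("docs/extensions/extra.txt", "txt=md"))  # md: treated as Markdown
    print(parser_for("markdown/__init__.py", "txt=md"))       # python: unaffected
```

The model makes the root cause visible: once .txt is in the default list, its entry points at the C parser, so prose is tokenized as declarations unless a mapping overrides it.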
Thank you for all your hard work. I can verify that:
1) the said pull request was the cause of this bug, and
2) setting EXTENSION_MAPPING to txt=md solved it.
If I may, I suggest changing the default away from txt=c rather than only updating the documentation. I think that would suit most GitHub repositories better, since they tend to contain many txt files, especially now that doxygen is used for many languages other than C. I also think that for the repository used to report this bug, the new default creates a bug, because classes that are not there are created. I also checked the HTML output: please look at the HTML created for class "so" in the "Classes" tab, "Class List"; that class simply does not exist in the code. Likewise, in the "Classes" tab, "Class Hierarchy", class "so" seems to inherit from classes that do not exist. I think that currently the best default would be txt=md or not parsing txt files at all (which was the default behavior before Aug 31, 2015); both options are programming-language agnostic.
Created attachment 320097: comment10 html screenshot
I'll remove parsing of .txt files by default, as before. Users who need it can then configure doxygen to parse them with the "c" or "markdown" parser, using FILE_PATTERNS and EXTENSION_MAPPING.
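For users who still want .txt files parsed, the opt-in comes down to two config lines. One detail matters: a non-empty FILE_PATTERNS replaces doxygen's built-in pattern list rather than extending it (the manual lists the defaults that apply only when the tag is left blank), so any code patterns still wanted must be repeated. The hypothetical helper below just assembles those lines; the pattern subset is illustrative, and "md" follows the txt=md mapping reported to work earlier in this thread.

```python
# Illustrative subset of code patterns to keep; extend it to match your sources.
CODE_PATTERNS = ["*.c", "*.cpp", "*.h", "*.java", "*.py", "*.md"]


def txt_opt_in_lines(parser="md", code_patterns=CODE_PATTERNS):
    """Hypothetical helper: Doxyfile lines that add *.txt with a chosen parser."""
    # FILE_PATTERNS is not additive to the defaults, so list everything explicitly.
    patterns = list(code_patterns) + ["*.txt"]
    return [
        "FILE_PATTERNS = " + " ".join(patterns),
        "EXTENSION_MAPPING = txt=" + parser,
    ]


if __name__ == "__main__":
    print("\n".join(txt_opt_in_lines()))
```

With the defaults it prints:

    FILE_PATTERNS = *.c *.cpp *.h *.java *.py *.md *.txt
    EXTENSION_MAPPING = txt=md

Passing parser="C" instead would restore the old behaviour of reading .txt files as C.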
Thank you. On the same topic, what about the extensions ".dox" and ".doc"? The same logic as for ".txt" should hold. If I understand correctly from http://doxygen.10944.n7.nabble.com/FILE-PATTERNS-one-custom-plus-all-defaults-td7308.html#a7310, previously ".dox" was parsed by default and ".doc" was not. IMHO, keeping ".dox" for backward-compatibility reasons alone is not a good idea.
This bug was previously marked ASSIGNED, which means it should be fixed in doxygen version 1.8.12. Please verify if this is indeed the case. Reopen the bug if you think it is not fixed and please include any additional information that you think can be relevant (preferably in the form of a self-contained example).