Bug 368382 – Support a pair of <![CDATA[ and ]]> in the description in the source.




Summary:	Support a pair of <![CDATA[ and ]]> in the description in the source.


Status:	RESOLVED FIXED

Product:	gtk-doc
Classification:	Platform
Component:	general
Version:	1.7
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtk-doc maintainers
QA Contact:	gtk-doc maintainers

Reported:	2006-10-31 16:36 UTC by Shiino Yuki
Modified:	2007-01-16 13:43 UTC

Attachments
Support a pair of <![CDATA[ and ]]> in the description in the source file. (7.95 KB, patch) 2006-10-31 16:45 UTC, Shiino Yuki
Patch to support CDATA and |[ ... ]|, and improve <programlisting> handling (18.27 KB, patch) 2007-01-08 17:51 UTC, Damon Chaplin

Description Shiino Yuki 2006-10-31 16:36:15 UTC

I made a patch that enables gtkdoc-mkdb to parse a pair of
<![CDATA[ and ]]> in the description in the source code,
like below:

/**
 * func_name:
 * @par1: ...
 * @par2: ...
 * Your description is here. And a small example code is following.
 *
<informalexample><programlisting><![CDATA[
#include <stdio.h>

int main(void) {
   printf("hello, world\n");
   return 0;
}
]]></programlisting></informalexample>
 */

    This feature is very useful for writing a small example that
shows how a function may be used. Sometimes I would like to
write sample code in a source file rather than in an XML
document file. Sample code is easier to maintain that way.

    The patch can be downloaded from the following URL.

http://shiino.yuki.googlepages.com/gtk-doc_CDATA.patch

This patch applies to the current CVS version (2006/10/31)
and was originally posted to gtk-doc-list.

http://mail.gnome.org/archives/gtk-doc-list/2006-October/msg00002.html

Comment 1 Shiino Yuki 2006-10-31 16:45:03 UTC

Created attachment 75727
Support a pair of <![CDATA[ and ]]> in the description in the source file.

Comment 2 Yeti 2006-12-19 20:05:21 UTC

After using this experimentally I quite like it.  I have not encountered any problems; however, I have only tried --sgml-mode, because that's what everyone (including me) seems to use.

What spoils the beauty of direct code sample inclusion a bit is that once a program listing is put into a CDATA section in a C source code comment, neither

  /&ast; Blah blah &ast;/
  /<!-- -->* Blah blah *<!-- -->/

nor

  /* Blah blah */

can be used for comments in the sample code, because the first two are kept literally and the last ends the comment prematurely, making the C compiler choke.

However, this is more material for further thought than a problem with this patch: if one decides a CDATA section brings more problems than it solves, one simply does not use a CDATA section.

Comment 3 Matthias Clasen 2006-12-19 20:15:59 UTC

I actually think it would be nicer to expand our @/#/% minilanguage a bit,
and allow writing something like

Here is an inline example |foo = 1| in a paragraph. 
And here is a larger example, as a separate program:

|[ 
   foo = 1;
   bar = 2;
   free (baz);
   /* all done :-) */ 
]|

where |bla| would expand to <literal>bla</literal> and
|[bla]| would expand to 
<informalexample><programlisting><![CDATA[
bla
]]></programlisting></informalexample>
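
A minimal sketch of the |[ ... ]| expansion proposed above, written in Python for illustration only. The helper name `expand_example_blocks` and the regular expression are assumptions, not gtk-doc's actual code, and the sketch does not guard against a literal "]]>" inside the example, which would end the CDATA section early.

```python
import re

# Hypothetical pattern: everything between "|[" and the next "]|".
BLOCK_RE = re.compile(r"\|\[(.*?)\]\|", re.DOTALL)

def expand_example_blocks(text: str) -> str:
    """Replace each |[ ... ]| block with the DocBook markup from the proposal."""
    def repl(match: re.Match) -> str:
        # Drop the newlines that directly follow "|[" and precede "]|".
        body = match.group(1).strip("\n")
        return ("<informalexample><programlisting><![CDATA[\n"
                + body
                + "\n]]></programlisting></informalexample>")
    return BLOCK_RE.sub(repl, text)

if __name__ == "__main__":
    print(expand_example_blocks("|[\nfoo = 1;\nbar = 2;\n]|"))
```

Text outside the delimiters passes through unchanged, so the usual @/#/% handling could still be applied to it separately.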

Comment 4 Yeti 2006-12-19 20:34:12 UTC

Of course this would be nice, but it's quite prone to breaking on | and || operators and things like

* @msgid: Message id to translate, containing `|'-separated prefix.

At this moment GLib contains about 3 possible breakages, Gtk+ about a dozen.  Anyway, these abbreviations of <literal> are orthogonal in purpose to CDATA sections.

Two-character delimiters like |[ ... ]| for CDATA would be nice again, although I don't quite get how you make the C compiler accept nested comments such as

/**
 * |[ 
 *   foo = 1;
 *   bar = 2;
 *   free (baz);
 *  /* all done :-) */ 
 * ]|
 **/

Or do you intend this only for expand-content files?  Is it worth doing then?
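
To make the risk with single `|` delimiters concrete, here is a small Python sketch of a naive |bla| to <literal> rule. The function `expand_inline` and its pattern are hypothetical and only illustrate the kind of breakage described in this comment; they are not taken from any gtk-doc patch.

```python
import re

# Naive rule: any text between two '|' characters on one line is a literal.
INLINE_RE = re.compile(r"\|([^|\n]+)\|")

def expand_inline(text: str) -> str:
    """Wrap |...| spans in <literal> tags, with no context checks at all."""
    return INLINE_RE.sub(r"<literal>\1</literal>", text)

if __name__ == "__main__":
    # Intended use.
    print(expand_inline("Here is an inline example |foo = 1| in a paragraph."))
    # Unintended match: two unrelated bitwise-or operators in prose.
    print(expand_inline("Returns a | b if c | d."))
```

The second call pairs two unrelated `|` characters and wraps the text between them, which is the sort of accidental match that existing GLib and Gtk+ comments could trigger.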

Comment 5 Matthias Clasen 2006-12-19 21:02:25 UTC

Hmm, yea. Those are some complications. Not sure how to overcome them atm.

Comment 6 Damon Chaplin 2007-01-04 13:39:10 UTC

I think the CDATA approach has 2 major disadvantages - you can't use comments, and
it makes the source code comment block look ugly.

It looks like the main (only?) use for it is with <programlisting>, so maybe we should just do some special handling there, i.e. not add <para> and </para> tags and not do the special character handling for '@' '()' etc.

Would that solve all of the problems?

Comment 7 Yeti 2007-01-04 13:55:53 UTC

It would do for me (probably), but it does not address the situation where one *wants* to linkify some code.  These two things are orthogonal; if you infer one from the other, prepare for future feature requests from people who want it the other way.  

Therefore I prefer some kind of explicit disable-mangling-inside-this markup, although I agree CDATA is not the best approach.

Comment 8 Damon Chaplin 2007-01-04 15:16:27 UTC

The "function()" and "macro()" links don't work if there are any arguments, so aren't that useful in example code.

The "%CONSTANT" one can be confused with the C '%' operator, so can't be used.

But we could still support "#symbol" links, for all symbols (even functions, I think). We'd just need to check for preprocessor directives like "#include", "#define", "#if" etc. (We should probably do that anyway.)


I think that would be a reasonable default - i.e. we don't change anything
in <programlisting> except '#symbol' links (and we replace '<', '>' and '&' with entities if needed).
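
A rough Python sketch of the "#symbol" rule described above: link every "#name" except names that look like preprocessor directives. The directive list, the function name `link_symbols` and the simplified <link> output are all assumptions for illustration; the sketch also makes no attempt to recognise '#' inside string literals or macro definitions.

```python
import re

# Assumed (possibly incomplete) set of C preprocessor directive names.
CPP_DIRECTIVES = {
    "include", "define", "undef", "if", "ifdef", "ifndef",
    "else", "elif", "endif", "error", "pragma", "line",
}

SYMBOL_RE = re.compile(r"#([A-Za-z_]\w*)")

def link_symbols(text: str) -> str:
    """Turn #symbol into a simplified link unless it is a directive."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in CPP_DIRECTIVES:
            return match.group(0)  # leave "#include" etc. untouched
        return f'<link linkend="{name}">{name}</link>'
    return SYMBOL_RE.sub(repl, text)

if __name__ == "__main__":
    print(link_symbols("#include <glib.h>"))
    print(link_symbols("Returns a new #GtkWidget."))
```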

Comment 9 Yeti 2007-01-04 20:03:57 UTC

(In reply to comment #8)
> But we could still support "#symbol" links, for all symbols (even functions, I
> think). We'd just need to check for preprocessor directives like "#include",
> "#define", "#if" etc. (We should probably do that anyway.)

# can appear in string literals, stringification or symbol concatenation.

> The "function()" and "macro()" links don't work if there are any arguments,
> so aren't that useful in example code.
> ...
> I think that would be a reasonable default - i.e. we don't change anything
> in <programlisting> except '#symbol' links (and we replace '<', '>' and '&'
> with entities if needed).

This has started to deviate considerably from the original goal of having a means to disable linkification in a chunk of text -- which is still the thing I'd primarily like to see: with clear and simple rules and not tied to some random DocBook element.  I'd rather avoid getting into a situation where the description of what it does looks like the documentation of the syntax of an unnamed language (which you should be familiar with):

[cite]
Because the outcome may be determined by voting based on heuristic estimators,
the result is not strictly predictable.
[/cite]

Well, the really original reasoning was different:

CDATA cannot contain markup
=> linkification of CDATA contents is always wrong
=> CDATA is rendered unusable
=> don't touch CDATA contents to make it usable

This concerns only the usability of CDATA constructs, their use for code samples is just a possible application.

As it turned out, CDATA can clash with C comment parsing because no escaping whatsoever is possible in its content, so this thread started to consider other possibilities for preserving content literally while leaving some method of escaping.  Now you suggest special processing of some special DocBook items.  This is too narrow; it would be much better solved by some construct like Matthias' |[...]| that would preserve the content literally *and* make it an <informalexample><programlisting> block.  This would be a better solution even if you cared only about <programlisting> blocks and ignored other cases where one could want to preserve a chunk of text literally.  In addition, it would be compatible with the current processing of <programlisting>s (of course, the addition of new markup also breaks compatibility, but we are free to choose something that is very unlikely to occur).
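
The "don't touch CDATA contents" rule from the reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `transform_outside_cdata` is a hypothetical helper, and `transform` stands in for whatever linkification or escaping the tool would normally apply.

```python
import re
from typing import Callable

# The capture group makes re.split() keep the CDATA sections in the result.
CDATA_RE = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)

def transform_outside_cdata(text: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` to everything except CDATA sections."""
    parts = CDATA_RE.split(text)
    # With one capture group, odd-indexed parts are the CDATA sections.
    return "".join(
        part if index % 2 else transform(part)
        for index, part in enumerate(parts)
    )

if __name__ == "__main__":
    sample = "see @par1 <![CDATA[printf(\"%d @x\");]]> and @par2"
    print(transform_outside_cdata(sample, str.upper))
```

Here `str.upper` is just a visible stand-in for the real mangling: the text around the CDATA section changes, while the section itself is copied through byte for byte.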

Comment 10 Matthias Clasen 2007-01-04 21:07:24 UTC

One option might be to use RCDATA instead of CDATA and use character entities to 
mask the C comment delimiters. But those fine distinctions probably got thrown out with the bathwater when XML ate its father...

Comment 11 Damon Chaplin 2007-01-05 01:18:02 UTC

Matthias: I don't think XML supports RCDATA. I can't see it in the spec.

Yeti: The original goal was to support example code in the docs. The top comment doesn't mention cross-reference links at all.


We need to fix <programlisting> anyway, to stop doing the <para></para> insertion inside it, and to turn off the '()' and '%' links which don't really work inside it.

Then it should be usable for example code (comments are still a problem, but we can't help with that).

Is there really any point in worrying about supporting CDATA or something like |[...]| after we've fixed <programlisting>? I doubt anyone needs much else, and
they can always use entities anyway.

I don't really want to spend much more time on this either.

Comment 12 Matthias Clasen 2007-01-05 03:48:33 UTC

Treating programlisting in that way ought to be good enough for 99% of the cases
that I am currently working around by using (<!-- -->) for function calls or by
carefully putting ' * ' at the beginning of empty lines to prevent <para> splitting.

I do think that |[ ]| or similar would be a bit nicer on the eyes in inline docs
(basically any literal XML in C source files hurts a bit), but that's just cosmetics. 

While we are at it, can we also make & work without escaping in programlistings ?

Comment 13 Yeti 2007-01-05 19:21:04 UTC

(In reply to comment #11)
> The original goal was to support example code in the docs. The top
> comment doesn't mention cross-reference links at all.

On the other hand, the original patch allowed CDATA practically everywhere, including places I can hardly imagine people putting <programlisting>s, that is, as a generic mechanism.  Why would someone put work into it?

> Is there really any point in worrying about supporting CDATA or something like
> |[...]| after we've fixed <programlisting>? I doubt anyone needs much else, and
> they can always use entities anyway.

Should I file a separate feature request for a generic mechanism then?  What if I want to write literally the content of <screen> or <synopsis> -- or even inline elements like <computeroutput>, <userinput> and <literal>?  What if DocBook invents new elements?

In addition, are you sure you are not in fact breaking <programlisting>, that is, will everyone's current linkification and special-character workarounds continue to work?

Why can't it be something simple and clear instead of special-casing special cases?

Comment 14 Yeti 2007-01-05 19:34:18 UTC

(In reply to comment #12)
> While we are at it, can we also make & work without escaping in programlistings
> ?

You can choose: Either you have a mechanism to put characters that would clash with the container language (C) there, and then you need an escaping mechanism.  Or it takes everything literally (like CDATA) and then you cannot put anything disallowed in.

The mechanism does not have to be SGML entities (i.e. &), although they are kind of natural here.  The first crucial question is how you want to write text that is not possible to write literally, such as

  /* blah */

And the second crucial question is if, and how badly, the proposed solution will mangle existing documents.  I can't answer these because my view of the whole issue is obviously incompatible.

Comment 15 Yeti 2007-01-05 19:45:26 UTC

(In reply to comment #14)
> ...mechanism to put characters that would clash with the container language... 

Or sequences of characters.  This does not change much except you can also consider the possibility to invent some complex notation for these sequences (that would allow to write its own escapes somehow too) instead of single characters.

A side note:

  int *p, amp;

  p = &amp;

is valid and meaningful C code.

Comment 16 Damon Chaplin 2007-01-08 17:51:09 UTC

Created attachment 79768
Patch to support CDATA and |[ ... ]|, and improve <programlisting> handling

Here's a patch to:
 a) Support CDATA in source code comments. (We don't touch anything in CDATA.)
 b) Support |[ ... ]| to include example code. (Just gets converted to
    <informalexample><programlisting>)
 c) Improve <programlisting> handling, by not inserting <para></para> and
    not expanding "()", "@" or "%". (It still expands '#' so people can use
    cross-references to symbols if they want to.)
 d) Improve '#' links a bit by skipping symbols that look like C preprocessor
    directives (e.g. "#include").
 e) Improve '&' handling a bit by converting to '&amp;' if it doesn't
    already look like the start of an entity reference.
 f) Add a special case for "#include <xxxx>" in <programlisting> so the "<"
    and ">" get replaced by entities.

I've tested it a bit but it needs more testing before committing. I've checked
the GTK+ output and it doesn't break anything there.
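
As an illustration of item e), here is a minimal Python sketch of one way to escape a bare '&' while leaving existing entity references alone. It is not the code from the attached patch: the pattern, the function name `escape_bare_ampersands` and the decision to require a terminating ';' are assumptions.

```python
import re

# '&' NOT followed by something that looks like a complete entity reference:
# a name (&amp;), a decimal (&#38;) or a hex (&#x26;) character reference.
BARE_AMP_RE = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(text: str) -> str:
    """Convert '&' to '&amp;' unless it already starts an entity reference."""
    return BARE_AMP_RE.sub("&amp;", text)

if __name__ == "__main__":
    print(escape_bare_ampersands("if (a && b) x = &y;"))
    print(escape_bare_ampersands("a &lt; b"))
    # The ambiguous case from the side note in comment 15 is left as is.
    print(escape_bare_ampersands("p = &amp;"))
```

Any heuristic of this kind treats "p = &amp;" as an entity reference, so C code that takes the address of a variable named amp would still need explicit escaping.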

Comment 17 Yeti 2007-01-10 13:36:29 UTC

(In reply to comment #16)
> it needs more testing

It works for me in the sense that it does not break existing documentation; I have not tried the new stuff (yet).

Comment 18 Damon Chaplin 2007-01-16 13:43:18 UTC

I tested it on glib as well and it didn't break that so I've committed it.

Reopen if it causes problems.