Bug 472660 – Style information for GtkTextIter

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 472660 - Style information for GtkTextIter


Summary:	Style information for GtkTextIter


Status:	RESOLVED FIXED

Product:	gtksourceview
Classification:	Platform
Component:	General
Version:	1.90.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	GTK Sourceview maintainers
QA Contact:	GTK Sourceview maintainers

URL:
Whiteboard:

Depends on:
Blocks:	566982 586457 587619

Reported:	2007-09-01 21:10 UTC by Mathias Hasselmann (IRC: tbf)
Modified:	2010-01-09 18:54 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
A potential solution for the problem. (2.85 KB, patch) 2007-09-01 21:16 UTC, Mathias Hasselmann (IRC: tbf)	none	Details \| Review
Alterative solution (12.01 KB, patch) 2007-09-03 10:38 UTC, Mathias Hasselmann (IRC: tbf)	none	Details \| Review
Custom attributes per context (34.71 KB, patch) 2009-11-29 21:12 UTC, jessevdk@gmail.com	none	Details \| Review
Implemented classes support (34.56 KB, patch) 2009-12-06 16:01 UTC, jessevdk@gmail.com	none	Details \| Review
Revised API names and added forward/backward context class toggle (44.26 KB, patch) 2009-12-23 22:15 UTC, jessevdk@gmail.com	none	Details \| Review
Revision 2, code cleanup and small API change (45.37 KB, patch) 2009-12-31 12:53 UTC, jessevdk@gmail.com	accepted-commit_now	Details \| Review

Description Mathias Hasselmann (IRC: tbf) 2007-09-01 21:10:21 UTC

Currently gtksourceview doesn't have any API to figure out which source style was used to apply tags to a given source buffer position. A possible method to publish this information would look like this:

 - name GtkTextTags with the style-id used to create them
 - provide some API like gtk_source_language_get_parent_style for figuring out which category (defined by parent/"map_to" styles) a language style belongs to:

    const char* 
    gtk_source_language_get_parent_style (GtkSourceLanguage *language,
                                          const char        *style_id);

By using this changes the following code could be used to identify some region:

        var buffer = (SourceBuffer)view.buffer;
        var position = buffer.cursor_position;

        TextIter iter;
        buffer.get_iter_at_offset (out iter, position);
        weak SList<weak TextTag> tags = iter.get_tags ();
        stdout.printf ("tags at %d:\n", position);

        foreach (TextTag t in tags) {
            var style = buffer.language.get_parent_style (t.name);

            while (!style.has_prefix ("def:")) {
                var parent = buffer.language.get_parent_style (style);

                if (null == parent)
                    break;

                style = parent;
            }

            stdout.printf (" - %p, type=%s, name=%s, style=%s\n", 
                           t, t.get_type ().name (), t.name, style);
        }

        tags.free ();

A function to identify comments cold look this:

    bool is_comment (TextTag tag) {
        var language = ((SourceBuffer) tag.buffer).language;
        var style = language.get_parent_style (tag.name);

        while (null != style) {
            if (style == "def:comment")
                return true;

            style = language.get_parent_style (style);
        }

        return false;
    }

Comment 1 Mathias Hasselmann (IRC: tbf) 2007-09-01 21:16:02 UTC

Created attachment 94781 [details] [review]
A potential solution for the problem.

Maybe quarks should be used?

Comment 2 Paolo Borelli 2007-09-03 08:26:28 UTC

Cut&paste of my comments on irc

<tbf> hey pbor
<pbor> hi tbf 
<tbf> pbor: did you have a chance to look at my patch for getting style ids for a GtkTextTag? (http://bugzilla.gnome.org/show_bug.cgi?id=472660)
<pbor> yes, I had a quick look
<pbor> as I told you the other day I really don't think this is the path to follow
<pbor> you don't want to know if a word is colored like a comment
<tbf> pbor: why: its trivial code given the information needed
<pbor> you want to know if a word *is* a comment
<tbf> hmm.... ok. that difference. duh.
<pbor> you want to expose the <context> not the <style>
<pbor> or in your use case you want to expose a <context> property
<pbor> that is is-spell-checkable
<pbor> at least that's what I think
<pbor> I want to discuss it with paolo and muntyan though
<tbf> well, def:comment is an easy heuristic
<pbor> As I told you my idea would be to extend the lang syntax and allow an spell-check="true" attribute to contexts

Comment 3 Mathias Hasselmann (IRC: tbf) 2007-09-03 10:38:10 UTC

Created attachment 94854 [details] [review]
Alterative solution

Attached an alternative solution for the problem:

2007-09-03  Mathias Hasselmann  <mathias.hasselmann@gmx.de>

	Implement gtk_source_iter_is_verbatim_text which help to figure out if
	a text iter covers verbatim text which is appliable for instance for
	spell checking. Currently this information is retrieved by traversing
	the entire tree Context nodes. Maybe an index should be used here.
	Fixes bug #472660. 

	TODO: Update language files to use the "verbatim" attribute.

	* gtksourceview/gtksourcebuffer.c (gtk_source_iter_is_verbatim_text):
	  new function to figure out if a text iter covers verbatim text
	* gtksourceview/gtksourcecontextengine.c (get_context_for_tag),
	  (gtk_source_context_engine_is_verbatim_text),
	  (_gtk_source_context_engine_class_init): implement virtual
	  is_verbatim_text method
	* gtksourceview/gtksourcecontextengine.h (GtkSourceContextFlags): 
	  add GTK_SOURCE_CONTEXT_VERBATIM flag to mark verbatim regions
	* gtksourceview/gtksourceengine.c
	  (_gtk_source_engine_is_verbatim_text): new function
	* gtksourceview/gtksourceengine.h (_GtkSourceEngineClass):
	  add virtual is_verbatim_text method
	* gtksourceview/gtksourceiter.c: add comment about where to find
	  gtk_source_iter_is_verbatim_text.
	* gtksourceview/gtksourceiter.h: declare new function
	  gtk_source_iter_is_verbatim_text
	* gtksourceview/gtksourcelanguage-parser-2.c (get_context_flags):
	  recognize "verbatim" attribute
	* gtksourceview/language-specs/language2.rng (context-simple),
	  (context-container): allow boolean "verbatim" attribute
	* gtksourceview/language-specs/c.lang: add "verbatim" attributes
	* tests/test-widget.c (update_cursor_position): 
	  check result of gtk_source_iter_is_verbatim_text

Comment 4 Paolo Maggi 2007-09-03 16:30:53 UTC

I'm not a native english speaker, but "verbatim" is not very evocative to me, so it is not clear at first sight what this function should be used for.

To better understand if this is the right approach, I'd like to see how it work for languages like "XML", "HTML", "Shell scripts", "Docbook" and "Latex", i.e. languages that have different characteristics.

BTW, we need a way to export other "meta information" about contexts, for example to implement "code folding".

Comment 5 Yevgen Muntyan 2007-09-03 17:08:22 UTC

(In reply to comment #4)
> I'm not a native english speaker, but "verbatim" is not very evocative to me,
> so it is not clear at first sight what this function should be used for.
> 
> To better understand if this is the right approach, I'd like to see how it work
> for languages like "XML", "HTML", "Shell scripts", "Docbook" and "Latex", i.e.
> languages that have different characteristics.

Absolutely.

> BTW, we need a way to export other "meta information" about contexts, for
> example to implement "code folding".

Folding seems different from this verbatim thing. But indeed perhaps we will want something else? Some folks could want some "docstring" context which should be formatted according to some rules or something. Like spell-checking.

Proposal: add an attribute to contexts, like this: <context attributes="verbatim">. For each value (like "verbatim") we create a tag, and mark with that tag all the text which falls into those contexts.
Mathias proposes to let user provide the tag or 

GtkTextTag* gtk_source_buffer_create_attribute_tag (GtkSourceBuffer * buffer, const gchar *attribute_name);

which does look nice. 

I'd call this particular attribute "spell-check", by the way. Why invent some names to hide the fact that it's needed for spell-checking?

Comment 6 thelema 2007-09-03 17:31:42 UTC

I wonder if per-context metadata could provide the backbone for smart-indenting.  It'll take a lot more contexts and more extensive parsing of languages, but maybe something like 
<context id="braces" indent="t">
  <start>{</start>
  <end>}</end>
</context> 
could guide gsv to know that lines that start within this context (maybe with exception of final line if it only contains }) should be indented by one "t"ab.  We might also have indent="-t" to go back a tab, "2" for two spaces of indent, etc.

Just an idea.

Comment 7 Johannes Schmid 2008-02-05 19:20:37 UTC

I really like the idea to be able to receive information from the GtkTextTag. For anjuta we would like to have a bit more fine-tuning eg.

typedef enum GtkSourceCodeType
{
  GTK_SOURCE_CODE_TYPE_STRING,
  GTK_SOURCE_CODE_TYPE_COMMENT,
  GTK_SOURCE_CODE_TYPE_KEYWORD,
  GTK_SOURCE_CODE_TYPE_OTHER
  /* etc. */
}

plus

GTK_SOURCE_CODE_IS_VERBATIM ((o)) ((o) == GTK_SOURCE_CODE_TYPE_STRING \
  || (o) == GTK_SOURCE_CODE_TYPE_COMMENT)

Something like that. Of course we could add more types but we should have it only for the most common things.

Comment 8 Johannes Schmid 2008-05-19 09:48:32 UTC

Again, time to ping someone on this!

After thinking about it again, I guess it would be enough to just have "verbatim" (or some better fitting name). It would just separate code from any non-code stuff which is important for autocompletion and other things.

Spell checking should probably be yet another attribute.

Comment 9 Johannes Schmid 2009-02-06 16:27:12 UTC

Any news on this? Should I write a patch that support custom-attributes? Like "spell-checking", "comment", "string"?

Comment 10 Johannes Schmid 2009-04-17 14:59:43 UTC

Any news on this?

Comment 11 jessevdk@gmail.com 2009-11-29 21:12:37 UTC

Created attachment 148703 [details] [review]
Custom attributes per context

I think we should try to get this feature in the current cycle. The patch implements assigning any kind of custom attribute on a per-context basis. At the moment, you do this by adding <attribute name=""/> elements in the context. Sub-contexts can suppress attributes using <attribute name="" enabled="false"/>.

The code and the way to specify attributes is subject to discussion and change, but I think at least the underlying way of dealing with meaningful attributes on contexts is quite solid.

The patch modifies test-widget and the C language to include some attributes and display them at the bottom of the window while you navigate the cursor. Comments are welcome.

Comment 12 Johannes Schmid 2009-12-02 00:03:08 UTC

This patch really looks promising. I am not quite sure if we need also these attributes mentioned in the lang file - maybe we could agree on a smaller subset. But it's not task of the mechanism to decide that. Maybe some applications want to use more attributes.

I am not sure though if it is useful to have the has_attribute() method as it's more or less get_attributes() != NULL. Would we have a performance problem otherwise?

Comment 13 jessevdk@gmail.com 2009-12-05 12:41:26 UTC

1) I think we should have some documented set of meaningful attributes and apply them to the lang files. Applications could decide what to do with them. Right now it's not very trivial for applications to add other attributes directly. If we can converge on the right set of attributes to apply by default, I think it's not needed for now. Otherwise, one idea might be to have ways to inject xml when a language is loaded, maybe using xslt, if the application chooses to do so. This way, applications could extend the lang file.

2) I think there are three reasons at the moment to keep has_attribute:
2.1) It's a bit more efficient (not that much though)
2.2) Right now, get_attributes returns a list of strings which has to be freed using g_list_foreach (list, g_free, NULL); g_list_free (list);, so not very convenient for just checking an attributes existence
2.3) API wise it's nice, code is easy to read, and conforms other API like gio for attributes

That said, the API, and the way attributes are specified in XML is open for discussion, so anyone who has anything to comment/discuss, please do so.

Comment 14 jessevdk@gmail.com 2009-12-06 16:01:34 UTC

Created attachment 149201 [details] [review]
Implemented classes support

A small revision after some discussions. This basicly renames 'attribute' to 'class' (it describes better the purpose of the functionality), and defines them differently in the XML specification:

<context class="class1 class2" class-disabled="class3"/>

Comment 15 Paolo Borelli 2009-12-06 16:24:58 UTC

Maybe rename the getter from gtk_source_buffer_get_classes_at_iter to gtk_source_buffer_get_context_classes_at_iter for clarity given that "class" is till a generic word? (but better than attribute ;)

Apart from that I am fine with the patch... if someone else has different opinions about the terminology we used, now it would be a good time to say it :)

Comment 16 José Aliste 2009-12-10 23:02:37 UTC

Hi, some initials comments about the patch.

1. I think the way of extending the lang schema is good!

2. gtk_source_buffer_iter_has_class and gtk_source_buffer_get_class_at_iter will be better placed in gtksourceiter.c IMO.

3. I think the implementation of  gtk_source_buffer_has_class could be improved,
  e.g.  by doing a gtk_text_tag_table_lookup on the class name, and then comparing the list of tags with the tag obtained by the lookup. I imagine that this would be faster than what you are currently doing (we would need a pointer to the GtkTextTagTable structure though, and I really don't know if it would be much faster... but anyway).

4. For code folding this patch is not enough as I understand it. However, there is not so much to add to it!!!! I think once this lands and my patches for manual code folding also land, we can easily implement syntax code folding. Here is what will be need to be done (from my understanding of your code, please correct me when appropriate), explained in an example.  

Suppose you are marking the <context> that matches /* and */ for comments in C and you add the class fold to it. Then, in some point in your patch you are doing 
apply_classes (ce, classes, start_offset, end_offset); 
which is going to do 
gtk_text_buffer_apply_tag with:
 - the tag associated to fold class,
 - the start iter at the match of the /*,
 - the  end iter at the end of */.

So to make folds work, we just need to create a sourcefold with these two iters, which will convert hem into marks (a fold is a struct having a start mark, a stop mark, a type (whether is was generated by the highlighting engine, manually or other...)).
Please note, that the tag is not needed for folding (and you can't have the tag being assigned twice...) 

So in resume, this patch looks really great.

Another thing that is missing here (and not strictly related) is to add the  attribute fold-indent to the <language> tag, so we know when to fold by indentation.

Comment 17 jessevdk@gmail.com 2009-12-13 22:05:20 UTC

2) GtkSourceIter is actually quite bad, and a bit of a left over from the old days. There isn't even a thing called GtkSourceIter, and as such it's quite difficult for bindings to handle correctly. Therefore, we decided to put it in GtkSourceBuffer

3) Just doing the lookup will not do anything, the tag table just contains all the tags that _may_ be applied at some part of the document. What would be better is to use the hash table name -> tag, lookup the tag, and then use gtk_text_iter_has_tag. I think that should be the fastest way. That said, the function as it is now will not be too inefficient since there will only be few tags on an iter.

4) I don't have in depth knowledge about the implementation, but I think we could have two classes defined 'start-fold' and 'end-fold', which are applied to the relevant contexts. You can also apply classes to sub contexts, so I think it should be doable. Then we would maybe need something similar to gtk_text_iter_forward_to_tag_toggle so that you can forward/backward to classes.

Haven't thought too much about it, but maybe this could be used to detect fold regions.

Comment 18 jessevdk@gmail.com 2009-12-23 22:15:52 UTC

Created attachment 150326 [details] [review]
Revised API names and added forward/backward context class toggle

This latest patch makes some improvements to the API names and adds API to iter forward/backward toggle to a specific context class. This API is similar to gtk_text_iter_forward/backward_to_tag_toggle. There seems to be a small issue with the context classes in general, which is that the context engine is highlighting (and thus annotating context classes) lazily. Which means you cannot get context class information from any text that has not been highlighted yet. Currently, this means that you have to have scrolled it in to view once.

This kind of limits the usefulness of the context classes to things that you do on things that you see in the view (there are still some applications). I'm not sure if there is a good solution for this right now.

Comment 19 Paolo Borelli 2009-12-23 23:40:53 UTC

Review of attachment 150326 [details] [review]:

quick couple of comments I spotted, so I do not forget them

::: gtksourceview/gtksourcebuffer.c
@@ +1939,3 @@
+ * @context_class: class to search for
+ *
+ * Check if the class @klass is set on @iter.

@context_class

@@ +1979,3 @@
+ * Since: 2.10
+ **/
+GSList *

what about returning an array instead of a list

Comment 20 Johannes Schmid 2009-12-23 23:58:08 UTC

hmm, only having visible text is some kind of limitation of course but could still be ok because the visible area is normally also the most interesting in regard of user interaction.

Comment 21 jessevdk@gmail.com 2009-12-24 11:58:23 UTC

(In reply to comment #19) 
> what about returning an array instead of a list

I myself am not a big fan of arrays over lists, but sure, I can change that.

(In reply to comment #20)
> hmm, only having visible text is some kind of limitation of course but could
> still be ok because the visible area is normally also the most interesting in
> regard of user interaction.

It depends, some useful things like folding, or navigating the document (jumping over comments, strings, scopes) you could do with these classes, but the mentioned limitation makes this not really doable. We discussed it a bit and I think we will first go ahead with the patch, and then try to look into how we can solve this internally.

Comment 22 Paolo Borelli 2009-12-24 17:53:55 UTC

Review of attachment 150326 [details] [review]:

I am commenting just on the contextengine part, the rest seems straightforward.

In general I'd like to see the term context_class (as opposed to simply class) also in the internals, since it can be confusing.

More comments inline

::: gtksourceview/gtksourcecontextengine.c
@@ +545,3 @@
 /* TAGS AND STUFF -------------------------------------------------------------- */
 
+static ClassDefinition *

is it worth having this helper struct? seems a lot of code for little gain, you could use text tag directly and set data on it for enable/disable
You are already using set_data to store the name, no?

@@ +570,3 @@
+	ClassTag *attrtag = g_slice_new (ClassTag);
+
+	attrtag->tag = tag;

if we keep the helper struct, shouldn't we ref/unref the tag?

@@ +965,3 @@
+	                               segment->context);
+
+	apply_classes (ce, classes, start_offset, end_offset);

nitpicking: it looks a bit confusing that you first get the tag and then the classes, but you apply the classes before the if (tag) branch... also, may be worth checking if classes is not null? (I know the function works with null but seems to do wasted workI

Comment 23 jessevdk@gmail.com 2009-12-31 12:51:12 UTC

Review of attachment 150326 [details] [review]:

Agree with everything, except the helper struct (see inline comments). New patch follows.

::: gtksourceview/gtksourcebuffer.c
@@ +1939,3 @@
+ * gtk_source_buffer_iter_has_context_class:
+/**
+

Fixed

@@ +1979,3 @@
+ * Since: 2.10
+ **/
+GSList *

Ok, done

::: gtksourceview/gtksourcecontextengine.c
@@ +546,2 @@
 /* TAGS AND STUFF -------------------------------------------------------------- */
+static ClassDefinition *

I think the definition is nice because it's not tied to the buffer. The tags are then created on demand. Also 'disable' is not a tag, but it means 'remove the corresponding tag', so setting enable/disable on a tag does not really make sense.

@@ +570,3 @@
+	ClassTag *attrtag = g_slice_new (ClassTag);
+
+	attrtag->tag = tag;

I believe these are freed when the context is freed (which is specific to the buffer). As far as I can see the same happens for the highlighting tags (they are not reffed).

@@ +965,3 @@
+
+	                               segment->context);
+	classes = get_context_classes (ce,

Indeed, fixed both

Comment 24 jessevdk@gmail.com 2009-12-31 12:53:55 UTC

Created attachment 150610 [details] [review]
Revision 2, code cleanup and small API change

Revision of previous patch after review

Comment 25 Paolo Borelli 2009-12-31 13:29:37 UTC

Review of attachment 150610 [details] [review]:

A couple of really nitpickish things pointed out inline, but apart from that this can go in for me.

If it's not too complicated it would be cool to see it rebased in three separate commits before pushing, but if it is a pain we can live without that:

 - make engine private
 - actual patch
 - lang file changes

::: gtksourceview/gtksourcebuffer.c
@@ +81,3 @@
 #define MAX_CHARS_BEFORE_FINDING_A_MATCH    10000
 
+#define TAG_CLASS_NAME "GtkSourceViewTagClassName"

maybe "context" here too?

@@ +2031,3 @@
+		if (name != NULL)
+		{
+			ret = g_realloc (ret, sizeof (gchar *) * ++num);

well, this works too, but when I suggested the array I was thinking of using GPtrArray internally, could be slightly cleaner

::: gtksourceview/gtksourcecontextengine.c
@@ +117,3 @@
 #define ENGINE_STYLES_MAP(ce) ((ce)->priv->ctx_data->lang->priv->styles)
 
+#define TAG_CLASS_NAME "GtkSourceViewTagClassName"

maybe context here too

Comment 26 Paolo Borelli 2009-12-31 15:49:57 UTC

I pushed jesse's branch on behalf of Jesse who had to run. Infrastructure is there and I am closing this bug. We will probably need to add furter classes as the need turns up, right now we have the following context classes "comment", "string" and "no-spell-check" (we decided to go for a negation on the spell check class so that the default behavior is the same as it is now)

It would be really great if all the interested parties start experimenting with this feature asap so that we still have time to tweak it if needed.

Happy 2010 everyone!

Comment 27 Johannes Schmid 2010-01-09 18:54:14 UTC

Would be great if someone could implement support for the new information in GtkSpell as this could fix bug #587619!

Anyway, thanks a lot, I will help to fix the other two long standing issues in anjuta depending on it. Actually, I already fixed them locally...