After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 546060 - gtk_source_language_manager_get_language_for_file()
gtk_source_language_manager_get_language_for_file()
Status: RESOLVED FIXED
Product: gtksourceview
Classification: Platform
Component: General
git master
Other Mac OS
: Normal normal
: ---
Assigned To: GTK Sourceview maintainers
GTK Sourceview maintainers
: 516758 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2008-08-03 04:29 UTC by Yevgen Muntyan
Modified: 2008-08-06 00:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (9.70 KB, patch)
2008-08-03 04:30 UTC, Yevgen Muntyan
reviewed Details | Review
patch (14.10 KB, patch)
2008-08-04 03:13 UTC, Yevgen Muntyan
none Details | Review
final patch (22.43 KB, patch)
2008-08-06 00:58 UTC, Yevgen Muntyan
none Details | Review

Description Yevgen Muntyan 2008-08-03 04:29:53 UTC
See the patch.
Comment 1 Yevgen Muntyan 2008-08-03 04:30:32 UTC
Created attachment 115761 [details] [review]
patch
Comment 2 Yevgen Muntyan 2008-08-03 05:50:49 UTC
"Since:" is missing.
Comment 3 Yevgen Muntyan 2008-08-03 07:47:54 UTC
Also, need to remove all gnomevfs traces, since it was used for mime type detection.
Comment 4 Yevgen Muntyan 2008-08-03 07:48:03 UTC
*** Bug 516758 has been marked as a duplicate of this bug. ***
Comment 5 Paolo Borelli 2008-08-03 08:41:46 UTC
As discussed on irc, I 100% agree that it's time to get this API in sourceview.

The patch mostly looks good to me and also the detection logic seems ok... in gedit we relied a lot on mime types and in the end it just keeps biting our ass, so I think that favouring good old filenames is best... at least users have easy fixes: using a different filename or change the glob in the lang file.

That said I also wonder if we could try to be slightly smarter: the use case is "foo.xml" where we have a lang file for the specific xml dialect and content type for that dialect is properly matched.

Maybe the logic should be:
 - match mime
 - match glob
 - if just one matches use it
 - if they both match and the corresponding mime inside the lang files are one the anchestor of the other pick the more strict lang
 - if they both match and the corresponding mime inside the lang files are unrelated, pick the glob one


Now, on the specific of the patch:

in the docs:

+ * etc. Use gtk_source_language_get_mime_types() and gtk_source_language_get_globs()
+ * a better control over file -> language mapping.

 is missing a verb


about win32:

maybe we should use the content_type and match it agaist *.ext globs? (far from blocking and I am not sure if it is useful at all... it would be just for the case where there is no filename like unsaved files but you somehow managed to obtain a content type. I am not sure such case actually exists :)


about the API:
I am not 100% sure about "_for_file()" on one hand it's shorter than possible alternatives and clear enough, but on the other is pretty inaccurate, it doesn't take a GFile, it doesn't do the content type detection itself (I am not saying it should!) and the filename may even be null.
Comment 6 Paolo Maggi 2008-08-03 15:59:32 UTC
Comment on attachment 115761 [details] [review]
patch

I'm not sure I like the API, but if you guys already discussed it, and you like it, it is ok for me too.

May be we should simply have two functions

get_language_from_filename
get_language_from_content_type

so that people can use them as they prefer without imposing any given "discovery" logic.

It is true anyway that with the proposed API you can get the same effect.



<paolo> I think you should have something like
<paolo> g_return_val_if_fail ((filename == NULL || *filename  =='\0') && (content_type == NULL || *content_type == '\0'));
<paolo> I' not sure using g_filename_display_name is a good idea
<paolo> why aren't you using g_filename_to_utf8 ?


>+
>+	filename_utf8 = g_filename_display_name (filename);

I think you should use g_filename_to_utf8 here.

>+GtkSourceLanguage *
>+gtk_source_language_manager_get_language_for_file (GtkSourceLanguageManager *lm,
>+						   const gchar		    *filename,
>+						   const gchar		    *content_type)
>+{
>+	GtkSourceLanguage *lang = NULL;
>+
>+	g_return_val_if_fail (GTK_IS_SOURCE_LANGUAGE_MANAGER (lm), NULL);

Since calling the function without specifying neither a filename nor a content type is invalid, I think
we should add

  g_return_val_if_fail (filename != NULL || content_type != NULL, NULL);
Comment 7 Christian Dywan 2008-08-03 17:10:35 UTC
I would find such a function very useful. On first sight I thought there should really be a way to pass a MIME type instead of a content type. Then after a closer look at g_content_type_ functions I discovered g_content_type_guess, so I suppose that is meant as a bridge for the imaginary _get_language_for_mime_type case.
Comment 8 Christian Dywan 2008-08-03 17:12:45 UTC
And actually I second the doubt about the _for_file term. It really is misleading and suggests something that accepts a GFile.
Comment 9 Yevgen Muntyan 2008-08-04 03:13:43 UTC
Created attachment 115800 [details] [review]
patch

Note that nobody proposed a name for the function! I don't really like for_file(), but I couldn't make up anything else. The only alternative I can think of is for_filename_and_content_type() but it's super ugly.

Having two functions solves the naming problem, but it's complicated: a user should be able to simply say "get me language for this thing". A single function is simple, yet it's flexible enough: if one needs mime type only or filename only, he can use NULL arguments.

(In reply to comment #5)
...
> That said I also wonder if we could try to be slightly smarter: the use case is
> "foo.xml" where we have a lang file for the specific xml dialect and content
> type for that dialect is properly matched.
> 
> Maybe the logic should be:
>  - match mime
>  - match glob
>  - if just one matches use it
>  - if they both match and the corresponding mime inside the lang files are one
> the anchestor of the other pick the more strict lang
>  - if they both match and the corresponding mime inside the lang files are
> unrelated, pick the glob one

Made it a TODO. We need specific use cases to see if it's going to work.

> Now, on the specific of the patch:
> 
> in the docs:
> 
> + * etc. Use gtk_source_language_get_mime_types() and
> gtk_source_language_get_globs()
> + * a better control over file -> language mapping.
> 
>  is missing a verb

Done.

> about win32:
> 
> maybe we should use the content_type and match it agaist *.ext globs? (far from
> blocking and I am not sure if it is useful at all... it would be just for the
> case where there is no filename like unsaved files but you somehow managed to
> obtain a content type. I am not sure such case actually exists :)

I made it try to use content_type on windows, and removed the note from docs.

...

(In reply to comment #6)
> (From update of attachment 115761 [details] [review] [edit])
...
> May be we should simply have two functions
> 
> get_language_from_filename
> get_language_from_content_type
> 
> so that people can use them as they prefer without imposing any given
> "discovery" logic.

I do want to impose discovery logic. It's nice - a user doesn't have to invent that logic himself. And the end user gets same highlighting in gedit and meld.

> It is true anyway that with the proposed API you can get the same effect.

And in a single function we can do fancy things like pbor suggested.

> <paolo> I think you should have something like
> <paolo> g_return_val_if_fail ((filename == NULL || *filename  =='\0') &&
> (content_type == NULL || *content_type == '\0'));

OK.

> <paolo> I' not sure using g_filename_display_name is a good idea
> <paolo> why aren't you using g_filename_to_utf8 ?

g_filename_display_name() does the right thing - it replaces invalid characters with fancy ? unicode character. So if user has a filename in an encoding different from "glib filename encoding" (e.g. KOI-8 encoding on disk, UTF-8 locale), we won't fail.

...
> Since calling the function without specifying neither a filename nor a content
> type is invalid, I think
> we should add
> 
>   g_return_val_if_fail (filename != NULL || content_type != NULL, NULL);

OK.

(In reply to comment #7)
> I would find such a function very useful. On first sight I thought there should
> really be a way to pass a MIME type instead of a content type. Then after a
> closer look at g_content_type_ functions I discovered g_content_type_guess, so
> I suppose that is meant as a bridge for the imaginary
> _get_language_for_mime_type case.

Right. GIO pretends we are awesome cross-platform people, so we pretend too. Yet a user can call g_content_type_guess() and be happy, or ignore GIO altogether and use "mime types" and be happy too (I saw a end-user GUI mac program which parsed /etc/apache2/somecrap to get mime type!).
Comment 10 Paolo Borelli 2008-08-04 07:06:47 UTC
> Note that nobody proposed a name for the function!

"g_content_type_guess" gave me an idea: what about simply calling it _guess_language()? 


I agree with Yevgen that it really should be a single function, api users really just want an api that does "the right thing" without making them come up with an algorithm that reimplements the same thing over and over. If someone wants to have custom logic he can always do what gedit does now and not use the function at all. The only flexibility we may need to add in the future (but it's far from blocking is a way to let the user tell how to disambiguate if more that one language matches the glob), eg allowing with a _full() var of the function that takes a callback or by allowing the user to associate "priority points" to each language.
Comment 11 Yevgen Muntyan 2008-08-06 00:58:10 UTC
We discussed this with pbor on IRC, and so gtk_source_language_manager_guess_language() was added.
Comment 12 Yevgen Muntyan 2008-08-06 00:58:49 UTC
Created attachment 115949 [details] [review]
final patch

Here's what was committed, for reference.