After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 726108 - Improve Scanner Functions and add test cases.
Improve Scanner Functions and add test cases.
Status: RESOLVED FIXED
Product: easytag
Classification: Other
Component: general
master
Other All
: Normal normal
: 2.1
Assigned To: EasyTAG maintainer(s)
EasyTAG maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2014-03-11 14:39 UTC by Abhinav
Modified: 2014-04-22 18:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
This patch improves Scanner functions and also add some test cases (17.59 KB, patch)
2014-03-11 14:41 UTC, Abhinav
needs-work Details | Review
Applied all modifications (17.86 KB, patch)
2014-03-11 18:07 UTC, Abhinav
needs-work Details | Review
Applied all modifications (18.16 KB, patch)
2014-03-11 19:54 UTC, Abhinav
committed Details | Review
Benchmark for p20 to space implementations (3.70 KB, text/x-csrc)
2014-04-07 10:03 UTC, Santtu Lakkala
  Details

Description Abhinav 2014-03-11 14:39:07 UTC
Scanner Functions like Scan_Convert* should be improved. They are not compatible with current UTF-8 and are slow. Moreover, tests cases for them should also be added.
Comment 1 Abhinav 2014-03-11 14:41:53 UTC
Created attachment 271524 [details] [review]
This patch improves Scanner functions and also add some test cases
Comment 2 David King 2014-03-11 15:06:23 UTC
Review of attachment 271524 [details] [review]:

::: src/application_window.c
@@ +135,3 @@
 Convert_All_Uppercase (GtkWidget *entry)
 {
+    gchar *res, *string = g_strdup (gtk_entry_get_text (GTK_ENTRY (entry)));

Do not put two declarations on a single line, especially if initialising one with the result of a function call.

@@ +139,1 @@
+    res = Scan_Process_Fields_All_Uppercase (string);

As this function already duplicates the string, what is the point of calling g_strdup() on the string from the GtkEntry?

@@ +146,3 @@
 Convert_All_Lowercase (GtkWidget *entry)
 {
+    gchar *res, *string = g_strdup (gtk_entry_get_text (GTK_ENTRY (entry)));

Same comment as in Convert_All_Uppercase().

::: src/scan.c
@@ +69,2 @@
  * The function inserts a space before an uppercase letter
  * It is needed to realloc the memory!

Update comments if you change the implementation, otherwise the comment is incorrect.

@@ +79,3 @@
+	string1 = g_string_new ("");
+	g_string_append_c (string1, *string);
+	for (iter = g_utf8_next_char(string); *iter; iter = g_utf8_next_char(iter))

Leave a blank line before the for(). Inconsistent indentation.

@@ +93,3 @@
+    iter = string1->str;
+    g_string_free (string1, FALSE);
+    return iter;

Replace this with "return g_string_free (string1, FALSE);"

@@ +167,1 @@
     for (temp = string; *temp; temp = g_utf8_next_char(temp))

Leave a blank line before the for().

@@ +173,1 @@
         if (set_to_upper_case && g_unichar_islower(c))

Leave a blank line before the if().

@@ +202,3 @@
+    temp = string1->str;
+    g_string_free (string1, FALSE);
+    return temp;

You can replace this with "return g_string_free (string1, FALSE);"

::: src/scan.h
@@ +11,3 @@
 void Scan_Process_Fields_Remove_Space (gchar *string);
+gchar*
+Scan_Process_Fields_Insert_Space (gchar *string);

Do not reflow these lines, they should stay as before, all on one line.

::: tests/test-scan.c
@@ +16,3 @@
+scanTestUnderscoreToSpace (void)
+{
+    int total = 1;

This (and all other instance of iterating over arrays) should be a gsize.

@@ +19,3 @@
+    gchar *cases[1] = {" ်0STRING ်0_A_B"};
+    gchar *results[1] = {" ်0STRING ်0 A B"};
+    while (total)

Leave a blank line before the while() (and all other instances in this file).
Comment 3 Abhinav 2014-03-11 18:07:56 UTC
Created attachment 271543 [details] [review]
Applied all modifications

I also made those functions argument as "const gchar *", which don't change the string.
Comment 4 David King 2014-03-11 19:23:01 UTC
Review of attachment 271543 [details] [review]:

::: src/application_window.c
@@ +194,3 @@
     // FIX ME : we suppose that it will not grow more than 2 times its size...
     gsize string_length = 2 * strlen (gtk_entry_get_text (GTK_ENTRY (entry)));
+    gchar *res, *string = g_malloc (string_length + 1);

Put the new "res" declaration on a different line.

@@ +198,3 @@
     string[string_length] = '\0';
 
+    res = Scan_Process_Fields_Insert_Space (string);

If Scan_Process_Fields_Insert_Space() allocates a new string, why are you allocating a new string in this function?

::: src/scan.c
@@ +81,3 @@
+    GString *string1;
+
+	string1 = g_string_new ("");

Inconsistent indentation.

@@ +171,1 @@
     for (temp = string; *temp; temp = g_utf8_next_char(temp))

Add a space after g_utf_next_char.

@@ +175,2 @@
         c = g_utf8_get_char(temp);
+        l = g_unichar_to_utf8(c, temp2);

Add a space after g_unichar_to_utf8.

@@ +177,1 @@
         if (set_to_upper_case && g_unichar_islower(c))

Leave a blank line before the if().

@@ +194,1 @@
+        // Uppercase again the word 'I' in english

The comment needs to explain why this is necessary, after the transformation above. Also, use the C89-style for comments: /* */

::: src/scan_dialog.c
@@ +1141,3 @@
+    {
+        gchar *res;
+        res = Scan_Process_Fields_Insert_Space(*string);

Add a space between the function name and the opening bracket (and for all other cases in this file).
Comment 5 Abhinav 2014-03-11 19:54:55 UTC
Created attachment 271554 [details] [review]
Applied all modifications

I don't know the reason behind the following comment.
// Uppercase again the word 'I' in english

In previous code, it was done, so I did it here also. It may be possible because a single "I" is always written in capitals in English and that is what it do.
Comment 6 David King 2014-03-12 10:22:38 UTC
Comment on attachment 271554 [details] [review]
Applied all modifications

OK, this is good, and I made some minor changes and pushed it to the wip/application-window branch as a343ef90ec2b3957733ecb394cae29d4c460ab45. It might be possible to rebase the patch to master, with some work.
Comment 7 David King 2014-04-03 21:39:01 UTC
I split two scanner optimizations off and pushed them to master (and rebased the wip/application-window branch).
Comment 8 Santtu Lakkala 2014-04-07 10:03:25 UTC
Created attachment 273696 [details]
Benchmark for p20 to space implementations

I didn't like the look of Scan_Convert_P20_Into_Space, so I rewrote a couple of times, to see what kind of code would perform well.

Personally I like the _2 version most readable, and it (in my tests) performs as-well-or-better than the original when %20 appears 0 or 5+ times (which I think are quite tommon). The _1_1 version seems to be the most performant overall, but the code is a bit longer and somwehat more complex as well. Interestingly the more readable variant _1 performs similarily to _1_1 and _1_5 on low optimization levels (-O0), but gets a performance penalty from optimizations.

Anyway, this is pretty pointless optimization, but the current code just looked "wrong" =)
Comment 9 David King 2014-04-07 11:46:51 UTC
(In reply to comment #8)
> Personally I like the _2 version most readable, and it (in my tests) performs
> as-well-or-better than the original when %20 appears 0 or 5+ times (which I
> think are quite tommon).

I like _2 as well, primarily due to the readability.

> Anyway, this is pretty pointless optimization, but the current code just looked
> "wrong" =)

Agreed, but it was a bit too much to "decipher" in the original form. It is probably time that I rebased Abhinav's patch on wip/application-window to master. With some in-tree tests and benchmarks, it would be easier to experiment with both optimizations and readability improvements.
Comment 10 David King 2014-04-07 22:25:07 UTC
(In reply to comment #9)

I just pushed some patches to master which split out the scanner functions and apply Abhinav's fixes and tests that were on the wip/application-window branch. I also pushed some simple performance tests, and added a Makefile target to run the performance tests and generate a report. Execute "make full-report" and have a look at full-report.html to see the results.

If any optimizations or improvements are to be made to the scanner functions, they should be tested against the in-tree tests. The tests likely need a lot more sample data, but they are a good start.
Comment 11 David King 2014-04-10 19:21:55 UTC
I pushed the "_2" version to master, as it was close enough to the original performance in benchmarks, and a lot more readable. Thanks Abhinav and Santtu!
Comment 12 David King 2014-04-22 18:41:27 UTC
(In reply to comment #11)
> I pushed the "_2" version to master, as it was close enough to the original
> performance in benchmarks, and a lot more readable. Thanks Abhinav and Santtu!

I reverted that patch as a result of bug 728743.