Bug 726108 – Improve Scanner Functions and add test cases.

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 726108 - Improve Scanner Functions and add test cases.


Summary:	Improve Scanner Functions and add test cases.


Status:	RESOLVED FIXED

Product:	easytag
Classification:	Other
Component:	general
Version:	master
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	2.1
Assigned To:	EasyTAG maintainer(s)
QA Contact:	EasyTAG maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-03-11 14:39 UTC by Abhinav
Modified:	2014-04-22 18:41 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
This patch improves Scanner functions and also add some test cases (17.59 KB, patch) 2014-03-11 14:41 UTC, Abhinav	needs-work	Details \| Review
Applied all modifications (17.86 KB, patch) 2014-03-11 18:07 UTC, Abhinav	needs-work	Details \| Review
Applied all modifications (18.16 KB, patch) 2014-03-11 19:54 UTC, Abhinav	committed	Details \| Review
Benchmark for p20 to space implementations (3.70 KB, text/x-csrc) 2014-04-07 10:03 UTC, Santtu Lakkala		Details

Description Abhinav 2014-03-11 14:39:07 UTC

Scanner Functions like Scan_Convert* should be improved. They are not compatible with current UTF-8 and are slow. Moreover, tests cases for them should also be added.

Comment 1 Abhinav 2014-03-11 14:41:53 UTC

Created attachment 271524 [details] [review]
This patch improves Scanner functions and also add some test cases

Comment 2 David King 2014-03-11 15:06:23 UTC

Review of attachment 271524 [details] [review]:

::: src/application_window.c
@@ +135,3 @@
 Convert_All_Uppercase (GtkWidget *entry)
 {
+    gchar *res, *string = g_strdup (gtk_entry_get_text (GTK_ENTRY (entry)));

Do not put two declarations on a single line, especially if initialising one with the result of a function call.

@@ +139,1 @@
+    res = Scan_Process_Fields_All_Uppercase (string);

As this function already duplicates the string, what is the point of calling g_strdup() on the string from the GtkEntry?

@@ +146,3 @@
 Convert_All_Lowercase (GtkWidget *entry)
 {
+    gchar *res, *string = g_strdup (gtk_entry_get_text (GTK_ENTRY (entry)));

Same comment as in Convert_All_Uppercase().

::: src/scan.c
@@ +69,2 @@
  * The function inserts a space before an uppercase letter
  * It is needed to realloc the memory!

Update comments if you change the implementation, otherwise the comment is incorrect.

@@ +79,3 @@
+	string1 = g_string_new ("");
+	g_string_append_c (string1, *string);
+	for (iter = g_utf8_next_char(string); *iter; iter = g_utf8_next_char(iter))

Leave a blank line before the for(). Inconsistent indentation.

@@ +93,3 @@
+    iter = string1->str;
+    g_string_free (string1, FALSE);
+    return iter;

Replace this with "return g_string_free (string1, FALSE);"

@@ +167,1 @@
     for (temp = string; *temp; temp = g_utf8_next_char(temp))

Leave a blank line before the for().

@@ +173,1 @@
         if (set_to_upper_case && g_unichar_islower(c))

Leave a blank line before the if().

@@ +202,3 @@
+    temp = string1->str;
+    g_string_free (string1, FALSE);
+    return temp;

You can replace this with "return g_string_free (string1, FALSE);"

::: src/scan.h
@@ +11,3 @@
 void Scan_Process_Fields_Remove_Space (gchar *string);
+gchar*
+Scan_Process_Fields_Insert_Space (gchar *string);

Do not reflow these lines, they should stay as before, all on one line.

::: tests/test-scan.c
@@ +16,3 @@
+scanTestUnderscoreToSpace (void)
+{
+    int total = 1;

This (and all other instance of iterating over arrays) should be a gsize.

@@ +19,3 @@
+    gchar *cases[1] = {" ်0STRING ်0_A_B"};
+    gchar *results[1] = {" ်0STRING ်0 A B"};
+    while (total)

Leave a blank line before the while() (and all other instances in this file).

Comment 3 Abhinav 2014-03-11 18:07:56 UTC

Created attachment 271543 [details] [review]
Applied all modifications

I also made those functions argument as "const gchar *", which don't change the string.

Comment 4 David King 2014-03-11 19:23:01 UTC

Review of attachment 271543 [details] [review]:

::: src/application_window.c
@@ +194,3 @@
     // FIX ME : we suppose that it will not grow more than 2 times its size...
     gsize string_length = 2 * strlen (gtk_entry_get_text (GTK_ENTRY (entry)));
+    gchar *res, *string = g_malloc (string_length + 1);

Put the new "res" declaration on a different line.

@@ +198,3 @@
     string[string_length] = '\0';
 
+    res = Scan_Process_Fields_Insert_Space (string);

If Scan_Process_Fields_Insert_Space() allocates a new string, why are you allocating a new string in this function?

::: src/scan.c
@@ +81,3 @@
+    GString *string1;
+
+	string1 = g_string_new ("");

Inconsistent indentation.

@@ +171,1 @@
     for (temp = string; *temp; temp = g_utf8_next_char(temp))

Add a space after g_utf_next_char.

@@ +175,2 @@
         c = g_utf8_get_char(temp);
+        l = g_unichar_to_utf8(c, temp2);

Add a space after g_unichar_to_utf8.

@@ +177,1 @@
         if (set_to_upper_case && g_unichar_islower(c))

Leave a blank line before the if().

@@ +194,1 @@
+        // Uppercase again the word 'I' in english

The comment needs to explain why this is necessary, after the transformation above. Also, use the C89-style for comments: /* */

::: src/scan_dialog.c
@@ +1141,3 @@
+    {
+        gchar *res;
+        res = Scan_Process_Fields_Insert_Space(*string);

Add a space between the function name and the opening bracket (and for all other cases in this file).

Comment 5 Abhinav 2014-03-11 19:54:55 UTC

Created attachment 271554 [details] [review]
Applied all modifications

I don't know the reason behind the following comment.
// Uppercase again the word 'I' in english

In previous code, it was done, so I did it here also. It may be possible because a single "I" is always written in capitals in English and that is what it do.

Comment 6 David King 2014-03-12 10:22:38 UTC

Comment on attachment 271554 [details] [review]
Applied all modifications

OK, this is good, and I made some minor changes and pushed it to the wip/application-window branch as a343ef90ec2b3957733ecb394cae29d4c460ab45. It might be possible to rebase the patch to master, with some work.

Comment 7 David King 2014-04-03 21:39:01 UTC

I split two scanner optimizations off and pushed them to master (and rebased the wip/application-window branch).

Comment 8 Santtu Lakkala 2014-04-07 10:03:25 UTC

Created attachment 273696 [details]
Benchmark for p20 to space implementations

I didn't like the look of Scan_Convert_P20_Into_Space, so I rewrote a couple of times, to see what kind of code would perform well.

Personally I like the _2 version most readable, and it (in my tests) performs as-well-or-better than the original when %20 appears 0 or 5+ times (which I think are quite tommon). The _1_1 version seems to be the most performant overall, but the code is a bit longer and somwehat more complex as well. Interestingly the more readable variant _1 performs similarily to _1_1 and _1_5 on low optimization levels (-O0), but gets a performance penalty from optimizations.

Anyway, this is pretty pointless optimization, but the current code just looked "wrong" =)

Comment 9 David King 2014-04-07 11:46:51 UTC

(In reply to comment #8)
> Personally I like the _2 version most readable, and it (in my tests) performs
> as-well-or-better than the original when %20 appears 0 or 5+ times (which I
> think are quite tommon).

I like _2 as well, primarily due to the readability.

> Anyway, this is pretty pointless optimization, but the current code just looked
> "wrong" =)

Agreed, but it was a bit too much to "decipher" in the original form. It is probably time that I rebased Abhinav's patch on wip/application-window to master. With some in-tree tests and benchmarks, it would be easier to experiment with both optimizations and readability improvements.

Comment 10 David King 2014-04-07 22:25:07 UTC

(In reply to comment #9)

I just pushed some patches to master which split out the scanner functions and apply Abhinav's fixes and tests that were on the wip/application-window branch. I also pushed some simple performance tests, and added a Makefile target to run the performance tests and generate a report. Execute "make full-report" and have a look at full-report.html to see the results.

If any optimizations or improvements are to be made to the scanner functions, they should be tested against the in-tree tests. The tests likely need a lot more sample data, but they are a good start.

Comment 11 David King 2014-04-10 19:21:55 UTC

I pushed the "_2" version to master, as it was close enough to the original performance in benchmarks, and a lot more readable. Thanks Abhinav and Santtu!

Comment 12 David King 2014-04-22 18:41:27 UTC

(In reply to comment #11)
> I pushed the "_2" version to master, as it was close enough to the original
> performance in benchmarks, and a lot more readable. Thanks Abhinav and Santtu!

I reverted that patch as a result of bug 728743.