GNOME Bugzilla – Bug 726108
Improve Scanner Functions and add test cases.
Last modified: 2014-04-22 18:41:27 UTC
Scanner Functions like Scan_Convert* should be improved. They are not compatible with current UTF-8 and are slow. Moreover, tests cases for them should also be added.
Created attachment 271524 [details] [review] This patch improves Scanner functions and also add some test cases
Review of attachment 271524 [details] [review]: ::: src/application_window.c @@ +135,3 @@ Convert_All_Uppercase (GtkWidget *entry) { + gchar *res, *string = g_strdup (gtk_entry_get_text (GTK_ENTRY (entry))); Do not put two declarations on a single line, especially if initialising one with the result of a function call. @@ +139,1 @@ + res = Scan_Process_Fields_All_Uppercase (string); As this function already duplicates the string, what is the point of calling g_strdup() on the string from the GtkEntry? @@ +146,3 @@ Convert_All_Lowercase (GtkWidget *entry) { + gchar *res, *string = g_strdup (gtk_entry_get_text (GTK_ENTRY (entry))); Same comment as in Convert_All_Uppercase(). ::: src/scan.c @@ +69,2 @@ * The function inserts a space before an uppercase letter * It is needed to realloc the memory! Update comments if you change the implementation, otherwise the comment is incorrect. @@ +79,3 @@ + string1 = g_string_new (""); + g_string_append_c (string1, *string); + for (iter = g_utf8_next_char(string); *iter; iter = g_utf8_next_char(iter)) Leave a blank line before the for(). Inconsistent indentation. @@ +93,3 @@ + iter = string1->str; + g_string_free (string1, FALSE); + return iter; Replace this with "return g_string_free (string1, FALSE);" @@ +167,1 @@ for (temp = string; *temp; temp = g_utf8_next_char(temp)) Leave a blank line before the for(). @@ +173,1 @@ if (set_to_upper_case && g_unichar_islower(c)) Leave a blank line before the if(). @@ +202,3 @@ + temp = string1->str; + g_string_free (string1, FALSE); + return temp; You can replace this with "return g_string_free (string1, FALSE);" ::: src/scan.h @@ +11,3 @@ void Scan_Process_Fields_Remove_Space (gchar *string); +gchar* +Scan_Process_Fields_Insert_Space (gchar *string); Do not reflow these lines, they should stay as before, all on one line. ::: tests/test-scan.c @@ +16,3 @@ +scanTestUnderscoreToSpace (void) +{ + int total = 1; This (and all other instance of iterating over arrays) should be a gsize. @@ +19,3 @@ + gchar *cases[1] = {" ်0STRING ်0_A_B"}; + gchar *results[1] = {" ်0STRING ်0 A B"}; + while (total) Leave a blank line before the while() (and all other instances in this file).
Created attachment 271543 [details] [review] Applied all modifications I also made those functions argument as "const gchar *", which don't change the string.
Review of attachment 271543 [details] [review]: ::: src/application_window.c @@ +194,3 @@ // FIX ME : we suppose that it will not grow more than 2 times its size... gsize string_length = 2 * strlen (gtk_entry_get_text (GTK_ENTRY (entry))); + gchar *res, *string = g_malloc (string_length + 1); Put the new "res" declaration on a different line. @@ +198,3 @@ string[string_length] = '\0'; + res = Scan_Process_Fields_Insert_Space (string); If Scan_Process_Fields_Insert_Space() allocates a new string, why are you allocating a new string in this function? ::: src/scan.c @@ +81,3 @@ + GString *string1; + + string1 = g_string_new (""); Inconsistent indentation. @@ +171,1 @@ for (temp = string; *temp; temp = g_utf8_next_char(temp)) Add a space after g_utf_next_char. @@ +175,2 @@ c = g_utf8_get_char(temp); + l = g_unichar_to_utf8(c, temp2); Add a space after g_unichar_to_utf8. @@ +177,1 @@ if (set_to_upper_case && g_unichar_islower(c)) Leave a blank line before the if(). @@ +194,1 @@ + // Uppercase again the word 'I' in english The comment needs to explain why this is necessary, after the transformation above. Also, use the C89-style for comments: /* */ ::: src/scan_dialog.c @@ +1141,3 @@ + { + gchar *res; + res = Scan_Process_Fields_Insert_Space(*string); Add a space between the function name and the opening bracket (and for all other cases in this file).
Created attachment 271554 [details] [review] Applied all modifications I don't know the reason behind the following comment. // Uppercase again the word 'I' in english In previous code, it was done, so I did it here also. It may be possible because a single "I" is always written in capitals in English and that is what it do.
Comment on attachment 271554 [details] [review] Applied all modifications OK, this is good, and I made some minor changes and pushed it to the wip/application-window branch as a343ef90ec2b3957733ecb394cae29d4c460ab45. It might be possible to rebase the patch to master, with some work.
I split two scanner optimizations off and pushed them to master (and rebased the wip/application-window branch).
Created attachment 273696 [details] Benchmark for p20 to space implementations I didn't like the look of Scan_Convert_P20_Into_Space, so I rewrote a couple of times, to see what kind of code would perform well. Personally I like the _2 version most readable, and it (in my tests) performs as-well-or-better than the original when %20 appears 0 or 5+ times (which I think are quite tommon). The _1_1 version seems to be the most performant overall, but the code is a bit longer and somwehat more complex as well. Interestingly the more readable variant _1 performs similarily to _1_1 and _1_5 on low optimization levels (-O0), but gets a performance penalty from optimizations. Anyway, this is pretty pointless optimization, but the current code just looked "wrong" =)
(In reply to comment #8) > Personally I like the _2 version most readable, and it (in my tests) performs > as-well-or-better than the original when %20 appears 0 or 5+ times (which I > think are quite tommon). I like _2 as well, primarily due to the readability. > Anyway, this is pretty pointless optimization, but the current code just looked > "wrong" =) Agreed, but it was a bit too much to "decipher" in the original form. It is probably time that I rebased Abhinav's patch on wip/application-window to master. With some in-tree tests and benchmarks, it would be easier to experiment with both optimizations and readability improvements.
(In reply to comment #9) I just pushed some patches to master which split out the scanner functions and apply Abhinav's fixes and tests that were on the wip/application-window branch. I also pushed some simple performance tests, and added a Makefile target to run the performance tests and generate a report. Execute "make full-report" and have a look at full-report.html to see the results. If any optimizations or improvements are to be made to the scanner functions, they should be tested against the in-tree tests. The tests likely need a lot more sample data, but they are a good start.
I pushed the "_2" version to master, as it was close enough to the original performance in benchmarks, and a lot more readable. Thanks Abhinav and Santtu!
(In reply to comment #11) > I pushed the "_2" version to master, as it was close enough to the original > performance in benchmarks, and a lot more readable. Thanks Abhinav and Santtu! I reverted that patch as a result of bug 728743.