After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 588233 - Import: Should be separate class with distcheck tests
Import: Should be separate class with distcheck tests
Status: RESOLVED FIXED
Product: glom
Classification: Other
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: Murray Cumming
Murray Cumming
Depends on:
Blocks:
 
 
Reported: 2009-07-10 10:43 UTC by Murray Cumming
Modified: 2010-01-31 13:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Refactoring Glom::Dialog_Import_CSV, part 1 (3.80 KB, patch)
2009-08-28 18:21 UTC, Michael Hasselmann
none Details | Review
Refactoring Glom::Dialog_Import_CSV, part 2 (37.04 KB, patch)
2009-08-28 18:24 UTC, Michael Hasselmann
none Details | Review
Refactoring Glom::Dialog_Import_CSV, part 3 (12.94 KB, patch)
2009-09-03 16:01 UTC, Michael Hasselmann
none Details | Review
Refactoring Glom::Dialog_Import_CSV, part 4 (9.11 KB, patch)
2009-09-03 16:02 UTC, Michael Hasselmann
none Details | Review
Refactoring Glom::Dialog_Import_CSV, part 5 (2.57 KB, patch)
2009-09-03 16:02 UTC, Michael Hasselmann
none Details | Review
Refactoring Glom::Dialog_Import_CSV, part 6 (ChangeLog update only) (1.98 KB, patch)
2009-09-03 16:02 UTC, Michael Hasselmann
none Details | Review
Refactoring Glom::Dialog_Import_CSV (complete patch set for convenience) (67.45 KB, patch)
2009-09-03 16:03 UTC, Michael Hasselmann
none Details | Review
A properly rebased patch (47.26 KB, patch)
2009-09-04 10:29 UTC, Michael Hasselmann
none Details | Review

Description Murray Cumming 2009-07-10 10:43:52 UTC
There is code in dialog_import_csv.cc to identify rows and and fields in .csv files. This should really be in a separate non-UI class that's used by that dialog class. And there should be some unit tests to check that it works on some sample data. Those unit tests should run during distcheck.
Comment 1 Michael Hasselmann 2009-08-28 18:21:49 UTC
Created attachment 141950 [details] [review]
Refactoring Glom::Dialog_Import_CSV, part 1
Comment 2 Michael Hasselmann 2009-08-28 18:24:20 UTC
Created attachment 141951 [details] [review]
Refactoring Glom::Dialog_Import_CSV, part 2

At this stage, most of the parser code now lives in Glom::CsvParser. It compiles but I had to deactive the tree model creation/insertions (which of course renders it useless). I need to think a bit more about this.
Comment 3 Michael Hasselmann 2009-09-03 16:01:41 UTC
Created attachment 142418 [details] [review]
Refactoring Glom::Dialog_Import_CSV, part 3
Comment 4 Michael Hasselmann 2009-09-03 16:02:01 UTC
Created attachment 142419 [details] [review]
Refactoring Glom::Dialog_Import_CSV, part 4
Comment 5 Michael Hasselmann 2009-09-03 16:02:23 UTC
Created attachment 142420 [details] [review]
Refactoring Glom::Dialog_Import_CSV, part 5
Comment 6 Michael Hasselmann 2009-09-03 16:02:57 UTC
Created attachment 142421 [details] [review]
Refactoring Glom::Dialog_Import_CSV, part 6 (ChangeLog update only)
Comment 7 Michael Hasselmann 2009-09-03 16:03:49 UTC
Created attachment 142422 [details] [review]
Refactoring Glom::Dialog_Import_CSV (complete patch set for convenience)
Comment 8 Michael Hasselmann 2009-09-03 16:11:14 UTC
Regarding the decoupling of parsing and dialog components in Glom::Dialog_Import_CSV the refactoring so far was only partially successful. More signals might be the appropiate answer, since the critical code itself proved to be quite resistant towards refactoring. The next step however is to cut down on the numbers of member variables, then have some tests for the parser (more accurate: scanner).
Comment 9 Murray Cumming 2009-09-03 19:39:42 UTC
(In reply to comment #6)
> Created an attachment (id=142421) [details]
> Refactoring Glom::Dialog_Import_CSV, part 6 (ChangeLog update only)

This single ChangeLog entry mentions the same file(s) repeatedly. If it's meant to be several ChangeLog entries then please write it as several entries. If not (if you want to push several smaller commits) then please write one ChangeLog entry correctly.
Comment 10 Murray Cumming 2009-09-03 19:48:20 UTC
(In reply to comment #7)
> Created an attachment (id=142422) [details]
> Refactoring Glom::Dialog_Import_CSV (complete patch set for convenience)

Some initial superficial comments:

+  Parser(const char* encoding)
+  : conv("UTF-8",
+    encoding),
+    input_position(0),
+    line_number(0)
+  {

Please 
1. initialize member variables in the .cc file, not the .h file,
2. Use m_ for member variables,
3. Put method implementations in the .cc file, not the .h file.
as elsewhere in glom.
Oh, you did most of that already - see the comment below about this weird multi-commit patch.


+// The CsvParser will parse a CSV file by calling token handlers whenver the scanner recognizes a token.
+class CsvParser

Please use Doxygens /// or /** */ comment syntax.


 m_parser.reset(NULL);
+  //m_parser.reset(NULL);

Maybe this is old code, but you should avoid NULL in C++ code.



I think it would be clearer for me to see one patch rather than several commits in one "patch". The intermediate stuff doesnt seem that useful to me, and slows down my review.

> The CsvParser class learned a new signal, SignalLineScanned

Maybe you mean "gained" instead of "learned".


+/**  Parse a row from a .cvs file.

You mean a .csv file.
Comment 11 Michael Hasselmann 2009-09-04 10:29:10 UTC
Created attachment 142462 [details] [review]
A properly rebased patch

The obvious thing that still needs fixing is to make the parser handle encoding changes from the UI. Apart from that, the import dialog should have the same functionality as before.
Comment 12 Murray Cumming 2009-09-04 12:50:14 UTC
+2009-09-04  Michael Hasselmann <michaelh@openismus.com>
+
+	* Makefile_glom.am, glom/dialog_import_csv.[h|cc], glom/import_csv.[h|cc],
+	glom/dialog_import_csv_progress.cc: Moved the inlined Parser class (in
+	Glom::Dialog_Import_CSV) to glom/import_csv.h. The goal is to split this
+	class up in an import model and import dialog.  The parsing code (more
+	accurate: the scanning code) now lives Glom::CsvParser now, which still has
+	an all-public interface until the requirements for its API are clear.
+	Building the tree model from the scanned tokens (the actual parsing) is
+	still a responsibility of the Glom::Dialog_Import_CSV class because this
+	part is too dependent on the import dialog. However, it now gets its tokens
+	by handling the Glom::CsvParser's new lineScanned signal.  The
+	Glom::Dialog_Import_CSV_Progress class should propably not depend on the
+	parser's state but be connected to some signals instead.

You've just mashed your commit messages again instead of talking about the actual change.
1. You talk about a goal, but you actually did something, so say what you did.
2. You should talk about the files (well, the classes) individually.
3. Also, the mention of the Glom:: namespace is superfluous.


CsvParser still has implementation (initialization and method implementations) in the .h file.

Otherwise it looks fine. Please correct and commit.
Comment 13 Murray Cumming 2009-09-10 15:34:32 UTC
The changes are committed, from Michael's github remote branch. Now we just need the tests.
Comment 14 Michael Hasselmann 2009-09-23 11:24:06 UTC
This bug now has its own feature branch: http://git.gnome.org/cgit/glom/log/?h=import_csv_refactored

The logic for CsvParser::signal_line_scanned() changed with the latest commits: Instead of letting the dialog tokenize the lines it gets from the parser, putting the tokens in its tree model, we now only signal that we scanned and tokenized (that is, a more sophisticated scanning than just finding lines/rows) a line. Thus, the signal will only be useful for the progress bar.

The dialog is supposed to grab the data after a full parse run it seems (which is a change of how it worked before, line-wise). Of course, returning a vector of parsed rows wouldn't make sense here, so we offer an API for the parser's internal model (basically, a (m,n) table): CsvParser::get_data(...).

I hope I got that change of application logic right? If so, I'll move forward to make it work again with the dialog (it currently doesn't import anything).
Comment 15 Murray Cumming 2009-09-23 11:43:41 UTC
The dialog should continue to read progressively - one row at a time. I think it can (and does) still do that because it is signalled by the CsvParser for each line. As I said, this is untested.

As we discussed elsewhere, it would be nice if the CsvParser did not need to store the entire data, but that can be a later optimization.

To make the tests usefule again you might want to add a save_to_temp_file_and_parse() utility function to your test code (not to the parser).
Comment 16 Michael Hasselmann 2009-09-24 10:24:54 UTC
Looking at the dialog's get_row_count() (http://git.gnome.org/cgit/glom/tree/glom/import_csv/dialog_import_csv.cc?h=import_csv_refactored#n212) I think the API exposed a race condition there (now and before, too). Imagine a big file, and the async read hasn't finished yet before the user chooses to import. Then, theoretically speaken, the row count would still be too small, and a lot of the dialog's logic is based on the exact row count.

Well, to be honest, with the new API we aim for, I do not think the dialog needs to know the exact row count. Perhaps a parsing_finished signal would be more helpful.

Also, the only remaining client of that method is the progess bar. But we could replace that by a simple heuristic: "size of file / avg(size of first n rows)". We need the first n rows for the preview feature anyway. The progress bar would then show a % value, instead of trying to be exact with the max row count (which it can't).

The only way to get an exact row count (mind the newlines issues with CSV data) I can imagine is to parse the file twice (surely we want to avoid that?).

(On another note: the save_to_temp_file_and_parse() utility function is a good idea.)
Comment 17 Murray Cumming 2009-09-24 12:04:24 UTC
That sounds sensible. Please do try to do that, though after you've fixed the tests.
Comment 18 Murray Cumming 2010-01-31 13:10:03 UTC
I think this is dealt with, apart from the hang in the tests mentioned in the latest bug.