GNOME Bugzilla – Bug 615858
Improve reading OASIS files
Last modified: 2010-04-15 16:50:54 UTC
As per bug #615765, the contents of OASIS files are currently read as follows:
* Fork and spawn an odt2txt process, and wait for it to finish.
* Read the child process's entire stdout, however big it is, into a heap-allocated string.
* Normalize the contents of the whole string and truncate it to the configured maximum number of words.

This can be improved as follows:
* Fork and spawn an odt2txt process, without waiting for the child to finish.
* Read the child's stdout in buffered chunks, up to a predefined maximum number of bytes.
* After each buffered read, normalize the new data and count the normalized words.
* Stop reading as soon as any of these holds:
  a) There is no more content to read from stdout.
  b) The maximum number of bytes has been read (1 MByte, for example).
  c) The maximum number of words (from the configuration) has been reached.
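The proposed bounded read loop can be sketched in plain C as below. This is a minimal illustration, not the attached patch: the names `read_limited_text` and `CHUNK_SIZE` are hypothetical, "normalization" is reduced to collapsing runs of whitespace into single spaces, and the input is assumed to be a `FILE *` already connected to the child's stdout (for example via POSIX `popen()`); spawning and reaping the child are not covered.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical chunk size for each buffered read. */
#define CHUNK_SIZE 4096

/* Hypothetical helper: reads at most max_bytes from `in` in chunks,
 * collapsing whitespace runs into single spaces, and stops once
 * max_words words have been kept. Returns a heap-allocated,
 * NUL-terminated string (caller frees), or NULL on allocation failure. */
char *read_limited_text(FILE *in, size_t max_bytes, size_t max_words,
                        size_t *n_words_out)
{
    char chunk[CHUNK_SIZE];
    /* Output never exceeds bytes read: each separator we emit
     * replaces at least one whitespace byte from the input. */
    char *out = malloc(max_bytes + 1);
    size_t out_len = 0, total_read = 0, n_words = 0;
    int in_word = 0; /* persists across chunks, so split words count once */
    int stop = 0;

    if (out == NULL)
        return NULL;

    while (!stop && total_read < max_bytes) {
        size_t want = max_bytes - total_read;
        if (want > CHUNK_SIZE)
            want = CHUNK_SIZE;

        size_t got = fread(chunk, 1, want, in);
        if (got == 0)
            break; /* (a) no more content (EOF or read error) */
        total_read += got;

        for (size_t i = 0; i < got; i++) {
            unsigned char c = (unsigned char) chunk[i];
            if (isspace(c)) {
                in_word = 0;
                continue;
            }
            if (!in_word) {
                if (n_words == max_words) {
                    stop = 1; /* (c) word limit reached */
                    break;
                }
                if (out_len > 0)
                    out[out_len++] = ' ';
                n_words++;
                in_word = 1;
            }
            out[out_len++] = (char) c;
        }
    }
    /* (b) the while condition ends the loop once max_bytes are read */

    out[out_len] = '\0';
    if (n_words_out != NULL)
        *n_words_out = n_words;
    return out;
}
```

For example, on a stream containing `"  hello   world\n foo bar baz "` with `max_words = 3`, the sketch returns `"hello world foo"` and stops without reading past the start of the fourth word. A word cut off by the byte limit is kept as a partial word; a real extractor might prefer to drop it.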
Created attachment 158821: Improved OASIS extractor
Created attachment 158825: Updated patch. Reindented, and the maximum number of bytes to read is no longer hardcoded; it is now computed as 3 * max_words * max_word_length.
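The byte cap from the updated patch is a simple product of the two configuration values. A minimal sketch, assuming `size_t` arithmetic and a hypothetical helper name `max_bytes_to_read`, that also saturates instead of wrapping around on overflow:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte cap computed as
 * 3 * max_words * max_word_length, saturating at SIZE_MAX
 * if the product would not fit in a size_t. */
size_t max_bytes_to_read(size_t max_words, size_t max_word_length)
{
    if (max_word_length != 0 && max_words > SIZE_MAX / 3 / max_word_length)
        return SIZE_MAX; /* product would overflow */
    return 3 * max_words * max_word_length;
}
```

With hypothetical settings of `max_words = 10000` and `max_word_length = 30`, the cap is 3 * 10000 * 30 = 900000 bytes, close to the 1 MByte figure suggested in the original description.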
Reviewed; please push to master.
Pushed.