After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 420618 - Want to add threading to decode in pan2
Want to add threading to decode in pan2
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
unspecified
Other All
: Normal enhancement
: 1.0
Assigned To: Charles Kerr
Pan QA Team
: 353317 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2007-03-20 17:28 UTC by Calin Culianu
Modified: 2007-03-29 07:17 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
This patch adds a decoder thread to pan (79.50 KB, patch)
2007-03-20 17:29 UTC, Calin Culianu
none Details | Review
crashreport from OSX/PPC (18.00 KB, text/plain)
2007-03-21 18:11 UTC, SciFi
  Details
Latest version of my decoder thread patch -- apply against svn rev 205 (80.15 KB, patch)
2007-03-21 23:25 UTC, Calin Culianu
none Details | Review
Latest version of my decoder thread patch -- apply against svn rev 205 (80.09 KB, patch)
2007-03-22 04:41 UTC, Calin Culianu
none Details | Review
Latest version of my decoder thread patch -- apply against svn rev 205 (80.89 KB, patch)
2007-03-22 19:51 UTC, Calin Culianu
none Details | Review
Latest version of my decoder thread patch -- apply against svn rev 205 (83.34 KB, patch)
2007-03-24 07:05 UTC, Calin Culianu
none Details | Review
Latest version of my decoder thread patch -- apply against svn rev 205 (83.22 KB, patch)
2007-03-24 23:50 UTC, Calin Culianu
none Details | Review
Revised version of decoder thread patch (37.46 KB, patch)
2007-03-25 02:48 UTC, Charles Kerr
none Details | Review
Revised version of decoder thread patch - - apply against svn rev 205 (62.59 KB, patch)
2007-03-25 03:02 UTC, Charles Kerr
none Details | Review
Revised version of decoder thread patch - - apply against svn rev 206 (65.72 KB, patch)
2007-03-25 17:30 UTC, Charles Kerr
none Details | Review
Revised version of decoder thread patch - - apply against svn rev 206 (41.97 KB, patch)
2007-03-25 22:53 UTC, Charles Kerr
none Details | Review
Minor revision to fix "Queued 98%" message. Also, remember to svn add decoder/locking/workerpool before running the diff. (67.44 KB, patch)
2007-03-26 02:50 UTC, Charles Kerr
none Details | Review
Modified Charles' latest revision to add some UI stuff, fix potential bugs, etc (70.04 KB, patch)
2007-03-26 08:21 UTC, Calin Culianu
none Details | Review
Modified Charles' latest revision to add some UI stuff, fix potential bugs, etc (70.06 KB, patch)
2007-03-26 08:44 UTC, Calin Culianu
none Details | Review
Oops! Two more bugs fixed -- apply against svn 206 (71.40 KB, patch)
2007-03-26 09:31 UTC, Calin Culianu
none Details | Review
One more tweak -- fixed task pane window title counts (71.67 KB, patch)
2007-03-26 09:47 UTC, Calin Culianu
none Details | Review
another minor revision (71.91 KB, patch)
2007-03-26 20:49 UTC, Charles Kerr
committed Details | Review

Description Calin Culianu 2007-03-20 17:28:19 UTC
I want decode phase of pan2 when saving attachements to run in another thread to maximize network utilization and minimize pan's hanging effect on slow disks/large attachements.
Comment 1 Calin Culianu 2007-03-20 17:29:59 UTC
Created attachment 84990 [details] [review]
This patch adds a decoder thread to pan

Please use this patch against PAN2 svn v 205 (latest as of 3/20/2007).
 
It should apply cleanly and work on all possible platforms that pan2 builds on.

I tested it and am very sure it's stable and works well.
Comment 2 devchan1 2007-03-21 01:59:30 UTC
hello, 
I have compile pan2 rev 205 from svn, with your patch applied.
The compilation succeeded, but pan2 crashes on startup. Below is a bt.
The machine is an intel macpro, gcc-4.0.1, and OSX tiger.

(gdb) bt full
  • #0 std::_Rb_tree_decrement
  • #1 std::_Rb_tree<pan::WorkerPool::Worker*, pan::WorkerPool::Worker*, std::_Identity<pan::WorkerPool::Worker*>, std::less<pan::WorkerPool::Worker*>, std::allocator<pan::WorkerPool::Worker*> >::insert_unique
    at /usr/include/c++/4.0.0/bits/stl_tree.h line 196
  • #2 pan::WorkerPool::Worker::Worker
    at /usr/include/c++/4.0.0/bits/stl_set.h line 314
  • #3 pan::TaskArticle::Decoder::Decoder
    at task-article-decoder.cc line 86
  • #4 __static_initialization_and_destruction_0
    at task-article-decoder.cc line 43
  • #5 __dyld__ZN16ImageLoaderMachO16doInitializationERKN11ImageLoader11LinkContextE
  • #6 __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextE
  • #7 __dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextE
  • #8 __dyld__ZN4dyld24initializeMainExecutableEv
  • #9 _start
  • #10 start
(gdb) 

dc.
Comment 3 Rinaldi J. Montessi 2007-03-21 02:11:32 UTC
I can verify the patch applied cleanly after slight path modifications to fit
my setup.  Pan compiled, installed, and ran with no glitches.  

As to the performance enhancements, I'll leave that to someone who uses it for
large binaries.  I did download and decode some rar binaries in the 8-10 kb
range and saw the new thread open during the decode process.  

Appears to work as advertised.

I see someone just posted with a problem so I'll provide further information.

gcc (GCC) 3.4.6

Linux Senior 2.6.15 #2 PREEMPT Tue Feb 14 15:09:44 EST 2006 i686 prescott i386 GNU/Linux

Checked out revision 205.
SVN checkout finished Tue Mar 20 18:26:48 EDT 2007
Comment 4 SciFi 2007-03-21 18:11:05 UTC
Created attachment 85062 [details]
crashreport from OSX/PPC
Comment 5 SciFi 2007-03-21 18:13:42 UTC
(In reply to comment #4)
> Created an attachment (id=85062) [edit]
> crashreport from OSX/PPC

Hi,

I see an x86/OSX user has suffered a crash similar to mine on PPC/Dual-G5, so I will go ahead with using this bugreport to file my crashreport and attach it here.

Now that we see similar crashes no matter the CPU but on the same o.s. build, I'm beginning to suspect a pattern with Apple's pthread support and/or something not kosher with the calls being used.

(Yes I briefly reported this on the pan.devel list, but didn't know if posting a crashreport would be proper there.  btw for the record, yes I used the patch found here instead of the patch posted to the list.  On the list you mentioned wanting the binary executables to be filed, too, but hey our PPC code can't possibly be any good for you on x86 platforms. ;) )

Thanks for any help.

Comment 6 Charles Kerr 2007-03-21 20:25:15 UTC
*** Bug 353317 has been marked as a duplicate of this bug. ***
Comment 7 Avi Drissman 2007-03-21 21:14:10 UTC
The crash report:

3   pan::WorkerPool::Worker::Worker[not-in-charge]() + 76
4   pan::TaskArticle::Decoder::Decoder[in-charge]() + 64
5   _static_initialization_and_destruction_0(int, int) + 104

indicates that during static initialization something's dying. I don't have the code right now, but what's in those constructors, and who's unhappy?
Comment 8 Calin Culianu 2007-03-21 22:22:23 UTC
All,

You have been extremely helpful with these bug reports!  Thank you immensely!!

Avi: I think you are right in that a static initialization of the "TaskArticle::Decoder" singleton is the culprit.  I actually went to great lengths in other parts of the code to avoid statically initialize objects because of the problems they may create due to possible dependencies on initialization order for proper program function -- I forgot to do so with this object and with at least one other..

I am going to make all static singleton type objects rely on a pointer and a getInstance() style method rather than static initialization.  That shuold solve this crash bug.


Thanks!  I will post an updated patch today hopefully.

-Calin


Comment 9 Calin Culianu 2007-03-21 23:25:33 UTC
Created attachment 85085 [details] [review]
Latest version of my decoder thread patch -- apply against svn rev 205

I created a new patch against svn rev 205 (latest as of this writing) that should fix the OSX crash bug (which was a general issue due to global variables -- just happened to work right on Linux out of freak chance -- I apologize for the avoidable bug).  

Since I don't have access to an OSX machine, it would be greatly appreciated by me if you OSX bug reporters could try this patch again on your boxes?

Thanks!!



-Calin
Comment 10 Calin Culianu 2007-03-21 23:27:13 UTC
Comment on attachment 84990 [details] [review]
This patch adds a decoder thread to pan

Don't use this, use attachment #85085 [details] instead! ;)
Comment 11 devchan1 2007-03-22 00:28:12 UTC
Hello,
I applied patch #85085, to svn rv 205, sucessfully.
But during comilation, I received, the following link error:

g++  -gfull -mtune=nocona -O2  -L/sw/lib -bind_at_load -o nzb-test nzb-test.o ./libtasks.a ../data/libdata.a ../usenet-utils/libusenetutils.a ../general/libgeneralutils.a ../../uulib/libuu.a -L/sw/gtk2//lib -L/sw/lib -lgmime-2.0 -lz -lgobject-2.0 -lglib-2.0 -lintl -liconv   -L/sw/gtk2//lib -lgobject-2.0 -lgmodule-2.0 -lgthread-2.0 -lglib-2.0 -lintl -liconv   -L/sw/lib -lpcre   
/usr/bin/ld: Undefined symbols:
pan::TaskArticle::Decoder::enqueue_work_in_thread(pan::TaskArticle*, void*, pan::Quark const&, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, pan::TaskArticle::SaveMode const&)
pan::TaskArticle::Decoder::check_in(pan::TaskArticle::Decoder*)
pan::TaskArticle::Decoder::check_out()
collect2: ld returned 1 exit status
make[3]: *** [nzb-test] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
dave:[~/src/pan2/]$

The machine is intel-macpro,gcc-4.0.1, and OS X tiger.

dc.
Comment 12 Calin Culianu 2007-03-22 01:56:33 UTC
Hmm.  Sounds like either the build failed to link-in task-article-decoder.o.

Are you sure you applied the patch to a freshly-checked-out version of the source tree?  Did the patch apply cleanly?

There is no reason that those symbols should be undefined because they weren't touched at all between the two patches.  If you diff the patches themselves you can see that they haven't changed one iota.

Also be sure to re-run autogen.sh if you applied the patch to a tree that was not svn-fresh or that you previously had run ./configure in.  

I just reapplied the patched, rebuilt pan, and I definitely don't get any link errors (granted I am on Linu but this problem would not be OSX-specific).

Can you either do a make clean and run ./autogen.sh and/or re-apply the patch to a fresh svn tree?  


Comment 13 SciFi 2007-03-22 02:54:57 UTC
(In reply to comment #12)
> Can you either do a make clean and run ./autogen.sh and/or re-apply the patch
> to a fresh svn tree?  

Try a make distclean first.  ;)

btw I still can't run ./autogen.sh to make a proper build environment since I have upgraded to current GNU tools on my OSX box.  It'd really be helpful to produce another beta tarball rollout, -or- get svn updated to use those current GNU tools as well.  So I'm continuing to fake it by modifying corresponding Makefile.in's by hand based on any changed *.am files (yes I know how to do those tricks, but only enough to be dangerous ;) ).

Really, we need another beta tarball to be safest for these tests.

Comment 14 Calin Culianu 2007-03-22 04:41:35 UTC
Created attachment 85094 [details] [review]
Latest version of my decoder thread patch -- apply against svn rev 205

Tweaked my patch slightly so that the task UI pops "Queued for Decode" tasks off the top of the task list rather than at random spots in the task list (well not random -- but it gave priority to Task * objects with a lower location in memory due to my using std::set and not an AdaptableSet which has different ordering rules..).. Otherwise this patch is identical to attachment #85085 [details]
Comment 15 devchan1 2007-03-22 16:47:53 UTC
Hello,
for some reason the os x users always have the most bugs! 
I have applied patch #85094 [pan2_svn_205_add_decoder_thread_v4.diff], to svn rev 205, the patch applied cleanly, the compilation succeeded, and pan2 ran up to a point.

As I was retrieving headers from a binary group, pan2 crashed.
the machine is and intel macpro, gcc-4.0.1, and os x tiger.
CXXFLAGS="-gfull -mtune=nocona -O2"

Below is a bt:

** (pan:4392): CRITICAL **: static void PanTreeStore::sortable_set_sort_column_id(GtkTreeSortable*, gint, GtkSortType): assertion `tree->sort_info->count(sort_column_id) != 0' failed

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00000000 in ?? ()
(gdb)

(gdb) bt
  • #0 ??
  • #1 std::__introsort_loop<__gnu_cxx::__normal_iterator<PanTreeStore::Row**, std::vector<PanTreeStore::Row*, std::allocator<PanTreeStore::Row*> > >, int, PanTreeStore::RowCompareByColumn>
    at pan-tree.cc line 855
  • #2 PanTreeStore::insert_sorted
    at /usr/include/c++/4.0.0/bits/stl_algo.h line 2606
  • #3 PanTreeStore::insert_sorted
    at pan-tree.cc line 827
  • #4 pan::HeaderPane::on_tree_change
    at header-pane.cc line 679
  • #5 pan::Data::ArticleTree::fire_diffs
    at ../../pan/data/data.h line 447
  • #6 pan::DataImpl::MyTree::add_articles
    at my-tree.cc line 535
  • #7 pan::DataImpl::MyTree::apply_filter
    at my-tree.cc line 236
  • #8 pan::DataImpl::MyTree::add_articles
    at my-tree.cc line 352
  • #9 pan::DataImpl::on_articles_added
    at headers.cc line 1123
  • #10 pan::DataImpl::xover_add
    at xover.cc line 326
  • #11 pan::TaskXOver::on_nntp_line
    at task-xover.cc line 307
  • #12 pan::NNTP::on_socket_response
    at nntp.cc line 143
  • #13 pan::GIOChannelSocket::do_read
    at socket-impl-gio.cc line 391
  • #14 pan::GIOChannelSocket::gio_func
    at socket-impl-gio.cc line 502
  • #15 g_main_context_dispatch

Comment 16 Calin Culianu 2007-03-22 19:46:52 UTC
Devchan1:

Thanks so much for testing!

Looks like a random memory corruption.  I audited the code really carefully and yes, I think I introduced a situation in which a deleted Task object continued to be used for a few function calls.  

I modified the code to fix this.  I am not sure this is the reason for your crash, but the situation was pretty bug-prone and happened to 'just work' on Linux.

It also is the type of bug that would make pan work at first but possible crash later.


Comment 17 Calin Culianu 2007-03-22 19:51:07 UTC
Created attachment 85131 [details] [review]
Latest version of my decoder thread patch -- apply against svn rev 205

Fixed another bug.. ;)
Comment 18 devchan1 2007-03-23 04:51:53 UTC
Hello,
the crash in comment http://bugzilla.gnome.org/show_bug.cgi?id=420618#c15 has nothing to do with your patch. svn rev 205 crashes with and without your patch applied on os x.
I will have to file a seperate bug report for that.
dc.
Comment 19 Calin Culianu 2007-03-23 07:47:10 UTC
Cool!  That's good news!

Awesome!

I actually had pan stable crash on me sometimes -- I only ever noticed it while testing my patch.  

The only way I could occasionally reproduce it was to open up a group such that the headers appear in the header pane, and then to download all headers.  It appears that's what's happening in your crash trace as well!

If you want, it appeared to me a workaround was to not have the headers for a group show in the header pane when downloading headers.  The crash appears to be related to the UI displaying the article headers as they are coming in.

I can actually see about fixing this separate bug too -- I am becoming more and more familiar with the pan2 sources so I can give it a shot -- the hard part is I cannot reliably reproduce the crash.

Do you have some additional insight into how to reproduce the crash?

Comment 20 devchan1 2007-03-23 12:36:08 UTC
hello,
it seems to me, the crash is always reproducible when things are done in the following order:


1. Erase all headers from a large binary group.
2. Start getting all new headers from the group.
4. When the gui indicates that the ammount of headers received is greater than zero, enter the header pane.
4. Scroll around a bit insider the header pane, inside tabbed view.
5. pan2 usually crashes.
Comment 21 Darren Albers 2007-03-23 13:28:35 UTC
Is this crash the same as bug: 371405?
Comment 22 Calin Culianu 2007-03-23 13:51:41 UTC
Well maybe, maybe not.  The backtrace is in a different part of the code, though.
Comment 23 Charles Kerr 2007-03-23 17:49:30 UTC
Calin:

There's a lot to like about your patch: the overall quality of
the code is good, the thread decoupling between TaskArticle and
the decoder looks right, and you're actively testing & updating.
So the first and main thing I want to say is I'm glad you're
contributing to Pan. =)

However, I do have some suggestions that I think would make this
patch fit better into Pan:

(1) (very minor) Please use `self' instead of `thiz'

(2) Eliminating globals in Pan was a deliberate decision,
so I'd prefer the patch to leave SocketCreator's introduction
alone -- passed into Queue's constructor rather than a global.

(3) The patch adds a lot of unused code that it shouldn't:
* Lockable is never used
* RWLock is never used
* AutoLocker is never used
* Cond is never used
* globals::mutex() is never used
* WorkerPool::setMaxThreads() is never used
* Mutex::isLocked() is never used
* Mutex::tryLock() is never used
* WorkerPool::maxThreads(), numThreads(), unprocessed() can be private

(4) After (2) and (3), the only item remaining in globals.h is
the WorkerPool's singleton accessor.  Even if it stays as a singleton,
moving the accessor to WorkerPool.h would let use remove globals.h
altogether.

(5) The please_stop flag is not worth it because 99% of do_work's
time is spend in UUDecodeFile, where please_stop can't touch it.
Unless there's something I'm missing it would be better to remove
cancel(), please_stop, was_cancelled(), and on_work_cancelled().

(6) I'm not sure the Decoder status updates are worth the extra thread
complexity -- TaskArticle can set its Progress status to `waiting'
or `decoding' in the main thread based on Decoder::check_out()'s
return value.  If we get rid of Decoder's percentage updates, many of
its state variables go away and so does Mutex.
Comment 24 Charles Kerr 2007-03-23 17:57:28 UTC
devchan1: I agree with Darren that your backtrace looks related to #371405,
which has been bothering me for awhile but I've been unable to reproduce.
Could you attach your information to that ticket too please?
Comment 25 Charles Kerr 2007-03-23 18:14:16 UTC
re #19: Carin, I'd welcome the help on bug #371405 .... ;)
Comment 26 Calin Culianu 2007-03-23 20:09:58 UTC
Thanks for the compliments Charles, and for reviewing the code!  I agree for the most part with most of your suggestions and only resist strongly one of them.  Further discussion would be welcome!

(1) Self instead of thiz?  Ha ha.  Sure.   I was getting a bit 'creative' there, I suppose. Ok, no problem.

(2) I am curious about what the reasoning was for eliminating globals?  Okay, will re-add SocketCreator as a passed-in parameter.

(3) I wrote that code first when I arrogantly thought I was going to change all of Pan2 to use threads.  The code does no harm, it is just all wrappers to primitives in glib -- but okay I'll remove it if its existance is unwanted.  I will still argue in favor of the Mutex class for (6) below..

(4) WorkerPool is not really a singleton -- there's nothing preventing you from creating multiple worker pools, and in fact it might be a good idea in some programs.  A WorkerPool is pretty much a glib thread_pool but it is more C++-ey in how it accepts work and also has the added benefit of implicitly notifying a Listener in the main thread.  Nothing says there should be only 1 WorkerPool in the program.  In this program there is, and it is shared between the GIO SocketCreator and the Decoder task.  So given that reasoning, I am not sure where to put our application's *only* WorkerPool -- it is actually the whole reason I created that globals.h/.cc.  If you want I can easily put a default WorkerPool that's a global in worker-pool.cc/.h, but it sort of bothers my sensibilities.  Another approach is to do the same thing you did with SocketCreator -- that is, pass it around like a hot potato.  Anyway I consider this a minor point (and thus will do whatever you indicate on this one), I just wanted to clarify that the workerpool is not a singleton and as such probably belongs elsewhere. The whole reason I made it a global is that in pan2 we happen to have 1 of them that multiple classes are free to use.  I normally hate globals too -- but sometimes they just make sense.  Also, there is the additional problem that the worker-pool *HAS* to be destructed (or at least cancel has to be called on its workers) or else pan2 doesn't exit properly (more on that later)..  So you want there to be a default worker-pool that is accessed from a static method in class WorkerPool?  Hmm.  I sort of liked WorkerPool to be a standalone component that doesn't make any assumptions about how you use it -- pan2 happens to need a WorkerPool with 4 exclusive threads (actually I can argue that it doesn't even need the exclusivity of the threads).

Please get back to me on (4) above but if it's a big deal to you and your design sensibilites, I can acquiesce on this one.

(5) Actually, it's helpful to use a stop flag.  I am going to strongly argue against this suggestion.  I am not sure how much you tested this on large attachments, but on my machine with my really slow disk, a 50MB attachment can take a WHILE to decode (like 30 seconds to 1 minute)!  If you decide to cancel the task, waiting for UULib to finish is just AWFULLY painful.  It's really nice to be able to stop/cancel/delete tasks ASAP.  

Also, more importantly perhaps: exiting pan2 properly NEEDS this flag.  The reason is that on some OSs the main process won't exit until its threads are done.  This is especially true if you call g_thread_pool_free(), which implicitly waits for running threads to finish.  Being able to tell a thread to abort early via this flag is immensely helpful -- rather than waiting 2 minutes to exit pan properly, you just wait a few seconds.

(6) Yes, I agree that the code is complex (between the fact that you have a timer, a mutex, and a very complicated way of calculating percentage complete -- it's complicated, I know!).  I am, however, forced to put up really strong resistance to this one.  In fact I consider this a HUGE point:

You can try to comment out the status update code and see how pan looks/behaves.   For small tasks you won't notice it.  But for large decode jobs (again, try it on a 50 MB attachment which DO exist -- DL a Knoppix DVD from one of the binaries groups and see!!), it is PAINFUL to not know what's happening. You have this job that takes 10, 20, 30, seconds.. hell on my machine 120 seconds and your brain has this big question mark about whether its working, it failed, or it's just busy.  Initially I had no percentage updates and it was ugly.  You had no idea how long the decode was taking and it just made you unhappy.  Like I said, for small attachments maybe it doesn't matter -- but I suggest you try this on a slow computer with large binaries -- a task that can take upwards of 2 minutes with no status update hugely unsettling when you are not be notified of its status.


Charles, thanks again for the discussion and suggestions.  Any and all thoughts are appreciated.  Oh and my name is "Calin" (it's like Calvin with no V) and not Carin.  ;)

PS: I can look at bug 371405 too, suree -- I am however excited about also getting this feature out to SVN ASAP so please hurry back with your further discussion so we can come to some agreement.

:)






Comment 27 Charles Kerr 2007-03-23 22:04:52 UTC
Sorry about misspelling your name, Calin.

Re (2) to make it easier to pass in a stub class to create dummy sockets
if/when the unit tests grow to that point, to follow the law of demeter,
for a predictable order of destruction for these objects when Pan closes,
and to keep glib from being /too/ pervasive if I ever get around to writing
a native Windows version of the GUI directory.  For similar reasons,
I like your "hot potato" suggestion for (4).

Re (5) and (6) ... okay.  To be honest even very large files decode
quickly for me, so maybe I'm not sensitized to this enough.

Minor Nits
==========

(7) the "(void) data;" lines at the top of TaskArticle::on_work_complete()
and TaskArticle::on_work_cancelled() appear to be cut-and-paste leftovers.

(8) gcc says that item and nitems are unused on task-article-decoder.cc:294.

(9) Every severe error in TaskArticle::on_work_complete() will pop up
a separate error dialog.  It would be better to fold them into a
single message so that the user doesn't get multiple popups, or to
remove log_urgents since we don't need it now.

(10) Since about half of task-article-decoder.cc is from task-article.cc,
I think its copyright notice is misleading.  I'm not sure what the correct
etiquette here is because previous contributors have just put my name in
the copyright attribution.  Of course it's fine for you to keep the
copyright for your code.

(11) Is it alright with you if I relicense Pan under GPL3 at some point
after GPL3 is finalized?
Comment 28 Calin Culianu 2007-03-24 07:04:37 UTC
(1) done.

(2) done

(2) Fair enough, changed.

(3) done -- I still left the lockable.h/.cc file but it just has the Mutex and MutexLocker in it.  

(4) Ok done.  Note that I create a WorkerPool on the stack in main and then pass it around to interested objects.  This actually ended up being anything that creates a new TaskArticle instance, as well as the GIOChannelSocket.  It turns out a lot of things create new TaskArticles, so it did get passed around a bit... but it does follow the pattern of other things in this codebase, so I am assuming that's ok.

(5) and (6) -- yeah that was my whole motivation for this patch (aside from the glory of the credits) -- I really have a horribly slow but huge disk that I use for this and it killed me to wait for it. :)

(7) It's a pet peeve of mine, because I usually compile with -W -Wall.  With those flags gcc complains that data is unused.

(8) done

(9) Eek I didn't know that about Log :: add_urgent!  Yikes!  Ok, deleted.  All errors go to log_errors, which doesn't appear to pop up any dialogs.

(10) You are right and I apologize.  I'm not particularly anal about copyrights and credits, although to be honest I do like to get credit since it is something I can point to when applying for jobs, etc.  At any rate, I apologize and I added you to the copyright notice for the file since the meat of the decode routine is yours.

(11) Sure!  As long as it's GPL.  I actually even modified the copyright notice on the files I have my name on to say GPL v2 or (at your option) any later version.


Ok, my patch is coming up...


Comment 29 Calin Culianu 2007-03-24 07:05:37 UTC
Created attachment 85213 [details] [review]
Latest version of my decoder thread patch -- apply against svn rev 205

Addressed discussion points raised by Charles, tested and works at least as well as my last revision of this patch.
Comment 30 Calin Culianu 2007-03-24 23:50:16 UTC
Created attachment 85237 [details] [review]
Latest version of my decoder thread patch -- apply against svn rev 205

Slightly tweaked decoder update percentage to be more accurate.
Comment 31 Charles Kerr 2007-03-25 02:48:58 UTC
Created attachment 85240 [details] [review]
Revised version of decoder thread patch

Calin: I like your last patch and am submitting this revision for your review and further revision.  Please revise and/or critique as needed... it's nice to have someone to bounce ideas off of.

Main changes
============

- The WorkerPool gets to TaskArticle and NNTP via Queue now, so we don't have to pass it around into gui, nzb, TaskArchive, etc.

- Queue::upkeep() and Queue::get_all_task_states() regressed on bug #352170 (short summary: avoid foreach(_tasks) in frequently-called functions because huge queues bog them down) so instead I've made the WorkerPool analogous in Queue to NNTP -- a resource that a task requests, is loaned, and checks back into the queue, which triggers a handler to loan the resource to another task.  As a side-effect we can remove Queue's Progress::Listener, which seems to fix the crash that required the delete-task-on-idle workaround.

- I've tried to make the copyrights correct and understandable, after someone pointed out to me how the About dialog's "Copyright 2002-2006 Charles, Copyright 2007 Calin" reads. =)

- All the unused code and unneeded #includes I could find have been removed.  This doesn't mean I have anything against that code -- just that we don't need it yet.
Comment 32 Charles Kerr 2007-03-25 02:56:32 UTC
Whoops, I messed up on the diff that made the patch in comment #31.  New patch coming...
Comment 33 Charles Kerr 2007-03-25 03:02:17 UTC
Created attachment 85241 [details] [review]
Revised version of decoder thread patch - - apply against svn rev 205
Comment 34 Charles Kerr 2007-03-25 17:30:13 UTC
Created attachment 85268 [details] [review]
Revised version of decoder thread patch - - apply against svn rev 206

Minor cleanup in task-article's state management.  Fix some #inlcudes and remove some of my own dead code in task-article.
Comment 35 Charles Kerr 2007-03-25 19:28:14 UTC
Calin: Is it intentional that WorkerPool's dtor doesn't free the Work structs that are waiting in the threadpool queue?
Comment 36 Calin Culianu 2007-03-25 20:32:55 UTC
Yes, it is intentional, because the Work structs that are running are still active, so there is no graceful way to delete them (if you delete them in the d'tor you are likely to get a segfault).

What should hopefully be happening is that for all the active Work structs, 'finalize()' will end up running on them, which will delete them.

At least this is the hope -- I called g_thread_pool_free(tpool, false, true) which should wait for the pool to complete any pending work before destroying it.

However this made me realize that there is a potential bug -- the work->pool->my_workers.erase() in finalize (which I added later) is completely going to crash the program.  Probably what should happen is that work->pool should be set to null in the case where work was running while the workerpool was destructed.
 
-Calin
Comment 37 Calin Culianu 2007-03-25 20:55:44 UTC
Charles, thanks for modifying my patch.  I like th work you did with the Queue object as it reads easier and is probably more in line with the way you designed it.

Some questions/comments:

1. How do I apply this patch?  The patch program complains about its format.  I presume you generated it with svn.  I am not an svn power-user, so tell me how to apply it?  

2. I noticed you removed the code from the task-pane.cc which generated specific UI messaging for "Queued for Decode" and "Decoding" steps, and also made decoding tasks bold (so that they appear to the eye as active tasks, like the downloading tasks do).  That's somewhat unfortunate since I liked being able to see in the task queue which tasks were a. decoding, b. enqueue for decode and c. actually downloading.  Can you please add that code back?  I really liked that from a UI 
perspective.  Anyway since I can't use your patch yet, I have no idea what the UI actually *Looks* like but I am assuming it will be this enigmatic "Queued 98%" for "Queued for Decode" tasks...?


Comment 38 Charles Kerr 2007-03-25 22:53:51 UTC
Created attachment 85280 [details] [review]
Revised version of decoder thread patch - - apply against svn rev 206

I generated the patch this way:
% cd pan2
% svn diff > ~/decode.patch

Applying the patch:
% svn checkout http://svn.gnome.org/svn/pan2/trunk pan2
% cd pan2
% patch -p0 < ~/decode.patch

About the WorkerPool, I was thinking less about the active workers
than the ones that hadn't been given a thread yet.  When WorkerPool
is destroyed their `Work' structs are leaked their `worker' is never
deleted, and the clients are never given a chance to clean up via 
on_work_{cancelled,destroyed}.

Regarding the UI changes in TaskPane... I was going to ask you
about that.  The issue is bug #352170 again, where I learned the
hard way to not loop through _tasks in the GUI and Queue upkeep
functions.  The old queue states (stopped, running, etc.) are set
by the Queue (not the Task), so Queue::get_all_task_states() is
very fast.  The new state `decoding' is set by the Queue too now,
so that's easy -- changing get_all_task_state()'s "if (decode_task)"
line to `DECODING' and adding your TaskPane tweaks gets us there
with no trouble.  (This change is in the attached patch.)

The problem with QUEUED_FOR_DECODE is that is comes from the Task, not
the Queue, so there's (currently) no way for Queue to know that state
without walking through _tasks.  We ought to add a Task::Listener or
class so that Queue can listen to state changes.  This way Queue could 
have a "std::set<Task*> _queued_for_decode" field to speed up 
get_all_task_states() and find_first_task_needing_decoder().

I'm going to be offline for the rest of the night, so if you want
to dig into any of these issues that would be great. =)

cheers,
Charles
Comment 39 Calin Culianu 2007-03-25 23:37:41 UTC
Hi Charles, 

Regarding the WorkerPool and threads that haven't been given a thread yet: my understanding is that the two scenarios of: assigned work running currently in a thread, or, assigned work waiting for a thread are pretty much the same (from the perspective of glib)?  Enqueued work (in either scenario) will eventually run even even after g_thread_pool_free() was called (if you tell g_thread_pool_free to wait for the work to finish with the second and third parameters)..?

At least that was my assumption from my reading of the API specs to glib.  If it's the case that g_thead_pool_free(pool, 0, 1), despite claiming that it waits for enqueued work to complete, actually doesn't wait for enqueued work that hasn't yet gotten a chance to run, /then/ we have a problem.  

I will work on this and test it to see how it behaves, and address it any way I can. 

As for the UI changes, thanks for being open to updating task-pane.cc again to be more informative (I sort of like knowing more about the actual things the tasks are doing).

I understand about not walking through the task list -- from reading bug#352170 it looks like updating the Task Pan all the time was causing it to slow down?  Or also perhaps calling process_task() on each task?  I actually meant to ask you about that (having read the commends in the code about that) and sort of just hoped you'd notice it in the patch and suggest a way to avoid that scenario.  

As a side-note: why not walk through the task-list in the Queue::get_all_task_states() call?  Already calling the 'setme.tasks.insert()' call in there is O(n)... in effect the entire task list is already being walked, albeit inside libstdc++ and not inside this code.. but it's being walked as an O(n) operation.  Or am I misunderstanding something?  

But barring asking the Task object in Queue:get_all_task_states(), then we can also have the Task object fire off an event that Queue listens to when it is waiting for decode, and yes, maintain a set of tasks that Queue categorizes as  QUEUED_FOR_DECODE.

Thanks for being amenable to the UI changes and I'll work on these tonight and hopefully have a revised patch based on your latest patch (for rev 206) sometime today or tomorrow..

Thanks!




Comment 40 Charles Kerr 2007-03-26 02:13:17 UTC
Well, I wasn't /going/ to be online tonight, but here I am... :)

Anyway, looking over get_all_task_states(), I think you're right that
another walk through _tasks wouldn't be the end of the world, particularly
since we're just comparing an integer to get the work state, and it's
simpler than coding up more listeners.

Now that I look at the glib comments for g_thread_pool_free(), it looks
like you've got that right as well.  Looks good!
Comment 41 Charles Kerr 2007-03-26 02:50:55 UTC
Created attachment 85289 [details] [review]
Minor revision to fix "Queued 98%" message.  Also, remember to svn add decoder/locking/workerpool before running the diff.
Comment 42 Calin Culianu 2007-03-26 08:21:48 UTC
Created attachment 85302 [details] [review]
Modified Charles' latest revision to add some UI stuff, fix potential bugs, etc

- Added manual walk of tasks in Queue::get_all_task_states in order to create a vector of need_decode tasks (for purposes of more informative UI).  (This actually could be faster than the insert() method that was there previously because now I am making sure to reserve space in the output vector.)
- Modified task-pane.cc to use information from above to say "Queued for Decode" for tasks that are waiting for the decoder thread
- Fixed incorrect assertion in ~TaskArticle()  (sometimes _decoder is not NULL if pan is being quit and the decoder thread didn't have a chance to stop yet)
- Made WorkerPool not crash if it is deleted while workers are running
- Guarded against division by zero in case of UULib giving strange answers to p->numparts in decoder.cc (this is more paranoia than a real bug)
- Made taskbar progress view of decoder task also contain filename that is being decoded


Otherwise this is pretty much Charles' patch to my patch and it looks pretty solid!!
Comment 43 Calin Culianu 2007-03-26 08:29:24 UTC
Ok Charles, I like the way you reorganized the relationship between the Queue and the decoder task.  The Decoder::Source idea is pretty elegant... I am jealous it didn't occur to me to do that initially, especially in light of the already-existing NNTP::Source... it really is a good solution to the problem of ownership of the decoder, checking it in and out, etc.

Anyway, I modified your patch slightly, as outlined above.

Also, I did actually go ahead and verify that our reading of the glib API is correct and yes, calling g_thead_pool_free(pool, 0, 1) on a running pool will in fact make the calling thread wait for all threads, currently running /and/ enqueued, to complete before returning from the call.

Cheers and I hope that's all she wrote for this patch!  Thanks so much for working with me on this (and as a result, for helping to get it in the mainline code quickly).

If this is done.. then I probably can move on and help fix some other bugs in pan, free time permitting

-Calin


Comment 44 Calin Culianu 2007-03-26 08:44:07 UTC
Created attachment 85303 [details] [review]
Modified Charles' latest revision to add some UI stuff, fix potential bugs, etc

Oops!  I forgot to clear Decoder::current_file in Decoder::init() which is a stateful variable used for ProgressView update..  This fixes that.
Comment 45 Calin Culianu 2007-03-26 09:31:45 UTC
Created attachment 85306 [details] [review]
Oops!  Two more bugs fixed -- apply against svn 206

Oops!  Two more bugs:  

- IMPORTANT: Queue::find_first_task_needing_decoder() erroneously forgot to check if the the candidate task it is selecting is stopped or removing before returning.  

This led to, on certain slow systems (such as mine, where decoder tasks are visible for a long time) "stopped" decoder tasks at the beginning of the Queue not allowing any further decoder tasks to run!

- PARANOIA BUG: Fixed potential crash (if code is ever reorganized) because decoder worker might actually call Listener::was_cancelled() on a deleted TaskArticle *.  To avoid this, added the more forceful WorkerPool::Worker::gracelessly_quit() method, called from ~TaskArticle, which should only be called in extreme circumstances and causes WorkerPool::Worker::Listeners to not be notified when the worker finishes.  (Note that this bug can't happen currently as the main_loop exits before the TaskArticles are deleted).
Comment 46 Calin Culianu 2007-03-26 09:47:10 UTC
Created attachment 85307 [details] [review]
One more tweak -- fixed task pane window title counts

Sorry to keep spamming with patches.  I keep finding minor issues and bugs as I am actually using pan actively with this patch.  :)

This fixes the task pane to properly show the right task counts based on Queue::TaskState
Comment 47 Charles Kerr 2007-03-26 20:49:21 UTC
Created attachment 85347 [details] [review]
another minor revision

looking good.  a couple of minor changes:
- added "is this task being removed?" test to find_first_task_needing_decode().
- removed unused Queue::get_worker_pool()
- removed unused WorkerPool::cancelAllWorkers()
Comment 48 Calin Culianu 2007-03-26 21:32:40 UTC
Acknowledged.  Let me know if anything else comes up esp. with this patch... Cheers!
Comment 49 Calin Culianu 2007-03-26 21:44:29 UTC
Wow.  New patch refuses to compile.  Apparently g_mime_utils_8bit_header_decode() takes a const unsigned char * as its argument, not a const char *.  

utf8-utils.cc: In function 'std::string pan::header_to_utf8(const pan::StringView&, const char*, const char*)':
utf8-utils.cc:189: error: invalid conversion from 'const char*' to 'const unsigned char*'
utf8-utils.cc:189: error:   initializing argument 1 of 'char* g_mime_utils_8bit_header_decode(const unsigned char*)'


Apparently on my version of glib (2.12.9) it won't compile.  Do I need to revert to an older version?

Why did you change that call anyway?  Was it causing problems with older glibs?



Should I fix this or shall you?

Comment 50 Charles Kerr 2007-03-26 22:52:57 UTC
I changed it because the old code wouldn't compile for me after I upgraded to GMime 2.2.5 this morning.  Looks like this change in GMime in February is the culprit: http://svn.gnome.org/viewcvs/gmime/trunk/gmime/gmime-utils.h?view=diff&r1=1075&r2=1076

I'll talk to fejj about the change.  In the meantime just clip that part out of the diffs or add the cast back in by hand.
Comment 51 Calin Culianu 2007-03-26 23:15:44 UTC
Acknowledged.. worst case is if they have two different type signatures for the function amongst different versions you may have to resort to #ifdefs, etc.  :(

As far as what gmime is doing.. *shrugs*.. I personally am all about being strict with types but sometimes it's best to let sleeping dogs lie... We probably won't be the first project to stop compiling due to this change.



Comment 52 Charles Kerr 2007-03-26 23:21:36 UTC
(As an aside, I've filed a ticket (bug #423147) on GMime for this.  If fejj wants to stick with the change, we can wrap the line in a GMIME_CHECK_VERSION macro.  If he backs his change out for 2.2.6 then we'll leave the line alone.)
Comment 54 Rinaldi J. Montessi 2007-03-27 01:16:25 UTC
Something broke.  I installed the latest patch to svn 206 and could neither view images nor read messages.  Dumped the tree, svn'd 207, rebuilt with no patch and things work as designed.  Any info I might provide to help?

Slackware 11.0 (kernel 2.6.15)
gmime-2.2.4

Note:  I use the tabbed view.  If I double clicked a message in the header pane, the view pane opened blank.  Clicking Article > Read Article would render the image/text but neither spacebar nor Next Article icon would display the next message.  Additionally the Queue count would raise accordingly showing 96-98% complete.
Comment 55 Darren Albers 2007-03-27 02:08:26 UTC
Rinaldi, if you have svn 207 that should include the patch already so you shouldn't have to do anything.
Comment 56 Calin Culianu 2007-03-28 23:58:06 UTC
Hey Rinaldi,

I just happened to run into fejj (Jeffrey Stedfast) of gmime and he says he just caught a nasty bug that appears in gmime-2.2.4 and 2.2.5 only -- which exactly explains your inablity to open messages.  I created a ticket for it: bug #423951.

This bug is in gmime entirely, and there's not even a bugzilla ticket for it!  

From your symptoms I think this is the reason for the bug.

Either switch gmime versions (anything but 2.2.4 and 2.2.5) *or* wait for Jeff to release 2.2.6 of gmime (which he says is to be imminent due to the severity of this bug).

Note: switching gmime versions may require a rebuild of pan?


Comment 57 Rinaldi J. Montessi 2007-03-29 00:18:15 UTC
Thanks, Calin.  Since it occurred here in svn 206 with the decode patch applied but not in 207 which contains the patch, a look at the code before and after inclusion may help him determine what causes it.
Comment 58 Calin Culianu 2007-03-29 00:26:30 UTC
Oh wait I didn't understand that it works sometimes.  

But to clarify -- svn 207 actually already contains the threaded decoder patch!  So it's probably not due to this patch (open another ticket for this if you like and cc me on it?)

Well, since it works sometimes, maybe it's not related to gmime -- although I still suggest you change gmime versions (anything other than 2.2.4 or 2.2.5) as soon as you can, and try it again.


Comment 59 Rinaldi J. Montessi 2007-03-29 01:30:24 UTC
Are you saying the patch to 206 was incorporated without change into svn 207?  If so I cannot explain the random occurrence.
Comment 60 Calin Culianu 2007-03-29 02:54:15 UTC
Yes, rev. 207 most definitely contains the threaded decoding patch.

Yes, the random occurrence is problematic.  Perhaps some specific usage pattern trips this bug?   

I am still partial to the gmime theory -- which is why it would be nice to rule it out or to confirm it either way -- can you try a different gmime? (Other than 2.2.4 or 2.2.5)










Comment 61 SciFi 2007-03-29 07:17:45 UTC
(In reply to comment #60)
> can you try a different gmime? (Other
> than 2.2.4 or 2.2.5)

fwiw I stayed with 2.2.4 all this time (for these decoder thread tests).

OH, gmime-2.2.6 was released late this afternoon.
your bugreport #423951 mentioned it, sho'nuf it's there at his d/l site.
I'm fixin' to give it a try...  ;)