GNOME Bugzilla – Bug 357981
improve multitasking, task queuing, few bugs
Last modified: 2006-09-28 00:53:33 UTC
Because of the lack of documentation, this post is a crossover between a bug report and an enhancement proposal. I tested Pan with about 20 news servers, subscribing to 5 binary and 5 text groups. I noticed the following:

1 [feature] The tasks list doesn't display which tasks are actually running (i.e. the status bar says that 8 tasks are running, but only 3 are displayed in the status bar, and the tasks bar doesn't really refresh its data).

2 [bug] Tasks don't seem to react when they are moved to the top in the tasks window (one would expect that tasks moved to the top will be performed earlier).

3 [feature] Multipart messages are probably pulled from one server only.

4 [feature] I'm not sure how Pan handles messages that are available on more than one server, but to optimize throughput I would assume that the task scheduler should choose which server to pull the article from based on the server's average speed (and its average speed over, say, the last 30 minutes of usage), the article's age, and the number of free connections to the server.

5 [feature] Parallelize getting new headers. Again, I'm not sure how this is handled by Pan, but I would expect that with 20 servers, 10 newsgroups and usually 4 connections, the download of new articles should really hit the bandwidth, but all I have seen was a few kB increase in load.

6 [bug] Not every one of the 20 news servers carries each of the 10 groups. Pan doesn't seem to recognize this, because in the event log I had about 300 "unable to set group" events (411 no such group errors).

7 [bug] Pan doesn't seem to care if the server requires authorization. Instead of disabling access to the server (or trying to intelligently throttle access) and giving a warning to the user, it happily hammers the server, trying over and over to download stuff.

8 [feature] The left pane with newsgroups:
- Subscribed newsgroups could be an expandable menu as well, with a listing of the servers carrying the group plus the number of articles of the group on each particular server. Servers not carrying the newsgroup wouldn't be displayed, or would be grayed out.

9 [feature] The left pane could have a switch for changing between a newsgroup list (as now), the task scheduler, the event list, and a server list (each server might again be an expandable menu showing which subscribed groups the server carries).

10 [feature] The event list could do with some more details (which server the error occurred on, the newsgroup, the article ID... whatever you can think of).

11 [feature] The event list/task list and bandwidth meter could have some graphical representation (a progress bar for running tasks, a speed meter).

...These are a few I could think of in a short while. While it's a lot, Pan still seems to be the best newsgroup client for Linux around, even in its beta stage. Thanks!

PS: Sorry that I didn't submit this as separate posts, but the text flow is hard to keep if you have to break the text into several tiny pieces.
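Items 6 and 7 above describe Pan retrying servers that have already answered "no such group" or demanded authorization. A minimal sketch of the bookkeeping the report asks for, assuming the standard NNTP response codes 411 (no such newsgroup) and 480 (authentication required); the class and method names are hypothetical and are not taken from Pan's source:

```python
from collections import defaultdict

NO_SUCH_GROUP = 411   # NNTP response: no such newsgroup
AUTH_REQUIRED = 480   # NNTP response: authentication required


class ServerGroupTracker:
    """Hypothetical helper: remembers failures so they are not retried."""

    def __init__(self):
        self.missing = defaultdict(set)  # server -> groups it does not carry
        self.needs_auth = set()          # servers paused until credentials are set

    def record_response(self, server, group, code):
        """Note a failure code returned while selecting `group` on `server`."""
        if code == NO_SUCH_GROUP:
            self.missing[server].add(group)
        elif code == AUTH_REQUIRED:
            self.needs_auth.add(server)

    def should_try(self, server, group):
        """False once the server has refused this group or demanded a login."""
        if server in self.needs_auth:
            return False
        return group not in self.missing.get(server, ())

    def servers_for(self, group, servers):
        """Filter `servers` down to those still worth asking for `group`."""
        return [s for s in servers if self.should_try(s, group)]
```

With something like this in place, a server that returned 411 for one group would still be used for the other groups, while a server that returned 480 would be skipped entirely until the user is warned and supplies credentials.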
I'm closing this ticket as `invalid' not because your comments are invalid (some are very good) but because bugzilla is ill-suited for lists like this. I'd like to see some of these items moved into their own tickets:

* Please open a new bug ticket for #10 with examples of the error messages you got that were insufficient and what you think they should've said instead. I think your suggestion is a good idea.

* Please also open tickets for #6 and #7 if they're repeatable behavior for you, since I'm not sure what's causing them and we can look into it if you like.

The remaining points are addressed below.

Short answer: what are you doing with 20 news servers?? ;)
Most commercial servers can fill your bandwidth with 4-8 connections; the rest is just wasted resources. This will slow down the task queue, connection management, and so on. This is likely the cause of any slowdown you're seeing.

> 1 [feature] The tasks list doesn't display which tasks are actually running
> (i.e. the status bar says that 8 tasks are running, but only 3 are
> displayed in the status bar, and the tasks bar doesn't really refresh
> its data).

The statusbar is horizontal space, so 8 tasks would be unreadable. The tasks that list how many KiB/sec they're getting are the ones running.

> 2 [bug] Tasks don't seem to react when they are moved to the top in the
> tasks window (one would expect that tasks moved to the top will be
> performed earlier).

They will be, eventually. There's no way to interrupt an NNTP command, so Pan has to wait for the server to finish its current task before returning the connection to the pool for reuse. It's likely that the overhead of 80 connections slows this down.

> 3 [feature] Multipart messages are probably pulled from one server only.
>
> 4 [feature] I'm not sure how Pan handles messages that are available on more
> than one server, but to optimize throughput I would assume that the task
> scheduler should choose which server to pull the article from based on the
> server's average speed (and its average speed over, say, the last 30 minutes
> of usage), the article's age, and the number of free connections to the server.

These are two sides of the same coin. Pan pulls multipart messages from as many servers as it can use. Multiserver is handled by a connection pool. Whenever a connection is returned to the queue it's instantly snatched up by whatever task needs a connection. This is simpler than averaging things and works well since each connection gets used as much as the server will allow.

> 5 [feature] Parallelize getting new headers. Again, I'm not sure how
> this is handled by Pan, but I would expect that with 20 servers,
> 10 newsgroups and usually 4 connections, the download of new articles
> should really hit the bandwidth, but all I have seen was a few kB
> increase in load.

Pan does parallelize getting new headers. It's likely that the overhead of 80 connections slows this down.

> 8 [feature] The left pane with newsgroups:
> - Subscribed newsgroups could be an expandable menu as well, with a
> listing of the servers carrying the group plus the number of articles of
> the group on each particular server. Servers not carrying the newsgroup
> wouldn't be displayed, or would be grayed out.

This would be insanely complex for the average case of [1..3] servers. You do realize that 20 news servers is extremely unusual?

> 9 [feature] The left pane could have a switch for changing between a
> newsgroup list (as now), the task scheduler, the event list, and a server
> list (each server might again be an expandable menu showing
> which subscribed groups the server carries).

See 8.

> 11 [feature] The event list/task list and bandwidth meter could have some
> graphical representation (a progress bar for running tasks, a speed meter).

This would be nice. I did try this out, but it made the task pane very slow. Showing % done and overall speed as text, as it does now, gives the same info without bogging down the system.

> ...These are a few I could think of in a short while.
> While it's a lot, Pan still seems to be the best newsgroup
> client for Linux around, even in its beta stage. Thanks!

Thanks. :)
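The connection-pool behavior described in the answers to #2 and #3/#4 can be illustrated with a small model: a returned connection goes straight to the first queued task that can use that server, and reordering the queue only takes effect the next time a connection comes back. This is a sketch under those stated assumptions, not Pan's actual code; `ConnectionPool`, `submit`, `release`, and `move_to_top` are hypothetical names.

```python
class ConnectionPool:
    """Toy model: idle connections are handed to queued tasks in queue order."""

    def __init__(self):
        self.idle = []     # server names that currently have an idle connection
        self.waiting = []  # queued (task_name, usable_servers), first = top

    def submit(self, task_name, servers):
        """Queue a task that can be served by any of `servers`."""
        self.waiting.append((task_name, set(servers)))
        return self._dispatch()

    def release(self, server):
        """A connection finished its NNTP command and returns to the pool."""
        self.idle.append(server)
        return self._dispatch()

    def move_to_top(self, task_name):
        """Reorder the queue; running commands are not interrupted."""
        for i, task in enumerate(self.waiting):
            if task[0] == task_name:
                self.waiting.insert(0, self.waiting.pop(i))
                return

    def _dispatch(self):
        """Pair queued tasks with idle connections, top of the queue first."""
        started = []
        for task in list(self.waiting):
            name, servers = task
            match = next((s for s in self.idle if s in servers), None)
            if match is not None:
                self.idle.remove(match)
                self.waiting.remove(task)
                started.append((name, match))
        return started
```

In this model a task moved to the top does not start immediately; it starts as soon as any server it can use returns a connection, which matches the "they will be, eventually" answer above.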
Hi,

I will look into opening the tickets you asked for tomorrow, if you don't mind. For now I would like to respond to the other points you commented on (not all of them; some of your replies settled everything):

> The remaining points are addressed below.
> Short answer: what are you doing with 20 news servers?? ;)
> Most commercial servers can fill your bandwidth with 4-8
> connections; the rest is just wasted resources. This will
> slow down the task queue, connection management, and so on.
> This is likely the cause of any slowdown you're seeing.

Good question :) Well, I'm a scientist, and one of my side hobbies concerns internet traffic analysis, search engines and the "invisible web". For most of my work I use custom-written scripts, but sometimes they are too inconvenient. Apart from Pan I also use a Windows newsreader under QEMU, because I usually connect to some 70 commercial as well as free servers, and on some occasions to a few hundred servers. I still keep using the Windows software mainly because it easily scales its connections to the extent I need without crashing the system. I didn't look at Pan's source, but if there is so much overhead with each server, then maybe something could be done better (I won't really be much help; I don't know C/C++ well enough to be able to help you directly).

Oh, the connection... I usually use either an 11 Mbps or a 2 Gbps backbone connection; the latter is somewhat hard to fill.

> The statusbar is horizontal space, so 8 tasks would be unreadable.
> The tasks that list how many KiB/sec they're getting are the ones running.

Yup, well, that's why it might be more interesting to move the content of the status bar to the left pane and have some buttons to switch between the list of groups and the status stuff. This would save you rendering time, and you could use the status bar for some global statistics graphs (i.e. a connection meter, number of tasks running, and so on).

>> 2 [bug] Tasks don't seem to react when they are moved to the top in the
>> tasks window (one would expect that tasks moved to the top will be
>> performed earlier).
> They will be, eventually. There's no way to interrupt an NNTP command,
> so Pan has to wait for the server to finish its current task before
> returning the connection to the pool for reuse.
> It's likely that the overhead of 80 connections slows this down.

True. On the other hand, servers usually respond in a few seconds. I don't know how long Pan waits before it considers that a request has timed out. After a brief look into the settings I suggest:

n.1 [feature] Set a timeout for an NNTP request in the preferences. If the answer arrives after the timeout, send the packets to /dev/null.

This timeout would really help with points 3 and 4 while preserving the current polling setup.

Maybe it would be a good thing to separate running and queued tasks in the list. Running tasks just have to run until they finish or time out, but queued tasks should be easily re-sortable, shouldn't they?

>> 5 [feature] Parallelize getting new headers. Again, I'm not sure how
>> this is handled by Pan, but I would expect that with 20 servers,
>> 10 newsgroups and usually 4 connections, the download of new articles
>> should really hit the bandwidth, but all I have seen was a few kB
>> increase in load.
> Pan does parallelize getting new headers.
> It's likely that the overhead of 80 connections slows this down.

Hmm, I really can't help it; this doesn't seem to work well. Naturally I can't compare it with my blunt PHP/Python scripts, which fork easily into the hundreds. But the Windows software I mentioned doesn't seem to have much of a problem getting many newsgroups at once. Maybe there could be another:

n.2 [feature] Settings: have the possibility to set how many
- new headers can be fetched at once
- new newsgroups/list newsgroups requests can run at once
- articles can be pulled at once

>> 8 [feature] The left pane with newsgroups:
>> - Subscribed newsgroups could be an expandable menu as well, with a
>> listing of the servers carrying the group plus the number of articles of
>> the group on each particular server. Servers not carrying the newsgroup
>> wouldn't be displayed, or would be grayed out.
> This would be insanely complex for the average case of [1..3] servers.
> You do realize that 20 news servers is extremely unusual?

Yes, I do :) Well, the reason I asked for this is mainly the group that was requested over and over again even though it didn't exist on the server. Being able to see which servers are used to access a group would allow for an easy manual override.

>> 9 [feature] The left pane could have a switch for changing between a
>> newsgroup list (as now), the task scheduler, the event list, and a server
>> list (each server might again be an expandable menu showing
>> which subscribed groups the server carries).
> See 8.

I really can't tell how much complexity this would add. As I mentioned, I am rather poor at programming in C/C++.

>> 11 [feature] The event list/task list and bandwidth meter could have some
>> graphical representation (a progress bar for running tasks, a speed meter).
> This would be nice. I did try this out, but it made the task
> pane very slow. Showing % done and overall speed as text, as
> it does now, gives the same info without bogging down the system.

I'd guess it's all about the refresh rate. You don't really need to update the stats every millisecond. Every 2 or 3 seconds should do pretty well, and you would really need an archaic machine for that to make the app very slow.

... The Windows software I was speaking about is Usenet Explorer. Basically it's pretty ugly because it has an overkill of features, but some of them I really consider nice, especially the ability to scale well :)

... Overhead problem: if you can give me details on what exactly the overhead consists of (if it's Usenet/network related), maybe I can have a look at whether there isn't an easier way of doing things (if it's C/C++ related, forget it :)).

Anyway, if you want to, feel free to email me if you don't like using bugzilla for this type of talk.

Good luck,
Pavel
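The refresh-rate suggestion in the comment above (redraw the task and speed display every 2 or 3 seconds rather than on every update) can be sketched as a small throttle. This is an illustration only, assuming a caller-supplied `redraw` callback; `ThrottledRefresh` is a hypothetical name and has nothing to do with Pan's actual task pane code.

```python
import time


class ThrottledRefresh:
    """Calls `redraw` at most once per `interval` seconds."""

    def __init__(self, redraw, interval=2.0, clock=time.monotonic):
        self.redraw = redraw      # hypothetical callback that repaints the display
        self.interval = interval  # minimum number of seconds between repaints
        self.clock = clock        # injectable so the behavior can be tested
        self.latest = None        # most recent stats, drawn or not
        self.last_draw = None     # time of the last repaint, None before the first

    def update(self, stats):
        """Record new stats; repaint only if the interval has elapsed."""
        self.latest = stats
        now = self.clock()
        if self.last_draw is None or now - self.last_draw >= self.interval:
            self.last_draw = now
            self.redraw(stats)
            return True
        return False
```

One limitation of this sketch: an update that arrives inside the interval is only stored in `latest`, so a real UI would also need a timer that repaints the final value once the interval has passed.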