GNOME Bugzilla – Bug 579159
Add pidgin log store
Last modified: 2010-08-07 18:27:15 UTC
http://git.collabora.co.uk/?p=user/jonny/empathy.git;a=shortlog;h=refs/heads/log-sources <jonnylamb> cassidy: Not merged, not updated, needs a bit of work. <jonnylamb> cassidy: I should split out the pidgin log store and get that merged as it appears to work.
The right branch is: http://git.collabora.co.uk/?p=user/jonny/empathy.git;a=shortlog;h=refs/heads/log-stores
Created attachment 132762 [details] [review] diff configure.ac | 24 + libempathy-gtk/empathy-chat.c | 9 - libempathy/Makefile.am | 16 + libempathy/empathy-log-manager.c | 947 ++++++++++---------------------- libempathy/empathy-log-manager.h | 107 ++-- libempathy/empathy-log-store-empathy.c | 714 ++++++++++++++++++++++++ libempathy/empathy-log-store-empathy.h | 66 +++ libempathy/empathy-log-store-pidgin.c | 713 ++++++++++++++++++++++++ libempathy/empathy-log-store-pidgin.h | 64 +++ libempathy/empathy-log-store-sqlite.c | 625 +++++++++++++++++++++ libempathy/empathy-log-store-sqlite.h | 63 +++ libempathy/empathy-log-store.c | 156 ++++++ libempathy/empathy-log-store.h | 94 ++++ src/empathy.c | 8 + 14 files changed, 2899 insertions(+), 707 deletions(-)
Marking the patch as need-works as its author said the branch wasn't ready yet.
*** Bug 562343 has been marked as a duplicate of this bug. ***
*** Bug 586930 has been marked as a duplicate of this bug. ***
Jonny Lamb: Does your branch support importing the pidgin logs into Empathy, or only viewing them while leaving them in pidgin? I'd prefer to import old logs and move/convert them into Empathy's standard log storage.
My branch allows viewing of pidgin logs, not importing them. I don't see the point in importing the logs.
(In reply to comment #7) > My branch allows viewing of pidgin logs, not importing them. I don't see the > point in importing the logs. I have around 2 years worth of logs in pidgin that is stopping me from moving to empathy because I refer to them regularly. They are a log (like email) of conversations I have with customers that are there for historical reference. I wouldn't move to empathy until I can have them all in the same place.
Peter: The point here is if we import logs, we get lots of sync problems. What if we import the first time empathy is started, then the user continue using pidgin sometimes? We'll have to import and merge new logged message, etc... Which really hard and will be bogus for sure. Our solution is to let pidgin logs in place, but add support to read them from empathy. That's really what you need ;)
Oh. I wouldn't expect it synced. Just an initial import as part of the account migration so that i have the history and it would be a clean swap. I don't think something that syncs is realistic.
(In reply to comment #8) > (In reply to comment #7) > > My branch allows viewing of pidgin logs, not importing them. I don't see the > > point in importing the logs. > > I have around 2 years worth of logs in pidgin that is stopping me from moving > to empathy because I refer to them regularly. They are a log (like email) of > conversations I have with customers that are there for historical reference. I > wouldn't move to empathy until I can have them all in the same place. Exactly the same situation here. I don't want to have to keep .purple/logs around for empathy to read; I want all my logs in Empathy's native storage mechanism.
(In reply to comment #10) > Oh. I wouldn't expect it synced. Just an initial import as part of the account > migration so that i have the history and it would be a clean swap. I don't > think something that syncs is realistic. Agreed. I would only need this as part of a one-time migration away from pidgin.
There is no point in doing one-time import if we can just read from pidgin's native format. I doesn't matter if you have to keep .purple/ that's hidden directory anyway. That will be transparent to the user ;)
(In reply to comment #13) > There is no point in doing one-time import if we can just read from pidgin's > native format. I doesn't matter if you have to keep .purple/ that's hidden > directory anyway. > > That will be transparent to the user ;) If I run Empathy, and don't run Pidgin, I don't expect to have to keep the Pidgin log directory around. I don't consider that transparent at all.
You could offer the user a choice on what to do. Something to this effect (not this exact wording, necessarily): "You have Pidgin logs. Would you like to move these files to the Empathy folder? If you are switching to Empathy, you probably want to move the files. Either way, you will be able to view the logs in Empathy." If the user answers "move", ~/.purple/logs would be rename()d to somewhere in Empathy's directory and a symlink would be created so Pidgin would still work. If the rename() fails, you should tell the user that they need to manually copy the folder. (This would happen if the directories were on different partitions.) For the record, I think converting files is a terrible idea unless the conversion is 100% lossless. Also, there is a good chance some people will continue to use Pidgin for a while (on older systems sharing/syncing a home directory or on Windows). For example, if I switched to Empathy, I'd still have to run Pidgin on Windows. Richard (the Pidgin logging code "maintainer")
From what I can see (using Empathy 2.26.1 on Ubuntu Jaunty), the transformation would NOT be lossless. For example, the Pidgin logs store the timezone of the conversation, where Empathy stores UTC and displays in local time. This means if you travel to a new timezone, all of your old logs display with incorrect times.
(In reply to comment #16) > the Pidgin logs store the timezone of the > conversation, where Empathy stores UTC and displays in local time. This means > if you travel to a new timezone, all of your old logs display with incorrect > times. Based on that description, it sounds like Empathy would display the *correct* time for the old logs, just translated into your local timezone, which seems useful; it just wouldn't tell you *where* the conversation occurred. :)
We obviously differ on which behavior is correct, but storing the timezone information (as Pidgin does) allows for either behavior. Empathy's current data storage format does not, so the transformation would be lossy.
(In reply to comment #18) > We obviously differ on which behavior is correct, but storing the timezone > information (as Pidgin does) allows for either behavior. Empathy's current data > storage format does not, so the transformation would be lossy. Either way, I can live with Empathy's behavior, and I'd still rather have logs in the native format. If that native format needs improving for other reasons, that seems like a separate issue.
I would prefer to move my logs so i no longer have to keep backing up my .purple folder....
(In reply to comment #20) > I would prefer to move my logs so i no longer have to keep backing up my > .purple folder.... Exactly.
http://git.collabora.co.uk/?p=user/jonny/empathy.git;a=shortlog;h=refs/heads/pidgin-store I separated out my branches so this one only has the pidgin store in it. The SQLite one is far from ready. review plz
Created attachment 145656 [details] [review] Pidgin store
I just added basic ugly support for reading HTML files to the above branch.
Here is my review: - empathy-log-store-pidgin.h is missing from makefile - search_hit_cmp is wrong, if is_chatroom is != and strcmp returns +1, it results to 0 which is not right. - empathy_log_manager_get_chats: I'm wondering what's the faster: g_list_insert_sorted or g_list_prepend and then sort it all once done. - Empathy uses FIXME: instead of TODO: libempathy/empathy-log-store-pidgin.c - #include <time.h> you should insert empathy-time.h instead. What's the #define before? - log_store_pidgin_get_dir: You can simplify the butterfly hack: dup_id = NULL; if (chat_id != NULL && g_str_has_suffix (chat_id, "#1")) chat_id = dup_id = g_strndup (chat_id, (strlen (chat_id) - 2)); ... g_free (dup_id); - log_store_pidgin_get_dir: IRC also needs special case I think. - log_store_pidgin_get_time: if it does not have .txt suffix, it could check for html format. I don't think it's wise to just take the filename as it. - Should the returned data be UTC or local? Maybe add a comment telling which tz uses pidgin - log_store_pidgin_get_filenames_for_date: You could add a comment explaining what you are doing with that regex. - log_store_pidgin_search_hit_new: It can be triggered from 1) log_store_pidgin_search_new: In that case it is right to lookup for the account, but you should do it only once for all hits in the same dir. 2) log_store_pidgin_get_messages_for_date: The account is already provided, so you should not lookup it. So some refactoring is needed to avoid unnessary account lookup. - log_store_pidgin_search_hit_new: For the account lookup, probably more special cases are needed, for IRC at least. - log_store_pidgin_get_messages_for_files: I don't like the goto next_item, you should just unref the account an use continue; IMO. - log_store_pidgin_get_messages_for_files: /* TODO: There's no way to tell which user is you. */ I don't understand what's the problem here... - log_store_pidgin_get_messages_for_files: You can avoid the g_strdup ("") with message = empathy_message_new (body ? body : ""); - log_store_pidgin_search_new: g_mapped_file_free is deprecated - log_store_pidgin_get_chats_for_dir: pidgin does not escape name at all? Maybe we should have a reverse function in some way?
I opened bug #599186 about the sqlite store.
Do we have any heuristic to associate logs from Pidgin with existing Empathy accounts?
For those, who want to convert their logs, I've started a little project, called Jalli. I don't know, but you could take a look at it. http://jalli.berlios.de Sorry for this 'advertisement'.
(In reply to comment #27) > Do we have any heuristic to associate logs from Pidgin with existing Empathy > accounts? Logs are stored by pidgin in the following format: ~/.purple/logs/protocol/username[@server]/contactid[@server]/date.{html,txt} From there we've already got all this information which I think is pretty good to link Pidgin and Empathy accounts together. I guess it falls down currently when you have multiple accounts with mainly the same settings. For example, two jabber accounts where the only difference is that one connects on a different port from the other. Without parsing ~/.purple/accounts.xml, we can't know that though. I personally think this solution is acceptable. Do you not?
(In reply to comment #29) > I guess it falls down currently when you have multiple accounts with mainly the > same settings. For example, two jabber accounts where the only difference is > that one connects on a different port from the other. Pidgin (well, really libpurple) would put the logs in the same folder in this case, so that's fine. Richard
(In reply to comment #30) > Pidgin (well, really libpurple) would put the logs in the same folder in this > case, so that's fine. Empathy doesn't, and wouldn't know which account to assign the found pidgin logs with. It would just use the first one it found. I think this is a pretty corner case.
(In reply to comment #25) > - empathy-log-store-pidgin.h is missing from makefile Fixed. > - search_hit_cmp is wrong, if is_chatroom is != and strcmp returns +1, it > results to 0 which is not right. Fixed. > - Empathy uses FIXME: instead of TODO: meh, fixed. > - #include <time.h> you should insert empathy-time.h instead. What's the > #define before? We need strptime, or is there an alternative? > - log_store_pidgin_get_dir: You can simplify the butterfly hack: > dup_id = NULL; > if (chat_id != NULL && g_str_has_suffix (chat_id, "#1")) > chat_id = dup_id = g_strndup (chat_id, (strlen (chat_id) - 2)); > ... > g_free (dup_id); Done. > - log_store_pidgin_get_dir: IRC also needs special case I think. Fixed. > - log_store_pidgin_get_time: if it does not have .txt suffix, it could check > for html format. I don't think it's wise to just take the filename as it. Why not? > - Should the returned data be UTC or local? Maybe add a comment telling which > tz uses pidgin Added comment. > - log_store_pidgin_get_filenames_for_date: You could add a comment explaining > what you are doing with that regex. Done. > - log_store_pidgin_search_hit_new: It can be triggered from > 1) log_store_pidgin_search_new: In that case it is right to lookup for the > account, but you should do it only once for all hits in the same dir. > 2) log_store_pidgin_get_messages_for_date: The account is already provided, > so you should not lookup it. > So some refactoring is needed to avoid unnessary account lookup. Done. > - log_store_pidgin_search_hit_new: For the account lookup, probably more > special cases are needed, for IRC at least. Done. > - log_store_pidgin_get_messages_for_files: I don't like the goto next_item, > you > should just unref the account an use continue; IMO. Done. > - log_store_pidgin_get_messages_for_files: > /* TODO: There's no way to tell which user is you. */ I don't understand > what's the problem here... I've updated the comment. > - log_store_pidgin_get_messages_for_files: You can avoid the g_strdup ("") > with > message = empathy_message_new (body ? body : ""); Done. > - log_store_pidgin_search_new: g_mapped_file_free is deprecated Fixed. > - log_store_pidgin_get_chats_for_dir: pidgin does not escape name at all? I'd not looked into it. Fixed now. Branch is in the same place.
Cool, new review round: - It does not build because of lot if shadow declaration. - time.h is already included from empathy-time.h with the needed #define ;) - log_store_pidgin_get_time(): You don't have to parse hour/min/sec with sscanf, you don't use them. - log_store_pidgin_get_time(): /* FIXME: Returns time in local only. */ --> you don't return time, just a date, so there is no tz involved afaik. - log_store_pidgin_get_filenames_for_date(): date is always in the format 2009-11-01, no need of a regex here. - log_store_pidgin_get_messages_for_files(): lenght should be gsize, that's the type used by g_file_get_contents. You can even drop it, you don't use its value. - log_store_pidgin_search_hit_new(): You shoud remove ".chat" suffix from hit->chat_id - log_store_pidgin_get_chats_for_dir(): shouldn't it check for ".chat" suffix and skit if it doesn't make the is_chatroom boolean? - log_store_pidgin_get_messages_for_files(): Should take the account in args instead of calling log_store_pidgin_dup_account. - log_store_pidgin_get_messages_for_files(): You set the timestamp of the conversation on the EmpathyMessage, instead of the one of the message. - log_store_pidgin_get_messages_for_files(): That function is a mess. I think you should split it between html and plain text parsers. That's all for now :)
Created attachment 146683 [details] [review] little cleanup to add in your branch
Bug assigned to Jonny as is branch has been reviewed by Xavier.
(In reply to comment #33) > - It does not build because of lot if shadow declaration. Hm, did you rebase my branch? Fixed, anyway. > - time.h is already included from empathy-time.h with the needed #define ;) No, it's not. > - log_store_pidgin_get_time(): You don't have to parse hour/min/sec with > sscanf, you don't use them. > - log_store_pidgin_get_time(): /* FIXME: Returns time in local only. */ --> > you don't return time, just a date, so there is no tz involved afaik. Done. > - log_store_pidgin_get_filenames_for_date(): date is always in the format > 2009-11-01, no need of a regex here. Done. > - log_store_pidgin_get_messages_for_files(): lenght should be gsize, that's > the type used by g_file_get_contents. You can even drop it, you don't use its > value. Done. > - log_store_pidgin_search_hit_new(): You shoud remove ".chat" suffix from > hit->chat_id Done. > - log_store_pidgin_get_chats_for_dir(): shouldn't it check for ".chat" suffix > and skit if it doesn't make the is_chatroom boolean? Done. > - log_store_pidgin_get_messages_for_files(): Should take the account in args > instead of calling log_store_pidgin_dup_account. Done. > - log_store_pidgin_get_messages_for_files(): You set the timestamp of the > conversation on the EmpathyMessage, instead of the one of the message. Good catch. Done. (In reply to comment #34) > Created an attachment (id=146683) [details] [review] > little cleanup to add in your branch Applied. Next time, please use git format-patch though.
We are moving to use the telepathy-logger. This branch should probably be ported to it. http://telepathy.freedesktop.org/wiki/Logger
I have a two questions: - Will use empathy 2.30 the telepathy-logger? - What is the formats of the logs? Plain text (like pidgin)? XML?
- yes (see bug #518414) - XML atm
(In reply to comment #39) > - yes (see bug #518414) > - XML atm thanks for your answer :)
Xavier: even if this branch will never be merged as it and should be ported to tp-logger, it's still probably worst to review it from a code pov. It would be cool if you could check if all your review comments have been fixed.
Closing this bug as this should now be done in tp-logger. Please continue review/comments on https://bugs.freedesktop.org/show_bug.cgi?id=26907