GNOME Bugzilla – Bug 575572
gnome-settings-daemon/libgdu notification support (S.M.A.R.T. warnings, RAID array degradation)
Last modified: 2009-04-12 01:26:35 UTC
This patch adds libgdu (gnome-disk-utility) integration to Gnome desktop. It monitors hard drives and RAID arrays and notifies user when something goes wrong (hardware failure, mdadm kicking out mirror from array, etc.) TODO: - delayed write alert (how can we detect that?) - the "Investigate" button should spawn palimpsest with the selected (failing) device -- needs support in palimpsest
Created attachment 130757 [details] [review] gnome-settings-daemon-2.25.92-gdu-2.patch Proposed patch against final g-s-d-2.26.0 Requires libgdu-0.2
Changing to gnome-disk-utility component, we agreed to make this as external g-s-d plugin. Will post new patch soon.
Created attachment 130835 [details] [review] 0001-gnome-settings-daemon-plugin.patch Ported to gnome-disk-utility source tree, g-s-d is an optional dependency now.
+ GsdGDUManager *manager; /* TODO: I don't like circular references */ +} GsdGduWatch; Minor nit: inconsistent capitalization, GsdGDU vs GsdGdu
+ if (program) + g_free (program); + if (icon_text) + g_free (icon_text); Another pedantic comment: no need for the branches here, g_free handles NULL fine.
+static void +do_investigate (GsdGduWatch *watch) +{ + g_spawn_command_line_async (DISK_MANAGEMENT_UTILITY, NULL); + + if (watch->status_icon != NULL) { + g_object_unref (watch->status_icon); + watch->status_icon = NULL; + } +} Longer-term, it would be nice to launch the ui in a way that focuses the right disk/partition/whatever was causing the notification in the first place.
+static void +monitored_list_populate (GsdGDUManager *manager) +{ + GsdGDUManagerPrivate *p = manager->priv; + GList *presentables, *l; + + presentables = gdu_pool_get_presentables (p->pool); + + for (l = presentables; l != NULL; l = l->next) { + if (l->data) + add_monitor (manager, l->data); + } + + g_list_foreach (presentables, (GFunc) g_object_unref, NULL); + g_list_free (presentables); +} Is this something that takes time ? If so, we might consider delaying it somewhat to not slow down login...
Created attachment 130964 [details] [review] 0001-gnome-settings-daemon-plugin.patch (In reply to comment #6) > Longer-term, it would be nice to launch the ui in a way that focuses the right > disk/partition/whatever was causing the notification in the first place. This was on my TODO list. Filed bug 575948 for tracking. (In reply to comment #7) > Is this something that takes time ? If so, we might consider delaying it > somewhat to not slow down login... Not necessarily, no signs of that in documentation (libgdu should keep this list cached). Anyway, it's a good idea to postpone monitoring start. Forgot to mention last time: when plugin starts, it checks devices health upon start. So any present failures will show up one minute after desktop session starts. (In reply to comment #4) > Minor nit: inconsistent capitalization, GsdGDU vs GsdGdu Fixed. (In reply to comment #5) > Another pedantic comment: no need for the branches here, g_free handles NULL > fine. Fixed.
You could use g_timeout_add_seconds for the STARTUP_DELAY The 10 seconds seem a bit short for the duplicate warning check ? We probably need to work on the wording of these messages a bit: "S.M.A.R.T. subsystem reported hardware failure. There's high risk of data loss." "The array is running in degraded mode. There's a chance of data loss when device fails." I don't have a perfect wording right now, but maybe something like: "One of your drives is reporting hardware probems. There is a high risk of data loss." "Your RAID array is running in degraded mode. There is a chance of data loss when a drive fails." Other than that, the patch looks good enough for an initial commit to me. David ?
Hi, A few comments - coding style doesn't match the rest of the module and also appear to be inconsistent (in -plugin.c vs. -manager.c) - see src/gdu/*.[ch] for details and please copy the Modeline too, e.g. /* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 8 -*- */ - hmm, maybe this is a stupid question but how do you actually get the code to run? Does it involve logging in/out all the time? Sending a HUP signal to the gnome-settings-daemon process took down all apps in my session. So maybe we just want this to be a separate daemon instead since - libgdu may be slightly unstable in the next six months and we don't want to crash sessions; and - it's going to make hacking on this much harder. So let's go with a separate daemon for now. Suggest to put the code in the src/notification/ sub-directory. - I'm not a big fan of libnotify and I don't think it should be the primary UI for this. I think instead we want a notification icon instead (the base user experience) and then *maybe* add libnotify support later. The user experience we want should be - no failing devices => no icon - failing devices -> icon. Tooltip should say something like "One or more devices are failing" or maybe even special case it depending on what failures we have. I don't know. User can right or left click the icon and get a menu with "[Presentable_Icon] <Presentable_Name> - Disk is failing" "[Presentable_Icon] <Presentable_Name> - Array is degraded" and clicking the icon should launch Palimpsest. That's bug 575948 right? Note that we already have icons for this (gdu-error, see data/icons/*) - Do we need a way for the user to ignore errors/devices? I don't know. But it seems like something we should at least consider... You ask in the patch whether the smart data is automatically updated. Yes, it is automatically updated; we poll the disk every 30 minutes (might change that to 10 minutes) if the disk isn't sleeping. Ditto for RAID array status. So the logic in your code is right. Also note that gnome-disk-utility HEAD and DeviceKit-disks HEAD has seen some changes to use libatasmart instead of parsing smartctl output. So the API has changed. This patch should fix it. $ diff -up gsd-gdu-manager.c.orig gsd-gdu-manager.c --- gsd-gdu-manager.c.orig 2009-03-23 20:07:57.000000000 -0400 +++ gsd-gdu-manager.c 2009-03-23 20:11:37.000000000 -0400 @@ -181,7 +181,6 @@ watch_check_state (GsdGduWatch *watch) { GduDevice *device; char *title; - GduSmartData *smart; g_return_if_fail (watch != NULL); g_return_if_fail (watch->presentable != NULL); @@ -194,15 +193,12 @@ watch_check_state (GsdGduWatch *watch) /* S.M.A.R.T. */ if (time (NULL) > watch->last_smart_warning + NOTIFICATION_DUPLICATION_TIMEOUT && - gdu_device_drive_smart_get_is_capable (device) && gdu_device_drive_smart_get_is_enabled (device)) - { - smart = gdu_device_get_smart_data (device); - if (smart) { - if (gdu_smart_data_get_is_failing (smart)) { - gdu_show_notification (watch, title, _("S.M.A.R.T. subsystem reported hardware failure. There's high risk of data loss.")); - watch->last_smart_warning = time (NULL); - } - g_object_unref (smart); + gdu_device_drive_ata_smart_get_is_available (device)) { + if ((gdu_device_drive_ata_smart_get_is_failing (device) && gdu_device_drive_ata_smart_get_is_failing_valid (device)) || + gdu_device_drive_ata_smart_get_has_bad_sectors (device) || + gdu_device_drive_ata_smart_get_has_bad_attributes (device)) { + gdu_show_notification (watch, title, _("S.M.A.R.T. subsystem reported hardware failure. There's high risk of data loss.")); + watch->last_smart_warning = time (NULL); } }
> So let's go with a separate daemon for now. Suggest to put the code > in the src/notification/ sub-directory. FWIW, I disagree with that idea. Yes, gnome-settings-daemon is not as stable as one would hope for right now. But putting our head in the sand and doing our own thing instead is not the right answer to that. I'm going to build a fixed PackageKit in rawhide in a bit that will make killing gnome-settings-daemon work without taking down everything. Unfortunately, even after that, restarting g-s-d still kills gnome-session, since it can't handle loading the bug-buddy module for some reason. That can easily worked around by gconftool-2 --set /apps/gnome_settings_daemon/gtk-modules/gnomebreakpad --type boolean false bug-buddy is useless anyway
(In reply to comment #9) > You could use g_timeout_add_seconds for the STARTUP_DELAY Makes sense, we could save a little power by reduce polling, especially with such always running things. > The 10 seconds seem a bit short for the duplicate warning check ? The original issue was to prevent burst of "changed" events spawned at the same time, e.g. when DK-disks daemon refreshed SMART data. Thinking about it now, we can extend this interval to say one minute.
(In reply to comment #10) > - hmm, maybe this is a stupid question but how do you actually get the > code to run? Does it involve logging in/out all the time? Sending a > HUP signal to the gnome-settings-daemon process took down all apps > in my session. > > So maybe we just want this to be a separate daemon instead since > > - libgdu may be slightly unstable in the next six months and > we don't want to crash sessions; and > > - it's going to make hacking on this much harder. > > So let's go with a separate daemon for now. Suggest to put the code > in the src/notification/ sub-directory. The reason why gnome-settings-daemon was chosen is to have the notification ability without needing to run extra command within gnome-session. As for XFCE, you can choose to start Gnome services on startup and this thing would be started automatically within g-s-d. But I understand your concern about stability, it's all in a single process. For debugging/hacking, we can easily add a simple #ifdef around main() function, just like Vuntz did in http://bugzilla.gnome.org/attachment.cgi?id=124020&action=view The plugin is quite independent, the only difference is how is it spawned. > - I'm not a big fan of libnotify and I don't think it should be the > primary UI for this. I think instead we want a notification icon > instead (the base user experience) and then *maybe* add libnotify > support later. Status icon (stock exclamation mark) is already in the code and libnotify can be easily disabled. Do you think having status icon only would be enough for user to register that his disks are failing? I mean I often have about ten icons in the notification area, two blinking for incoming messages (e-mail, jabber, icq), three icons from PackageKit (a big red circle for critical updates, another exclamation mark that my session needs restart, ...). I'm not usability expert though... Otherwise I agree with your proposal about the icon menu. > You ask in the patch whether the smart data is automatically updated. Yes, it > is automatically updated; we poll the disk every 30 minutes (might change that > to 10 minutes) if the disk isn't sleeping. Ditto for RAID array status. So the > logic in your code is right. Thanks for the confirmation, we should note this to documentation (I'll file separate bug for doc issues later).
(In reply to comment #13) > The reason why gnome-settings-daemon was chosen is to have the notification > ability without needing to run extra command within gnome-session. As for XFCE, > you can choose to start Gnome services on startup and this thing would be > started automatically within g-s-d. But I understand your concern about > stability, it's all in a single process. > > For debugging/hacking, we can easily add a simple #ifdef around main() > function, just like Vuntz did in > http://bugzilla.gnome.org/attachment.cgi?id=124020&action=view > The plugin is quite independent, the only difference is how is it spawned. For the time being I'd like this to be a separate daemon; reasons include - it's much easier to hack on; things like running gdb, debug output and so forth - libgdu is going to be unstable for at least the next six months and I don't want to deal with bringing down gnome-settings-daemon (neither for users nor when I'm break API in libgdu and do 'make install') since that is going to cause a lot of trouble When things are baked and all, we can make this into a gnome-settings-daemon plug-in. But for now I'd like a separate daemon. Simply because it makes things a lot easier for us as developers. > > - I'm not a big fan of libnotify and I don't think it should be the > > primary UI for this. I think instead we want a notification icon > > instead (the base user experience) and then *maybe* add libnotify > > support later. > Status icon (stock exclamation mark) is already in the code and libnotify can > be easily disabled. Do you think having status icon only would be enough for > user to register that his disks are failing? I mean I often have about ten > icons in the notification area, two blinking for incoming messages (e-mail, > jabber, icq), three icons from PackageKit (a big red circle for critical > updates, another exclamation mark that my session needs restart, ...). I'm not > usability expert though... > > Otherwise I agree with your proposal about the icon menu. Sounds good. David
I added a notification daemon in this commit http://git.gnome.org/cgit/gnome-disk-utility/commit/?id=f5c0e92affa5ec515d03eb1d8abfcccd2b4fe3eb since I just implemented the "show dialog for slow unmount operations" feature (bug 578722). I'll look into making it show ATA SMART warnings etc.
Note that Tomas was also working on the smart warnings again, recently. You may want to check with him to avoid duplicate work.
I've implemented a very simple status icon for ATA SMART status, see http://people.freedesktop.org/~david/gdu-ata-smart-warning.png http://git.gnome.org/cgit/gnome-disk-utility/commit/?id=f6c622352767cdd7ffe94b78f58749b9e76ba71a There's room for improvement in what I committed (note the TODO item) but let's deal with such improvements in separate bugs.