After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 771442 - gdm should fallback to Xorg on laptops with 2 gpu-s
gdm should fallback to Xorg on laptops with 2 gpu-s
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: wayland
3.21.x
Other Linux
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2016-09-14 17:12 UTC by Hans de Goede
Modified: 2016-10-29 02:30 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
native: only match drm subsystem devices (2.62 KB, patch)
2016-10-18 20:58 UTC, Ray Strode [halfline]
none Details | Review
native: fail on systems with connectors spread across multiple gpus (4.34 KB, patch)
2016-10-18 20:58 UTC, Ray Strode [halfline]
none Details | Review
native: only match drm subsystem devices (2.75 KB, patch)
2016-10-19 15:21 UTC, Ray Strode [halfline]
committed Details | Review
native: shore up matching of card device (4.70 KB, patch)
2016-10-19 15:21 UTC, Ray Strode [halfline]
committed Details | Review
native: fail on systems with connectors spread across multiple gpus (6.68 KB, patch)
2016-10-19 15:21 UTC, Ray Strode [halfline]
committed Details | Review
native: don't call steal_pointer prematurely (3.78 KB, patch)
2016-10-19 19:23 UTC, Ray Strode [halfline]
committed Details | Review

Description Hans de Goede 2016-09-14 17:12:42 UTC
Hi,

Since wayland does not yet support prime and esp. not slave outputs, gdm really should fallback to Xorg if there are 2 /dev/dri/card# devices. Otherwise external connectors on laptops which are only connected to the discrete gpu will not be usable (without manually switching to a gnome on xorg session).

I guess this has been a problem wrt the login screen not showing on such external displays for a while now, but although for the login screen this typically is not a big problem, this is a problem with the actual gnome-session, think e.g. not being able to use a projector on conferences because of this.

Regards,

Hans
Comment 1 Ray Strode [halfline] 2016-10-18 20:57:48 UTC
So I think this workaround should be done in mutter, since mutter is the place that will ultimately need to be fixed.
Comment 2 Ray Strode [halfline] 2016-10-18 20:58:45 UTC
Created attachment 337980 [details] [review]
native: only match drm subsystem devices

Despite g_udev_client_new taking a list of subsystems, it doesn't
implicitly filter results to those subsystems.

This commit explicitly adds a subsystem match to make sure sound cards
don't end up in the resulting list of video cards.
Comment 3 Ray Strode [halfline] 2016-10-18 20:58:50 UTC
Created attachment 337981 [details] [review]
native: fail on systems with connectors spread across multiple gpus

We don't support using GPUs as slave devices yet, so we should fail
if we encounter that situation, so GDM will fall back to X.
Comment 4 Rui Matos 2016-10-19 13:13:55 UTC
Review of attachment 337981 [details] [review]:

::: src/backends/native/meta-launcher.c
@@ +299,3 @@
+
+      /* filter out the real card devices, we only care about the connectors */
+      if (g_udev_device_get_device_type (dev) != G_UDEV_DEVICE_TYPE_NONE)

Are we sure only connectors are TYPE_NONE ?

@@ +310,3 @@
+        continue;
+
+      parent = g_udev_device_get_parent (dev);

Should we check that parent is a drm char device?

@@ +313,3 @@
+      g_hash_table_insert (cards, (gpointer) g_udev_device_get_name (parent), parent);
+    }
+  num_devices = g_hash_table_size (cards);

Doesn't matter much here but this returns an unsigned int. Maybe this whole function should as well.

@@ +348,3 @@
+      num_devices = count_devices_with_connectors (seat_name, devices);
+      if (num_devices != 1)
+        goto out;

This leaks the devices list.
Comment 5 Rui Matos 2016-10-19 13:15:10 UTC
Review of attachment 337980 [details] [review]:

::: src/backends/native/meta-launcher.c
@@ +295,2 @@
   g_udev_enumerator_add_match_name (enumerator, "card*");
+  g_udev_enumerator_add_match_subsystem (enumerator, "drm");

GUdevEnumerator should do this internally IMO, it's what I'd expect from the API at least.

Anyway, it never hurts to be explicit either.
Comment 6 Bastien Nocera 2016-10-19 13:21:11 UTC
FWIW, it seems to support Prime on my machine (T430s with Intel and NVidia cards). We need to make sure that Wayland is still usable on those machines, when the discrete GPU is only used for render offloading.
Comment 7 Ray Strode [halfline] 2016-10-19 13:23:38 UTC
> GUdevEnumerator should do this internally IMO, it's what I'd expect from the
> API at least.
Yea, I should probably file a bug about that, and link to the bug in a comment above the add_match_subsystem call.
Comment 8 Bastien Nocera 2016-10-19 13:26:27 UTC
(In reply to Ray Strode [halfline] from comment #7)
> > GUdevEnumerator should do this internally IMO, it's what I'd expect from the
> > API at least.
> Yea, I should probably file a bug about that, and link to the bug in a
> comment above the add_match_subsystem call.

Patches welcome, says the defacto maintainer (ie. me).
Comment 9 Ray Strode [halfline] 2016-10-19 13:28:44 UTC
(In reply to Bastien Nocera from comment #6)
> FWIW, it seems to support Prime on my machine (T430s with Intel and NVidia
> cards). We need to make sure that Wayland is still usable on those machines,
> when the discrete GPU is only used for render offloading.

Well, "support" on some level.  We currently completely ignore the card that's not used at boot up, which is sufficient for now, as long as there aren't connectors hardwired to it. Keeping that case from regressing is the main reason the patch isn't just seeing if there is more than one card (the other complication is multiseat)
Comment 10 Bastien Nocera 2016-10-19 14:06:57 UTC
(In reply to Ray Strode [halfline] from comment #9)
> (In reply to Bastien Nocera from comment #6)
> > FWIW, it seems to support Prime on my machine (T430s with Intel and NVidia
> > cards). We need to make sure that Wayland is still usable on those machines,
> > when the discrete GPU is only used for render offloading.
> 
> Well, "support" on some level.  We currently completely ignore the card
> that's not used at boot up, which is sufficient for now, as long as there
> aren't connectors hardwired to it. Keeping that case from regressing is the
> main reason the patch isn't just seeing if there is more than one card (the
> other complication is multiseat)

I can make apps render on the 2nd GPU, displayed on the first. That's how I implemented (and tested) https://bugzilla.gnome.org/show_bug.cgi?id=773213 which worked on both X.org and Wayland.
Comment 11 Ray Strode [halfline] 2016-10-19 14:08:26 UTC
(In reply to Bastien Nocera from comment #8)
> Patches welcome, says the defacto maintainer (ie. me).
Okay I therw a quick patch in bug 773224.
Don't push it until I try it and report that it works !
Comment 12 Ray Strode [halfline] 2016-10-19 14:11:06 UTC
(In reply to Bastien Nocera from comment #10)
> I can make apps render on the 2nd GPU, displayed on the first. That's how I
> implemented (and tested) https://bugzilla.gnome.org/show_bug.cgi?id=773213
> which worked on both X.org and Wayland.
Oh yea, that's true. That's all handled magically under the hood by mesa.
Comment 13 Ray Strode [halfline] 2016-10-19 14:22:14 UTC
(In reply to Rui Matos from comment #4)
> Review of attachment 337981 [details] [review] [review]:
> 
> ::: src/backends/native/meta-launcher.c
> @@ +299,3 @@
> +
> +      /* filter out the real card devices, we only care about the
> connectors */
> +      if (g_udev_device_get_device_type (dev) != G_UDEV_DEVICE_TYPE_NONE)
> 
> Are we sure only connectors are TYPE_NONE ?
Well, the device list is only devices that start with card* glob.
At the moment, that's only connectors and the char devices.

I don't think that's likely to change, but if you like I can explicitly add a 

g_udev_device_has_sysfs_attr (dev, "modes");

call, too.

> 
> @@ +310,3 @@
> +        continue;
> +
> +      parent = g_udev_device_get_parent (dev);
> 
> Should we check that parent is a drm char device?
Wouldn't hurt. i'll do it.
Comment 14 Ray Strode [halfline] 2016-10-19 15:21:37 UTC
Created attachment 338035 [details] [review]
native: only match drm subsystem devices

Despite g_udev_client_new taking a list of subsystems, it doesn't
implicitly filter results to those subsystems.

This commit explicitly adds a subsystem match to make sure sound cards
don't end up in the resulting list of video cards.
Comment 15 Ray Strode [halfline] 2016-10-19 15:21:41 UTC
Created attachment 338036 [details] [review]
native: shore up matching of card device

Right now we accept any character device that matches the glob card*.

That's fine, but we can be a little more specific by checking that
the devtype is what we expect.

This commit does that.
Comment 16 Ray Strode [halfline] 2016-10-19 15:21:46 UTC
Created attachment 338037 [details] [review]
native: fail on systems with connectors spread across multiple gpus

We don't support using GPUs as slave devices yet, so we should fail
if we encounter that situation, so GDM will fall back to X.
Comment 17 Rui Matos 2016-10-19 15:37:56 UTC
Review of attachment 338035 [details] [review]:

ok
Comment 18 Rui Matos 2016-10-19 15:40:50 UTC
Review of attachment 338036 [details] [review]:

looks good
Comment 19 Rui Matos 2016-10-19 15:45:48 UTC
Review of attachment 338037 [details] [review]:

lgtm
Comment 20 Ray Strode [halfline] 2016-10-19 17:10:12 UTC
Attachment 338035 [details] pushed as ef20000 - native: only match drm subsystem devices
Attachment 338036 [details] pushed as f1e1a5f - native: shore up matching of card device
Attachment 338037 [details] pushed as e2bfaf0 - native: fail on systems with connectors spread across multiple gpus
Comment 21 Ray Strode [halfline] 2016-10-19 19:23:41 UTC
Created attachment 338045 [details] [review]
native: don't call steal_pointer prematurely

commit e2bfaf07514ed633f8721b5f521577685b6cccc0 does this:

g_hash_table_insert (cards,
                     g_udev_device_get_name (parent_device),
                     g_steal_pointer (&parent_device));

The problem is the g_steal_pointer call may happen before the
g_udev_device_get_name call leading to a crash.

This commit does the get_name call on an earlier line
Comment 22 Ray Strode [halfline] 2016-10-19 19:24:00 UTC
Comment on attachment 338045 [details] [review]
native: don't call steal_pointer prematurely

Attachment 338045 [details] pushed as d491063 - native: don't call steal_pointer prematurely
Comment 23 Hubert Figuiere (:hub) 2016-10-29 01:37:48 UTC
FWIW, this now makes me regress highly (on my machine) since on Fedora 25 that has these patches, I can't disable this as I get hit by this bug https://bugzilla.redhat.com/show_bug.cgi?id=1356115

Sadness of side effects.

(this is just a FYI, nothing else, I'd love to have the whole discrete GPU switcheroo works)
Comment 24 Ray Strode [halfline] 2016-10-29 01:46:12 UTC
Theres a variable you can put in etc environment to get old behavior.  Check patch. (i would but im one handed on a phone right now)
Comment 25 Hubert Figuiere (:hub) 2016-10-29 02:24:07 UTC
I found MUTTER_ALLOW_HYBRID_GPUS. I'll try that. Thanks for the pointer!
Comment 26 Hubert Figuiere (:hub) 2016-10-29 02:30:36 UTC
Workaround works ! Thanks !
(blacklisting nouveau didn't, but that's another story)