Bug 706293 – platform: fix linux platform tests and run them whenever uid is zero

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 706293 - platform: fix linux platform tests and run them whenever uid is zero


Summary:	platform: fix linux platform tests and run them whenever uid is zero


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Thomas Haller
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:	nm-0.9.10

Reported:	2013-08-19 11:41 UTC by Pavel Simerda
Modified:	2014-06-06 14:28 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
run linux platform tests whenever uid is zero, print a message otherwise (2.77 KB, patch) 2014-04-17 08:03 UTC, Pavel Simerda	none	Details \| Review

Description Pavel Simerda 2013-08-19 11:41:16 UTC

While the kernel-based tests for nm-platform are by no means complete, they require root privileges and are disabled by default, they are an important part of the testsuite that is there to ensure individual NetworkManager modules behave according to our expectations.

Reproducer:

./autogen.sh --enable-tests=root && make check

Currently all kernel-based tests fail.

I partly suspect:

commit eba74843671b494700f9b108e68f422187f851c9
Author: Dan Williams <dcbw@redhat.com>
Date:   Sun Aug 4 22:08:48 2013 -0500

    core: don't wait for udev to find created software devices
    
    NM knows software devices it creates exist immediately, and it knows
    it can use them immediately instead of waiting for udev to find them.
    Ideally we'd wait for udev to find all devices, to allow tags and
    other rules to run, but when creating software devices they must
    currently be available for use immediately.
    
    Bridges and bonds are required to be IFF_UP before they can be
    activated (see their is_available() implementation).  But during
    activation code paths, specifically in nm_manager_activate_connection(),
    the interface is created and immediately checked for availability
    to activate (since the creation was a result of the user requesting
    device activation).  That availability check fails, because the
    device is not visible outside NMPlatform (because it hasn't been
    found by udev yet, because we haven't gotten back to the event
    loop) and thus nm_platform_link_is_up() fails, failing the
    availability check.  Thus the device activation ends in error.
    
    What should really happen is that device activation shouldn't
    be synchronous; a new NMActiveConnection should be created for
    any activation request, which can then wait until all the
    resources it needs for activation are available (device found,
    master devices created, etc).  That's going to take more work
    though.

AFAIK before the change, nm-platform required *waiting* for software devices but after the change the signal emission is immediate. The tests make a strict difference between those behaviers (which I believe is good).

Also, nm-platform.c says:

/**
 * nm_platform_link_add:
 * @name: Interface name
 * @type: Interface type
 *
 * Add a software interface. Sets platform->error to NM_PLATFORM_ERROR_EXISTS
 * if interface is already already exists.  Any link-added signal will be
 * emitted from an idle handler and not within this function.
 */
static gboolean
nm_platform_link_add (const char *name, NMLinkType type)
{

I'm curious whether this documentation still holds true.

There may be other changes that break the testsuite or this one can even be innocent in this case.

Comment 1 Pavel Simerda 2014-04-17 08:03:00 UTC

Created attachment 274578 [details] [review]
run linux platform tests whenever uid is zero, print a message otherwise

Comment 2 Pavel Simerda 2014-04-17 08:07:17 UTC

(In reply to comment #1)
> Created an attachment (id=274578) [details] [review]
> run linux platform tests whenever uid is zero, print a message otherwise

Also published as pavlix/tests.

Comment 3 Pavel Simerda 2014-04-17 14:31:31 UTC

My fixes for nm-platform tests are in 'pavlix/tests', thomas has more to say here so looking forward to his update.

Comment 4 Thomas Haller 2014-04-17 17:37:12 UTC

Pushed branch th/bgo706293_platform_tests for review.

This branch obsoletes the patches from pavlix/tests and attachment 1 [details].


First add some functionality to nm-test-utils.h for later use:
  e42a613 platform/test: use nm-test-utils.h in test-common
  9a66a3d nmtst: add functions nmtst_platform_ip[46]_routes_equal
  af380cb nmtst: add functions nmtst_static_SIZE_NUM() to create a copy of a string returned from an internal buffer
  4f9bd8b nmtst: make all test functions "inline static"
  cea10ec nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG
  4a9c2c5 nmtst: add nmtst_initiated()
  d0eebeb nmtst: add parameter to nmtst_init to setup nm-logging


fix to skip bonding-tests if no system-support:
  88943cd platform/test: disable bonding tests if no bonding module is available


Then fix the tests so that linux-platform tests work again. Most of the time fake-platform (and the tests) did the wrong thing. Change the test and fake-platform to behave like linux-platform.
Thereby also fix a bug in platform that causes duplicate/inconsistent link-added signals (and causes the linux-platform test to break).
  150bc1d platform/test: fix expecting NM_PLATFORM_SOURCE_KERNEL in platform tests
  162c368 platform: fix software device handling when announcing links
  990ea4f platform/test: fix fake platform to simulate synchronous signals


Finally, enable root tests unconditionally. --enable-test=root still exists, and it causes `make check` to fail of run as non-root (instead of silently skipping linux-platform tests). @pavlix, this is modified from what we had before, where --enable-test=root was identical to --enable-test=yes.
  88c804b platform/test: always run linux platform tests (will be skipped as non-root)
  8169722 platform/test: skip linux platform tests if run as non-root without error

Comment 5 Dan Winship 2014-04-18 14:57:06 UTC

>   d0eebeb nmtst: add parameter to nmtst_init to setup nm-logging

I don't like having a file in libnm-util referring to stuff in src/.

Actually, given that nm-test-utils.h doesn't get installed anyway, there's no reason to have it in libnm-util/; just move it to include/ instead. And actually, merge it with the existing include/nm-test-helpers.h.

I also don't like the fact that it only works if you include nm-logging.h *before* nm-test-utils.h (and it still compiles, but doesn't work right, if you do it the other way). I guess a lame but sufficient fix would be to have nm-logging.h #error out if nm-test-utils.h has already been included.

>   4a9c2c5 nmtst: add nmtst_initiated()

There is already g_test_initialized(), which checks if you've called g_test_init(). I feel like we should use that rather than adding our own additional check.

>   cea10ec nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG

Likewise, there is g_test_verbose() / --verbose (although there is not a corresponding environment variable, but maybe there should be).

>   af380cb nmtst: add functions nmtst_static_SIZE_NUM() to create a copy of a
> string returned from an internal buffer

There's a "g_message()" in the definition which is probably left over from debugging?

>   88943cd platform/test: disable bonding tests if no bonding module is
> available

Do:

    if (...) {
        g_test_skip ("bonding module not available");
        return;
    }

rather than g_message(). That way it gets logged as "skipped" and shows up as such in the test results, making it more noticeable that you didn't manage to test everything.

>   162c368 platform: fix software device handling when announcing links

It looks like this patch causes software devices to never get added to priv->udev_devices, meaning among other things that they will not get an NM_DEVICE_UDI.

>+	/* Announce device removal if it is no longer announcable. */

"announceable"

>   8169722 platform/test: skip linux platform tests if run as non-root without
> error

>+	if (SETUP == nm_linux_platform_setup && getuid() != 0) {
>+		g_message("Skipping test: requires root privileges (%s)", program);
>+		return EXIT_SUCCESS;
>+	}

If this is going to happen at the top level, it should "return 77;" which is a special code telling automake's test harness that all tests were skipped. Alternatively, you could do the check in each test where it matters, and call g_test_skip() within each test (and g_test_run() will then return 77 if every test calls g_test_skip()).

>   88c804b platform/test: always run linux platform tests (will be skipped as
> non-root)

"RUN_ROOT_TESTS" is no longer an accurate name. Maybe REQUIRE_ROOT_TESTS?

Comment 6 Thomas Haller 2014-04-22 16:32:53 UTC

Repushed branch after reworking and rebasing.

(previous version is th/bgo706293_platform_tests-1)


(In reply to comment #5)
> >   d0eebeb nmtst: add parameter to nmtst_init to setup nm-logging
> 
> I don't like having a file in libnm-util referring to stuff in src/.
> 
> Actually, given that nm-test-utils.h doesn't get installed anyway, there's no
> reason to have it in libnm-util/; just move it to include/ instead. And
> actually, merge it with the existing include/nm-test-helpers.h.
> 
> I also don't like the fact that it only works if you include nm-logging.h
> *before* nm-test-utils.h (and it still compiles, but doesn't work right, if you
> do it the other way). I guess a lame but sufficient fix would be to have
> nm-logging.h #error out if nm-test-utils.h has already been included.

done



> >   4a9c2c5 nmtst: add nmtst_initiated()
> 
> There is already g_test_initialized(), which checks if you've called
> g_test_init(). I feel like we should use that rather than adding our own
> additional check.

I renamed nmtst_initiated() to nmtst_initialized() but kept it.

nmtst_initialized() is there to check that the user called nmtst_init() -- which he must do before calling some of the functions (currently just nmtst_get_rand0()).

Actually, nmtst_init() does not call g_test_init() if you do
  nmtst_init (0, NULL, ...);
So, their meaning is orthogonal.

OkOk, if you say, that you don't actually need this distinction, just don't bother and do:
  nmtst_init (&argc, &argc, ...);
and use g_test_initialized() only.



> >   cea10ec nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG
> 
> Likewise, there is g_test_verbose() / --verbose (although there is not a
> corresponding environment variable, but maybe there should be).

nmtst_is_debug() defaults now to g_test_verbose().

Setting it via environment variable I think is really neat. I think this justifies to prefer nmtst_is_debug() over g_test_verbose().


> >   af380cb nmtst: add functions nmtst_static_SIZE_NUM() to create a copy of a
> > string returned from an internal buffer
> 
> There's a "g_message()" in the definition which is probably left over from
> debugging?

True.



> >   88943cd platform/test: disable bonding tests if no bonding module is
> > available
> 
> Do:
> 
>     if (...) {
>         g_test_skip ("bonding module not available");
>         return;
>     }
> 
> rather than g_message(). That way it gets logged as "skipped" and shows up as
> such in the test results, making it more noticeable that you didn't manage to
> test everything.

Done. It needed however a wrapper for pre-2.38 glib. See commit:

    nmtst: add compatibility wrapper for g_test_skip()



> >   162c368 platform: fix software device handling when announcing links
> 
> It looks like this patch causes software devices to never get added to
> priv->udev_devices, meaning among other things that they will not get an
> NM_DEVICE_UDI.

Fixed. Seems correct to me now.


> >+	/* Announce device removal if it is no longer announcable. */
> 
> "announceable"

done


> >   8169722 platform/test: skip linux platform tests if run as non-root without
> > error
> 
> >+	if (SETUP == nm_linux_platform_setup && getuid() != 0) {
> >+		g_message("Skipping test: requires root privileges (%s)", program);
> >+		return EXIT_SUCCESS;
> >+	}
> 
> If this is going to happen at the top level, it should "return 77;" which is a
> special code telling automake's test harness that all tests were skipped.
> Alternatively, you could do the check in each test where it matters, and call
> g_test_skip() within each test (and g_test_run() will then return 77 if every
> test calls g_test_skip()).
> 
> >   88c804b platform/test: always run linux platform tests (will be skipped as
> > non-root)
> 
> "RUN_ROOT_TESTS" is no longer an accurate name. Maybe REQUIRE_ROOT_TESTS?

I combined the previous two commits and now the test programs fail if REQUIRE_ROOT_TESTS was defined.



please re-review. Thanks

Comment 7 Dan Winship 2014-04-22 21:33:23 UTC

> Done. It needed however a wrapper for pre-2.38 glib. See commit:
> 
>     nmtst: add compatibility wrapper for g_test_skip()

That should go in nm-glib-compat.h, not nm-test-utils.h. Keep all the glib compat functions in the same place.

> > >+	/* Announce device removal if it is no longer announcable. */
> > 
> > "announceable"
> 
> done

Actually, you deleted the space between "device" and "removal" and left the spelling mistake in... :-}

Comment 8 Thomas Haller 2014-04-23 15:32:17 UTC

(In reply to comment #7)
> > Done. It needed however a wrapper for pre-2.38 glib. See commit:
> > 
> >     nmtst: add compatibility wrapper for g_test_skip()
> 
> That should go in nm-glib-compat.h, not nm-test-utils.h. Keep all the glib
> compat functions in the same place.

fixed.



New commits are:

"nmtst: add nmtst_free()"
"nmtst: add nmtst_exec_sudo() function"
"core: add compatibility wrapper for g_test_skip() to nm-glib-compat.h"
"platform/test: allow running root tests as sudo"


Try:

  NMTST_DEBUG=no-debug,sudo-cmd=$PWD/tools/test-sudo-wrapper.sh make -C src/platform/tests/ check

:D

Comment 9 Thomas Haller 2014-04-23 18:31:21 UTC

Rebased to current master and repushed.
Two minor fixup! commits added:

fixup! platform/test: use nm-test-utils.h in test-common
fixup! core: add compatibility wrapper for g_test_skip() to nm-glib-compat.h

Comment 10 Dan Williams 2014-04-23 21:15:58 UTC

> platform: add nm_platform_link_get()

Maybe code-doc for this function?  It took me a minute to figure out why it wasn't just returning NMPlatformLink*.  Which is because the Linux platform doesn't keep NMPlatformLink structure around, but creates them on-the-fly when they are required....

> nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG

What about simplifying it with g_option_context_parse()?  ie, something like:

char **tst_argv;
int tst_argc;
<define option context>

nmtst_debug = getenv ("NMTST_DEBUG");
if (nmtst_debug) {
    is_debug = TRUE;
    tst_args = g_strsplit_set (nmtst_debug, " ;\t", -1);
    tst_argc = g_strv_length ();

    g_option_context_get_ignore_unknown_options (ctx, TRUE);
    if (!g_option_context_parse (ctx, &tst_argc, &tst_argv, &error)) {
        <error becuase some known argument doesn't match the type>
    }

    g_strfreev (tst_args);
}

/* Handle command-line options like --debug */
g_option_context_parse (ctx, &argc, &argv, &error);


More review bits to come...

Comment 11 Thomas Haller 2014-04-24 10:35:04 UTC

I repushed the branch and added several fixup! commits.
This is non-ff, but I did not change the previous commits (except reordering
the new fixup! commits)

new are:
561d1d3 fixup! platform: add nm_platform_link_get()
51cfd7d fixup! nmtst: add nmtst_exec_sudo() function
fd21ddf fixup! nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG
2f05914 fixup! nmtst: add parameter to nmtst_init to setup nm-logging



(In reply to comment #10)
> > platform: add nm_platform_link_get()
> 
> Maybe code-doc for this function?  It took me a minute to figure out why it
> wasn't just returning NMPlatformLink*.  Which is because the Linux platform
> doesn't keep NMPlatformLink structure around, but creates them on-the-fly when
> they are required....

Done.


> > nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG
> 
> What about simplifying it with g_option_context_parse()?  ie, something like:
> 
> char **tst_argv;
> int tst_argc;
> <define option context>
> 
> nmtst_debug = getenv ("NMTST_DEBUG");
> if (nmtst_debug) {
>     is_debug = TRUE;
>     tst_args = g_strsplit_set (nmtst_debug, " ;\t", -1);

This does not allow escaping with \. The previous version had a bug with escaping, I fixed it and extracted the splitting of the string to nmtst_str_split().


>     tst_argc = g_strv_length ();
> 
>     g_option_context_get_ignore_unknown_options (ctx, TRUE);
>     if (!g_option_context_parse (ctx, &tst_argc, &tst_argv, &error)) {
>         <error becuase some known argument doesn't match the type>
>     }
> 
>     g_strfreev (tst_args);
> }
> 
> /* Handle command-line options like --debug */
> g_option_context_parse (ctx, &argc, &argv, &error);

Can g_object_context_parser parse options without leading dashes?
For the environment variable it should be:

NMTST_DEBUG='no-debug,log-level=DEBUG,log-domains=ALL'
and not
NMTST_DEBUG='--no-debug,--log-level=DEBUG,--log-domains=ALL'

It would be great to use the same g_object_context to parse NMTST_DEBUG and the command line, but I don't think that is possible(?). Personally, I don't care about command line options too much.

Also, how to do the toggling with debug/no-debug?

Comment 12 Pavel Simerda 2014-04-24 19:26:00 UTC

(In reply to comment #11)
> It would be great to use the same g_object_context to parse NMTST_DEBUG and the
> command line, but I don't think that is possible(?). Personally, I don't care
> about command line options too much.

If you're still keeping them because I wanted to keep them and if it makes any difficulty, feel free to drop them. I'll just learn the new way.

Comment 13 Thomas Haller 2014-04-24 19:43:51 UTC

I was looking at branch th/nm-logging-glib (bug 728178), which has different ideas about logging.

So I incorporated what was useful from that other bug. The rest does not go together with the changes from danw/error2 and this branch. The other bug is therefore obsoleted.


The previous version of nmtst_init() also opened syslog, which is wrong for tests.

Rebased on master and rebushed.

New commits:

b846f7f core/logging: only log messages from glog, if LOGD_GLIB is enabled
4579177 nmtst: add new mode where tests want to test against logged messages (assert_messages)
615d1fb core/logging: add new logging domain "GLIB" for glib logging
8b303f5 nmtst: call nm_utils_init() in nmtst_init()
c86078c core/logging: when comparing glib log levels, ignore non-level flags



ok, I stop reworking this branch. really!! :)

Comment 14 Dan Williams 2014-04-24 22:47:41 UTC

You're right, g_option_context does require --.  So ignore that :)

------

Also, and this isn't your problem, but I now get failures in test_read_wired_aliases_bad() because directory reads are not alphabetical in Unix, and so in my case:

	g_test_expect_message ("NetworkManager", G_LOG_LEVEL_WARNING,
	                       "*aliasem1:1*has no DEVICE*");
	g_test_expect_message ("NetworkManager", G_LOG_LEVEL_WARNING,
	                       "*aliasem1:2*has invalid DEVICE*");

my aliasem1:2 gets read before aliasem1:1, and thus the error messages are out-of-order, and the testcase fails.  I wonder how many more places this might be an issue at.

Comment 15 Dan Williams 2014-04-24 23:05:53 UTC

> platform: fix software device handling when announcing links

if (ifindex) {
   ...
}

g_hash_table_remove (priv->udev_devices, GINT_TO_POINTER (ifindex));


If feel like this should be:

if (ifindex <= 0)
    return;

If we don't have an ifindex, then we don't have an 'rtnllink', and if we don't have that, then we don't want to call link_is_announceable(NULL).

Yes, was_announceable *should* be FALSE because the first link_is_announceable() won't be called, but I think we shoudl be more explicit about this.

There will never be a case where something that has an invalid ifindex should get as far as the bottom half of this function.  (which also means we should enforce (ifindex > 0) in udev_device_added(), right?)

Comment 16 Dan Williams 2014-04-24 23:15:59 UTC

I looked into LOG_COMPILER, which could be used along with a standard wrapper script to run the necessary tests as root, *without* having to specify the wrapper script in NMTST_DEBUG.  My theory was that we could simply do NMTST_DEBUG="root-tests" and then the tests would be run as root, using LOG_COMPILER=$(abs_src_dir)/tools/sudo-wrapper.sh in the makefiles that have root-enabled tests.  I liked that better than having to specify the wrapper script, since I think running root tests should be as simple as possible.

Doing the sudo stuff with the built-in automake framework means we wouldn't add any code to NetworkManager for it, and thus nm-test-utils.h would be smaller and simpler.  That's my main argument here: why add code to NM to do something that is already done in utilities we use.

To do this, we would:

1) ensure all tests were put into TESTS where possible
2) set LOG_COMPILER=$(abs_src_dir)/tools/sudo-wrapper.sh in each makefile that required root tests
3) make sudo-wrapper.sh look in NMTST_DEBUG for "root-tests" and if set, run "sudo $@" and do any magic libtool junk required
4) remove the serial-tests bits from configure.ac

Then to run root-enabled tests, we would simply do this:

NMTST_DEBUG=root-tests make check

But use of LOG_COMPILER requires using parallel-tests in automake, and we explicitly require serial tests in configure.ac, presumably because the root tests that touch virtual interfaces cannot be run simultaneously.  But we should ensure the tests *can* be run in parallel anyway, to ensure they don't end up with stale information from each other.

Thoughts?

Comment 17 Pavel Simerda 2014-04-25 06:52:55 UTC

(In reply to comment #15)
> If we don't have an ifindex, then we don't have an 'rtnllink', and if we don't
> have that, then we don't want to call link_is_announceable(NULL).

We can receive a udev device before the rtnllink. In that case we first gather the udev information and then wait for rtnllink stuff. The basic idea is that link is announceable when both udev and rtnllink are there but they can come and leave in any order.

Comment 18 Pavel Simerda 2014-04-25 07:06:26 UTC

(In reply to comment #16)
> I looked into LOG_COMPILER, which could be used along with a standard wrapper
> script to run the necessary tests as root, *without* having to specify the
> wrapper script in NMTST_DEBUG.

Wasn't NMTST_DEBUG there to specify the debug levels and stuff like that? Overloading it for all sorts of settings doesn't seem right here. Also, the debugging could be easily handled by NM_DEBUG that would work for NetworkManager as well instead.

> LOG_COMPILER=$(abs_src_dir)/tools/sudo-wrapper.sh in the makefiles that have
> root-enabled tests.

The 'src/platform/tests/Makefile.am' includes both fake tests and root tests and they are compiled from the same sources to maintain equivalence between nm-linux-platform and nm-fake-platform.

If I understand you correctly, you would have to create a separate 'src/platform/tests/root/Makefile.am' referencing the sources from the above directory, or even better 'src/platform/root-tests/Makefile.am' to keep the directory depth relative to the 'src/platform' directory.

> I liked that better than having to specify the wrapper
> script, since I think running root tests should be as simple as possible.

Also please be careful when working with wrapper script so that *valgrind* and *coverity* continue to work.

> But use of LOG_COMPILER requires using parallel-tests in automake, and we
> explicitly require serial tests in configure.ac, presumably because the root
> tests that touch virtual interfaces cannot be run simultaneously.

The current versions of the tests could theoretically run simultaneously if each of them used different interface names. They're already selecting only the information they actually need, I see no other reason they couldn't be run in parallel.

> But we
> should ensure the tests *can* be run in parallel anyway, to ensure they don't
> end up with stale information from each other.

There's no such problem of ending up with stale information from each other. It's solved as the side effect of clearing out any stale configuration from the last run of the same test before performing any other actions as well as before creating any SignalData structures belonging to the tests.

Comment 19 Thomas Haller 2014-04-25 10:15:37 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > It would be great to use the same g_object_context to parse NMTST_DEBUG and the
> > command line, but I don't think that is possible(?). Personally, I don't care
> > about command line options too much.
> 
> If you're still keeping them because I wanted to keep them and if it makes any
> difficulty, feel free to drop them. I'll just learn the new way.

No, it's fine, if "somebody" deems it useful, it should be there. I just prefer environment variables, so I would add options preferably to NMTST_DEBUG, but not add them as comment line arguments too -- unless "somebody" thinks it's a good idea.

IOW, I don't require NMTST_DEBUG and command line to be idempotent.

Comment 20 Thomas Haller 2014-04-25 10:25:50 UTC

TODO items:



before I got an error, that I cannot reproduce ATM:

NMTST_DEBUG=no-debug,sudo-cmd=$PWD/tools/test-sudo-wrapper.sh make -C src/platform/tests/ check

/link/external:
(/home/data/src/NetworkManager/src/platform/tests/.libs/lt-test-link-linux:13686): NetworkManager-ERROR **: Attempted to free received but not accepted signal 'link-changed'.
FAIL: test-link-linux




Also note, that whenever I run the root tests, NM will assume a new connection on my em1... that is certainly wrong too...

Comment 21 Thomas Haller 2014-04-25 10:26:50 UTC

(In reply to comment #15)
> > platform: fix software device handling when announcing links
> 
> if (ifindex) {
>    ...
> }
> 
> g_hash_table_remove (priv->udev_devices, GINT_TO_POINTER (ifindex));
> 
> 
> If feel like this should be:
> 
> if (ifindex <= 0)
>     return;
> 

I agree. Pushed fixup! commit.

Comment 22 Thomas Haller 2014-04-25 14:29:23 UTC

(In reply to comment #14)
>
> Also, and this isn't your problem, but I now get failures in
> test_read_wired_aliases_bad() because directory reads are not alphabetical in
> Unix, and so in my case:
> 
>     g_test_expect_message ("NetworkManager", G_LOG_LEVEL_WARNING,
>                            "*aliasem1:1*has no DEVICE*");
>     g_test_expect_message ("NetworkManager", G_LOG_LEVEL_WARNING,
>                            "*aliasem1:2*has invalid DEVICE*");
> 
> my aliasem1:2 gets read before aliasem1:1, and thus the error messages are
> out-of-order, and the testcase fails.  I wonder how many more places this might
> be an issue at.

For the record: danw opened bug 728954

Comment 23 Dan Williams 2014-04-25 14:38:37 UTC

(In reply to comment #18)
> (In reply to comment #16)
> > I looked into LOG_COMPILER, which could be used along with a standard wrapper
> > script to run the necessary tests as root, *without* having to specify the
> > wrapper script in NMTST_DEBUG.
> 
> Wasn't NMTST_DEBUG there to specify the debug levels and stuff like that?
> Overloading it for all sorts of settings doesn't seem right here. Also, the
> debugging could be easily handled by NM_DEBUG that would work for
> NetworkManager as well instead.

It was originally for that, and Thomas added "sudo-cmd" to it to allow running some tests as root with sudo instead of running 'sudo make check', which could potentially rebuild some sources as root.  Ideally, only the *tests* get run with 'sudo', but all the other stuff happens as the normal user.

> > LOG_COMPILER=$(abs_src_dir)/tools/sudo-wrapper.sh in the makefiles that have
> > root-enabled tests.
> 
> The 'src/platform/tests/Makefile.am' includes both fake tests and root tests
> and they are compiled from the same sources to maintain equivalence between
> nm-linux-platform and nm-fake-platform.

Right, and that's exactly correct.

> If I understand you correctly, you would have to create a separate
> 'src/platform/tests/root/Makefile.am' referencing the sources from the above
> directory, or even better 'src/platform/root-tests/Makefile.am' to keep the
> directory depth relative to the 'src/platform' directory.

Not quite correct actually; using LOG_COMPILER as I describe would simply run all tests under the sudo wrapper script, which should be the same thing as Thomas' NMTST_DEBUG=sudo-cmd functionality, but using the built-in automake stuff instead of some custom code.

If you want to selectively run some tests as root, and other tests as !root, then that's pretty easy, we could do:

TESTS = test-link test-link.root
TEST_EXTENSIONS = .root
ROOT_LOG_COMPILER = $(abs_src_dir)/sudo-wrapper.sh

This would run "test-link" as a normal user, but run "test-link.root" under sudo-wrapper.sh which would run the test as root.  We could keep our root and !root tests in the same directory with the same sources and simply make the test runs different depending on the file extension.

https://www.gnu.org/software/automake/manual/html_node/Parallel-Test-Harness.html

> > I liked that better than having to specify the wrapper
> > script, since I think running root tests should be as simple as possible.
> 
> Also please be careful when working with wrapper script so that *valgrind* and
> *coverity* continue to work.

Yeah, I'd hope that would still work correctly, but I don't think that would would change between Thomas' "sudo-cmd" and the automake stuff, since they are both just using a wrapper script to do the sudo.

> > But use of LOG_COMPILER requires using parallel-tests in automake, and we
> > explicitly require serial tests in configure.ac, presumably because the root
> > tests that touch virtual interfaces cannot be run simultaneously.
> 
> The current versions of the tests could theoretically run simultaneously if
> each of them used different interface names. They're already selecting only the
> information they actually need, I see no other reason they couldn't be run in
> parallel.

Ah, excellent.  I think we should make sure we can run them in parallel anyway.

> > But we
> > should ensure the tests *can* be run in parallel anyway, to ensure they don't
> > end up with stale information from each other.
> 
> There's no such problem of ending up with stale information from each other.
> It's solved as the side effect of clearing out any stale configuration from the
> last run of the same test before performing any other actions as well as before
> creating any SignalData structures belonging to the tests.

I was thinking more about the root tests touching eg 'bond0' or some sysctl, and then if they didn't ensure to clean stuff out, the next test that touched 'bond0' would have the stale data.  As long as the tests only touch things that are independent (like different interface names) then we're fine, but that does require some hand-holding.

Comment 24 Thomas Haller 2014-04-25 14:58:24 UTC

hm. I still like "nmtst: add nmtst_exec_sudo() function". The patch is small (I might even make it smaller by removing nmtst_get_sudo_cmd(), and some streamlining).
But it's just that, parsing the sudo-cmd option and then calling nmtst_exec_sudo(). I think it's super slick.

It also works when running the binary directly without make-check.

With the LOG_COMPILER solution you still have to specify an environment variable, so the effect is similar.

With the current solution I could also call another wrapper script instead (maybe making some preparations, needed for valgrind, etc.)

Comment 25 Thomas Haller 2014-05-03 02:31:40 UTC

Rebased to current master and squashed all commits.

Previous version is now 'th/bgo706293_platform_tests-2' for reference

Comment 26 Dan Williams 2014-05-08 22:38:51 UTC

> nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG

It might be clearer to call g_array_set_clear_func (debug_messages, g_free);, then at the end you can just do g_array_unref (debug_messages) and everything will get freed for you.

> core/logging: add new logging domain "GLIB" for glib logging

I think this is fine for tests, but I'm not wild on exposing this to the D-Bus API.  Could we isolate it from the D-Bus API documentation at least?  I know it would be somewhat hard to hide it from the actual API itself.

Comment 27 Thomas Haller 2014-05-12 17:29:14 UTC

(In reply to comment #26)

Rebased to master. Added fixup commits.

> > nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG
> 
> It might be clearer to call g_array_set_clear_func (debug_messages, g_free);,
> then at the end you can just do g_array_unref (debug_messages) and everything
> will get freed for you.

Done.


> > core/logging: add new logging domain "GLIB" for glib logging
> 
> I think this is fine for tests, but I'm not wild on exposing this to the D-Bus
> API.  Could we isolate it from the D-Bus API documentation at least?  I know it
> would be somewhat hard to hide it from the actual API itself.

I don't agree with that, because it's confusing to hide it. Why would we want that anyway?. Also, deprecating a logging domain is not hard, just update the documentation and remove all uses in logging statements. Also, we already accept unknown logging domains.

Anyway, I added a second fixup! commit (2b329aa), does that what you asked?

Comment 28 Thomas Haller 2014-05-12 17:40:37 UTC

(In reply to comment #27)
> (In reply to comment #26)
> 
> Rebased to master. Added fixup commits.
> 
> > > nmtst: add nmtst_is_debug() and interpret environment NMTST_DEBUG
> > 
> > It might be clearer to call g_array_set_clear_func (debug_messages, g_free);,
> > then at the end you can just do g_array_unref (debug_messages) and everything
> > will get freed for you.
> 
> Done.

I had to revert this, because g_array_set_clear_func() passes the pointer to the array element, so, it would need something like:

static void
free_indirect (gpointer data)
{
    g_free (*((char **)data));
}


...

    g_array_set_clear_func (debug_messages, free_indirect);
    g_array_unref (debug_messages);



I kept the previous version.

Comment 29 Dan Williams 2014-05-16 23:13:11 UTC

(In reply to comment #27)
> > > core/logging: add new logging domain "GLIB" for glib logging
> > 
> > I think this is fine for tests, but I'm not wild on exposing this to the D-Bus
> > API.  Could we isolate it from the D-Bus API documentation at least?  I know it
> > would be somewhat hard to hide it from the actual API itself.
> 
> I don't agree with that, because it's confusing to hide it. Why would we want
> that anyway?. Also, deprecating a logging domain is not hard, just update the
> documentation and remove all uses in logging statements. Also, we already
> accept unknown logging domains.

The main problem I have with the GLIB log patch is that the "GLIB" log domain isn't actually a logging domain at all.  It simply hides any logging (no matter what log domain the message is in!) that gets routed through glib's logging infrastructure.  That includes both glib-specific logging from glib itself, *and* NetworkManager log messages if syslog is not opened.

So if you turn off the GLIB logging domain in NM, and you're not using syslog, then NM will log nothing at all, despite the fact that you have enabled other logging domains.  That doesn't seem very consistent to me.

The second problem is that when running NetworkManager normally, *all* messages from GLib are interesting, because they indicate a programmer error or a critical system error.  There's no great reason for users to turn these messages off, because we want to see them for debugging.

So if the actual point (as I understand it) is to turn off GLib internal messages here so we don't screw up testcases, then lets just do that specifically instead of letting a test-only log domain leak out into our userspace API.

Something like:

static void
nm_log_handler (const gchar *log_domain,
                GLogLevelFlags level,
                const gchar *message,
                gpointer ignored)
{
    ...

    if (!(logging[nm_level] & LOGD_GLIB)) {
        if (   (log_domain && g_str_has_prefix (log_domain, "GLib"))
            || g_strcmp0 (log_domain, "GModule") == 0
            || g_strcmp0 (log_domain, "GThread") == 0)
            return;
    }

    ...
}

should do that.  But for the API considerations, we'd also ned to remove the log domain from the introspection XML, the nmcli examples manpage, bash completion, and the NM config file manpage too.

Comment 30 Thomas Haller 2014-05-19 12:01:19 UTC

(In reply to comment #29)
> (In reply to comment #27)
> > > > core/logging: add new logging domain "GLIB" for glib logging
> > > 
> > > I think this is fine for tests, but I'm not wild on exposing this to the D-Bus
> > > API.  Could we isolate it from the D-Bus API documentation at least?  I know it
> > > would be somewhat hard to hide it from the actual API itself.
> > 
> > I don't agree with that, because it's confusing to hide it. Why would we want
> > that anyway?. Also, deprecating a logging domain is not hard, just update the
> > documentation and remove all uses in logging statements. Also, we already
> > accept unknown logging domains.
> 
> The main problem I have with the GLIB log patch is that the "GLIB" log domain
> isn't actually a logging domain at all.  It simply hides any logging (no matter
> what log domain the message is in!) that gets routed through glib's logging
> infrastructure.  That includes both glib-specific logging from glib itself,
> *and* NetworkManager log messages if syslog is not opened.
> 
> So if you turn off the GLIB logging domain in NM, and you're not using syslog,
> then NM will log nothing at all, despite the fact that you have enabled other
> logging domains.  That doesn't seem very consistent to me.
> 
> The second problem is that when running NetworkManager normally, *all* messages
> from GLib are interesting, because they indicate a programmer error or a
> critical system error.  There's no great reason for users to turn these
> messages off, because we want to see them for debugging.
> 
> So if the actual point (as I understand it) is to turn off GLib internal
> messages here so we don't screw up testcases, then lets just do that
> specifically instead of letting a test-only log domain leak out into our
> userspace API.

The point is consistency, as we redirect glib logging through nm-logging we should apply the same logging principles.
Yes, a glib-domain is not the same as nm-domains, hence the new "GLIB" domain. But the priorities pretty much correspond.


By default glib does not even log g_debug() messages, unless you specify G_MESSAGES_DEBUG. So, with the change, you need to specify G_MESSAGES_DEBUG (a) and now additionally [GLIB:]DEBUG (b) -- normal users don't do that, and in heavy debugging modus this adds the requirement to enable [GLIB:]DEBUG
-- which seems not too much to ask, for example I never even set G_MESSAGES_DEBUG, but I always have DEBUG-nm-logging.


> The second problem is that when running NetworkManager normally, *all* 
> messages
> from GLib are interesting, because they indicate a programmer error or a
> critical system error.  There's no great reason for users to turn these
> messages off, because we want to see them for debugging.

If you have INFO-nm-logging enabled (which is default), you will continue to see g_messages(). So, users by default will still see g_message, g_warn, g_error. And they will continue not to see g_debug().
If they even disable INFO logging, the user obviously does not want to debug anything. For example, if somebody reports a problem with a logfile without INFO, it's acceptable to ask reproducing with more verbose logging.




An additional idea would be discriminate glib-log-domains more:

- LOGD_GLIB_BASE (which filters something like:
        if (   (log_domain && g_str_has_prefix (log_domain, "GLib"))
            || g_strcmp0 (log_domain, "GModule") == 0
            || g_strcmp0 (log_domain, "GThread") == 0)
- LOGD_GLIB_NM (the rest)

- LOGD_GLIB = alias for LOGD_GLIB_BASE | LOGD_GLIB_NM

This is overkill for now. however, we can add it later without breaking backward-compatibility (by transforming the current "GLIB" domain into an alias for LOGD_GLIB_*).

Comment 31 Thomas Haller 2014-05-19 12:16:51 UTC

(In reply to comment #29)

> So if the actual point (as I understand it) is to turn off GLib internal
> messages here so we don't screw up testcases, then lets just do that
> specifically instead of letting a test-only log domain leak out into our
> userspace API.

Partly. As I said one point is "consistent handling of logging".
Regarding tests, this only affects tests under /src/*. There the point is not to "avoid-screw-up", but "quiet-tests".

nmtst brings two modes, with/without assert_logging.

in assert_logging, the users must configure any log-level himself and he must explicitly assert against any g_message logging. In such a case, the log message does not even reach the glib handler -- unless the assertion fails, in which case you probably should rerun with "NMTST_DEBUG=debug,no-expect-message" anyway.


The other mode ("normal tests", or with "no-expect-message"), we don't assert at all against glib messages. But with the change we still can control the verbosity of the tests.

With this change, glib messages are also silenced during:

  NMTST_DEBUG=no-debug make -C ./src/ check 

(which is the default and similar to plain `make -C ./src/ check`)

Comment 32 Thomas Haller 2014-05-27 17:20:40 UTC

>> platform: improve tracking of route sources

in src/platform/tests/test-route.c, there are several no_error() calls at the end of the line. This is not introduced by your patch, but could you move them to a separate line?




Looking at source_to_rtprot()/source_to_rtprot(), I think we could encode all or NMPlatformSource types to internal numbers. E.g. also NM_PLATFORM_SOURCE_VPN.

include/uapi/linux/rtnetlink.h says:
 /* Values of protocol >= RTPROT_STATIC are not interpreted by kernel;
    they are just passed from user and back as is.
    It will be used by hypothetical multiple routing daemons.
    Note that protocol values should be standardized in order to
    avoid conflicts.
 */



+    switch (rtprot) {
+    case RTPROT_UNSPEC:
+         return NM_PLATFORM_SOURCE_UNKNOWN;
+    case RTPROT_REDIRECT:
+    case RTPROT_KERNEL:
+    case RTPROT_BOOT:
+         return NM_PLATFORM_SOURCE_KERNEL;
+    case RTPROT_RA:
+         return NM_PLATFORM_SOURCE_RDISC;
+    case RTPROT_DHCP:
+         return NM_PLATFORM_SOURCE_DHCP;
+
+    default:
+         return NM_PLATFORM_SOURCE_USER;
+    }

when you manually add a route with iproute2, it will set the protocol to RTPRO_BOOT (unless you specify something else). So, this function maps RTPROT_BOOT to NM_PLATFORM_SOURCE_KERNEL and "default" to NM_PLATFORM_SOURCE_USER. I am not saying that this is wrong, just pointing out.

Or how about:

    switch (rtprot) {
    case RTPROT_REDIRECT:
    case RTPROT_KERNEL:
    case RTPROT_BOOT:
         return NM_PLATFORM_SOURCE_KERNEL;
    case RTPROT_RA:
         return NM_PLATFORM_SOURCE_RDISC;
    case RTPROT_DHCP:
         return NM_PLATFORM_SOURCE_DHCP;
    case RTPROT_STATIC:
         return NM_PLATFORM_SOURCE_USER;
    case RTPROT_UNSPEC:
    default:
         return NM_PLATFORM_SOURCE_UNKNOWN;
    }


+    if (source != NM_PLATFORM_SOURCE_UNKNOWN)
+         rtnl_route_set_protocol (rtnlroute, source_to_rtprot (source));

if we don't set it, it will default to RTPROT_STATIC anyway... is that really what we want? How about RTPROT_UNSPEC?

Also, in the kernel (fib_table_delete()) it seem to me then when deleting a IPv4 route, the protocol must correspond:
                    (!cfg->fc_protocol ||
                     fi->fib_protocol == cfg->fc_protocol) &&
maybe, in case of delete, we must set the protocol to zero?
If that is the case, I think http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=019bf7512d0c8ece3e9051f609f10ecc1df7223a could have added a regression -- by not correctly setting the protocol to zero.

Comment 33 Thomas Haller 2014-05-27 17:21:49 UTC

(In reply to comment #32)

sorry, wrong bug :$

please ignore

Comment 34 Dan Williams 2014-05-28 23:27:28 UTC

(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #27)
> > > > > core/logging: add new logging domain "GLIB" for glib logging
> > > > 
> > > > I think this is fine for tests, but I'm not wild on exposing this to the D-Bus
> > > > API.  Could we isolate it from the D-Bus API documentation at least?  I know it
> > > > would be somewhat hard to hide it from the actual API itself.
> > > 
> > > I don't agree with that, because it's confusing to hide it. Why would we want
> > > that anyway?. Also, deprecating a logging domain is not hard, just update the
> > > documentation and remove all uses in logging statements. Also, we already
> > > accept unknown logging domains.
> > 
> > The main problem I have with the GLIB log patch is that the "GLIB" log domain
> > isn't actually a logging domain at all.  It simply hides any logging (no matter
> > what log domain the message is in!) that gets routed through glib's logging
> > infrastructure.  That includes both glib-specific logging from glib itself,
> > *and* NetworkManager log messages if syslog is not opened.
> > 
> > So if you turn off the GLIB logging domain in NM, and you're not using syslog,
> > then NM will log nothing at all, despite the fact that you have enabled other
> > logging domains.  That doesn't seem very consistent to me.
> > 
> > The second problem is that when running NetworkManager normally, *all* messages
> > from GLib are interesting, because they indicate a programmer error or a
> > critical system error.  There's no great reason for users to turn these
> > messages off, because we want to see them for debugging.
> > 
> > So if the actual point (as I understand it) is to turn off GLib internal
> > messages here so we don't screw up testcases, then lets just do that
> > specifically instead of letting a test-only log domain leak out into our
> > userspace API.
> 
> The point is consistency, as we redirect glib logging through nm-logging we
> should apply the same logging principles.
> Yes, a glib-domain is not the same as nm-domains, hence the new "GLIB" domain.
> But the priorities pretty much correspond.

Do we all agree that the reason for a GLIB logging domain is to "specifically isolate glib-originated log messages" so that we can turn them on/off at will?

If that's the case, then lets certainly do that, but lets make sure this doesn't have odd side-effects as a result of NetworkManager using some glib logging infrastructure.

> By default glib does not even log g_debug() messages, unless you specify
> G_MESSAGES_DEBUG. So, with the change, you need to specify G_MESSAGES_DEBUG (a)
> and now additionally [GLIB:]DEBUG (b) -- normal users don't do that, and in
> heavy debugging modus this adds the requirement to enable [GLIB:]DEBUG
> -- which seems not too much to ask, for example I never even set
> G_MESSAGES_DEBUG, but I always have DEBUG-nm-logging.
> 
> 
> > The second problem is that when running NetworkManager normally, *all* 
> > messages
> > from GLib are interesting, because they indicate a programmer error or a
> > critical system error.  There's no great reason for users to turn these
> > messages off, because we want to see them for debugging.
> 
> If you have INFO-nm-logging enabled (which is default), you will continue to
> see g_messages(). So, users by default will still see g_message, g_warn,
> g_error. And they will continue not to see g_debug().
> If they even disable INFO logging, the user obviously does not want to debug
> anything. For example, if somebody reports a problem with a logfile without
> INFO, it's acceptable to ask reproducing with more verbose logging.

That is true.

I don't particularly like leaking the base library logging details out to clients that I don't think will ever need them, but I don't really want to hold this branch up for that, especially since it's not core API.

So, I would simply ask that:

1) we ensure that the new GLIB logging domain *only* isolate glib messages, and ensure that it doesn't suppress all NM logging too, as I mentioned above for the case where syslog is not opened.  GLIB should *only* affect glib-originated  messages

2) ideally, we ensure that our NM logging levels also gate the GLIB levels.  So if you set WARN, you don't see any glib-originated g_message() statements, but you still see ERROR statements.  etc.  I haven't specifically checked if the patch does this, so forgive me if it's already implemented.

> An additional idea would be discriminate glib-log-domains more:
> 
> - LOGD_GLIB_BASE (which filters something like:
>         if (   (log_domain && g_str_has_prefix (log_domain, "GLib"))
>             || g_strcmp0 (log_domain, "GModule") == 0
>             || g_strcmp0 (log_domain, "GThread") == 0)
> - LOGD_GLIB_NM (the rest)
> 
> - LOGD_GLIB = alias for LOGD_GLIB_BASE | LOGD_GLIB_NM
> 
> This is overkill for now. however, we can add it later without breaking
> backward-compatibility (by transforming the current "GLIB" domain into an alias
> for LOGD_GLIB_*).

I think you're right, overkill for now.

Comment 35 Thomas Haller 2014-06-03 10:29:48 UTC

(In reply to comment #34)

> Do we all agree that the reason for a GLIB logging domain is to "specifically
> isolate glib-originated log messages" so that we can turn them on/off at will?

Not really. The reason is to apply the same logging principles to messages that come via the glib-logging mechanism g_log() -- yes, which means "turning them on/off" via our log-levels.

But not only to the ones that really originate from glib. Apply the same principles to all messages, regardless of the component. Discriminating the glib-log-domain, doesn't make much sense, because it doesn't correspond well with our nm-log-domains. 


> If that's the case, then lets certainly do that, but lets make sure this
> doesn't have odd side-effects as a result of NetworkManager using some glib
> logging infrastructure.

The only side effect is, that we might suppress some log messages.

-- and as detailed above, g_debug() messages are already now suppressed by default (see G_MESSAGES_DEBUG). And later we will only actively suppress something, if the user sets the level to WARN or higher -- which already indicates that he doesn't care about log messages (because a WARN-only log already significantly hinders debugging).


> > By default glib does not even log g_debug() messages, unless you specify
> > G_MESSAGES_DEBUG. So, with the change, you need to specify G_MESSAGES_DEBUG (a)
> > and now additionally [GLIB:]DEBUG (b) -- normal users don't do that, and in
> > heavy debugging modus this adds the requirement to enable [GLIB:]DEBUG
> > -- which seems not too much to ask, for example I never even set
> > G_MESSAGES_DEBUG, but I always have DEBUG-nm-logging.
> > 
> > 
> > > The second problem is that when running NetworkManager normally, *all* 
> > > messages
> > > from GLib are interesting, because they indicate a programmer error or a
> > > critical system error.  There's no great reason for users to turn these
> > > messages off, because we want to see them for debugging.
> > 
> > If you have INFO-nm-logging enabled (which is default), you will continue to
> > see g_messages(). So, users by default will still see g_message, g_warn,
> > g_error. And they will continue not to see g_debug().
> > If they even disable INFO logging, the user obviously does not want to debug
> > anything. For example, if somebody reports a problem with a logfile without
> > INFO, it's acceptable to ask reproducing with more verbose logging.
> 
> That is true.
> 
> I don't particularly like leaking the base library logging details out to
> clients that I don't think will ever need them, but I don't really want to hold
> this branch up for that, especially since it's not core API.

Exposing a log-domain and (possibly) deprecating/retiring it later is rather light weight. Already now, unknown domains are ignored.


> So, I would simply ask that:
> 
> 1) we ensure that the new GLIB logging domain *only* isolate glib messages, and
> ensure that it doesn't suppress all NM logging too, as I mentioned above for
> the case where syslog is not opened.  GLIB should *only* affect glib-originated
>  messages

That is where I disagree --

because the glib logging domain does not really tell us whether we are interested in a message or not.
E.g. if you pass an invalid destination to g_strlcpy() from NM, you want to see the g_return_* (although it originates from glib internally).

and because of 2), the change won't hardly affect current use cases. So, no need to treat it any differently. But if we want that, then the via something like LOGD_GLIB, LOGD_GLIB_BASE, LOGD_GLIB_NM, LOGD_GLIB_xyz.



> 2) ideally, we ensure that our NM logging levels also gate the GLIB levels.  So
> if you set WARN, you don't see any glib-originated g_message() statements, but
> you still see ERROR statements.  etc.  I haven't specifically checked if the
> patch does this, so forgive me if it's already implemented.

Sure, that is what I detailed in comment 30.

Comment 36 Thomas Haller 2014-06-03 11:17:37 UTC

Rebased branch on master.


Added one new commit:

>> core/logging: set G_MESSAGES_DEBUG depending on GLIB:DEBUG

and one new fixup!



I still would like to drop the fixup that removes GLIB from introspection/nm-manager.xml.

Comment 37 Thomas Haller 2014-06-03 14:20:16 UTC

Pushed yet another commit (and rebased everything), which fixes platform-tests on RHEL-7.

>> platform: downgrade logging error for delete_ip6_address()

Comment 38 Thomas Haller 2014-06-03 14:43:18 UTC

(In reply to comment #37)
> Pushed yet another commit (and rebased everything), which fixes platform-tests
> on RHEL-7.
> 
> >> platform: downgrade logging error for delete_ip6_address()

was wrong subject line. Should be:

>> platform: improve logging on error in delete_object()
>> platform: downgrade logging error for delete_object() for IPv6 address

Comment 39 Dan Williams 2014-06-04 04:24:00 UTC

(In reply to comment #35)
> (In reply to comment #34)
> 
> > Do we all agree that the reason for a GLIB logging domain is to "specifically
> > isolate glib-originated log messages" so that we can turn them on/off at will?
> 
> Not really. The reason is to apply the same logging principles to messages that
> come via the glib-logging mechanism g_log() -- yes, which means "turning them
> on/off" via our log-levels.
> 
> But not only to the ones that really originate from glib. Apply the same
> principles to all messages, regardless of the component. Discriminating the
> glib-log-domain, doesn't make much sense, because it doesn't correspond well
> with our nm-log-domains. 

Ok, I think this is the source of our misunderstanding.  But I'm still not clear exactly what messages are controlled by the GLIB logging domain.  Can you give me an example of which messages are controlled by the domain?

From what I can tell, if syslog is not opened, then LOGD_GLIB affects *all* log messages that NM might print, including ones from nm_log().  This is due to _nm_log() calling g_log() when syslog is not open.  Thus forgetting to add LOGD_GLIB to the log domains will suppress all messages.  Correct?

If that is correct, then doesn't LOGD_GLIB duplicate the functionality of all existing NetworkManager log domains?  If a testcase wants to turn off all log messages (which in this branch they would do by leaving LOGD_GLIB out of the logging domains), then why not just suppress all log messages with a new logging level called SILENT?

Comment 40 Thomas Haller 2014-06-04 07:40:25 UTC

(In reply to comment #39)
> (In reply to comment #35)
> > (In reply to comment #34)
> > 
> > > Do we all agree that the reason for a GLIB logging domain is to "specifically
> > > isolate glib-originated log messages" so that we can turn them on/off at will?
> > 
> > Not really. The reason is to apply the same logging principles to messages that
> > come via the glib-logging mechanism g_log() -- yes, which means "turning them
> > on/off" via our log-levels.
> > 
> > But not only to the ones that really originate from glib. Apply the same
> > principles to all messages, regardless of the component. Discriminating the
> > glib-log-domain, doesn't make much sense, because it doesn't correspond well
> > with our nm-log-domains. 
> 
> Ok, I think this is the source of our misunderstanding.  But I'm still not
> clear exactly what messages are controlled by the GLIB logging domain.  Can you
> give me an example of which messages are controlled by the domain?
>

To clarify:

1.) Whenever you log something with glib directives (g_warn*, g_return*, g_message, g_debug, g_error, etc.), it ends up in the g_log() function. So, we are talking about messages that eventually end up in g_log().

2.) nm-logging is only part of NM-Core. So no other component (libnm-util, nmcli, etc) can use nm-logging functions directly. Instead they would use g_log() for logging messages of all kinds.

3.) Together with 2.), unless your application also links nm-logging.o, nm-logging is not involved at all. This includes nmcli, unit-tests outside of :/src/, some unit-tests inside of :/src/, libnm-util linked by not-NM-core, etc...
On the other hand, glib and libnm-util linking with NM-core, is affected (see below).

4.) If you link nm-logging.o, unless you call nm_logging_syslog_openlog(), g_log() messages are not affected at all and are not routed to nm-logging. On the contrary, this mode, all messages logged via _nm_log() are forwarded to g_log() instead -- provided that they are enabled depending on their own nm-log-level and nm-log-domain.
In this mode, the LOGD_GLIB is completely irrelevant because no message are routed from g_log() to nm-logging.
This is mainly the case for unit-tests inside :/src/. So, I was wrong before, LOGD_GLIB will not silence any unit-tests (unless they call nm_logging_syslog_openlog(), which they currently do not).

5.) Currently, only NM's main() function calls nm_logging_syslog_openlog(). In this mode nm_logging_syslog_openlog() will install a glog handler so that all messages from g_log() are redirected to nm-logging. Only in this case LOGD_GLIB has effect and it suppresses *all* g_log() messages depending on the glog-log-level.




> From what I can tell, if syslog is not opened, then LOGD_GLIB affects *all* log
> messages that NM might print, including ones from nm_log().  This is due to
> _nm_log() calling g_log() when syslog is not open.  Thus forgetting to add
> LOGD_GLIB to the log domains will suppress all messages.  Correct?

No, see 4.)


> If that is correct, then doesn't LOGD_GLIB duplicate the functionality of all
> existing NetworkManager log domains?  If a testcase wants to turn off all log
> messages (which in this branch they would do by leaving LOGD_GLIB out of the
> logging domains), then why not just suppress all log messages with a new
> logging level called SILENT?

Comment 41 Thomas Haller 2014-06-04 07:54:12 UTC

Pushed another fixup.


I suggest 

$ git log --graph --pretty=format:'%Cred%h%Creset - %Cgreen(%ai)%Creset [%C(yellow)%an%Creset] %s%C(yellow)%d%Creset' --abbrev-commit --date=local origin/master..origin/th/bgo706293_platform_tests | grep -C 30 '(2014-06-\|(2014-05-[23].'

to keep track which patches are new (because I did not modify old commits, beyond rebasing).

Comment 42 Thomas Haller 2014-06-04 09:03:02 UTC

Ok, 

I merged two not-so-related, uncontroversial commits to master:

3b3e33f8ec73fa45c9e050f7a96cfd3a9046aff7 core/logging: initialize level_names array with explicit array initilizer (trivial)
6b5fb892e195e7aaab2828b3826e30a7bcc3b9eb core/logging: when comparing glib log levels, ignore non-level flags.







Since the GLIB log domain is unrelated to the tests (and only affects NM-core), I move them to a separate branch "th/bgo706293_nm_log_glib".

Also, I just realized, that G_MESSAGES_DEBUG is not relevant when installing our own glib-handler. So, the patch "core/logging: set G_MESSAGES_DEBUG depending on GLIB:DEBUG" is not useful and got dropped entirely.

Also, I reworded the commit message of "core/logging: add new logging domain "GLIB" for glib logging".





The rest of th/bgo706293_platform_tests is rebased to. No relevant changes.

Comment 43 Jiri Klimes 2014-06-04 13:35:50 UTC

I have pushed a fixup to fix compilation with glib.32 (g_test_initialized is in 2.36).

Running 'make check':
/link/software/team: **
NetworkManager:ERROR:test-link.c:186:test_slave: assertion failed: (!nm_platform_link_is_connected (master))
/bin/sh: line 5: 18147 Aborted                 ${dir}$tst
FAIL: test-link-linux

(/home/jklimes/work/NetworkManager/src/platform/tests/.libs/lt-test-address-linux:18168): NetworkManager-WARNING **: (../nm-platform.c:383):nm_platform_link_get_all: runtime check failed: (g_hash_table_contains (unseen, GINT_TO_POINTER (item->master)))
/bin/sh: line 5: 18168 Trace/breakpoint trap   ${dir}$tst
FAIL: test-address-linux

> platform: downgrade logging error for delete_object() for IPv6 address
"downgrad" -> "downgrade" in the comment. Anyway I would not use "downgrade", but just say we want print a debug message.
I usually use "()" when logging function names
  debug("delete_object() for address...

Comment 44 Dan Williams 2014-06-04 15:43:53 UTC

I did another quick review of the branch after squashing the fixups, and I'm happy with the current state, though obviously if there are some bugs like Jirka says we should fix those.  We can discuss the LOGD_GLIB stuff independently.

Comment 45 Thomas Haller 2014-06-04 17:57:40 UTC

(In reply to comment #43)
> I have pushed a fixup to fix compilation with glib.32 (g_test_initialized is in
> 2.36).

Thanks


> Running 'make check':
> /link/software/team: **
> NetworkManager:ERROR:test-link.c:186:test_slave: assertion failed:
> (!nm_platform_link_is_connected (master))
> /bin/sh: line 5: 18147 Aborted                 ${dir}$tst
> FAIL: test-link-linux
> 
> (/home/jklimes/work/NetworkManager/src/platform/tests/.libs/lt-test-address-linux:18168):
> NetworkManager-WARNING **: (../nm-platform.c:383):nm_platform_link_get_all:
> runtime check failed: (g_hash_table_contains (unseen, GINT_TO_POINTER
> (item->master)))
> /bin/sh: line 5: 18168 Trace/breakpoint trap   ${dir}$tst
> FAIL: test-address-linux

Interesting, I didn't experience this one. Will look into this tomorrow.


> > platform: downgrade logging error for delete_object() for IPv6 address
> "downgrad" -> "downgrade" in the comment. Anyway I would not use "downgrade",
> but just say we want print a debug message.
> I usually use "()" when logging function names
>   debug("delete_object() for address...

Did not adjust the "()", because that is already now. Would be a separate commit (and for now I just want to make it behave identical). 
Addressed the rest.

Comment 46 Dan Winship 2014-06-04 20:40:20 UTC

I'd squash these two together:

    nmtst: move header file to include/ directory
    nmtst: combine files nm-test-helpers.h and nm-test-utils.h



> nmtst: add nmtst_exec_sudo() function

Maybe "nmtest_reexec_sudo()" ? With just "exec" it sounds like it takes a program name, like execv()...



> nmtst: call nm_utils_init() in nmtst_init()

>+#ifdef NM_UTILS_H
>+	if (!nm_utils_init (&error))

any test program is going to need nm-utils. You can just include it from nm-test-utils.h

>+	g_assert (!error);

not clear why this is where it is... the nm_utils_init() call is the only place error could get set.



> platform/test: fix fake platform to simulate synchronous signals

Would it be better to make nm-linux-platform emit the signals asynchronously, rather than making nm-fake-platform emit them synchronously? If we guarantee that newly-added links are emitted synchronously from nm_platform_link_add(), then that means we're potentially committing ourselves to blocking on slow kernel/helper-binary operations.

(Of course we could just change it later if it becomes a problem...)

Also, the commit summary is wrong: it's not "simulating" synchronous signals, it's actually implementing them. "fix fake platform to emit signals synchronously".

> When adding a link, the platform implementation raises the link-added
> signal synchronously. Fix the fake platform to behave identical and also
> fix all the tests.

(1) "the platform implementation" should be "the Linux platform implementation", (2) it's the "link-changed" signal now, (3) "identically"

(The comment in the docs of nm_platform_link_add() also still references "link-added". There are probably other references to the old signal names too...)



> nmtst: add new mode where tests want to test against logged messages (assert_messages)

This is the behavior that all tests should eventually have. I don't think they should have to opt in to it. If anything, tests that don't want this behavior (yet) should have to opt out.

The test infrastructure should just always use g_print() or g_printerr() for its own messages so as to not interfere. (This is what gtestutils does.)



> nmtst: turn on g_debug() messages in nmtst_init() when debug logging is enabled

The comment about g_test_init() calling g_log_set_always_fatal() doesn't seem to go with this commit...



> nmtst: add NMTST_DEBUG=no-expect-message for debugging assert_message tests

this commit also adds nmtst use to several tests, which seems to be unrelated...

Comment 47 Thomas Haller 2014-06-04 22:02:15 UTC

(In reply to comment #46)
> I'd squash these two together:
> 
>     nmtst: move header file to include/ directory
>     nmtst: combine files nm-test-helpers.h and nm-test-utils.h
> 

I'd rather not, because the first commit (as it is) is easily verifiable to be correct (because it hardly doesn't do anything beside moving the file).

IMO both things are independent re-factoring, which is better comprehensible by having them separate.



> > nmtst: add nmtst_exec_sudo() function
> 
> Maybe "nmtest_reexec_sudo()" ? With just "exec" it sounds like it takes a
> program name, like execv()...

Done



> > nmtst: call nm_utils_init() in nmtst_init()
> 
> >+#ifdef NM_UTILS_H
> >+	if (!nm_utils_init (&error))
> 
> any test program is going to need nm-utils. You can just include it from
> nm-test-utils.h
> 
> >+	g_assert (!error);
> 
> not clear why this is where it is... the nm_utils_init() call is the only place
> error could get set.

Done.



> > platform/test: fix fake platform to simulate synchronous signals
> 
> Would it be better to make nm-linux-platform emit the signals asynchronously,
> rather than making nm-fake-platform emit them synchronously? If we guarantee
> that newly-added links are emitted synchronously from nm_platform_link_add(),
> then that means we're potentially committing ourselves to blocking on slow
> kernel/helper-binary operations.
> 
> (Of course we could just change it later if it becomes a problem...)

Hm. At the moment, linux platform works for NM (mostly). So I would rather not change linux-platform, but instead make fake-platform behave identically.

I agree, it might make sense, to do it always async (especially, because AFAIK all subscribers to those events currently process the event in a g_idle_add() handler anyway). But let's to that later (and/or when needed).



> Also, the commit summary is wrong: it's not "simulating" synchronous signals,
> it's actually implementing them. "fix fake platform to emit signals
> synchronously".

Done


> > When adding a link, the platform implementation raises the link-added
> > signal synchronously. Fix the fake platform to behave identical and also
> > fix all the tests.
> 
> (1) "the platform implementation" should be "the Linux platform
> implementation", (2) it's the "link-changed" signal now, (3) "identically"

Done


> (The comment in the docs of nm_platform_link_add() also still references
> "link-added". There are probably other references to the old signal names
> too...)

Now: 

- * ...  Any link-added signal will be
+ * ...  Any link-changed ADDED signal will be





> > nmtst: add new mode where tests want to test against logged messages (assert_messages)
> 
> This is the behavior that all tests should eventually have. I don't think they
> should have to opt in to it. If anything, tests that don't want this behavior
> (yet) should have to opt out.

I don't think the API suggests a default mode. The user has to specify whatever mode they want. Either via nmtst_init_with_logging() vs. nmtst_init_assert_logging(), or via the @assert_logging parameter of nmtst_init().

But yes, we might move more tests to this mode (step by step).


> The test infrastructure should just always use g_print() or g_printerr() for
> its own messages so as to not interfere. (This is what gtestutils does.)

In assert_logging mode, this is already what happens.
Otherwise it continues to use the g_log() functions. Hm, I cannot think of a problem that could cause but it seems more correct to me to use g_log() as sole logging mechanism in non-assert-logging mode.



> > nmtst: turn on g_debug() messages in nmtst_init() when debug logging is enabled
> 
> The comment about g_test_init() calling g_log_set_always_fatal() doesn't seem
> to go with this commit...

Moved to next commit.


> > nmtst: add NMTST_DEBUG=no-expect-message for debugging assert_message tests
> 
> this commit also adds nmtst use to several tests, which seems to be
> unrelated...

Split in two commits.

nmtst: add NMTST_DEBUG=no-expect-message for debugging assert_message tests
test: convert several tests to nmtst assert_message mode

Comment 48 Thomas Haller 2014-06-05 07:59:53 UTC

(In reply to comment #47)
> (In reply to comment #46)
> > > nmtst: add NMTST_DEBUG=no-expect-message for debugging assert_message tests
> > 
> > this commit also adds nmtst use to several tests, which seems to be
> > unrelated...
> 
> Split in two commits.
> 
> nmtst: add NMTST_DEBUG=no-expect-message for debugging assert_message tests
> test: convert several tests to nmtst assert_message mode

Oh, right. Now I remember why it was this way.

The tests that I modified #include "nm-test-utils.h", thus with the change to redefine g_test_expect_message(), these tests now call nmtst_ functions and must properly NMTST_DEFINE() and nmtst_init().

Otherwise linking fails with:

  CCLD     test-general
test-general.o: In function `test_setting_vpn_items':
/tmp/workspace/NetworkManager/libnm-util/tests/test-general.c:125: undefined reference to `__nmtst_internal'


I will squash these commits together again.

Comment 49 Thomas Haller 2014-06-05 10:56:17 UTC

(In reply to comment #43)

> Running 'make check':
> /link/software/team: **
> NetworkManager:ERROR:test-link.c:186:test_slave: assertion failed:
> (!nm_platform_link_is_connected (master))
> /bin/sh: line 5: 18147 Aborted                 ${dir}$tst
> FAIL: test-link-linux

This failure was on F17. I added a workaround to the tests, see the two new commits: Please review:

> nmtst: add nmtst_spawn_sync()
> platform/test: workaround team bug in older kernel version



> (/home/jklimes/work/NetworkManager/src/platform/tests/.libs/lt-test-address-linux:18168):
> NetworkManager-WARNING **: (../nm-platform.c:383):nm_platform_link_get_all:
> runtime check failed: (g_hash_table_contains (unseen, GINT_TO_POINTER
> (item->master)))
> /bin/sh: line 5: 18168 Trace/breakpoint trap   ${dir}$tst
> FAIL: test-address-linux

This error was caused by the previous failure, which left test devices hanging and interfered with test-address-linux.

Comment 50 Thomas Haller 2014-06-05 16:26:45 UTC

Branch th/bgo706293_platform_tests merged as:

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=37aeee5f71a838db724207eae38ebacaae83eed9



Sometimes I saw test failures for test-link-linux, e.g. on rhel-7.0

/link/software/bond: 
(/tmp/workspace/NetworkManager/src/platform/tests/.libs/lt-test-link-linux:18994): NetworkManager-WARNING **: <error> [1401929488.957439] [../nm-linux-platform.c:1776] sysctl_set(): sysctl: failed to set '/sys/class/net/nm-test-device/bonding/mode' to 'active-backup': (1) Operation not permitted
FAIL: test-link-linux


(this issue has to be resolved later, it's probably a race condition in the test itself).



The branch th/bgo706293_nm_log_glib which adds the logging domain LOGD_GLIB will be reposted separately at a later point.