Bug 693295 - researcher requesting data and paper review
Status: RESOLVED WONTFIX
Product: bugzilla.gnome.org
Classification: Infrastructure
Component: bug data
Version: unspecified
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: Bugzilla Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-02-07 06:47 UTC by clare zhou
Modified: 2019-02-22 13:01 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a paper investigating Make Valuable Contributors (262.53 KB, application/pdf)
2013-02-07 06:47 UTC, clare zhou
help dump sanitized data from Gnome Bugzilla (5.65 KB, text/plain)
2013-12-13 12:01 UTC, zhangfeixue

Description clare zhou 2013-02-07 06:47:22 UTC
Created attachment 235347 [details]
a paper investigating  Make Valuable Contributors

Why am I requesting the data?
I am trying to quantify the factors that make a successful open source community. In particular, I focus on people, e.g., contributors' willingness, their expertise, and their interaction with the project context, and on how these are associated with their performance in the community.
Our approach is to investigate the traces people leave in issue tracking systems and version control systems. Currently, GNOME is one of our target communities.
My co-author and I have obtained some interesting results; please see the attached paper. Could you review the paper and see whether it is of any help to the community? I was also wondering: is it possible for me to get a copy of the database for later validation?

What is my institution?
I'm a scholar at Peking University, interested in measuring how developers live their lives, in the hope that this could help us understand and control large, complex software systems.

Hope you enjoy reading the paper: 
What Make Valuable Contributors:
Willingness and Opportunity in OSS Community

------------
Minghui
Comment 1 André Klapper 2013-02-07 07:28:01 UTC
Hi Minghui,

thanks for the paper! You might get a broader audience by also writing to the gnome-bugsquad@gnome.org mailing list.

(In reply to comment #0)
> is it possible to get a copy of the database for the later validation? 

I'm not in a position to decide on a dump (or even tell if it's possible), but as you want to identify the number of contributors I guess you'd also need non-anonymized Bugzilla information? Do you ask for a complete dump or would less data also work?
Using the XML-RPC query interface of Bugzilla 3.4 is not an option? (I think it isn't, as *useful* querying for activity of a certain Bugzilla user will only be possible in future Bugzilla 4.4, in my opinion. Had the same fun lately.)
Comment 2 clare zhou 2013-02-07 08:57:08 UTC
(In reply to comment #1)
> Hi Minghui,
> 
> thanks for the paper! You might get a broader audience by also writing to the
> gnome-bugsquad@gnome.org mailing list.

Thank you, andre, I just did.

> 
> (In reply to comment #0)
> > is it possible to get a copy of the database for the later validation? 
> 
> I'm not in a position to decide on a dump (or even tell if it's possible), but
> as you want to identify the number of contributors I guess you'd also need
> non-anonymized Bugzilla information? Do you ask for a complete dump or would
> less data also work?

Yep, I'm asking for a complete dump, if that is possible.
Can we reproduce the following practice here?
https://bugzilla.mozilla.org/page.cgi?id=researchers.html

> Using the XML-RPC query interface of Bugzilla 3.4 is not an option? (I think it
> isn't, as *useful* querying for activity of a certain Bugzilla user will only
> be possible in future Bugzilla 4.4, in my opinion. Had the same fun lately.)
Comment 3 André Klapper 2013-02-10 21:58:31 UTC
Regarding the paper:

> Data in "TABLE 1: Projects":
> For Evolution it states "21041 Cntrbtrs"

I am really curious how this number was calculated.
I get 1159 different authors for Evolution when running
  git log --author='' --pretty=format:"%ae" | sort -u | wc -l
(excluding translation commits in /po it's 769)
The total number of commits I get is 38654 when running
  git log --pretty=oneline | wc -l

> "Some issues were either not public or not obtainable: 121578 
> in Gnome and 25388 in Mozilla."

This can be the case if "Restrict Group Visibility" was applied (e.g. only developers of the affected project and the reporter can access a certain report if a security vulnerability was reported), or if a report simply does not exist. For example (but not only), GNOME Bugzilla has no reports in the range between 272654 and 299999, as 200001 to 299999 was reserved for importing the tickets 1 to 99999 from bugzilla.ximian.com in 2005.

For the specific cases of https://bugzilla.gnome.org/show_bug.cgi?id=121578 and https://bugzilla.mozilla.org/show_bug.cgi?id=25388, however, I don't see anything blocking access; they both work for me without being logged in.
Comment 4 clare zhou 2013-02-11 06:04:32 UTC
(In reply to comment #3)
> Regarding the paper:
> 
> > Data in "TABLE 1: Projects":
> > For Evolution it states "21041 Cntrbtrs"
> 
> I am really curious how this number was calculated.
> I get 1159 different authors for Evolution when running
>   git log --author='' --pretty=format:"%ae" | sort -u | wc -l
> (excluding translation commits in /po it's 769)
> The total number of commits I get is 38654 when running
>   git log --pretty=oneline | wc -l
Sorry for the confusion. That number was calculated from Bugzilla data.
We extracted all the activities, e.g.
<reporter name="Nathaniel Taylor">hvdc</reporter>
and got a dataset in which each row is: bugid;actor;activity;time;product,
then calculated the number of contributors from this dataset.
However, because some IDs are shared by several people and some people use several IDs, I didn't see a simple way to resolve this.
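
A minimal sketch of the counting step described above, assuming the rows are semicolon-separated in the order bugid;actor;activity;time;product. The function name and the sample rows are hypothetical, and the sketch counts distinct actor strings only, so it does not address shared IDs or people with several IDs.

```python
import csv
import io
from collections import defaultdict


def count_contributors(rows_text):
    """Return {product: number of distinct actor strings} for the given rows."""
    actors_by_product = defaultdict(set)
    reader = csv.reader(io.StringIO(rows_text), delimiter=";")
    for row in reader:
        if len(row) != 5:
            continue  # skip blank or malformed rows
        bugid, actor, activity, time, product = row
        actors_by_product[product].add(actor)
    return {product: len(actors) for product, actors in actors_by_product.items()}


# Hypothetical sample rows, for illustration only (not real Bugzilla data).
sample = """\
1001;hvdc;reporter;2005-01-02 10:00:00;Evolution
1001;alice;commenter;2005-01-03 11:00:00;Evolution
1002;hvdc;commenter;2005-02-01 09:30:00;Evolution
2001;bob;reporter;2006-05-05 12:00:00;gedit
"""

counts = count_contributors(sample)
print(counts)
```

Because an actor is counted once per product regardless of how many activities it has, the result depends entirely on how actor strings are written in the extract.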

> 
> > "Some issues were either not public or not obtainable: 121578 
> > in Gnome and 25388 in Mozilla."
> 
> This can be the case if "Restrict Group Visibility" was applied (e.g. only
> developers of the affected project and the reporter can access a certain report
> if there is a security vulnerability was reported); or if a report simply does
> not exist, for example (but not only) in GNOME Bugzilla the range between
> 272654 and 299999, as 200001 to 299999 was reserved for importing the tickets 1
> to 99999 from bugzilla.ximian.com in 2005.
Helpful, thanks.

> 
> However, for the specific cases of
> https://bugzilla.gnome.org/show_bug.cgi?id=121578 and
> https://bugzilla.mozilla.org/show_bug.cgi?id=25388 I don't see anything
> blocking access though, they both work for me without being logged in.
I just checked again, and you are correct. However, when we extracted the data there was a difference: if you were logged in, you got something like:
<reporter name="Nathaniel Taylor">hvdc@onetel.net.uk</reporter>
If you didn't log in, you got:
<reporter name="Nathaniel Taylor">hvdc</reporter>
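
One way to make the two extracts comparable is to reduce the logged-in form to the part before the "@". This is only a sketch: it assumes the logged-out form is always that local part, which the single example above suggests but does not prove, and the helper name actor_key is hypothetical. Two different people with the same name and local part on different domains would be merged.

```python
import xml.etree.ElementTree as ET


def actor_key(reporter_xml):
    """Return (display name, login without domain) for a <reporter> element."""
    elem = ET.fromstring(reporter_xml)
    login = (elem.text or "").strip()
    # Assumption: the logged-out extract shows only the part before "@".
    local_part = login.split("@", 1)[0]
    return elem.get("name"), local_part


logged_in = '<reporter name="Nathaniel Taylor">hvdc@onetel.net.uk</reporter>'
logged_out = '<reporter name="Nathaniel Taylor">hvdc</reporter>'

same = actor_key(logged_in) == actor_key(logged_out)
print(actor_key(logged_in), same)
```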
Comment 5 Audris Mockus 2013-02-12 00:01:26 UTC
André, 
Many thanks for reading and commenting on the draft!

1) Evolution contributors. Yes, the table misleads by providing the 
amount of code in Evolution, but the code is given only for background information. 
The study concerns contributors to issue reporting and resolution, not 
contributors of source code.
We count the number of reporters/resolvers/commenters/CCs/QA Contacts/Assignees
for the 87594 issues that have Product ~ /^Evolution/.


2) Missing issues. 
121578 is the number of issues that are missing (not an
example of a missing issue). Issue 121578 is, actually, in the data we have
retrieved. 

We were not aware of the 272654 to 299999 range. There appears to be
another range like that from 173191 to 200000.
The seven largest ranges missing in our extract of gnome issues 
are:
  From      To  countMissing
     1      60            60
   619     680            62
264705  264773            69
 38818   40002          1185
 48557   49999          1443
274555  299999         25445
173191  200000         26810
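
A table like this can be derived from the list of retrieved issue IDs. The following is a minimal sketch rather than the script actually used for the paper; the function name and the sample IDs are hypothetical.

```python
def missing_ranges(retrieved_ids, first_id, last_id):
    """Return (from, to, count) for each run of IDs in [first_id, last_id]
    that does not appear in retrieved_ids."""
    ranges = []
    expected = first_id  # the next ID we would see if nothing were missing
    for bug_id in sorted(set(retrieved_ids)):
        if bug_id < first_id or bug_id > last_id:
            continue
        if bug_id > expected:
            ranges.append((expected, bug_id - 1, bug_id - expected))
        expected = bug_id + 1
    if expected <= last_id:
        ranges.append((expected, last_id, last_id - expected + 1))
    return ranges


# Hypothetical retrieved IDs, for illustration only.
retrieved = [61, 62, 63, 618, 681, 700]
gaps = missing_ranges(retrieved, 1, 700)

# List the gaps from smallest to largest, as in the table above.
for start, end, count in sorted(gaps, key=lambda gap: gap[2]):
    print(start, end, count)
```

Summing the third column over all gaps gives the total number of missing issues in the chosen ID range.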
Comment 6 clare zhou 2013-02-12 23:33:05 UTC
Dear olav, 

The script which mozilla uses is 
http://bzr.mozilla.org/bmo/4.0/annotate/head:/contrib/sanitizeme.pl

Basically, the sanitization process removes bugs or comments that are flagged as private or confidential (and any sensitive user data you choose). According to Mike Hoye from Mozilla, HR/Legal and open security bugs are the two big categories there. 

If you are open to this, we would like to help.
Comment 7 Olav Vitters 2013-02-20 20:25:10 UTC
Thanks Clare,

The script is very useful! GNOME Bugzilla does not use the same Bugzilla version, but this should help a lot. I'll try to find some time to work on this, though it might take a while before I can.
Comment 8 zhangfeixue 2013-12-13 12:01:54 UTC
Created attachment 264135 [details]
help dump sanitized data from Gnome Bugzilla

Hi, I'm a student at Peking University. I'd like to help dump sanitized
data from Gnome Bugzilla by reusing scripts from Mozilla Bugzilla
(http://bzr.mozilla.org/bmo/4.2-dev/view/head:/contrib/sanitizeme.pl).

I tried to run the scripts on Gnome Bugzilla version (bugzilla-3.4.13).
It worked. In the attached file I describe what should be done in detail
to help facilitate the dump process.

I suppose the GNOME Bugzilla database is much more complicated than my test
DB. If GNOME Bugzilla has some DB schema changes, or if there is
other data that should be sanitized, please let me know; we'd like to
help with that. 

----------
Feixue
Comment 9 Michael Schumacher 2017-02-27 15:12:38 UTC
This is probably obsolete by now?
Comment 10 clare zhou 2017-02-27 15:36:27 UTC
Still, it would be great if the bugzilla data could be provided...
Comment 11 André Klapper 2017-02-27 15:55:06 UTC
It would be great, however we (GNOME) do not have the capacity. 
I'd love to see upstream include and provide this functionality...
Comment 12 André Klapper 2019-02-22 13:01:19 UTC
Sorry, with the currently ongoing move from Bugzilla to Gitlab we won't investigate this task, hence declining. We just don't have the capacity. :(