GNOME Bugzilla – Bug 693295
researcher requesting data and paper review
Last modified: 2019-02-22 13:01:19 UTC
Created attachment 235347: a paper investigating what makes valuable contributors

Why do I request the data? I am trying to quantify the factors that make an open source community successful. In particular, I focus on people, e.g. contributors' willingness, their expertise, their interaction with the project context, and how these are associated with their performance in the community. Our approach is to investigate the traces people leave in issue tracking systems and version control systems. GNOME is currently one of our target communities. My co-author and I have obtained some interesting results; please see the attached paper. Would you be able to review the paper and see whether it is of any help to the community? I was also wondering: is it possible for me to get a copy of the database for later validation?

What is my institution? I'm a scholar at Peking University, interested in measuring how developers live their lives, in the hope that this could help us understand and control large, complex software systems.

Hope you enjoy reading the paper: What Make Valuable Contributors: Willingness and Opportunity in OSS Community
------------
Minghui
Hi Minghui,

thanks for the paper! You might reach a broader audience by also writing to the gnome-bugsquad@gnome.org mailing list.

(In reply to comment #0)
> is it possible for me to get a copy of the database for later validation?

I'm not in a position to decide on a dump (or even to tell whether it's possible), but as you want to identify the number of contributors, I guess you'd also need non-anonymized Bugzilla information? Are you asking for a complete dump, or would less data also work? Is using the XML-RPC query interface of Bugzilla 3.4 not an option? (I think it isn't, as *useful* querying of a certain Bugzilla user's activity will only be possible in the future Bugzilla 4.4, in my opinion. I had the same fun recently.)
(In reply to comment #1)
> Hi Minghui,
>
> thanks for the paper! You might reach a broader audience by also writing to the
> gnome-bugsquad@gnome.org mailing list.

Thank you, André, I just did.

> (In reply to comment #0)
> > is it possible for me to get a copy of the database for later validation?
>
> I'm not in a position to decide on a dump (or even to tell whether it's possible), but
> as you want to identify the number of contributors, I guess you'd also need
> non-anonymized Bugzilla information? Are you asking for a complete dump, or would
> less data also work?

Yep, I'm asking for a complete dump, if that is possible. Could we reproduce the following practice here? https://bugzilla.mozilla.org/page.cgi?id=researchers.html

> Is using the XML-RPC query interface of Bugzilla 3.4 not an option? (I think it
> isn't, as *useful* querying of a certain Bugzilla user's activity will only
> be possible in the future Bugzilla 4.4, in my opinion. I had the same fun recently.)
Regarding the paper:

> Data in "TABLE 1: Projects":
> For Evolution it states "21041 Cntrbtrs"

I am really curious how this number was calculated.
I get 1159 distinct authors for Evolution when running
git log --author='' --pretty=format:"%ae" | sort -u | wc -l
(excluding translation commits in /po, it's 769).
The total number of commits I get is 38654 when running
git log --pretty=oneline | wc -l

> "Some issues were either not public or not obtainable: 121578
> in Gnome and 25388 in Mozilla."

This can be the case if "Restrict Group Visibility" was applied (e.g. only developers of the affected project and the reporter can access a report about a security vulnerability), or if a report simply does not exist. For example (but not only), in GNOME Bugzilla IDs 272654 to 299999 do not exist, as 200001 to 299999 was reserved in 2005 for importing tickets 1 to 99999 from bugzilla.ximian.com.

However, for the specific cases of https://bugzilla.gnome.org/show_bug.cgi?id=121578 and https://bugzilla.mozilla.org/show_bug.cgi?id=25388, I don't see anything blocking access; they both work for me without being logged in.
(In reply to comment #3)
> Regarding the paper:
>
> > Data in "TABLE 1: Projects":
> > For Evolution it states "21041 Cntrbtrs"
>
> I am really curious how this number was calculated.
> I get 1159 distinct authors for Evolution when running
> git log --author='' --pretty=format:"%ae" | sort -u | wc -l
> (excluding translation commits in /po, it's 769).
> The total number of commits I get is 38654 when running
> git log --pretty=oneline | wc -l

Sorry for the confusion. That number was calculated from Bugzilla data. We extracted all the activities, e.g. <reporter name="Nathaniel Taylor">hvdc</reporter>, and got a dataset in which each row is bugid;actor;activity;time;product, then calculated the number of contributors from this dataset. However, because some IDs are shared by several people and some people use several IDs, I did not see a simple way to get an exact count.

> > "Some issues were either not public or not obtainable: 121578
> > in Gnome and 25388 in Mozilla."
>
> This can be the case if "Restrict Group Visibility" was applied (e.g. only
> developers of the affected project and the reporter can access a report about
> a security vulnerability), or if a report simply does not exist. For example
> (but not only), in GNOME Bugzilla IDs 272654 to 299999 do not exist, as
> 200001 to 299999 was reserved in 2005 for importing tickets 1 to 99999 from
> bugzilla.ximian.com.

Helpful, thanks.

> However, for the specific cases of
> https://bugzilla.gnome.org/show_bug.cgi?id=121578 and
> https://bugzilla.mozilla.org/show_bug.cgi?id=25388, I don't see anything
> blocking access; they both work for me without being logged in.

I just checked again, and you are correct. But there is a difference in the data we extracted. If you are logged in, you get something like:
<reporter name="Nathaniel Taylor">hvdc@onetel.net.uk</reporter>
If you are not logged in, you get:
<reporter name="Nathaniel Taylor">hvdc</reporter>
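The contributor count described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script used for the paper: it assumes the semicolon-separated row layout bugid;actor;activity;time;product, and it assumes (hypothetically) that reducing every actor to the part before "@" is an acceptable way to make logged-in and logged-out extracts agree. The function names and the sample rows are invented for illustration, and the heuristic does not solve the shared-ID or multiple-ID problem; it can even merge two different people who share a local part.

```python
import csv
import io
import re


def normalize_actor(actor: str) -> str:
    """Hypothetical normalization: keep only the local part of an address.

    Logged-out exports show e.g. "hvdc" while logged-in exports show
    "hvdc@onetel.net.uk"; keeping the text before "@" makes them match.
    Lowercasing is an extra assumption to merge case variants.
    """
    return actor.strip().split("@", 1)[0].lower()


def count_contributors(text: str, product_regex: str) -> int:
    """Count distinct normalized actors on rows whose product matches.

    Each row is assumed to be: bugid;actor;activity;time;product.
    """
    pattern = re.compile(product_regex)
    actors = set()
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        _bugid, actor, _activity, _time, product = row
        if pattern.search(product):
            actors.add(normalize_actor(actor))
    return len(actors)


if __name__ == "__main__":
    # Invented sample rows, not real GNOME data.
    sample = (
        "1;hvdc;reporter;2005-01-01;Evolution\n"
        "1;hvdc@onetel.net.uk;commenter;2005-01-02;Evolution\n"
        "2;alice;assignee;2005-02-01;Evolution\n"
        "3;bob;reporter;2005-03-01;gedit\n"
    )
    print(count_contributors(sample, r"^Evolution"))  # prints 2
```

Here the two "hvdc" rows collapse into one contributor only because of the normalization step; without it, the same person would be counted twice depending on whether the extract was made while logged in.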
André,

Many thanks for reading and commenting on the draft!

1) Evolution contributors. Yes, the table is misleading because it gives the amount of code in Evolution, but the code figure is there only as background information. The study concerns contributors to issue reporting and resolution, not contributors of source code. We count the reporters, resolvers, commenters, CCs, QA contacts, and assignees for the 87594 issues whose Product matches /^Evolution/.

2) Missing issues. 121578 is the number of issues that are missing (not an example of a missing issue). Issue 121578 is, actually, in the data we retrieved. We were not aware of the 272654 to 299999 range. There appears to be another range like it, from 173191 to 200000. The seven largest ranges missing in our extract of GNOME issues are:

From     To       Missing
1        60       60
619      680      62
264705   264773   69
38818    40002    1185
48557    49999    1443
274555   299999   25445
173191   200000   26810
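A table like the one above can in principle be produced by a single scan over the sorted bug IDs of an extract. The following is a minimal sketch, assuming the extracted IDs are available as a list of integers; `missing_ranges` and `largest_gaps` are hypothetical helper names, not part of any script mentioned in this thread. Ranges are inclusive, so the count is end - start + 1, which matches the table (e.g. 38818 to 40002 gives 1185).

```python
from __future__ import annotations


def missing_ranges(present_ids, first: int, last: int) -> list[tuple[int, int, int]]:
    """Return (start, end, count) for each run of IDs in [first, last]
    that does not occur in present_ids. Ranges are inclusive."""
    present = sorted({i for i in present_ids if first <= i <= last})
    gaps = []
    expected = first  # next ID we would see if nothing were missing
    for bug_id in present:
        if bug_id > expected:
            # IDs expected .. bug_id - 1 are absent
            gaps.append((expected, bug_id - 1, bug_id - expected))
        expected = bug_id + 1
    if expected <= last:
        # trailing run after the highest present ID
        gaps.append((expected, last, last - expected + 1))
    return gaps


def largest_gaps(present_ids, first: int, last: int, n: int = 7):
    """Return the n largest missing ranges, biggest first."""
    gaps = missing_ranges(present_ids, first, last)
    return sorted(gaps, key=lambda gap: gap[2], reverse=True)[:n]


if __name__ == "__main__":
    # Invented IDs for illustration only.
    ids = [61, 62, 63, 70, 75]
    for start, end, count in largest_gaps(ids, 1, 80, n=3):
        print(start, end, count)
```

Summing the counts over all gaps (not just the seven largest) would give the total number of missing issues for the scanned ID range.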
Dear Olav,

The script that Mozilla uses is http://bzr.mozilla.org/bmo/4.0/annotate/head:/contrib/sanitizeme.pl

Basically, the sanitization process removes bugs and comments that are flagged as private or confidential (and any sensitive user data, as you like). According to Mike Hoye from Mozilla, HR/legal bugs and open security bugs are the two big categories there. If you are open to this, we would like to help.
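The filtering idea can be illustrated with a small in-memory sketch. This is not a port of sanitizeme.pl, which operates on the Bugzilla database; the `Bug` and `Comment` record types, their fields, and the choice to redact addresses down to the local part (what logged-out users already see) are all hypothetical assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Comment:
    # Hypothetical record type, not Bugzilla's schema
    author_email: str
    text: str
    is_private: bool = False


@dataclass(frozen=True)
class Bug:
    # Hypothetical record type, not Bugzilla's schema
    bug_id: int
    reporter_email: str
    is_private: bool = False  # e.g. restricted to a security group
    comments: tuple[Comment, ...] = ()


def redact_email(email: str) -> str:
    """Assumed redaction: keep only the local part of the address."""
    return email.split("@", 1)[0]


def sanitize(bugs: list[Bug]) -> list[Bug]:
    """Drop private bugs and private comments, and redact addresses."""
    clean = []
    for bug in bugs:
        if bug.is_private:
            continue  # the whole bug is withheld
        public_comments = tuple(
            replace(c, author_email=redact_email(c.author_email))
            for c in bug.comments
            if not c.is_private
        )
        clean.append(
            replace(
                bug,
                reporter_email=redact_email(bug.reporter_email),
                comments=public_comments,
            )
        )
    return clean
```

Keeping the local part is only one possible design; replacing it with an opaque pseudonym would protect identities further but would also make contributor counts harder to validate against public pages.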
Thanks Clare, the script is very useful! GNOME Bugzilla does not use the same Bugzilla version, but this should help a lot. I'll try to find some time to work on this, though it might take a while before I can.
Created attachment 264135: help dump sanitized data from GNOME Bugzilla

Hi, I'm a student at Peking University. I'd like to help dump sanitized data from GNOME Bugzilla by reusing the script from Mozilla Bugzilla (http://bzr.mozilla.org/bmo/4.2-dev/view/head:/contrib/sanitizeme.pl). I ran the script against the Bugzilla version GNOME uses (bugzilla-3.4.13), and it worked. In the attached file I describe in detail what should be done to facilitate the dump. I suspect GNOME Bugzilla is much more complicated than my test database. If GNOME Bugzilla has DB schema changes, or there is other data that should be sanitized, please let me know; we'd like to help with that.
----------
Feixue
This is probably obsolete by now?
Still, it would be great if the Bugzilla data could be provided...
It would be great; however, we (GNOME) do not have the capacity. I'd love to see upstream include and provide this functionality...
Sorry, with the ongoing move from Bugzilla to GitLab we won't investigate this task, hence declining. We just don't have the capacity. :(