GNOME Bugzilla – Bug 74288
Impossible to search mail.gnome.org/archives meaningfully
Last modified: 2007-10-15 21:19:50 UTC
http://mail.gnome.org/archives/ has a search form at the top of the page. Unfortunately when you search for something the result page notes that the search tables were "Last modified: 2000-08-06" so the search is useless. Sorry if www isn't the right component for this bug; there's not a component for mail.gnome.org.
Assigning to myself, I've done a lot of namazu stuff for other groups, so I'll have a look at ours.
This is still broken.
And the bug is still open. Surprised? :-)
Interestingly, the "Last modified:" date has changed to "2003-02-18", so compared to the original bug report, the search database is about 14 months less broken than it used to be: 2002-3-11 minus 2000-8-06 = ~1 year, 7 months; 2003-7-25 minus 2003-2-18 = ~5 months. How's that for a useless statistic? :)
So, in 16 months real time, the search engine has added 30 archived months to search from. That's an improvement of 1.875 archive months per real-time month. At this rate it should converge in September, and we can mark this as `FIXED'. :)
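The figures in the two comments above can be checked with a few lines of Python. This is only a sketch: the helper name `months_between` is made up for illustration, and it counts whole calendar months while ignoring the day of the month.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Dates quoted in the comments above.
report_date = date(2002, 3, 11)    # date used for the original report
index_then = date(2000, 8, 6)      # "Last modified" seen at that time
followup_date = date(2003, 7, 25)  # date used for the follow-up check
index_now = date(2003, 2, 18)      # "Last modified" seen at the follow-up

stale_then = months_between(index_then, report_date)   # staleness at the report
stale_now = months_between(index_now, followup_date)   # staleness at the follow-up
archive_gain = months_between(index_then, index_now)   # archive months added
real_time = months_between(report_date, followup_date) # real months elapsed
rate = archive_gain / real_time                        # archive months per real month
```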
Charles: ha ha ha. ;-)
This is a serious problem. It's impossible to find anything. No wonder people keep asking the same questions over and over again. As an example, try searching for libcroco. It gives no results at all. I know libcroco has been mentioned a lot, at least on the garnome mailing list. And judging from this bug report it has been broken for almost two years... I looked through the cvs repository but couldn't find anything related to this, so it's hard to make a guess at what is wrong.
SLUG (http://slug.org.au/) recently switched over to htdig from namazu since it was really not working any more. We have a cron job every morning that indexes all the list archives, and it works quite well. Maybe this might work well for GNOME?
Now "Last modified: 2000-08-06". The results are more recent though; the most recent are at least from Feb 2004. It's been a while since this bug was filed...
I have contacted the tech guys behind Mailman to see if we can find a solution to this ancient bug.
Moving to the new mail.gnome.org component.
GNOME sysadmins are not answering. Namazu upstream contacted: http://www.namazu.org/trac-namazu/trac.cgi/ticket/9 I guess this is more a problem of our installation than a Namazu bug but who knows. In any case, maybe they can help.
Just found some comments on this issue in http://live.gnome.org/MailingLists
---------
The namazu scripts are now in sysadmin CVS in the 'namazu' module, but we're currently missing a script to keep the indexes in sync regularly. At time of writing, the indexes were last updated in August 2005.
Plans for improvement
Get namazu hooked up to scan for recent changes (perhaps every 3-4hrs, do 'find' in the archives and feed that into namazu to update the indexes).
----------------
http://live.gnome.org/SysadminToDoList
Archive search indexes not being maintained
It looks like when I set up namazu back in April 2004, I ran the initial indexing script, but never hooked up anything to keep it indexed. I've checked the scripts that I found in ~mailman/namazu into 'namazu' module in sysadmin CVS, but they only seem to handle generating the initial indexes. As yet there doesn't appear to be a script capable of periodically checking for recent content, and indexing just that. I have re-run the index generation (so all content up to 21-Aug-2005 should get indexed), and I have started a script for this, and will check it into namazu module when I've had a chance to test it. Until then, the indexes will likely remain static.
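The "do 'find' in the archives" plan quoted above could look roughly like the following Python sketch. It is not the script from sysadmin CVS: the function names, the stamp-file approach, and the directory layout are all assumptions made for illustration. It only collects the files changed since the previous run; handing them to the indexer is left out.

```python
import os
from pathlib import Path
from typing import List


def find_recent(archive_root: str, since: float) -> List[str]:
    """Return files under archive_root modified after `since` (epoch seconds)."""
    recent = []
    for dirpath, _dirnames, filenames in os.walk(archive_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                recent.append(path)
    return sorted(recent)


def read_stamp(stamp_file: str) -> float:
    """Time of the previous run, or 0.0 if no run has been recorded yet."""
    try:
        return os.path.getmtime(stamp_file)
    except FileNotFoundError:
        return 0.0


def touch_stamp(stamp_file: str) -> None:
    """Record that a run has just finished (hypothetical stamp file)."""
    Path(stamp_file).touch()
```

A run from cron would call `read_stamp`, pass the result to `find_recent`, index whatever comes back, and then call `touch_stamp`. Mail archived while indexing is in progress could be missed with this ordering, so a real script might record the start time instead.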
Sorry, I closed the wrong bug window :(
We got a worksforme upstream... :( http://www.namazu.org/trac-namazu/trac.cgi/ticket/9 Ross, maybe you could provide more details. I'll be happy to pursue the Namazu people to get help if we need it - I only need to know what to tell them. :)
It's been a while since I looked at the problem, but if I recall, I couldn't find an example anywhere of how to use namazu to only index content added/changed since the last run (or how to remove content that had been removed). It's probably not a particularly difficult script to write, I just didn't have time at that point and couldn't find an example anywhere from someone who had done something similar before. Anyway, I haven't got back round to it since.
Opened a new request at Namazu's bugtracker: http://www.namazu.org/trac-namazu/trac.cgi/ticket/12 I'll post any progress from there here.
*** Bug 330578 has been marked as a duplicate of this bug. ***
Hi Ross, we got an answer from http://www.namazu.org/trac-namazu/trac.cgi/ticket/12
Mon Feb 6 14:55:59 2006: Modified by ot@zoy.org
* resolution set to worksforme
* status changed from new to closed
By default, mknmz (the indexing script for namazu) will do just that: add to the index documents recently added, update the ones that have changed, and delete the ones that were removed. Just run:
    mknmz /path/to/the/directory-to-index -O /where/the/index/resides/
(update the paths as needed)
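For a nightly cron job, the upstream advice could be wrapped along these lines. This is a minimal sketch, not the 'genindex.sh' script: the function names are hypothetical, the paths are placeholders, and it assumes mknmz is on the PATH and that the index directory is given with mknmz's output-directory option, -O (capital letter O).

```python
import subprocess
from typing import List


def build_mknmz_command(archive_dir: str, index_dir: str) -> List[str]:
    """Build the mknmz call; by default mknmz updates an existing index in place."""
    return ["mknmz", "-O", index_dir, archive_dir]


def run_index(archive_dir: str, index_dir: str) -> None:
    """Run mknmz and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_mknmz_command(archive_dir, index_dir), check=True)


if __name__ == "__main__":
    # Placeholder paths taken from the upstream reply; adjust as needed.
    run_index("/path/to/the/directory-to-index", "/where/the/index/resides/")
```

Because `check=True` turns a failed run into an exception, cron would mail the traceback, which makes a silently failing nightly re-index easier to spot.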
I've added the 'genindex.sh' script to cron, to run daily. I've run it once manually too. It now seems to have content from January 2006, as you can see by searching for 'Jan 2006', but only for a few lists (perhaps it died when I logged off). Not sure this is closed yet. Perhaps tonight's re-indexing will complete.
Apparently the index has improved but still is not 100% functional. Some lists such as gnome-infrastructure seem to be (fully?) indexed. Others like gnome-web-list are clearly not fully indexed. Maybe every day the script is incorporating some more messages? Anyway, hopefully the fix of this bug is approaching...
Yep, I spotted a cron mail the other day suggesting the nightly re-indexing is failing. IIRC, it looks like a bug in the script - probably an easy-fix, though. I'll look into it shortly.
Just confirming the script is not working properly yet, since there are still many lists not fully indexed.
*** Bug 340975 has been marked as a duplicate of this bug. ***
As a simple work-around until this is fixed, perhaps someone could add a brief note to the search page to the effect: "use google to search the lists with google's 'site:' operator". (It's what I did, and imagine what everybody else is doing given this bug has been open four years and the functionality is essential.) PS Not working for NetworkManager list.
Or just replace it with a Google site-search box?
Well, I didn't want to be quite that presumptuous - but, now you mention it...:)
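The `site:` workaround suggested above amounts to building a search URL like the one in this small Python sketch. The function name and the default site value are illustrative assumptions; only the `site:` operator itself comes from the comments.

```python
from urllib.parse import urlencode


def google_site_search_url(terms: str, site: str = "mail.gnome.org/archives") -> str:
    """Return a Google search URL restricted to `site` via the site: operator."""
    query = f"{terms} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example: the libcroco search mentioned earlier in this report.
url = google_site_search_url("libcroco")
```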
I started *another* re-indexing a couple of days ago which is still running. Hopefully that should bring us up to now. However, we still need to find and hook up the script that keeps the indexes up-to-date as new mail is archived. I'm trying to keep notes on http://live.gnome.org/MailingLists about the setup and any problems so once it's fixed it stays fixed (or is easily fixable).
I've just realized that the search engine is more broken than I thought:
1. Go to http://mail.gnome.org/archives/
2. Type "drupal" in the search string field and click Search!
The result with the highest score, at the top of the list, is http://mail.gnome.org/archives/evolution-patches/2004-December/msg00158.html
It doesn't contain the string "drupal". I've tried with other words and it is pretty easy to get wrong results. Seriously: in all these years it has been impossible to search mail.gnome.org/archives meaningfully. In the meantime people needing to do so have been using Google's advanced search features. I think it's time to stop dreaming that one day we will have Namazu working and put an adapted Google search instead.
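The check described above (does the top result actually contain the search term?) can be scripted once a result page has been saved locally. This is a rough sketch with made-up names; it looks only at the text between tags, ignores attribute values, and does not fetch anything itself.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text between tags; attribute values are not included."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_mentions(html: str, term: str) -> bool:
    """Case-insensitive check for `term` in the text content of an HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return term.lower() in text.lower()
```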
*** Bug 466278 has been marked as a duplicate of this bug. ***
bkor, I didn't touch other pages, and I am not totally sure about the hq parameter, so I won't close this bug now. Could you update the site so it can be tested?
2007-10-15  Frederic Peters  <fpeters@0d.be>
    * css/layout.css:
    * index.html: added Google Site search to search in archives.