GNOME Bugzilla – Bug 509424
Only let search engines index stable, unstable versions
Last modified: 2016-02-29 12:57:45 UTC
GNOME Library should only let the stable and unstable documentation versions be indexed. This can be done using a robots.txt file. This is needed because:
 * stable/unstable are the only versions you want people to see by default
 * exposing every version lowers your Google PageRank, as much of the content is the same
 * it should take extra work to see old versions
Hopefully bots won't choke on such a large robots.txt file.

2008-05-17  Frederic Peters  <fpeters@0d.be>

	* data/xslt/indexes.xsl: added generation of a robots.txt file listing
	versioned path, so only /stable/ and /unstable/ URLs will be indexed.
	(closes: #509424)
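A quick way to sanity-check a file like this is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser. The ROBOTS_TXT content and the is_indexable helper are illustrative assumptions, not the file actually generated by indexes.xsl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths follow the layout discussed
# in this bug, but this is not the real generated file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/gdm/2.14/
Disallow: /admin/gdm/2.16/
"""


def is_indexable(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` is allowed to fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Unversioned aliases stay crawlable, versioned paths do not.
print(is_indexable(ROBOTS_TXT, "/admin/gdm/stable/"))
print(is_indexable(ROBOTS_TXT, "/admin/gdm/2.14/index.html"))
```

With these rules, the first call should report the /stable/ path as allowed and the second should report the versioned path as blocked.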
Actually, this isn't quite fixed. There's considerable duplication in the robots.txt. Here's a small snippet to illustrate:

  Disallow: /admin/gdm/2.14/
  Disallow: /admin/gdm/2.16/
  Disallow: /admin/gdm/2.18/
  Disallow: /admin/gdm/2.20/
  Disallow: /admin/system-admin-guide/2.14/
  Disallow: /admin/system-admin-guide/2.16/
  Disallow: /admin/system-admin-guide/2.18/
  Disallow: /admin/system-admin-guide/2.20/
  Disallow: /admin/system-admin-guide/2.22/
  Disallow: /admin/gdm/2.14/
  Disallow: /admin/gdm/2.16/
  Disallow: /admin/gdm/2.18/
  Disallow: /admin/gdm/2.20/

It appears to me from a quick glance that each line appears AT LEAST 3 times.
Update: I counted the number of occurrences of the sample gdm block I gave and found 42 consecutive occurrences.
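For illustration, the duplication could be avoided by collapsing repeated paths before the rules are written out. This is only a sketch in Python, not the XSLT used by the generator. The unique_disallow_lines helper is hypothetical, and the sample paths are taken from the snippet above.

```python
def unique_disallow_lines(paths):
    """Return one 'Disallow:' line per distinct path, keeping first-seen order."""
    # dict.fromkeys drops repeats while preserving insertion order.
    return [f"Disallow: {path}" for path in dict.fromkeys(paths)]


paths = [
    "/admin/gdm/2.14/",
    "/admin/gdm/2.16/",
    "/admin/system-admin-guide/2.14/",
    "/admin/gdm/2.14/",  # repeated block, as seen in the generated file
    "/admin/gdm/2.16/",
]

for line in unique_disallow_lines(paths):
    print(line)
```

Each path is emitted once, in the order it first appeared, so a block repeated 42 times would shrink to a single copy.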
A more modern approach would be to have every old version of a document declare the current stable version as its canonical link, so search engines treat all the versions as one page.
https://support.google.com/webmasters/answer/139066?hl=en
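A minimal sketch of what that could look like: each versioned page would carry a rel="canonical" link pointing at the /stable/ URL of the same document. The canonical_link helper, the base URL and the section/document layout are assumptions for illustration only; the real pages are produced by the library's XSLT.

```python
from html import escape


def canonical_link(base_url: str, section: str, doc: str) -> str:
    """Build a <link rel="canonical"> tag pointing at the stable version of a document."""
    # Assumed URL layout: <base>/<section>/<doc>/stable/
    href = f"{base_url.rstrip('/')}/{section}/{doc}/stable/"
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


# Hypothetical base URL; the same tag would go into the <head> of every
# old version (2.14, 2.16, ...) of the gdm documentation.
print(canonical_link("https://library.example.org/", "admin", "gdm"))
# prints: <link rel="canonical" href="https://library.example.org/admin/gdm/stable/">
```

Unlike the robots.txt approach, this keeps old versions reachable by crawlers while telling them which URL to show in results.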