Monday, October 06, 2008

editing robots.txt

# see http://www.robotstxt.org/

Today I looked through a list of our top-level directories and added lines to /robots.txt.

symbolic links


If we have several names for the same directory (e.g., /ped-safety/ and /pedsafety/), I added a Disallow: directive for the symbolic link:
Disallow: /ped-safety/  # use /pedsafety/ instead
This way, Google et al. will cache pages under the preferred URL. That is especially desirable for holdovers from an era when our site was organized completely differently:
Disallow: /resources/media/multimedia/  # use /videos/ instead
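
To sanity-check rules like these, here's a minimal sketch using Python 3's urllib.robotparser (the www.example.org hostname is a stand-in for our real site):

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.org/robots.txt")  # placeholder host
rp.read()

# The symlinked alias should be blocked; the preferred name allowed.
print(rp.can_fetch("*", "http://www.example.org/ped-safety/"))  # False
print(rp.can_fetch("*", "http://www.example.org/pedsafety/"))   # True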


"honeypots"


These are common vulnerability paths that hackers and spambots request at random. I added zero-length files for the most-requested ones, eliminating 404s for vulnerabilities we don't have, so that our 404 reports show only actual broken links.
Disallow: /_vti_bin/    # honeypot
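
Creating the zero-length files is mechanical. A sketch, assuming Python 3 and a made-up document root and probe list (pull the real list from your 404 reports):

from pathlib import Path

WEBROOT = Path("/var/www/html")  # assumed document root
# Hypothetical sample of the most-requested probe paths.
PROBES = ["_vti_bin/shtml.exe", "scripts/root.exe", "MSOffice/cltreq.asp"]

for probe in PROBES:
    target = WEBROOT / probe
    target.parent.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    target.touch(exist_ok=True)  # zero-length file: these requests now get 200, not 404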

private directories


These are set up for various special projects; the URLs are not published. I had a couple of Disallow: directives for these, but that essentially publishes the secret URLs, so I removed those directives.
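
The leak is easy to demonstrate: anyone can fetch /robots.txt and enumerate the Disallow: paths. A sketch in Python 3 (hostname again a placeholder):

import urllib.request

with urllib.request.urlopen("http://www.example.org/robots.txt") as resp:
    for line in resp.read().decode("utf-8", "replace").splitlines():
        if line.strip().lower().startswith("disallow:"):
            print(line.strip())  # every "secret" path, listed for the world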
