editing robots.txt
      # see http://www.robotstxt.org/
Today I looked through a list of our top-level directories and added lines to /robots.txt.
symbolic links
Where we have several names for the same directory (e.g., /ped-safety/ and /pedsafety/), I added a Disallow: directive for the symbolic link.
Disallow: /ped-safety/ # use /pedsafety/ instead
This way, Google et al. will cache things with the preferred URL. This is especially desirable for holdovers from an era when our site was organized completely differently:
Disallow: /resources/media/multimedia/ # use /videos/ instead
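One way to sanity-check rules like these is Python's standard urllib.robotparser module. This is only a rough sketch: it assumes the two directives above sit under a "User-agent: *" line (not shown in this post), and example.org is a placeholder host, not our real one.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: the User-agent line is a guess; the two
# Disallow lines are the ones from this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ped-safety/ # use /pedsafety/ instead
Disallow: /resources/media/multimedia/ # use /videos/ instead
"""

def is_allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.org is a placeholder; only the path matters for matching.
    return parser.can_fetch(agent, "http://example.org" + path)

if __name__ == "__main__":
    for path in ("/ped-safety/index.html", "/pedsafety/index.html"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

The symlinked name should come back disallowed and the preferred name allowed, which is the whole point of the directive.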
"honeypots"
These are URLs associated with common vulnerabilities, requested at random by hackers and spambots. I added zero-length files at the most-requested of these URLs to eliminate 404s for vulnerabilities that we don't have, so that our 404 reports would show actual broken links.
Disallow: /_vti_bin/ # honeypot
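Creating the empty placeholder files can be scripted. The following is a minimal sketch, not the exact procedure I used: the function name create_placeholders, the document root, and the example path are all hypothetical, and it leaves any file that already exists untouched.

```python
from pathlib import Path

def create_placeholders(doc_root, paths):
    """Create a zero-length file for each URL path under doc_root.

    Existing files are skipped so real content is never overwritten.
    Returns the list of files that were actually created.
    """
    created = []
    for rel in paths:
        target = Path(doc_root) / rel.lstrip("/")
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # creates an empty (zero-length) file
        created.append(target)
    return created

if __name__ == "__main__":
    # Hypothetical document root and probe path, for illustration only.
    print(create_placeholders("/tmp/htdocs-demo", ["/_vti_bin/owssvr.dll"]))
```

Requests for those paths then return an empty 200 response instead of cluttering the 404 report.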
private directories
These are set up for various special projects; the URLs are not published. I had a couple of Disallow: directives for these, but since anyone can read /robots.txt, that's essentially publishing the secret URL, so I removed those directives.
Labels: robots, robots.txt, server admin, System Administration, web site

