Friday, October 31, 2008

robots.txt

Added a line:
Disallow: /freestuff/ # use /g3/ instead


Wednesday, October 15, 2008

Finding 404s

Today I once again tail'ed the error log and fixed a bunch of broken links to /freestuff/store.php.
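A rough sketch of how the tail-and-eyeball step could be tallied instead. It assumes Apache 2.x-style error log lines containing "File does not exist:" and a document root of /var/www/html; neither is stated in this post, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache 2.x-style error log entries, e.g.
#   [date] [error] [client IP] File does not exist: /var/www/html/some/file
# The path may be followed by ", referer: ...", so stop at a comma or whitespace.
MISSING = re.compile(r"File does not exist: ([^,\s]+)")


def count_missing(lines, docroot="/var/www/html"):
    """Count 'File does not exist' entries per URL path."""
    counts = Counter()
    for line in lines:
        match = MISSING.search(line)
        if not match:
            continue  # SSL errors and other noise are skipped
        path = match.group(1)
        if path.startswith(docroot):
            # Turn the filesystem path back into a URL path.
            path = path[len(docroot):] or "/"
        counts[path] += 1
    return counts


# Made-up sample entries (documentation IP addresses, hypothetical paths).
SAMPLE_LOG = [
    "[Wed Oct 15 09:12:01 2008] [error] [client 192.0.2.10] File does not exist: "
    "/var/www/html/freestuff/store.php, referer: http://www.example.org/links.html",
    "[Wed Oct 15 09:13:44 2008] [error] [client 192.0.2.11] File does not exist: "
    "/var/www/html/freestuff/store.php",
    "[Wed Oct 15 09:15:02 2008] [error] [client 192.0.2.12] File does not exist: "
    "/var/www/html/newsletter/images/cover.gif",
    "[Wed Oct 15 09:16:30 2008] [error] [client 192.0.2.13] Invalid method in request",
]

if __name__ == "__main__":
    for path, hits in count_missing(SAMPLE_LOG).most_common():
        print(hits, path)
```

Sorting by hit count puts the links most worth fixing at the top.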

Then there are a lot of SSL errors...

Ooh! And I fixed some broken image links in the 2003 Summer newsletter...


Tuesday, October 14, 2008

Summer Newsletter

I have posted the HTML version of the Summer 2008 Newsletter.

Thanks to BBEdit grep, the conversion from Word to HTML took about 4 hours for 6 articles.

But massaging the 5 images into shape also took about 4 hours, partly because I had to (re)learn the new version of Fireworks, and partly because I miscalculated the width of our main content table cell.

I first made the images 576 pixels wide, but that was 26 pixels too many, so I had to re-edit them all to 550 pixels wide.

Funny thing: Caltrans changed the URL of their DRI page sometime since 10/6 (when Google last indexed it), breaking every link on the web that pointed to the DRI page. So the URL is now wrong in the print version of the newsletter. I've corrected it in the web version, and will keep an eye on what Caltrans does next...

broken URL:
http://www.dot.ca.gov/newtech/

new URL:
http://www.dot.ca.gov/research/
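For illustration, the correction in the web version amounts to repointing the old prefix at the new one. This is only a sketch: it assumes a plain prefix swap is enough, which this post confirms only for the DRI page itself, not for any deeper paths. The function name and the sample HTML are hypothetical.

```python
OLD_URL = "http://www.dot.ca.gov/newtech/"
NEW_URL = "http://www.dot.ca.gov/research/"


def update_links(html, old=OLD_URL, new=NEW_URL):
    """Return (updated_html, replacements) with every old-prefix URL repointed."""
    replacements = html.count(old)
    return html.replace(old, new), replacements


if __name__ == "__main__":
    # Hypothetical snippet of newsletter HTML.
    sample = '<a href="http://www.dot.ca.gov/newtech/">Caltrans DRI</a>'
    fixed, n = update_links(sample)
    print(n, fixed)
```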

On Oct 14, 2008, at 11:30 AM, Cindy Walker wrote:

Steve, Thank you for bringing this to our attention. We're working right
now to make sure both of the URLs in your email below work again, since
that's how they functioned previously.

Sincerely,

Cindy Walker
Caltrans Web Team


Monday, October 06, 2008

editing robots.txt

# see http://www.robotstxt.org/

Today I looked through a list of our top-level directories and added lines to /robots.txt.

symbolic links


Where we have several names for the same directory (e.g., /ped-safety/ and /pedsafety/), I added a Disallow: directive for the symbolic link.
Disallow: /ped-safety/  # use /pedsafety/ instead
This way, Google et al. will cache things under the preferred URL. This is especially desirable for holdovers from an era when our site was organized completely differently:
Disallow: /resources/media/multimedia/  # use /videos/ instead
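One way to sanity-check rules like these is Python's standard urllib.robotparser. A minimal sketch follows; the "User-agent: *" line is an assumption, since the post shows only the Disallow: lines, and the file names in the checks are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow lines come from this post; the User-agent line is assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ped-safety/  # use /pedsafety/ instead
Disallow: /resources/media/multimedia/  # use /videos/ instead
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a crawler identifying as `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/ped-safety/index.html", "/pedsafety/index.html",
                 "/resources/media/multimedia/clip.html", "/videos/"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

The symlinked names come back blocked while the preferred names stay crawlable, which is the intended effect.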


"honeypots"


These are URLs for common vulnerabilities that hackers and spambots request at random. We don't have these vulnerabilities, so I added zero-length files at the most-requested paths; that eliminates their 404s and lets our 404 reports show actual broken links.
Disallow: /_vti_bin/    # honeypot
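Creating the placeholder files could be scripted along these lines. This is a sketch under assumptions: the docroot is whatever directory the web server serves from, and the file name under /_vti_bin/ used in the usage note is only an illustrative probe target, not a list taken from our logs.

```python
from pathlib import Path


def create_placeholders(docroot, url_paths):
    """Create a zero-length file for each URL path; return the paths created.

    Each entry in url_paths is expected to name a file, not a directory.
    """
    root = Path(docroot).resolve()
    created = []
    for url_path in url_paths:
        target = (root / url_path.lstrip("/")).resolve()
        if root not in target.parents:
            continue  # refuse anything that would land outside the docroot
        if target.exists():
            continue  # never overwrite real content
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # zero-length file, so the request is no longer a 404
        created.append(target)
    return created
```

A call such as create_placeholders("/path/to/docroot", ["/_vti_bin/shtml.exe"]) would leave an empty file at that path. Running it twice is harmless, because existing files are skipped.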

private directories


These are set up for various special projects; the URLs are not published. I had a couple of Disallow: directives for these, but listing them in robots.txt essentially publishes the secret URLs, so I removed those directives.
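The cleanup can be sketched as a filter that drops any Disallow: line naming a path from a private list while leaving every other line alone. The function name and the example path in the usage note are hypothetical.

```python
def remove_private_rules(robots_txt, private_paths):
    """Drop Disallow lines whose path is in private_paths; keep the rest."""
    kept = []
    for line in robots_txt.splitlines():
        rule = line.split("#", 1)[0].strip()  # ignore trailing comments
        field, _, value = rule.partition(":")
        is_disallow = field.strip().lower() == "disallow"
        if is_disallow and value.strip() in private_paths:
            continue  # listing it would advertise the unpublished URL
        kept.append(line)
    return "\n".join(kept) + "\n"
```

For example, remove_private_rules(text, {"/secret-project/"}) returns the file without that directive and leaves the honeypot and symlink rules untouched.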


Thursday, October 02, 2008

web stats reporting

Well, I've been doing some archaeology today...

There are 3 different installs of analog on our web server:
[steve@www /]# find -name analog
./usr/bin/analog
./usr/local/analog-www/analog
./home/steve/webstats/analog
./home/steve/webstats/analog/analog
The one in /usr/bin is version 6.0, the others are version 5.32.
[steve@www steve]$ /usr/bin/analog --help
This is analog version 6.0/Unix
For help see docs/Readme.html, or man analog, or http://www.analog.cx/
[steve@www steve]$ /usr/local/analog-www/analog --help
This is analog version 5.32/Unix
For help see docs/Readme.html, or man analog, or http://www.analog.cx/
[steve@www steve]$ /home/steve/webstats/analog/analog --help
This is analog version 5.32/Unix
For help see docs/Readme.html, or man analog, or http://www.analog.cx/
There are several corresponding analog.cfg files:
[steve@www /]# locate analog.cfg
/etc/analog.cfg
/usr/local/analog-www/analog.cfg
/home/steve/webstats/analog/analog.cfg
I haven't found any documentation explaining why, but /usr/bin/analog uses /etc/analog.cfg. The others use the .cfg file located in their directory.
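To keep track of which install is which, the findings above can be captured in a small helper. This is a sketch that only encodes what was observed on this server: it parses the version banner and maps a binary to the config file it appeared to read by default. It is not documented analog behaviour.

```python
import re
from pathlib import PurePosixPath

# Matches the banner printed by `analog --help`, e.g.
# "This is analog version 6.0/Unix".
BANNER = re.compile(r"This is analog version (\d+(?:\.\d+)*)")


def parse_version(help_output):
    """Extract the version string from the --help banner, or None."""
    match = BANNER.search(help_output)
    return match.group(1) if match else None


def expected_config(binary):
    """Config file a binary was observed to read by default on this server."""
    binary = PurePosixPath(binary)
    if str(binary) == "/usr/bin/analog":
        return PurePosixPath("/etc/analog.cfg")
    # The other installs read the analog.cfg next to the binary.
    return binary.parent / "analog.cfg"


if __name__ == "__main__":
    for path in ("/usr/bin/analog",
                 "/usr/local/analog-www/analog",
                 "/home/steve/webstats/analog/analog"):
        print(path, "->", expected_config(path))
```

The banner text could be captured with subprocess.run([path, "--help"], capture_output=True, text=True). Whether analog writes it to stdout or stderr is not confirmed here, so checking both would be the safer choice.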