[unisog] Getting things deleted from Google's cache

Chris Green cmgreen at uab.edu
Thu Apr 6 16:32:47 GMT 2006

On 4/5/06 7:09 PM, "Russell Fulton" <r.fulton at auckland.ac.nz> wrote:

> Last week we found out that the file was still available via Google!  Ouch!!

We had a similar situation with both Google and a Google Search Appliance.
The search appliance was a much bigger pain to remove data from.  Removal
tends to take a document out of the "collection" only, not out of the
appliance as a whole; a bug report is open.

IIS web servers present their own set of problems because they treat URL
paths as case insensitive, and sometimes the primary link to a document is
in uppercase.  URL paths are supposed to be case sensitive, so getting rid of
documents indexed as example.edu/foo/bar/baz/BADSTUFF/ proved to be very
difficult as well, since they were indexed as both .../BADSTUFF and
.../badstuff.
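
To give a feel for why this is hard, here is a rough Python sketch (an
illustration only, not something we ran at the time) that lists every
spelling of one path segment a case-insensitive server would answer to.
The function name case_variants is made up for the example.

```python
from itertools import product

def case_variants(segment):
    """Return every upper/lower-case spelling of one path segment."""
    # Letters contribute two choices each; digits and punctuation only one.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in segment]
    return ["".join(combo) for combo in product(*choices)]

variants = case_variants("BADSTUFF")
print(len(variants))  # 8 letters, so 2**8 spellings
```

Any of those spellings can end up in an index if somebody links to it; in
our case it was the two shown above.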

Google isn't the only place that can cache that data either: there is also
Ask.com, the archive.org Wayback Machine, A9, etc.  Most of the big search
engines these days present some sort of cache view.  robots.txt is the only
good way to get documents removed from them, and they do seem to remove the
old documents if the new robots.txt says to.  robots.txt doesn't seem to
have a flag to say "treat as case insensitive" either.
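
A minimal sketch of the workaround, assuming you simply list each spelling
that was indexed as its own Disallow line.  It uses Python's standard
urllib.robotparser to check the result; the helper names disallow_lines and
blocked, the "ExampleBot" user agent, and the file names are made up for
the illustration.  Real crawlers may differ in details, but the matching
here is case sensitive, which is the point.

```python
from urllib.robotparser import RobotFileParser

def disallow_lines(paths):
    """Build robots.txt text that disallows each given path for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"

# The two spellings that were indexed (path taken from the message above).
indexed = ["/foo/bar/baz/BADSTUFF/", "/foo/bar/baz/badstuff/"]
robots_txt = disallow_lines(indexed)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def blocked(path):
    """True if the parsed robots.txt forbids fetching this path."""
    return not parser.can_fetch("ExampleBot", "http://example.edu" + path)

print(robots_txt)
print(blocked("/foo/bar/baz/BADSTUFF/file.doc"))
print(blocked("/foo/bar/baz/BadStuff/file.doc"))
```

The mixed-case spelling in the last line is not covered by either rule, so
any spelling that was not listed explicitly stays fetchable.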

Chris Green
UAB Data Security, 5-0842

