It may appear, however, that Google has already decided robots.txt is merely a bug heading towards its windscreen, and is indexing pages that are excluded via robots.txt, as is apparent with Dropbox.
Above is a copy of Dropbox’s robots.txt file. You can clearly see that “Gallery” is on the no-fly list; however, the following screenshot suggests otherwise.
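For context, robots.txt is only a request. A compliant crawler reads the rules and skips matching paths before fetching them, but nothing enforces this. Google’s own documentation also notes that a disallowed URL can still turn up in results if other pages link to it. The sketch below uses Python’s standard `urllib.robotparser` to show how a compliant crawler would evaluate a rule like the one described above. The robots.txt contents and the gallery URL are hypothetical illustrations, not a copy of Dropbox’s actual file. Note also that `robotparser` does simple prefix matching rather than Google’s full wildcard syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modelled on the "Gallery" rule described above.
# This is NOT a copy of Dropbox's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one under the disallowed prefix, one outside it.
    for url in (
        "https://www.dropbox.com/gallery/12345/1/photos",
        "https://www.dropbox.com/about",
    ):
        print(f"{url} -> crawl allowed: {is_crawl_allowed(ROBOTS_TXT, 'Googlebot', url)}")
```

A crawler that honours the file would skip the first URL and fetch the second. Whether a search engine lists the first URL anyway is a separate decision the file cannot enforce.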
Why should you care about this? What’s so special about “Gallery” anyway? Well, if you are a Dropbox user and have ever clicked “Start Import” when shown the message below, you should know that those photos of yours may well be in that Google search result and are, as a result, public.
I’m almost positive that Dropbox did not want the galleries to be indexed, and I’m also sure that Google’s army of lawyers has a good reason why they were. But as a curious individual who enjoys tinkering with things, I couldn’t resist having a snoop through some of the results.
By appending some choice keywords to the searches, such as confidential, screen, budget, excel, portfolio, audition, temp, delete and a few others, I was absolutely staggered by the amount of useful data people have images of. In just the first 10 or so pages of each result, I found bank statements, driver’s licenses, credit cards, passwords, Wi-Fi keys, serial numbers, medical information and, yes… lots and lots of porn (mostly home-made).
Whether you’re bored like me or digging up some info for (insert legitimate reason here), it is a fantastic search to kill time with. If you are planning on being proactive with it, go and get Stach & Liu’s SearchDiggity tool and look at some of the features they’ve built for searching for documents through SkyDrive, AWS S3 and Dropbox.