Wednesday, September 3, 2014

Web Hacking 101: robots.txt

Here's a fairly specific trick that may or may not get you anywhere, but it is good to look at a site from every angle. A robots.txt file is not required for a website, but it is often included to keep web crawlers and other indexing tools in check, at least the ones that bother to read it. It gives directives on how fast a bot should crawl and where it is and is not allowed to go.

Our main interest is in where it tells bots not to go, because there may be something useful in there. A robots.txt file may look something like this:

User-agent: *
Disallow: /secrets/
Crawl-delay: 5

The general rule of thumb is that you only need to disallow links that are exposed, since bots generally crawl only what is linked. They won't go looking for something that has no obvious path to it. Yet people still insist on listing their secret folders in a file that anyone can access, thus revealing exactly the areas they wanted hidden.
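As a rough sketch of how the disallowed paths could be pulled out of such a file, here is a small Python helper. The function name `disallowed_paths` and the `sample` string are made up for illustration, and the parsing is deliberately simple: it ignores which User-agent group a rule belongs to and does not interpret wildcards.

```python
def disallowed_paths(robots_txt):
    """Return the Disallow paths found in a robots.txt body, in order, without duplicates."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # Field names are case-insensitive; an empty Disallow blocks nothing, so skip it.
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Hypothetical input: the sample file shown earlier in this post.
sample = """User-agent: *
Disallow: /secrets/
Crawl-delay: 5
"""

print(disallowed_paths(sample))  # ['/secrets/']
```

In practice the text would come from fetching /robots.txt on the site in question; that step is left out here to keep the sketch self-contained.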

A while ago, someone posted a link in a chat. It pointed to a download of software that the website intended to charge for. The problem was that the site had listed the directory containing the software in its robots.txt file, and nothing stopped a direct download. So it was free software for the taking.

My personal experience has been that most of the time the listed areas are dead, or at least appear to be. My guess is that most people don't feel the need to keep the file up to date. Either way, there is a chance it will expose more of the site to you than meets the eye.
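To see which listed areas are dead and which still respond, one option is to request each path and look at the status code. The following is only a sketch under assumptions: `http_status` and `probe_paths` are hypothetical helper names, the example.com address is a placeholder, and the status fetcher is passed in as a parameter so the logic can be tried without touching the network.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET request to url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 403 or 404 still carry a status code.
        return err.code
    except OSError:
        # Covers URLError, timeouts, and other connection failures.
        return None


def probe_paths(base_url, paths, fetch_status=http_status):
    """Map each full URL built from base_url and a path to its status code."""
    results = {}
    for path in paths:
        url = urljoin(base_url, path)
        results[url] = fetch_status(url)
    return results


# Hypothetical usage (performs real requests, so it is left commented out):
# for url, status in probe_paths("https://example.com", ["/secrets/"]).items():
#     print(status, url)
```

A 404 suggests the entry is stale, while a 200 or even a 403 suggests something is still there.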
