Pentesting Information Gathering Tools and Techniques


Using a WHOIS web application is generally the first step in gathering information on your target.  Take note of the IP blocks, technical contacts, names, and email addresses; this information will be valuable later.  My favorite WHOIS application, because of its minimal design and lack of ads, is:

robots.txt files

Downloading a site’s robots.txt file shows which pages the site designer(s) want crawled and which they want hidden from search engines.  A Disallow: / statement instructs search engine spiders not to browse or visit that resource, using the Robots Exclusion Protocol.  The disallowed paths can give insight into what the individual or company wishes to hide.  To view the robots.txt file, look for it in the root directory of the target website.
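As a small illustration, here is a minimal Python sketch that pulls the Disallow paths out of a robots.txt file you have already downloaded.  The function name and the sample file contents are made up for this example; real files can be messier than this handles.

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Return the paths named in Disallow lines, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents, for illustration only.
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # old site copies
Allow: /public/
Disallow:
"""

print(disallowed_paths(sample))  # ['/admin/', '/backup/']
```

Each path in the result is a candidate worth a closer look during the rest of your recon.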

Historical Website Search

The Internet Archive WayBack Machine allows you to visit older versions of the target site.  Because the pages are served from the archive, the attacker can scour the site without actually connecting to it.  To access the WayBack Machine, open a web browser and navigate to the Internet Archive site, where you will see Internet Archive WayBack Machine in the center of the page.  What’s also excellent about this tool is that some information from the past shouldn’t have been online in the first place, or was later removed because of its sensitive nature, so be sure to examine different time periods of your target’s site.

HTTrack – clone a website

HTTrack is a tool available in Kali Linux whose purpose is to copy a website.  This is useful if you want to navigate the entire site offline.  Additionally, a copy of a website can be used to build fake phishing sites, which can then be used with other penetration testing tools like BeEF (I will get to this later).  Here are the directions:

  1. Open a terminal window and type apt-get install httrack
  2. Create a directory to store the copied site using the mkdir command.
  3. Type httrack in the terminal window and choose a title.
  4. Select a directory to save the website. Choose the folder created in the previous step.
  5. Enter the URL of the site you want to capture.
  6. The next prompt presents options for what you want to do with the captured site. Option 2, a mirror of the website with a wizard, is the easiest method.
  7. Specify whether you want to use a proxy to launch the attack. You can also specify what type of files you want to download (use * for all files) and define any command line options or flags you might want to set.
  8. Before httrack runs, it will display the command that it is running. You can use this command in the future if you want to run httrack without going through the wizard.
  9. Navigate to the directory where you saved the cloned files.
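If you would rather skip the wizard, the steps above boil down to a single command line.  Below is a rough Python sketch that assembles one; the helper name, the example URL, and the directory are placeholders of mine, and it only uses httrack's -O (output path) and -P (proxy) options.  The command the wizard prints in step 8 may include more flags than this.

```python
import shlex
from typing import List, Optional


def httrack_command(url: str, output_dir: str, proxy: Optional[str] = None) -> List[str]:
    """Build an argument list for mirroring `url` into `output_dir`."""
    argv = ["httrack", url, "-O", output_dir]
    if proxy:
        # -P takes proxy:port (hypothetical value supplied by the caller).
        argv += ["-P", proxy]
    return argv


# Placeholder target and directory, for illustration only.
cmd = httrack_command("http://www.example.com/", "/root/mirror/example")
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to hand to subprocess.run(cmd) later, should you want to script the mirror.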

Electronic Data Gathering, Analysis, and Retrieval (EDGAR)

The U.S. Securities and Exchange Commission (SEC) requires all companies, foreign and domestic, to file registration statements, periodic reports, and other forms electronically through EDGAR.  The info is free and available to the public.  The SEC also provides a tutorial on its website on how to search company information.  Visit the SEC's EDGAR search page to look up your company.

Shodan Search

Shodan is a search engine that can identify specific devices connected to the Internet.  It has been gaining a lot of attention because it can find everything from a refrigerator to a nuclear power plant.  Visit the Shodan site and set up your free account.  There are also Firefox and Chrome plugins that let you view the services in use on the web pages you visit.  In addition to detailed device information, Shodan provides exploit information, some of which is available within Metasploit.  In my most recent search, I entered SCADA (industrial control systems), which returned hundreds of results, and Shodan provided me with exploits from a variety of sources.  Give it a go.

Google hacking

Google hacking is the most common form of search engine reconnaissance against web applications. It uses advanced operators in the Google search engine to locate specific strings of text within search results.  There are many resources online.  The Google Hacking Database (GHDB) is an excellent source for Google search queries: searches for usernames, passwords, vulnerable systems, and exploits have been captured and categorized as Google hacking dorks. To access the GHDB, navigate to the GHDB site, where you will see the latest Google hacking searches listed on the main page. Click on one of the items for additional info; click again and it will run the search in Google. You may also wish to examine:

DNS Lookup

The domain name system (DNS) is a distributed database arranged hierarchically. Its purpose is to provide a layer of abstraction between other Internet services (web, email, etc.) and the numeric addresses (IP addresses) used to uniquely identify any machine on the Internet.  Find all DNS information for your target system using the tools below.

Nslookup is a tool that lets you translate hostnames to IP addresses and vice versa. In your terminal window, type: nslookup [domain]  If you provide an IP address instead, DNS returns the domain name associated with that IP. To query the DNS server for the whole record associated with google.com, type: nslookup -querytype=ANY google.com  With this command we will see the name servers, A records and perhaps other important information to be used in later pentesting steps.

Dig (domain information groper) is one of the most popular and widely used DNS reconnaissance tools. It is a command line tool used to query DNS information. Dig is extremely powerful and can query name servers directly, bypassing entries cached by intermediate DNS servers. To use Dig, try the following instructions:

  1. Open a terminal window and type: dig [domain] any  The “any” searches for, you guessed it, any type of record, and [domain] is your target domain.  You can also configure Dig to query custom DNS servers by adding @<IP> to the command.
  2. Try the -t option, which selects the record type to query; asking for ns records lists the authoritative name servers for the zone. Type: dig -t ns [domain]
  3. Try searching for a specific record type: dig [domain] MX
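If you find yourself repeating these lookups for several record types, a few lines of Python can generate the commands for you.  This is only a sketch: the helper name is mine, and example.com and 192.0.2.53 are placeholder values.  It follows dig's usual form of an optional @server, then the name, then the record type.

```python
import shlex
from typing import List, Optional


def dig_command(domain: str, record_type: str = "ANY", server: Optional[str] = None) -> List[str]:
    """Build a dig argument list: dig [@server] domain record_type."""
    argv = ["dig"]
    if server:
        # Query a specific DNS server instead of the system resolver.
        argv.append("@" + server)
    argv += [domain, record_type]
    return argv


# Placeholder domain and server, for illustration only.
for rtype in ("ANY", "NS", "MX"):
    print(shlex.join(dig_command("example.com", rtype)))
print(shlex.join(dig_command("example.com", "MX", server="192.0.2.53")))
```

Each list can be passed to subprocess.run(...) as is if you want to capture the answers rather than just print the commands.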

Netcraft is another excellent tool for gaining information about a target. After you search for the domain name on the Netcraft site, the app returns information about the target such as the web server version, name servers, and IP addresses of the different web servers in use, along with uptime stats, the IP address owner, and the hosting provider.  Netcraft may not provide all the information we need concerning the target web server version.  For this, we would use httprint.

httprint probes the web server with a series of requests and compares the responses against a database of signatures to find a match and guess the web server version, installed modules, and web-enabled devices. httprint is capable of fingerprinting the web server version even when the banner or the HTTP response headers have been obfuscated or altered, whether manually or through security modules such as mod_security.  httprint can also load Nmap's XML output file. While Nmap will only retrieve the web server banner, we can feed its output to httprint to check that the service banner has not been altered to mislead attackers.  This tool’s web server signatures are out of date, but it is still an excellent resource.
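To make clear what "the banner" means here, the Python sketch below reads the Server header out of a raw HTTP response.  This is not how httprint works (it compares behavior against signatures); it only shows the easily altered value that httprint is meant to double-check.  The function name and the sample response are invented for the example.

```python
from typing import Optional


def server_banner(raw_response: str) -> Optional[str]:
    """Return the value of the Server header, or None if it is absent."""
    # Headers end at the first blank line; the first line is the status line.
    head = raw_response.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


# Hypothetical response text, for illustration only.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    "Server: Apache/2.4.41 (Ubuntu)\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)

print(server_banner(sample_response))  # Apache/2.4.41 (Ubuntu)
```

Because an administrator can set this header to anything, treat it as a claim to verify rather than a fact.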

Fierce is another DNS recon tool in Kali Linux.  Among the standard DNS checks, Fierce will check to see if the DNS server allows zone transfers. If zone transfers are allowed, Fierce will execute a zone transfer and inform the user of the entries. If the DNS server does not allow zone transfers, Fierce can be configured to brute force host names on a DNS server. To use Fierce:

  1. Navigate to Information Gathering | DNS Analysis | Fierce. Fierce will load into a terminal window.
  2. Type the following, replacing [domain] with your target: fierce -dns [domain]

If Fierce fails to complete a zone transfer, it will try to brute force host names using a word list or dictionary file, if you have one defined.
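The brute force idea is simple enough to sketch in Python: prepend each word from a list to the domain and keep the names that resolve.  This is my own minimal illustration, not Fierce's implementation.  The resolver is passed in as a parameter so the demo can run against a fake lookup table; the domain, words, and addresses are all made up.

```python
import socket
from typing import Callable, Dict, Iterable


def brute_force_hosts(
    domain: str,
    words: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Return {hostname: ip} for every word.domain that resolves."""
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        try:
            found[host] = resolve(host)
        except OSError:
            # socket.gaierror (name not found) is a subclass of OSError.
            continue
    return found


def fake_resolve(host: str) -> str:
    """Stand-in resolver so the example runs without touching the network."""
    table = {"www.example.test": "192.0.2.10", "mail.example.test": "192.0.2.25"}
    if host not in table:
        raise socket.gaierror("no such host")
    return table[host]


print(brute_force_hosts("example.test", ["www", "mail", "vpn"], resolve=fake_resolve))
```

Leave out the resolve argument to use real DNS lookups, and only do so against targets that are in scope.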

Bing search lets us search by IP; the results list the domains and subdomains hosted on that IP.  Test it out by following along.

  1. Set up the search: ip:[IP]
  2. If you have multiple IPs in scope you can combine them: (ip:[first_IP] OR ip:[second_ip])
  3. Let’s say the results provide us with 2 subdomains.  You will want to run a new search that removes these subdomains from the results: ip:
  4. Note that the minus ‘-’ sign signifies: exclude.
  5. If we get a new subdomain, we will insert it in our exclusion list:
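As a rough sketch of how such a query could be assembled, the hypothetical Python helper below combines the IPs in scope and appends one exclusion per known subdomain.  It assumes each exclusion is written as a minus sign followed directly by the subdomain; that form is my assumption, so adjust it if Bing expects something different.  The addresses and names are placeholders.

```python
from typing import Sequence


def bing_ip_query(ips: Sequence[str], exclude: Sequence[str] = ()) -> str:
    """Build an ip: search string, excluding subdomains already found."""
    if len(ips) == 1:
        query = f"ip:{ips[0]}"
    else:
        # Several IPs in scope are OR-ed together inside parentheses.
        query = "(" + " OR ".join(f"ip:{ip}" for ip in ips) + ")"
    for subdomain in exclude:
        # Assumed exclusion form: minus sign plus the subdomain.
        query += f" -{subdomain}"
    return query


# Placeholder values, for illustration only.
print(bing_ip_query(["192.0.2.10"]))
print(bing_ip_query(["192.0.2.10", "192.0.2.11"]))
print(bing_ip_query(["192.0.2.10"], exclude=["www.example.test", "mail.example.test"]))
```

Repeat the search, adding each newly found subdomain to the exclude list, until no new names turn up.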

TheHarvester is a tool for gathering subdomain names from different public sources such as search engines or PGP key servers. Options include the following:

  -d: domain to search
  -l: limit of results to work with
  -b: data source (bing, google, linkedin, pgp, all, …)
  -f: output to HTML or XML file (optional, good for long lists)

Virtual Hosts: A virtual host is simply a website that shares an IP address with one or more other virtual hosts.  These hosts are domains and subdomains.  This is very common in a shared hosting environment where a multitude of websites share the same server/IP address.

Hostmap is probably one of the best tools for finding virtual hosts.  The tool employs a number of different techniques to uncover virtual hosts and subdomains, including Bing’s IP dork, zone transfers, DNS lookups and brute force.  Run it against a target IP:

hostmap -t [ip address]
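Once you have a list of resolved host names, from any of the tools above, spotting virtual hosts is just a matter of grouping names by IP.  The Python sketch below shows the idea; it is not part of Hostmap, and the helper name and sample data are invented.

```python
from collections import defaultdict
from typing import Dict, List


def shared_ips(resolved: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hostnames by IP and keep only IPs serving more than one name."""
    groups = defaultdict(list)
    for host, ip in resolved.items():
        groups[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items() if len(hosts) > 1}


# Made-up lookup results, for illustration only.
resolved = {
    "www.example.test": "192.0.2.10",
    "shop.example.test": "192.0.2.10",
    "mail.example.test": "192.0.2.25",
}

print(shared_ips(resolved))  # {'192.0.2.10': ['shop.example.test', 'www.example.test']}
```

Any IP that appears in the output hosts several sites, so a weakness in one of them may affect the others.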

Maltego – Information Gathering graphs

Maltego is another reconnaissance tool built into Kali Linux that gathers information from open, public sources on the Internet. Maltego uses some built-in DNS reconnaissance, but also some interesting fingerprinting methods, to gather intelligence on your target.  The app conveniently displays the information in a graph for your analysis or report. To start Maltego, open the Applications menu in Kali, click on the Kali menu, then select Information Gathering | DNS Analysis | Maltego.

  1. The first step when you launch Maltego is to register it. You cannot use the application without registration.
  2. Maltego has numerous methods of gathering information. The most thorough, and easiest, way to use Maltego is to take advantage of the startup wizard to select the type of information you want to gather.  The power of Maltego is that it lets you visually observe the relationships between a domain, an organization, and people. You can focus on a specific organization, or look at an organization and its related partnerships from DNS queries.
  3. Depending on the scan options chosen, Maltego will let you perform the following tasks:
    • Associate an e-mail address to a person
    • Associate websites to a person
    • Verify an e-mail address
    • Gather details from Twitter, including geolocation of pictures

FOCA – website metadata Reconnaissance

FOCA (Fingerprinting Organizations with Collected Archives) is a tool used mainly to find metadata and hidden information in the documents it scans. These documents may be on web pages, and can be downloaded and analyzed with FOCA. It runs on Windows only. Please download the newest version here:

  1. The first thing to do after launching FOCA is create a new project. We recommend keeping all project files in one place. You should create a new folder for each project.
  2. Once you name your project and decide where you want to store the project files, click on the Create button.
  3. Next, save your project file. Once you have saved the project, click on the Search All button so FOCA will use search engines to scan for documents. Optionally, you can use local documents as well.
  4. Right-click on the file and select the Extract Metadata option.
  5. Right-click on the file and select the Analyze Metadata option.

FOCA allows the user to save and index a copy of all the metadata. In addition, each type of metadata file can be saved and copied. This gives a Penetration Tester a wealth of information. Screenshots are usually used to give an overview of the indexed files, along with a listing of all individual files. Finally, FOCA will allow a Penetration Tester to download individual files that can be used as examples.