Information Gathering

The information-gathering steps of footprinting and scanning are of utmost importance. Good information gathering can make the difference between a successful pen test and one that fails to provide maximum benefit to the client. Information about a target can be found on the organization’s website, in published trade papers, on Usenet, in financial databases, or even from disgruntled employees. Some of these sources are discussed in this section, but first, let’s review documentation.

Documentation

One important aspect of information gathering is documentation. Most people don’t like paperwork, but it’s a requirement that you cannot ignore. The best way to get off to a good start is to develop a systematic method to profile a target and record the results. Create a matrix with fields to record domain name, IP address, DNS servers, employee information, email addresses, IP address range, open ports, and banner details. Figure 3-1 gives an example of what your information matrix might look like when you start the documentation process.

Figure 3-1 Documentation

Building this type of information early on will help in mapping the network and planning the best method of attack.
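One way to keep such a matrix consistent is to define its columns once and export the rows to a spreadsheet-friendly format. The following is only a minimal sketch: the `TargetRecord` class, its field names, and the sample values are hypothetical and simply mirror the fields listed above.

```python
import csv
import io
from dataclasses import dataclass, field, asdict


@dataclass
class TargetRecord:
    """One row of the information matrix (hypothetical field names)."""
    domain_name: str
    ip_address: str = ""
    dns_servers: list = field(default_factory=list)
    employee_info: str = ""
    email_addresses: list = field(default_factory=list)
    ip_range: str = ""
    open_ports: list = field(default_factory=list)
    banner_details: str = ""


def matrix_to_csv(records):
    """Serialize records to CSV text, joining list fields with ';'."""
    columns = list(TargetRecord.__dataclass_fields__)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = asdict(record)
        for key, value in row.items():
            if isinstance(value, list):
                row[key] = ";".join(str(item) for item in value)
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder values for illustration only, not data from a real engagement.
example = TargetRecord(
    domain_name="xyz.com",
    ip_address="192.0.2.10",
    dns_servers=["ns1.xyz.com", "ns2.xyz.com"],
    open_ports=[25, 80, 443],
)
print(matrix_to_csv([example]))
```

Empty columns are left blank so that they can be filled in as the footprinting steps that follow produce results.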

The Organization’s Website

With the initial documentation out of the way, it’s time to get started. The best place to begin is the organization’s website. Search for the company’s URL with Google, Bing, Dogpile, Shodan, or your search engine of choice. You will want to look for the following:

  • Company URL: Domain name.

  • Internal URLs: As an example, not only xyz.com but also support.xyz.com.

  • Restricted URLs: Any domains not accessible to the public.

  • Internal pages: Company news, employment opportunities, addresses, and phone numbers. Overall, you want to look for all open source information, which is information freely provided to clients, customers, or the general public.

Let’s look at an example of a local web hosting company. A quick review of its site shows it has a news and updates section. Recent news states the following:

We are proud to have just updated all of our servers to Plesk 10.0.1. Anyone logging in to these new servers as admin should use the username of the domain, for example, www.xyz.com. The passwords have been transferred from the old servers, so no password reset should be required. We used the existing domain administrator password. Our continued alliance with Extreme Networks has allowed us to complete our transition from Cisco equipment. These upgrades, along with our addition of a third connection to the Internet, give us a high degree of fault tolerance.

You might consider this good marketing information to provide potential clients. The problem is that this information is available to anyone who browses the website. This information allows attackers to know that the new systems are Linux based and that the network equipment is all Extreme Networks. If attackers were planning to launch a denial of service (DoS) attack against the organization, they now know that they must knock out three nodes to the Internet. Even a competitor would benefit from this knowledge because the company is telling the competition everything about its infrastructure.

In some cases, information may have been removed from a company website. That is when the Wayback Machine, at https://archive.org, is useful to browse archived web pages that date back to 1996. It’s a useful tool for looking for information that no longer exists on a site.

Another big information leakage point is the company directory. Directories usually identify key employees or departments. By combining this information with a little social engineering, an attacker can call the help desk, pretend he works for one of these key employees, and demand that a password be reset or changed. He could also use biographical information about a key employee to perform other types of social engineering trickery; Kevin Mitnick famously relied on such techniques. During a pen test, you want to record any such findings and make sure to alert the organization as to what information is available and how it might be used in an attack.

One method to gain additional information about the organization’s email server is to send an email that will bounce from the site. If the site is www.xyz.com, send a mail to badaddress@xyz.com. It will bounce back to you and give you information in its header, including the email server IP address and email server version. Another great reason for bouncing an email message is to find out whether the organization makes use of mail scrubbers. Whatever you find, you should copy the information from the headers and make a note of it as you continue to gather information.
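Once a bounce arrives, the relay chain can be pulled out of its headers programmatically. The sketch below uses Python’s standard `email` module. The sample message, its hostnames, addresses, and the Postfix banner are all invented for illustration, and real bounce formats vary by mail server.

```python
import re
from email import message_from_string

# Hypothetical bounce message; every hostname, address, and banner is invented.
SAMPLE_BOUNCE = """\
Received: from mail.xyz.com (mail.xyz.com [192.0.2.25])
\tby mx.example.net with ESMTP id abc123;
\tMon, 01 Jan 2024 10:00:00 +0000
Received: from scrubber.xyz.com (scrubber.xyz.com [192.0.2.26])
\tby mail.xyz.com (Postfix) with ESMTP id def456;
\tMon, 01 Jan 2024 09:59:58 +0000
From: MAILER-DAEMON@xyz.com
To: tester@example.net
Subject: Undelivered Mail Returned to Sender

badaddress@xyz.com: user unknown
"""

# Matches a dotted-quad IPv4 address written in square brackets.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw_message):
    """Return (header_text, [ip, ...]) for each Received header, newest first."""
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        flat = " ".join(header.split())  # undo header line folding
        hops.append((flat, IP_IN_BRACKETS.findall(flat)))
    return hops


for text, ips in received_hops(SAMPLE_BOUNCE):
    print(ips, "|", text)
```

Each hop shows a relay name, its IP address, and often the mail software in parentheses. An intermediate host such as the invented scrubber.xyz.com would hint at a mail-scrubbing service.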

Finally, keep in mind that it’s not just logical information that you want to gather. Now is a good time to record all physical information about the targeted company, starting with where its facilities are located. Bing Maps and Google Earth are two tools that can be used to get physical layout information. Bing Maps is particularly interesting because it offers a 45-degree perspective, which gives a unique view of facilities and physical landmarks. This view enables you to identify features such as building entrances and other points of ingress and egress.

Job Boards

If you’re lucky, the company has a job posting board. Look this over carefully; you will be surprised at how much information is given here. If no job listings are posted on the organization’s website, get interactive and check out some of the major Internet job boards. Popular sites include the following:

  • Careerbuilder.com

  • Monster.com

  • Dice.com

  • Indeed.com

At the job posting site, query for the organization. Here’s an example of the type of information usually found:

  • Primary responsibilities for this position include management of a Windows 2019 Active Directory environment, including MS Exchange 2008, SQL 2016, and Citrix.

  • Interact with the technical support supervisor to resolve issues and evaluate/maintain patch level and security updates.

  • Experience necessary in Active Directory, Microsoft Clustering and Network Load Balancing, MS Exchange 2007, MS SQL 2016, Citrix MetaFrame XP, EMC CX-400 SAN-related or other enterprise-level SAN, Veritas Net Backup, BigBrother, and NetIQ Monitoring SW.

  • Maintain, support, and troubleshoot a Windows 10 LAN.

Did this organization give away any information that might be valuable to an attacker? They actually have told attackers almost everything about their network. Just the knowledge that the organization is running Windows 10 is extremely valuable.

One way to reduce the information leakage from job postings is to reduce the system-specific information in the job post or to use a company confidential job posting. Company confidential postings hide the company’s true identity and make it harder for attackers to misuse this type of information.

Employee and People Searches

Security is not just about technical and physical controls. It’s also about people. In many modern attacks, people are the initial target. All this really means is that an ethical hacker is also going to want to see what information is available about key personnel. Whereas websites, employee directories, and press releases may provide employee names, third-party sites have the potential to provide sensitive data an attacker might be able to leverage. We can categorize these sites as either data aggregation brokers or social networking.

A staggering number of data aggregation brokerage sites are on the Web. It is easy for an attacker to perform online searches about a person. These sites allow attackers to locate key individuals, identify home phone numbers, and even create maps to people’s houses. Attackers can even see the surroundings of the company or the home they are targeting with high-quality satellite pictures.

What’s interesting about these sites is that many promise everything from criminal background checks, to previous addresses, to marriage records, to family members. Figures 3-2 and 3-3 offer some examples of what these sites provide.

Figure 3-2 Zabasearch

Figure 3-3 Pipl Email Lookup

Social networks are another big target for attackers. Although social media has opened up great channels for communication and is very useful for marketers, it is fraught with potential security problems. Social networking sites are becoming one of the biggest threats to a user’s security and will remain so for the foreseeable future. One reason is that users don’t always think about security when using these sites. There is also the issue that these sites are designed to connect people. Security is not always the primary concern. Some sites that the ethical hacker may want to check include the following:

  • Facebook

  • Twitter

  • LinkedIn

  • Pinterest

Although some organizations might be relatively secure, gaining the names, addresses, and locations of key employees can allow attackers to fly a drone over their homes, guess passwords, or even possibly backdoor the organization through an employee’s unsecure credentials.

EDGAR Database

If the organization you are working for is publicly traded, you want to review the Securities and Exchange Commission’s EDGAR database. It’s located at https://www.sec.gov/edgar/searchedgar/companysearch.html. A ton of information is available at this site. Hackers focus on the 10-Q and 10-K. These two documents contain the quarterly and yearly reports, respectively.

Not only do these documents contain earnings and potential revenue, they also contain details about any acquisitions and mergers. Anytime there is a merger, or one firm acquires another, there is a rush to integrate the two networks. Having the networks integrated is more of an immediate concern than security. Therefore, you will be looking for entity names that are different from the parent organization. These findings might help you discover ways to jump from the subsidiary to the more secure parent company. You should record this information and have it ready when you start to research the Internet Assigned Numbers Authority (IANA) and American Registry for Internet Numbers (ARIN) databases.

Google Hacking

Most of us use Google or another search engine to locate information. What you might not know is that search engines, such as Google, can perform much more powerful searches than most people ever dream of. Not only can Google translate documents, perform news searches, and do image searches, but it can also be used by hackers and attackers to do something that has been termed Google hacking.

By using basic search techniques combined with advanced operators, Google can become a powerful vulnerability search tool. Table 3-2 describes some advanced operators.

Table 3-2 Google Search Terms

| Operator | Description |
| --- | --- |
| Filetype | Directs Google to search only within the text of a particular type of file. Example: filetype:xls |
| Inurl | Directs Google to search only within the specified URL of a document. Example: inurl:search-text |
| Link | Directs Google to search within hyperlinks for a specific term. Example: link:www.domain.com |
| Intitle | Directs Google to search for a term within the title of a document. Example: intitle:“Index of.etc” |

By using the advanced operators shown in Table 3-2 in combination with key terms, Google can be used to uncover many pieces of sensitive information that shouldn’t be revealed. A term even exists for the people who blindly post this information on the Internet; they are called Google dorks. To see how this works, enter the following phrase into Google:

intext:JSESSIONID OR intext:PHPSESSID inurl:access.log ext:log

This query looks for publicly accessible log files (files with a .log extension and access.log in the URL) whose text contains session IDs that could potentially be used to impersonate users. The search found more than 100 sites that store sensitive session IDs in logs that were publicly accessible. If these IDs have not timed out, they could be used to gain access to restricted resources. You can use advanced operators to search for many types of data. Figure 3-4 shows a search where Social Security numbers (SSNs) were queried. Although this type of information should not be listed on the Web, it might have been placed there inadvertently or by someone who did not understand the security implications.

Figure 3-4 Google Hacking Social Security Numbers
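Queries like the session ID search above are just operator:value terms joined together, so they are easy to assemble in code. The following is a minimal sketch; the helper names `operator_term` and `build_query` are hypothetical, and it only builds the query string and a search URL rather than submitting anything.

```python
from urllib.parse import quote_plus


def operator_term(operator, value):
    """Format one operator:value term, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{operator}:{value}"


def build_query(any_of=(), all_of=()):
    """Join terms: any_of terms are ORed together, all_of terms are appended."""
    parts = []
    if any_of:
        parts.append(" OR ".join(operator_term(op, val) for op, val in any_of))
    parts.extend(operator_term(op, val) for op, val in all_of)
    return " ".join(parts)


# Rebuilds the session ID query shown earlier in this section.
query = build_query(
    any_of=[("intext", "JSESSIONID"), ("intext", "PHPSESSID")],
    all_of=[("inurl", "access.log"), ("ext", "log")],
)
print(query)
print("https://www.google.com/search?q=" + quote_plus(query))
```

During an authorized test, adding a ("site", "xyz.com") pair to all_of would limit results to the client’s domain. The site operator is not listed in Table 3-2, so treat it as an addition to the chapter’s examples.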

Finally, don’t forget that finding a vulnerability using Google is not unethical, but using that vulnerability can be unethical unless you have written permission from the domain owner. For example, here is a link to the Google hack for Shellshock (a Bash vulnerability introduced later in the chapter): https://www.exploit-db.com/exploits/34895/. Notice how it took only a few minutes for an attacker to gather this type of information. Security professionals should always be concerned about what kind of information is posted on the Web and who can access it.

Now that we have discussed some basic Google search techniques, let’s look at advanced Google hacking. If you have never visited the Google Hacking Database (GHDB) repositories, I suggest that you visit http://www.hackersforcharity.org/ghdb/ and https://www.exploit-db.com/google-hacking-database/. These sites have the following search categories:

  • Footholds

  • Files containing usernames

  • Sensitive directories

  • Web server detection

  • Vulnerable files

  • Vulnerable servers

  • Error messages

  • Files containing juicy info

  • Files containing passwords

  • Sensitive online shopping info

  • Network or vulnerability data

  • Pages containing login portals

  • Various online devices

  • Advisories and vulnerabilities

Johnny Long, Bill Gardner, and Justin Brown have written an excellent book on the subject, Google Hacking for Penetration Testers, Third Edition. Using these techniques, you can find all sorts of information on services, files, and even people. Figure 3-4 shows an example of some of the more unbelievable things found by Google hacking.

A tool such as the GHDB has made using Google easier, but it’s not your only option. Maltego, FOCA, Recon Dog, and Shodan are others worth discussion. Maltego is an open source intelligence and forensics application. It is a tool-based approach to mining and gathering Internet data that can be compiled in an easy-to-understand format. Maltego offers plenty of data on websites and their services. FOCA is another example of an open source information-gathering tool. Figure 3-5 shows an example of FOCA being used to extract metadata from documents to determine such details as software version. Similar to FOCA is Recon Dog, which is another example of an all-in-one information-gathering tool.

Figure 3-5 FOCA Extracting Metadata

Shodan offers the ability to search for the servers, webcams, printers, routers, and even SCADA devices connected to the Internet. SCADA devices are industrial controls with embedded computers that can be connected to the Internet.

Tools such as Shodan and Censys can be used to find network-connected devices, such as routers, servers, IoT devices, and even printers. Using a variety of filters, these search engines allow you to query hosts and networks for specific information.

You might be wondering who is using all these web search tools. It’s not just hackers. In 2013, documents made public by the National Security Agency (NSA) following a Freedom of Information Act (FOIA) request uncovered a PDF book titled Untangling the Web: A Guide to Internet Research. Although it is somewhat dated, many of its 643 pages are dedicated to showing federal agents how to “Google hack” and search directly for documents published online, such as Excel spreadsheets, Word documents, and PDFs. Although much of this document deals with manual ways to footprint, more modern tools like OSRFramework make the job much easier. OSRFramework is just a set of libraries that can be used to search for usernames, DNS data, phone numbers, and so on.

Usenet

Usenet is a user’s network, which is nothing more than a collection of the thousands of discussion groups that reside on the Internet. Each discussion group contains information and messages centered on a specific topic. Messages are posted and responded to by readers either as public or private emails. Even without direct access to Usenet, a convenient way to browse the content is by using Google Groups. Google Groups allows any Internet user a way to post and read Usenet messages. During a penetration test, you will want to review Google Groups for postings from the target company.

One way to search is to use individuals’ names you might have uncovered; another is to do a simple search of the company. Searching for @company.com will work. Many times, this will reveal useful information. One company that I performed some work for had listings from the network administrator. He had been asked to set up a new router and was having trouble getting it configured properly. The administrator had not only asked the group for help but had also posted the router configuration to see whether someone could help figure out what was wrong. The problem was that the configuration file had not been sanitized and not only contained IP addresses but also the following information:

enable secret 5 $1$2RKf$OMOAcvzpb7j9uhfw6C5Uj1

enable password 7 583132656321654949

For those of you who might not be Cisco gurus, those are protected passwords. The first is a type 5 secret, which is stored as a salted MD5 hash, and the second is a type 7 password. According to Cisco, type 7 passwords were not designed to prevent a determined or targeted attack. Type 7 password encryption is only a basic protection mechanism based on a reversible algorithm. Because of the weak encryption algorithm, the Cisco position is that customers should treat configuration files as sensitive information. The problem is that attackers can potentially obtain these configuration files using a number of means, such as Usenet postings, help forums, or even a TFTP server. With possession of the obscured password, the attacker can then use any number of tools to quickly decode it. Well-known tools that can decode Cisco type 7 passwords include Cain and Abel and the Cisco Password decoder. A quick search of the Web returns dozens of hits on such a query.

Others of you who say that “it’s only router passwords” might be right, but let’s hope that the administrator doesn’t reuse passwords (as many people do). As you can see, you can gain additional information about an organization and its technical strengths just by uncovering a few Usenet posts.

This brings us to the inevitable question of how to fix this problem. Actually, it is not that hard to do. First, you should not post router or firewall configurations. Second, the enable password command should no longer be used. Use the enable secret command instead; it stores an MD5 hash rather than a reversible encoding, which is much more secure than type 7.
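To see why type 7 offers so little protection, consider a sketch of the widely published scheme: the first two decimal digits are an offset into a fixed key string, and each following hex byte is XORed with successive key characters. The function names are hypothetical, and the sample value is a test string generated with the same scheme, not the password from the configuration above.

```python
# Fixed key string from public descriptions of the type 7 scheme.
XLAT = "dsfd;kfoA,.iyewrkldJKDHSUBsgvca69834ncxv9873254k;fg87"


def decode_type7(encoded):
    """Reverse a type 7 string: 2 decimal digits of offset, then hex pairs."""
    offset = int(encoded[:2])
    cipher = bytes.fromhex(encoded[2:])
    return "".join(
        chr(byte ^ ord(XLAT[(offset + i) % len(XLAT)]))
        for i, byte in enumerate(cipher)
    )


def encode_type7(plaintext, offset=8):
    """Apply the same XOR in the forward direction (used for the round trip)."""
    hex_part = "".join(
        f"{ord(ch) ^ ord(XLAT[(offset + i) % len(XLAT)]):02X}"
        for i, ch in enumerate(plaintext)
    )
    return f"{offset:02d}{hex_part}"


sample = encode_type7("cisco")
print(sample)                # 0822455D0A16
print(decode_type7(sample))  # cisco
```

No key or guessing is involved, which is why a type 7 string in a posted configuration should be treated as a plaintext password.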

Registrar Query

Not long ago, searching for domain name information was much easier. There were only a few places to obtain domain names, and the activities of spammers and hackers had yet to cause the Internet Assigned Numbers Authority (IANA) to restrict the release of this information. Today, the Internet Corporation for Assigned Names and Numbers (ICANN) is the primary body charged with management of IP address space allocation, protocol parameter assignment, and domain name system management. Its role is that of overall management, as domain name registration is handled by a number of competing firms that offer various value-added services. These include firms such as Network Solutions (https://networksolutions.com), Register.com (https://www.register.com), GoDaddy (https://godaddy.com), and Tucows (http://www.tucows.com). There is also a series of Regional Internet Registries (RIRs) that manage, distribute, and register public IP addresses within their respective regions. There are five RIRs. These are shown in Table 3-3.

Table 3-3 RIRs and Their Area of Control

| RIR | Region of Control |
| --- | --- |
| ARIN | United States, Canada, and parts of the Caribbean |
| APNIC | Asia and Pacific |
| RIPE | Europe, Middle East, and parts of Central Asia |
| LACNIC | Latin America and parts of the Caribbean |
| AfriNIC | Africa |

The primary tool to navigate these databases is Whois. Whois is a utility that interrogates the Internet domain name administration system and returns the domain ownership, address, location, phone number, and other details about a specified domain name. It is the primary tool used to query domain registration records. If you’re performing this information gathering from a Linux computer, the good news is that a Whois client is usually built in or available as a standard package. From the Linux prompt, users can type whois domainname.com to query a domain, or consult man whois for a list of the various options. Windows users are not as fortunate because Windows does not have a built-in Whois client. Windows users have to use a third-party tool or website to obtain Whois information.
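Under the hood, a Whois client does very little: per RFC 3912 it opens a TCP connection to port 43, sends the query followed by a carriage return and line feed, and reads until the server closes the connection. The following minimal sketch shows that exchange with Python’s standard library. The function names are hypothetical, and the field parser assumes simple "Key: value" lines, which not every registrar uses.

```python
import socket


def build_whois_request(query):
    """RFC 3912: the request is the query text terminated by CRLF."""
    return query.strip().encode("ascii") + b"\r\n"


def whois(query, server, port=43, timeout=10.0):
    """Send one Whois query and return the server's full text reply."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_whois_request(query))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

A call such as whois("pearson.com", "whois.verisign-grs.com") would query the .com registry’s Whois server, assuming it is reachable from your network. The registry reply normally names the registrar’s own Whois server, which holds the fuller record.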

One tool that a Windows user can use to perform Whois lookups is SmartWhois. It can be downloaded from http://www.tamos.com/products/smartwhois/. SmartWhois is a useful network information utility that allows you to look up all the available information about an IP address, hostname, or domain, including country, state or province, city, name of the network provider, administrator, and technical support contact information. A variety of other tools and websites can also be used to obtain Whois information.

Regardless of the tool, the goal is to obtain registrar information. As an example, the following listing shows the results after www.domaintools.com/ is queried for information about www.pearson.com:

Registrant:
      Pearson PLC
      Clive Carmock
      80 Strand London
      London, UK WC2R 0RL
      GB
      Email:

   Registrar Name....: CORPORATE DOMAINS, INC.
   Registrar Whois...: whois.corporatedomains.com
   Registrar Homepage: www.cscprotectsbrands.com

   Domain Name: pearson.com

      Created on..............: Mon, Nov 25, 1996
      Expires on..............: Thu, Nov 23, 2023
      Record last updated on..: Thu, Feb 02, 2017

   Administrative Contact :
      Pearson PLC
      Clive Carmock
      80 Strand London
     ., . WC2R 0RL
      GB
      Phone: 044-2070-105580
      Email:

   Technical Contact:
      Pearson PLC
      Clive Carmock
      80 Strand London
      ., . WC2R 0RL
      GB
      Phone: 044-2070-105580
      Email:

   DNS Servers:
   usrxdns1.pearsontc.com
   oldtxdns2.pearsontc.com
   ns.pearson.com
   ns2.pearson.com

This information provides a contact, address, phone number, and DNS servers. A hacker skilled in the art of social engineering might use this information to call the organization and pretend to be a valid contact.

DNS Enumeration

If all the previous information has been acquired, the DNS might be targeted for zone transfers. A zone transfer is the mechanism used by DNS servers to update each other by transferring the contents of their database. DNS is structured as a hierarchy so that when you request DNS information, your request is passed up the hierarchy until a DNS server is found that can resolve the domain name request. You can get a better idea of how DNS is structured by examining Figure 3-6, which shows a total of 13 DNS root servers.

Figure 3-6 DNS Structure

What’s left at this step is to try to gather additional information from the organization’s DNS servers. The primary tool to query DNS servers is Nslookup, which provides machine name and address information. Both Linux and Windows have Nslookup clients. Nslookup is used by typing nslookup from the command line followed by an IP address or a machine name. Doing so causes Nslookup to return the name, all known IP addresses, and all known CNAMEs for the identified machine. Let’s look at an example in which Nslookup is used to find out the IP addresses of Google’s web servers. If you enter nslookup www.google.com, the following response is obtained:

C:\>nslookup www.google.com
Server:  dnsr1.sbcglobal.net
Address:  68.94.156.1
Non-authoritative answer:
Name:    www.google.com
Addresses:  64.233.187.99, 64.233.187.104
Aliases:  www.google.com

The first two lines of output show which DNS server is being queried. In this case, it’s dnsr1.sbcglobal.net in Texas. The nonauthoritative answer lists two IP addresses for the Google web servers. A nonauthoritative server does not hold a copy of the zone for the domain. Instead, it answers from a cache built from earlier DNS lookups for which it received an authoritative response.
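The same name-to-address lookup can be scripted when many hostnames need to be recorded. The sketch below uses `socket.gethostbyname_ex` from Python’s standard library, which asks the operating system’s resolver for IPv4 addresses and aliases. Unlike Nslookup, it does not report which server answered or whether the answer was authoritative. The `lookup` and `format_answer` helpers are hypothetical names.

```python
import socket


def lookup(name, resolver=socket.gethostbyname_ex):
    """Resolve a hostname; the resolver is injectable so it can be stubbed."""
    canonical, aliases, addresses = resolver(name)
    return {"name": canonical, "aliases": aliases, "addresses": addresses}


def format_answer(answer):
    """Render a result in a layout loosely modeled on Nslookup output."""
    lines = [f"Name:      {answer['name']}"]
    lines.append("Addresses: " + ", ".join(answer["addresses"]))
    if answer["aliases"]:
        lines.append("Aliases:   " + ", ".join(answer["aliases"]))
    return "\n".join(lines)


# Example (requires working name resolution):
#     print(format_answer(lookup("www.google.com")))
```

The addresses returned today will differ from those in the listing above, because large sites rotate and geolocate their DNS answers.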

Nslookup can also be used in an interactive mode by just typing nslookup at the command prompt in Windows or the Bash shell in Linux. In interactive mode, the user is given a prompt of >, at which point the user can enter a variety of options, including attempts to perform a zone transfer. Table 3-4 shows some common DNS resource record names and types.

Table 3-4 IPv4 DNS Records and Types

| Record Name | Record Type | Purpose |
| --- | --- | --- |
| Host | A | Maps a domain name to an IPv4 address |
| Host | AAAA | Maps a domain name to an IPv6 address |
| Pointer | PTR | Maps an IP address to a domain name |
| Name Server | NS | Specifies the servers that provide DNS services |
| Start of Authority | SOA | Configures settings for zone transfers and record caching |
| Service Locator | SRV | Used to locate services in the network |
| Mail | MX | Used to identify SMTP servers |

DNS normally moves information from one DNS server to another through the DNS zone transfer process. If a domain contains more than one name server, only one of these servers will be the primary. Any other servers in the domain will be secondary servers. Zone transfers are much like the DHCP process in that each is a four-step process. DNS zone transfers function as follows:

  1. The secondary name server starts the process by requesting the SOA record from the primary name server.

  2. The primary then checks the list of authorized servers, and if the secondary server’s name is on that list, the SOA record is sent.

  3. The secondary must then check the SOA record to see whether there is a match against the SOA it already maintains. If the SOA is a match, the process stops here; however, if the SOA has a serial number that is higher, the secondary will need an update. The serial number indicates if changes were made since the last time the secondary server synchronized with the primary server. If an update is required, the secondary name server will send an All Zone Transfer (AXFR) request to the primary server.

  4. Upon receipt of the AXFR, the primary server sends the entire zone file to the secondary name server.
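The decision in step 3 comes down to comparing two serial numbers. The sketch below uses the wraparound-aware comparison defined in RFC 1982, which treats serials as 32-bit counters. Whether a given server follows it exactly is an assumption, and the function names are hypothetical.

```python
SERIAL_BITS = 32
HALF_RANGE = 2 ** (SERIAL_BITS - 1)


def serial_is_newer(primary_serial, secondary_serial):
    """RFC 1982 style comparison: True if the primary's serial is ahead."""
    diff = (primary_serial - secondary_serial) % (2 ** SERIAL_BITS)
    return 0 < diff < HALF_RANGE


def secondary_action(primary_serial, secondary_serial):
    """Mirror step 3: request a transfer only when the primary has changed."""
    if serial_is_newer(primary_serial, secondary_serial):
        return "send AXFR request"
    return "no transfer needed"


print(secondary_action(950849, 950848))  # send AXFR request
print(secondary_action(950849, 950849))  # no transfer needed
```

The modular arithmetic matters only when a serial wraps past its maximum value; for everyday increments it behaves like a plain greater-than test.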

A zone transfer is unlike a normal lookup in that the user is attempting to retrieve a copy of the entire zone file for a domain from a DNS server. This can provide a hacker or pen tester with a wealth of information. This is not something that the target organization should be allowing. Normal lookups primarily use UDP port 53 and switch to TCP only when the response is greater than 512 bytes, whereas zone transfers always use TCP port 53. To attempt a zone transfer, you must be connected to a DNS server that is the authoritative server for that zone. The DNS Servers entries in a Whois record, such as the following example, identify those servers:

Registrant:
      Technology Centre
      Domain Administrator
      200 Old Tappan Rd.
      Old Tappan, NJ 07675 USA
      Email: billing@superlibrary.com
      Phone: 001-201-7846187

   Registrar Name....: REGISTER.COM, INC.
   Registrar Whois...: whois.register.com
   Registrar Homepage: www.register.com

   DNS Servers:
   usrxdns1.pearsontc.com
   oldtxdns2.pearsontc.com

Review the last two entries. Both usrxdns1.pearsontc.com and oldtxdns2.pearsontc.com are the DNS authoritative servers listed. These are the addresses that an attacker will target to attempt a zone transfer. The steps to try to force a zone transfer are shown here:

  1. nslookup: Enter nslookup from the command line.

  2. server <ipaddress>: Enter the IP address of the authoritative server for that zone.

  3. set type=any: Tells Nslookup to query for any record.

  4. ls -d <domain.com>: This final step performs the zone transfer, where domain.com is the name of the targeted domain.
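Behind those interactive steps, the client sends a DNS query of type AXFR (type code 252) to the authoritative server over TCP port 53, where each message is preceded by a 2-byte length. The following is a rough standard-library sketch of that request. The function names and fixed query ID are hypothetical, and it inspects only the header of the first reply rather than parsing the zone. As with any step in this chapter, run it only against servers you are authorized to test.

```python
import socket
import struct

QTYPE_AXFR = 252  # zone transfer request
QCLASS_IN = 1     # Internet class


def encode_name(domain):
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in domain.strip(".").split("."):
        raw = label.encode("ascii")
        out += struct.pack("B", len(raw)) + raw
    return out + b"\x00"


def build_axfr_request(domain, query_id=0x1234):
    """Build a TCP-framed AXFR query: 2-byte length, 12-byte header, question."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)  # one question
    question = encode_name(domain) + struct.pack("!HH", QTYPE_AXFR, QCLASS_IN)
    message = header + question
    return struct.pack("!H", len(message)) + message


def recv_exact(conn, count):
    """Read exactly count bytes or raise if the server hangs up first."""
    data = b""
    while len(data) < count:
        chunk = conn.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data


def try_zone_transfer(server_ip, domain, timeout=10.0):
    """Return (rcode, answer_count) from the first response message."""
    with socket.create_connection((server_ip, 53), timeout=timeout) as conn:
        conn.sendall(build_axfr_request(domain))
        (length,) = struct.unpack("!H", recv_exact(conn, 2))
        reply = recv_exact(conn, length)
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, ancount
```

A response code of 0 with a nonzero answer count suggests the server is handing over records, whereas a code of 5 (refused) or a dropped connection indicates that the transfer is restricted.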

One of two things will happen at this point. You will receive an error message indicating that the transfer was unsuccessful, or you will be returned a wealth of information, as shown in the following:

C:\Windows\system32>nslookup
Default Server:  dnsr1.sbcglobal.net
Address:  128.112.3.12
> server 172.6.1.114
> set type=any
> ls -d example.com
example.com. SOA    hostmaster.sbc.net (950849 21600 3600 1728000
  3600)
example.com. NS        auth100.ns.sbc.net
example.com. NS        auth110.ns.sbc.net
example.com.  A     10.14.229.23
example.com. MX      10   dallassmtpr1.example.com
example.com. MX      20   dallassmtpr2.example.com
example.com. MX      30   lasmtpr1.example.com
lasmtpr1      A     192.172.243.240
dallassmtpr1  A     192.172.163.9
dallaslink2   A     192.172.161.4
spamassassin  A     192.172.170.49
dallassmtpr2  A     192.172.163.7
dallasextra   A     192.172.170.17
dallasgate    A     192.172.163.22
lalink        A     172.16.208.249
dallassmtp1   A     192.172.170.49
nygate        A     192.172.3.250
www           A     10.49.229.203
dallassmtp   MX      10   dallassmtpr1.example.com
dallassmtp   MX      20   dallassmtpr2.example.com
dallassmtp   MX      30   lasmtpr1.example.com
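A listing like this is easier to work with once it is turned into structured records for the information matrix. The sketch below parses a few lines copied from the output above. The `parse_zone_listing` helper is a hypothetical name, and it assumes the simple "name TYPE data" layout shown here.

```python
KNOWN_TYPES = {"A", "AAAA", "NS", "MX", "SOA", "PTR", "SRV", "CNAME"}


def parse_zone_listing(text, origin):
    """Turn 'name TYPE data...' lines into (fqdn, type, data) tuples."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[1] not in KNOWN_TYPES:
            continue  # skip prompts, wrapped lines, and anything unrecognized
        name, rtype, data = parts[0], parts[1], " ".join(parts[2:])
        if not name.endswith("."):
            name = f"{name}.{origin}."  # relative names belong to the origin
        records.append((name, rtype, data))
    return records


# A few lines taken from the transfer output above.
LISTING = """\
example.com.  A     10.14.229.23
example.com. MX      10   dallassmtpr1.example.com
lasmtpr1      A     192.172.243.240
nygate        A     192.172.3.250
www           A     10.49.229.203
"""

records = parse_zone_listing(LISTING, "example.com")
hosts = {name: data for name, rtype, data in records if rtype == "A"}
for name, address in hosts.items():
    print(f"{address:18} {name}")
```

Grouping the A records by address range quickly shows which subnets host mail relays, gateways, and web servers.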

Dig is another tool that you can use to provide this type of information. It’s built in to almost all Linux distributions and can be run from Bash, or from the command prompt when installed on Windows. Dig is a powerful tool that can be used to investigate DNS, and a range of other tools can also be used to interrogate DNS servers.

Internal DNS information should not be made available to just anyone. Hackers can use this to find out what other servers are running on the network, and it can help them map the network and formulate what types of attacks to launch. Notice the first line in the previous printout that has example.com listed. Observe the final value of 3600 on that line. That is the TTL value discussed previously and would inform a hacker as to how long DNS poisoning would last. 3,600 seconds is 60 minutes. Zone transfers are intended for use by secondary DNS servers to synchronize with their primary DNS server. You should make sure that only specific IP addresses are allowed to request zone transfers. Most operating systems restrict this by default. All DNS servers should be tested. It is often the case that the primary has tight security but the secondaries may allow zone transfers if misconfigured.
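The five numbers in the SOA record are, in standard order, the serial number followed by the refresh, retry, expire, and minimum TTL timers, all in seconds. A short sketch makes the arithmetic explicit; the helper names are hypothetical, and the input is the parenthesized group from the listing.

```python
SOA_FIELDS = ("serial", "refresh", "retry", "expire", "minimum_ttl")


def parse_soa_numbers(text):
    """Parse '(serial refresh retry expire minimum)' into a dict."""
    cleaned = text.replace("(", " ").replace(")", " ")
    values = [int(token) for token in cleaned.split()]
    if len(values) != len(SOA_FIELDS):
        raise ValueError("expected five numeric SOA fields")
    return dict(zip(SOA_FIELDS, values))


def humanize(seconds):
    """Express a duration in days, hours, minutes, and seconds."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    return " ".join(f"{n}{unit}" for n, unit in parts if n) or "0s"


soa = parse_soa_numbers("(950849 21600 3600 1728000 3600)")
for name in SOA_FIELDS[1:]:
    print(f"{name:12} {soa[name]:>8} s = {humanize(soa[name])}")
```

For this zone, secondaries check for changes every 6 hours, retry after 1 hour, and give up on stale data after 20 days, while the final field yields the 60-minute TTL noted above.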
