
01. Website Recon and Footprinting

Website reconnaissance and footprinting involve gathering as much information as possible about a target website to prepare for further analysis or penetration testing. This guide covers what to look for and the tools and techniques for finding it, with an explanation of each.


Key Objectives

  1. IP Addresses: Identify the hosting server and IP range.
  2. Hidden Directories: Detect restricted or hidden content.
  3. Names, Emails, Phone Numbers: Extract personal or organizational data.
  4. Physical Addresses: Locate the entity's geographical presence.
  5. Web Technologies: Identify technologies, frameworks, and CMS used.

Tools and Techniques

1. Basic DNS and Host Lookup

  • Command: host <domain_name>

    • Finds the IP address associated with a domain name.
    • Example:

      host example.com
      
    • The output lists the domain's IPv4 and IPv6 addresses, any aliases (CNAME records) and, by default, its mail servers (MX records).
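To reuse the result in a script, the addresses can be pulled out of host's output. A minimal sketch for bash, assuming address lines have the form "<name> has address <IP>" or "<name> has IPv6 address <IP>"; the helper name host_ips is hypothetical:

```shell
#!/usr/bin/env bash
# host_ips (hypothetical helper): read `host` output on stdin and print
# only the IPv4 and IPv6 addresses, one per line.
# Alias ("is an alias for") and mail ("mail is handled by") lines are skipped.
host_ips() {
  awk '
    / has address /      { print $NF }   # A record: address is the last field
    / has IPv6 address / { print $NF }   # AAAA record
  '
}

# Usage:
#   host example.com | host_ips
```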

2. Accessing Robots.txt and Sitemap.xml

  • Robots.txt:

    • Found at https://example.com/robots.txt
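    • Lists the paths the site asks crawlers not to visit (Disallow rules). These rules are advisory and do not block access, so disallowed paths often point straight to admin panels or other sections worth a closer look.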
    • Example content:

      User-agent: *
      Disallow: /admin/
      
  • Sitemap.xml:

    • Found at https://example.com/sitemap.xml
    • Lists, in XML, the URLs the site wants search engines to index, often with last-modified dates.
    • Useful for discovering less obvious pages or sections.
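Both files can be reduced to plain lists for further checks. The sketch below (bash; the helper names robots_disallows and sitemap_urls are hypothetical) extracts Disallow paths from robots.txt and <loc> URLs from sitemap.xml. It uses simple text matching rather than a full robots.txt or XML parser, so treat the output as a starting point:

```shell
#!/usr/bin/env bash
# robots_disallows (hypothetical helper): read robots.txt on stdin and print
# each Disallow path once. Field names are matched case-insensitively,
# trailing "# comments" are dropped, and empty "Disallow:" lines (which
# allow everything) are ignored.
robots_disallows() {
  tr -d '\r' |
    awk -F':' 'tolower($1) ~ /^[ \t]*disallow[ \t]*$/ {
      sub(/^[^:]*:[ \t]*/, "")    # remove the field name and colon
      sub(/[ \t]*#.*$/, "")       # remove a trailing comment
      if ($0 != "") print
    }' |
    sort -u
}

# sitemap_urls (hypothetical helper): read sitemap.xml on stdin and print the
# text of every <loc> element. Plain text matching, not an XML parser.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's#<loc>##' -e 's#</loc>##'
}

# Usage:
#   curl -s https://example.com/robots.txt  | robots_disallows
#   curl -s https://example.com/sitemap.xml | sitemap_urls
```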

3. Browser Plugins

  • BuiltWith: Analyzes the website's backend technologies.
  • Wappalyzer: Detects CMS, web servers, frameworks, and plugins in use.

4. WhatWeb

  • Command: whatweb <domain_name>
    • Identifies web technologies, such as server software, frameworks, or CMS.
    • Example:

      whatweb example.com
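For note-taking, WhatWeb's summary line can be split into one technology per line. A rough sketch, assuming each detected plugin appears as Name[value] with no space before the bracket; the helper whatweb_plugins is hypothetical and strips terminal colour codes in case they are present:

```shell
#!/usr/bin/env bash
# whatweb_plugins (hypothetical helper): read WhatWeb output on stdin and
# print one "Name value" pair per line, e.g. "PHP 7.4.3".
# Assumes each plugin appears as Name[value]; a bracketed HTTP status that
# follows a space is not matched.
whatweb_plugins() {
  local esc
  esc="$(printf '\033')"
  # 1) strip terminal colour codes, 2) keep Name[value] tokens,
  # 3) rewrite each token as "Name value"
  sed "s/${esc}\[[0-9;]*m//g" |
    grep -oE '[A-Za-z0-9_-]+\[[^]]*]' |
    sed -E 's/^([^[]+)\[(.*)]$/\1 \2/'
}

# Usage:
#   whatweb example.com | whatweb_plugins
```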
      

5. WebHTTrack

  • Description:
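    • HTTrack is a website copier that downloads a site so it can be browsed offline. WebHTTrack is its browser-based interface; the httrack command below is the command-line version (on Debian-based systems the interface is packaged separately as webhttrack).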
  • Usage:

    • Install:

      sudo apt install httrack
      
    • Command:

      httrack "https://example.com" -O /path/to/save
      
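    • Mirrors the site's content locally so it can be searched for comments, email addresses, and references to files or directories. HTTrack follows links, so it only retrieves content that is linked from somewhere; unlinked directories still have to be found by other means.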
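Once the mirror is on disk, ordinary text search covers objectives such as emails and leftover HTML comments. A minimal sketch for bash; the helper names are hypothetical, and the email pattern is deliberately simple, so it will miss or over-match some addresses:

```shell
#!/usr/bin/env bash
# mirror_emails (hypothetical helper): list unique email addresses found in
# the text files under a mirror directory (e.g. the httrack -O path).
# -I skips binary files, -h hides file names, -o prints only the matches.
mirror_emails() {
  local dir="$1"
  grep -rhoIE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$dir" | sort -u
}

# mirror_comments (hypothetical helper): print file:line:text for every line
# that opens an HTML comment, where developers sometimes leave notes or
# commented-out links.
mirror_comments() {
  local dir="$1"
  grep -rnI -- '<!--' "$dir"
}

# Usage:
#   mirror_emails /path/to/save
#   mirror_comments /path/to/save
```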

6. Checking for security.txt

A security.txt file is a standardized method (RFC 9116) for website owners to publicly share security contact information.
It is similar to robots.txt but specifically for vulnerability reporting.

Purpose
Tells ethical hackers and security researchers how to report security issues responsibly.

Location Requirements
A security.txt file belongs at:

  • https://example.com/.well-known/security.txt (the standardized, preferred path)

  • or, as a legacy fallback, https://example.com/security.txt

Format Requirements

  • Must be served over HTTPS

  • Must be plain text (served with Content-Type text/plain; charset=utf-8)

  • Must be easily readable by both humans and automated tools

Common Fields Inside security.txt (Contact and Expires are required; the others are optional)

Example:

Contact: mailto:security@example.com
Expires: 2025-12-31T23:59:59Z
Encryption: https://example.com/pgp-key.txt
Acknowledgments: https://example.com/security-hall-of-fame/
Preferred-Languages: en, fr
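Because each line is a "Field: value" pair, individual fields are easy to extract and the Expires date easy to check. A sketch for bash under stated assumptions: the helper names are hypothetical, and sectxt_expired treats the timestamp as UTC and ignores any time-zone offset:

```shell
#!/usr/bin/env bash
# sectxt_field (hypothetical helper): read security.txt on stdin and print
# every value of the field named in $1 (matched case-insensitively).
# Lines starting with "#" are comments and are skipped.
sectxt_field() {
  tr -d '\r' | awk -v f="$1" '
    /^[ \t]*#/ { next }                  # skip comment lines
    {
      i = index($0, ":")                 # first colon separates name and value
      if (i == 0) next
      if (tolower(substr($0, 1, i - 1)) != tolower(f)) next
      val = substr($0, i + 1)
      sub(/^[ \t]+/, "", val)            # trim leading blanks
      sub(/[ \t]+$/, "", val)            # trim trailing blanks
      print val
    }'
}

# sectxt_expired (hypothetical helper): succeed (0) if the Expires timestamp
# in $1 is in the past, fail (1) if not, return 2 if it cannot be parsed.
# Assumes a UTC timestamp such as 2025-12-31T23:59:59Z: only the first 14
# digits (YYYYMMDDhhmmss) are compared, so any time-zone offset is ignored.
sectxt_expired() {
  local digits now
  digits="$(printf '%s' "$1" | tr -cd '0-9' | cut -c1-14)"
  [ "${#digits}" -eq 14 ] || return 2
  now="$(date -u +%Y%m%d%H%M%S)"
  [ "$digits" -lt "$now" ]
}

# Usage:
#   curl -s https://example.com/.well-known/security.txt > security.txt
#   sectxt_field Contact < security.txt
#   sectxt_expired "$(sectxt_field Expires < security.txt)" && echo "expired"
```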

Why It Matters in Recon

  • Reveals security contact emails

  • May expose bug bounty programs or responsible disclosure policies

  • May reveal organizational details, such as the disclosure policy (Policy field), whether a dedicated security team exists, or open security roles (Hiring field)

Real-World Adoption
Major companies using security.txt include:

  • Google

  • GitHub

  • LinkedIn

  • Facebook

How to Check

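Open the URL in a browser, or use curl. The first command (-I) fetches only the response headers, which show the status code and Content-Type; the second prints the file itself: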

curl -I https://example.com/.well-known/security.txt
curl https://example.com/.well-known/security.txt
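The two locations listed under Location Requirements can be tried in order. A sketch for bash, assuming curl is installed; fetch_security_txt is a hypothetical helper. Some servers answer every path with an HTML page and status 200, so it is still worth confirming the Content-Type with curl -I:

```shell
#!/usr/bin/env bash
# fetch_security_txt (hypothetical helper): try the standardized
# /.well-known/ path first, then the legacy root path, and print the first
# file returned with a successful HTTP status.
# $1 is the base URL without a trailing slash, e.g. https://example.com
fetch_security_txt() {
  local base="$1" path
  for path in /.well-known/security.txt /security.txt; do
    # -f: fail on HTTP 4xx/5xx, -s: silent, -L: follow redirects
    if curl -fsL "${base}${path}"; then
      return 0
    fi
  done
  echo "no security.txt found at ${base}" >&2
  return 1
}

# Usage:
#   fetch_security_txt https://example.com
```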

Note-Taking

Document each command and tool as you use it, recording:

  1. Command Executed:
    • The tool and the exact command line used.
  2. Output Summary:
    • Record useful results like IPs, detected technologies, or URLs.
  3. Interpretation:
    • Describe what the output indicates and how it can be leveraged.
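A small wrapper can capture the first two items automatically and leave the interpretation to you. A sketch for bash; the helper run_and_note and the NOTES_FILE variable (default ./recon-notes.md) are hypothetical, and the entry layout mirrors the example notes below:

```shell
#!/usr/bin/env bash
# run_and_note (hypothetical helper): run a command, print its output, and
# append a Command / Date / Output / Interpretation entry to a notes file.
# NOTES_FILE is an assumed variable; it defaults to ./recon-notes.md.
# Arguments are joined with spaces in the note, so original quoting is lost.
run_and_note() {
  local notes="${NOTES_FILE:-./recon-notes.md}" output status
  output="$("$@" 2>&1)"          # capture stdout and stderr together
  status=$?
  printf '%s\n' "$output"        # still show the result on screen
  {
    printf 'Command: %s\n' "$*"
    printf 'Date: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'Exit status: %s\n' "$status"
    printf 'Output:\n%s\n' "$output"
    printf 'Interpretation: \n\n'  # filled in by hand afterwards
  } >> "$notes"
  return "$status"
}

# Usage:
#   run_and_note host example.com
#   run_and_note whatweb example.com
```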

Example Notes

Host Lookup

Command: host example.com
Output: example.com has address 93.184.216.34
Interpretation: This is the IP address the domain resolves to, usually the hosting server (or a CDN or proxy in front of it).

Robots.txt

URL: https://example.com/robots.txt
Content:
    User-agent: *
    Disallow: /private/
Interpretation: The site asks crawlers to skip `/private/`. Because robots.txt does not enforce access control, the directory may still be reachable and may contain sensitive data.

WhatWeb

Command: whatweb example.com
Output: 
    WordPress [5.7.2], Apache [2.4.41], PHP [7.4.3]
Interpretation: The site runs WordPress 5.7.2 on an Apache 2.4.41 server with PHP 7.4.3; these version numbers can be checked against known vulnerabilities.

Security.txt

URL Checked:
https://example.com/.well-known/security.txt

Findings:
- Contact email: security@example.com
- Public PGP key available
- Bug bounty acknowledgment page

Interpretation:
The organization actively supports responsible disclosure,
which may indicate a mature security posture.