Content Discovery

What is Content Discovery?

content could mean pictures, videos etc. but we want to discover the hidden ones, like pages or portals intended for staff usage, older versions of the website, backup files, configuraion files, administration panels etc.
Three main ways of discovering content:
- Manually
- Automated
- OSINT (Open-Source Intelligence)

Questions

What is the Content Discovery method that begins with M?

R: Manually

What is the Content Discovery method that begins with A?

R: Automated

What is the Content Discovery method that begins with O?

R: OSINT

Manual Discovery - Robots.txt

robots.txt is a document that tells search engines which pages are or are not allowed to be shown on their search engine or ban specific search engines from crawling the website altogether.
Example of banned pages could be administration portals.

Questions

What is the directory in the robots.txt that isn’t allowed to be viewed by web crawlers?

R: staff-portal

Manual Discovery - Favicon

favicon is a small icon displayed in the browser’s address bar or tab used for branding a website.

sometimes when a framework is used to build a website, a favicon is left over and so we can figure out the framework that was used in building that website.
- then we can check this database of common frameworks

Questions

What framework did the favicon belong to?

Viewed the source and got the path to the favicon:

Used the following command to get the md5 of the favicon:

Got the following md5:

Searched on the md5 databses of favicons and got the result:

R: cgiirc

Manual Discovery - Sitemap.xml

unlike robots.txt which restricts the websites that search engine crawlers can look at, sitemap.xml lists all of the files that a search engine can show.
- from this file we can maybe find some old versions of the website we can exploit.

Questions

What is the path of the secret area that can be found in the sitemap.xml file?

R: /s3cr3t-area

Manual Discovery - HTTP Headers

when we make a request, the server returns a response with various HTTP Headers in it
- sometimes there is useful info in there: webserver software, programming language etc.

Questions

What is the flag value from the X-FLAG header?

R: THM{HEADER_FLAG}

Manual Discovery - Framework Stack

after you find the framework of website, either by the use of the favicon or by looking at the comments on the source of the page, you can locate the framework’s website so we can learn more about the software and its contents and vulnerabilities.

Questions

What is the flag from the framework’s administration portal?

Navigate to the framwork login page we get the following clues:

We login with the credentials given and we get the flag:

R: THM{CHANGE_DEFAULT_CREDENTIALS}

OSINT - Google Hacking / Dorking

OSINT -> freely available external tools

Questions

What Google dork operator can be used to only show results from a particular site?

R: site:

OSINT - Wappalyzer

Wappalyzer is an online tool and beowser extension which helps in finding the technologies a website uses

OSINT - The Wayback Machine

The Wayback Machine is a historical archive of websites that dates back to the late 90s.
this can help you find old pages that might still be active on the current website.

OSINT - Github

first understand git: a version control system that tracks changes to files in a project.
Github is a hosted version of Git on the internet.
use Github’s features to look for company names or websites to try and locate repositories belonging to your target.

Questions

What is Git?

R: Version Control System

OSINT - S3 Buckets

S3 Buckets are a storage device provided by Amazon AWS, allowing people to save files and even static website content in the cloud accessible over HTTP and HTTPS.
the owner can set the access permissions to public, private or even writable.
format of S3 Buckets:
- http(s)://{name}.s3.amazonaws.com
here, name is decided by the owner
one method to find these buckets is using the company name followed by common terms like {name}-assets, {name}-www, {name}-public, {name}-private…

Questions

What URL format do Amazon S3 buckets end in?

R: s3.amazonaws.com

Automated Discovery

used tools: ffuf, dirb and gobuster
ffuf:

dirb:

gobuster:

Questions

Used dirb to search for directories and files in the given domain:

What is the name of the directory beginning “/mo….” that was discovered?

R: /monthly

What is the name of the log file that was discovered?

R: /development.log