Code Highlight: Scraping IPs from Mullvad

We frequently rely on publicly available data such as IP addresses and hostnames to conduct our testing and analysis. Sometimes it’s easy to access, and sometimes we have to do a little extra work, because getting this data in a structured format can be quite challenging. Enter the Mullvad Extractor, a nifty script that extracts IP addresses and hostnames from the Mullvad VPN servers page.

Our Project: Extracting VPN Server Data

Our goal was to extract domain names and IPv4 addresses of VPN servers from the Mullvad website and save them as a CSV file. While this might sound straightforward at first, we encountered a few roadblocks along the way – namely SvelteKit’s lazy loading feature.

The Challenges: SvelteKit Lazy Loading

SvelteKit is a fantastic framework for building web applications, but its lazy loading feature presented some challenges when it came to extracting data. The website we were working with loads its content dynamically, meaning that the VPN server information only becomes available in the DOM as users scroll down the page. This behavior complicated our data extraction efforts.

To work around SvelteKit’s lazy loading, we used JavaScript to scroll each expandable row into view and click it, pausing long enough for the website to load the data before moving on to the next one. This way, we could extract the required information without missing anything important.

The Solution: A JavaScript Extravaganza

1. Asynchronous Functions: Keeping Things Smooth

To ensure a smooth user experience, we made use of asynchronous functions. By doing so, we avoided blocking the main thread, allowing the browser to remain responsive during the execution of our script.

The main() function, which is the core of our script, was declared async. We also employed await with the sleep() function to pause execution at specific points, giving the website enough time to load content before continuing.

For example, we used await sleep(800) before and after clicking the parent of each expandable icon element. This allowed the website to properly load the content before our script continued to process it.
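The pause-act-pause pattern can be sketched on its own. The sleep() helper below matches the one in our script; withPauses() is a hypothetical wrapper that does not appear in the original code and is only here to make the pattern explicit.

```javascript
// Promise-based delay: resolves after `ms` milliseconds without blocking the main thread.
function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper (not part of the original script):
// pause, run `action`, pause again, then hand back whatever `action` returned.
async function withPauses(action, ms) {
    await sleep(ms);
    const result = action();
    await sleep(ms);
    return result;
}
```

In the browser, the click step would then read roughly as await withPauses(() => parentElement.click(), 800), assuming parentElement is the row being expanded.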

2. Mutation Observers: Keeping an Eye on Changes

Since the website we worked with used SvelteKit’s lazy loading, we had to find a way to detect when new content was added to the page. Enter mutation observers!

Mutation observers allowed us to monitor the DOM for changes and update our extracted data accordingly. By setting up a MutationObserver instance, we were able to observe the container element where new content was added. Whenever new content was loaded, the observer would detect the change, and our script would process the new information.

This approach proved to be efficient and robust, as it allowed our script to work seamlessly with the website’s dynamic nature.
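As a minimal sketch of that setup, the record handling can be split from the observer wiring. The names addedElements() and watch() are ours for illustration and are not in the original script; watch() only works in a browser, where MutationObserver is available.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers

// Collect every element node that a batch of mutation records reports as added.
function addedElements(mutations) {
    const elements = [];
    for (const mutation of mutations) {
        if (mutation.type !== 'childList') continue; // ignore attribute/text changes
        for (const node of mutation.addedNodes) {
            if (node.nodeType === ELEMENT_NODE) elements.push(node);
        }
    }
    return elements;
}

// Browser-only wiring: call `onElement` for each element added under `container`.
function watch(container, onElement) {
    const observer = new MutationObserver(mutations => {
        addedElements(mutations).forEach(onElement);
    });
    // subtree: true also reports nodes added deeper inside the container.
    observer.observe(container, { childList: true, subtree: true });
    return observer; // call observer.disconnect() once extraction is finished
}
```

Keeping the record handling in a plain function means it can be exercised with simple objects, without a live page.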

3. Sets: A Unique Approach to Data Deduplication

During the data extraction process, we noticed that our script sometimes captured duplicate entries. To address this issue, we used a JavaScript Set to keep track of the domain names we had already seen.

A Set is a collection of unique values: adding a value it already contains has no effect. By checking each entry’s domain name against the Set before keeping it, we ensured that our final uniqueData array contained only one entry per server.
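Here is a small, standalone sketch of that deduplication step. The function name dedupeByDomain() is an assumption for illustration, and the hostnames and addresses in the usage line are made-up placeholders, not real Mullvad servers.

```javascript
// Keep the first row seen for each domain; later rows with the same domain are dropped.
function dedupeByDomain(rows) {
    const seen = new Set();
    const unique = [];
    for (const row of rows) {
        if (!seen.has(row.domain)) {
            seen.add(row.domain);
            unique.push(row);
        }
    }
    return unique;
}

// Placeholder data: the second entry repeats the first domain and is discarded.
const sample = dedupeByDomain([
    { domain: 'a.example', ip: '192.0.2.1' },
    { domain: 'a.example', ip: '192.0.2.1' },
    { domain: 'b.example', ip: '192.0.2.2' },
]);
```

Because the Set is keyed on the domain name, two rows with the same domain but different IPs would still collapse into one; keying on the IP instead is an equally small change.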

Once we had a unique list of servers, we easily converted the data into CSV format and provided a downloadable file for the user.
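The CSV step can be sketched as a pure function. This version is a slight variation on what our script does: it quotes fields that contain commas, quotes, or line breaks, and returns an empty string for an empty list. The names csvField() and toCsv() are assumptions for illustration.

```javascript
// Quote a field only when it contains a character that would break the CSV layout.
function csvField(value) {
    const text = String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of flat objects into CSV text, using the first row's keys as the header.
function toCsv(rows) {
    if (rows.length === 0) return '';
    const headers = Object.keys(rows[0]);
    const lines = [headers.map(csvField).join(',')];
    for (const row of rows) {
        lines.push(headers.map(header => csvField(row[header])).join(','));
    }
    return lines.join('\n');
}
```

The resulting string can be passed to a Blob exactly as in the full script.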

Lessons Learned and Limitations

While our solution successfully extracts the VPN server information, it’s not without its limitations. The script relies on the specific structure of the website, meaning that changes in the site’s layout or classes may break the code. Additionally, the sleep durations used in the script might need to be adjusted if the website’s loading speed changes.
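One possible way to soften the dependence on fixed sleep durations is to poll for a condition with a timeout. The sketch below is not part of our script; waitFor() and its option names are assumptions, and whether it suits the Mullvad page would need testing.

```javascript
function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: call `predicate` every `interval` ms until it returns
// a truthy value, or reject once `timeout` ms have passed.
async function waitFor(predicate, { timeout = 5000, interval = 100 } = {}) {
    const deadline = Date.now() + timeout;
    while (Date.now() < deadline) {
        const value = predicate();
        if (value) return value;
        await sleep(interval);
    }
    throw new Error(`Condition not met within ${timeout} ms`);
}
```

In the browser, something like await waitFor(() => parentElement.parentElement.querySelector('.servers-dl')) could stand in for the second sleep(800), assuming the expanded details appear near the clicked row.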

In conclusion, our script demonstrates how to tackle the challenges of extracting data from a website using SvelteKit’s lazy loading feature. It shows that with a little creativity and perseverance, we can overcome such hurdles and build a solution that works.

So the next time you find yourself facing a dynamic website that’s trying to keep its data hidden from view, remember our little adventure here and know that you too can conquer the challenge with a bit of JavaScript magic! Happy coding!

The Code

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

async function main() {
    try {
        await sleep(3000);

        const container = document.querySelector('div[style*="overflow-y: auto"]');
        if (!container) {
            throw new Error('Could not find the container element');
        }

        const expandableIconElements = document.querySelectorAll('.expanding-icon');
        const extractedData = [];

        const observer = new MutationObserver((mutations) => {
            mutations.forEach((mutation) => {
                if (mutation.type === 'childList') {
                    mutation.addedNodes.forEach((node) => {
                        if (node.nodeType === Node.ELEMENT_NODE && node.querySelector('.servers-dl')) {
                            const ipElement = node.querySelector('.servers-dl .dt.no-uppercase + .dd');
                            const domainElement = Array.from(node.querySelectorAll('.servers-dl .dt')).find(el => el.textContent === "Domain name");
                            const domainText = domainElement ? domainElement.nextElementSibling.textContent : "Not found";
                            
                            if (ipElement) {
                                const ip = ipElement.innerText;
                                extractedData.push({ domain: domainText, ip });
                            } else {
                                console.warn('Could not find the IPv4 element in the expanded div:', node);
                            }
                        }
                    });
                }
            });
        });

        observer.observe(container, { childList: true, subtree: true });

        for (let i = 0; i < expandableIconElements.length; i++) {
            const iconElement = expandableIconElements[i];
            const parentElement = iconElement.parentElement;
            if (parentElement) {
                parentElement.scrollIntoView({ behavior: 'smooth', block: 'center' });

                await sleep(800);

                parentElement.click();

                await sleep(800);
            }
        }

        observer.disconnect();

        // Remove duplicate entries from extractedData
        const uniqueData = [];
        const uniqueDomains = new Set();
        for (const data of extractedData) {
            if (!uniqueDomains.has(data.domain)) {
                uniqueDomains.add(data.domain);
                uniqueData.push(data);
            }
        }

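        // Convert uniqueData to CSV format (fail early if nothing was captured,
        // since uniqueData[0] would otherwise be undefined)
        if (uniqueData.length === 0) {
            throw new Error('No server data was extracted');
        }
        const headers = Object.keys(uniqueData[0]).join(",");
        const csvData = uniqueData.map(obj => Object.values(obj).join(",")).join("\n");
        const csvContent = `${headers}\n${csvData}`;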

        // Create a blob object from the CSV content and download it as a file
        const blob = new Blob([csvContent], { type: "text/csv;charset=utf-8;" });
        const url = URL.createObjectURL(blob);
        const link = document.createElement("a");
        link.setAttribute("href", url);
        link.setAttribute("download", "vpn-servers.csv");
        link.style.visibility = "hidden";
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);

        console.log('Data has been downloaded as a CSV file.');
    } catch (error) {
        console.error('An error occurred during script execution:', error);
    }
}

// Run the extractor (paste the whole script into the browser console on the servers page)
main();

Markus Askildsen

Guest Writer

Markus Askildsen is a tech-loving Norwegian with an insatiable appetite for adventure and a passion for the outdoors. Hailing from the stunning landscapes of Scandinavia, he balances his love for hiking with an immersion in cutting-edge technology. As an author, Markus weaves tales of exploration and discovery that inspire readers to embrace the unknown and appreciate the wonders of both the natural world and the digital realm. Whether scaling remote peaks or navigating virtual reality, Markus encourages others to live life to the fullest and never shy away from excitement and beauty just beyond the horizon.
