Sometimes research requires downloading massive amounts of content from password-protected websites. This can be done manually in a browser such as Internet Explorer, but it quickly becomes tedious when a large number of pages is involved. Fortunately, the task can be completely automated.
Automating the download process of massive content
For my work, I use two tools: Firefox and curl. Both are available for Windows and Linux. curl is a command-line tool for downloading web pages, similar to the well-known wget. Firefox needs no introduction; I find it a much better alternative to Internet Explorer.
The basic idea: establish a session with the password-protected website using Firefox, then use the resulting cookie with the standalone curl tool to fetch the website's pages.
Step 1. (optional) Clean Cookies.
Clear all cookies so that fewer entries have to be searched later: Firefox Menu -> Tools -> Clear Private Data. In the Clear Private Data window that appears, select the Cookies checkbox.
Step 2. Log in to the website.
Go to the password-protected website, fill in your username and password, and log in.
For example, http://demo.greensql.net/
Step 3. Get the cookie.
Open the Firefox Preferences window and select the Privacy tab: Firefox Menu -> Edit -> Preferences -> Privacy.
Next, click the Show Cookies button. Select the required website and open the
Note the cookie's name and content fields; we will use them in the next step.
Step 4. Create the cookie container.
A cookie container is a special file that stores cookie values. curl can use this file to access your established session (you have signed in to the website, right?).
Open your favorite text editor and create the cookie.txt file with one line in the following form:
server.com FALSE / FALSE 0 cookie-name cookie-content
All fields must be separated by tabs. For the demo website, the line looks like this:
demo.greensql.net FALSE / FALSE 0 PHPSESSID a571d80ccf683df8da9031ada698336e
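Text editors often turn tabs into spaces without warning, so it can be safer to write the file from the shell. The sketch below assumes the same demo values as above. The seven fields are the ones curl expects in a Netscape-style cookie file: domain, include-subdomains flag, path, secure-only flag, expiry time (0 for a session cookie), cookie name, and cookie content.

```shell
#!/bin/sh
# Write cookie.txt with real tab characters between the seven fields.
# The domain, cookie name, and cookie content are the demo values from
# this article; replace them with the values shown by Firefox for your site.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    'demo.greensql.net' 'FALSE' '/' 'FALSE' '0' \
    'PHPSESSID' 'a571d80ccf683df8da9031ada698336e' > cookie.txt
```

Note that this overwrites any existing cookie.txt in the current directory.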
Step 5. Download the website.
We are almost done! To download a page, run the following command:
curl --cookie-jar cookie.txt -b cookie.txt http://demo.greensql.net/log_view.php
The cookie makes the website treat you as a logged-in user!
Now you can download the rest of the required content by running the same command with other URLs.
Commands for downloading GreenSQL database names:
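A session cookie eventually expires, and curl will then save whatever the site sends back instead of the page you wanted. A minimal sketch of a check follows. It assumes the site answers with HTTP status 200 while the session is valid and with something else (for example, a redirect to the login page) once it is not; that is an assumption about the site, not confirmed behavior of demo.greensql.net. The function name check_session and the CURL override variable are hypothetical helpers introduced for this example.

```shell
#!/bin/sh
# check_session URL
# Prints the HTTP status code returned for URL when cookie.txt is sent,
# and succeeds (exit status 0) only if that code is 200.
# Set CURL to another command to substitute for curl, e.g. when testing.
check_session() {
    code=$(${CURL:-curl} --silent --output /dev/null \
        --write-out '%{http_code}' -b cookie.txt "$1")
    printf '%s\n' "$code"
    [ "$code" = "200" ]
}
```

Usage: check_session http://demo.greensql.net/log_view.php || echo "Session expired, log in again and update cookie.txt"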
* curl --cookie-jar cookie.txt -b cookie.txt "http://demo.greensql.net/db_view.php?id=1"
* curl --cookie-jar cookie.txt -b cookie.txt "http://demo.greensql.net/db_view.php?id=2"
* curl --cookie-jar cookie.txt -b cookie.txt "http://demo.greensql.net/db_view.php?id=3"
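When the pages differ only in the id parameter, the repeated commands can be folded into a loop. The sketch below assumes the ids are consecutive integers. The function name download_db_pages, the output file names db_view_<id>.html, and the CURL override variable are hypothetical and chosen for this example.

```shell
#!/bin/sh
# download_db_pages FIRST LAST
# Fetches db_view.php for every id from FIRST to LAST using the session
# cookie in cookie.txt, saving each page to db_view_<id>.html.
# Set CURL=echo to print the arguments instead of touching the network.
download_db_pages() {
    id=$1
    while [ "$id" -le "$2" ]; do
        ${CURL:-curl} --silent --show-error -b cookie.txt \
            --output "db_view_$id.html" \
            "http://demo.greensql.net/db_view.php?id=$id"
        id=$((id + 1))
    done
}
```

Usage: download_db_pages 1 3 fetches the same three pages as the commands above. The URL is quoted because the ? character is otherwise treated as a wildcard by the shell.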