How to Copy a Website to Your Computer
There are a variety of reasons why you might want to copy a website to your PC. Perhaps you're going somewhere with no Wi-Fi connection, like an airplane, and you want to have the website on hand. Maybe you're doing research on a competitor's website. Maybe you just got a trial subscription to a particular website and you want to download as much content as possible for later reference. Whatever the reason, the process is a lot easier than you think, thanks to a free program called HTTrack Website Copier. HTTrack will not only copy the website to your computer, but it will also allow you to browse the website on your machine, just as if you were online.
Things You'll Need
- Computer (Windows in this example)
- Internet Connection
- HTTrack software (See Resources section for link)
Download HTTrack (see Resources below). Once you're at the website, click "Download" to go to the download page and choose the latest installer (.exe file). Save the file to your computer.
Run the file you downloaded to install HTTrack. Accept the license agreement and click "Next" through all the windows and you're done. Leave "Launch WinHTTrack Website Copier" checked to automatically open the program after the install is done.
Click "Next" when you see the introduction window.
Enter a project name in the project window (it can be whatever you like) and enter a path where you want the website stored. I suggest putting it into a new folder of its own, because you're going to have a lot of files when you're done.
Enter the URL for the website you want to copy on the next page. Just click the button that says "Add URL" and type it in. If you want to copy a whole site, enter only the main part of the URL, up to the .com, .net, etc. If you only want to copy specific pages, enter the specific URLs for those pages instead. If a site needs authorization or is a subscription site, it is still possible to copy it, but it requires some setup that is beyond the scope of this tutorial. The secret lies in using the "Capture URL" button.
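If you have a long page address and aren't sure which part counts as the "main part," the short Python sketch below shows one way to trim it. The function name site_root and the example.com addresses are made up for illustration; this is not part of HTTrack.

```python
from urllib.parse import urlsplit


def site_root(url: str) -> str:
    """Return only the scheme and host of an absolute URL, with a trailing slash."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # Without "http://" or "https://" the host cannot be identified reliably.
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}/"


# Hypothetical page address; everything after the host is dropped.
print(site_root("https://www.example.com/blog/2020/post.html?id=7"))
# prints https://www.example.com/
```

The trimmed result is what you would type into the "Add URL" box to copy the whole site rather than a single page.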
Click "OK." Now your URL shows up in the "Web Addresses" list. You can also copy and paste multiple URLs straight into this window.
Set other options by hitting the "Set Options" button. Under the Options you can tell HTTrack to skip certain items, such as particular file types. For the most part you can just ignore the options and leave the defaults. If you find that HTTrack is having trouble downloading the images from a site, then the robots.txt file for the site is probably blocking you from doing so. You can fix this by going into Options and choosing the Spider tab. Set the Spider drop-down to "no robots.txt rules."
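To see what a robots.txt block looks like in practice, here is a minimal Python sketch that checks a URL against a set of robots.txt rules using the standard library. The sample rules, the "HTTrack" user-agent string, and the example.com addresses are assumptions for illustration; a real site's robots.txt will differ, and this is not how HTTrack itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every crawler to stay out of /images/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "HTTrack", "https://www.example.com/images/logo.png"))
# prints False
print(is_allowed(SAMPLE_ROBOTS_TXT, "HTTrack", "https://www.example.com/index.html"))
# prints True
```

With rules like these, a copier that obeys robots.txt would fetch the pages but leave the image folder alone, which matches the missing-images symptom described above.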
Hit "Next." This is a settings page where you can tell HTTrack to use a remote access connection or to shut down when the download is done. You can just leave the defaults.
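HTTrack also comes with a command-line program, and the wizard choices made so far have rough counterparts there. The Python sketch below only builds the argument list; it does not run anything. It assumes the program is invoked as httrack, that -O sets the output folder, that a pattern prefixed with a minus sign excludes matching files, and that -s0 turns off robots.txt rules, which is my reading of HTTrack's command-line documentation. The helper name httrack_command, the folder, and the URL are hypothetical.

```python
def httrack_command(urls, output_dir, skip_patterns=(), ignore_robots=False):
    """Build an argument list for the httrack command-line tool (assumed flags)."""
    cmd = ["httrack", *urls, "-O", output_dir]
    # A filter that starts with "-" excludes matching files, e.g. "-*.zip".
    cmd += [f"-{pattern}" for pattern in skip_patterns]
    if ignore_robots:
        # Assumed counterpart of the "no robots.txt rules" Spider setting.
        cmd.append("-s0")
    return cmd


# Hypothetical project: one site, one output folder, zip files skipped.
print(httrack_command(
    ["https://www.example.com/"],
    "C:/My Web Sites/example",
    skip_patterns=["*.zip"],
    ignore_robots=True,
))
```

The list could be handed to subprocess.run once you have checked the flags against the documentation for your installed version.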
Click "Finish" and the copying will begin. The program will start on the index page and then follow all of the static links until it has identified every page it can find under the main URL. Then it will download them to the folder you specified. Any links or image paths will be rewritten by HTTrack to account for the fact that the files are now located on your computer. Unless you specified otherwise, all PDFs or other documents hosted by the site will also be downloaded. Note: Server-side applications (such as Java programs and databases) and non-static links will not be downloaded. Depending on how big the site is, the download process could take a long time.
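The "start on the index page and follow the links" idea can be sketched in a few lines of Python. This is a simplified illustration, not HTTrack's actual algorithm: it walks a hypothetical three-page site held in memory (the PAGES dictionary) instead of downloading anything, and the names LinkCollector and discover are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def discover(start_url, fetch):
    """Breadth-first walk of the pages under start_url reachable by static links.

    fetch(url) must return the page's HTML as a string, or None if unavailable.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, link))
            # Stay under the main URL and never queue a page twice.
            if absolute.startswith(start_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Hypothetical in-memory site standing in for real downloads.
PAGES = {
    "https://www.example.com/":
        '<a href="about.html">About</a> <a href="https://other.example.org/">Elsewhere</a>',
    "https://www.example.com/about.html":
        '<a href="/">Home</a> <a href="team.html#top">Team</a>',
    "https://www.example.com/team.html": "<p>No links here.</p>",
}

print(discover("https://www.example.com/", PAGES.get))
```

The link to other.example.org is skipped because it falls outside the main URL, and the link back to the home page is ignored because that page has already been seen.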
Once the download is finished, HTTrack will notify you by making a somewhat disturbing "ahhhh" sound, and you will be presented with a "Site mirroring finished!" window. To view your newly copied site, click the button that says "Browse Mirrored Website." You can also open it by browsing the folder where you stored the site and double-clicking "index.html." You'll notice that the path to the page points to your hard drive, but the page looks exactly like the website, and you can browse from page to page just as if you were online.
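If you want to open the copy from a script rather than by double-clicking, the sketch below builds the file:// address of index.html inside a project folder. The function name local_index_uri is hypothetical, and the demonstration uses a temporary folder with a placeholder index.html rather than a real HTTrack project.

```python
import tempfile
from pathlib import Path


def local_index_uri(project_dir: str) -> str:
    """Return the file:// address of index.html inside project_dir."""
    index = Path(project_dir).resolve() / "index.html"
    if not index.is_file():
        raise FileNotFoundError(index)
    return index.as_uri()


# Demonstration with a throwaway folder standing in for the project folder.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "index.html").write_text("<h1>Mirror</h1>", encoding="utf-8")
    print(local_index_uri(folder))
```

Passing the returned address to webbrowser.open from the standard library should open the page in your default browser, with the address bar showing a path on your hard drive.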