Creating a static copy of a Drupal, WordPress or other CMS website
wget -P . -mpck --html-extension -e robots=off --wait 0.5 <URL>
To understand the flags you can of course check man wget, but some explanations follow here:
- -P - Tell where to store the site
- -m - Create a mirror
- -p - Download all the required files (.css, .js) needed to properly render the page
- -c - Continue getting partially downloaded files
- -k - Convert links to enable local viewing
- --html-extension - Add the .html extension after file names. This is important because, when serving the plain files, a web server such as Nginx needs the .html extension to know that the files should be sent to the user’s browser as pages, not offered as files to download. (Newer wget versions call this option --adjust-extension; the old name still works but is deprecated.) See below for how to redirect from old to new links.
- -e robots=off - Don’t obey the robots.txt file. The -e flag runs a .wgetrc-style command, here switching off wget’s robots.txt handling. I got a lot of errors if not including it.
- --wait 0.5 - Wait half a second between downloads, so that you don’t overwhelm the web server where your site is hosted.
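To make the flags easier to read back later, the same command can be spelled out with long option names. The following is only a sketch: the function name mirror_site, its arguments, and the WGET override variable are my own illustrative names, not part of wget, and --adjust-extension is used in place of the older --html-extension spelling.

```shell
# mirror_site URL [DEST_DIR]
# Long-option form of: wget -P . -mpck --html-extension -e robots=off --wait 0.5 <URL>
# WGET is a hypothetical override hook (for example WGET=echo for a dry run);
# by default the real wget binary is used.
mirror_site() {
    url="$1"
    dest="${2:-.}"   # where to store the site, defaults to the current directory
    "${WGET:-wget}" \
        --directory-prefix="$dest" \
        --mirror \
        --page-requisites \
        --continue \
        --convert-links \
        --adjust-extension \
        --execute robots=off \
        --wait=0.5 \
        "$url"
}
```

Calling `WGET=echo mirror_site https://example.com/ ./out` (with example.com standing in for your own site) only prints the options that would be passed, which is a cheap way to check the command before starting a long download.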
Once the command finishes, you will have a folder of static HTML files and other assets that you can just upload to your web server in place of your CMS.
Finally, you might want to add this rule to the Nginx config, to make sure the old non-.html URLs are redirected to the .html variant:
location / {
    if ($request_filename !~* (/|(.+)\.(html|css|js|gif|png|jpg))$ ) {
        rewrite ^(.+)$ $1.html permanent;
    }
}
Add these lines in the appropriate server config in the relevant file, such as /etc/nginx/sites-enabled/default.
What the rule does is this: for every URL that does not end in a slash (such as the home page, /) and is not a static file with one of the common file extensions, it redirects to the same URL with ‘.html’ appended at the end.
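To get a feel for which URLs the rule touches, here is a rough shell emulation of its decision. It is a sketch under assumptions: redirect_target is a made-up helper name, it uses grep -E rather than Nginx's own regex engine, and it looks at the URL path only, whereas Nginx tests $request_filename (the file system path, which ends the same way).

```shell
# redirect_target PATH
# Print the path the rule would redirect to, or the path unchanged
# when the rule leaves it alone. The pattern mirrors the one in the
# Nginx "if" (case-insensitive, literal dot before the extension).
redirect_target() {
    if printf '%s\n' "$1" | grep -Eiq '(/|.+\.(html|css|js|gif|png|jpg))$'; then
        # Ends in a slash or a listed extension: served as-is.
        printf '%s\n' "$1"
    else
        # Anything else gets .html appended, as the rewrite does.
        printf '%s.html\n' "$1"
    fi
}
```

With some hypothetical paths: `redirect_target /blog/my-post` prints /blog/my-post.html, while `redirect_target /style.css` and `redirect_target /` print their input unchanged. Note that `redirect_target /feed.xml` prints /feed.xml.html, so if your site has files with extensions outside the list, you may want to add those extensions to the rule.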
That’s it! Visit my now archived old blog at saml.rilspace.com for an example if you wish!
Samuel