
Mirroring or Archiving an Entire Website


I have several old WordPress sites lying around that I would like to archive but not maintain anymore. Since I don't intend to create any more content on these sites, we can use tools like wget to scrape an existing site and produce a somewhat read-only copy of it. I say read-only not because we can't edit it, but because it's no longer in the original source format of the website.

Several people have tackled this problem before, and after consulting their write-ups I came to the following command:

# --mirror: recursive download with time-stamping and infinite recursion depth
# --convert-links: rewrite links in the downloaded pages so they work locally
# --adjust-extension: save pages with the proper extension (e.g. .html)
# --page-requisites: also grab the images, CSS, and other files needed to render each page
# --no-verbose: keep the output short
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --no-verbose \
     https://url/of/web/site

There were other solutions in that Stack Overflow post, but something about the simplicity of wget appealed to me.
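If you want to preview the archived copy locally before putting it behind a real web server, any static file server will do. Assuming Python 3 is installed, its built-in server is enough; wget writes the mirror into a directory named after the site's hostname:

cd url/                        # directory wget created, named after the hostname
python3 -m http.server 8000    # browse the archived copy at http://localhost:8000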

Here's an example site I archived with this.
