Conversation

Notices

  1. Judeau (EatTheRich) (judeau@mas.to)'s status on Friday, 07-Nov-2025 03:45:52 JST

    There is an old website that I really want to preserve and archive a section of. Basically like how the Wayback Machine does.

    I would really like to be able to essentially browse the website offline on my computer.

    Also, if it does not increase complexity, I would like to retain all of the downloadable links and files as well.

    Is there a program or a simple way to go about this? Any help would be appreciated.

    #AskFedi #AskMastodon #archive


    Attachments

    1. sapusmidjan.is (Home)
    • Peter Krefting (nafmo@social.vivaldi.net)'s status on Friday, 07-Nov-2025 03:45:52 JST

      @Judeau If it is a simple static site, GNU Wget has a recursive download mode which can also convert links for offline browsing.
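
      A minimal sketch of such a command, assuming the section lives somewhere like https://example.com/oldsite/ (a placeholder URL, not from this thread):

          # Recursively mirror one section of a static site for offline browsing
          wget --recursive --no-parent \
               --convert-links --page-requisites --adjust-extension \
               https://example.com/oldsite/

      Here --no-parent keeps the crawl inside the starting directory, --convert-links rewrites links so they work locally, --page-requisites also fetches the images, CSS, and scripts each page needs, and --adjust-extension saves pages with an .html suffix so they open cleanly offline.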

    • Peter Krefting (nafmo@social.vivaldi.net)'s status on Friday, 07-Nov-2025 21:22:57 JST, in reply to Nazo

      @nazokiyoubinbou @Judeau There is a --span-hosts option that seems to be there to download things from other servers as well; I haven't used Wget in this mode in several years, so I don't know how well it works, though.
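
      As a sketch, --span-hosts is usually paired with --domains so the crawl only follows links to hosts you list instead of wandering across the web (both domain names below are placeholders):

          # Allow fetching from other hosts, but only the listed domains
          wget --recursive --no-parent --convert-links --page-requisites \
               --span-hosts --domains=example.com,cdn.example.net \
               https://example.com/oldsite/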

    • Nazo (nazokiyoubinbou@urusai.social)'s status on Friday, 07-Nov-2025 21:22:58 JST, in reply to Peter Krefting

      @Judeau @nafmo Will this grab external resources though?

      A lot of sites may outright block recursive activity (determining it, falsely or not depending on how you look at it, to be bot activity). You'll want to, at the very least, add --random-wait on the command line so it looks less obviously bot-like (and hits the server less hard anyway).
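
      For example, the politeness-related flags can be combined like this (the URL and the delay and rate values are placeholders; pick whatever the server tolerates):

          # Pause between requests, randomize the delay, and cap bandwidth
          wget --recursive --no-parent --convert-links --page-requisites \
               --wait=2 --random-wait --limit-rate=500k \
               https://example.com/oldsite/

      --random-wait varies the actual delay around the base --wait value, so requests arrive at less regular intervals.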

      Are there a lot of pages? One thing I'm enjoying is the "SingleFile HTML" plugin to save a page into a single .html file instead of a file + directory with broken resources. However, this would give you individual pages, not the equivalent of a functioning site.

      I think there are actually tools specifically designed for archiving sites though. Stuff more directly designed to handle external resources and all. Sadly I can't remember them...

    • Judeau (EatTheRich) (judeau@mas.to)'s status on Friday, 07-Nov-2025 21:23:11 JST, in reply to Peter Krefting

      @nafmo Thanks for the tip. I have only messed around with Wget a few times, and that was many years ago.

      I used it for grabbing files off an FTP-like site. I imagine a website might be a bit more complicated for me, but I will definitely take a look at it.

      Thanks again!

