HackaDump

Description

This great website invites us to add more and more content, and our precious time gets invested in a "write-only" system. Without a backup and restore function, our projects can only live on their servers and... that's all.

Because it is always too late to back things up, I have written a script to easily dump my work on my own (Linux) computer. It started humbly but it's slowly getting more mature, with each tiny adaptation and fix.

Yet it's just a hack: it lacks features, it doesn't save every detail and it doesn't even use the API (https://dev.hackaday.io). By using this code, you confirm that you understand these limitations and the inherent risks, and you take responsibility for all the consequences. Oh, and fluency in bash scripting is necessary.

Be careful and mindful, kids!

Details

As the #Discrete YASEP grows, logs accumulate: more than 40^W50^W60 logs contain a lot of reflections and experiments, with illustrations of all kinds. With so much work depending on the goodwill of others, it would be very sad if "something happened", right?

"Better safe than sorry", so I asked for a backup function on the #Feedback - Hackaday.io channel (we'll see about the restore function later), but I couldn't wait.

In fact, it's not difficult at all to dump the essential parts of the project, but the script is crude and not optimised for speed; it will break if the website changes its underlying layout; links are saved but not easy to restore; and the styles/presentation are not preserved.

At least I don't have to type/remember everything "if something happens".

The current script has a big shortcoming: it relies on a carefully edited list of log links on the "details" page. Maybe I'll update this one day but so far "it works".

Done: save the personal "pages" (before, only the last 3 were saved, from the user's main page).

It's a bit slow but as is, there is no risk of flooding the server. And speed is not important if it's run every few weeks...

Feedback welcome, until HaD provides us with a clean, official way to backup (and restore) :-)

PS: the project's logo is a screenshot of the source code of the HaD pages, for those who have never looked at that code ;-)

Logs:
1. better, faster, fatter
2. Files are now supported
3. Some updates and enhancements
4. Formatting guidelines
5. Some more script fun
6. 404
7. Broken script

Files

backup_pages_contrib.sh

(20170122) Like 20160925, and also saves the projects contributed to

x-shellscript - 7.42 kB - 01/23/2017 at 00:09

backup_pages.sh

Latest version (20160925), cleaner, adapted to many changes of the page layout

application/x-shellscript - 5.81 kB - 09/25/2016 at 22:56

backup_serial.sh

(*OBSOLETE*) a safer, serialising version, to make errors easier to spot.

x-shellscript - 3.33 kB - 05/10/2016 at 04:15

backup_profile.sh

(*OBSOLETE*) The script (version 20160404)

x-shellscript - 3.27 kB - 04/04/2016 at 06:37

Components

1 × bash

1 × wget

1 × sed

1 × grep

1 × Brain (usually located between your ears)

Project Logs

Broken script
Yann Guidon / YGDES • 09/25/2016 at 15:02 • 0 comments
The layout of the site has changed and I hadn't noticed that my regular backups were not working well.
Until today.
I'm updating the pattern matching, please stay tuned...
pages: check. Not sure how to deal with more pages, though, but the limit is not reached yet.
projects: a few things have been broken, let's investigate step by step.
- main project page: OK (though could still get some more cleanup)
- components: OK
- instructions: OK
- images/gallery: OK
TODO:
- ensmarten wget (inside a function/procedure) so it detects failures and errors that the server does not report as 404!
- clean up the HTML files to remove most of the HaD formatting and boilerplate. AT LEAST remove that huge HaD logo in ASCII art, which compresses easily but takes a lot of room anyway... (done with a grep one-liner)
- Test if a file has already been downloaded and remove previous identical versions from past backups... (meaningful for the big files!)
Mon. Sept. 26 00:50:45 CEST 2016: New version online! It took only 8h to polish it but it's well worth it. UPDATE YOUR SCRIPTS!
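The last TODO item (not keeping duplicates of files already saved) could be approximated with cmp. This is a sketch under assumptions: the backups sit side by side in the YYYYMMDD directories created by the script, and they are on the same filesystem. Instead of deleting the older identical copy, as the TODO suggests, this variation replaces the new copy with a hard link, so every dated directory stays complete while identical big files are stored only once. The function name dedup_against_previous is hypothetical.

```shell
#!/bin/bash
# Sketch: hard-link files of the new backup that are byte-identical to the
# file at the same relative path in the previous backup.
# Assumptions: both trees are on the same filesystem (hard links cannot
# cross filesystems) and use the same layout (YYYYMMDD/project/...).

dedup_against_previous() {
  local prev=$1 new=$2 f rel
  # find -print0 + read -d '' keeps file names with spaces intact
  while IFS= read -r -d '' f; do
    rel=${f#"$new"/}                       # path relative to the new backup
    [[ -f "$prev/$rel" ]] || continue      # no older copy: keep as is
    [[ "$prev/$rel" -ef "$f" ]] && continue  # already linked (re-run)
    if cmp -s "$prev/$rel" "$f"; then
      ln -f "$prev/$rel" "$f"              # replace new copy with a link
    fi
  done < <(find "$new" -type f -print0)
}
```

For example, run `dedup_against_previous 20160925 20161023` from the backup root after a new dump.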
404
Yann Guidon / YGDES • 05/10/2016 at 04:14 • 2 comments

During my last run, I briefly saw 404 errors but couldn't make sense of them because the script output was scrambled between different commands.
These last days/weeks, I've noticed more transient errors on hackaday.io and I have to find a way to wait and retry if the page fails to load the first time...
Until then, I made a different version with all the parallelism removed, whose output is also saved to a log file for easy grepping. The new file backup_serial.sh is slower but apparently safer.
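Serialising is the simplest cure for scrambled output. Another option, sketched here as an alternative and not what backup_serial.sh does, keeps the downloads parallel but sends each job's stdout and stderr to its own log file, then lists the logs that mention an error. fetch_one is a hypothetical stand-in for the script's fetchproject(), and the "error|404" pattern is only a rough guess at what is worth grepping for.

```shell
#!/bin/bash
# Sketch: one log file per parallel job, so messages never interleave.
# fetch_one is a hypothetical placeholder for the real per-project work.

fetch_one() {
  echo "fetching project $1"
  wget -nv -O "main_$1.html" "https://hackaday.io/project/$1"
}

run_logged() {
  local id
  for id in "$@"; do
    # stdout and stderr of each job go to its own file
    fetch_one "$id" > "fetch_$id.log" 2>&1 &
  done
  wait                                    # let every job finish
  grep -l -i -E 'error|404' fetch_*.log   # print names of suspicious logs
}
```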
Actually, 404 errors are becoming endemic. One script run can get a few or more and there is no provision yet to retry... I have to code this because several independent runs are required to get a good sampling of the data.
Some wget magic should be done ...
New twist !
No 404 error this time. The page might load, but its contents will be "something is wrong. please reload the page." I should have made a screenshot and saved the page to extract its textual signature...
I must find a way to restart the download when this error occurs too.
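The "smarter wget" from the TODO list and the retry wished for here could look like the following sketch. It treats a download as failed when wget exits with an error (wget returns a non-zero status for network failures and for server error answers such as 404) or when the saved page contains the transient error text. Assumptions: the signature "something is wrong" is taken from the message quoted above and may not match the real page exactly, the default of 3 tries with a 10 s pause is arbitrary, and wget_retry is a hypothetical name.

```shell
#!/bin/bash
# Sketch: retry a page until wget succeeds AND the page is not the
# site's "something is wrong" error page.
# Usage: wget_retry OUTPUT_FILE URL [TRIES] [PAUSE_SECONDS]

wget_retry() {
  local out=$1 url=$2 tries=${3:-3} pause=${4:-10} i
  for (( i = 1; i <= tries; i++ )); do
    # non-zero exit covers network errors and 4xx/5xx answers;
    # the grep catches error pages served with a normal status
    if wget -q -O "$out" "$url" &&
       ! grep -qi 'something is wrong' "$out"; then
      return 0
    fi
    echo "attempt $i/$tries failed for $url" >&2
    (( i < tries )) && sleep "$pause"
  done
  return 1
}
```

In the script, a call such as `wget_retry main.html "https://hackaday.io/project/$1" || echo "$1: main page failed" >> errors.log` would replace a plain `wget -O`, leaving a list of pages to check by hand.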
Some more script fun
Yann Guidon / YGDES • 04/13/2016 at 12:51 • 9 comments

I just hacked this. Shame on me !
Let's say, it might be useful to those who test bash on W10...
Formatting guidelines
Yann Guidon / YGDES • 04/04/2016 at 05:08 • 0 comments

I'm lazy.
I'm too lazy to implement a proper scraper for log pages, even though making that effort would save me effort later. I have even started to implement a suitable feature for the projects list pages. But the "quick and dirty solution" so far is to list all the project logs by hand, in the "details" page. After all, there are other advantages, including easier navigation.
The script uses grep and sed to recognise a specific pattern that indicates the start of the list. First, note that the elements are separated by a line break, "<br>" in HTML, so you have to hit "shift+enter" instead of only "enter" (which generates a paragraph "<p>").
The list starts with a bold keyword, recognised in HTML by "<strong>Logs:</strong>" (click on the bold B in the editing menu).
Then the rest of the page should be the list of links. Each line (remember: shift+enter) starts with a number (the ordering is not checked) followed by a dot and a space, then a link ("<a ") and a line break. Yeah, these are absolute links, so be careful...
Overall, the script detects this:
Logs:
42. some link
43. another link

There are some other minor gotchas so don't hesitate to look at the scraped and sed'ed files named logs.url if something is weird.
I told you it was dirty...
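To check a details page against these rules without touching the server, the extraction steps of the script can be replayed offline. This is a minimal sketch: the HTML fragment below is a hand-made assumption of what the details page contains (real pages carry much more markup), extract_log_urls is a hypothetical name, and the `\n` in the sed replacement requires GNU sed, as found on Linux.

```shell
#!/bin/bash
# Replays the script's "Logs:" pipeline on stdin and prints the log URLs.

extract_log_urls() {
  grep 'https://hackaday.io/project/.*/log/' |
  sed -e 's/.*<strong>Logs:<\/strong>//' \
      -e 's/<br>/\n/g' \
      -e 's/<p>/\n/g' |
  grep '^[0-9]*[.] <a ' |             # keep "NN. <a ..." lines only
  sed -e 's/.*href="//' -e 's/".*//'  # keep the URL inside href="..."
}

# Hypothetical details-page fragment following the formatting guidelines
sample='<p>Intro text<strong>Logs:</strong><br>42. <a href="https://hackaday.io/project/1/log/10-some-link">some link</a><br>43. <a href="https://hackaday.io/project/1/log/11-another-link">another link</a></p>'

printf '%s\n' "$sample" | extract_log_urls
# prints:
# https://hackaday.io/project/1/log/10-some-link
# https://hackaday.io/project/1/log/11-another-link
```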
Some updates and enhancements
Yann Guidon / YGDES • 03/28/2016 at 19:53 • 0 comments
Time for an update !
- Fixed a parsing issue (the pages have changed a tag from <h2> to <h1>)
- Support more than one projects page (I was wondering why all my projects didn't get saved.... Now I look at the "next" link to build the list of projects)
- Kinder to the server, to avoid triggering DOS/flood protection from the image server. It's slower but it's not critical...
My backups now take several minutes and around 17MB.
It could be faster because a lot of log pages return "301 Moved Permanently", this should be fixed with a better parsing and directly reading the logs pages (those that are in chunks of 10 logs).
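Being kinder to the server can also be expressed directly with wget options and a cap on parallel jobs. This sketch is an assumption, not the settings of the published script: MAX_JOBS=3 and the pause/rate values are guesses, polite_wget and run_limited are hypothetical names, and `wait -n` needs bash 4.3 or later.

```shell
#!/bin/bash
# Sketch: fewer simultaneous requests and spaced-out downloads.

MAX_JOBS=3   # arbitrary guess, tune to taste

polite_wget() {
  # --wait: pause (seconds) between retrievals of one wget run (e.g. -i lists)
  # --random-wait: vary that pause between 0.5x and 1.5x
  # --limit-rate: cap the download bandwidth
  wget --wait=2 --random-wait --limit-rate=200k "$@"
}

run_limited() {
  # While MAX_JOBS background jobs are running, wait for one to finish
  while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
    wait -n
  done
  "$@" &   # then start the command in the background
}
```

In the script's main loop, `( fetchproject $PRJNR ) &` would become `run_limited fetchproject "$PRJNR"`, with a `wait` after the loop so the script does not exit before the last jobs finish.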
Files are now supported
Yann Guidon / YGDES • 01/10/2016 at 12:51 • 0 comments

Hello HaD crowd !
The admins have now provided us with a 1GB storage area with a nice listing page, similar to the other resources. I have updated the script to fetch everything AND I've put the new script in the download area.
Fun fact: the next time I back up my projects, the script will download itself, if all goes well ;-)

better, faster, fatter

Yann Guidon / YGDES • 12/15/2015 at 01:24 • 0 comments

Today I have 19 projects on hackaday (even after I asked al1 to take ownership of #PICTIL) and I need to automate more !

So I added more features and parallelised the script a bit, with more pages scraped and more conditional execution to adapt to each project (some have building instructions, others have logs, some have nothing...)

So here is the new version in its whole ugliness ! (remember kids, don't do this at home, yada yada)

#!/bin/bash

MYHACKERNUMBER=4012 # Change it !

# Save one project; $1 is the "number-name" taken from the projects list
fetchproject() {
  mkdir -p "$1"
  pushd "$1" > /dev/null

    # Get the main page:
    wget -O main.html "https://hackaday.io/project/$1"

    # Only some projects have building instructions:
    grep -q '<div class="section section-instructions">' main.html &&
      wget -O instructions.html "https://hackaday.io/project/$1/instructions/" &

    # Get the images from the gallery
    wget -O gallery.html "https://hackaday.io/project/$1/gallery"
    grep 'very-small-button">View Full Size</a>' gallery.html |\
    sed -e 's/.*href="//' \
        -e 's/".*//' |\
    tee images.url
    # -s: only if at least one image URL was found
    [[ -s images.url ]] && (
      mkdir -p images
      cd images
      wget -i ../images.url
    ) &

    # Get the general description of the project
    detail=$(grep 'show">See all details</a' main.html | sed 's/.*href="/https:\/\/hackaday.io/; s/".*//')

    if [[ "$detail" ]]; then
      echo "getting $detail"
      wget -O detail.html "$detail"

      # List the logs (see the "Formatting guidelines" log);
      # the \n in the replacements needs GNU sed:
      grep 'https://hackaday.io/project/.*/log/' detail.html |\
      sed -e 's/.*<strong>Logs:<\/strong>//' \
          -e 's/<br>/\n/g' \
          -e 's/<p>/\n/g' |\
      grep '^[0-9]*[.] <a ' |\
      tee index.txt

      sed -e 's/.*href="//' -e 's/".*//' index.txt |\
      tee logs.url

      if [[ -s logs.url ]]; then
        mkdir -p logs
        ( cd logs && wget -i ../logs.url ) &
      fi
    fi

    # Wait for this project's background downloads before leaving
    wait
  popd > /dev/null
}

######### Start here #########

DATECODE=$(date '+%Y%m%d')
mkdir -p "$DATECODE"
pushd "$DATECODE" > /dev/null

  wget -O profile.html "https://hackaday.io/hacker/$MYHACKERNUMBER"

  # List all the projects:
  wget -O projects.html "https://hackaday.io/projects/hacker/$MYHACKERNUMBER"
  # Stop before the contributions:
  sed '/contributes to<\/h2>/ q' projects.html |\
  grep 'class="item-link">' |\
  sed -e 's/.*href="\/project\///' -e 's/".*//' |\
  tee projects.names

  ProjectList=$( < projects.names )
  if [[ "$ProjectList" ]]; then
    for PRJNR in $ProjectList
    do
      ( fetchproject "$PRJNR" ) &
    done
    # Do not exit before every project has been saved
    wait
  else
    echo "No project found."
  fi
popd > /dev/null

I still have to make a better system to save the logs, I have an idea but...

PS: it's another quick and dirty hack; so far I'm too lazy to look deeply into the API. It's also a question of language, since bash is not ... well suited. Sue me.

OTOH the above script works and does not require you to get an API key.

Discussions

Yann Guidon / YGDES wrote 01/19/2022 at 05:07

https://dev.hackaday.io/doc/api exists.

my motivation though is not sufficient...

Yann Guidon / YGDES wrote 07/23/2017 at 19:08

The new page layout breaks my script :-(

Yann Guidon / YGDES wrote 07/23/2017 at 19:23

Apparently, WGET_HTML() removes WAY TOO MUCH from the page, the <body> has disappeared...

RoGeorge wrote 07/24/2017 at 03:40

Welcome to the club!

My scripts for gathering statistics were broken too.

Yann Guidon / YGDES wrote 11/11/2016 at 05:09

Well well well, does the script save the page banner picture ?....

Dave Gönner wrote 05/21/2016 at 09:49

Thanks for the script mate, works like a charm!

Yann Guidon / YGDES wrote 05/21/2016 at 11:19

Wonderful :-)

Eric Hertz wrote 04/21/2016 at 08:21

My first run was a couple days ago, looks pretty useful and grabbed the vast-majority of my stuff with only a single line change :)

There were some error messages, but they didn't interfere with the process; I'll let you know if they're anything important.

Thanks for sharing this, yo!

Yann Guidon / YGDES wrote 04/21/2016 at 08:26

Wow, I'm surprised someone actually ran it at home :-D

Depending on your projects, the script requires some tuning: you have more projects than me, so you should be even more careful not to hammer the server ;-)

Yann Guidon / YGDES wrote 04/21/2016 at 08:35

BTW, did you recover/save all the project logs? I use the Formatting guidelines https://hackaday.io/project/8536/log/35174-formatting-guidelines in the details page of each project, otherwise it won't save everything... You'll have to try and see; check the intermediate files.

Eric Hertz wrote 04/21/2016 at 09:46

Ah hah! Thanks for that heads-up, I just ran it as-is as a quick/emergency backup in case my #"From Nerd to Criminal in Seven Easy Years" turned out more-negative than it did ;)

But, I do want to run a complete backup more regularly, so I'll probably be doing some looking into that in the near future. Why, if you don't mind my asking, don't you just have it download directly from the logs page?

Yann Guidon / YGDES wrote 04/21/2016 at 10:01

Well there is a question of ... programming convenience.

One day I'll figure out the proper algo to harvest the logs correctly. For now I'm lazy but I think I have a hint with the algo that checks the next projects page...

For now I'm routing a PCB :-D

Ivan Lazarevic wrote 11/23/2015 at 19:17

Good idea. Maybe you should try to use the API for this: https://dev.hackaday.io/

Yann Guidon / YGDES wrote 11/23/2015 at 20:29

Damn ! why do I find this AFTER I spent time doing this ?

Yann Guidon / YGDES wrote 11/24/2015 at 11:46

I just went to https://dev.hackaday.io/applications and created a key for hackadump.
Maybe I'll figure out how to use the keys in my scripts ? :-)

jaromir.sukuba wrote 11/23/2015 at 16:55

Oh, here we go. I have to try it out.

Yann Guidon / YGDES wrote 11/23/2015 at 20:27

use wisely :-)
