
3D Web Scraping

A project log for Metaverse Lab

Experiments with Decentralized VR/AR Infrastructure, Neural Networks, and 3D Internet.

alusion 10/14/2016 at 05:35

FireBoxRoom Scraper

These scripts are meant to make life easier for Metaverse explorers and developers while helping to decentralize the Metaverse by archiving assets to the InterPlanetary File System (IPFS). While exploring the immersive web in JanusVR, pressing Ctrl+S copies the source code of the site you are currently on to the clipboard and also downloads the html/json file to your workspace folder.

A one-liner to brute-force download every asset referenced by an absolute path in a FireBoxRoom. It requires 'wget' to be installed; otherwise, chop off the '&& wget -i assets.txt' part and let assets.txt serve as a list of the absolute links found in the file.

cat index.html | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq > assets.txt && wget -i assets.txt

I wrote a Python 3 script that indexes and counts the various assets in a given FireBoxRoom more politely, with the option to download them individually or all at once: https://gitlab.com/alusion/fbparser
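To give a rough idea of what "index and count" can look like, here is a minimal sketch. It is not the code from fbparser; it reuses the same URL character class as the grep one-liner, and the function names index_assets and count_by_extension are made up for illustration.

```python
import posixpath
import re
from collections import Counter
from urllib.parse import urlparse

# Same character class as the grep one-liner: absolute http(s) links only.
URL_PATTERN = re.compile(r"https?://[a-zA-Z0-9./?=_-]*")


def index_assets(source):
    """Return the sorted, de-duplicated absolute URLs found in the room source."""
    return sorted(set(URL_PATTERN.findall(source)))


def count_by_extension(urls):
    """Count URLs by lowercased file extension (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        # Take the extension from the URL path only, ignoring any query string.
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        counts[ext or "(none)"] += 1
    return counts
```

Reading index.html into a string and passing it to index_assets yields the same kind of list the one-liner writes to assets.txt, and count_by_extension summarizes it before anything is downloaded.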

I plan to update this script to accept a URL argument so it can easily scrape relative paths and archive VR websites with IPFS.
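Resolving relative paths against the page URL could be sketched roughly like this. This is only an illustration of the idea, not the planned implementation: it assumes assets are referenced through src="..." attributes, and resolve_assets and base_url are hypothetical names.

```python
import re
from urllib.parse import urljoin

# Assumption: assets are referenced through src="..." attributes.
SRC_PATTERN = re.compile(r'src="([^"]+)"')


def resolve_assets(source, base_url):
    """Turn every src value into an absolute URL relative to base_url."""
    # urljoin leaves already-absolute URLs untouched and resolves the rest.
    return sorted({urljoin(base_url, src) for src in SRC_PATTERN.findall(source)})
```

The resulting list could be written to assets.txt and handed to wget exactly like the output of the one-liner.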
