• Getting some data

    johnowhitaker, 04/25/2016 at 17:58

    I'm a little past the deadline, so I may net some extra entries that weren't counted in the first round, but who cares. Here goes:

    I couldn't find an easy way to access a list of entries using the API, so I went to https://hackaday.io/submissions/prize2016/list and downloaded the page source. Some regexp magic, and I have a list of unique project numbers for which I can download data:

    import urllib, json, time, re
    # Match strings of the form 'project/XXXX-name...'
    p = re.compile(r'project/\d+-')
    with open("source.html", "r") as f:  # the saved page source
        matches = p.findall(f.readlines()[2])  # the third line is the relevant bit
    # set() removes duplicates; s[8:-1] strips 'project/' and the trailing '-'
    project_numbers = [int(s[8:-1]) for s in set(matches)]
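
    To make the slicing a bit more concrete, here is a small self-contained sketch (Python 3) that runs the same pattern over a made-up snippet instead of the real page source. The HTML in `sample` is purely hypothetical; only the 'project/NUMBER-' shape comes from the page described above.

```python
import re

# Hypothetical snippet standing in for one line of the saved page source.
sample = (
    '<a href="/project/1234-some-robot">Some Robot</a>'
    '<a href="/project/987-led-thing">LED Thing</a>'
    '<a href="/project/1234-some-robot">Some Robot (again)</a>'
)

# Same idea as the pattern above: 'project/', one or more digits, then '-'.
pattern = re.compile(r'project/\d+-')
matches = pattern.findall(sample)

# 'project/' is 8 characters long and the trailing '-' is 1,
# so s[8:-1] leaves just the digits.
project_numbers = sorted({int(s[8:-1]) for s in matches})
print(project_numbers)  # [987, 1234]
```

    The set comprehension does the same de-duplication as list(set(...)); sorting is optional and only makes the output predictable.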

    Now I can use the Hackaday API to get a JSON object describing each project:

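    url = "https://api.hackaday.io/v1/projects/%s?api_key=MY_KEY"
    data = []
    for ID in project_numbers:
        response = urllib.urlopen(url % ID)  # Python 2; on Python 3 this is urllib.request.urlopen
        data.append(json.loads(response.read()))
        time.sleep(1)  # pause between requests
        print(ID)  # progress indicator
    import numpy
    # Cache the results so we don't have to download everything each time we start.
    # Reload with numpy.load (recent NumPy versions need allow_pickle=True for this).
    with open("projects.txt", 'wb') as out:  # binary mode, since numpy.save writes bytes
        numpy.save(out, data)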
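
    For anyone on Python 3, the download loop could look roughly like the sketch below. This is not the code I ran. The helper names (`project_url`, `fetch_json`, `fetch_projects`) and the `fetch`/`delay` parameters are made up for illustration, and the error handling is an assumption about what might go wrong rather than something I hit. The demo at the bottom uses a fake fetcher, so it runs without a network connection or a real API key.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://api.hackaday.io/v1/projects"  # same endpoint as in the loop above


def project_url(project_id, api_key):
    """Build the request URL; the '?' separates the path from the query string."""
    return "%s/%d?%s" % (API_ROOT, project_id, urlencode({"api_key": api_key}))


def fetch_json(url):
    """Download one URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_projects(ids, api_key, fetch=fetch_json, delay=1.0):
    """Fetch every project in ids, skipping any that fail to download or parse."""
    data = []
    for project_id in ids:
        try:
            data.append(fetch(project_url(project_id, api_key)))
        except (urllib.error.URLError, ValueError) as err:
            print("skipping %d: %s" % (project_id, err))
        time.sleep(delay)  # pause between requests
    return data


# Offline demo: a fake fetcher that just echoes the URL it was given.
def fake_fetch(url):
    return {"url": url, "views": 0}


demo = fetch_projects([1234, 987], "MY_KEY", fetch=fake_fetch, delay=0)
print(demo[0]["url"])  # https://api.hackaday.io/v1/projects/1234?api_key=MY_KEY
```

    Passing the fetcher in as an argument is only there so the loop can be tried without hitting the API; for a real run, leave `fetch` and `delay` at their defaults.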

    So now I can get a project's view count with data[i][u'views'], and so on.
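
    As a quick illustration of working with that list, here is a minimal sketch that pulls out the view counts and ranks projects by them. The records are made up, and apart from 'views' the field names ('id', 'name') are assumptions about what the JSON contains. Using .get() with a default is just a precaution in case a record lacks a view count.

```python
# Made-up records shaped like the downloaded project objects.
# Only 'views' is taken from the text above; 'id' and 'name' are assumed fields.
data = [
    {"id": 1234, "name": "Some Robot", "views": 250},
    {"id": 987, "name": "LED Thing", "views": 40},
    {"id": 555, "name": "Tiny Synth"},  # hypothetical record with no view count
]

# One view count per project, defaulting to 0 when the field is missing.
views = [project.get("views", 0) for project in data]

# Projects ordered from most to least viewed.
ranked = sorted(data, key=lambda project: project.get("views", 0), reverse=True)

for project in ranked:
    print(project["id"], project.get("views", 0))
```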