Dynamic data in Google Spreadsheet

At the new job, I started piecing together a Google Spreadsheet about our open-source software in various languages, based on frameworks, or example applications. One of the things we wanted to track for each repository was a count of open issues and pull requests we had at GitHub. Turns out Google made this pretty easy, but you have to scrape the repo page first.

I wrote a scraper in Python using Flask, Requests and BeatifulSoup, hosted the project on Heroku, and works a little like this:

# app.py
# listen to a URL like /github.com/GetStream/stream-python
# return a JSON block of both the count of issues and pull_requests
def scraper_nometadata(site, account, repo):
    return scraper(site, account, repo, None)
# listen to a URL like /github.com/GetStream/stream-js/issues or
# /github.com/GetStream/stream-node-orm/pull_requests
def scraper(site, account, repo, metadata):
    url = 'https://{site}/{account}/{repo}'.format(
    counters = []
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    # look for a CSS class called 'counter', it's only used three
    # times on github.com for counting issues, pull requests and
    # projects
    # I didn't test this on bitbucket or other sites yet
    counters = soup.select(".counter")
    counter_position = None
    if metadata:
        if metadata == 'issues':
            counter_position = 0
        elif metadata == 'pull_requests':
            counter_position = 1
        return jsonify(counters[counter_position].contents[0])
    return jsonify({
        'issues': counters[0].contents[0],
        'pull_requests': counters[1].contents[0],

The code took minutes to write. I spent about three times as long figuring out how to launch a Python app on Heroku, so here are some tips:

1. You’ll need to upload a requirements.txt with your project. I recommend including gunicorn.

# requirements.txt

2. To launch on Heroku, you’ll need a file called “Profile” to tell Heroku how to launch your application.

# Procfile
web: gunicorn app:app

3. Before¬†you push code to Heroku, you’ll need to run this at the command line:

# launch a web instance for your app
$ heroku ps:scale web=1

Always check “heroku logs” if something doesn’t work.

Loading the Data Dynamically in a Google Spreadsheet

Now that that’s working, you can add this data to your Google Spreadsheet by using the IMPORTDATA() method. It takes a URL as a parameter, and updates data about once per hour. There are ways to load the data more often, but that’s not something we need to worry about. Your field data will look something like:


Leave a Reply