API Keys, API Keys everywhere

It's common knowledge that developers often leak sensitive information when they publish open source code.

A single mistake can leak enough information for an attacker to infiltrate a company and tear it down from the inside, or to rack up huge bills mining bitcoin on the company's account. The idea that some information must stay confidential is very often sacrificed for convenience.

With a simple script, we can collect a few AWS keys pretty quickly:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import requests
import time
from bs4 import BeautifulSoup

def github_login(username, password):
    s = requests.session()
    token = BeautifulSoup(s.get('https://github.com/login').text) \
        .select('input[name=authenticity_token]')[0]['value']
    data = {
        'commit':'Sign in',
        'utf8':'✓',
        'authenticity_token': token,
        'login': username,
        'password': password
    }
    s.post('https://github.com/session', data=data)
    return s

s = github_login(raw_input('username: '), raw_input('password: '))
search_terms = [x.strip() for x in open('search_terms.txt').readlines()]

for term in search_terms:
    with open('{}.txt'.format(term.replace(' ', '_')), 'w+') as f:
        for p in xrange(1, 101):
            print term
            soup = s.get('https://github.com/search?p={}&q={}&ref=searchresults&type=Code&utf8=✓' \
                .format(p, term)).text
            soup = BeautifulSoup(soup)

            for i in soup.select('.code-list-item'):
                data = i.get_text().encode('utf8')
                f.write(data)
                print data

            time.sleep(10)

This yielded about 60 AWS keys from a single run. I presume repeated runs over a few days or weeks would turn up more.

I used both my personal GitHub account and a throwaway account I created specifically for this test, and with both I received API keys and even usernames and passwords. I didn't test any of them, since that would be mean and probably illegal, but I suspect many of these keys have already been scooped up by some kind of bot, since this idea is very well known.

Young developers, and even those already employed, need to understand the risks of publishing open source code and how their online presence can compromise both their own security and their employer's.

But AWS keys are only one of many kinds of secrets that people leak.

Here's a whole bunch of search terms that returned results for me:

filename:credentials aws_access_key_id
filename:credentials aws_secret_access_key
MONGODB_URI
MAILCHIMP_API
MAILGUN_API
MAILGUN_DOMAIN
SENDGRID_API_KEY
GMAIL_PASSWORD
REDIS_URL
REDISCLOUD_URL
SQLALCHEMY_DATABASE_URI
DATABASE_URL
FACEBOOK_APP_SECRET
GITHUB_KEY
GITHUB_SECRET
API_KEY
STRIPE_CONNECT_CLIENT_ID
STRIPE_CONNECT_SECRET_KEY
RECAPTCHA_PRIVATE_KEY
RECAPTCHA_SITE_KEY
MAP_API_KEY
mongolab.com
mlab.com
mongohq.com
amqp_url
INSTAGRAM_CLIENT_ID
FACEBOOK_ID
FACEBOOK_SECRET
FLICKR_API_KEY
bingAccKey
TMDB_API_KEY
SLACK_TOKEN
SLACK_WEBHOOK
SLACK_WEBHOOK_SECRET
SLACK_INCOMING_WEBHOOK_URL
FIREBASE_URL
FIREBASE_ROOM_KEY
GITHUB_USERNAME
GITHUB_OAUTH_TOKEN
TWILIO_TOKEN
MASHAPE_KEY
DOCKERCLOUD_APIKEY
redmine_token
REDMINE_API_KEY
GOOGLE_API_KEY
GOOGLE_CSE_ID
STATUS_CAKE_API
SPOTIFY_SECRET
KG_API_KEY
JIRA_USERNAME
JIRA_PASSWORD
WEATHER_API_KEY
HEROKU_API_KEY
SECRET_KEY_BASE
digitalocean_client
digitalocean_api

Unfortunately, this problem will likely never go away, since it only takes a single slip-up for confidential information to leak.
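One way to catch a slip-up before it is published is to scan your own files for obvious secrets before committing. The following is a minimal sketch of that idea, not a replacement for dedicated secret scanners. It flags strings shaped like AWS access key IDs (20 characters beginning with `AKIA` or `ASIA`). It also flags a hypothetical handful of the variable names listed above when they are assigned a literal value. The names `find_secrets`, `main`, and `SECRET_NAMES` are my own, chosen for illustration:

```python
import re
from pathlib import Path

# AWS access key IDs are 20 characters long and start with "AKIA"
# (long-term IAM user keys) or "ASIA" (temporary STS credentials).
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Hypothetical, deliberately small subset of the variable names listed above.
SECRET_NAMES = (
    "aws_secret_access_key",
    "SENDGRID_API_KEY",
    "SLACK_TOKEN",
    "STRIPE_CONNECT_SECRET_KEY",
    "HEROKU_API_KEY",
    "GMAIL_PASSWORD",
)

# Flags a secret name assigned a literal value, either quoted
# (KEY = "value", "KEY": "value") or bare to the end of the line
# (KEY=value, as in .env and INI-style credential files).
# Reading the value from the environment, e.g. KEY = os.environ["KEY"],
# does not match.
SECRET_ASSIGNMENT = re.compile(
    r"\b(?:" + "|".join(SECRET_NAMES) + r")\b['\"]?\s*[:=]\s*"
    r"(?:['\"][^'\"\s]{8,}['\"]|[A-Za-z0-9_+/-]{8,}\s*$)",
    re.IGNORECASE,
)


def find_secrets(text):
    """Return (line_number, line) pairs that look like hard-coded secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line) or SECRET_ASSIGNMENT.search(line):
            hits.append((lineno, line.strip()))
    return hits


def main(paths):
    """Scan each file in `paths`; return 1 if anything was flagged, else 0."""
    found = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for lineno, line in find_secrets(text):
            print(f"{path}:{lineno}: possible secret: {line}")
            found = True
    return 1 if found else 0
```

You could wire this into a Git pre-commit hook. Pass it the staged file names (for example from `git diff --cached --name-only --diff-filter=ACM`) and call `sys.exit(main(paths))`. A non-zero exit status aborts the commit. Pattern matching like this will both miss secrets and raise false alarms, so treat it as a safety net, not a guarantee.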

Environment variables are the way forward, of course. But proper open source practices aren't always taught in educational settings, and environment variables are often skipped for small projects. I'm guilty of this; we all are, and it's hard to resist sometimes.
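Here is a minimal sketch of that pattern in Python. It assumes the secret is exported in the shell or set by the deployment platform, so the key itself never appears in the repository. The names `require_env`, `MissingSecretError`, and `MAILGUN_API_KEY` are hypothetical, chosen for illustration:

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_env(name):
    """Return the value of environment variable `name`, or fail loudly.

    Failing at startup is preferable to silently falling back to a
    hard-coded default, which is exactly how keys end up committed.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(
            f"{name} is not set; export it in your shell or deployment config"
        )
    return value


# Usage (MAILGUN_API_KEY is a hypothetical name):
#   $ export MAILGUN_API_KEY=...      # in the shell, never in the repo
#   api_key = require_env("MAILGUN_API_KEY")
```

If you keep variables in a local `.env` file for convenience, add that file to `.gitignore` so it never gets committed along with the code.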

I'm not entirely sure why GitHub doesn't protect against this, since it seems fairly easy to identify these kinds of searches and either lock the user out or withhold the results. Even if the bots changed their queries, GitHub should be able to spot the new searches and adjust its restrictions.
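To show how simple a first line of defense could be, here is a naive sketch of flagging such queries. It is only an assumption about how a filter might work, not a description of anything GitHub actually does. The `looks_like_secret_hunting` function and the `SENSITIVE_TERMS` denylist are hypothetical:

```python
import re

# Hypothetical denylist drawn from the search terms listed above (lowercased).
SENSITIVE_TERMS = {
    "aws_access_key_id",
    "aws_secret_access_key",
    "sendgrid_api_key",
    "slack_token",
    "stripe_connect_secret_key",
    "heroku_api_key",
    "gmail_password",
    "github_oauth_token",
}


def looks_like_secret_hunting(query):
    """Return True if a code-search query targets known secret variable names."""
    # Split on runs of non-word characters so qualifiers such as
    # "filename:credentials" become separate tokens.
    tokens = {t.lower() for t in re.split(r"\W+", query) if t}
    return bool(tokens & SENSITIVE_TERMS)
```

A fixed denylist like this is easy to evade by rephrasing, so in practice it would need constant updating, probably alongside rate limiting of suspicious accounts.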

When I started this, I used a completely unauthenticated script and got the original set of AWS keys pictured. When I revisited the script, I noticed that unauthenticated users could no longer search code. So I modified it to log in with a throwaway account, and I still received plenty of keys.

In my opinion, GitHub has the opportunity to prevent this abuse but doesn't seem to be doing much about it. I could be mistaken, and my text files could contain nothing but bogus API keys, but that doesn't seem likely...