API Keys, API Keys everywhere

It's common knowledge that developers often leak sensitive information when they publish open source code.

A single mistake can leak enough information for an attacker to infiltrate a company and tear it down from the inside, or to rack up huge bills mining bitcoin. The idea that secrets need to stay confidential is very often lost on people in the name of convenience.

With a simple script we can find a few AWS keys pretty quickly:

#!/usr/bin/env python3

import time

import requests
from bs4 import BeautifulSoup


def github_login(username, password):
    """Log in through GitHub's web form and return the authenticated session."""
    s = requests.Session()
    # The login form embeds a CSRF token that has to be posted back.
    login_page = BeautifulSoup(s.get('https://github.com/login').text, 'html.parser')
    token = login_page.select('input[name=authenticity_token]')[0]['value']
    data = {
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': username,
        'password': password,
    }
    s.post('https://github.com/session', data=data)
    return s


s = github_login(input('username: '), input('password: '))
# One search term per line; blank lines are skipped.
with open('search_terms.txt') as terms_file:
    search_terms = [line.strip() for line in terms_file if line.strip()]

for term in search_terms:
    # Results for each term go into their own text file.
    with open('{}.txt'.format(term.replace(' ', '_')), 'w', encoding='utf8') as f:
        for p in range(1, 101):
            print(term)
            # Passing params lets requests URL-encode the search term.
            params = {'p': p, 'q': term, 'ref': 'searchresults', 'type': 'Code', 'utf8': '✓'}
            page = s.get('https://github.com/search', params=params).text
            soup = BeautifulSoup(page, 'html.parser')

            for i in soup.select('.code-list-item'):
                data = i.get_text()
                f.write(data)
                print(data)

            # Pause between result pages.
            time.sleep(10)

This yielded about 60 AWS keys from one run. I presume that repeated runs over a few days or weeks would turn up more.

Whether they work or not I don't know, but it's definitely not a good sign.
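
The scraped text files can be sifted for candidates without ever trying a key. What follows is a minimal sketch and not part of the script above: the function name `find_access_key_ids` is made up for illustration, and it assumes that long-term AWS access key IDs are 20 characters long and start with `AKIA`.

```python
import re

# Assumption: a long-term AWS access key ID is "AKIA" followed by
# 16 uppercase letters or digits (20 characters in total).
ACCESS_KEY_ID = re.compile(r'(?<![A-Z0-9])AKIA[A-Z0-9]{16}(?![A-Z0-9])')


def find_access_key_ids(text):
    """Return the unique strings in text that are shaped like access key IDs."""
    return sorted(set(ACCESS_KEY_ID.findall(text)))


if __name__ == '__main__':
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE\nnot_a_key = AKIA123'
    print(find_access_key_ids(sample))
```

A match only says the string has the right shape; it says nothing about whether the key is live.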

I used both my personal GitHub account and a throwaway that I created specifically for testing this, and on both I received API keys and even usernames and passwords. I didn't test any of them since that would be mean and probably illegal, but I suspect that many of these keys have already been scooped up by a bot of some kind, since this idea is very well known.

GitHub pls.

Young developers, and even those who already have jobs, need to understand the risks of publishing open source software and how their online presence can compromise both their own security and their employers'.

But AWS keys are really only one kind of key that people will leak.

Here's a whole bunch of search terms that gave me results.

filename:credentials aws_access_key_id  
filename:credentials aws_secret_access_key  
MONGODB_URI  
MAILCHIMP_API  
MAILGUN_API  
MAILGUN_DOMAIN  
SENDGRID_API_KEY  
GMAIL_PASSWORD  
REDIS_URL  
REDISCLOUD_URL  
SQLALCHEMY_DATABASE_URI  
DATABASE_URL  
FACEBOOK_APP_SECRET  
GITHUB_KEY  
GITHUB_SECRET  
API_KEY  
STRIPE_CONNECT_CLIENT_ID  
STRIPE_CONNECT_SECRET_KEY  
RECAPTCHA_PRIVATE_KEY  
RECAPTCHA_SITE_KEY  
MAP_API_KEY  
mongolab.com  
mlab.com  
mongohq.com  
amqp_url  
INSTAGRAM_CLIENT_ID  
FACEBOOK_ID  
FACEBOOK_SECRET  
FLICKR_API_KEY  
bingAccKey  
TMDB_API_KEY  
SLACK_TOKEN  
SLACK_WEBHOOK  
SLACK_WEBHOOK_SECRET  
SLACK_INCOMING_WEBHOOK_URL  
FIREBASE_URL  
FIREBASE_ROOM_KEY  
GITHUB_USERNAME  
GITHUB_OAUTH_TOKEN  
TWILIO_TOKEN  
MASHAPE_KEY  
DOCKERCLOUD_APIKEY  
redmine_token  
REDMINE_API_KEY  
GOOGLE_API_KEY  
GOOGLE_CSE_ID  
STATUS_CAKE_API  
SPOTIFY_SECRET  
KG_API_KEY  
JIRA_USERNAME  
JIRA_PASSWORD  
WEATHER_API_KEY  
HEROKU_API_KEY  
SECRET_KEY_BASE  
digitalocean_client  
digitalocean_api  
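
The same terms work in the other direction: before publishing a repository, you can search your own working tree for them. The sketch below is a hypothetical audit helper, not something from the scraper above. The name `audit_tree`, the shortened `TERMS` list, and the choice to skip the `.git` directory are all assumptions made for illustration.

```python
import os

# A few names taken from the list above; extend as needed.
TERMS = ['aws_secret_access_key', 'MONGODB_URI', 'SLACK_TOKEN', 'HEROKU_API_KEY']


def audit_tree(root, terms=TERMS):
    """Yield (path, line_number, term) for each line under root that mentions a term."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Only the working tree is checked; the .git directory is skipped,
        # so secrets that survive only in old commits are NOT found here.
        dirnames[:] = [d for d in dirnames if d != '.git']
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding='utf8', errors='ignore') as f:
                    for number, line in enumerate(f, start=1):
                        for term in terms:
                            # Case-insensitive substring match on the variable name.
                            if term.lower() in line.lower():
                                yield path, number, term
            except OSError:
                continue  # unreadable file, move on
```

Calling `list(audit_tree('.'))` from the repository root returns every hit. A hit is only a prompt to look at the line, since a variable name on its own is harmless; it is the value next to it that matters.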

Unfortunately, this problem will likely never go away, since it only takes a single slip-up for confidential information to leak.

Environment variables are the way forward, of course, but proper open source development practice isn't always taught, and environment variables are often skipped for small projects. I'm guilty of this, we all are, and it's hard to resist sometimes.
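
As a minimal sketch of the environment-variable approach: the code reads the secret at runtime and fails loudly when it is missing, so nothing sensitive has to live in the repository. The helper name `require_env` is made up for this example.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError('{} is not set; export it before running'.format(name))
    return value
```

Application code would then call something like `api_key = require_env('SENDGRID_API_KEY')`, with the variable exported in the shell or the hosting provider's settings rather than written into a committed file.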

I'm not entirely sure why GitHub doesn't protect against this, since it's fairly easy to identify these kinds of searches and either lock the user out or withhold the search results. Even if the bots changed their search queries, GitHub should be able to see the new searches and adjust its restrictions.
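
A rough sketch of the kind of check this paragraph describes might look like the following. It is purely hypothetical: nothing here reflects how GitHub's search actually works, and the names `SENSITIVE_TERMS` and `looks_like_secret_hunt` are invented for the example.

```python
# Hypothetical denylist built from a few of the search terms listed earlier.
SENSITIVE_TERMS = {
    'aws_access_key_id',
    'aws_secret_access_key',
    'heroku_api_key',
    'slack_token',
}


def looks_like_secret_hunt(query):
    """Return True if any whitespace-separated token of the query is a known sensitive term."""
    tokens = query.lower().split()
    return any(token in SENSITIVE_TERMS for token in tokens)
```

With this set, `looks_like_secret_hunt('filename:credentials aws_secret_access_key')` is true while an ordinary query such as `'binary search tree'` is not. The set can be extended as new queries show up, which is the adjustment the paragraph above has in mind.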

When I started this, I used a completely unauthenticated script and got the original set of AWS keys pictured. When I revisited the script, I noticed that unauthenticated users could not search, so I modified it to log in with a throwaway account, and I still received plenty of keys.

In my opinion, GitHub has the opportunity to prevent this abuse but doesn't seem to be doing much about it. I could be mistaken, and I could just have text files full of bogus API keys, but that doesn't seem likely...

Kevin Chung

I like doing computer stuff, playing chess, and playing video games. I ran CSAW CTF for a couple of years and I wrote CTFd which is a popular Capture The Flag framework.
