A closer look at the leaked Snapchat data
A new year, a new data leak.
Last Tuesday an anonymous group or person leaked about 4.6 million Snapchat usernames along with the associated phone numbers, save for the last two digits. At SnapchatDB you can download the data in either SQL or CSV format.
The leak came about a week after Snapchat dismissed the threat – brought to their attention by Gibson Security – in a blog post: “Theoretically, if someone were able to upload a huge set of phone numbers, like every number in an area code, or every possible number in the U.S., they could create a database of the results and match usernames to phone numbers that way. Over the past year we’ve implemented various safeguards to make it more difficult to do. We recently added additional counter-measures and continue to make improvements to combat spam and abuse.” Apparently their counter-measures weren’t enough.
Soon after the breach, the alleged hackers responded to requests for comment by The Verge: “Our motivation behind the release was to raise the public awareness around the issue, and also put public pressure on Snapchat to get this exploit fixed,” they say. “Security matters as much as user experience does.”
I decided to take a closer look at the data to see if there’s anything we can learn.
First, I wanted to see which users have been affected. (To see if you are part of the leaked data, you can search for your name here.) To do so, we can look at the area codes of the phone numbers. As mentioned in various reports, the leak only affects North American users. You can find the list of North American area codes on Wikipedia: there are 323 American and 41 Canadian area codes. After loading the data into R, I split the telephone numbers and counted the area codes. There are 76 in total, 2 from Canada and the rest from the US (attached).
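For readers who want to try the same count outside of R, here is a minimal Python sketch of the split-and-count step. It is not the code used for this post. The `(phone, username)` row layout, the sample rows, and the helper names `area_code` and `count_area_codes` are all assumptions made for illustration; the only detail taken from the leak itself is that the last two digits of each number are masked.

```python
from collections import Counter

# Hypothetical sample rows in an assumed (phone, username) layout.
# The last two digits are masked as "XX", mirroring the leaked files.
SAMPLE_ROWS = [
    ("21255501XX", "alice_example"),
    ("21255502XX", "bob_example"),
    ("31055503XX", "carol_example"),
    ("81855504XX", "dave_example"),
    ("31055505XX", "erin_example"),
    ("21255506XX", "frank_example"),
]


def area_code(phone: str) -> str:
    """Return the first three characters of a 10-character phone number."""
    return phone.strip()[:3]


def count_area_codes(rows):
    """Count how many rows fall under each area code."""
    return Counter(area_code(phone) for phone, _username in rows)


counts = count_area_codes(SAMPLE_ROWS)
top = counts.most_common(10)  # highest counts first
print(top)  # [('212', 3), ('310', 2), ('818', 1)]
```

On the real CSV you would read the rows with the `csv` module rather than hard-coding them, after checking the actual column order in the file.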
Here are the top 10 area codes:
Next I fired up SAP Lumira to take advantage of the great mapping features it has (you can get your free copy here). Here’s a choropleth map showing which states have been affected the most by the leak. If you live in a grey state, you’re not part of the leak. If you’re in a green state, you may want to check to see if your data has been compromised (you can check here).
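A choropleth needs one number per state, so the per-area-code counts have to be rolled up first. The sketch below shows one way that roll-up could look in Python; it is an illustration, not what Lumira does internally. The mapping is a tiny hand-picked subset of area codes, the counts are made up, and `counts_by_state` is a hypothetical helper name.

```python
from collections import defaultdict

# A small, hand-picked subset of the area-code-to-state table.
# A real run would need the full list of area codes.
AREA_CODE_TO_STATE = {
    "212": "NY",
    "310": "CA",
    "818": "CA",
    "312": "IL",
    "215": "PA",
}

# Made-up per-area-code counts for illustration, not figures from the leak.
SAMPLE_COUNTS = {"212": 120, "310": 80, "818": 75, "312": 40, "999": 5}


def counts_by_state(area_code_counts, mapping):
    """Sum area-code counts per state; track counts with no known state."""
    totals = defaultdict(int)
    unmapped = 0
    for code, n in area_code_counts.items():
        state = mapping.get(code)
        if state is None:
            unmapped += n  # area code missing from the mapping
        else:
            totals[state] += n
    return dict(totals), unmapped


state_totals, unmapped = counts_by_state(SAMPLE_COUNTS, AREA_CODE_TO_STATE)
print(state_totals)  # {'NY': 120, 'CA': 155, 'IL': 40}
print(unmapped)      # 5
```

States that never appear in the totals are the ones that would be drawn in grey on the map.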
The SnapchatDB data also comes marked up with regional information below the state level, which, after looking into it a bit, I’m pretty confident is accurate for the most part. Here are the top 10 regional locations affected by the leaks (attached):
|Region|Leaked accounts|
|New York City|334,445|
|Eastern Los Angeles|215,855|
|San Fernando Valley|205,544|
|Northern Chicago Suburbs|195,925|
|Downtown Los Angeles|168,565|
As this dataset is incomplete, we can only really draw conclusions about which users have been affected by this leak. Some larger urban areas – like my own Philadelphia – are fairly low on the list, but I wouldn’t conclude that Snapchat is unpopular here, simply that it wasn’t hit as hard.
Next I looked at the usernames, specifically the number of characters in each one. What we get is a nice, nearly normal distribution with a bit of right skew. Looking at the data, I would guess that Snapchat requires usernames to be between 3 and 15 characters long. However, there are exactly 5 notable outliers (excluded from the histogram below): users whose email address was published instead of their username (unless these 5 were somehow able to choose their email as their username?). I wonder how these 5 came to be included in this data set.
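The length check is easy to reproduce. Here is a rough Python sketch that builds the length histogram and flags entries that look like the outliers described above. The sample usernames are invented, the 3 to 15 character bounds are only the guess made in this post (not a documented Snapchat rule), and `length_histogram` and `find_outliers` are hypothetical helper names.

```python
from collections import Counter

# Invented usernames for illustration; one of them is an email address.
SAMPLE_USERNAMES = [
    "amy",
    "bob_example",
    "carol.smith",
    "dave99",
    "someone@example.com",
    "a_fifteen_chars",
]


def length_histogram(usernames):
    """Map each username length to the number of usernames of that length."""
    return Counter(len(name) for name in usernames)


def find_outliers(usernames, min_len=3, max_len=15):
    """Return names outside the guessed length range or containing '@'."""
    return [
        name
        for name in usernames
        if not (min_len <= len(name) <= max_len) or "@" in name
    ]


hist = length_histogram(SAMPLE_USERNAMES)
for length in sorted(hist):
    print(length, "#" * hist[length])  # crude text histogram

print(find_outliers(SAMPLE_USERNAMES))  # ['someone@example.com']
```

Checking for `@` separately from the length bounds matters because a short email address could still fit inside the 3 to 15 character range.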
What other things might we be able to glean from this data?
Lastly, Gibson Security has some good advice for those whose information has been compromised: you can delete your Snapchat account, change your phone number, and never give out your phone number if you don’t have to. The moral of the story is that even super-popular services getting billion-dollar buyout offers aren’t immune to user data leaks, so be sure you take your online privacy seriously.
If you liked this post, don’t forget to hit the Like button at the top of the page: thanks!