A new year, a new data leak.

Last Tuesday an anonymous group or person leaked about 4.6 million Snapchat usernames along with the associated phone numbers, save for the last two digits.  At SnapchatDB you can download the data in either SQL or CSV format.

The leak came about a week after Snapchat dismissed the threat, which Gibson Security had brought to their attention, in a blog post: “Theoretically, if someone were able to upload a huge set of phone numbers, like every number in an area code, or every possible number in the U.S., they could create a database of the results and match usernames to phone numbers that way. Over the past year we’ve implemented various safeguards to make it more difficult to do. We recently added additional counter-measures and continue to make improvements to combat spam and abuse.” Apparently their counter-measures weren’t enough.

Soon after the breach, the alleged hackers responded to requests for comment by The Verge: “Our motivation behind the release was to raise the public awareness around the issue, and also put public pressure on Snapchat to get this exploit fixed,” they say. “Security matters as much as user experience does.”

I decided to take a closer look at the data to see if there’s anything we can learn.

First, I wanted to see which users have been affected. (To see if you are part of the leaked data, you can search for your name here.)  To do so, we can take a look at the area codes of the phone numbers.  As mentioned in various reports, the leak only affects North American users.  You can find the list of North American area codes on Wikipedia: there are 323 American and 41 Canadian area codes.  After loading the data into R, I split the telephone numbers and counted the area codes.  The leaked data contains 76 area codes in total, 2 from Canada and the rest from the US (attached).
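The split-and-count step is simple enough to sketch. The analysis in this post was done in R; what follows is a minimal Python illustration instead, not the original script. The sample rows are made up, and both the column order (phone, username, region) and the `XX` masking of the last two digits are assumptions about the file layout.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The column layout (phone, username, region) and the
# "XX" masking are assumptions, not the confirmed format of the leaked files.
SAMPLE_CSV = """\
81555501XX,alice_example,Chicago Suburbs
81555502XX,bob_example,Chicago Suburbs
90955503XX,carol_example,Eastern Los Angeles
"""


def count_area_codes(rows):
    """Count how often each 3-digit area code appears in the phone column."""
    counts = Counter()
    for row in rows:
        if not row:
            continue  # skip blank lines
        phone = row[0].strip()
        counts[phone[:3]] += 1  # the area code is the first three digits
    return counts


area_code_counts = count_area_codes(csv.reader(io.StringIO(SAMPLE_CSV)))

# Print the most frequent area codes first.
for code, freq in area_code_counts.most_common(10):
    print(code, freq)
```

For a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an opened file handle.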

Here are the top 10 area codes:

Area Code   Frequency   State
815         215,953     Illinois
909         215,855     California
818         205,544     California
951         200,008     California
310         196,183     California
847         195,925     Illinois
720         188,285     Colorado
323         168,565     California
347         166,374     New York
917         165,420     New York

Next I fired up SAP Lumira to take advantage of the great mapping features it has (you can get your free copy here).  Here’s a choropleth map showing which states have been affected the most by the leak. If you live in a grey state, you’re not part of the leak.  If you’re in a green state, you may want to check to see if your data has been compromised (you can check here).

[Image: choropleth map of US states affected by the leak (/wp-content/uploads/2014/01/snapchat_leaks_356008.png)]
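A state-level map like this needs the area-code counts rolled up to states. Here is a rough Python sketch of that aggregation, using only the ten area codes and states from the top-10 table above. It is an illustration rather than what Lumira does internally, and because it covers only ten of the 76 area codes, the totals are partial.

```python
from collections import defaultdict

# Frequencies of the top 10 area codes, copied from the table above.
TOP_AREA_CODES = {
    "815": 215953, "909": 215855, "818": 205544, "951": 200008,
    "310": 196183, "847": 195925, "720": 188285, "323": 168565,
    "347": 166374, "917": 165420,
}

# Area code to state, as listed in the same table.
AREA_CODE_STATE = {
    "815": "Illinois", "909": "California", "818": "California",
    "951": "California", "310": "California", "847": "Illinois",
    "720": "Colorado", "323": "California", "347": "New York",
    "917": "New York",
}


def totals_by_state(counts, code_to_state):
    """Sum area-code frequencies into one total per state."""
    totals = defaultdict(int)
    for code, freq in counts.items():
        totals[code_to_state[code]] += freq
    return dict(totals)


state_totals = totals_by_state(TOP_AREA_CODES, AREA_CODE_STATE)

# List states from most to least affected (top-10 codes only).
for state, total in sorted(state_totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {total:,}")
```

Even with just these ten codes, California comes out far ahead at 986,155 numbers, followed by Illinois with 411,878.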

The SnapchatDB data also comes marked up with regional information below the state level; after looking into it a bit, I’m pretty confident it is accurate for the most part.  Here are the top 10 regional locations affected by the leak (attached):

Region                      Frequency
New York City               334,445
Miami                       222,321
Chicago Suburbs             215,953
Eastern Los Angeles         215,855
Los Angeles                 209,888
San Fernando Valley         205,544
Southern California         200,008
Northern Chicago Suburbs    195,925
Denver-Boulder              188,285
Downtown Los Angeles        168,565

As this dataset is incomplete, we can only really draw conclusions about which users have been affected by this leak.  Some larger urban areas, like my Philadelphia, are fairly low on the list, but I wouldn’t conclude that Snapchat is unpopular here, simply that it wasn’t hit as hard.

Next I looked at the usernames, specifically the number of characters in each username.  What we get is a nice, nearly normal distribution with a bit of right skew.  Looking at the data, I would guess that Snapchat requires a username of between 3 and 15 characters.  However, there are exactly 5 notable outliers (excluded from the histogram below): users whose email address was published instead of their username (unless these 5 were able to choose their email address as their username?).  I wonder how these 5 came to be included in this data set.

[Image: histogram of username lengths (/wp-content/uploads/2014/01/snapchat_names_histogram_356023.png)]
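The length count, and the separation of the email-style outliers, could be sketched roughly as follows in Python. The usernames are invented for illustration, and treating any name containing `@` as an email address is a simplifying assumption, not necessarily how the outliers were identified for this post.

```python
from collections import Counter

# Invented usernames for illustration only; none come from the leaked data.
SAMPLE_USERNAMES = [
    "abc",
    "snapfan01",
    "a_longer_name15",
    "someone@example.com",
    "lee",
]


def length_histogram(usernames):
    """Return (counts per username length, list of email-like outliers)."""
    regular = Counter()
    outliers = []
    for name in usernames:
        if "@" in name:
            # Assumption: an "@" marks an email address published in place
            # of a username, so it is kept out of the histogram.
            outliers.append(name)
        else:
            regular[len(name)] += 1
    return regular, outliers


histogram, outliers = length_histogram(SAMPLE_USERNAMES)

# Crude text histogram: one '#' per username of that length.
for length in sorted(histogram):
    print(f"{length:2d} {'#' * histogram[length]}")
print("outliers:", outliers)
```

The minimum and maximum keys of `histogram` give the observed length range, which is the basis for the guess about the 3 to 15 character limit.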

What other things might we be able to glean from this data?

Lastly, Gibson Security has some good advice for those whose information has been compromised: you can delete your Snapchat account, change your phone number, and never give out your phone number if you don’t have to.  The moral of the story is that even super-popular services getting billion-dollar buyout offers aren’t immune to user data leaks, so be sure you take your online privacy seriously.


You can find me on Twitter @leeclemmer.  I also occasionally post stuff at leeclemmer.com.


13 Comments


  1. Tim Clark

    Thanks for sharing, Lee. Excellent write-up and very helpful info. My daughter has/had a Snapchat account. I checked to see if her info was leaked. Looks like we’re safe.

  2. Jelena Perfiljeva

    Thank you, Lee. Very interesting read and analysis. Even though I didn’t even know what Snapchat was before the leak (there is no bad PR, eh? 🙂 ), glad to find I’m in a “grey” state anyway.

    And I’m soooo getting Lumira just for myself now, thanks for the link!

      1. Jelena Perfiljeva

        Major fail on Lumira: apparently 1.92 GB of memory on my 3-year-old laptop is not enough. Oh well… Not sure though why it doesn’t check the requirements before installation, like most applications. Perhaps something to look into.

  3. Stephen Johannes

    You know we better also worry about those pesky yellow and white pages.  How dare someone publish all our phone numbers and distribute those phone numbers along with our address to every physical address for free and encourage us to use those directories.

    Take care,

    Stephen

    1. Lee Clemmer Post author

      Hi Stephen, I appreciate your (snarky) comment, which I suppose is making the point that it’s no problem that this information was leaked, since hey, we’re publishing personal information in phone directories already anyway. 

      While I think you bring up a good point, you’re also missing some critical nuances here.  First, I can always choose to have my information removed from a phone directory; in other words, I’m in control.  Second, phone directories don’t list the numbers of minors.  And finally, this kind of data leak potentially exposes the identity behind an anonymous username against that person’s wishes.

      I understand the cynicism behind scoffing at a user’s expectation of privacy in using web services; I think the NSA revelations certainly warrant such cynicism. But don’t we expect certain online information to remain private, like your banking information?  Now Snapchat data of course isn’t on par with banking data, I understand that, but at the end of the day, the information leaked here was information which was shared by users under the assumption it would be kept private. Exposing this information (or letting it be exposed) could have harmful consequences to some users, even though we may not know what those are.

      1. Stephen Johannes

        Actually, getting your phone number removed from a directory in most cases involves a paid fee, and even then the phone company, up until many years ago, still had the right to sell your phone number/information to 3rd party marketing agencies, which they did.  It is only in the last ten to fifteen years that you have had any real rights to ensure your phone number was not distributed without your permission.

        The problem is that we suffer from the entitlement attitude that services we do not pay for should do everything we expect them to.  In addition, your comparison to banking information is a straw man argument, as banking/financial data is a paid-for, heavily regulated service.  It doesn’t mean that a breach can’t occur, but there are severe consequences on both ends if such a breach occurs.

        Finally, by performing an analysis on data that was illegally retrieved and posting the results here, you are part of the problem.  It’s hard to preach to us about privacy concerns when you are using stolen data in the first place.  Does SAP have any standards anymore, or do ethics not matter as long as you are pushing/selling software?

        Take care,

        Stephen

        1. Jelena Perfiljeva

          Stephen Johannes wrote:

          The problem is that we suffer from the entitlement attitude that services we do not pay for, should do everything we expect them to.

          Stephen, normally I agree with everything you say, but in this case I believe the privacy regulations would also apply to the web sites that provide free services. To think that if something is free to use then it’s OK for it to be defective or ignore legal requirements is not right either.

          Part of the problem could be that not many people actually read T&C or privacy disclosures (and also many companies make them way too long and complex). There was actually a South Park episode on this – Human CentiPad. Personally I’m not for more regulations, but wish this was simplified, like it was done with credit card information in the US recently. In this way at least the users could make an easier and educated decision on whether to use a certain service and what to expect.

          Yes, there is already a scary amount of information available about almost anyone online. But do we need to be nonchalant if more of such information is leaked, as happened in this case?

        2. Lee Clemmer Post author

          Hey Stephen, thanks for responding again.  As I conceded originally, bank data is different from Snapchat data, but not, I would argue, because we have to pay for it. Should that be the line then, that if we paid for a service we should expect our data to be kept private, otherwise everything is fair game?

          I just read through the post again, and I actually can’t see where I am preaching – in any event, if that is how I came across, I apologize as I am really not trying to preach to anyone about anything.  The question of privacy is one I myself have come to consider differently. Initially I thought, ok, so what, what can you really do with this information?  After looking a little bit more into it, however, I realized that you could potentially do quite a bit if you were so inclined. I think privacy is a complicated issue.

          Finally, in regard to your ad hominem: I disagree with you that it is unethical to take a look at and write about publicly available data that was obtained illegally.  If I were mirroring the data and making it available that would be one thing, but writing about this topic to have a discussion about privacy (as we are having here) is not a bad thing.

          Thanks again,

          – Lee

