KoreLogic Blog
LinkedIn Revisited – Full 2012 Hash Dump Analysis
2016-05-19 15:00

As you may know, a "full" dump of email addresses and password hashes from the 2012 attack on LinkedIn.com has become available. Here at KoreLogic, we got our hands on the list of emails and the separate list of password hashes (but nothing linking the two together, which we don't want or need). We started to gather some statistics on them using our Password Recovery Service (PRS). The following analysis assumes the lists are real; because the email addresses are valid and we were able to confirm some of our own accounts' data from back then, we believe that the dump is real.

What we know so far:

It contains 164,590,819 unique email addresses.

It contains 177,500,189 unsalted SHA1 password hashes. Note that this is larger than the number of email addresses.

It contains 61,829,207 unique hashes. This means there are duplicates, which is good for password researchers because it lets us compute statistics on how often certain passwords are used.
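
A minimal sketch of how the total and unique counts relate, assuming the hashes are available as hex strings, one per entry. The function name and the three-item sample are made up for illustration; this is not the tooling used for the analysis.

```python
from collections import Counter

def hash_frequencies(hashes):
    """Count occurrences of each hash, ignoring case and surrounding whitespace."""
    return Counter(h.strip().lower() for h in hashes if h.strip())

# Hypothetical sample input; the same hash appears twice in different case.
sample = [
    "7c4a8d09ca3762af61e59520943dc26494f8941b",
    "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
    "7C4A8D09CA3762AF61E59520943DC26494F8941B\n",
]

counts = hash_frequencies(sample)
total = sum(counts.values())   # every hash, duplicates included
unique = len(counts)           # distinct hashes only
top = counts.most_common(1)    # most frequently used hash and its count
```

Because the hashes are unsalted, identical passwords always produce identical hashes, which is what makes this kind of frequency count possible.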

As of Thursday May 19 14:09 EDT 2016, we've cracked about 65% of the hashes after roughly two hours' work on our private distributed cracking grid. Approximately 41,500,000 plaintext passwords have been recovered so far. Thousands of new cracks are coming in every minute, so the numbers are a bit rough.

The most common password hashes are:

Number | Hash
1135936 7c4a8d09ca3762af61e59520943dc26494f8941b
 207488 7728240c80b6bfd450849405e8500d6d207783b6
 188380 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
 149916 f7c3bc1d808e04732adf679965ccc34ca7ae3441
  95854 7c222fb2927d828af22f592134e8932480637c0d
  85515 3d4f2bf07dc1be38b20cd6e46949a1071f9d0e3d
  75780 20eabe5d64b0e216796e834f52d61fd0b70332fc
  51969 dd5fef9c1c1da1394d6d34b248c51be2ad740840
  51870 b1b3773a05c0ed0176787a4f1574ff0075f7521e
  51535 8d6e34f987851aa599257d3831a1af040886842f
  49235 c984aed014aec7623a54f0591da07a85fd4b762d
  41449 6367c48dd193d56ea7b0baad25b19455e529f5ee
  35919 d8cd10b920dcbdb5163ca0185e402357bc27c265
  34440 1411678a0b9e25ee2f7c8b2f7ac92b6a74b3f9c5
  32879 601f1889667efaebb33b8c12572835da3f027f78
  32289 ff539c96a2ed9f72a47a5e1c7d59e143ba1fba94
  30972 019db0bfd5f85951cb46e4452e9642858c004155
  30923 01b307acba4f54f55aafc33bb06bbbf6ca803e9a
  28928 775bb961b81da1ca49217a48e533c832c337154a
  28705 17b9e1c64588c7fa6419b4d29dc1f4426279ba01

These values crack to:

Number | Hash                                   | Plaintext
1135936 7c4a8d09ca3762af61e59520943dc26494f8941b 123456
 207488 7728240c80b6bfd450849405e8500d6d207783b6 linkedin
 188380 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 password
 149916 f7c3bc1d808e04732adf679965ccc34ca7ae3441 123456789
  95854 7c222fb2927d828af22f592134e8932480637c0d 12345678
  85515 3d4f2bf07dc1be38b20cd6e46949a1071f9d0e3d 111111
  75780 20eabe5d64b0e216796e834f52d61fd0b70332fc 1234567
  51969 dd5fef9c1c1da1394d6d34b248c51be2ad740840 654321
  51870 b1b3773a05c0ed0176787a4f1574ff0075f7521e qwerty
  51535 8d6e34f987851aa599257d3831a1af040886842f sunshine
  49235 c984aed014aec7623a54f0591da07a85fd4b762d 000000
  41449 6367c48dd193d56ea7b0baad25b19455e529f5ee abc123
  35919 d8cd10b920dcbdb5163ca0185e402357bc27c265 charlie
  34440 1411678a0b9e25ee2f7c8b2f7ac92b6a74b3f9c5 666666
  32879 601f1889667efaebb33b8c12572835da3f027f78 123123
  32289 ff539c96a2ed9f72a47a5e1c7d59e143ba1fba94 linked
  30972 019db0bfd5f85951cb46e4452e9642858c004155 maggie
  30923 01b307acba4f54f55aafc33bb06bbbf6ca803e9a 1234567890
  28928 775bb961b81da1ca49217a48e533c832c337154a princess
  28705 17b9e1c64588c7fa6419b4d29dc1f4426279ba01 michael
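
Since the hashes are unsalted SHA1, any row above can be checked independently by hashing the plaintext. A small sketch using Python's standard `hashlib`; the helper name is ours, and UTF-8 encoding of the plaintext is an assumption (it makes no difference for these ASCII passwords).

```python
import hashlib

def sha1_hex(password: str) -> str:
    """Return the unsalted SHA1 digest of a password as lowercase hex."""
    # Assumption: the plaintext is encoded as UTF-8 before hashing.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

# Two rows copied from the table above: hash -> claimed plaintext.
claimed = {
    "7c4a8d09ca3762af61e59520943dc26494f8941b": "123456",
    "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8": "password",
}

# Any pair listed here would be a hash that does not match its plaintext.
mismatches = [(h, p) for h, p in claimed.items() if sha1_hex(p) != h]
```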

The most common patterns used in the passwords are as follows: (Updated May 20 11:00 EDT 2016)

?d = Digit [0-9]
?s = "Special Character" +_)*(&^%$#@!~`-=[]\{}|;':",./<>? ...etc.
?l = Lower case letter [a-z]
?u = Upper case letter [A-Z]

Number | Pattern
2464707 ?l?l?l?l?l?l?l?l    Example: linkedin
1776416 ?l?l?l?l?l?l?d?d    Example: linked12
1663330 ?l?l?l?l?l?l?l?l?l  Example: alinkedin
1587423 ?l?l?l?l?d?d?d?d    Example: link2012
1528434 ?l?l?l?l?l?l?l      Example: linkedi
1525784 ?l?l?l?l?l?l        Example: linked
1348195 ?d?d?d?d?d?d?d?d
1172612 ?l?l?l?l?l?l?l?l?l?l
1074096 ?l?l?l?l?l?d?d?d?d
1042003 ?d?d?d?d?d?d?d?d?d?d
 984939 ?l?l?l?l?l?l?d?d?d?d
 936771 ?l?l?l?l?l?l?l?d?d
 819341 ?l?l?l?d?d?d?d
 781166 ?d?d?d?d?d?d?d
 723656 ?l?l?l?l?l?d?d
 713165 ?l?l?l?l?l?l?l?l?l?l?l
 692280 ?l?l?l?l?l?d?d?d
 690521 ?d?d?d?d?d?d
 670878 ?l?l?l?l?l?l?l?l?d?d
 653118 ?l?l?l?l?l?l?l?d
 539001 ?l?l?l?l?l?l?d?d?d
 494526 ?l?l?l?l?d?d
 491474 ?l?l?d?d?d?d
 462250 ?l?l?l?l?l?l?l?l?l?l?l?l
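
As an illustration of how a password maps to one of these patterns, here is a short sketch. It is not the tool used to produce the table; the function name is hypothetical, and treating every character outside ASCII letters and digits as ?s is a simplifying assumption.

```python
import string
from collections import Counter

def to_mask(password: str) -> str:
    """Translate each character into its class token: ?d, ?l, ?u, or ?s."""
    tokens = []
    for ch in password:
        if ch in string.digits:
            tokens.append("?d")
        elif ch in string.ascii_lowercase:
            tokens.append("?l")
        elif ch in string.ascii_uppercase:
            tokens.append("?u")
        else:
            # Assumption: punctuation, spaces, and non-ASCII all count as ?s.
            tokens.append("?s")
    return "".join(tokens)

# The example passwords from the table above.
examples = ["linkedin", "linked12", "alinkedin", "link2012", "linkedi", "linked"]

# Tallying masks over a list of recovered passwords gives a pattern table.
mask_counts = Counter(to_mask(p) for p in examples)
```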

The most common "base words" used in the passwords are shown below. These are calculated by taking all the recovered passwords, removing all special characters and digits, and then sorting the results. This was the initial technique used by KoreLogic in 2012 to determine that the set of ~6.5 million hashes found on a Russian message board was in fact from LinkedIn.com (which now appears to have been only a subset of this larger leak).

Number | Base word
  29883 linkedin    Examples: linkedin1 linkedin2012 linkedin!
  26194 link        Examples: link2012 2012link !!link!!
  21731 love
  19721 ever
  15574 linked
  14156 life
  11674 alex
  10773 mike
  10566 pass
   9540 john
   9176 blue
   8937 june
   8338 jack
   8006 july
   7305 home
   7205 star
   7094 password
   7005 angel
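
The base-word calculation described above can be sketched as follows. This is an approximation rather than the exact procedure used: the function name is ours, and lowercasing the result is an added assumption so that "Linkedin1" and "linkedin1" land in the same bucket.

```python
from collections import Counter

def base_word(password: str) -> str:
    """Strip digits and special characters, keeping only the letters."""
    letters = "".join(ch for ch in password if ch.isalpha())
    # Assumption: case is folded so variants of one word are counted together.
    return letters.lower()

# The example passwords from the table above.
recovered = ["linkedin1", "linkedin2012", "linkedin!", "link2012", "2012link", "!!link!!"]

# Skip passwords that contain no letters at all (for example "123456").
base_counts = Counter(w for w in map(base_word, recovered) if w)
```

Run against a large set of recovered passwords, a site name showing up near the top of such a list is a strong hint about where the hashes came from.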

Update: May 19 15:53 EDT 2016

Here is a list of the most common domains used by the accounts in the dump. No real surprises here.

Number | Domain Name
32865035 gmail.com
24018467 hotmail.com
20361246 yahoo.com
 4268015 aol.com
 1977483 comcast.net
 1427168 yahoo.co.in
 1333354 msn.com
 1039135 sbcglobal.net
 1036522 rediffmail.com
  992936 yahoo.fr
  913406 yahoo.co.uk
  843158 live.com
  839735 yahoo.com.br
  748001 hotmail.co.uk
  740473 verizon.net
  574117 hotmail.fr
  549022 yahoo.com
  528635 ymail.com
  528040 cox.net
  509047 bellsouth.net
  503271 libero.it
  478587 att.net
  428930 yahoo.es
  406492 btinternet.com

Update: May 19 17:00 EDT 2016

42,691,862 unique passwords recovered so far; 69% of the unique hashes have cracked at this point.

Of the total 177,500,189 non-unique hashes leaked, 143,914,964 have been cracked and 33,585,225 are left. That represents 81.07% of all LinkedIn.com users in the dump.
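
The two percentages measure different things: the 69% figure is a share of unique hashes, while the 81% figure is a share of all hashes, duplicates included. The share of users is higher because heavily reused passwords tend to be recovered first, and each one accounts for many entries. A quick check of the arithmetic, using only the figures quoted in this post (the variable names are ours):

```python
# Figures quoted in this post.
total_hashes = 177_500_189      # all leaked hashes, duplicates included
cracked_hashes = 143_914_964    # of those, how many have a recovered password
unique_hashes = 61_829_207      # distinct hashes in the dump
unique_cracked = 42_691_862     # distinct hashes recovered so far

remaining = total_hashes - cracked_hashes

# Share of all entries (roughly, users) whose password is known.
user_coverage = 100 * cracked_hashes / total_hashes

# Share of distinct hashes that have been cracked.
unique_coverage = 100 * unique_cracked / unique_hashes

print(f"{remaining:,} left, {user_coverage:.2f}% of users, "
      f"{unique_coverage:.2f}% of unique hashes")
```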

Update: May 20 10:00 EDT 2016

~48,520,000 unique passwords recovered so far; ~78% of the unique hashes have cracked at this point. And we have recovered the passwords for ~86% of all LinkedIn.com users in the dump.

~13,360,000 unique hashes left to crack ...

Update: May 20 11:00 EDT 2016

Here is a list of the most common email addresses without their domain. No real surprises here.

 555249 info@
  64325 john@
  60845 david@
  55525 mike@
  52685 chris@
  52251 mail@
  50654 sales@
  50444 mark@
  48006 steve@
  45872 paul@
  39051 contact@
  37424 linkedin@
  36511 peter@
  35818 michael@
  35770 admin@
  30473 dave@
  30034 tom@
  29102 jim@
  26872 jeff@

Update: May 20 18:00 EDT 2016

Our grid was busy doing client work for about 24 hours, so not many new cracks today. But here are some updated stats and analysis.

~49,290,000 unique passwords recovered so far.

~12,520,000 unique hashes left to crack.

5,184,351 of the recovered passwords are 8+ characters and contain at least one upper, one lower, and one digit.

825,975 of the recovered passwords are 8+ characters and contain at least one upper, one lower, one digit, and one special character.
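
These two counts correspond to typical "complexity" policies. A sketch of such a check follows; the function name and parameters are hypothetical, and "special" is assumed to mean any character that is not an ASCII letter or digit.

```python
import string

def meets_policy(password: str, require_special: bool = False, min_length: int = 8) -> bool:
    """Check length plus at least one upper, one lower, and one digit."""
    has_upper = any(c in string.ascii_uppercase for c in password)
    has_lower = any(c in string.ascii_lowercase for c in password)
    has_digit = any(c in string.digits for c in password)
    # Assumption: a special character is anything outside ASCII letters and digits.
    alnum = string.ascii_letters + string.digits
    has_special = any(c not in alnum for c in password)

    ok = len(password) >= min_length and has_upper and has_lower and has_digit
    if require_special:
        ok = ok and has_special
    return ok

# Counting compliant passwords in a (made-up) list of recovered plaintexts.
recovered = ["Linkedin12", "linkedin12", "Linked!12", "Link1!"]
three_class = sum(meets_policy(p) for p in recovered)
four_class = sum(meets_policy(p, require_special=True) for p in recovered)
```

A password such as "Linked!12" passes the stricter check, yet it follows exactly the kind of predictable layout (capital first, digits and symbol last) that shows up in the topology list.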

The pattern distribution of these passwords closely resembles the findings of our PathWell research - they are heavily biased towards some universally common topologies:

  29742 ?u?l?l?l?l?l?s?d?d
  26640 ?u?l?l?l?l?l?d?d?s
  26287 ?u?l?l?l?l?s?d?d
  23830 ?u?l?l?l?l?l?s?d
  20296 ?u?l?l?l?l?l?d?s
  18365 ?u?l?l?l?l?d?d?s
  17390 ?u?l?l?l?s?d?d?d?d
  17085 ?u?l?l?l?l?l?l?d?s
  16723 ?u?l?l?l?l?l?l?s?d
  14989 ?u?l?l?l?l?l?l?s?d?d
  13565 ?u?l?l?l?l?s?d?d?d?d
  12986 ?u?l?l?l?l?l?l?d?d?s
  12590 ?u?l?l?s?d?d?d?d
  12305 ?u?l?l?l?l?s?d?d?d
  11280 ?u?l?l?l?l?l?l?l?d?s
  10991 ?u?l?l?l?d?d?d?d?s
  10822 ?u?l?l?l?l?l?s?d?d?d?d
  10796 ?u?l?l?l?s?d?d?d

The PACK output of the unique cracks so far (numbers rounded slightly):

[*] Length:
[+]                         8: 29% (14,620,000)
[+]                         9: 17% (8,430,000)
[+]                        10: 14% (6,950,000)
[+]                         7: 13% (6,660,000)
[+]                         6: 10% (5,410,000)
[+]                        11: 06% (3,270,000)
[+]                        12: 03% (1,930,000)
[+]                        13: 01% (921,000)
[+]                        14: 01% (508,000)
[+]                        15: 00% (263,000)
[+]                        16: 00% (159,000)

[*] Character-set:
[+]             loweralphanum: 48% (24,128,000)
[+]                loweralpha: 20% (10,303,000)
[+]             mixedalphanum: 10% (5,026,000)
[+]                   numeric: 08% (4,428,000)
[+]      loweralphaspecialnum: 02% (1,377,000)
[+]             upperalphanum: 01% (957,000)
[+]                       all: 01% (936,000)
[+]                mixedalpha: 01% (852,000)
[+]         loweralphaspecial: 01% (507,000)
[+]                upperalpha: 00% (431,000)
[+]         mixedalphaspecial: 00% (147,000)
[+]                specialnum: 00% (84,000)
[+]      upperalphaspecialnum: 00% (62,000)
[+]         upperalphaspecial: 00% (19,000)
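
For readers unfamiliar with the character-set labels, the sketch below shows one way a password could be assigned to them. It is our own approximation written to match the label names in the output above, not PACK's code, and it assumes that any character outside ASCII letters and digits is "special".

```python
import string

def charset_label(password: str) -> str:
    """Name the combination of character classes a password uses."""
    lower = any(c in string.ascii_lowercase for c in password)
    upper = any(c in string.ascii_uppercase for c in password)
    digit = any(c in string.digits for c in password)
    # Assumption: everything that is not an ASCII letter or digit is special.
    alnum = string.ascii_letters + string.digits
    special = any(c not in alnum for c in password)

    if lower and upper and digit and special:
        return "all"
    if digit and not (lower or upper or special):
        return "numeric"

    if lower and upper:
        label = "mixedalpha"
    elif lower:
        label = "loweralpha"
    elif upper:
        label = "upperalpha"
    else:
        label = ""
    if special:
        label += "special"
    if digit:
        label += "num"
    return label
```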

Update: May 21 18:00 EDT 2016

~49,999,999 unique passwords recovered so far.

~11,863,000 unique hashes left to crack.

Update: May 25 15:40 EDT 2016

Our grid is mostly doing other things now. We have gotten a couple of requests about re-sharing the list, and/or about building some kind of online interface to look up individual credentials. We have no plans to do so.

For more of KoreLogic's talks about password recovery, check out the following videos of Rick Redman, KoreLogic employee and founder of PRS:
Your Password Complexity Requirements are Worthless - OWASP AppSecUSA 2014
Cracking Corporate Passwords: Why Your Password Policy Sucks


Posted by Rick Redman / Minga / @CrackMeIfYouCan at 15:00