KoreLogic Blog
Cracking Grid – Essential Attributes 2016-05-25 11:30

Here at KoreLogic, we are constantly cracking passwords – it's just one of the things we do. While we haven't made a concerted effort to track it, I'd venture to say that cracking for us is pretty close to a 24/7/365 operation. Between paid cracking engagements and penetration tests, our resident cracking expert, Rick, almost always has something cooking on our Distributed Cracking Grid ("Grid"). This week, it happens to be LinkedIn hashes. This level of uptime is made possible by the WebJob framework, the foundation upon which our Grid was built (check out this paper for a brief overview of the technology). WebJob's queuing system allows us to maintain a number of concurrent work orders at any given time. Today, for instance, we have 22 active work orders consisting of 151,995 jobs (or attacks) spread out over 35 queues. At any time, a single attack can be in one of several states (e.g., waiting, working, complete), and resources (i.e., GPU and CPU cores) can be shifted from queue to queue as needs dictate. Additionally, attacks within any given queue can be prioritized. All of this allows us to keep work orders active for days, weeks, or even months at a time, and that's pretty darn cool.
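
WebJob's internals aren't something I'll reproduce here, but as a rough, hypothetical sketch (the names and fields below are mine, not WebJob's), the bookkeeping described above boils down to something like this: work orders made up of many attacks, each attack in exactly one state at a time, grouped into queues whose contents can be reprioritized.

    # Hypothetical sketch only -- not WebJob's actual data model.
    from dataclasses import dataclass, field
    from enum import Enum

    class State(Enum):
        WAITING = "waiting"
        WORKING = "working"
        COMPLETE = "complete"

    @dataclass
    class Attack:
        name: str                     # e.g., "rockyou + best64 rules"
        state: State = State.WAITING
        priority: int = 0             # higher runs first within its queue

    @dataclass
    class Queue:
        name: str
        attacks: list = field(default_factory=list)

        def next_attack(self):
            """Highest-priority waiting attack, or None if the queue is drained."""
            waiting = [a for a in self.attacks if a.state is State.WAITING]
            return max(waiting, key=lambda a: a.priority, default=None)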

As the Grid's chief architect and primary developer, it's my job to keep the Grid running and add new features/capabilities over time. In this article, I'd like to share with you our aspirations and reasons for creating a cracking grid that is secure, distributed, scalable, and extensible.

Secure

With all the data breaches that have occurred and been reported in recent years, it shouldn't take much to convince you that user credentials (and password hashes by extension) are highly sought-after items. Armed with the correct set of credentials, an attacker can bridge the outsider/insider gap, and we all know that being an insider is a far better position to be in when one is bent on doing bad things.

Whether you elect to audit your own passwords or engage an external firm, like KoreLogic, to provide that service for you, it stands to reason that the audit or any aspect of it should not increase your level of exposure or risk simply because you had it done. Therefore, the process and systems used to conduct the audit are at least as important as the raw data and final results. You certainly wouldn't want the source of a data breach to be sensitive data that was handled poorly, transferred in a compromising way, or stored in the clear on unsecured systems. No, you'd want assurance that things are going to be done right (i.e., securely and with a high degree of care and attention to detail).

Since KoreLogic is a security company, we take security seriously. While no one can be 100% secure, we strive towards that goal on a daily basis. In fact, the majority of our effort is spent ensuring that the Grid continues to function in spite of the unique challenges posed by things like system hardening, full disk encryption, network- and host-based firewalls, transport layer security, mutual authentication, digitally-signed jobs, privilege separation, data segregation, and keeping things under our exclusive control (i.e., no cloud providers). That last item deserves a bit more discussion. Even though cracking in the cloud sounds, at first glance, like an attractive, inexpensive, and convenient option, it would eliminate our ability to maintain a level of security and control that meets our standards, and that's a risk we're not willing to take.
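
I won't detail our actual signing and verification mechanics here, but the "digitally-signed jobs" item follows a familiar pattern. As a rough illustration only (using Ed25519 via the third-party cryptography package, which is not necessarily what the Grid uses), a compute node should refuse to run any job whose signature doesn't verify against a key it already trusts:

    # Illustration only -- not the Grid's actual signing scheme or key handling.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    signer = Ed25519PrivateKey.generate()      # held by the job dispatcher
    trusted_key = signer.public_key()          # pre-installed on each compute node

    job = b"wordlist=rockyou.txt rules=best64 hashes=job42.txt"  # made-up payload
    signature = signer.sign(job)               # attached to the job when queued

    def run_if_trusted(payload, sig):
        try:
            trusted_key.verify(sig, payload)   # raises InvalidSignature on tampering
        except InvalidSignature:
            raise SystemExit("refusing to run unsigned or tampered job")
        # ...hand the payload to the cracker here...

    run_if_trusted(job, signature)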

Distributed

A cracking grid should be distributed. Unless you are a three-letter agency or keep a farm of supercomputers in your basement, you'll need a way to distribute the load. For weak hash algorithms (e.g., LM), this isn't as important, but for strong algorithms (e.g., bcrypt), it's an absolute requirement. Why? Because the time it would take to exhaust the keyspace is simply beyond reach. Thus, we are forced to mount targeted attacks (e.g., dictionary, rule, and mask attacks). The problem is that there are thousands of targeted attacks. In fact, there are many more attacks than a single system (or small cluster of systems) could mount in a short period of time. By distributing the load, you can execute more attacks concurrently and avoid potential limiting factors such as insufficient cooling and/or power.
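
To put rough numbers on that, here's a back-of-envelope comparison using made-up but plausible round hash rates; the exact figures don't matter, the ratio does:

    # Back-of-envelope only; the hash rates are illustrative round numbers.
    keyspace = 95 ** 8                  # every 8-character printable-ASCII password

    fast_rate = 10_000_000_000          # ~10 GH/s: a fast, unsalted algorithm on GPUs
    slow_rate = 100_000                 # ~100 KH/s: bcrypt at a modest cost factor

    year = 60 * 60 * 24 * 365
    print(keyspace / fast_rate / year)  # ~0.02 years -- exhaustive search is practical
    print(keyspace / slow_rate / year)  # ~2,100 years -- targeted attacks are the only option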

Our Grid is composed of both GPU- and CPU-based compute nodes spread out over a number of different geographic locations. Since the underlying technology is not tied to a single physical location, compute nodes can be placed anywhere there's adequate physical security, power, cooling, and network connectivity. This makes it extremely easy to add capacity or support custom requests like: "Hey, we have GPU compute nodes in several data centers. Can you guys combine those resources to attack this hash set?"

Scalable

A cracking grid should be scalable. This becomes apparent as soon as you need to crack multiple hash sets concurrently or find that the number of GPU/CPU cores at your disposal is less than what's needed to get the job done. Oh, you can manage for a while using ad hoc methods, but you will eventually reach a tipping point where your capacity and concurrency requirements can't be met.

Our Grid scales in both ways: capacity and concurrency. New compute nodes can be added at any time to increase capacity. Once added to the Grid through a secure on-boarding process, each GPU/CPU core in the compute node is immediately available as an individual cracking resource. Similarly, new queues can be created at any time to support an increased set of concurrent work orders. As long as at least one core can be assigned to each queue, forward progress can be made.
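
A hypothetical sketch of that last point (the names are mine, not the Grid's): deal cores out round-robin so every active queue gets a core before any queue gets a second one, and both kinds of growth, more cores or more queues, translate directly into forward progress.

    # Hypothetical scheduling sketch -- not the Grid's actual resource manager.
    from itertools import cycle

    def assign_cores(cores, queues):
        """Deal cores out round-robin; every queue gets one before any gets two."""
        assignment = {q: [] for q in queues}
        for core, queue in zip(cores, cycle(queues)):
            assignment[queue].append(core)
        return assignment

    print(assign_cores([f"gpu-{i}" for i in range(5)],
                       ["linkedin", "pentest", "research"]))
    # {'linkedin': ['gpu-0', 'gpu-3'], 'pentest': ['gpu-1', 'gpu-4'], 'research': ['gpu-2']}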

Extensible

A cracking grid should be extensible. Personally, I'd prefer to find one good cracking solution and stick with it, but in today's ever-changing world, that's an idealistic view, at best. Interests change. Solutions come and go; some ebb and flow. GPU models (or even brands) of choice change, targeted hash types change, reporting requirements change, and so on. Thus, any cracking grid worth its salt must be flexible and readily adaptable to keep up with the changing tides.

Our Grid can be extended, with minimal effort, to support new cracking applications and tasks. Originally, the Grid only supported one cracking solution: John the Ripper (JtR). As GPU cracking became a standard part of our practice, integrating support for oclHashcat was simply a matter of creating a handful of new scripts closely patterned on the existing JtR ones. When support for cudaHashcat was added, only a few lines of code changed. When we needed to count guesses while cracking for a research collaboration with Carnegie Mellon University, only a small C program and a few minor script changes were required. In all cases, the same basic framework was used.
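
The production scripts themselves aren't reproduced here, but the shape of the change is easy to sketch (command lines simplified, names hypothetical): each cracker is just a small function that turns a generic job description into a command line, so supporting a new backend means adding one entry while the queuing, transport, and result handling stay untouched.

    # Simplified sketch -- not the Grid's production scripts.
    import subprocess

    def jtr_cmd(job):
        return ["john", f"--wordlist={job['wordlist']}", "--rules", job["hashes"]]

    def hashcat_cmd(job):
        return ["hashcat", "-m", str(job["mode"]), "-a", "0",
                job["hashes"], job["wordlist"]]

    BACKENDS = {"jtr": jtr_cmd, "hashcat": hashcat_cmd}   # extend by adding an entry

    def run_attack(job):
        cmd = BACKENDS[job["cracker"]](job)               # pick the right wrapper
        return subprocess.run(cmd, capture_output=True, text=True)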


Posted by Klayton at: 11:30