KoreLogic Blog

2015-01-12 16:00

JavaScript is often used to facilitate web-based attacks. To make analysis more difficult and hide from signature-based systems, attackers will often obfuscate their JavaScript. Fortunately, there are many ways to deobfuscate JavaScript, or at least determine what it is doing. Sometimes, however, you come across obfuscated JavaScript that just makes your brain bleed.

UPDATE: Some have requested the actual JS used in this analysis, so here it is:

https://blog.korelogic.com/2015/01/12/javascript_deobfuscation/malJS.zip (MD5: 8ad201d4dba1e19295ea1162308f3c0b, pass: infected)

In the last few days there has been a Dyre (banking trojan) spam campaign with the subject lines "Fax #123456" or "Employee Documents - Internal Use". The emails contain a link to a web page that loads two obfuscated JavaScript pages - each of which looks like this:

If there ever was obfuscated JavaScript that made you want to crawl under your desk and cry, this is it.

A common methodology to deobfuscate malicious JavaScript (JS) is to run it in a modified interpreter, such as the SpiderMonkey modified by Didier Stevens. These programs run the obfuscated JavaScript and give you the output from JS eval or document.write commands, which are often used in obfuscated JavaScript.

Unfortunately, there are times these programs don't work. When that happens, if you want to see the deobfuscated code you have to do some manual analysis. This post will illustrate how to manually decode the JavaScript from this attack.

JJEncode

This obfuscated JavaScript is encoded using JJEncode, a JavaScript encoder. Unfortunately, I did not realize this until halfway through manual decoding. There is an excellent paper on how JJEncode works by Peter Ferrie, and two automated deobfuscation tools by Jacob Soo and Nahuel Riva. Any duplication of information from these resources is accidental.

Despite the availability of these resources, it is still worth examining how the JJEncoded JavaScript can be manually decoded so you can use these techniques in future deobfuscation attempts.

The Obfuscated Code

This JavaScript hurts to look at - mostly due to the lack of line breaks and the obfuscated variable names. However, if we examine the code one line at a time, we will start to get an idea of what it is doing. Code beautifiers, like JS Nice, work great for inserting line breaks automatically. Doing so with the obfuscated JavaScript shows that we are dealing with only 6 lines of code.

As we shall see during the analysis, JJEncode is essentially a substitution encoding that goes through three phases:

Initialization, where characters and values are assigned to variables.
Substitution, where the variables are used to construct code.
Execution, where the constructed code is executed.

As each line is analyzed, we will see these phases and construct the deobfuscated code.

Line 1

The first line consists of:

$ = ~[];

JavaScript variable names are pretty flexible in the characters that can be used, so the above variable name of "$" is a valid name. Unlike in a number of other languages, such as Perl, the dollar sign is not a reserved character and can therefore be used in any part of a variable name. This allows the other variable names that we'll see in this JS, such as $_, $$$, and _$_.

Line 1 is an assignment statement, assigning the value of ~[] to the variable $. The tilde character is the bitwise NOT operator, and [] is an empty JavaScript array. What happens when you NOT an empty array? The array is first converted to a number, which for an empty array is 0, and the bitwise NOT of 0 is -1.

So, this statement assigns -1 to the variable $.
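The coercion behind this result can be checked one step at a time. The following is a minimal sketch and not part of the original sample; the intermediate variable names are made up for illustration.

```javascript
// Step-by-step view of why ~[] evaluates to -1.
// The intermediate names (asString, asNumber, notted) are illustrative only.
const asString = String([]);        // an empty array converts to the empty string ""
const asNumber = Number(asString);  // "" converts to the number 0
const notted = ~asNumber;           // bitwise NOT of 0 flips every bit, giving -1

let $ = ~[];                        // the same conversion in one step, as in line 1

console.log(asString === "", asNumber, notted, $); // true 0 -1 -1
```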

Line 2

The second line is a bit longer, but broken up we see:

$ = {
  ___ : ++$,
  $$$$ : (![] + "")[$],
  __$ : ++$,
  $_$_ : (![] + "")[$],
  _$_ : ++$,
  $_$$ : ({} + "")[$],
  $$_$ : ($[$] + "")[$],
  _$$ : ++$,
  $$$_ : (!"" + "")[$],
  $__ : ++$,
  $_$ : ++$,
  $$__ : ({} + "")[$],
  $$_ : ++$,
  $$$ : ++$,
  $___ : ++$,
  $__$ : ++$
};

In this line, $ is being reassigned to a JavaScript object, as denoted by the curly braces. Properties of the object are defined within the braces in the form "name : value", and individual properties are separated by commas.

The first property is ___ (3 underscores), or its full name, $.___. The value of this property is ++$, which takes the value of $ (currently -1), increments it (to 0), and then assigns it to the property. So, in this statement, $ is incremented by one to 0 and then assigned to $.___. Note that since the object is still being built, $ is still a number and not an object yet.

The second property is $$$$. The value of this property is (![] + "")[$].

The first part of this value is (![] + ""). ![] is a logical NOT applied to an array. Because an array is an object, and objects are always truthy, the result is the boolean value false. Concatenating it with an empty string turns the value into a string. Therefore, (![] + "") evaluates to the string "false".

However, there is a [$] after the string "false". In JavaScript, a letter of a string can¹ be obtained by specifying the index of the character within brackets (string positions start at 0). Here, $ currently evaluates to 0, so this line is asking for the character at position 0 in the string "false", or "f".

Iterations for the explanation above are shown below to better illustrate the process.

$$$$ : (![] + "")[$]
1. $$$$ : (false + "")[$]
2. $$$$ : ("false")[$]
3. $$$$ : ("false")[0]
4. $$$$ : "f"

The rest of the object properties are constructed in a similar fashion: incrementing the $ variable, constructing a string, and grabbing a character out of the string by specifying its index.
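As a rough illustration of where the letters come from, the sketch below builds the four source strings that line 2 indexes into. The `sources` object and its property names are hypothetical helpers, and `$` is fixed at 2 only so that `$[$]` can be evaluated on a number, as it is in the sample.

```javascript
// The four strings that line 2 indexes into, produced purely by type coercion.
// `sources` and its property names are illustrative, not from the sample.
let $ = 2; // stands in for the counter while the object is still being built

const sources = {
  falseStr: ![] + "",       // NOT of an array is false        -> "false"
  objectStr: {} + "",       // an object converted to a string -> "[object Object]"
  undefinedStr: $[$] + "",  // a number has no property "2"    -> "undefined"
  trueStr: !"" + "",        // NOT of an empty string is true  -> "true"
};

console.log(sources.falseStr[0], sources.falseStr[1]);   // f a
console.log(sources.objectStr[2], sources.objectStr[5]); // b c
console.log(sources.undefinedStr[2]);                    // d
console.log(sources.trueStr[3]);                         // e
```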

After decoding the entire object, the values are as follows:

$ = {
  ___  : 0,
  $$$$ : "f",
  __$  : 1,
  $_$_ : "a",
  _$_  : 2,
  $_$$ : "b",
  $$_$ : "d",
  _$$  : 3,
  $$$_ : "e",
  $__  : 4,
  $_$  : 5,
  $$__ : "c",
  $$_  : 6,
  $$$  : 7,
  $___ : 8,
  $__$ : 9
};

What do we have? The hexadecimal alphabet! The purpose of this whole statement was to produce the hexadecimal alphabet for use in the later substitutions.
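The decoded values can be confirmed by running lines 1 and 2 as they stand, since neither does anything beyond building data. This is a small sketch; the `hexAlphabet` variable is added here only to summarize the result.

```javascript
let $ = ~[]; // line 1: $ starts at -1

// Line 2, as in the sample. While the literal is being evaluated, $ is
// still a number, so each ++$ bumps the counter used as the next index.
$ = {
  ___ : ++$,
  $$$$ : (![] + "")[$],
  __$ : ++$,
  $_$_ : (![] + "")[$],
  _$_ : ++$,
  $_$$ : ({} + "")[$],
  $$_$ : ($[$] + "")[$],
  _$$ : ++$,
  $$$_ : (!"" + "")[$],
  $__ : ++$,
  $_$ : ++$,
  $$__ : ({} + "")[$],
  $$_ : ++$,
  $$$ : ++$,
  $___ : ++$,
  $__$ : ++$
};

// Gather every property value, turn it into a string, and sort.
const hexAlphabet = Object.values($).map(String).sort().join("");
console.log(hexAlphabet); // 0123456789abcdef
```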

Line 3

The next three lines construct more variables used for substitution. Line 3 (split across lines for readability) is:

$.$_=($.$_=$+"")[5]+
     ($._$=$.$_[1])+
     ($.$$=($.$+"")[1])+
     ((!$)+"")[3]+
     ($.__=$.$_[6])+
     ($.$=(!""+"")[1])+
     ($._=(!""+"")[2])+
     $.$_[5]+
     $.__+
     $._$+
     $.$;

This is an assignment to a new property in the $ object, $.$_. The value of the property is constructed by concatenating values together, as denoted by the plus operator. Each value is grabbing a character using an index, so this is likely a string being constructed. We can evaluate each of the values to get the entire string.

The first character is ($.$_=$+"")[5]. This operation assigns the value of $+"" to $.$_, then takes the 6th character (index 5 is the 6th character). $+"" is the string "[object Object]", and the 6th character is "c".
The second character is ($._$=$.$_[1]). $.$_ was previously assigned the value of "[object Object]", so the 2nd character (index 1) is "o". Note this also assigns "o" to $._$.
Third, we have ($.$$=($.$+"")[1]) which assigns a value to $.$$. The value assigned is ($.$+"")[1]. $.$ has not been seen yet, so it is undefined, thus creating the string "undefined". The second letter is "n", so "n" is assigned to $.$$.
Fourth is ((!$)+"")[3]. This obtains the 4th letter of the string created by ((!$)+""). Performing a boolean NOT (the '!' operator) on an object returns false, so the string "false" is created. The fourth letter is "s".
($.__=$.$_[6]) is the fifth operation, which assigns the 7th letter of $.$_ (the string "[object Object]") to $.__. The 7th letter is "t".
Sixth, ($.$=(!""+"")[1]) assigns a value to $.$. The value assigned is the 2nd letter of !""+"". In JavaScript, an empty string is falsy, so a logical NOT of it yields the value true. The operation !""+"" creates the string "true", the 2nd letter of which is "r".
The seventh operation, ($._=(!""+"")[2]), gets the 3rd character from the same string ("true"), "u", and assigns it to $._.
Eighth, the 6th character of $.$_ ("[object Object]") is obtained, "c".
The last three characters are composed of object properties that have already been assigned values: $.__, $._$, and $.$. These values are "t", "o", and "r", respectively.

In the end, this line of code creates the string "constructor" and assigns it to $.$_.
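The walkthrough above can be checked by running line 3 on its own. In this sketch an empty object stands in for $, which is an assumption made for brevity: line 3 only needs $ to be an object (so that $+"" is "[object Object]") with $.$ not yet defined.

```javascript
// Assumption: a fresh empty object stands in for the $ built on line 2.
let $ = {};

// Line 3, with the numeric indexes already substituted as in the text.
$.$_ = ($.$_ = $ + "")[5] +      // "c" from "[object Object]"
       ($._$ = $.$_[1]) +        // "o" from "[object Object]"
       ($.$$ = ($.$ + "")[1]) +  // "n" from "undefined"
       ((!$) + "")[3] +          // "s" from "false"
       ($.__ = $.$_[6]) +        // "t" from "[object Object]"
       ($.$ = (!"" + "")[1]) +   // "r" from "true"
       ($._ = (!"" + "")[2]) +   // "u" from "true"
       $.$_[5] +                 // "c" again
       $.__ +                    // "t"
       $._$ +                    // "o"
       $.$;                      // "r"

console.log($.$_);                        // constructor
console.log($._$, $.$$, $.__, $.$, $._);  // o n t r u
```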

Line 4

Lines 4 and 5 construct two strings in a similar fashion.

Line 4's string is constructed through the code:

$.$$=$.$+
     (!""+"")[3]+
     $.__+
     $._+
     $.$+
     $.$$;

Most of the characters in the string use previously assigned values that we can substitute in, giving us:

$.$$="r"+(!""+"")[3]+"t"+"u"+"r"+"n"

The only letter not substituted is constructed through (!""+"")[3]. This operation creates the string "true", the fourth character of which is "e". So, this line creates the string "return".
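Line 4 can be verified the same way. As an assumption for this sketch, only the four properties that line 4 reads are set up, using the values decoded from line 3.

```javascript
// Assumption: only the properties line 4 reads, with their decoded values.
let $ = { $: "r", __: "t", _: "u", $$: "n" };

// Line 4 as in the sample. The right-hand side is evaluated first,
// so the trailing $.$$ still holds "n" when it is read.
$.$$ = $.$ +
       (!"" + "")[3] +  // "e" from "true"
       $.__ +
       $._ +
       $.$ +
       $.$$;

console.log($.$$); // return
```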

Line 5

The last string constructed is on line 5:

$.$=(0)[$.$_][$.$_];

$.$_ is equal to the word "constructor", so we have the operation (0)["constructor"]["constructor"]. Typing that into a JavaScript interpreter returns the following function:

function Function() {
    [native code]
}

This is JavaScript's built-in Function constructor: (0)["constructor"] is Number, and Number["constructor"] is Function. Since it was not passed any data, no new function has been created yet. However, if we were to pass a string of JavaScript code into it, as shown in the example below, we would create an anonymous JavaScript function whose body is that code:

js> (0)["constructor"]["constructor"]("alert ('hi!');")
function anonymous() {
    alert("hi!");
}

For the purposes of our decoding, this statement stores the Function constructor in $.$, so wherever $.$ is called later in the obfuscated JavaScript we can read it as creating a function from a string of code.
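The lookup chain can be reproduced in any modern interpreter. The sketch below uses a harmless stand-in payload ("return 6 * 7;"), which is hypothetical and not taken from the sample; the variable names are also made up for illustration.

```javascript
const word = "constructor";             // the string built on line 3

const numberCtor = (0)[word];           // the constructor of a number is Number
const functionCtor = numberCtor[word];  // the constructor of Number is Function

console.log(numberCtor === Number, functionCtor === Function); // true true

// Passing source code as a string yields a callable function.
// Hypothetical payload, chosen only because it is safe to run.
const built = functionCtor("return 6 * 7;");
console.log(typeof built, built()); // function 42
```

When working with a real sample, it is safer to stop once the code string has been recovered and to print it rather than call the resulting function.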

For those keeping track, our $ object now has the following values (note that $.$$ and $.$, which briefly held "n" and "r", were overwritten on lines 4 and 5):

$ = {
  ___  : 0,
  $$$$ : "f",
  __$  : 1,
  $_$_ : "a",
  _$_  : 2,
  $_$$ : "b",
  $$_$ : "d",
  _$$  : 3,
  $$$_ : "e",
  $__  : 4,
  $_$  : 5,
  $$__ : "c",
  $$_  : 6,
  $$$  : 7,
  $___ : 8,
  $__$ : 9,
  $_   : "constructor",
  _$   : "o",
  __   : "t",
  _    : "u",
  $$   : "return",
  $    : Function
}

Line 6

At this point we have performed all of the data initialization. Line 6 is where the substitution and execution of the deobfuscated code occurs. This happens simultaneously in the JavaScript code, but we can separate it out to view what is occurring. By substituting in the values we know about, we get a clearer picture of what the code is doing.

Be careful when doing this: a naive search and replace risks substituting incorrect values, because many property names are substrings of others (for example, $.$ appears inside both $.$$ and $.$_). Performing the substitution is left as an exercise to the reader, but when done you will get the following code.

As seen above, line 6 creates an anonymous function that is executed. The function's code is created by substituting values that were constructed earlier in the JavaScript. Since this code is currently a string of concatenations, we can bring the string together to be a bit more readable.

The code still isn't entirely clear, but it is much better than before.

A number of characters in the obfuscated code above are in the format backslash followed by a number. In JavaScript, this is the format used to represent a character by its octal (base 8) value.

The octal values can be replaced with their ASCII equivalents to remove this level of obfuscation. This leaves us with the unobfuscated code that is executed within the anonymous function.
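Replacing the octal values by hand is tedious, so a few lines of script can do it without executing anything. This is a minimal sketch: `decodeOctalEscapes` is a hypothetical helper name, and the sample string is made up for illustration rather than taken from the malicious JavaScript.

```javascript
// Replace JavaScript octal escapes (\NNN) with the characters they encode.
// decodeOctalEscapes is a hypothetical helper, not part of any existing tool.
function decodeOctalEscapes(text) {
  // Three digits are only valid up to \377; otherwise take one or two digits.
  return text.replace(/\\([0-3][0-7]{2}|[0-7]{1,2})/g, (match, digits) =>
    String.fromCharCode(parseInt(digits, 8))
  );
}

// Made-up sample. The backslashes are doubled so that the string holds the
// literal text \141\154... instead of letting the parser decode it.
const sample = "\\141\\154\\145\\162\\164(\\47\\150\\151\\47)";

console.log(decodeOctalEscapes(sample)); // alert('hi')
```

Because the helper only rewrites text, the recovered code can be read without ever being run.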

Conclusion

The JJEncoded JavaScript looks daunting (and brain bleeding) at first. The obfuscation makes use of non-standard and repetitive variable names, and strings constructed in odd ways, to fool analysts into thinking that it is much harder to deobfuscate than it actually is. However, by moving slowly and taking the code one line at a time, we can remove the obfuscation to get at the actual code beneath.

While there are automatic decoders for JJEncode available, the next obfuscation technique you come across might not have a decoder. Tools help, but they only get you so far and won't work all the time, so being able to perform deobfuscation manually is a skill worth having.

Notes

1. Interestingly, using indexes to obtain string characters was not always a standard JavaScript feature, and therefore may not work in older browsers.

Posted by Tyler at 16:00

Rob wrote at 2015-01-15 10:05:

Wow...great job both in the deobfuscation of the code but also in the very clear documentation to teach something that is very complex and break it down in such a way that even a layman like myself can understand. Thanks!

tesla wrote at 2015-01-29 20:43:

Thank you for sharing it, amazing piece.

Tom wrote at 2015-02-02 18:21:

Well presented, step by step, thanks, it was very interesting.

sgaawc wrote at 2015-04-17 12:59:

Interesting article.

Vladas wrote at 2015-04-17 15:25:

Very nice article! And very useful.

SylvainPV wrote at 2015-04-17 15:12:

Well done Tyler ! Now I have another challenge for you: http://pastebin.com/cSYS0w1f#

alucab wrote at 2015-04-18 11:56:

this is a great post
many compliments for the work

Vikash wrote at 2015-04-20 13:10:

Excellent article. After reading this, i got the confidence of decoding the greek JS to understandable one :)

Pradash wrote at 2015-05-16 07:25:

Funny but it only works with some small files it seems. You can use jscrewit and encode the whole jquery: http://jscrew.it