Malicious PDF Document Analysis

Understand the PDF file structure

1. Header - Contains the PDF version number (for example, %PDF-1.3).

2. Body - Contains the objects. Each object is identified by two numbers, its object number and its generation (version) number, and the keywords obj and endobj mark where it begins and ends.

3. Cross Reference Table - Begins with the keyword xref and lists the byte offset of each object from the start of the file, so that a PDF reader can locate objects without loading the whole document.

4. Trailer - Contains overall information about the PDF and points to the start of the Cross Reference Table.
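
The four parts can be located by their keywords alone. The following is a minimal Python sketch, not a real parser: the tiny SAMPLE document and the function name describe_layout are made up for illustration, and the code assumes a classic xref table with newline line endings.

```python
import re

# A tiny hand-written PDF used only for illustration (not the analyzed sample).
SAMPLE = (
    b"%PDF-1.3\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n45\n%%EOF\n"
)


def describe_layout(data: bytes) -> dict:
    """Locate the header, objects, xref table and trailer by keyword."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # "<object number> <generation number> obj" opens every indirect object.
    objects = re.findall(rb"(\d+)\s+(\d+)\s+obj\b", data)
    xref = re.search(rb"(?m)^xref\b", data)
    trailer = re.search(rb"(?m)^trailer\b", data)
    startxref = re.search(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode() if header else None,
        "objects": [(int(num), int(gen)) for num, gen in objects],
        "xref_offset": xref.start() if xref else None,
        "trailer_offset": trailer.start() if trailer else None,
        "startxref": int(startxref.group(1)) if startxref else None,
    }


if __name__ == "__main__":
    print(describe_layout(SAMPLE))
```

In this hand-built sample the number after startxref equals the byte offset of the xref keyword, which is how a reader finds the table from the end of the file.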

PDF file actions:

1. /OpenAction /AA - Specify an action to carry out automatically, e.g. executing a script. /OpenAction runs when the document is opened; /AA (additional actions) runs on other events.

2. /JavaScript /JS - Indicate JavaScript embedded in the PDF, for example a script that runs when the PDF is opened.

3. /Names - Names of files that will likely be referred to by the PDF itself.

4. /EmbeddedFile - Indicates other files embedded within the PDF itself, e.g. scripts.

5. /URI /SubmitForm - Link to URLs on the internet, e.g. a possible link to a second-stage payload or additional tools for the malware to run.

6. /Launch - Similar to /OpenAction; can be used to run scripts embedded within the PDF itself or to run additional files that the PDF has downloaded.
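
A first triage pass can simply count how often these keywords occur in the raw file. The sketch below is a rough illustration in Python and not a substitute for a dedicated tool: the names SUSPICIOUS and count_keywords are hypothetical, and the code does not normalise obfuscated names (for example names written with #xx hex escapes) or look inside compressed streams.

```python
import re

# Keywords from the list above that usually deserve a closer look.
SUSPICIOUS = [
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/Names",
    "/EmbeddedFile", "/URI", "/SubmitForm", "/Launch",
]


def count_keywords(data: bytes) -> dict:
    """Count plain (non-obfuscated) occurrences of each keyword."""
    counts = {}
    for key in SUSPICIOUS:
        # The keyword must not continue with another letter or digit,
        # so that /JS does not also match a longer name such as /JSFoo.
        pattern = re.escape(key.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[key] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    demo = b"<< /OpenAction 5 0 R >> << /S /JavaScript /JS 13 0 R >>"
    for key, value in count_keywords(demo).items():
        print(f"{key:15} {value}")
```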

String and Data Encoding

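PDF can encode strings and names in several ways, which malware authors use to obfuscate data. For example, a name can contain #xx hex escapes (/J#61vaScript is /JavaScript), a string can be written in hexadecimal between angle brackets (<4A53> is JS), and a literal string can use octal escapes (\112\123 is JS).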
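
To make such encodings concrete, here is a small Python sketch that undoes three common forms: #xx hex escapes in names, hex strings in angle brackets, and octal escapes in literal strings. The function names are hypothetical helpers written for this example, and the code covers only these three cases.

```python
import re


def decode_name(name: bytes) -> bytes:
    """Resolve #xx hex escapes inside a PDF name."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        name,
    )


def decode_hex_string(token: bytes) -> bytes:
    """Decode a hex string such as <4A 61>; whitespace is ignored."""
    digits = re.sub(rb"\s+", b"", token.strip()[1:-1])
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))


def decode_octal_escapes(literal: bytes) -> bytes:
    """Resolve backslash-octal escapes (one to three digits) in a literal string."""
    return re.sub(
        rb"\\([0-7]{1,3})",
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        literal,
    )


if __name__ == "__main__":
    print(decode_name(b"/J#61vaScript"))
    print(decode_hex_string(b"<4A 61 76 61>"))
    print(decode_octal_escapes(rb"\112\123"))
```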

PDF also uses filters: a /Filter entry tells the PDF reader which method to use to decode the corresponding encoded data, e.g. /FlateDecode for zlib-compressed data.
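
As an illustration of how filters work, the Python sketch below decodes data that has passed through two standard filters, /ASCIIHexDecode and /FlateDecode. The DECODERS table and the function names are assumptions made for this example; a real reader supports more filters and extra decode parameters.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """/ASCIIHexDecode: hex digits, whitespace ignored, '>' ends the data."""
    digits = b"".join(data.split(b">")[0].split())
    if len(digits) % 2:
        digits += b"0"
    return binascii.unhexlify(digits)


def flate_decode(data: bytes) -> bytes:
    """/FlateDecode: zlib (deflate) compression."""
    return zlib.decompress(data)


# Only two filters are handled in this sketch.
DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/FlateDecode": flate_decode,
}


def apply_filters(data: bytes, filters: list) -> bytes:
    """Apply the filters in the order they are listed in /Filter."""
    for name in filters:
        data = DECODERS[name](data)
    return data


if __name__ == "__main__":
    script = b"app.alert('hi');"
    encoded = binascii.hexlify(zlib.compress(script)) + b">"
    print(apply_filters(encoded, ["/ASCIIHexDecode", "/FlateDecode"]))
```

Chaining several filters is a common way to hide a script from simple string searches, which is why analysis tools offer an option to apply them for you.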

Tools used for Analysis

pdfid - Identifies PDF object types and filters.

pdf-parser - Parses, searches and extracts data from PDF documents.

peepdf - Combines the features of pdfid and pdf-parser: it can find suspicious objects, decode data, and has built-in JavaScript analysis.

Malware Sample

MD5: 2264DD0EE26D8E3FBDF715DD0D807569

SHA256: ad6cedb0d1244c1d740bf5f681850a275c4592281cdebb491ce533edd9d6a77d
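
Before analysis it is worth confirming that the file on disk matches these hashes. A minimal Python sketch follows; the function name file_hashes is a hypothetical helper, and the path passed to it is whatever location the sample was saved to.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5 (upper case) and SHA256 (lower case) of a file."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest().upper(), "sha256": sha256.hexdigest()}
```

Calling file_hashes("badpdf.pdf") and comparing the result with the two values above shows whether the correct sample is being analyzed.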

Tool - pdfid

REMnux: pdfid.py "location of badpdf.pdf file"

  • The output indicates that the PDF version is 1.3 and that the PDF contains 14 objects, 2 streams, and JavaScript objects.

Tool - pdf-parser

  • pdf-parser will extract all data from the PDF. To narrow down the search, use its built-in command-line options such as --search.

  • Use pdf-parser with --search to show the /OpenAction object.

REMnux: pdf-parser.py --search openaction badpdf.pdf

  • Now let’s search for the JavaScript object with pdf-parser.

REMnux: pdf-parser.py --search javascript badpdf.pdf

  • Now let’s look at the JavaScript objects 10 and 13.

REMnux: pdf-parser.py --object 10 badpdf.pdf

REMnux: pdf-parser.py --object 13 badpdf.pdf

  • Object 10 references object 12 and calls the /Names object (New_Script).

  • Object 13 stores the actual JavaScript. It contains a /Filter entry set to /FlateDecode, which means the stream is compressed.

  • Now let’s look at object 13 in more detail. We will use the -f (apply filters) and -w (raw output) options to check this object.

REMnux: pdf-parser.py --object 13 -f -w badpdf.pdf

  • To check the JavaScript code, we need to dump it into a separate file and then use a JavaScript editor or the peepdf tool.

REMnux: pdf-parser.py --object 13 -f -w -d obj13 badpdf.pdf
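
For readers curious about what such a dump involves, the Python sketch below finds one object in the raw file and decompresses its stream. It is a simplified illustration and not how pdf-parser is implemented: extract_stream is a hypothetical name, only /FlateDecode is handled, and the search would be confused by compressed data that happens to contain the bytes endstream or endobj.

```python
import re
import zlib


def extract_stream(data: bytes, obj_num: int, gen: int = 0) -> bytes:
    """Return the stream of one indirect object, inflated if /FlateDecode is set."""
    obj_pattern = rb"(?<!\d)%d\s+%d\s+obj\b(.*?)endobj" % (obj_num, gen)
    obj_match = re.search(obj_pattern, data, re.DOTALL)
    if obj_match is None:
        raise ValueError("object not found")
    body = obj_match.group(1)

    stream_match = re.search(rb"stream\r?\n(.*?)endstream", body, re.DOTALL)
    if stream_match is None:
        raise ValueError("object has no stream")
    dictionary = body[:stream_match.start()]
    raw = stream_match.group(1)

    if b"/FlateDecode" in dictionary:
        # decompressobj tolerates the end-of-line bytes left before endstream.
        return zlib.decompressobj().decompress(raw)
    return raw


if __name__ == "__main__":
    script = b"var x = 1;"
    packed = zlib.compress(script)
    pdf = (b"%PDF-1.3\n13 0 obj\n<< /Length " + str(len(packed)).encode()
           + b" /Filter /FlateDecode >>\nstream\n" + packed
           + b"\nendstream\nendobj\n")
    print(extract_stream(pdf, 13))
```

The returned bytes could then be written to a file such as obj13 for review in an editor.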

  • We will use the peepdf tool to look at the JavaScript.

Tool - peepdf

  • Use the command below in the peepdf console to save the JavaScript to a file.

REMnux: PPDF> object 13 > obj13.js

Script Obfuscation Techniques :

  • Formatting - Modifies the format of the code to make it difficult to read.

  • Extraneous Code - Adds extra lines of code to confuse analysts.

  • Data Obfuscation - Uses operations to make data unreadable or confusing.

  • Substitution - Replaces variable names with random, meaningless names.
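
As one illustration of undoing data obfuscation, the Python sketch below rewrites String.fromCharCode(...) calls with decimal arguments into the plain string they build. This is an assumed example of a common trick, not something the source says this sample uses, and fold_from_char_code is a hypothetical helper name.

```python
import json
import re


def fold_from_char_code(js: str) -> str:
    """Replace String.fromCharCode(n, n, ...) with the literal it produces."""
    def replace(match):
        tokens = match.group(1).split(",")
        codes = [int(tok) for tok in tokens if tok.strip()]
        text = "".join(chr(code) for code in codes)
        # json.dumps yields a double-quoted, escaped string literal.
        return json.dumps(text)

    pattern = r"String\.fromCharCode\(\s*([\d,\s]+?)\s*\)"
    return re.sub(pattern, replace, js)


if __name__ == "__main__":
    print(fold_from_char_code("eval(String.fromCharCode(97, 108,101,114,116))"))
```

Folding constants like this before reading the script makes the remaining logic much easier to follow.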

  • We modified the downloaded JavaScript and executed it to find out what is inside the shellcode.

  • We will use the SpiderMonkey tool to execute the JavaScript.

  • After execution, three log files are created, containing the binary, Unicode, and ASCII representations of the shellcode.
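
The relationship between the Unicode and binary representations can be sketched in Python. The example assumes the shellcode is written in the %uXXXX form that JavaScript's unescape() accepts, which is common in malicious PDFs but is not stated in the source for this sample; unescape_to_bytes is a hypothetical helper name.

```python
import re
import struct


def unescape_to_bytes(escaped: str) -> bytes:
    """Convert %uXXXX and %XX escapes into raw bytes.

    Each %uXXXX unit is a 16-bit value stored little-endian in memory,
    so %u4142 becomes the bytes 0x42 0x41. Other characters are ignored.
    """
    out = bytearray()
    for match in re.finditer(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})", escaped):
        if match.group(1):
            out += struct.pack("<H", int(match.group(1), 16))
        else:
            out.append(int(match.group(2), 16))
    return bytes(out)


if __name__ == "__main__":
    print(unescape_to_bytes("%u9090%u4142").hex())
```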

  • We used the hexdump and strings commands to look into the log files.
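
What the strings step does can be approximated in a few lines of Python: scan the bytes for runs of printable ASCII characters. This is a rough sketch with a hypothetical function name and a default minimum length of 4; the byte string in the demo, including its URL, is made up and does not come from the sample.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


if __name__ == "__main__":
    blob = b"\x90\x90http://example.test/a.exe\x00\x01abc\x00cmd.exe"
    for text in ascii_strings(blob):
        print(text)
```

Readable strings such as URLs or file names inside shellcode often reveal what the payload tries to download or execute.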
