Introduction to Reverse Engineering with Ghidra

Learn how to reverse engineer software using Ghidra! This four-session course will walk you through the basics.

Instructor: wrongbaud
Thursday, September 24, 2020 12:00 am GMT

Updates / Information

For regular updates regarding class materials, follow wrongbaud and voidstarsec on Twitter.

Course Overview

This is a four-session course that covers the basics of reverse engineering software with Ghidra. Each session has exercises to complete, which can be found on the project GitHub page.

  1. Session One Lecture
  2. Session Two Lecture
  3. Session Three Lecture
  4. Session Four Lecture

Exercises and materials can be found here.

Hardware Requirements

  • 8GB RAM 

Software Requirements

  1. Docker (or an Ubuntu 18.04 VM)
  2. The Ghidra SRE Tool

Getting Started

  1. Download Ghidra from here
  2. Download the exercises / Docker container from here
    • git clone https://github.com/wrongbaud/hackaday-u
  3. Build the Docker container (Note: if you are using an Ubuntu 18.04 VM instead of Docker, skip to step 5)
    • cd hackaday-u/docker
      docker build . -t hackaday
  4. Test the Docker container (If using Ubuntu 18.04, skip to step 5!)
    • docker run --rm -it hackaday /bin/bash
  5. Run a challenge binary as a test!
    • root@522471199b16:/home/hackaday# ./hackaday-u/session-one/exercises/c1 
      Please supply the password!
      root@522471199b16:/home/hackaday# ./hackaday-u/session-one/exercises/c1 test
      Wrong answer, we'd never use test as the password!

The goal of each challenge is to bypass the password check or supply the correct password. Over the course of the sessions, the amount of information that you have to provide will change and the complexity of the passwords will increase.
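
As a rough illustration of the kind of check these challenges implement, here is a minimal sketch in Python. It is not the logic of c1: the `EXPECTED` value, the `check` function, and the success message are hypothetical, and only the two failure messages are taken from the terminal session above.

```python
# Hypothetical placeholder: this is NOT the real password for c1.
EXPECTED = "not-the-real-password"


def check(argv):
    """Return the message a simple crackme might print for the given argv."""
    # No argument supplied: ask for a password, as seen in the session above.
    if len(argv) < 2:
        return "Please supply the password!"
    guess = argv[1]
    # A real challenge usually hides or transforms the comparison;
    # a plain string compare is the simplest possible case.
    if guess == EXPECTED:
        return "Correct!"  # hypothetical success message
    return f"Wrong answer, we'd never use {guess} as the password!"


print(check(["./c1"]))
print(check(["./c1", "test"]))
```

When reversing a binary like this, the interesting part is the comparison: finding it in the disassembly or decompilation tells you either what input to supply or which branch to patch.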

Course Goals

  • Familiarize students with the basic concepts behind software reverse engineering
    • x86_64 Architecture Review
    • Identifying C constructs in assembly code
    • Disassembly vs Decompilation
  • Teach students how to use the Ghidra SRE tool to reverse engineer Linux-based binaries
    • Basic navigation and usage
    • How to identify and reconstruct structures, local variables and other program components
  • Demonstrate and explain the methodologies used when approaching an unknown program with Ghidra
    • Where to start when looking at an unknown binary
    • How to quickly gain an understanding of an unknown program
  • Provide challenges and "crackme" exercises so that students gain hands-on experience with Ghidra

Prerequisites / Resources


Playlist for the Reverse Engineering with Ghidra series:
https://www.youtube.com/playlist?list=PL_tws4AXg7auglkFo6ZRoWGXnWL0FHAEi

  • Class 1 video

    Lutetium, 06/23/2020 at 18:13

    Reverse Engineering with Ghidra Class 1


    Class 1 outline

    0:00 - Presentation Outline
    2:50 - What is Software Reverse Engineering?
    4:12 - Software Engineering Review
    24:54 - x86_64 Architecture Review
    45:10 - Ghidra Overview and Basic Usage

    Questions can be sent to superconference@hackaday.io

  • Office Hour Questions 6/25/20

    wrongbaud, 06/26/2020 at 11:45

    Office Hour Notes from 6/25/20

    Questions:

    • When is code obfuscation executed?
      • Obfuscation can be applied at several levels: sometimes the source code itself is obfuscated, and other times it’s applied to the machine code.
    • What do you consider .NET stuff?  Is it a binary file or something else?
    • Do people obfuscate by handwriting assembly or are there obfuscating compilers?
      • Obfuscation can be performed in a number of ways. For example, there are obfuscating assemblers, and various compiler tricks that can be done to aid in obfuscation.
    • How often is obfuscation in play?
      • It depends on your target. You’ll find it often in games and other things that require some sort of DRM, but it’s less common when looking at embedded firmware images, for example.
    • It would be great if you could give some pointers on how to identify packed or encrypted code using Ghidra
    • Can any binary be reversed?
      • Yes, technically speaking anything that contains machine code that can eventually be run by the CPU can be reverse engineered.
    • Do you have any resources for extracting binaries from a platform/uC?
    • On embedded systems do you often see heap being used? Or is deterministic memory (stack) more common?
      • This depends entirely on what the system is used for - if it’s running an RTOS or a Linux based OS, then you’re going to see heap usage. Smaller microcontrollers may not have space/resources to implement a memory allocator and will rely on statically sized buffers in SRAM. 
    • Is the memory for AH / AX / EAX / etc. shared? i.e. can you access 8 bits of AX by accessing AH?
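      • Yes, these names are different-sized views of the same register: AL is bits 0-7, AH is bits 8-15, AX is the low 16 bits, and EAX is the low 32 bits of RAX. Accessing AH therefore gives you the upper 8 bits of AX.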
    • Is there a universal reference to the instruction set for x86_64?
      • Yes, the Intel instruction set architecture reference is linked on the course page. 
    • x86-64 has a flat 64bit memory model so RAM, as well as PCIe peripherals, can end up in memory space, correct?
      • Technically this is correct; however, there are memory protections in place to try to prevent these regions from being accessed. The operating system and MMU protect these regions of memory, along with the drivers that use them, from being accessed.
    • What are the 'high level' differences between Ghidra and IDA Pro? [understand it may just be OpenSource vs not]
      • There are many differences between the two, and we will go over these during the second class session!
    • Will we be touching on what to do if Ghidra can’t find cross-references because the pointers are some +value off from the virtual addresses in this course? (trying to reverse some firmware blob)
      • When looking at firmware blobs, properly creating a memory map is very important, and a missing or incorrect one may be the reason why you’re having issues with XRefs. This can be done from within Ghidra using the memory map view, or by writing a loader / script to perform this for you. It is also important to create the relevant RAM regions when working with firmware images, as these are often where the XRefs will be located.
    • Thank you for the amazing tour of the tools, but what is the "goal" - what can we expect to do with all this? :) 
      • The goal of this course is to familiarize students with the concepts behind reverse engineering software, and provide a base understanding of how to use Ghidra to solve binary puzzles and challenges. 
      • By the end of this course, students will be comfortable loading x86_64 ELF files into Ghidra and be able to analyze them. 
    • I thought EABI was for embedded...
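
The register question above (AH / AX / EAX) can be modelled with plain bit masks. The sketch below is only an illustration in Python, with hypothetical helper names (`views`, `write_ah`, `write_eax`); it treats RAX as a 64-bit integer and shows which bits each narrower name refers to.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def views(rax):
    """Return the narrower register views of a 64-bit RAX value."""
    rax &= MASK64
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFFFFFF,  # low 32 bits
        "AX": rax & 0xFFFF,       # low 16 bits
        "AH": (rax >> 8) & 0xFF,  # bits 8..15 (high byte of AX)
        "AL": rax & 0xFF,         # bits 0..7 (low byte of AX)
    }


def write_ah(rax, value):
    """Model `mov ah, value`: only bits 8..15 of RAX change."""
    return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)


def write_eax(rax, value):
    """Model `mov eax, value`: a 32-bit write zeroes the upper 32 bits of RAX."""
    return value & 0xFFFFFFFF


for name, value in views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```

Note the asymmetry: writing AH, AL, or AX leaves the rest of RAX untouched, while writing EAX clears the upper half. This is worth keeping in mind when reading compiler output in Ghidra's listing view.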

  • Class 2 Video

    Lutetium, 06/30/2020 at 17:11

    Reverse Engineering with Ghidra Class 2

    Class 2 outline

    Intro: 0:00
    Assembly Language / Applying Function Signatures: 3:08
    Imports and Exports: 8:49
    Control Flow Statements in Assembly Language: 10:23
    Switch Statements in Assembly Language: 18:10
    Loops in Assembly Language: 24:34
    Variables in Assembly Language: 32:42
    Functions in Assembly Language: 39:46
    Heap Memory: 48:08
    Array Accesses in Assembly Language: 50:11


  • Class 2: Q&A

    wrongbaud, 07/03/2020 at 15:18

    • Where do we get the exercises? The only info I've gotten is on the Eventbrite.
    • I tried Ghidra on two firmware images I had: an esp32 and STM32 dev board. Since it's a binary blob, it did not provide the ELF info on architecture, so I had to fill it out by hand, and there are many ARM options. I chose ARM Cortex, but it didn't seem to work that well. How do you pick arch from the many ARM options? What would be the right one for this firmware?
      • To determine the proper CPU architecture, start with any applicable datasheets. A lot of the MCUs in the series that were mentioned use Cortex cores, but analysis will fall short if you do not properly define the appropriate memory regions, which can be found in the relevant datasheet.
    • Why aren't functions like main() for C++ automatically set to the right parameters?
      • The decompiler tries not to make too many assumptions for these function prototypes and uses the context that is provided by the instructions in use - this allows things to be more generic and causes fewer failures, but also means that users have to sometimes identify and add the appropriate types. In short - you don't want to assume too much such that it breaks other use cases. 
    • Can you take things like [rbp+8] and give them symbolic names for local variables?
      • Yes, if you right-click the label that is being used, you can rename the variable to something different.
    • Also for the exercises, should we use docker on windows or do WSL and git clone the repo?
      • They have been tested within the Docker container, and in an Ubuntu 18.04 VM, so I would recommend sticking with one of those two. If you have issues with Docker, reach out and we’ll try to help you.
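
For the firmware question above, one quick sanity check on a suspected Cortex-M image (such as an STM32 dump) is to read the first two words of the vector table: the initial stack pointer and the reset handler address. The sketch below is an illustration under stated assumptions, not a Ghidra feature: the function names are hypothetical, and the flash and RAM ranges in the usage lines are placeholders that should come from the datasheet. It does not apply to the original ESP32, which uses Xtensa cores rather than ARM.

```python
import struct


def read_vector_table_head(blob):
    """Return (initial_sp, reset_handler) from the first 8 bytes of the image."""
    initial_sp, reset = struct.unpack_from("<II", blob, 0)
    return initial_sp, reset


def looks_like_cortex_m(blob, flash_base, flash_size, ram_base, ram_size):
    """Check whether the first two words are plausible for the given memory map."""
    sp, reset = read_vector_table_head(blob)
    # The initial stack pointer should point into (or just past the end of) RAM.
    sp_ok = ram_base <= sp <= ram_base + ram_size
    # Cortex-M runs Thumb code only, so the reset vector has bit 0 set.
    thumb_ok = (reset & 1) == 1
    # With bit 0 cleared, the handler address should fall inside flash.
    reset_ok = flash_base <= (reset & ~1) < flash_base + flash_size
    return sp_ok and thumb_ok and reset_ok


# Synthetic 16-byte image; the addresses are placeholder values for illustration.
blob = struct.pack("<II", 0x20005000, 0x08000199) + bytes(8)
print(looks_like_cortex_m(blob, 0x08000000, 0x100000, 0x20000000, 0x5000))
print(looks_like_cortex_m(blob, 0x00000000, 0x100000, 0x20000000, 0x5000))
```

If the check only passes for one candidate base address, that address is a reasonable first guess for the flash block when setting up the memory map in Ghidra.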

  • Class 3 Video

    Lutetium, 07/07/2020 at 02:07

    Reverse Engineering with Ghidra Class 3

    Class 3 Outline

    0:00 - Intro
    2:36 - SRE Tool Landscape
    8:03 - Structs: ASM, Identification and Ghidra Analysis
    20:19 - Pointers: ASM, Identification and Ghidra Analysis
    35:30 - Enums: ASM, Identification and Ghidra Analysis
    40:00 - x86_64 System Calls
    45:40 - File Operations
    51:02 - Ghidra Tips: Patching, Bookmarks, Searching, Comments

    Questions can be sent to superconference@hackaday.io

  • Class 3 Q&A

    wrongbaud, 07/10/2020 at 10:24

    Class 3 Q&A

    • How are we using docker for this class?
      • Docker is used to run the exercises. You can also use an Ubuntu 18.04 virtual machine if you prefer.
    • What is a (the most?) common example of being able to pull in a header file? Don't most RE activities assume you don't have the source?
      • If you are aware of an open source library that the program may be using, you can import header files from that. Or perhaps if you are reverse engineering a custom kernel module, a lot of the structs in use are likely from the mainline kernel.
    • Can you touch on using tools like Ghidra to remove calls to say a dongle attached to the system?
      • This would be entirely system dependent and more context would be needed. Are you trying to remove functions? What is the end goal? etc.
    • How do you do expressions in Ghidra? e.g. the last exercise which did a complicated shift and arithmetic---I ended up using Octave to calculate it, but Ghidra must have it as well but I couldn't find it.
      • Aside from PCODE emulation, I am not aware of a way to directly evaluate the resulting decompilation. This would require using an external emulator of some sort.
    • Could you give a couple of examples of what IDA is good at that nobody else has, and vice versa for Ghidra and R2?
      • IDA: Good at C++ demangling, Windows PDB parsing, strong decompiler
      • R2: Extensible, can easily be expanded upon with plugins, community support, open source
      • Ghidra: Decompiler support for every supported processor, open source, actively developed
    • Is it possible to demo running Ghidra alongside a debugger? I know the ret-sync plugin exists but I've had trouble with it
      • Right now the built-in debugger is in alpha testing and hopefully they will be releasing it with the next official release
    • Could you use Ghidra to reverse engineer itself?
      • Ghidra is Java-based and open source, so there would be little reason to reverse engineer it specifically when the source code is hosted on GitHub
    • Should all struct members take up the same amount of space?
      • No, it depends on multiple factors - the architecture of the target system, the compiler optimization settings and of course the members of the struct itself!
    • Does running a syscall, by definition, execute instructions defined in the kernel? How does that carry over to an embedded context?
      • The syscall instruction does a number of things, but most importantly it loads the instruction pointer (RIP) with the value of the IA32_LSTAR MSR. Execution then continues in the kernel code that handles the syscall. Think of it as similar to an interrupt vector table on an embedded processor.
    • I was surprised there wasn’t any Ghidra feature for syscall analysis.
      • As of right now I am not aware of any plugins that do this, but it would make for a great side project!
    • A question I've been having is why I often see extra (typically repeated) arguments in the decompiled output.  For example, in many of the examples the functions that add two parameters are known to only take two parameters, but Ghidra shows them as having 3 or 4 being provided.  What does it mean/why does it happen?  How do I fix it in Ghidra?
      • This happens because the decompiler makes a lot of assumptions, and often these assumptions are simply incorrect. You can fix this up by changing the types of variables on the stack such that they are the proper size.
    • Are there any special considerations needed to reverse a proprietary kernel module?
      • Nope! They are just ELF files. This is a good example of when one might want to import header files from the kernel source, depending on the driver and the other subsystems it uses. For example, if your driver uses USB URB objects / structs, you could import those to make analysis simpler.
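
To see why struct members do not all occupy the same amount of space, Python's `ctypes` module can report the layout a C compiler would typically choose on the host platform. This is a small sketch with made-up struct names (`Loose`, `Reordered`) and a hypothetical `layout` helper; the sizes mentioned afterwards assume a typical ABI where `int` is 4 bytes with 4-byte alignment, such as x86_64 Linux.

```python
import ctypes


class Loose(ctypes.Structure):
    # char, int, char: the int must be 4-byte aligned, so padding is inserted.
    _fields_ = [
        ("flag", ctypes.c_char),
        ("count", ctypes.c_int),
        ("tag", ctypes.c_char),
    ]


class Reordered(ctypes.Structure):
    # Same members, int first: the two chars share one 4-byte slot.
    _fields_ = [
        ("count", ctypes.c_int),
        ("flag", ctypes.c_char),
        ("tag", ctypes.c_char),
    ]


def layout(struct_type):
    """Return (name, offset, size) for each member of a ctypes Structure."""
    return [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ctype in struct_type._fields_
    ]


for struct_type in (Loose, Reordered):
    print(struct_type.__name__, ctypes.sizeof(struct_type), layout(struct_type))
```

Under the stated assumptions, `Loose` occupies 12 bytes and `Reordered` occupies 8, even though both hold the same three members. The gaps are the same padding you have to account for when rebuilding a struct in Ghidra's structure editor.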

  • Class 4 Video

    Lutetium, 07/14/2020 at 21:54

    Reverse Engineering with Ghidra Class 4

    Class 4 Outline

    0:00 - Intro 
    3:14 - Ghidra: Loading External Libraries
    10:31 - Ghidra: Patch Diffing and Analysis
    19:30 - Ghidra: Checksum Tool 
    21:38 - Ghidra: Memory Manager 
    25:39 - Ghidra Internals: PCODE and SLEIGH 
    39:00 - Ghidra Extensions 
    45:00 - Ghidra Scripting Overview and Examples

    Questions can be sent to superconference@hackaday.io


Discussions

todd.c734 wrote 06/18/2020 at 14:54

Does each class build on the previous one, or are there just 4 sessions to allow 30 people in each session? Just wondering if I need to sign up for all 4 or just 1.

wrongbaud wrote 06/18/2020 at 21:22

Each class will build off of the previous classes, and the videos will be posted here in case anyone needs to catch up or misses a class. 

Akash M wrote 06/18/2020 at 06:03

6pm EDT = 3am IST, which is like late night or early morning, is there any chance to change the timing which will be suitable for all people around the world?

wrongbaud wrote 06/18/2020 at 21:21

It is difficult to pick a time that works worldwide. The videos of the classes will be released on this page for everyone to use and follow along.

Scott Shell wrote 06/17/2020 at 22:01

Are we supposed to get a ticket for each of the 4 sessions? It looks like it is being offered 4 times the way Eventbrite is set up...

wrongbaud wrote 06/20/2020 at 17:29

Hi - it is currently set up that way as far as I understand. But if you don't make it to a session, we will be uploading the videos as well.

ubuntourist wrote 06/17/2020 at 00:10

The challenge binary is missing the `/exercises/` in the path:

    $ ./hackaday-u/session-one/exercises/c1

wrongbaud wrote 06/17/2020 at 01:28

Hey thanks for the heads up! The exercises and such will be ready by the course start date, right now things are still being reorganized!
