
Introduction to Reverse Engineering with Ghidra

Learn how to reverse engineer software using Ghidra! This four-session course will walk you through the basics.

The purpose of this course is to provide an introductory overview of how to reverse engineer software with Ghidra. This program will consist of multiple hands-on exercises and labs allowing the students to gain the practical skills necessary to reverse engineer software with Ghidra. All exercises will be written for a modern x86_64 target running Linux. After attending these sessions, students will be familiar with the basic analysis features that Ghidra provides and understand the steps involved to perform basic analysis of software programs.

The course will consist of four sessions in total. Each session will contain a video component, a lab component (to be completed after the video, utilizing the concepts illustrated), and an office hour component where the instructor will be available for questions.

Updates / Information

For regular updates regarding class materials, follow wrongbaud and voidstarsec on Twitter.

Hardware Requirements

  • 8GB RAM 

Software Requirements

  1. Docker (or an Ubuntu 18.04 VM)
  2. The Ghidra SRE Tool

Getting Started

  1. Download Ghidra from here
  2. Download the exercises / Docker container from here
    • git clone https://github.com/wrongbaud/hackaday-u
  3. Build the Docker container (Note: you can also use an Ubuntu 18.04 VM; if you are doing this, skip to step 5)
    • cd hackaday-u/docker
      docker build . -t hackaday
  4. Test the Docker container (If using Ubuntu 18.04, skip to step 5!)
    • docker run --rm -it hackaday /bin/bash
  5. Run a challenge binary as a test!
    • root@522471199b16:/home/hackaday# ./hackaday-u/session-one/challenges/c1 
      Please supply the password!
      root@522471199b16:/home/hackaday# ./hackaday-u/session-one/challenges/c1 test
      Wrong answer, we'd never use test as the password!

Course Goals

  • Familiarize students with the basic concepts behind software reverse engineering
    • x86_64 Architecture Review
    • Identifying C constructs in assembly code
    • Disassembly vs Decompilation
  • Teach students how to use the Ghidra SRE tool to reverse engineer Linux based binaries
    • Basic navigation and usage
    • How to identify and reconstruct structures, local variables and other program components
  • Demonstrate and explain the methodologies used when approaching an unknown program with Ghidra
    • Where to start when looking at an unknown binary
    • How to quickly gain an understanding of an unknown program
  • Provide challenges and "crackme" exercises so that students gain hands on experience with Ghidra

Scheduling Details

  • The course starts Monday, June 22 at 6:00 PM (EDT)
  • Class sessions will occur weekly on Mondays at 6:00 PM (EDT)
  • Office hours will be on Thursdays at 6:00 PM (EDT)
  • There will be a total of four class sessions and four office hour sessions

Prerequisites / Resources

  • Class 3 video

    Lutetium, 2 days ago, 0 comments

    Here is the mostly unedited video for the third class.

    This video is currently unlisted and will be edited and reposted at a later date. 

    See you at the Office Hours!

  • Class 2: Q&A

    wrongbaud, 6 days ago, 0 comments

    • Where do we get the exercises? The only info I've gotten is on the Eventbrite.
    • I tried Ghidra on two firmware images I had: an ESP32 and an STM32 dev board. Since each is a binary blob, it did not provide the ELF info on architecture, so I had to fill it out by hand, and there are many ARM options. I chose ARM Cortex, but it didn't seem to work that well. How do you pick the architecture from the many ARM options? What would be the right one for this firmware?
      • In order to determine the proper CPU architecture, you should start with any applicable datasheets. A lot of the MCUs in the series that were mentioned use Cortex cores, but analysis will fall short if you do not properly define the appropriate memory regions, which can be acquired from the relevant datasheet. 
    • Why aren't functions like main() for C++ automatically set to the right parameters?
      • The decompiler tries not to make too many assumptions for these function prototypes and uses the context that is provided by the instructions in use - this allows things to be more generic and causes fewer failures, but also means that users have to sometimes identify and add the appropriate types. In short - you don't want to assume too much such that it breaks other use cases. 
    • Can you take things like [rbp+8] and give them symbolic names for local variables?
      • Yes, if you right click the label that is being used, you can rename the variable to something different.
    • Also for the exercises, should we use docker on windows or do WSL and git clone the repo?
      • They have been tested within the docker container, and in an Ubuntu 18.04 VM, so I would recommend sticking with one of those two. If you have issues with docker, reach out and we’ll try to help you.
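The point about memory regions in the firmware answer above can be illustrated with a small sketch. It shows why a pointer stored in a firmware blob only resolves to a cross-reference target when the blob is mapped at the address the firmware expects. The region names, base addresses, and sizes below are assumptions chosen for illustration (an STM32-style layout); the real values must come from the datasheet for the part in question. `Region`, `MEMORY_MAP`, and `region_for` are hypothetical helper names, not part of Ghidra.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """One entry of a memory map: a named, contiguous address range."""
    name: str
    start: int
    length: int

    def contains(self, address: int) -> bool:
        return self.start <= address < self.start + self.length


# Assumed layout for illustration only; take real values from the datasheet.
MEMORY_MAP = [
    Region("flash", 0x0800_0000, 0x0010_0000),
    Region("sram", 0x2000_0000, 0x0002_0000),
]


def region_for(address: int, memory_map: List[Region]) -> Optional[str]:
    """Return the name of the region holding address, or None if unmapped."""
    for region in memory_map:
        if region.contains(address):
            return region.name
    return None


# A pointer value as it might appear inside the firmware image.
pointer = 0x0800_0400

# With the assumed map, the pointer lands in flash and can be resolved.
mapped = region_for(pointer, MEMORY_MAP)

# If the blob is loaded at address 0 with no other regions, it cannot.
unmapped = region_for(pointer, [Region("blob", 0x0, 0x0010_0000)])
```

In the second case the pointer falls outside every defined region, which is the situation in which an analysis tool has nothing to attach a reference to.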

  • Class 2 Video

    Lutetium, 06/30/2020 at 17:11, 0 comments

    Here is the mostly unedited video for the second class.

    This video is currently unlisted and will be edited into sections and reposted at a later date. 

  • Office Hour Questions 6/25/20

    wrongbaud, 06/26/2020 at 11:45, 0 comments

    Office Hour Notes from 6/25/20

    Questions:

    • When is code obfuscation executed?
      • There are various levels of code obfuscation: sometimes the source code itself is obfuscated, and other times it’s applied to the machine code. 
    • What do you consider .NET stuff?  Is it a binary file or something else?
    • Do people obfuscate by handwriting assembly or are there obfuscating compilers?
      • Obfuscation can be performed in a number of ways. For example, there are obfuscating assemblers, and various compiler tricks that can be done to aid in obfuscation.
    • How often is obfuscation in play?
      • It depends on your target. You’ll find it often in games and other things that require some sort of DRM, but it’s less common when looking at embedded firmware images, for example.
    • It would be great if you could give some pointers on how to identify packed or encrypted code using Ghidra
    • Can any binary be reversed?
      • Yes, technically speaking anything that contains machine code that can eventually be run by the CPU can be reverse engineered.
    • Do you have any resources for extracting binaries from a platform/uC?
    • On embedded systems do you often see heap being used? Or is deterministic memory (stack) more common?
      • This depends entirely on what the system is used for - if it’s running an RTOS or a Linux based OS, then you’re going to see heap usage. Smaller microcontrollers may not have space/resources to implement a memory allocator and will rely on statically sized buffers in SRAM. 
    • Is the memory for AH / AX / EAX / etc. shared? i.e. can you access 8 bits of AX by accessing AH?
      • Yes, the various representations of these registers can be used to access those specific size ranges. 
    • Is there a universal reference to the instruction set for x86_64?
      • Yes, the Intel instruction set architecture reference is linked on the course page. 
    • x86-64 has a flat 64bit memory model so RAM, as well as PCIe peripherals, can end up in memory space, correct?
      • Technically this is correct; however, memory protections are in place to prevent these regions from being accessed. The operating system and MMU protect these regions of memory, and the drivers that use them, from unauthorized access. 
    • What are the 'high level' differences between Ghidra and IDA Pro? [understand it may just be open source vs. not]
      • There are many differences between the two, and we will go over these during the second class session!
    • Will we be touching on what to do if Ghidra can’t find cross-references because the pointers are some +value off from the virtual addresses in this course? (trying to reverse some firmware blob)
      • When looking at firmware blobs, properly creating a memory map is very important, and a missing or incorrect one may be the reason why you’re having issues with XRefs. This can be done from within Ghidra using the Memory Map window, or by writing a loader / script to perform this for you. It is also important to create the relevant RAM regions when working with firmware images, as these are often where the XRefs will be located. 
    • Thank you for the amazing tour of the tools, but what is the "goal" - what can we expect to do with all this? :) 
      • The goal of this course is to familiarize students with the concepts behind reverse engineering software, and provide a base understanding of how to use Ghidra to solve binary puzzles and challenges. 
      • By the end of this course, students will be comfortable loading x86_64 ELF files into Ghidra and be able to analyze them. 
    • I thought EABI was for embedded...
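The answer above about AH / AX / EAX sharing storage can be modeled with plain bit masks. This is a minimal sketch, not an emulator: `sub_registers` and `write_ah` are hypothetical helper names, and the model covers only reads of the sub-registers and a write to AH.

```python
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def sub_registers(rax: int) -> dict:
    """Return the views of RAX that the smaller register names expose."""
    rax &= MASK_64
    return {
        "RAX": rax,                 # all 64 bits
        "EAX": rax & 0xFFFF_FFFF,   # low 32 bits
        "AX": rax & 0xFFFF,         # low 16 bits
        "AH": (rax >> 8) & 0xFF,    # bits 8..15, the high byte of AX
        "AL": rax & 0xFF,           # bits 0..7, the low byte of AX
    }


def write_ah(rax: int, value: int) -> int:
    """Return RAX after storing an 8-bit value in AH; other bits are kept."""
    cleared = rax & ~0xFF00 & MASK_64
    return cleared | ((value & 0xFF) << 8)


views = sub_registers(0x1122_3344_5566_7788)
updated = write_ah(0x1122_3344_5566_7788, 0xAB)
```

Because every name is a window onto the same 64 bits, changing AH also changes what AX, EAX, and RAX read back.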

  • Class 1 video

    Lutetium, 06/23/2020 at 18:13, 0 comments

    Here is the mostly unedited video for the first class.

    This video is currently unlisted and will be edited into sections and reposted at a later date. 

    https://youtu.be/rblRVBd2Xws


Discussions

Ghidra-Server.org wrote 21 hours ago point

If you'd be interested in running a team collaboration focused Class over at https://www.ghidra-server.org/ we'd be happy to host a class playground and make some arrangement for swift accounts.


wrongbaud wrote 21 hours ago point

Thanks! We'll let you know when we kick off the next iteration of this course!


hexpwn wrote 21 hours ago point

Hi! where are you guys having discussions? Is there a Discord server or something like that? Are office hours streamed and if so where can I get a link to the class? Thanks!


hexpwn wrote 21 hours ago point

Oops, never mind :) New to the platform... just found the chat feature.


Zach Kost-Smith wrote a day ago point

When you post solutions for these assignments, please post the source that produced these binaries.  Looking at the struct example I'm completely confused how any reasonable code could produce the assembly I'm seeing.


wrongbaud wrote a day ago point

When the class is over, we will provide solutions to those that signed up and can also include the source code for the exercises. One thing to keep in mind is that the exercises are written with the goal of testing your knowledge of the underlying concepts first, and sanity/cleanliness second! - they are meant to be challenging and somewhat bespoke!


Michael Fisher wrote 2 days ago point

Having issues finding the struct in the exercises. In the struct file I see `RAX=>local_38,[RBP + -0x30]`; however, the slides say to look for `rbp-0x10`, and I am not seeing any minus offsets on the address other than `RAX=>local_38,[RBP + -0x30]`. Am I just being dense?


wrongbaud wrote a day ago point

If you look at the notation, you'll notice that it's adding a negative value, so it's still a subtraction operation. The assembly listings in the slides were generated by Objdump and not Ghidra!
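The notation difference described above can be shown with a short sketch. It decodes a one-byte displacement as a two's-complement value and renders the same operand in both styles. The two formatting helpers are hypothetical and only mimic the notations quoted in this thread; they are not part of Ghidra or objdump, and the register value is an arbitrary assumption.

```python
def to_signed(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw


def ghidra_style(base: str, displacement: int) -> str:
    """Render the operand as a sum, keeping the sign on the displacement."""
    sign = "-" if displacement < 0 else ""
    return f"[{base.upper()} + {sign}{abs(displacement):#x}]"


def intel_style(base: str, displacement: int) -> str:
    """Render the operand with the sign folded into the operator."""
    sign = "-" if displacement < 0 else "+"
    return f"[{base.lower()}{sign}{abs(displacement):#x}]"


# The byte 0xD0 is how -0x30 is stored in an 8-bit displacement field.
displacement = to_signed(0xD0, 8)

# Arbitrary example value for RBP; any value gives the same conclusion.
rbp = 0x7FFE_0000_1000
address_by_adding = rbp + displacement
address_by_subtracting = rbp - 0x30
```

Both renderings describe the same effective address, so `[RBP + -0x30]` and `[rbp-0x30]` refer to the same stack slot.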


Michael Fisher wrote 3 days ago point

Quick question would it be possible to put a note on how to find entry points on microprocessor firmware?


wrongbaud wrote 3 days ago point

That will be dependent on your microcontroller. If you have a datasheet, the information will be available there; otherwise you can use information inferred from the various startup instructions / interrupt vectors. Perhaps we can do a firmware RE class in the future for those interested.


clintr wrote 07/02/2020 at 03:51 point

Hi, could someone please help me to get the new exercises into the docker container?  I'm unfamiliar with docker and with git.  Here is what I did:

1. From within the hackaday-u directory, I ran "git pull"

  - I can see that the session-two folder was added, and some new exercises under session-one/exercises/.

2. From within hackaday-u/docker, I ran "docker run --rm -it hackaday /bin/bash" but in the shell the session-two folder and the updates to session-one/exercises weren't there

3. Exited the docker shell, used the docker dashboard to make sure no container was running

4. From within hackaday-u/docker, I ran "docker build . -t hackaday"

5. From within hackaday-u/docker, I ran "docker run --rm -it hackaday /bin/bash" again; still no luck.

Thanks


wrongbaud wrote 7 days ago point

You will likely have to remove the old container/image and rebuild. Alternatively, you can run git pull from the repository within the running container and then commit those changes back into the image. 


leethobbit wrote 06/30/2020 at 02:37 point

Any chance you can provide high level "next steps" for someone who is primarily interested in games, particularly MMOs and emulation in C++?  There's so much to learn it can be challenging figuring out where to focus.


wrongbaud wrote 06/30/2020 at 10:57 point

The core reversing skills are still going to be relevant for any game you're reversing, so starting there and working up towards more complex targets wouldn't hurt. Reversing C++ can be challenging, but there are some good resources out there.


Paul Williamson wrote 06/30/2020 at 01:46 point

If there's a preferred order for working on the exercises, it would be helpful to encode that in the filenames. Session one did for the first four, but session two didn't.


wrongbaud wrote 06/30/2020 at 01:49 point

The exercises are to be done in the order that they are presented in the slides which are also in the repository.


clintr wrote 06/30/2020 at 01:32 point

Off-topic, but: could I have installed Ghidra and the JRE in the Docker container, and would there be any advantage to doing that?


wrongbaud wrote 06/30/2020 at 01:49 point

You could have built Ghidra in the Docker container, but as far as I understand, there is not a way to run a GUI app such as Ghidra from a Docker container. 


clintr wrote 06/30/2020 at 01:30 point

How do I sign up for the office hour?


wrongbaud wrote 06/30/2020 at 01:54 point

There is a zoom link to the office hours in the public chat here: https://hackaday.io/messages/room/288398


Curious AI wrote 06/30/2020 at 00:59 point

Hi, I got a segfault when trying to run the binaries from the repo; I'm running Ubuntu 20.04 from WSL. Are there additional dependencies I should be aware of?


wrongbaud wrote 06/30/2020 at 01:48 point

They've been run and tested on Ubuntu 18.04 - which challenges are giving you a problem? The Docker container should run all of the exercises properly.


Curious AI wrote 06/30/2020 at 02:36 point

Had a segfault trying to run dobby out of elf-exercises. The binaries in the exercises folder are running fine. I'm using Ubuntu on Windows Subsystem for Linux, so it can't run Docker.


William Durand wrote 06/28/2020 at 12:57 point

Hey, thanks for the first session. Quick question about slide 35 (asm example): when we jump to `_greater`, why does `RAX` change from `0x3FFF` to `0x2FFF`?


wrongbaud wrote 06/28/2020 at 13:23 point

Great catch - it definitely should not do that - that was a PPT mistake, I'll edit them and re-upload later today


Chris Gloom wrote 06/28/2020 at 02:50 point

finished all the challenge cracks! Super fun. Thanks for showing that renaming argc and argv trick. It makes the decompiled C so much more readable.


wrongbaud wrote 06/30/2020 at 01:54 point

Nice work!


Chris Gloom wrote 06/24/2020 at 02:17 point

Got challenge three and four cracked! Super fun. Thanks for putting these together. Literally shouted out loud when I got them haha.


wrongbaud wrote 06/24/2020 at 16:39 point

Great work! We'll be putting out a few more later on today


Lazer.Coh3n wrote 06/23/2020 at 13:31 point

Will the videos be posted for later study?

I will be working during these hours...


wrongbaud wrote 06/23/2020 at 14:43 point

Yes they will be - the page will be updated with the video links when they are ready


Chris Gloom wrote 06/23/2020 at 05:27 point

For anyone who was as confused as I was about the offset being added to get the start of the string in the challenges: argv in C is an array of char pointers. The first pointer, at index 0, will always point to the name of the program itself, and the second is the first command line argument passed. Char pointers are 8 bytes long on x86_64, so we're accessing the pointer to the user-provided argument at address argv + 8.

Been a bit since I messed around with C so this is possibly only a revelation to me haha.
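The argv layout described above can be checked with a small sketch using Python's standard `ctypes` module. It builds a C-style array of `char *` values and reads the second entry by byte offset, the same way the compiled challenge code does. The helper names `make_argv` and `read_string_at` and the sample arguments are assumptions for illustration; the pointer size is taken from the running interpreter, so it is 8 only on a 64-bit build.

```python
import ctypes

# Size of a char* in bytes; 8 on a 64-bit build of Python.
POINTER_SIZE = ctypes.sizeof(ctypes.c_char_p)


def make_argv(*args: bytes):
    """Build a C-style argv: a contiguous array of char* values."""
    array_type = ctypes.c_char_p * len(args)
    return array_type(*args)


def read_string_at(argv, byte_offset: int) -> bytes:
    """Dereference the char* stored byte_offset bytes past the start of argv."""
    slot_address = ctypes.addressof(argv) + byte_offset
    return ctypes.c_char_p.from_address(slot_address).value


argv = make_argv(b"./c1", b"test")
program_name = read_string_at(argv, 0)
first_argument = read_string_at(argv, POINTER_SIZE)
```

The offset is a count of bytes into the pointer array, not into any string, which is why the first user-supplied argument is found one pointer width past the start of argv.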


wrongbaud wrote 06/23/2020 at 12:40 point

Great point! We're going to talk about this in more detail next week, nice work!


Chris Gloom wrote 06/20/2020 at 23:23 point

Joined the waitlist for session one and I'm in on session two. Is session one actually sold out or is this an Eventbrite issue?


wrongbaud wrote 06/21/2020 at 11:14 point

I believe it is actually sold out, however we will be releasing the videos of the classes as well so you will still be able to access the material even if you're not present for the actual class!


thegink wrote 06/22/2020 at 09:48 point

I'm in the same boat - Is it possible for you to release the videos or course outline from session 1 before session 2 starts, so we aren't too far behind in the following sessions?

Thanks!


Zrocket wrote 06/20/2020 at 21:47 point

Is there any availability left for this class? I learned about it from the hackaday YouTube channel, and have been trying to figure out how to sign up. Thanks.


wrongbaud wrote 06/21/2020 at 11:14 point

The Eventbrite page is here: https://www.eventbrite.com/e/hackaday-u-reverse-engineering-with-ghidra-tickets-109681391996 - we will be releasing the videos of the classes as well so you will still be able to access the material even if you're not present for the actual class.


Zrocket wrote 06/21/2020 at 18:06 point

Will the videos be released before the next class? Just wanting to know if it's worth signing up for the others if the first class is full -- you know, so you're caught up. Thanks again.


todd.c734 wrote 06/18/2020 at 14:54 point

Does each class build on the previous one, or are there just 4 sessions to allow 30 people in each session? Just wondering if I need to sign up for all 4 or just 1. 


wrongbaud wrote 06/18/2020 at 21:22 point

Each class will build off of the previous classes, and the videos will be posted here in case anyone needs to catch up or misses a class. 


Akash M wrote 06/18/2020 at 06:03 point

6pm EDT = 3am IST, which is like late night or early morning, is there any chance to change the timing which will be suitable for all people around the world?


wrongbaud wrote 06/18/2020 at 21:21 point

It is difficult to pick a time that works worldwide. The videos of the classes will be released on this page for everyone to use and follow along.

