
About Quarkslab
Quarkslab builds cutting-edge cybersecurity solutions used by security-driven companies and institutions around the world. Our QShield product suite focuses on software protection and reverse engineering resistance across desktop, mobile, and embedded platforms.
We’re not in the cloud — we build real software, tested on real systems. If you enjoy diving deep into complex technical environments, automating smart test coverage, and owning quality end-to-end, read on.
Description
Explore how a Large Language Model (LLM) can assist human reverse engineers in understanding compiled binaries (x86/ARM). The goal is to link assembly to semantics, automatically infer behavior, identify key routines, and recognize cryptographic primitives.
What you will do
During the internship you will work on a project with specific goals and milestones:
Reproduce existing research such as “Machine-Language Model for Software Security” (see Bibliography below).
Build a full analysis pipeline: binary → disassembly (Ghidra/IDA/Binary Ninja) → pseudo-code → embeddings → LLM-based interpretation.
Extend previous work by:
Adding an interactive assistant (chat-based RE helper).
Evaluating the tool on real binaries (malware, compiled open-source tools).
Measuring performance and accuracy of semantic inference.
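To make the "pseudo-code → embeddings → LLM-based interpretation" stages more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the bag-of-tokens `embed` function stands in for a learned embedding model, and the names `KNOWN`, `UNKNOWN`, `nearest`, and `build_prompt` are hypothetical. The disassembly and decompilation stages are assumed to have already produced the pseudo-code strings.

```python
import math
import re
from collections import Counter


def embed(pseudo_code: str) -> Counter:
    """Toy embedding: bag of identifier tokens (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", pseudo_code.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k corpus functions most similar to the query."""
    q = embed(query)
    ranked = sorted(
        corpus, key=lambda name: cosine(q, embed(corpus[name])), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 1) -> str:
    """Assemble an LLM prompt that uses retrieved functions as context."""
    context = "\n\n".join(
        f"// {name}\n{corpus[name]}" for name in nearest(query, corpus, k)
    )
    return (
        "You are assisting a reverse engineer.\n"
        f"Similar known functions:\n{context}\n\n"
        f"Describe what this function does:\n{query}\n"
    )


# Hypothetical, hand-written pseudo-code samples (not from a real binary).
KNOWN = {
    "xor_buffer": "for (i = 0; i < len; i++) buf[i] = buf[i] ^ key[i % klen];",
    "copy_string": "while (*src) { *dst = *src; dst++; src++; } *dst = 0;",
}
UNKNOWN = "for (j = 0; j < n; j++) out[j] = buf[j] ^ key[j % klen];"

if __name__ == "__main__":
    print(build_prompt(UNKNOWN, KNOWN))
```

The resulting prompt would then be sent to whichever LLM API the project settles on. That call is deliberately left out here, since the posting does not name a provider.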
Expected Results
A prototype tool that describes binary behavior using an LLM.
Quantitative evaluation (accuracy of function descriptions).
Qualitative evaluation of usefulness for human analysts.
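As one possible way to quantify the accuracy of function descriptions, the sketch below computes a token-level F1 score between a generated description and a human-written reference. This is an assumption for illustration, not the metric prescribed by the internship or by the benchmarks in the bibliography. The function names are hypothetical.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens of a description."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(predicted: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (multiset overlap)."""
    pred, ref = Counter(tokens(predicted)), Counter(tokens(reference))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average token F1 over (predicted, reference) description pairs."""
    scores = [token_f1(p, r) for p, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

A lexical score like this rewards shared wording rather than shared meaning, so it would likely need to be complemented by the qualitative analyst feedback listed above.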
Required Skills
Programming: Python (intermediate)
Reverse engineering (intermediate)
Assembly and binary structures (intermediate)
Prompt engineering & use of LLM APIs (basic)
Bibliography
Zhang Chao et al., Machine-Language Models for Software Security.
Shang et al., BinMetric: A Comprehensive Binary Code Analysis Benchmark for Large Language Models.
Microsoft Research, CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation.
Assignment
Get the apksigner app.
Build a simple pipeline with four stages: decompile → analyze → LLM → synthesize.
In a short document, provide the resulting synthesis and 2 pages explaining how you built the pipeline.
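For orientation, the analyze → LLM → synthesize part of such a pipeline could be skeletoned as below. This is a sketch under assumptions, not a reference solution: it presumes the decompile step has already been run with a decompiler of your choice and has written `.java` files into a directory, and `fake_llm` is a hypothetical stand-in for a real LLM API call.

```python
from pathlib import Path
from typing import Callable


def collect_sources(root: Path, suffix: str = ".java") -> list[Path]:
    """List decompiled source files under root, in a stable order."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def chunk(text: str, max_chars: int = 4000) -> list[str]:
    """Split on line boundaries; a chunk exceeds max_chars only if one line does."""
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def analyze(root: Path, llm: Callable[[str], str], max_chars: int = 4000) -> str:
    """Summarize each chunk, then ask the LLM to merge the notes into one report."""
    notes = []
    for path in collect_sources(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        for part in chunk(text, max_chars):
            summary = llm(f"Summarize this decompiled code:\n{part}")
            notes.append(f"{path.name}: {summary}")
    return llm("Synthesize these notes into one report:\n" + "\n".join(notes))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; reports the prompt's line count."""
    return f"{len(prompt.splitlines())} lines seen"
```

Passing the LLM as a plain callable keeps the skeleton independent of any particular provider and makes the pipeline easy to test offline, for example with `analyze(Path("decompiled"), fake_llm)` where `decompiled` is a hypothetical output directory.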