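Scripting is one of Ghidra's greatest strengths. Through the GhidraScript API you can automate repetitive work such as annotating functions, searching strings, and matching patterns, which greatly speeds up reverse engineering.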
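Scripting environment
Ghidra supports two scripting languages:
- Java: natively supported, with the best performance and the most complete API
- Python (Jython): Python 2.7 syntax, easier to get started with
You can write and run scripts directly in the Script Manager, or execute them from the command line with the Headless Analyzer.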
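Ways to run scripts
Interactive: in the CodeBrowser, open Window -> Script Manager and select the script to run.
Headless batch run:
$GHIDRA_HOME/support/analyzeHeadless /path/to/project MyProject \
    -import /path/to/binary \
    -postScript my_script.py \
    -scriptPath /path/to/scripts
This mode can process large numbers of samples in batch, which makes it a good fit for malware analysis teams.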
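When the headless command has to be launched for many inputs, it can help to assemble the argument list in code instead of by hand. The following is a minimal sketch in plain Python 3 (run outside Ghidra); the helper name `build_headless_command` is hypothetical, `/opt/ghidra` is an assumed install location, and only the options shown in the command above are used.

```python
def build_headless_command(ghidra_home, project_dir, project_name,
                           binary_path, post_scripts, script_path):
    """Return the analyzeHeadless argument list for a single binary."""
    cmd = [
        ghidra_home.rstrip("/") + "/support/analyzeHeadless",
        project_dir,
        project_name,
        "-import", binary_path,
    ]
    # Each post-analysis script gets its own -postScript option
    for script in post_scripts:
        cmd += ["-postScript", script]
    cmd += ["-scriptPath", script_path]
    return cmd


# Assumed example values; adjust the paths to your own setup
cmd = build_headless_command(
    "/opt/ghidra", "/path/to/project", "MyProject",
    "/path/to/binary", ["my_script.py"], "/path/to/scripts")
print(" ".join(cmd))
# To actually start the analysis: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` rather than a single shell string avoids quoting problems with sample names that contain spaces.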
Core API
The objects and methods used most often in GhidraScript:
currentProgram
The program currently being analyzed. It is the entry point for every operation:
# Basic program information
name = currentProgram.getName()
lang = currentProgram.getLanguage().getLanguageID()
base_addr = currentProgram.getImageBase()
print("Program: {} | Language: {} | Base: {}".format(name, lang, base_addr))
Function operations
from ghidra.program.model.symbol import SourceType
fm = currentProgram.getFunctionManager()
# Iterate over all functions
for func in fm.getFunctions(True):  # True = forward direction
    entry = func.getEntryPoint()
    name = func.getName()
    body_size = func.getBody().getNumAddresses()
    print("0x{}: {} ({} bytes)".format(entry, name, body_size))
# Look up the function at a given address
addr = toAddr(0x00401000)
func = getFunctionAt(addr)
if func:
    print("Found function: " + func.getName())
    # Rename the function (only when one was actually found)
    func.setName("my_decrypt_func", SourceType.USER_DEFINED)
Instruction traversal
# Walk instructions starting from a given address
addr = toAddr(0x00401000)
inst = getInstructionAt(addr)
count = 0
# Only look at the first 20 instructions
while inst is not None and count < 20:
    mnemonic = inst.getMnemonicString()
    num_ops = inst.getNumOperands()
    print("0x{}: {} (operands: {})".format(inst.getAddress(), mnemonic, num_ops))
    inst = inst.getNext()
    count += 1
Reading memory and data
# Read raw bytes from memory
addr = toAddr(0x00402000)
# getBytes(addr, length) returns a Java byte[]; a Python bytearray passed to
# Memory.getBytes() is not reliably filled in place under Jython
buf = getBytes(addr, 16)
print("Bytes at 0x402000: " + " ".join("{:02x}".format(b & 0xff) for b in buf))
# Read a string
data = getDataAt(addr)
if data and data.hasStringValue():
    print("String: {}".format(data.getValue()))
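The `b & 0xff` mask in the snippet above matters because Java bytes are signed: a byte[] seen from Jython holds values from -128 to 127, and formatting a negative value directly would not give a two-digit hex byte. A small sketch in plain Python (the helper name `to_hex` and the sample values are made up for illustration) shows the effect of the mask:

```python
def to_hex(signed_bytes):
    """Format Java-style signed bytes (-128..127) as two-digit hex values."""
    # Masking with 0xff maps -1 to 255, -128 to 128, and leaves 0..127 unchanged
    return " ".join("{:02x}".format(b & 0xff) for b in signed_bytes)


# Illustrative input only: four edge cases followed by the bytes of "MZ"
sample = [-1, 0, 127, -128, 0x4d, 0x5a]
print(to_hex(sample))  # ff 00 7f 80 4d 5a
```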
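Practical script: automatically tagging suspicious functions
The script below scans every function, finds those that contain crypto-related constants, and tags them with a comment and a name prefix: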
# auto_tag_crypto.py - tag functions that probably contain crypto operations
from ghidra.program.model.symbol import SourceType
# Well-known constants from common crypto algorithms
CRYPTO_CONSTANTS = {
    0x67452301: "MD5/SHA1_INIT",
    0xEFCDAB89: "MD5/SHA1_INIT",
    0x98BADCFE: "MD5/SHA1_INIT",
    0x10325476: "MD5/SHA1_INIT",
    0x6A09E667: "SHA256_INIT",
    0xBB67AE85: "SHA256_INIT",
    0x5A827999: "SHA1_K",
    0x6ED9EBA1: "SHA1_K",
    0x8F1BBCDC: "SHA1_K",
    0xCA62C1D6: "SHA1_K",
    0x61707865: "CHACHA20/SALSA20",  # "expa" read as a little-endian word
}
fm = currentProgram.getFunctionManager()
listing = currentProgram.getListing()
tagged_count = 0
for func in fm.getFunctions(True):
    found_constants = set()
    # Iterate over the whole body so that non-contiguous functions are covered too
    for inst in listing.getInstructions(func.getBody(), True):
        # Check the scalar (immediate) operands of each instruction
        for i in range(inst.getNumOperands()):
            for obj in inst.getOpObjects(i):
                if hasattr(obj, 'getUnsignedValue'):
                    val = obj.getUnsignedValue()
                    if val in CRYPTO_CONSTANTS:
                        found_constants.add(CRYPTO_CONSTANTS[val])
    if found_constants:
        # Add a comment to the function (sorted for stable output)
        comment = "CRYPTO: " + ", ".join(sorted(found_constants))
        func.setComment(comment)
        # Add a prefix if the function still has its default FUN_ name
        if func.getName().startswith("FUN_"):
            new_name = "crypto_" + func.getName()
            func.setName(new_name, SourceType.ANALYSIS)
        tagged_count += 1
        print("Tagged: {} @ 0x{} -> {}".format(
            func.getName(), func.getEntryPoint(), comment))
print("\nDone. Tagged {} functions with crypto constants.".format(tagged_count))
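The "expa" comment on the last table entry refers to the first four bytes of the ASCII string "expand 32-byte k", which ChaCha20 and Salsa20 use as their fixed constant. As a sanity check on entries like this one, the sketch below (plain Python, run outside Ghidra) derives the four 32-bit words that appear as immediates when the string is loaded as little-endian integers. The variable names are chosen for illustration.

```python
import struct

# ASCII constant used by ChaCha20/Salsa20 with 256-bit keys
SIGMA = b"expand 32-byte k"

# Interpret the 16 bytes as four little-endian unsigned 32-bit words
words = struct.unpack("<4I", SIGMA)

for chunk_start, word in zip(range(0, 16, 4), words):
    chunk = SIGMA[chunk_start:chunk_start + 4].decode("ascii")
    print("{!r:8} -> 0x{:08X}".format(chunk, word))
```

The first word comes out as 0x61707865, which is the value to look for in code compiled for little-endian targets.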
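Practical script: extracting referenced strings in bulk
When analyzing malware, strings are among the most important clues. This script collects every defined string that has at least one reference and sorts the results by reference count: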
# extract_referenced_strings.py
listing = currentProgram.getListing()
ref_manager = currentProgram.getReferenceManager()
string_refs = {}
# Walk all defined data
for data in listing.getDefinedData(True):
    if data.hasStringValue():
        s = data.getValue()
        if len(s) < 4:  # skip very short strings
            continue
        addr = data.getAddress()
        ref_list = []
        for ref in ref_manager.getReferencesTo(addr):
            from_addr = ref.getFromAddress()
            func = getFunctionContaining(from_addr)
            func_name = func.getName() if func else "unknown"
            ref_list.append((str(from_addr), func_name))
        if ref_list:
            # The same text may be defined at several addresses; merge the references
            string_refs.setdefault(s, []).extend(ref_list)
# Sort by reference count and print
sorted_strings = sorted(string_refs.items(), key=lambda x: len(x[1]), reverse=True)
print("=== Referenced Strings (sorted by ref count) ===\n")
for s, refs in sorted_strings[:50]:  # Top 50
    print("[{}x] \"{}\"".format(len(refs), s[:80]))
    for from_addr, func_name in refs:
        print("  <- {} ({})".format(from_addr, func_name))
    print("")  # blank line; a bare print() would output "()" under Jython 2.7
Headless automation pipeline
Combined with the Headless Analyzer, these scripts can form an automated analysis pipeline. Here is an example skeleton:
#!/bin/bash
# batch_analyze.sh - batch-analyze every PE file in a directory
GHIDRA_HOME="/opt/ghidra"
PROJECT_DIR="/tmp/ghidra_projects"
SCRIPT_DIR="/home/user/ghidra_scripts"
SAMPLE_DIR="/samples/malware"
mkdir -p "$PROJECT_DIR"
for sample in "$SAMPLE_DIR"/*.exe; do
    name=$(basename "$sample" .exe)
    echo "Analyzing: $name"
    "$GHIDRA_HOME/support/analyzeHeadless" \
        "$PROJECT_DIR" "batch_$name" \
        -import "$sample" \
        -postScript auto_tag_crypto.py \
        -postScript extract_referenced_strings.py \
        -scriptPath "$SCRIPT_DIR" \
        -deleteProject \
        > "/tmp/results_${name}.txt" 2>&1
done
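Once the loop has written one output file per sample, a natural next step is to pull the tagged functions back out of those files. The sketch below, in plain Python 3 outside Ghidra, parses the "Tagged: ..." lines that auto_tag_crypto.py prints. The helper name `parse_tagged_lines` and the sample log lines are made up for illustration, and the regular expression uses `search` on the assumption that the headless log may put a prefix in front of script output.

```python
import re

# Matches lines such as "Tagged: <name> @ 0x<addr> -> <comment>"
TAGGED_RE = re.compile(r"Tagged: (\S+) @ 0x(\S+) -> (.+)$")


def parse_tagged_lines(lines):
    """Return (function_name, address, comment) tuples found in log lines."""
    results = []
    for line in lines:
        match = TAGGED_RE.search(line.strip())
        if match:
            results.append(match.groups())
    return results


# Illustrative log content, not real analysis output
sample_log = [
    "INFO  some unrelated log line",
    "Tagged: crypto_FUN_00401000 @ 0x00401000 -> CRYPTO: MD5/SHA1_INIT",
    "Done. Tagged 1 functions with crypto constants.",
]

for name, addr, comment in parse_tagged_lines(sample_log):
    print("{} at {}: {}".format(name, addr, comment))
```

To process a real result file, pass an open file object instead of the list, for example `parse_tagged_lines(open(path))`.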
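Summary
Key points about Ghidra scripting:
- currentProgram is the entry point; through it you reach the function manager, memory, symbol table, and so on
- toAddr() converts an integer into an address object
- getFunctionAt() / getInstructionAt() / getDataAt() are the most commonly used lookup methods
- Modifications (renaming, adding comments) must happen inside a transaction, which GhidraScript handles automatically
- Batch analysis in headless mode is one of Ghidra's major strengths compared with IDA
Once you are comfortable writing scripts, Ghidra is no longer just a GUI tool but a powerful reverse-engineering platform.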