
Joern

A source code analysis tool.
Features: parses C/C++/Java source code and provides intermediate graph representations of the code, including:
  • Abstract Syntax Trees (AST)
  • Control Flow Graphs (CFG)
  • Control Dependence Graphs (CDG)
  • Data Dependence Graphs (DDG)
  • Program Dependence Graphs (PDG)
  • Code Property Graphs (CPG14)
  • Entire graph, i.e. convert to a different graph format (ALL)
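Each representation in the list corresponds to a `--repr` value of the `joern-export` command used in the batch script later in this post. Below is a small Python sketch that builds the command line for one representation. The lowercase value names are assumed from joern-export's help text and may differ between Joern versions, so check `joern-export --help`. `REPRESENTATIONS` and `export_command` are hypothetical names.

```python
import shlex
from typing import List

# Assumed --repr values accepted by joern-export, one per representation listed above.
REPRESENTATIONS = {
    "ast": "Abstract Syntax Tree",
    "cfg": "Control Flow Graph",
    "cdg": "Control Dependence Graph",
    "ddg": "Data Dependence Graph",
    "pdg": "Program Dependence Graph",
    "cpg14": "Code Property Graph (CPG14)",
    "all": "Entire graph",
}


def export_command(repr_name: str, out_dir: str) -> List[str]:
    """Build a joern-export argument list; joern-export reads ./cpg.bin by default."""
    if repr_name not in REPRESENTATIONS:
        raise ValueError(f"unknown representation: {repr_name!r}")
    return ["joern-export", "--repr", repr_name, "--out", out_dir]


if __name__ == "__main__":
    for name in REPRESENTATIONS:
        # shlex.join quotes each argument so the line can be pasted into a shell
        print(shlex.join(export_command(name, f"out/{name}")))
```

The command is returned as a list rather than a single string so that it can be passed straight to `subprocess.run` without going through a shell.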

Environment Setup and Installation

System:
  • WSL2 (Ubuntu 22.04)
Java:
  • openjdk version "11.0.15" 2022-04-19
  • OpenJDK Runtime Environment (build 11.0.15+10-Ubuntu-0ubuntu0.22.04.1)
  • OpenJDK 64-Bit Server VM (build 11.0.15+10-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)
Packages:
  • unzip
Installation
mkdir joern && cd joern # optional
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod u+x joern-install.sh
./joern-install.sh --interactive
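The setup above relies on a Java runtime and `unzip`. The following Python sketch checks both before you run the installer. The helper names are hypothetical, and the script only reports what it finds. It does not enforce any minimum Java version; the environment above happens to use OpenJDK 11.

```python
import re
import shutil
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` text (modern "11.0.15" or legacy "1.8.0_292")."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        return None
    major = int(m.group(1))
    # Legacy numbering: "1.8.x" means Java 8
    if major == 1 and m.group(2):
        major = int(m.group(2))
    return major


def check_prerequisites() -> dict:
    """Report whether unzip is on PATH and which Java major version (if any) is installed."""
    report = {"unzip": shutil.which("unzip") is not None, "java": None}
    java = shutil.which("java")
    if java:
        # `java -version` writes its banner to stderr, not stdout
        out = subprocess.run([java, "-version"], capture_output=True, text=True).stderr
        report["java"] = java_major_version(out)
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```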

Importing Source Code & Creating a Project

  • Method 1: fromString
joern> importCode.$Language.fromString("$Code")
res0: Cpg = Cpg (Graph [$number nodes])

  • Method 2: from a path
joern> importCode(inputPath="$path", projectName="$name")
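If the code passed to `fromString` contains quotes or newlines, it must be written as a valid Scala string literal. When there are many projects, it also helps to generate the `importCode` lines with a script. Below is a minimal Python sketch of both. The helper names are hypothetical. `c` is assumed to be the frontend name in `importCode.c.fromString`. Paths are assumed not to contain `"` or `\`.

```python
import os
import re


def scala_string_literal(code: str) -> str:
    """Quote arbitrary source text as a double-quoted Scala string literal."""
    escaped = (code.replace("\\", "\\\\")   # backslashes first so later escapes are not doubled
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r")
                   .replace("\t", "\\t"))
    return f'"{escaped}"'


def from_string_command(language: str, code: str) -> str:
    """Method 1: build an importCode.<language>.fromString(...) line for the Joern REPL."""
    return f"importCode.{language}.fromString({scala_string_literal(code)})"


def import_code_command(input_path: str, project_name: str = "") -> str:
    """Method 2: build an importCode(inputPath=..., projectName=...) line.

    Without an explicit project name, one is derived from the last path component
    (hypothetical convention: characters outside [A-Za-z0-9_.-] become '_')."""
    if not project_name:
        base = os.path.basename(os.path.normpath(input_path))
        project_name = re.sub(r"[^A-Za-z0-9_.-]", "_", base)
    return f'importCode(inputPath="{input_path}", projectName="{project_name}")'


if __name__ == "__main__":
    print(from_string_command("c", 'int main() {\n  puts("hi");\n  return 0;\n}'))
    print(import_code_command("/home/user/src/my project/"))
```

When typing interactively, a Scala triple-quoted string (`"""..."""`) avoids most of this escaping. Generated, escaped literals are mainly useful when the REPL lines come from a script.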

Parse C source code and output the corresponding CPG as a .dot file

joern> cpg.method("$name").dotCpg14.l
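`dotCpg14` produces DOT text for each matched method, and `.l` turns the result into a list. For a quick overview of what a saved export contains, you can count its nodes and group its edges by kind. The sketch below assumes that nodes look like `"1000101" [label = <...> ]` and edges look like `"1000101" -> "1000102"  [ label = "AST: "]`. That is what Joern's DOT output is expected to look like, but the format may vary across versions, and a real DOT parser such as pydot is more robust. `summarize_dot` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Tuple

# Assumed line shapes in a Joern .dot export:
#   node:  "1000101" [label = <(METHOD,main)<SUB>1</SUB>> ]
#   edge:  "1000101" -> "1000102"  [ label = "AST: "]
NODE_RE = re.compile(r'^\s*"([^"]+)"\s*\[\s*label', re.MULTILINE)
EDGE_RE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"\s*\[\s*label\s*=\s*"([A-Z_]+)')


def summarize_dot(dot_text: str) -> Tuple[int, Counter]:
    """Return (number of distinct nodes, edge counts keyed by kind such as AST/CFG/CDG/DDG)."""
    nodes = set(NODE_RE.findall(dot_text))
    kinds = Counter(m.group(3) for m in EDGE_RE.finditer(dot_text))
    return len(nodes), kinds


if __name__ == "__main__":
    sample = '''digraph "main" {
"1" [label = <(METHOD,main)> ]
"2" [label = <(RETURN,return 0;)> ]
  "1" -> "2"  [ label = "AST: "]
  "1" -> "2"  [ label = "CFG: "]
  "2" -> "1"  [ label = "DDG: x"]
}'''
    print(summarize_dot(sample))
```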

Visualize the .dot output and export it to SVG

Use the VS Code extension Graphviz Interactive Preview.
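If you prefer the command line to the VS Code extension, Graphviz's `dot` tool renders a file with `dot -Tsvg input.dot -o output.svg`. The sketch below batch-renders every .dot file under a directory and writes each SVG next to its source. It assumes Graphviz is installed (on Ubuntu: `sudo apt install graphviz`). `svg_command` and `render_all` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def svg_command(dot_file: str) -> List[str]:
    """Build the Graphviz command that renders dot_file to an .svg with the same stem."""
    out = str(Path(dot_file).with_suffix(".svg"))
    return ["dot", "-Tsvg", dot_file, "-o", out]


def render_all(dot_dir: str) -> List[str]:
    """Render every .dot file under dot_dir (recursively) and return the SVG paths written."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' not found on PATH")
    rendered = []
    for dot_file in sorted(Path(dot_dir).rglob("*.dot")):
        cmd = svg_command(str(dot_file))
        subprocess.run(cmd, check=True)  # raises CalledProcessError if dot fails
        rendered.append(cmd[-1])
    return rendered
```

For example, `render_all("./data/parsed/dot/cpg")` would render the CPG exports produced by the batch script below.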

Batch-processing source code with a Python script (CPG & PDG)

import os
import subprocess

JOERNPATH = "$JOERNPATH"  # directory containing joern-parse and joern-export
root_dir = './data'
source_dir = "$src"


def parse_source_code_to_dot(file_path, f,
                             out_dir_pdg='/parsed/dot/pdg/',
                             out_dir_cpg='/parsed/dot/cpg/'):
    """Parse one source file with joern-parse, then export its PDG and CPG14 as .dot files."""
    pdg_root = root_dir + out_dir_pdg
    cpg_root = root_dir + out_dir_cpg
    os.makedirs(pdg_root, exist_ok=True)
    os.makedirs(cpg_root, exist_ok=True)

    # One output sub-directory per source file, named after the file without its extension
    name = os.path.splitext(f)[0]

    # joern-parse writes cpg.bin to the current working directory by default
    subprocess.run([os.path.join(JOERNPATH, "joern-parse"), file_path], check=True)

    # joern-export reads ./cpg.bin by default; export both representations
    for repr_name, out_root in (("pdg", pdg_root), ("cpg14", cpg_root)):
        subprocess.run([os.path.join(JOERNPATH, "joern-export"),
                        "--repr", repr_name, "--out", os.path.join(out_root, name)],
                       check=True)
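The script above defines `parse_source_code_to_dot` but never calls it. Below is a sketch of a driver loop that walks a source directory and passes each file to a handler with the same `(file_path, f)` signature. The helper names and the extension list are assumptions. Failures are collected instead of aborting the whole batch.

```python
import os
from typing import Callable, List, Tuple


def collect_sources(source_dir: str, exts=(".c", ".cpp", ".java")) -> List[Tuple[str, str]]:
    """Return sorted (full_path, file_name) pairs for every matching file under source_dir."""
    pairs = []
    for dirpath, _, files in os.walk(source_dir):
        for f in files:
            if f.endswith(exts):  # str.endswith accepts a tuple of suffixes
                pairs.append((os.path.join(dirpath, f), f))
    return sorted(pairs)


def run_batch(source_dir: str, handler: Callable[[str, str], None]) -> List[str]:
    """Call handler(file_path, f) on each source file; return the names of files that failed."""
    failed = []
    for file_path, f in collect_sources(source_dir):
        try:
            handler(file_path, f)
        except Exception as exc:  # keep going if one file fails to parse or export
            print(f"failed: {file_path}: {exc}")
            failed.append(f)
    return failed
```

With the script above, `run_batch(source_dir, parse_source_code_to_dot)` would process every file. Because joern-parse writes to the same `cpg.bin` each time, the files are handled one after another rather than in parallel.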

Writing to JSON (decompress a gzip-compressed JSON file and save it as formatted JSON):

import gzip
import json

path = 'data/poj104/test.gzip'
# Open the gzip file in text mode and parse its contents as a single JSON document
with gzip.open(path, 'rt', encoding='utf-8') as fin:
    objs = json.load(fin)

# Write it back out as indented JSON, keeping non-ASCII characters as-is
with open('json.json', 'w', encoding='utf-8') as file:
    json.dump(objs, file, indent=2, ensure_ascii=False)