Gymbo
Gymbo: Gradient-based Symbolic Execution

Gymbo is a Proof of Concept for a Gradient-based Symbolic Execution Engine, implemented from scratch. Building on recent advancements that utilize gradient descent to solve SMT formulas [1, 2], Gymbo leverages gradient descent to discover input values that fulfill each path constraint during symbolic execution.

Compared to other prominent symbolic execution tools, Gymbo's implementation is notably simpler and more compact. We hope that this project will assist individuals in grasping the fundamental principles of symbolic execution and SMT problem-solving through gradient descent.

One practical usage of Gymbo is debugging ML models like neural networks to detect unexpected behaviors. For example, you can generate adversarial examples with Gymbo by converting neural networks to imperative programs.

Another unique feature of Gymbo is that it can track the probabilistic symbolic variables. We adopt the PBRANCH algorithm proposed in [3] and currently support the discrete uniform, bernoulli, and binomial distributions.

Please note that Gymbo is currently under development and may have bugs.

Your feedback and contributions are always greatly appreciated.

Install

git clone https://github.com/Koukyosyumei/Gymbo.git
./script/build.sh

Input Source Code Grammar

Gymbo presently supports C-like programs with the following BNF grammar:

program = stmt*
stmt = expr ";"
| "{" stmt* "}"
| "if" "(" expr ")" stmt ("else" stmt)?
| "return" expr ";"
relational = add ("<" add | "<=" add | ">" add | ">=" add)*
add = mul ("+" mul | "-" mul)*
mul = unary ("*" unary | "/" unary)*
unary = ("+" | "-")? primary
primary = num | ident | "(" expr ")"
Node * stmt(Token *&token, char *user_input)
Parse and construct an AST node representing a statement.
Definition: parser.h:397
Node * assign(Token *&token, char *user_input)
Parses an assignment statement from a C-like language program.
Definition: parser.h:223
Node * mul(Token *&token, char *user_input)
Parse and construct an AST node representing multiplication or division.
Definition: parser.h:333
Node * relational(Token *&token, char *user_input)
Parses a relational expression from a C-like language program.
Definition: parser.h:284
Node * add(Token *&token, char *user_input)
Parse and construct an AST node representing addition or subtraction.
Definition: parser.h:310
Node * equality(Token *&token, char *user_input)
Parses an equality expression from a C-like language program.
Definition: parser.h:262
Node * expr(Token *&token, char *user_input)
Parses an expression from a C-like language program.
Definition: parser.h:210
Node * primary(Token *&token, char *user_input)
Parse and construct an AST node representing primary expressions.
Definition: parser.h:372
Node * unary(Token *&token, char *user_input)
Parse and construct an AST node representing unary operations.
Definition: parser.h:356
Node * logical(Token *&token, char *user_input)
Parses a logical expression from a C-like language program.
Definition: parser.h:240

Please note that Gymbo currently ignores / when solving path constraints.

Internal Algorithm

Gymbo converts the path constraint into a numerical loss function, which becomes negative only when the path constraint is satisfied. Gymbo uses the following transformation rule:

!(a) => -a + eps
(a < b) => a - b + eps
(a <= b) => a - b
(a > b) => b - a + eps
(a >= b) => b - a
(a == b) => abs(a - b)
(a != b) => -abs(a - b) + eps
(a && b) => max(a, b)
(a || b) => min(a, b)

Here, eps is the smallest positive value of the type for a and b.

Optionally, Gymbo can use DPLL (SAT solver) to decide the assignment for each unique term, sometimes resulting in better scalability.

CLI Tool

gymbo command accepts the following command-line options:

  • -d: Set the maximum depth for symbolic execution (default: 256).
  • -v: Set the verbosity level (default: 1). Use -1 for minimal output.
  • -i: Set the number of iterations for gradient descent (default: 100).
  • -a: Set the step size for gradient descent (default: 1.0).
  • -e: Set the eps for the differentiable expression (default: 1.0).
  • -t: Set the maximum number of trials of gradient descent (default: 3)
  • -l: Set the minimum value of initial parameters (default: -10)
  • -h: Set the maximum value of initial parameters (default: 10)
  • -s: Set the random seed (default: 42)
  • -p: (optional) If set, use DPLL to determine the assignment for each term. Otherwise, solve the loss function directly transformed from the path constraints.
./gymbo "if (a < 3) if (a > 4) return 1;" -v 0
>Compiling the input program...
>Start Symbolic Execution...
>---------------------------
>Result Summary
>#Total Path Constraints: 4
>#SAT: 3
>#UNSAT: 1
>List of UNSAT Path Constraints
># var_1 < 3 and 4 < var_1 and 1

libgymbo: Header-only Library

Since gymbo consists of the header-only library, you can easily create your own symbolic execution tool.

#include "libgymbo/gd.h"
#include "libgymbo/type.h"
char user_input[] = "if (a < 3) return 1;"
std::vector<gymbo::Node *> code;
gymbo::GDOptimizer optimizer(num_itrs, step_size);
gymbo::PathConstraintsTable cache_constraints;
// tokenize the input source code
gymbo::Token *token = gymbo::tokenize(user_input);
// generate AST from the tokens
gymbo::generate_ast(token, user_input, code);
// construct virtual stack machine from AST
gymbo::compile_ast(code, prg);
// execute gradient-based symbolie execution
gymbo::SExecutor executor(optimizer, maxSAT, maxUNSAT, max_num_trials,
ignore_memory, use_dpll, verbose_level);
executor.run(prg, target_pcs, init, max_depth);
Implementation of compiler.
Implementation of gradient descent optimizer.
void compile_ast(std::vector< Node * > code, Prog &prg)
Compile the Abstract Syntax Tree (AST) into a sequence of instructions.
Definition: compiler.h:133
void generate_ast(Token *&token, char *user_input, std::vector< Node * > &code)
Parses a C-like language program into an AST.
Definition: parser.h:439
Token * tokenize(char *user_input, std::unordered_map< std::string, int > &var_counter)
Tokenizes a given string and returns a linked list of tokens.
Definition: tokenizer.h:222
std::vector< Instr > Prog
Alias for a program, represented as a vector of instructions.
Definition: type.h:143
std::unordered_map< std::string, std::pair< bool, std::unordered_map< int, float > >> PathConstraintsTable
Alias for a table of path constraints.
Definition: type.h:1189
Implementation of parser.
Gradient Descent Optimizer for Symbolic Path Constraints.
Definition: gd.h:28
Represents a derived class for symbolic execution engine for deterministic programs.
Definition: symbolic.h:452
Struct representing the symbolic state of the symbolic execution.
Definition: type.h:1042
Structure representing a token.
Definition: tokenizer.h:89
Implementation of tokenizer.
Implementatations of basic alias, types, and classes.

Python API

Install

pip install git+https://github.com/Koukyosyumei/Gymbo

pylibgymbo: Python Wrapper for libgymbo

import pylibgymbo as plg
inp = "a = 1; if (a == 1) return 2;"
var_counter, prg = plg.gcompile(inp)
optimizer = plg.GDOptimizer(num_itrs, step_size, ...)
executor = plg.SExecutor(optimizer, maxSAT, maxUNSAT, max_num_trials,
ignore_memory, use_dpll, verbose_level)
executor.run(prg, target_pcs, init, max_depth)

pymlgymbo: Debugging Machine Learning Models

We offer some helper functions within pymlgymbo library to convert the ML models of famous Python library like sklearn and PyTorch to the program that gymbo can process.

The following code generates the adversarial examples against a neural network, as proposed in [3]

from sklearn.neural_network import MLPClassifier
import pylibgymbo as plg
import pymlgymbo as pmg
clf = MLPClassifier(activation="relu")
clf.fit(X_train, y_train)
mlp_code = pmg.dump_sklearn_MLP(clf, feature_names)
adv_condition = (
"("
+ " || ".join(
[f"(y_{c} > y_{y_pred})" for c in range(len(clf.classes_)) if
y_pred != c]
)
+ ")"
)
optimizer = plg.GDOptimizer(num_itrs, step_size, ...)
var_counter, prg = plg.gcompile(mlp_code)
executor = plg.SExecutor(optimizer, maxSAT, maxUNSAT, max_num_trials,
ignore_memory, use_dpll, verbose_level)
executor.run(prg, target_pcs, init, max_depth)

Acknowledgement

Gymbo is entirely implemented in C++ and requires only standard libraries. The process of compiling from source code to stack machines is based on the implementation of rui314/chibicc [4], while the symbolic execution approach is inspired by beala/symbolic [5].

Reference

- [1] Chen, Peng, Jianzhong Liu, and Hao Chen. "Matryoshka: fuzzing deeply
nested branches." Proceedings of the 2019 ACM SIGSAC Conference on Computer
and Communications Security. 2019.
- [2] Minghao Liu, Kunhang Lv, Pei Huang, Rui Han, Fuqi Jia, Yu Zhang, Feifei
Ma, Jian Zhang. "NRAgo: Solving SMT(NRA) Formulas with Gradient-based
Optimization." IEEE/ACM International Conference on Automated Software
Engineering, 2023
- [3] Gopinath, Divya, et al. "Symbolic execution for importance analysis and
adversarial generation in neural networks." 2019 IEEE 30th International
Symposium on Software Reliability Engineering (ISSRE). IEEE, 2019.
- [4] https://github.com/rui314/chibicc
- [5] https://github.com/beala/symbolic