-->, which executes a recursive query on the given condition. This recursive operator allows you to write a Semgrep rule that effectively crawls the codebase on a condition you specify, letting you build chains such as function call chains or class inheritance chains.
Understanding recursive join mode
In the background, join rules turn captured metavariables into database table columns. For example, a rule with $FUNCTIONNAME, $FUNCTIONCALLED, and $PARAMETER produces a table similar to the following:
| $FUNCTIONNAME | $FUNCTIONCALLED | $PARAMETER |
|---|---|---|
| getName | writeOutput | user |
| getName | lookupUser | uid |
| lookupUser | databaseQuery | uid |
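As a rough analogy only, and not a description of how Semgrep implements join mode, the table above can be reproduced with Python's built-in sqlite3 module. The three matches are the hypothetical rows from the table. Each match becomes one row, and each metavariable becomes a column with its $ prefix dropped. Once the matches sit in a table, a question such as "what does getName call?" becomes an ordinary query.

```python
import sqlite3

# Hypothetical matches from a single rule: each match becomes one row whose
# columns are the rule's metavariables (the "$" prefix is dropped for SQL).
matches = [
    {"FUNCTIONNAME": "getName", "FUNCTIONCALLED": "writeOutput", "PARAMETER": "user"},
    {"FUNCTIONNAME": "getName", "FUNCTIONCALLED": "lookupUser", "PARAMETER": "uid"},
    {"FUNCTIONNAME": "lookupUser", "FUNCTIONCALLED": "databaseQuery", "PARAMETER": "uid"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (FUNCTIONNAME TEXT, FUNCTIONCALLED TEXT, PARAMETER TEXT)")
# Named placeholders are filled from each dict in the list.
conn.executemany(
    "INSERT INTO matches VALUES (:FUNCTIONNAME, :FUNCTIONCALLED, :PARAMETER)",
    matches,
)

# Ask a question about the whole project: what does getName call?
called_by_get_name = [
    row[0]
    for row in conn.execute(
        "SELECT FUNCTIONCALLED FROM matches WHERE FUNCTIONNAME = ? ORDER BY FUNCTIONCALLED",
        ("getName",),
    )
]
print(called_by_get_name)  # ['lookupUser', 'writeOutput']
```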
For example, the condition python-callgraph.$CALLER --> python-callgraph.$CALLEE produces the table below. Notice how function_1 appears with function_4 and function_5 as callees, even though it does not call them directly.
| $CALLER | $CALLEE |
|---|---|
| function_1 | function_2 |
| function_1 | function_4 |
| function_1 | function_5 |
| function_1 | |
| function_2 | function_4 |
| function_2 | function_5 |
| function_3 | function_5 |
| function_4 | function_5 |
| function_5 | |
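The recursive operator behaves like a transitive closure over the captured pairs. As an analogy, not Semgrep's actual implementation, a SQL recursive common table expression computes the same thing. The direct call edges below are assumptions, since the code behind the table is not shown; they were chosen so that the result reproduces the rows of the table that have a callee.

```python
import sqlite3

# Hypothetical *direct* call edges, as a non-recursive callgraph rule might
# capture them. The underlying source code is assumed, not taken from the docs.
direct_calls = [
    ("function_1", "function_2"),
    ("function_2", "function_4"),
    ("function_4", "function_5"),
    ("function_3", "function_5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", direct_calls)

# Start from the direct edges, then repeatedly extend each known path by one
# more call. UNION (rather than UNION ALL) drops duplicate rows, so the
# recursion also stops if the callgraph contains cycles.
closure = conn.execute(
    """
    WITH RECURSIVE reach(caller, callee) AS (
        SELECT caller, callee FROM calls
        UNION
        SELECT reach.caller, calls.callee
        FROM reach JOIN calls ON reach.callee = calls.caller
    )
    SELECT caller, callee FROM reach ORDER BY caller, callee
    """
).fetchall()

for caller, callee in closure:
    print(caller, "->", callee)
```

Using UNION instead of UNION ALL matters for real code: recursive or mutually recursive functions create cycles, and deduplicating rows is what lets the query terminate.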
Example rule
It's important to think of a join mode rule as "asking questions about the whole project", rather than looking for a single pattern. For example, to find an SQL injection, you need to understand a few things about the project:
- Is there any user input?
- Do any functions manually build an SQL string using function input?
- Can the user input reach the function that manually builds the SQL string?
The on: conditions, in order, read as follows:
- Recursively generate a pseudo-callgraph from $CALLER to $CALLEE.
- Match when a method with user input has a $SINK that is the $CALLER in the pseudo-callgraph.
- Match when the $CALLEE is the $METHODNAME of a method that uses a parameter to construct an SQL string.
Table for user-input:
| $RETURNTYPE | $USERINPUTMETHOD | $TYPE | $PARAMETER | $OBJ | $SINK |
|---|---|---|---|---|---|
| … | … | … | … | … | … |
| LoginResponse | login | LoginRequest | input | user | token |
| LoginResponse | login | LoginRequest | input | User | getUser |
| … | … | … | … | … | … |
Table for formatted-sql:
| $RETURNTYPE | $METHODNAME | $TYPE | $PARAMETER | $SQLSTATEMENT |
|---|---|---|---|---|
| … | … | … | … | … |
| User | fetch | String | un | select * from users where username = ' |
| … | … | … | … | … |
Table for callgraph:
| $CALLER | $CALLEE |
|---|---|
| … | … |
| login | getUser |
| login | fetch |
| getUser | fetch |
| … | … |
- Match when a method with user input has a $SINK that is the $CALLER in the pseudo-callgraph.
| … | user-input.$SINK | == | callgraph.$CALLER | … |
|---|---|---|---|---|
| … | getUser | == | getUser | … |
- Match when the $CALLEE is the $METHODNAME of a method that uses a parameter to construct an SQL string.
| … | callgraph.$CALLEE | == | formatted-sql.$METHODNAME | … |
|---|---|---|---|---|
| … | fetch | == | fetch | … |
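To see the three conditions work together, the following sketch joins simplified versions of the three tables in SQLite. It is an analogy, not Semgrep's implementation. Only the columns needed for the join are kept, the SQL table and column names are hypothetical, and the direct calls are assumed (login calls getUser, which calls fetch), leaving the login to fetch row for the recursion to derive. The exact syntax of the first condition is not shown on this page; it is assumed to be callgraph.$CALLER --> callgraph.$CALLEE, by analogy with the python-callgraph example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables holding a subset of the columns shown above.
conn.execute("CREATE TABLE user_input (user_input_method TEXT, sink TEXT)")
conn.execute("CREATE TABLE formatted_sql (method_name TEXT, parameter TEXT)")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO user_input VALUES (?, ?)", [("login", "token"), ("login", "getUser")]
)
conn.execute("INSERT INTO formatted_sql VALUES (?, ?)", ("fetch", "un"))
# Assumed direct calls; the login -> fetch edge is derived by the recursion.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)", [("login", "getUser"), ("getUser", "fetch")]
)

findings = conn.execute(
    """
    WITH RECURSIVE callgraph(caller, callee) AS (
        -- condition 1 (assumed form): callgraph.$CALLER --> callgraph.$CALLEE
        SELECT caller, callee FROM calls
        UNION
        SELECT callgraph.caller, calls.callee
        FROM callgraph JOIN calls ON callgraph.callee = calls.caller
    )
    SELECT u.user_input_method, u.sink, f.method_name, f.parameter
    FROM user_input AS u
    -- condition 2: user-input.$SINK == callgraph.$CALLER
    JOIN callgraph AS c ON u.sink = c.caller
    -- condition 3: callgraph.$CALLEE == formatted-sql.$METHODNAME
    JOIN formatted_sql AS f ON c.callee = f.method_name
    """
).fetchall()
print(findings)  # [('login', 'getUser', 'fetch', 'un')]
```

The token sink produces no finding because no callgraph row has token as its caller; only the getUser sink connects, through the callgraph, to the SQL-building fetch method.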
Limitations
Join mode only works on the metavariable contents, which means it's fundamentally operating with text strings and not code constructs. There will be some false positives when unrelated code constructs produce metavariables with the same contents, such as two different methods or classes that share a name.
Use cases
- Approximating callgraphs in a project
- Approximating class inheritance
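The class inheritance use case follows the same pattern as the callgraph, and it also shows the text-only limitation described under Limitations. In this hypothetical sketch (again an SQLite analogy, not Semgrep's implementation), two unrelated classes are both named Base. Because only names are captured, a query for every descendant of Base also returns Widget, even though Widget extends a different Base.

```python
import sqlite3

# Hypothetical "class X extends Y" matches. Only the names are captured, so
# two unrelated classes named Base (say auth.Base and ui.Base) look identical.
inheritance = [
    ("Admin", "User"),   # auth.Admin extends auth.User
    ("User", "Base"),    # auth.User extends auth.Base
    ("Widget", "Base"),  # ui.Widget extends ui.Base, a different class
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inheritance (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO inheritance VALUES (?, ?)", inheritance)

# Recursively walk up the hierarchy, then select every class with Base as an ancestor.
descendants_of_base = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE ancestry(child, ancestor) AS (
            SELECT child, parent FROM inheritance
            UNION
            SELECT ancestry.child, inheritance.parent
            FROM ancestry JOIN inheritance ON ancestry.ancestor = inheritance.child
        )
        SELECT child FROM ancestry WHERE ancestor = 'Base' ORDER BY child
        """
    )
]
print(descendants_of_base)  # ['Admin', 'User', 'Widget']; Widget is a false positive
```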