Structural search lets you match richer syntax patterns specifically in code and structured data formats like JSON. It can be awkward or difficult to match code blocks or nested expressions with regular expressions. To meet this challenge we’ve introduced a new and easier way to search code that operates more closely on a program’s parse tree. We use Comby syntax for structural matching. Below you’ll find examples and notes for this language-aware search functionality.
The fmt.Sprintf function is a popular print function in Go. Here is a pattern
that matches all the arguments in fmt.Sprintf calls in our code:
fmt.Sprintf(...)
The ... part is special syntax that matches all characters inside the
parentheses (...). Let’s look at two interesting variants of
matches in our codebase. Here’s one:
fmt.Sprintf("must be authenticated as an admin (%s)", isSiteAdminErr.Error())
Note that to match this code we didn’t have to do any special thinking about
handling the parentheses
(%s) that happen inside the first string argument,
or the nested parentheses that form part of
Error(). Unlike regular
expressions, no “overmatching” can happen and the match will always respect
balanced parentheses. With regular expressions, taking care to match the closing
parentheses for this call could, in general, really complicate matters.
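To make the contrast concrete, here is a minimal Go sketch that compares a lazy regular expression with a simple balanced scan on the call above. The helper name balancedArgs is hypothetical and made up for this illustration. It only understands parentheses and double-quoted strings, and it is not how Comby or Sourcegraph implement matching.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// balancedArgs returns the text between the parentheses of the first call to
// name in src, honoring nested parentheses and skipping double-quoted strings.
// Hypothetical helper for illustration only; not Comby's implementation.
func balancedArgs(src, name string) (string, bool) {
	start := strings.Index(src, name+"(")
	if start < 0 {
		return "", false
	}
	open := start + len(name) + 1 // index just past the opening parenthesis
	depth := 1
	inString := false
	for i := open; i < len(src); i++ {
		c := src[i]
		switch {
		case inString:
			if c == '\\' {
				i++ // skip the escaped character
			} else if c == '"' {
				inString = false
			}
		case c == '"':
			inString = true
		case c == '(':
			depth++
		case c == ')':
			depth--
			if depth == 0 {
				return src[open:i], true
			}
		}
	}
	return "", false
}

func main() {
	src := `fmt.Sprintf("must be authenticated as an admin (%s)", isSiteAdminErr.Error())`

	// A lazy regular expression stops at the first ")", which sits inside the string.
	re := regexp.MustCompile(`fmt\.Sprintf\((.*?)\)`)
	fmt.Println(re.FindStringSubmatch(src)[1])

	// The balanced scan returns the complete argument list.
	args, _ := balancedArgs(src, "fmt.Sprintf")
	fmt.Println(args)
}
```

The regular expression cuts the match off inside the string literal, while the balanced scan keeps going until the parenthesis that actually closes the call.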
Here is a second match:
fmt.Sprintf(
	"rest/api/1.0/projects/%s/repos/%s/pull-requests/%d",
	pr.ToRef.Repository.Project.Key,
	pr.ToRef.Repository.Slug,
	pr.ID,
)
Here we didn’t have to do any special thinking about matching contents that
spread over multiple lines. The
... syntax by default matches across newlines.
Structural search supports balanced syntax like (), [], and {} in
a language-aware way. This allows it to match large, logical blocks or expressions
without the limitations of typical line-based regular expression patterns.
The ... above is an alias for the canonical syntax :[hole], where
hole is a descriptive identifier for the matched content. Identifiers are
useful when expressing that matched content should be equal (see the
return :[v.], :[v.] example below). See the additional syntax below.
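| Syntax | Description |
|--------|-------------|
| ..., :[hole] | match zero or more characters in a lazy fashion. When :[hole] is inside delimiters, as in {:[h1], :[h2]} or (:[h]), holes match within that group or code block, including newlines. |
| :[~regexp], :[hole~regexp] | match an arbitrary regular expression regexp. |
| :[[hole]] | match one or more alphanumeric characters and underscore. |
| :[hole\n] | match zero or more characters up to a newline, including the newline. |
| :[ hole] | match only whitespace characters, excluding newlines. |
| :[hole.] | match one or more alphanumeric characters and punctuation like ., ;, and - that do not affect balanced syntax. |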
Note: to match the string
... literally, use regular expression patterns like :[~[.]{3}] or :[~\.\.\.].
Rules. Comby supports rules to
express equality constraints or pattern-based matching. Comby rules are not
officially supported in Sourcegraph yet. We are in the process of making that
happen and are taking care to address stable performance and usability. That
said, you can explore rule functionality with an experimental
Below you’ll find more examples. Also see our blog post for additional examples.
Match stringy data
Taking the original
fmt.Sprintf(...) example, let’s modify the
pattern slightly to match only if the first argument is a string. We do this by
adding string quotes around the first ...:
fmt.Sprintf("...", ...)
Adding quotes communicates structural
context and changes how the hole behaves: it will match the contents of a
single string delimited by ". It won’t match multiple strings like "foo", "bar".
Some matched examples are:
fmt.Sprintf("external service not found: %v", e.id)
fmt.Sprintf("%s/campaigns/%s", externalURL, string(campaignID))
A hole stops matching at the first fragment of syntax that comes after it, similar to lazy regular expression matching. So, we could write:
fmt.Sprintf(:[first], :[second], ...)
to match all calls with three or more arguments, matching the first and
second arguments based on the contextual position around the commas.
Match equivalent expressions
Using the same identifier in multiple holes adds a constraint that both of the matched values must be syntactically equal. So, the pattern:
return :[v.], :[v.]
will match code where two identifier-like values in the
return statement are the same. For example,
return true, true,
return nil, nil, or
return 0, 0.
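For comparison, here is a rough Go sketch of the same equality check done by hand. Go's regexp package does not support backreferences, so the two captured values are compared in code. The names returnPair and sameReturnPair are hypothetical, and the sketch only handles a single-line return with exactly two simple values, which is far narrower than what the structural pattern covers.

```go
package main

import (
	"fmt"
	"regexp"
)

// returnPair captures two comma-separated, identifier-like values after
// "return". Hypothetical pattern for illustration only.
var returnPair = regexp.MustCompile(`^\s*return\s+([\w.]+),\s*([\w.]+)\s*$`)

// sameReturnPair reports whether line returns the same value twice.
func sameReturnPair(line string) bool {
	m := returnPair.FindStringSubmatch(line)
	return m != nil && m[1] == m[2]
}

func main() {
	lines := []string{"return nil, nil", "return 0, 0", "return nil, err"}
	for _, line := range lines {
		fmt.Println(line, "=>", sameReturnPair(line))
	}
}
```

Reusing one identifier in the structural pattern expresses this constraint in a single line, without a separate comparison step.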
Structural search also works on structured data, like JSON. Use patterns to declaratively describe pieces of data to match. For example, the pattern:
"exclude": [...]
matches all parts of a JSON document that have a member
"exclude" where the value is an array of items.
Current functionality and configuration
Structural search behaves differently to plain text search in key ways. We are continually improving functionality of this new feature, so please note the following:
Only indexed repos. Structural search can currently only be performed on indexed repositories. See configuration for more details if you host your own Sourcegraph installation. Our service hosted at sourcegraph.com indexes approximately 200,000 of the most popular repositories on GitHub. Other repositories are currently unsupported. To see whether a repository on your instance is indexed, visit
The lang keyword is semantically significant. Adding the
lang keyword informs the parser about language-specific syntax for comments, strings, and code. This makes structural search more accurate for that language. For example,
fmt.Sprintf(...) lang:go. If
lang is omitted, we make a best-effort attempt to infer the language based on matching file extensions, or fall back to a generic structural matcher.
Saved searches are not supported. It is not currently possible to save structural searches.
Matching blocks in indentation-sensitive languages. It’s not currently possible to match blocks of code that are indentation-sensitive. This is a feature planned for future work.