Part I of this series looked at the challenges of handling newlines in Kotlin. But newlines aren’t the only difficulty in building a parser for the language. Kotlin’s grammar has ambiguities and, in some cases, requires unbounded lookahead to parse certain constructs. In this post, we’ll walk through examples of Kotlin code that demonstrate ambiguity or require unbounded lookahead, and discuss how both the Kotlin compiler and the specification grammar (written in ANTLR) handle them.
Parsing concepts: lookahead and ambiguity
Parsing is a search problem. Given a grammar and an input, a parser builds a tree that spans the entire input according to the grammar. As in any search problem, we want to minimize choices (and backtracking) to stay efficient. To do this, a parser ideally only looks ahead a few tokens at a time when deciding which rule to apply.
Most programming languages do not have fixed-lookahead grammars (e.g., LL(k)), and may require unbounded lookahead, or techniques to deal with ambiguity. An input is ambiguous if it has more than one valid parse. For example, 1 + 2 * 3
could be parsed as (1 + (2 * 3))
or as ((1 + 2) * 3)
. The second parse must be discarded because it violates the standard arithmetic rules of operator precedence. Operator precedence is the usual solution for disambiguating expressions, but beyond that, tools often rely on ad-hoc rules or heuristics. For instance, ANTLR resolves non-precedence ambiguities by using the order of rules defined in the grammar.
Parsers that support unbounded lookahead explore all the grammar rules using some form of backtracking, which can take exponential time in the worst case (in contrast to fixed-lookahead parsers, which run in linear time). To avoid exponential blowup, such parsers often memoize intermediate results, reducing runtime to polynomial.
Writing grammars that work with fixed lookahead is often tedious and often impossible. In practice, most parsers solve these issues during parsing rather than at the grammar level. Broadly speaking, there are two strategies:
Extending a recursive-descent parser with custom handling for ambiguities and unbounded lookahead. This strategy usually requires language-specific tricks but gives fine-grained control over performance. The Kotlin compiler has a hand-written parser that takes this approach.
Using a general parser with disambiguation. This strategy uses backtracking (or an equivalent mechanism like splitting the stacks) to explore all parsing paths, and can handle any context-free grammar. ANTLR, used for Kotlin’s specification grammar, is one such parser. Note that while ANTLR’s internal machinery for exploring parsing paths is complex, at a conceptual level it is equivalent to backtracking with memoization of partial results.
The Kotlin compiler and ANTLR handle backtracking differently. The Kotlin compiler tightly controls backtracking, applying it only in specific cases. ANTLR supports backtracking in its core engine: the parser explores possibilities until it finds a valid parse. This makes Kotlin’s approach significantly more efficient in practice.
Examples
Modifiers and annotations
Most languages, such as Java, allow annotations and modifiers (like public
or static
) only at the beginning of declarations (classes, methods, parameters, and so on). In contrast, Kotlin allows modifiers and annotations to appear not only at the beginning, but also in the middle of declarations or even inside expressions. This flexibility makes parsing declarations more complicated.
Consider the following Kotlin class declaration with a private constructor:
abstract class A private constructor(i: Int)
After the parser reads the identifier A
and encounters the private
keyword, it can’t, from that token alone, decide whether to continue parsing the class declaration or stop and move on to the next declaration. In this case, it should parse the whole thing as a single class declaration.
Compare this with the next snippet, where the parser should instead stop after A
and treat the following as a function declaration (no newline after A
is needed here):
abstract class A private fun f() = 1
Since an arbitrary number of modifiers and annotations can appear here, the parser must read all of them and then look ahead to the next token (fun
or constructor
) to decide how to proceed. This is a case where unbounded lookahead is required to parse Kotlin. The Kotlin compiler handles it with a mark/rollback mechanism for backtracking when parsing modifiers. The Kotlin specification grammar, written in ANTLR, supports unbounded lookahead by exploring both alternatives in parallel: it first attempts to parse the code as a single class declaration, and if that fails (as in the second example), it backtracks and parses it as a class declaration followed by a function declaration.
Note that these examples are not ambiguous: although unbounded lookahead is required to determine where the class declaration ends, only one parse ultimately succeeds.
Infix operators and newlines
The next example shows a case where Kotlin enforces newline placement to resolve the ambiguity, and is closely related to the discussion of newlines in Part I.
In Kotlin, any three consecutive identifiers can be parsed as an infix expression, which corresponds to a function call. For example:
infix fun String.op(i: Int): Int = this.length + i
fun main() {
s op i
}
Here, s op 1
is parsed as an infix expression, equivalent to the desugared form s.op(1)
. But what happens if we insert newlines between the parts of the expression?
First, consider this version, with newlines before both the operator and the second operand:
fun main() {
s
op
i
}
This code is parsed as three separate expressions, one per line, rather than a single infix expression. (It produces a compile error because identifier op
is not resolved.)
If we put a newline only before the second operand:
fun main() {
s op
i
}
It parses the same way as the original version without newlines and is equivalent to s.op(1)
.
From the parser’s point of view, all these interpretations are possible. To resolve the ambiguity, Kotlin enforces a specific rule: a newline is allowed only after the operator. As a result, any two consecutive identifiers, followed (optionally) by a newline and then another identifier, are always parsed as an infix expression. The Kotlin specification grammar does the same.
Function calls with explicit type parameters
Let’s look at the following Kotlin code. What does it do?
g(f<Int, String>(1, "a"))
To most readers, it’s obviously a call to function f
(e.g. fun <K, T> f(k: K, t: T)
) with explicit type parameters. However, according to Kotlin’s grammar, this is not the only valid interpretation. Consider a slight variation:
val a = 1
val b = 2
val c = 3
val d = 4
f(a < b, c > d)
This looks like a function call with two boolean arguments, but in reality it fails to compile:
[NO_VALUE_FOR_PARAMETER] No value passed for parameter 'b'.
The reason is that in Kotlin the pattern a<b,c>
is always parsed as a generic function call, even when it could also be read as two expressions passed as arguments.
This is an ambiguity in Kotlin syntax: both interpretations are syntactically valid. Disambiguating requires type information, but that’s not available during parsing. In practice, there are three ways to handle this:
Parse both variants. Then, during type checking, discard the incorrect parse tree. While elegant in theory, this requires a general parser (GLR, GLL) that produces a parse forest, which is significantly more complex and slower.
Use a general parser with disambiguation. ANTLR does this: it supports backtracking and explores all parsing paths until it finds the first successful one, chosen based on rule order. In the Kotlin specification grammar, the second example is parsed as a function call with two comparison expressions as arguments. This shows that the Kotlin specification grammar is more permissive than the Kotlin compiler’s parser because it uses a global backtracking strategy.
Commit to a single parse. This is the Kotlin compiler’s approach: it always interprets
a<b,c>
as a generic call, regardless of what follows. Unlike the case for modifiers, backtracking here would be expensive and difficult to implement. The compiler’s strategy is essentially greedy: commit early and raise a parse error if parsing can’t proceed.
Andrey Breslav, the original creator of Kotlin, in a talk noted that this ambiguity is why other languages, such as Java and Rust, adopted a more awkward but unambiguous syntax for generic calls. In Java, generics must appear before the function name after a qualifier, e.g., Collections.<String>unmodifiableList()
. In Rust, generic parameters require ::
, e.g., f::<String>()
. Both languages avoided the issue by enforcing an unambiguous grammar.
Parameter annotations and function types
This example is similar to the previous one: it shows how the Kotlin compiler’s greedy disambiguation can lead to a parse error with the added twist that whitespace changes the result. Consider:
fun myFunc(f: @Composable () -> Unit) {}
Here we define a function that takes a lambda parameter annotated with @Composable
. But if we remove the space between the annotation and the parameter list, we get a parse error:
fun myFunc(f: @Composable() -> Unit) {}
This case is not ambiguous: @Composable() -> Unit
can only be parsed one way. However, as with generic function calls, the Kotlin compiler must decide after seeing ()
whether it belongs to the annotation (a call-like annotation) or whether the annotation is just a marker. The compiler always treats ()
as part of the annotation, which causes the parse error here. The ANTLR specification grammar doesn’t have this issue, since it explores both parsing paths and accepts the valid one.
Together with the previous example, this shows how greedy disambiguation can sometimes cause parse errors. In the Kotlin compiler, using backtracking for such cases would complicate the implementation and add runtime overhead. The authors therefore accepted the tradeoff: some valid programs result in parse errors.
Class declarations with body and lambda expressions
Another ambiguity in Kotlin involves class bodies and lambdas. Consider the following code:
fun f() {
class A
{
val i = 1
}
}
Is this a class declaration without a body (class A
) followed by a lambda expression? Or is this a class declaration with a body containing a single field i
?
The Kotlin compiler prefers the second interpretation: a class with a body. This is straightforward to implement in a recursive-descent parser by checking for the {
token after parsing the class signature. The ANTLR Kotlin specification grammar produces the same result. ANTLR disambiguates this case by rule order: it explores the grammar depth-first, and once it can successfully parse the whole input as a class declaration, it stops and does not explore other interpretations.
Delegated specifiers with overrides
The final example is an ambiguity where the Kotlin compiler produces the expected result, but the ANTLR specification grammar does not:
class Derived(b: Base) : Base by b {
override fun f(): String = "Derived"
}
This code is ambiguous and can be parsed in two ways:
{}
defines the class body ofDerived
.{}
is the last lambda argument to methodb
, makingb { ... }
the delegation expression.
The Kotlin compiler chooses the first interpretation, but the ANTLR grammar produces the second. This shows that while tools like ANTLR make it easy to write unrestricted grammars, relying on rule order for disambiguation can sometimes result in the wrong result.
Conclusions
In this post, we looked at several Kotlin constructs that are ambiguous or require unbounded lookahead to parse. These examples show that when a language adopts complex syntax, parsing itself becomes a complex engineering problem, one that requires balancing trade-offs between performance and expressiveness. A prime example is the Kotlin compiler, which imposes certain syntactic limitations to avoid costly backtracking.
The Kotlin compiler’s parser is efficient because it controls precisely how unbounded lookahead and ambiguities are handled. ANTLR, on the other hand, explores all possibilities and stops at the first successful parse. As a result, the Kotlin compiler’s parser is significantly more efficient than ANTLR. There are also cases where the two differ in how they resolve ambiguities. In the end, the Kotlin compiler parser is the authoritative source of truth for how Kotlin is parsed.