Top-level expressions are special because they are able to cause certain side-effects that strongly affect program behavior and that ordinary expressions can't cause. These are:
- Binding resolution (global, const, import, using)
- Method definition
Fortunately, that's all. (Type definitions could be on this list, but the only tricky part of their behavior is assigning a global name, which is equivalent to item 1.)
The most desirable behavior for any statements not inside methods is "interpreter-like": all statements should see all effects of all previous statements. Unfortunately this is at odds with compiling fast code, which would like to assume a fixed world state (i.e. world counter). This tension has led to several bugs and tricky design decisions, such as:
- Code behaving differently based on whether it gets compiled: #2586 #24566
- Top-level code breaking inference's fixed-world assumption: #24316
- Problems with binding effects: #18933 #22984 #12010
Three general kinds of solutions are possible: (1) Change top-level semantics to enable optimizations, (2) make optimizations sound with respect to top-level semantics, or (3) don't optimize top-level expressions. (1) is likely to be unpopular, since it would lead to things like:
f(x) = 1
begin
f(x::Int) = 2
f(1) # gives 1
end
due to inference's usual fixed-world assumption. That leads us to (2), but a sound optimizer wouldn't be able to optimize very much. Consider a case like
for i = 1:100
include("file$i.jl")
f(1)
end
where we can't reasonably assume anything about what f(1) does.
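For comparison, the fixed-world behavior that option (1) would bring to the top level is what function bodies already show on Julia versions with world-age semantics. The following is a minimal sketch of that; f_demo and redefine_then_call are hypothetical names chosen for illustration, not anything from the issues above.

```julia
# Hypothetical names for illustration only.
f_demo(x) = 1

function redefine_then_call()
    # Add a more specific method while this function is already running.
    @eval f_demo(x::Int) = 2
    # An ordinary call dispatches in the world this function started in,
    # so it cannot see the method defined on the previous line.
    old_world = f_demo(1)
    # Base.invokelatest dispatches in the newest world instead.
    new_world = Base.invokelatest(f_demo, 1)
    return (old_world, new_world)
end

result = redefine_then_call()
# Back at top level, the next statement sees the new method.
after = f_demo(1)
```

Here the ordinary call inside the function still reaches the old method, while the top-level call afterwards reaches the new one. That gap is the "interpreter-like" guarantee that top-level code has and function bodies give up.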
That brings us to (3). In practice, it is basically never useful to infer and compile a top-level expression. The only exceptions are casual benchmarking (@time), and maybe an occasional long-running loop in the REPL. So to compromise, for a long time we've been compiling top-level expressions if they have loops. But that's a rather blunt instrument, since compiling loops that e.g. call eval is not worthwhile, and in other cases is unsound (#24316).
I'm not sure what to do, but overall I think we should optimize top-level code much less often (which might improve load times as well). One idea is to add a @compile or @fast macro that turns off top-level semantics, basically equivalent to wrapping the code in a 0-arg function and calling it.
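As a rough sketch of what such a macro could look like, assuming it does nothing more than the 0-arg wrapping described above (the name @compile comes from the proposal and is not an existing Base macro; the loop is only an example workload):

```julia
# Hypothetical sketch: wrap the expression in a zero-argument anonymous
# function and call it immediately, so the body is compiled like any
# ordinary function body instead of being run with top-level semantics.
macro compile(ex)
    return esc(:((() -> $ex)()))
end

total = @compile begin
    s = 0              # local to the generated function, not a new global
    for i = 1:1000
        s += i
    end
    s                  # value of the block becomes the value of the call
end
```

Under this sketch, variables assigned inside the block are locals of the generated function, and calls inside it would follow the usual fixed-world rules, which is the semantic change the macro name would be signaling.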
Thoughts?