Context
Velox evaluates SQL expressions as trees of functions. A query like
if(array_gte(a, b), multiply(x, y), 0) compiles into a tree where each node
processes an entire vector of rows at a time. When a query runs slowly, the
first question is usually: which function is consuming the most CPU? Is it the
expensive array comparison, or the cheap arithmetic called millions of times?
This problem is even more pronounced in use cases like training data loading,
where very long and deeply nested expression trees are common and jobs may run
for many hours or even days; in such cases, the CPU usage of even seemingly
short-lived functions may add up to substantial overhead. Without a detailed
per-function CPU usage breakdown, you may be left guessing or, worse,
optimizing the wrong thing.
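
To make the idea of a per-function breakdown concrete, here is a toy Python sketch of a vectorized expression tree that charges CPU time to each function name. It is not Velox code and does not use the Velox API: the `Expr` class, the helper constructors, and the semantics chosen for `array_gte` (true when every element of the first array is >= its counterpart in the second) are assumptions made purely for illustration. The toy `if` also evaluates both branches on every row, whereas a real engine would typically evaluate each branch only on the rows that select it.

```python
import time
from collections import defaultdict


class Expr:
    """A node in a toy vectorized expression tree (hypothetical, not Velox)."""

    def __init__(self, name, fn, *children):
        self.name = name          # function name used as the accounting key
        self.fn = fn              # operates on whole columns, not single rows
        self.children = children

    def eval(self, batch, cpu_ns):
        # Children are evaluated first, so their time is charged to them
        # and only this node's own work is measured below.
        inputs = [child.eval(batch, cpu_ns) for child in self.children]
        start = time.process_time_ns()
        result = self.fn(batch, *inputs)
        cpu_ns[self.name] += time.process_time_ns() - start
        return result


def num_rows(batch):
    return len(next(iter(batch.values())))


def column(name):
    return Expr("column", lambda batch: batch[name])


def constant(value):
    return Expr("constant", lambda batch: [value] * num_rows(batch))


def array_gte(a, b):
    # Assumed semantics: every element of a is >= the matching element of b.
    def fn(batch, xs, ys):
        return [all(p >= q for p, q in zip(x, y)) for x, y in zip(xs, ys)]
    return Expr("array_gte", fn, a, b)


def multiply(x, y):
    return Expr("multiply", lambda batch, xs, ys: [p * q for p, q in zip(xs, ys)], x, y)


def if_(cond, then, otherwise):
    def fn(batch, c, t, e):
        return [tv if cv else ev for cv, tv, ev in zip(c, t, e)]
    return Expr("if", fn, cond, then, otherwise)


# if(array_gte(a, b), multiply(x, y), 0)
expr = if_(
    array_gte(column("a"), column("b")),
    multiply(column("x"), column("y")),
    constant(0),
)

batch = {
    "a": [[3, 4], [1, 9]],
    "b": [[1, 2], [2, 2]],
    "x": [5, 6],
    "y": [7, 8],
}

cpu_ns = defaultdict(int)      # function name -> accumulated CPU nanoseconds
result = expr.eval(batch, cpu_ns)

for name, ns in sorted(cpu_ns.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {ns} ns")
```

Because time is accumulated per function name rather than per node, a cheap function that appears many times in a large tree, or runs over many batches, shows up with its total cost, which is exactly the kind of overhead that is hard to spot otherwise.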