Description
I don't remember the exact term for this, so bear with me.
I thought that one of polars' performance optimizations was the elimination of common subexpressions, together with visiting each value only once.
By elimination of common subexpressions, I mean that
if you have two expressions that call "value_counts()", value_counts is only called once. But the implementation of expressions seems to prohibit that, or I haven't found the polars.rs facility to call.
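To make the first idea concrete, here is a toy sketch in plain Python. It is not polars code: `value_counts`, `evaluate_naive`, `evaluate_with_cse`, and the `calls` counter are all hypothetical names I made up for illustration. It only shows what I mean by "called once" versus "called twice".

```python
from collections import Counter

# Hypothetical call counter, used only to show how often the shared work runs.
calls = {"value_counts": 0}

def value_counts(values):
    """Stand-in for an expensive shared subexpression."""
    calls["value_counts"] += 1
    return Counter(values)

def evaluate_naive(values):
    # Each "expression" recomputes value_counts on its own.
    most_common = value_counts(values).most_common(1)[0][0]
    n_unique = len(value_counts(values))
    return most_common, n_unique

def evaluate_with_cse(values):
    # The common subexpression is computed once and reused by both results.
    counts = value_counts(values)
    most_common = counts.most_common(1)[0][0]
    n_unique = len(counts)
    return most_common, n_unique
```

Both functions return the same results; the only difference is how many times `value_counts` runs.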
By visiting values only once, I mean that
pl.select([
    pl.col('a').abs().alias('abs'),
    pl.col('a').log(base=2).alias('log'),
])
would result in only one scan of the 'a' column, with each function applied to each value as it is visited, rather than two separate scans of the column.
But given how the expressions are written, where the plugin writer controls the iteration over chunked arrays, this doesn't seem possible.
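To spell out the difference between two scans and one, here is another toy sketch in plain Python. Again, this is not how polars or a plugin works internally; `two_scans` and `one_scan` are hypothetical names, and the column is just a list of positive floats.

```python
import math

def two_scans(column):
    # Each expression iterates over the whole column independently.
    abs_out = [abs(x) for x in column]
    log_out = [math.log2(x) for x in column]
    return abs_out, log_out

def one_scan(column):
    # A single pass: every value is visited once and fed to both functions.
    abs_out, log_out = [], []
    for x in column:
        abs_out.append(abs(x))
        log_out.append(math.log2(x))
    return abs_out, log_out
```

The two versions produce identical output. My question is only about whether the second shape is something an expression plugin can get, given that each plugin function runs its own loop over the chunked array.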
Did I just make up polars features in my head that don't exist? If polars does have those features, how do we access them?