
How do Polars plugins work with performance optimizations? #70

@paddymul

Description


I don't remember the exact term here, so bear with me.

I thought two of Polars' performance optimizations were common subexpression elimination and visiting each value only once.

By common subexpression elimination, I mean that if two expressions both call `value_counts()`, `value_counts()` is only computed once. But the way expressions are implemented seems to prohibit that, or I haven't found the polars.rs facility to call.
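To make the first idea concrete, here is a toy pure-Python model of common subexpression elimination. It is not Polars code and does not describe how Polars implements the optimization: the tuple-based expression nodes, `evaluate`, `select`, `OPS`, and the `CALLS` counter are all hypothetical. Because identical subtrees are equal, hashable tuples, a cache shared across one `select` lets both output expressions reuse a single `value_counts` result.

```python
from collections import Counter

# Hypothetical toy expression nodes: ("col", name) or (op_name, child_expr).
# Tuples are hashable, so structurally identical subtrees share one cache entry.
CALLS = Counter()

def value_counts(values):
    CALLS["value_counts"] += 1  # track how often the expensive op really runs
    return Counter(values)

OPS = {
    "value_counts": value_counts,
    "n_unique": lambda counts: len(counts),
    "max_count": lambda counts: max(counts.values()),
}

def evaluate(expr, frame, cache):
    """Evaluate expr, reusing results of structurally identical subexpressions."""
    if expr in cache:
        return cache[expr]
    if expr[0] == "col":
        result = frame[expr[1]]
    else:
        op, child = expr
        result = OPS[op](evaluate(child, frame, cache))
    cache[expr] = result
    return result

def select(exprs, frame):
    cache = {}  # shared by every expression in this select
    return [evaluate(e, frame, cache) for e in exprs]

frame = {"a": ["x", "y", "x", "z", "x", "x"]}
vc = ("value_counts", ("col", "a"))
results = select([("n_unique", vc), ("max_count", vc)], frame)
print(results, CALLS["value_counts"])  # [3, 4] 1
```

Without the shared cache (a fresh `cache` per expression), `value_counts` would run twice.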

By visiting values only once, I mean that this query:

import polars as pl

df = pl.DataFrame({"a": [1.0, 4.0, 8.0]})
df.select([
    pl.col("a").abs().alias("abs"),
    pl.col("a").log(base=2).alias("log")])

would result in only one scan of the 'a' column, with each function applied to every value during that scan, rather than two separate scans of the column.
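The distinction can be sketched with another toy pure-Python model. This is not how Polars executes queries; the `Column` class and the `multi_pass`/`single_pass` helpers are hypothetical and exist only to count traversals. Both strategies produce the same outputs, but the fused version walks the column once and applies every function to each value as it goes.

```python
import math

class Column:
    """Hypothetical toy column that counts how many times it is traversed."""
    def __init__(self, values):
        self.values = values
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.values)

def multi_pass(col, funcs):
    """One full traversal per output expression."""
    return [[f(v) for v in col] for f in funcs]

def single_pass(col, funcs):
    """Fused: visit each value once and apply every function to it."""
    outputs = [[] for _ in funcs]
    for v in col:
        for out, f in zip(outputs, funcs):
            out.append(f(v))
    return outputs

a1 = Column([1.0, 4.0, 8.0])
r1 = multi_pass(a1, [abs, math.log2])
a2 = Column([1.0, 4.0, 8.0])
r2 = single_pass(a2, [abs, math.log2])
print(r1 == r2, a1.scans, a2.scans)  # True 2 1
```

Note that fusing like this requires the engine, not each function, to own the loop over the values.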

But given how plugin expressions are written, with the plugin author controlling the iteration over chunked arrays, this doesn't seem possible.

Did I just imagine Polars features that don't exist? If Polars does have these features, how do we access them?
