bach.DataFrame.materialize
materialize(node_name='manual_materialize', inplace=False, limit=None, distinct=False, materialization=Materialization.CTE)
Create a copy of this DataFrame with the current DataFrame’s state as its base_node.
This effectively adds a node to the underlying SqlModel graph. Adding nodes generally increases
the size of the generated SQL query, but it can be useful if the current DataFrame contains
expressions that you want to evaluate before further expressions are built on top of them. This might
make sense for very large expressions, or for non-deterministic expressions (e.g. see
SeriesUuid.random()). Additionally, materializing as a temporary table can improve performance in
some instances.
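The effect of adding a node can be illustrated with a small toy model. This is a sketch only: ToyFrame, its methods, and the events table are hypothetical names invented for the illustration, and the code does not use or reproduce bach's actual implementation. It only builds SQL strings and never touches a database.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a lazily evaluated DataFrame (not bach's API)."""
    ctes: List[Tuple[str, str]]  # (node name, SELECT statement), in dependency order
    source: str                  # table or CTE name that the columns read from
    columns: Dict[str, str]      # column name -> SQL expression over `source`

    def assign(self, name: str, expression: str) -> "ToyFrame":
        # Add a column expression; no new node is created.
        return ToyFrame(list(self.ctes), self.source, {**self.columns, name: expression})

    def _select(self) -> str:
        cols = ", ".join(f"{expr} AS {name}" for name, expr in self.columns.items())
        return f"SELECT {cols} FROM {self.source}"

    def materialize(self, node_name: str = "manual_materialize") -> "ToyFrame":
        # Freeze the current expressions into a new CTE node. Later expressions
        # refer to that node's columns by name instead of repeating the expressions.
        ctes = self.ctes + [(node_name, self._select())]
        return ToyFrame(ctes, node_name, {name: name for name in self.columns})

    def to_sql(self) -> str:
        prefix = ""
        if self.ctes:
            prefix = "WITH " + ", ".join(f"{n} AS ({s})" for n, s in self.ctes) + " "
        return prefix + self._select()


df = ToyFrame([], "events", {"id": "id"}).assign("r", "random()")

# Without materializing, every derived column repeats the random() expression.
inline = df.assign("a", df.columns["r"]).assign("b", df.columns["r"])
print(inline.to_sql())

# After materializing, derived columns reference the column of the new node.
mat = df.materialize()
fixed = mat.assign("a", mat.columns["r"]).assign("b", mat.columns["r"])
print(fixed.to_sql())
```

In the first query random() appears three times, so the three columns could hold different values. In the second query it appears once, inside the manual_materialize node, at the cost of a longer statement.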
Note that this function does NOT query the database or materialize any data in the database. It merely
changes the underlying SqlModel graph, which gets executed by data transfer functions (e.g.
to_pandas()).
TODO: a known problem is that DataFrames with ‘json_postgres’ columns cannot be fully materialized.
Parameters
node_name – The name of the node that’s going to be created.
inplace – Perform operation on self if inplace=True, or create a copy.
limit (Optional[Any]) – The limit (slice, int) to apply.
distinct (bool) – Apply distinct statement if distinct=True.
materialization (Union[sql_models.model.Materialization, str]) – Set the materialization of the SqlModel in the graph. Only Materialization.CTE / ‘cte’ and Materialization.TEMP_TABLE / ‘temp_table’ are supported.
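The materialization parameter accepts either an enum member or its string form. One way such an argument could be normalized and checked is sketched below. ToyMaterialization, its VIEW member, and normalize_materialization are hypothetical names made up for this illustration; they are not part of bach or sql_models, and the library's real validation may differ.

```python
from enum import Enum
from typing import Union


class ToyMaterialization(Enum):
    """Hypothetical enum for illustration; not sql_models' Materialization."""
    CTE = "cte"
    TEMP_TABLE = "temp_table"
    VIEW = "view"  # assumed extra member, used only to show rejection


SUPPORTED = {ToyMaterialization.CTE, ToyMaterialization.TEMP_TABLE}


def normalize_materialization(value: Union[ToyMaterialization, str]) -> ToyMaterialization:
    # Accept the string form ('cte', 'temp_table') as well as the enum member.
    if isinstance(value, str):
        try:
            value = ToyMaterialization(value.lower())
        except ValueError:
            raise ValueError(f"unknown materialization: {value!r}") from None
    # Reject members that exist but are not supported by this operation.
    if value not in SUPPORTED:
        raise ValueError(f"unsupported materialization: {value.name}")
    return value


print(normalize_materialization("temp_table"))
print(normalize_materialization(ToyMaterialization.CTE))
```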
Returns
DataFrame with the current DataFrame’s state as base_node
Return type
DataFrame
Calling materialize() resets the order of the DataFrame. Call sort_values() again on the result if
order is important.