bach.DataFrame.materialize

materialize(node_name='manual_materialize', inplace=False, limit=None, distinct=False, materialization=Materialization.CTE)

Create a copy of this DataFrame that uses the current DataFrame's state as its base_node.

This effectively adds a node to the underlying SqlModel graph. Adding nodes generally increases the size of the generated SQL query, but it can be useful if the current DataFrame contains expressions that you want to evaluate before further expressions are built on top of them. This can make sense for very large expressions, or for non-deterministic expressions (e.g. see SeriesUuid.random()). Additionally, materializing as a temporary table can improve performance in some cases.
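To make the trade-off concrete, the sketch below compares two ways of compiling the same chain of computations. It is a toy, not bach's SQL generator: inline_chain and materialized_chain are hypothetical helpers, and the table t and column x are made up. When each step inlines the previous expression twice, the SQL text doubles at every step, and a non-deterministic base such as random() is evaluated as many independent calls. Reading the previous step from a CTE instead writes the base expression only once.

```python
def inline_chain(base_expr: str, depth: int) -> str:
    """Toy compiler that inlines: each step squares the previous expression
    by writing it out twice, so the SQL text doubles at every step."""
    expr = base_expr
    for _ in range(depth):
        expr = f"({expr} * {expr})"
    return f"SELECT {expr} AS x FROM t"


def materialized_chain(base_expr: str, depth: int) -> str:
    """Toy compiler that materializes: the base expression is computed once
    in CTE n0, and every later step only references the column x of the
    previous CTE."""
    ctes = [f"n0 AS (SELECT {base_expr} AS x FROM t)"]
    for i in range(1, depth + 1):
        ctes.append(f"n{i} AS (SELECT (x * x) AS x FROM n{i - 1})")
    return "WITH " + ", ".join(ctes) + f" SELECT x FROM n{depth}"


inlined = inline_chain("random()", 5)
materialized = materialized_chain("random()", 5)
print(inlined.count("random()"), materialized.count("random()"))
```

Besides the size difference, the two queries do not mean the same thing: the inlined version multiplies 32 independent random() calls, while the materialized version squares a single value repeatedly, assuming the database computes the CTE once.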

Note that this function does NOT query the database or materialize any data in the database. It merely changes the underlying SqlModel graph, which is executed only when a data transfer function (e.g. to_pandas()) is called.
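The lazy behaviour can be pictured with the toy graph below. It is a minimal sketch under stated assumptions: ToyGraph, from_table, fetch and the fake_db callback are hypothetical stand-ins, not bach or SqlModel APIs. materialize() only appends a node, and the database callback is invoked only by fetch(), which plays the role that to_pandas() plays in bach.

```python
from typing import Callable, List, Tuple


class ToyGraph:
    """Hypothetical stand-in for a lazily built query graph. Building the
    graph never touches the database; only fetch() sends SQL."""

    def __init__(self, run_query: Callable[[str], List[tuple]],
                 nodes: Tuple[Tuple[str, str], ...]):
        self.run_query = run_query  # simulated database connection
        self.nodes = nodes          # (name, select_sql) pairs, oldest first

    @classmethod
    def from_table(cls, run_query: Callable[[str], List[tuple]], table: str) -> "ToyGraph":
        return cls(run_query, (("src", f"SELECT * FROM {table}"),))

    def materialize(self, node_name: str = "manual_materialize") -> "ToyGraph":
        # Record a new node that reads from the previous one; no I/O happens.
        previous = self.nodes[-1][0]
        new_node = (node_name, f"SELECT * FROM {previous}")
        return ToyGraph(self.run_query, self.nodes + (new_node,))

    def to_sql(self) -> str:
        # Every node becomes one CTE, so the SQL grows with each materialize().
        ctes = ", ".join(f"{name} AS ({sql})" for name, sql in self.nodes)
        return f"WITH {ctes} SELECT * FROM {self.nodes[-1][0]}"

    def fetch(self) -> List[tuple]:
        # The data-transfer step: the only place a query is sent.
        return self.run_query(self.to_sql())


sent: List[str] = []


def fake_db(sql: str) -> List[tuple]:
    sent.append(sql)  # record the query instead of executing it
    return []


graph = ToyGraph.from_table(fake_db, "events").materialize().materialize("step2")
print(len(sent))  # nothing has been sent yet
graph.fetch()
print(sent[0])
```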

TODO: a known problem is that DataFrames with ‘json_postgres’ columns cannot be fully materialized.

Parameters

  • node_name – The name of the node that will be created.
  • inplace – Perform the operation on self if inplace=True, or create a copy.
  • limit (Optional[Any]) – The limit to apply, as an int or a slice.
  • distinct (bool) – Apply a DISTINCT clause if distinct=True.
  • materialization (Union[sql_models.model.Materialization, str]) – Set the materialization of the SqlModel in the graph. Only Materialization.CTE / ‘cte’ and Materialization.TEMP_TABLE / ‘temp_table’ are supported.
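As a rough illustration of how these parameters could shape a single node, the toy function below builds that node's SQL. It is a sketch under stated assumptions, not bach's implementation: node_sql, limit_clause and the local Materialization enum are hypothetical (the enum only mirrors the two supported values, and is not sql_models.model.Materialization). It assumes an int limit maps to LIMIT, a step-less, non-negative slice maps to LIMIT/OFFSET, and a temporary table is created with CREATE TEMPORARY TABLE ... AS.

```python
from enum import Enum
from typing import Union


class Materialization(Enum):
    """Local stand-in for the two supported materialization values."""
    CTE = "cte"
    TEMP_TABLE = "temp_table"


def limit_clause(limit: Union[int, slice, None]) -> str:
    """Translate an int or a step-less, non-negative slice into LIMIT/OFFSET."""
    if limit is None:
        return ""
    if isinstance(limit, int):
        return f" LIMIT {limit}"
    if isinstance(limit, slice):
        if limit.step is not None:
            raise ValueError("slice step is not supported")
        start = limit.start or 0
        if start < 0 or (limit.stop is not None and limit.stop < 0):
            raise ValueError("negative slice bounds are not supported")
        clause = ""
        if limit.stop is not None:
            clause += f" LIMIT {max(limit.stop - start, 0)}"
        if start:
            clause += f" OFFSET {start}"
        return clause
    raise TypeError(f"unsupported limit type: {type(limit).__name__}")


def node_sql(select_list: str, source: str,
             node_name: str = "manual_materialize",
             limit: Union[int, slice, None] = None,
             distinct: bool = False,
             materialization: Union[Materialization, str] = Materialization.CTE) -> str:
    # Accept the enum or its string value; any other value raises ValueError.
    mat = Materialization(materialization)
    keyword = "SELECT DISTINCT" if distinct else "SELECT"
    body = f"{keyword} {select_list} FROM {source}{limit_clause(limit)}"
    if mat is Materialization.CTE:
        return f"{node_name} AS ({body})"
    return f"CREATE TEMPORARY TABLE {node_name} AS {body}"


print(node_sql("a", "src", limit=slice(10, 15), distinct=True))
print(node_sql("a", "src", node_name="tmp", materialization="temp_table"))
```

Accepting both the enum and its string form through a single Materialization(...) call mirrors the documented Union type, and rejects unsupported values early.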

Returns

DataFrame with the current DataFrame’s state as base_node

Return type

bach.dataframe.DataFrame

Note

Calling materialize() resets the order of the DataFrame. If order matters, call sort_values() again on the result.
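SQL does not guarantee that the row order of a CTE or subquery carries over to the outer query, so a sort has to be expressed on the outermost node. The toy frame below models that rule; ToyFrame and its methods are hypothetical stand-ins, not bach's DataFrame. materialize() wraps the current state in a CTE and drops the sort, and calling sort_values() again on the result puts ORDER BY back on the outer query.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical lazy frame: a source, an optional sort, and prior CTEs."""
    source: str
    order_by: Tuple[str, ...] = ()
    ctes: Tuple[Tuple[str, str], ...] = ()

    def sort_values(self, *by: str) -> "ToyFrame":
        return ToyFrame(self.source, tuple(by), self.ctes)

    def materialize(self, node_name: str = "manual_materialize") -> "ToyFrame":
        # The current state becomes a CTE. Its ORDER BY would not be
        # guaranteed to survive, so the toy drops the sort entirely.
        inner = f"SELECT * FROM {self.source}"
        return ToyFrame(node_name, (), self.ctes + ((node_name, inner),))

    def to_sql(self) -> str:
        with_part = ""
        if self.ctes:
            with_part = "WITH " + ", ".join(f"{n} AS ({q})" for n, q in self.ctes) + " "
        order = f" ORDER BY {', '.join(self.order_by)}" if self.order_by else ""
        return f"{with_part}SELECT * FROM {self.source}{order}"


df = ToyFrame("events").sort_values("moment")
resorted = df.materialize().sort_values("moment")  # re-apply the sort
print(resorted.to_sql())
```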