bach.DataFrame.from_pandas
classmethod from_pandas
(engine, df, convert_objects=True, name='loaded_data', materialization='cte', *, if_exists='fail')Instantiate a new DataFrame based on the content of a Pandas DataFrame.
The index of the Pandas DataFrame is set to the index of the DataFrame. Only single level index is supported. Supported dtypes are ‘int32’, ‘int64’, ‘float64’, ‘string’, ‘datetime64[ns]’, ‘bool’. Other columns are converted to supported data types if possible, as long as convert_objects is True.
How the data is loaded depends on the chosen materialization:
- ‘table’: This will first write the data to a database table using pandas
pandas.DataFrame.to_sql()
method. - ‘cte’: The data will be represented using a common table expression of the form
select * from values
in future queries. The ‘table’ method requires database write access. The ‘cte’ method is side-effect free and doesn’t interact with the database at all. However the ‘cte’ method is only suitable for small quantities of data. For anything over a dozen kilobytes of data it is recommended to store the data in a table in the database first (e.g. by specifying ‘table’).
There are some small differences between how the different materializations handle NaN values. e.g. ‘cte’ does not support those for non-numeric columns, whereas ‘table’ converts them to ‘NULL’.
Parameters
engine
(sqlalchemy.engine.base.Engine) – an sqlalchemy engine for the database.df
(pandas.core.frame.DataFrame) – Pandas DataFrame to instantiate as DataFrame.convert_objects
(bool) – If True (default), columns of type ‘object’ are converted to ‘string’ using the pd.convert_dtypes() method where possible. Additionally, if materialization = ‘cte’, we’ll try to convert types for which there is a Bach Series type.name
(str) – * For ‘table’ materialization: name of the table that Pandas will write the data to.- For ‘cte’ materialization: name of the node in the underlying SqlModel graph.
materialization
(str) – {‘cte’, ‘table’}. How to materialize the data.if_exists
(str) – {‘fail’, ‘replace’, ‘append’}. Only applies tomaterialization=’table’
How to behave if the table already exists:- fail: Raise a ValueError.
- replace: Drop the table before inserting new values.
- append: Insert new values to the existing table.
Returns
A DataFrame based on a pandas DataFrame
Return type
This method is only suited for small quantities of data.