modelhub.ModelHub.get_objectiv_dataframe

get_objectiv_dataframe

(*, db_url=None, table_name=None, start_date=None, end_date=None, bq_credentials_path=None, bq_credentials=None, with_sessionized_data=True, session_gap_seconds=1800, identity_resolution=None, anonymize_unidentified_users=True)

Loads data from a SQL table into a bach.DataFrame object.

The created DataFrame points to where the data is stored in the SQL database, applies several transformations, and sets the right data types for all columns. As a result, the models from the model hub can be applied to a DataFrame created with this method.

Parameters

  • db_url (Optional[str]) – the URL that indicates the database dialect and connection arguments. If not given, env DSN is used to create one. If that is not set either, the default of ‘postgresql://objectiv:@localhost:5432/objectiv’ is used.
  • table_name (Optional[str]) – the name of the sql table where the data is stored. Will default to ‘events’ for bigquery and ‘data’ for other engines.
  • start_date (Optional[str]) – first date for which data is loaded to the DataFrame. If None, data is loaded from the first date in the sql table. Format as ‘YYYY-MM-DD’.
  • end_date (Optional[str]) – last date for which data is loaded to the DataFrame. If None, data is loaded up to and including the last date in the sql table. Format as ‘YYYY-MM-DD’.
  • bq_credentials_path (Optional[str]) – path for BigQuery credentials. If db_url is for BigQuery engine, this parameter or bq_credentials is required. When both are given, bq_credentials wins.
  • bq_credentials (Optional[str]) – the JSON content of the credentials file. If db_url is for BigQuery engine, this parameter or bq_credentials_path is required. When both are given, bq_credentials wins.
  • with_sessionized_data (bool) – Indicates if DataFrame must include session_id and session_hit_number calculated series.
  • session_gap_seconds (int) – Number of seconds used to decide whether consecutive events belong to the same session.
  • identity_resolution (Optional[str]) – Identity id to be used for identifying users based on IdentityContext. If no value is provided, then the user_id series will contain the value from the cookie_id column (a UUID).
  • anonymize_unidentified_users (bool) – Indicates if unidentified users must be anonymized by setting their user_id value to NULL. Otherwise, the original UUID value from the cookie is kept.
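
The fallback order described for db_url and the ‘YYYY-MM-DD’ format for start_date and end_date can be sketched as follows. This is only an illustration, not the library's own code: resolve_db_url and date_range_kwargs are hypothetical helpers, and the sketch assumes that "env DSN" means an environment variable named DSN. The final call is left commented out because it needs the modelhub package and a reachable database.

```python
import os
from datetime import date
from typing import Dict, Mapping, Optional

# Default taken from the db_url parameter description above.
DEFAULT_DB_URL = "postgresql://objectiv:@localhost:5432/objectiv"


def resolve_db_url(db_url: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper mirroring the documented fallback order."""
    env = os.environ if env is None else env
    # 1. explicit argument, 2. DSN environment variable (assumed name),
    # 3. documented default.
    return db_url or env.get("DSN") or DEFAULT_DB_URL


def date_range_kwargs(start: date, end: date) -> Dict[str, str]:
    """Hypothetical helper: format both bounds as 'YYYY-MM-DD' strings."""
    if start > end:
        raise ValueError("start must not be after end")
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}


kwargs = {
    "db_url": resolve_db_url(),
    **date_range_kwargs(date(2022, 1, 1), date(2022, 1, 31)),
    "session_gap_seconds": 30 * 60,  # same as the default of 1800
}

# Requires the modelhub package and a database with Objectiv data:
# from modelhub import ModelHub
# df = ModelHub(time_aggregation='%Y-%m-%d').get_objectiv_dataframe(**kwargs)
```

Passing db_url explicitly is not required, since the method applies the same fallback itself; resolving it up front mainly makes it visible which database will be queried.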

Returns

bach.DataFrame with Objectiv data.

Note

If with_sessionized_data is True, Objectiv data will include session_id (int64) and session_hit_number (int64) series.
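
To give an idea of what session_id and session_hit_number represent, here is a minimal pure-Python sketch of gap-based sessionization. It is not the implementation used by the model hub, which runs in SQL. The sketch assumes that a new session starts when the gap to the user's previous event is strictly greater than session_gap_seconds, that hit numbers start at 1, and that session ids are handed out in (user, time) order; the function name sessionize is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[str, datetime]            # (user_id, moment)
Row = Tuple[str, datetime, int, int]    # + (session_id, session_hit_number)


def sessionize(events: Iterable[Event], session_gap_seconds: int = 1800) -> List[Row]:
    """Illustrative sketch: assign a session id and hit number to each event."""
    gap = timedelta(seconds=session_gap_seconds)
    rows: List[Row] = []
    session_id = 0
    hit_number = 0
    prev_user, prev_moment = None, None
    # Sort by user, then by time, so each user's events are consecutive.
    for user_id, moment in sorted(events):
        # A new session starts for a new user or after a long enough pause.
        if user_id != prev_user or moment - prev_moment > gap:
            session_id += 1
            hit_number = 1
        else:
            hit_number += 1
        rows.append((user_id, moment, session_id, hit_number))
        prev_user, prev_moment = user_id, moment
    return rows


events = [
    ("a", datetime(2022, 1, 1, 10, 0)),
    ("b", datetime(2022, 1, 1, 10, 5)),
    ("a", datetime(2022, 1, 1, 10, 10)),
    ("a", datetime(2022, 1, 1, 11, 0)),   # 50 minutes after the previous "a" event
]
for row in sessionize(events):
    print(row)
```

With the default gap of 1800 seconds, the first two events of user "a" share a session, while the third one starts a new session because 50 minutes have passed.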