(data, time_period='monthly', event_type=None, start_date=None, end_date=None, percentage=False, display=True)


Finds the number of users in a given cohort who are active in a given time period, where time is measured from the beginning of each cohort. An “active user” is a user who performed the action of interest in that time period. Users are divided into mutually exclusive cohorts, which are then tracked over time. Here, users are assigned to a cohort based on when they first performed the action of interest.

Returns the retention matrix dataframe, which represents the users retained across cohorts:

  • the index represents the cohort
  • the columns represent the number of time periods elapsed since the start of the cohort
  • the values represent the number (or percentage) of unique active users of a given cohort

The retention matrix can be calculated for a specific time range by specifying start_date and/or end_date. N.B. a user’s activity is traced from the first date the user is seen in the data.
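The cohort logic described above can be sketched in plain pandas. This is only an illustration of the idea, not the library's implementation: it uses pandas rather than bach, supports monthly periods only, and the function name `retention_matrix_sketch` and the column names `user_id` and `moment` are assumptions made for the example.

```python
import pandas as pd


def retention_matrix_sketch(events: pd.DataFrame) -> pd.DataFrame:
    """Monthly retention counts from an event log (illustrative only).

    Assumes hypothetical columns 'user_id' and 'moment' (datetime).
    """
    df = events.copy()
    # Express each event's month as a single integer so months can be subtracted.
    df["month_index"] = df["moment"].dt.year * 12 + df["moment"].dt.month
    # A user's cohort is the month of their first event.
    first_moment = df.groupby("user_id")["moment"].transform("min")
    df["cohort"] = first_moment.dt.strftime("%Y-%m")
    cohort_index = df.groupby("user_id")["month_index"].transform("min")
    # Number of months elapsed since the start of the user's cohort.
    df["offset"] = df["month_index"] - cohort_index
    # Count unique active users per (cohort, offset) and pivot offsets to columns.
    return (
        df.groupby(["cohort", "offset"])["user_id"]
        .nunique()
        .unstack(fill_value=0)
    )


events = pd.DataFrame(
    {
        "user_id": ["a", "a", "b", "b", "c", "c"],
        "moment": pd.to_datetime(
            ["2022-01-05", "2022-02-10", "2022-01-20",
             "2022-03-01", "2022-02-15", "2022-03-03"]
        ),
    }
)
matrix = retention_matrix_sketch(events)
print(matrix)
```

Each row of the result is a cohort and each column is the number of months since that cohort started, matching the layout of the returned retention matrix.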


  • data (bach.dataframe.DataFrame) – bach.DataFrame to apply the method on.
  • time_period (str) – can be ‘daily’, ‘weekly’, ‘monthly’ or ‘yearly’.
  • event_type (str) – the event/action of interest. Must be a valid event_type (either parent or child). If None, all events generated by the user are taken into account.
  • start_date (str) – start date of the retention matrix, e.g. ‘2022-04-01’. If None, the data is not restricted at the start.
  • end_date (str) – end date of the retention matrix, e.g. ‘2022-05-01’. If None, the data is not restricted at the end.
  • percentage (bool) – if True, calculate percentages with respect to the number of users in the cohort; otherwise keep the absolute values.
  • display (bool) – if True, visualize the retention matrix as a heat map.
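As a rough illustration of what percentage=True means, the sketch below converts a matrix of absolute counts into percentages of each cohort's size. It is written in plain pandas and is not the library's code. The helper name `to_percentage` is hypothetical, and it assumes that column 0 holds the cohort size, which is the case when every user is active in the first period of their own cohort.

```python
import pandas as pd


def to_percentage(counts: pd.DataFrame) -> pd.DataFrame:
    """Convert absolute retention counts to percentages of cohort size.

    Assumption: column 0 equals the number of users in the cohort.
    """
    cohort_sizes = counts[0]
    # Divide every row by its own cohort size, then scale to percent.
    return counts.div(cohort_sizes, axis=0) * 100


# Hand-written example counts: two cohorts tracked over three periods.
counts = pd.DataFrame(
    [[2, 1, 1], [1, 1, 0]],
    index=["2022-01", "2022-02"],
    columns=[0, 1, 2],
)
pct = to_percentage(counts)
print(pct)
```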


The retention matrix as a bach DataFrame.

Return type