bach.DataFrame.drop_duplicates
drop_duplicates(subset=None, keep='first', ignore_index=False, sort_by=None, ascending=True)
Return a dataframe with duplicated rows removed based on all series labels or a subset of labels.
Parameters
subset
(Optional[Union[str, Sequence[str]]]) – series label or sequence of labels. Duplicates are identified by the combination of values in these series. If not provided, all series labels are used.
keep
(Union[str, bool]) – Supported values: 'first', 'last' and False. Determines which duplicates to keep:
- 'first': drop all occurrences except the first one
- 'last': drop all occurrences except the last one
- False: drop all duplicates
If no value is provided, first occurrences are kept by default.
ignore_index
(bool) – if true, drops indexes of the result
sort_by
(Optional[Union[str, Sequence[str]]]) – series label or sequence of labels used to sort values. Sorting is needed because the result might otherwise be non-deterministic when keep == 'first' or keep == 'last'. If not provided:
- If the dataframe already has an order_by, the first and last occurrences are determined based on it.
- Otherwise, all series not considered in the duplicate check are used instead.
ascending
(Union[bool, List[bool]]) – Whether to sort ascending (True) or descending (False). If this is a list, then sort_by must also be a list and len(ascending) == len(sort_by).
Returns
a new dataframe with dropped duplicates.
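
Example
The effect of keep together with an explicit sort order can be sketched with plain pandas. This is an illustration only, not bach code: it assumes that bach's drop_duplicates follows the same keep semantics as pandas' DataFrame.drop_duplicates, and it emulates sort_by and ascending (which pandas does not have on this method) with a preceding sort_values call. The sample data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
pdf = pd.DataFrame(
    {
        "city": ["Utrecht", "Utrecht", "Leiden", "Leiden", "Delft"],
        "visits": [3, 7, 5, 1, 2],
    }
)

# pandas has no sort_by/ascending on drop_duplicates, so sort explicitly first.
# This stands in for sort_by="visits", ascending=False.
ordered = pdf.sort_values("visits", ascending=False)

# keep="first": one row per city, the first one in the sorted order
# (here: the row with the most visits).
first = ordered.drop_duplicates(subset="city", keep="first", ignore_index=True)

# keep="last": one row per city, the last one in the sorted order
# (here: the row with the fewest visits).
last = ordered.drop_duplicates(subset="city", keep="last", ignore_index=True)

# keep=False: every city that occurs more than once is dropped entirely.
none_kept = ordered.drop_duplicates(subset="city", keep=False, ignore_index=True)

print(first)
print(last)
print(none_kept)
```

Without the explicit sort, which of the two "Utrecht" rows counts as the first one would depend on whatever order the rows happen to arrive in. That is the non-determinism the sort_by parameter is meant to remove.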