Series

class bach.Series(engine, base_node, index, name, expression, group_by, instance_dtype, order_by=None, **kwargs)

Series is an abstract class. An instance of Series represents a column of data. Specific subclasses are used to represent specific types of data and enable operations on that data.

A Series can also be used on its own to work with a single list of values. Many standard operations are available: arithmetic such as addition and subtraction, aggregations such as nunique() or count(), and operations that create a new sub-Series, such as unique().

Reference by function

Creation / re-framing

Series.to_frame(): Create a DataFrame with the index and data from this Series.
Series.copy(): Return a copy of this Series.

Value accessors

Series.head([n]): Get the first n rows from this Series as a pandas.Series.
Series.to_pandas([limit]): Get the data from this Series as a pandas.Series. The limit parameter is the limit to apply, either as a maximum number of rows or as a slice.
Series.array: .array property accessor akin to pandas.Series.array.
Series.value: Retrieve the actual single value of this Series.

Attributes and underlying data

Axes

Series.name: Get this Series' name.
Series.index: Get this Series' index dictionary {name: Series}.
Series.group_by: Get this Series' group_by, if any.
Series.order_by: Get the series expressions for sorting this Series.

Types

Series.dtype: The dtype of this Series.
Series.astype(dtype): Convert this Series to another type.

SQL Model

Series.base_node: Get this Series' base_node.
Series.materialize([node_name, limit, ...]): Create a copy of this Series whose base_node is the current state of this Series.
Series.view_sql()

Comparison and set operations

Series.all_values(): For every row in this Series, do multiple evaluations where all sub-evaluations should be True.
Series.any_value(): For every row in this Series, do multiple evaluations where any sub-evaluation should be True.
Series.exists(): Boolean operation that returns True if there are one or more values in this Series.
Series.isin(other): Evaluate for every row in this Series whether the value is contained in other.
Series.isnull(): Evaluate for every row in this Series whether the value is missing or NULL.
Series.notnull(): Evaluate for every row in this Series whether the value is not missing or NULL.

Conversion, reshaping, sorting

Series.reset_index([level, drop]): Drops the current index.
Series.sort_index(*[, ascending]): Sort this Series by its index.
Series.sort_values(*[, ascending]): Sort this Series by its values.
Series.fillna(other): Fill any NULL value with the given constant or other compatible Series.
Series.append(other[, ignore_index]): Append rows of other Series to the caller Series.
Series.drop_duplicates([keep]): Return a Series with duplicated rows removed.
Series.dropna(): Removes rows with missing values.
Series.unstack([level, fill_value, aggregation]): Pivot a level of the index labels.
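
The NULL-handling and de-duplication operations above can be illustrated on plain Python lists. This is only a rough sketch of the intended semantics, not bach code: the helper functions are hypothetical, None stands in for NULL, values are matched by position rather than by index, and the keep options ("first", "last", False) are assumed to follow the pandas convention.

```python
from collections import Counter


def fillna(values, other):
    """Replace None with a constant, or with the value at the same position in `other`."""
    if isinstance(other, list):
        return [o if v is None else v for v, o in zip(values, other)]
    return [other if v is None else v for v in values]


def dropna(values):
    """Remove missing (None) values."""
    return [v for v in values if v is not None]


def drop_duplicates(values, keep="first"):
    """Remove duplicated values; `keep` is assumed to behave as in pandas."""
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep == "last":
        # Keep the last occurrence of each value, preserving relative order.
        return list(dict.fromkeys(reversed(values)))[::-1]
    # Keep the first occurrence of each value.
    return list(dict.fromkeys(values))


filled_constant = fillna([1, None, 3], 0)
filled_series = fillna([1, None, None], [7, 8, 9])
without_nulls = dropna([1, None, 3])
first_kept = drop_duplicates([1, 2, 1, 3, 2])
last_kept = drop_duplicates([1, 2, 1, 3, 2], keep="last")
none_kept = drop_duplicates([1, 2, 1, 3, 2], keep=False)
```

In a real Series these operations are expressed in SQL and evaluated by the database, so row order is only defined after sort_index() or sort_values().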

Function application, aggregation & windowing

Series.agg(func[, group_by]): Apply one or more aggregation functions to this Series.
Series.aggregate(func[, group_by]): Alias for agg().
Series.apply_func(func, *args, **kwargs): Apply the given functions to this Series.

Computations & descriptive stats

All types

Series.describe([percentiles, ...]): Returns descriptive statistics; the result varies based on what is provided.
Series.count([partition, skipna]): Returns the number of rows in each partition, or for all values if none is given.
Series.min([partition, skipna]): Returns the minimum value in each partition, or for all values if none is given.
Series.max([partition, skipna]): Returns the maximum value in each partition, or for all values if none is given.
Series.median([partition, skipna]): Returns the median in each partition, or for all values if none is given.
Series.mode([partition, skipna]): Returns the mode in each partition, or for all values if none is given.
Series.nunique([partition, skipna]): Returns the number of unique values in each partition, or for all values if none is given.
Series.value_counts([normalize, sort, ...]): Returns a Series containing counts per unique value.
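
The phrase "in each partition or for all values if none is given" can be illustrated with a small pure-Python sketch. It does not use bach: the aggregate helper and its partitioned flag are hypothetical, rows are modelled as (partition_key, value) pairs, and None values are skipped on the assumption that this mirrors skipna=True.

```python
from collections import Counter, defaultdict
from statistics import median


def aggregate(rows, func, partitioned=True):
    """Apply `func` to the values of each partition, or to all values at once.

    Partitions that contain no non-None values are omitted from the result.
    """
    groups = defaultdict(list)
    for key, value in rows:
        if value is not None:
            # Without a partition, every value lands in one group keyed None.
            groups[key if partitioned else None].append(value)
    return {key: func(values) for key, values in groups.items()}


def nunique(values):
    return len(set(values))


def mode(values):
    return Counter(values).most_common(1)[0][0]


rows = [("a", 1), ("a", 1), ("a", 4), ("b", 2), ("b", None)]

count_per_partition = aggregate(rows, len)
nunique_per_partition = aggregate(rows, nunique)
median_per_partition = aggregate(rows, median)
mode_per_partition = aggregate(rows, mode)
count_overall = aggregate(rows, len, partitioned=False)
max_overall = aggregate(rows, max, partitioned=False)
```

With a partition the result holds one value per partition key; without one it collapses to a single value for the whole Series.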

Window

Series.window_first_value([window]): Returns value evaluated at the row that is the first row of the window frame.
Series.window_lag([offset, default, window]): Returns value evaluated at the row that is offset rows before the current row within the window.
Series.window_nth_value(n[, window]): Returns value evaluated at the row that is the n'th row of the window frame.
Series.window_lead([offset, default, window]): Returns value evaluated at the row that is offset rows after the current row within the window.
Series.window_last_value([window]): Returns value evaluated at the row that is the last row of the window frame.
Series.window_row_number([window]): Returns the number of the current row within its window, counting from 1.
Series.window_rank([window]): Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group.
Series.window_dense_rank([window]): Returns the rank of the current row, without gaps; this function effectively counts peer groups.
Series.window_percent_rank([window]): Returns the relative rank of the current row, that is (rank - 1) / (total partition rows - 1).
Series.window_ntile([num_buckets, window]): Returns an integer ranging from 1 to the argument value, dividing the partition as equally as possible.
Series.window_cume_dist([window]): Returns the cumulative distribution, that is (number of partition rows preceding or peers with current row) / (total partition rows).
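
The ranking formulas above are easier to follow with concrete numbers. The following is a minimal pure-Python sketch of those definitions for a single partition ordered by value. It is not bach's implementation: the ranking, lag and lead helpers are hypothetical, and returning 0.0 for percent_rank in a one-row partition is an assumption.

```python
from bisect import bisect_left, bisect_right


def ranking(values):
    """Compute the ranking window functions for one partition, ordered by value."""
    ordered = sorted(values)
    distinct = sorted(set(values))
    total = len(ordered)
    rows = []
    for position, value in enumerate(ordered, start=1):
        # rank: row_number of the first row in the peer group (leaves gaps).
        rank = bisect_left(ordered, value) + 1
        # dense_rank: number of peer groups up to and including this one.
        dense_rank = bisect_left(distinct, value) + 1
        # Rows preceding the current row, plus its peers.
        preceding_or_peers = bisect_right(ordered, value)
        rows.append({
            "value": value,
            "row_number": position,
            "rank": rank,
            "dense_rank": dense_rank,
            "percent_rank": (rank - 1) / (total - 1) if total > 1 else 0.0,
            "cume_dist": preceding_or_peers / total,
        })
    return rows


def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` if there is no such row."""
    n = len(values)
    return [values[i - offset] if 0 <= i - offset < n else default for i in range(n)]


def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` if there is no such row."""
    n = len(values)
    return [values[i + offset] if 0 <= i + offset < n else default for i in range(n)]


ranked = ranking([20, 10, 30, 20])
lagged = lag([1, 2, 3])
led = lead([1, 2, 3])
```

For the values 10, 20, 20, 30 the two tied rows share rank 2 and the next row gets rank 4, while dense_rank continues with 3.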