Json

class bach.SeriesJson

(engine, base_node, index, name, expression, group_by, instance_dtype, order_by=None, **kwargs)

[source]

A Series that represents the JSON type and its specific operations.

Database support and types

  • On Postgres this utilizes the native ‘jsonb’ database type.
  • On Athena this utilizes the native ‘json’ database type.
  • On BigQuery this utilizes the generic ‘STRING’ database type.
note

On Postgres, SeriesJson does not use the ‘json’ database type, but the ‘jsonb’ type, As the ‘json’ type has limited functionality.

This class is the standard and recommended type to use for handling json like data. Having said that, there is a special SeriesJsonPostgres type that uses the ‘json’ data type on Postgres, but internally that casts all data to ‘jsonb’ too.

Getting data

It is possible to get a selection of data from the json in the json type column. For selecting data from json, arrays and objects are supported. The data can be selected using .json[] on the json column

Selecting data from an array is based on position. It works similar to slicing through python lists.

note

Slicing is only possible if all values in the column are lists or None.

Selecting from objects is possible by key.

Examples:

In:
pdf
Out:
                                        jsonb_column
0 ["a","b","c"]
1 ["d","e","f","g"]
2 [{"h":"i","j":"k"},{"l":["m","n","o"]},{"p":"q"}]
In:
df = DataFrame.from_pandas(engine, pdf)
df['jsonb_column'] = df.jsonb_column.astype('json')
# load some json strings and convert them to jsonb type
# slice and show with .head()
df.jsonb_column.json[:2].head()
Out:
_index_0
0 [a, b]
1 [d, e]
2 [{'h': 'i', 'j': 'k'}, {'l': ['m', 'n', 'o']}]
Name: jsonb_column, dtype: object
In:
df.jsonb_column.json[1].head()
Out:
_index_0
0 b
1 e
2 {'l': ['m', 'n', 'o']}
Name: jsonb_column, dtype: object
In:
# selecting from objects is done by entering a key:
df.jsonb_column.json[1].json['l'].head()
Out:
_index_0
0 None
1 None
2 [m, n, o]
Name: jsonb_column, dtype: object

A last case is selecting based on the objects in an array. With this method, a dict is passed in the .json[] selector. The value of the first match with the dict to the objects in a json array is returned for the .json[] selector. A match is when all key/value pairs of the dict are found in an object. This can be used for selecting a subset of a json array with objects.

In:
# selecting from arrays by searching objects in the array.
df.jsonb_column.json[:{"j":"k"}].head()
Out:
_index_0
0 []
1 []
2 [{'h': 'i', 'j': 'k'}]
Name: jsonb_column, dtype: object
In:
# or:
df.jsonb_column.json[{"l":["m","n","o"]}:].head()
Out:
_index_0
0 []
1 []
2 [{'l': ['m', 'n', 'o']}, {'p': 'q'}]
Name: jsonb_column, dtype: object
SeriesJson.jsonGet access to json operations via the class that's returned through this accessor.