pyarrow.parquet.SortingColumn

class pyarrow.parquet.SortingColumn(int column_index, bool descending=False, bool nulls_first=False)

Bases: object

Sorting specification for a single column.

Returned by RowGroupMetaData.sorting_columns() and used in ParquetWriter to specify the sort order of the data.

Parameters:
column_index : int

Index of column that data is sorted by.

descending : bool, default False

Whether column is sorted in descending order.

nulls_first : bool, default False

Whether null values appear before valid values.

Notes

Column indices are zero-based, refer only to leaf fields, and follow depth-first order. For nested schemas, the indices may therefore differ from what you expect. In most cases, it is easier to specify the sort order using column names and convert it to column indices with the from_ordering method.

Examples

In other APIs, sort order is specified by names, such as:

>>> sort_order = [('id', 'ascending'), ('timestamp', 'descending')]

For Parquet, the column index must be used instead:

>>> import pyarrow.parquet as pq
>>> [pq.SortingColumn(0), pq.SortingColumn(1, descending=True)]
[SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False)]

Convert the sort_order into the list of sorting columns with from_ordering (note that the schema must be provided as well):

>>> import pyarrow as pa
>>> schema = pa.schema([('id', pa.int64()), ('timestamp', pa.timestamp('ms'))])
>>> sorting_columns = pq.SortingColumn.from_ordering(schema, sort_order)
>>> sorting_columns
(SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False))

Convert back to the sort order with to_ordering:

>>> pq.SortingColumn.to_ordering(schema, sorting_columns)
((('id', 'ascending'), ('timestamp', 'descending')), 'at_end')

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

from_ordering(cls, Schema schema, sort_keys, null_placement='at_end')

Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.

to_dict(self)

Get dictionary representation of the SortingColumn.

to_ordering(Schema schema, sorting_columns)

Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.

Attributes

column_index

Index of column that data is sorted by (int).

descending

Whether column is sorted in descending order (bool).

nulls_first

Whether null values appear before valid values (bool).

column_index

Index of column that data is sorted by (int).

descending

Whether column is sorted in descending order (bool).

classmethod from_ordering(cls, Schema schema, sort_keys, null_placement='at_end')

Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.

Parameters:
schema : Schema

Schema of the input data.

sort_keys : Sequence of (name, order) tuples

Names of field/column keys (str) to sort the input on, along with the order each field/column is sorted in. Accepted values for order are “ascending” and “descending”.

null_placement : {‘at_start’, ‘at_end’}, default ‘at_end’

Where null values should appear in the sort order.

Returns:
sorting_columns : tuple of SortingColumn

nulls_first

Whether null values appear before valid values (bool).

to_dict(self)

Get dictionary representation of the SortingColumn.

Returns:
dict

Dictionary with a key for each attribute of this class.

static to_ordering(Schema schema, sorting_columns)

Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.

Parameters:
schema : Schema

Schema of the input data.

sorting_columns : tuple of SortingColumn

Columns to sort the input on.

Returns:
sort_keys : tuple of (name, order) tuples
null_placement : {‘at_start’, ‘at_end’}