Dataset#
Factory functions#
| dataset | Open a dataset. |
| parquet_dataset | Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata. |
| partitioning | Specify a partitioning scheme. |
| field | Reference a column of the dataset. |
| scalar | Expression representing a scalar value. |
| write_dataset | Write a dataset to a given format and partitioning. |
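A minimal sketch of how these factory functions fit together, assuming pyarrow is installed and "example_dataset" is a writable local directory (the table contents and path are invented for illustration):

    import pyarrow as pa
    import pyarrow.dataset as ds

    # Small in-memory table used as the data to write.
    table = pa.table({
        "year": [2022, 2022, 2023],
        "value": [1.0, 2.0, 3.0],
    })

    # write_dataset() + partitioning(): write Hive-style year=... directories.
    ds.write_dataset(
        table,
        "example_dataset",
        format="parquet",
        partitioning=ds.partitioning(pa.schema([("year", pa.int64())]), flavor="hive"),
    )

    # dataset(): open the written files again as a (lazy) Dataset.
    dataset = ds.dataset("example_dataset", format="parquet", partitioning="hive")

    # field() and scalar() build an Expression used to filter the scan.
    print(dataset.to_table(filter=ds.field("year") == ds.scalar(2023)))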
Classes#
| CsvFileFormat | FileFormat for CSV files. |
| CsvFragmentScanOptions | Scan-specific options for CSV fragments. |
| JsonFileFormat | FileFormat for JSON files. |
| ParquetFileFormat | FileFormat for Parquet files. |
| ParquetReadOptions | Parquet format specific options for reading. |
| ParquetFragmentScanOptions | Scan-specific options for Parquet fragments. |
| ParquetFileFragment | A Fragment representing a parquet file. |
| DirectoryPartitioning | A Partitioning based on a specified Schema, with one directory level per partition field. |
| HivePartitioning | A Partitioning for "/$key=$value/" nested directories as found in Apache Hive. |
| FilenamePartitioning | A Partitioning based on a specified Schema, with the partition values encoded in the file names. |
| Dataset | Collection of data fragments and potentially child datasets. |
| FileSystemDataset | A Dataset of file fragments. |
| FileSystemFactoryOptions | Influences the discovery of filesystem paths. |
| FileSystemDatasetFactory | Create a DatasetFactory from a list of paths with schema inspection. |
| UnionDataset | A Dataset wrapping child datasets. |
| Fragment | Fragment of data from a Dataset. |
| FragmentScanOptions | Scan options specific to a particular fragment and scan operation. |
| TaggedRecordBatch | A combination of a record batch and the fragment it came from. |
| Scanner | A materialized scan operation with context and options bound. |
| Expression | A logical expression to be evaluated against some input. |
| InMemoryDataset | A Dataset wrapping in-memory data. |
| WrittenFile | Metadata information about files written as part of a dataset write operation. |
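As an illustration of how a few of these classes interact, here is a hedged sketch that builds an InMemoryDataset, filters it with an Expression, and materializes the scan through a Scanner (the column names and values are invented for the example):

    import pyarrow as pa
    import pyarrow.dataset as ds

    # InMemoryDataset: a Dataset backed by an in-memory table, no files needed.
    table = pa.table({"key": ["a", "a", "b"], "value": [1, 2, 3]})
    dataset = ds.InMemoryDataset(table)

    # Expression: built from ds.field()/ds.scalar() and evaluated during the scan.
    expr = ds.field("key") == "a"

    # Scanner: binds the dataset to concrete scan options (projection, filter, ...).
    scanner = dataset.scanner(columns=["value"], filter=expr)
    print(scanner.to_table())

    # Fragment: the unit a Dataset is split into; for file-based datasets each
    # fragment corresponds to (part of) a file.
    for fragment in dataset.get_fragments():
        print(type(fragment).__name__)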
Helper functions#
| get_partition_keys | Extract partition keys (equality constraints between a field and a scalar) from an expression, as a dict mapping the field's name to its value. |
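For example, applied to the kind of partition expression a fragment of a partitioned dataset carries (a sketch; the field names are invented):

    import pyarrow.dataset as ds

    # A partition expression like those attached to fragments of a partitioned dataset.
    expr = (ds.field("year") == 2023) & (ds.field("month") == 7)

    # get_partition_keys() flattens the equality constraints into a plain dict.
    print(ds.get_partition_keys(expr))  # -> {'year': 2023, 'month': 7}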