pyarrow.parquet.write_metadata

pyarrow.parquet.write_metadata(schema, where, metadata_collector=None, filesystem=None, **kwargs)

Write a metadata-only Parquet file from a schema. This can be used with write_to_dataset to generate the _common_metadata and _metadata sidecar files.

Parameters:
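schema : pyarrow.Schema

Schema to write to the file.

where : str or pyarrow.NativeFile

Path or file-like object to write the metadata file to.

metadata_collector : list, default None

List of FileMetaData objects (for example, as populated by write_to_dataset) whose row groups are appended to the written file.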

filesystem : FileSystem, default None

If nothing is passed, the filesystem is inferred from where if where is path-like; otherwise where is already a file-like object and no filesystem is needed.

**kwargs : dict

Additional keyword arguments for the ParquetWriter class. See the ParquetWriter docstring for more information.

Examples

Generate example data:

>>> import pyarrow as pa
>>> table = pa.table({'n_legs': [2, 2, 4, 4, 5, 100],
...                   'animal': ["Flamingo", "Parrot", "Dog", "Horse",
...                              "Brittle stars", "Centipede"]})

Write a dataset and collect metadata information.

>>> import pyarrow.parquet as pq
>>> metadata_collector = []
>>> pq.write_to_dataset(
...     table, 'dataset_metadata',
...     metadata_collector=metadata_collector)

Write the _common_metadata Parquet file without row group statistics.

>>> pq.write_metadata(
...     table.schema, 'dataset_metadata/_common_metadata')

Write the _metadata Parquet file with row group statistics.

>>> pq.write_metadata(
...     table.schema, 'dataset_metadata/_metadata',
...     metadata_collector=metadata_collector)