pyarrow.record_batch
- pyarrow.record_batch(data, names=None, schema=None, metadata=None)
Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.
- Parameters:
- data : dict, list, pandas.DataFrame, Arrow-compatible table
  A mapping of strings to Arrays or Python lists, a list of Arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (has an __arrow_c_array__ or __arrow_c_device_array__ method).
- names : list, default None
  Column names if a list of arrays is passed as data. Mutually exclusive with the 'schema' argument.
- schema : Schema, default None
  The expected schema of the RecordBatch. If not passed, it will be inferred from the data. Mutually exclusive with the 'names' argument.
- metadata : dict or Mapping, default None
  Optional metadata for the schema (if schema is not passed).
- Returns:
  RecordBatch
See also
Examples
>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]
Creating a RecordBatch from a Python dictionary:
>>> pa.record_batch({"n_legs": n_legs, "animals": animals})
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch({"n_legs": n_legs, "animals": animals}).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede
Creating a RecordBatch from a list of arrays with names:
>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
Creating a RecordBatch from a list of arrays with names and metadata:
>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'
Creating a RecordBatch from a pandas DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
----
year: [2020,2022,2021,2022]
month: [3,5,7,9]
day: [1,5,9,13]
n_legs: [2,4,5,100]
animals: ["Flamingo","Horse","Brittle stars","Centipede"]
>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede
Creating a RecordBatch from a pandas DataFrame with schema:
>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede