pyarrow.ipc.RecordBatchStreamReader#
- class pyarrow.ipc.RecordBatchStreamReader(source, *, options=None, memory_pool=None)[source]#
Bases: _RecordBatchStreamReader
Reader for the Arrow streaming binary format.
- Parameters:
  - source : bytes/buffer-like, pyarrow.NativeFile, or file-like Python object
    Either an in-memory buffer, or a readable file object. If you want to use a memory map, use MemoryMappedFile as the source.
  - options : pyarrow.ipc.IpcReadOptions
    Options for IPC deserialization. If None, default values will be used.
  - memory_pool : MemoryPool, default None
    If None, the default memory pool is used.
Methods

- __init__(source, *[, options, memory_pool])
- cast(self, target_schema): Wrap this reader with one that casts each batch lazily as it is pulled.
- close(self): Release any resources associated with the reader.
- from_batches(Schema schema, batches): Create RecordBatchReader from an iterable of batches.
- from_stream(data[, schema]): Create RecordBatchReader from an Arrow-compatible stream object.
- iter_batches_with_custom_metadata(self): Iterate over record batches from the stream along with their custom metadata.
- read_all(self): Read all record batches as a pyarrow.Table.
- read_next_batch(self): Read the next RecordBatch from the stream.
- read_next_batch_with_custom_metadata(self): Read the next RecordBatch from the stream along with its custom metadata.
- read_pandas(self, **options): Read the contents of the stream to a pandas.DataFrame.
Attributes

- stats: Current IPC read statistics.
- cast(self, target_schema)#
  Wrap this reader with one that casts each batch lazily as it is pulled. Currently only a safe cast to target_schema is implemented.
  - Parameters:
    - target_schema : Schema
      Schema to cast to; the names and order of fields must match.
  - Returns:
    - RecordBatchReader
- close(self)#
Release any resources associated with the reader.
- static from_batches(Schema schema, batches)#
  Create RecordBatchReader from an iterable of batches.
  - Parameters:
    - schema : Schema
      The shared schema of the record batches.
    - batches : Iterable[RecordBatch]
      The batches that this reader will return.
  - Returns:
    - reader : RecordBatchReader
- static from_stream(data, schema=None)#
  Create RecordBatchReader from an Arrow-compatible stream object.
  This accepts objects implementing the Arrow PyCapsule Protocol for streams, i.e. objects that have an __arrow_c_stream__ method.
- iter_batches_with_custom_metadata(self)#
Iterate over record batches from the stream along with their custom metadata.
- Yields:
RecordBatchWithMetadata
- read_next_batch(self)#
  Read the next RecordBatch from the stream.
  - Returns:
    - RecordBatch
  - Raises:
    - StopIteration
      At end of stream.
- read_next_batch_with_custom_metadata(self)#
  Read the next RecordBatch from the stream along with its custom metadata.
  - Returns:
    - batch : RecordBatch
    - custom_metadata : KeyValueMetadata
  - Raises:
    - StopIteration
      At end of stream.
- read_pandas(self, **options)#
  Read the contents of the stream to a pandas.DataFrame.
  Read all record batches as a pyarrow.Table, then convert it to a pandas.DataFrame using Table.to_pandas.
  - Parameters:
    - **options
      Arguments to forward to Table.to_pandas().
  - Returns:
    - df : pandas.DataFrame
- stats#
Current IPC read statistics.