Data Types and Schemas

Factory Functions

Use these functions to create Arrow data types and schemas.

null()

Create instance of null type.

bool_()

Create instance of boolean type.

int8()

Create instance of signed int8 type.

int16()

Create instance of signed int16 type.

int32()

Create instance of signed int32 type.

int64()

Create instance of signed int64 type.

uint8()

Create instance of unsigned int8 type.

uint16()

Create instance of unsigned int16 type.

uint32()

Create instance of unsigned int32 type.

uint64()

Create instance of unsigned int64 type.

float16()

Create half-precision floating point type.

float32()

Create single-precision floating point type.

float64()

Create double-precision floating point type.

time32(unit)

Create instance of 32-bit time (time of day) type with unit resolution.

time64(unit)

Create instance of 64-bit time (time of day) type with unit resolution.

timestamp(unit[, tz])

Create instance of timestamp type with resolution and optional time zone.

date32()

Create instance of 32-bit date (days since UNIX epoch 1970-01-01).

date64()

Create instance of 64-bit date (milliseconds since UNIX epoch 1970-01-01).

duration(unit)

Create instance of a duration type with unit resolution.

month_day_nano_interval()

Create instance of an interval type representing months, days and nanoseconds between two dates.

binary(int length=-1)

Create variable-length or fixed size binary type.

string()

Create UTF8 variable-length string type.

utf8()

Alias for string().

large_binary()

Create large variable-length binary type.

large_string()

Create large UTF8 variable-length string type.

large_utf8()

Alias for large_string().

binary_view()

Create a variable-length binary view type.

string_view()

Create UTF8 variable-length string view type.

decimal128(int precision, int scale=0)

Create decimal type with precision and scale and 128-bit width.

decimal256(int precision, int scale=0)

Create decimal type with precision and scale and 256-bit width.

list_(value_type, int list_size=-1)

Create ListType instance from child data type or field. If list_size is given, a fixed-size list type is created instead.

large_list(value_type)

Create LargeListType instance from child data type or field.

list_view(value_type)

Create ListViewType instance from child data type or field.

large_list_view(value_type)

Create LargeListViewType instance from child data type or field.

map_(key_type, item_type[, keys_sorted])

Create MapType instance from key and item data types or fields.

struct(fields)

Create StructType instance from fields.

dictionary(index_type, value_type, ...)

Create a dictionary (categorical, or simply encoded) type.

run_end_encoded(run_end_type, value_type)

Create RunEndEncodedType from run-end and value types.

fixed_shape_tensor(DataType value_type, shape)

Create instance of fixed shape tensor extension type with shape and optional names of tensor dimensions and indices of the desired logical ordering of dimensions.

opaque(DataType storage_type, ...)

Create instance of opaque extension type.

field(name[, type, nullable, metadata])

Create a pyarrow.Field instance.

schema(fields[, metadata])

Construct pyarrow.Schema from collection of fields.

from_numpy_dtype(dtype)

Convert NumPy dtype to pyarrow.DataType.

Utility Functions

unify_schemas(schemas, *[, promote_options])

Unify schemas by merging fields by name.

Type Classes

Do not instantiate these classes directly. Instead, call one of the factory functions above.

DataType()

Base class of all Arrow data types.

DictionaryType

Concrete class for dictionary data types.

ListType

Concrete class for list data types.

LargeListType

Concrete class for large list data types (like ListType, but with 64-bit offsets).

MapType

Concrete class for map data types.

StructType

Concrete class for struct data types.

UnionType

Base class for union data types.

TimestampType

Concrete class for timestamp data types.

Time32Type

Concrete class for time32 data types.

Time64Type

Concrete class for time64 data types.

FixedSizeBinaryType

Concrete class for fixed-size binary data types.

Decimal128Type

Concrete class for decimal128 data types.

Decimal256Type

Concrete class for decimal256 data types.

Field()

A named field, with a data type, nullability, and optional metadata.

Schema()

A named collection of types, a.k.a. a schema.

RunEndEncodedType

Concrete class for run-end encoded types.

Specific classes and functions for extension types.

ExtensionType(DataType storage_type, ...)

Concrete base class for Python-defined extension types.

PyExtensionType(DataType storage_type)

Concrete base class for Python-defined extension types based on pickle for (de)serialization.

register_extension_type(ext_type)

Register a Python extension type.

unregister_extension_type(type_name)

Unregister a Python extension type.

Canonical extension types implemented by PyArrow.

FixedShapeTensorType

Concrete class for fixed shape tensor extension type.

OpaqueType

Concrete class for opaque extension type.

Type Checking

These functions are predicates to check whether a DataType instance represents a given data type (such as int32) or general category (such as “is a signed integer”).

is_boolean(t)

Return True if value is an instance of type: boolean.

is_integer(t)

Return True if value is an instance of type: any integer.

is_signed_integer(t)

Return True if value is an instance of type: signed integer.

is_unsigned_integer(t)

Return True if value is an instance of type: unsigned integer.

is_int8(t)

Return True if value is an instance of type: int8.

is_int16(t)

Return True if value is an instance of type: int16.

is_int32(t)

Return True if value is an instance of type: int32.

is_int64(t)

Return True if value is an instance of type: int64.

is_uint8(t)

Return True if value is an instance of type: uint8.

is_uint16(t)

Return True if value is an instance of type: uint16.

is_uint32(t)

Return True if value is an instance of type: uint32.

is_uint64(t)

Return True if value is an instance of type: uint64.

is_floating(t)

Return True if value is an instance of type: floating point numeric.

is_float16(t)

Return True if value is an instance of type: float16 (half-precision).

is_float32(t)

Return True if value is an instance of type: float32 (single precision).

is_float64(t)

Return True if value is an instance of type: float64 (double precision).

is_decimal(t)

Return True if value is an instance of type: decimal.

is_decimal128(t)

Return True if value is an instance of type: decimal128.

is_decimal256(t)

Return True if value is an instance of type: decimal256.

is_list(t)

Return True if value is an instance of type: list.

is_large_list(t)

Return True if value is an instance of type: large list.

is_fixed_size_list(t)

Return True if value is an instance of type: fixed size list.

is_list_view(t)

Return True if value is an instance of type: list view.

is_large_list_view(t)

Return True if value is an instance of type: large list view.

is_struct(t)

Return True if value is an instance of type: struct.

is_union(t)

Return True if value is an instance of type: union.

is_nested(t)

Return True if value is an instance of type: nested type.

is_run_end_encoded(t)

Return True if value is an instance of type: run-end encoded.

is_temporal(t)

Return True if value is an instance of type: date, time, timestamp or duration.

is_timestamp(t)

Return True if value is an instance of type: timestamp.

is_date(t)

Return True if value is an instance of type: date.

is_date32(t)

Return True if value is an instance of type: date32 (days).

is_date64(t)

Return True if value is an instance of type: date64 (milliseconds).

is_time(t)

Return True if value is an instance of type: time.

is_time32(t)

Return True if value is an instance of type: time32.

is_time64(t)

Return True if value is an instance of type: time64.

is_duration(t)

Return True if value is an instance of type: duration.

is_interval(t)

Return True if value is an instance of type: interval.

is_null(t)

Return True if value is an instance of type: null.

is_binary(t)

Return True if value is an instance of type: variable-length binary.

is_unicode(t)

Alias for is_string.

is_string(t)

Return True if value is an instance of type: string (utf8 unicode).

is_large_binary(t)

Return True if value is an instance of type: large variable-length binary.

is_large_unicode(t)

Alias for is_large_string.

is_large_string(t)

Return True if value is an instance of type: large string (utf8 unicode).

is_binary_view(t)

Return True if value is an instance of type: variable-length binary view.

is_string_view(t)

Return True if value is an instance of type: variable-length string (utf-8) view.

is_fixed_size_binary(t)

Return True if value is an instance of type: fixed size binary.

is_map(t)

Return True if value is an instance of type: map.

is_dictionary(t)

Return True if value is an instance of type: dictionary-encoded.

is_primitive(t)

Return True if value is an instance of type: primitive type.