Installing PyArrow
System Compatibility
PyArrow is regularly built and tested on Windows, macOS and various Linux distributions. We strongly recommend using a 64-bit system.
Python Compatibility
PyArrow is currently compatible with Python 3.9, 3.10, 3.11, 3.12 and 3.13.
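If a setup script should fail early on an unsupported interpreter, a check like the one below can help. This is a minimal sketch: `SUPPORTED` is copied from the list above and will need updating as PyArrow releases change, and `python_is_supported` is a hypothetical helper name.

```python
import sys

# Python minor versions listed above as supported (update for newer PyArrow releases)
SUPPORTED = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) part of version_info is in SUPPORTED."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if python_is_supported() else "not listed as supported"
    print(f"Python {major}.{minor} is {status} by this PyArrow release")
```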
Using Conda
Install the latest version of PyArrow from conda-forge using Conda:
conda install -c conda-forge pyarrow
Note
While the pyarrow conda-forge package is the right choice for most users, both a minimal and a maximal variant of the package exist, either of which may be better for your use case. See Differences between conda-forge packages.
Using Pip
Install the latest version from PyPI (Windows, Linux, and macOS):
pip install pyarrow
If you encounter any issues importing the pip wheels on Windows, you may need to install the Visual C++ Redistributable for Visual Studio 2015.
Warning
On Linux, you will need pip >= 19.0 to detect the prebuilt binary packages.
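The pip requirement above can be checked from Python. This is a sketch under stated assumptions: `pip_is_recent_enough` is a hypothetical helper that compares only the leading major and minor numbers, and it deliberately avoids third-party dependencies. For full PEP 440 version comparisons, the `packaging` library is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def pip_is_recent_enough(pip_version, minimum=(19, 0)):
    """Compare the leading major.minor of a version string against minimum."""
    match = re.match(r"(\d+)(?:\.(\d+))?", pip_version)
    if match is None:
        return False  # unparseable version string
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= minimum


if __name__ == "__main__":
    try:
        installed = version("pip")
    except PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        if pip_is_recent_enough(installed):
            print(f"pip {installed} meets the pip >= 19.0 requirement")
        else:
            print(f"pip {installed} is too old; run: python -m pip install --upgrade pip")
```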
Installing nightly packages or from source
See Python Development.
Dependencies
Optional dependencies
NumPy 1.16.6 or higher.
pandas 1.0 or higher.
cffi.
PyArrow is also compatible with fsspec, and with pytz, dateutil or tzdata for timezone support.
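To see which of these optional packages are already available, a short probe can be used. This is a minimal sketch: it only checks that each module can be found on the import path and does not verify the minimum versions listed above (NumPy 1.16.6, pandas 1.0). `optional_dependency_status` is a hypothetical helper name. Note that dateutil is distributed as `python-dateutil` but imported as `dateutil`.

```python
import importlib.util

# Import names of the optional packages listed above
OPTIONAL_DEPENDENCIES = ["numpy", "pandas", "cffi", "fsspec", "pytz", "dateutil", "tzdata"]


def optional_dependency_status(names=OPTIONAL_DEPENDENCIES):
    """Map each top-level module name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in optional_dependency_status().items():
        print(f"{name:10} {'installed' if present else 'missing'}")
```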
tzdata on Windows
While Arrow uses the OS-provided timezone database on Linux and macOS, it requires a user-provided database on Windows. To download and extract the text version of the IANA timezone database, follow the instructions in the C++ Runtime Dependencies or use the pyarrow utility function pyarrow.util.download_tzdata_on_windows(), which does the same.
By default, the timezone database will be detected at %USERPROFILE%\Downloads\tzdata.
If the database has been downloaded to a different location, you will need to set a custom path to the database from Python:
>>> import pyarrow as pa
>>> pa.set_timezone_db_path("custom_path")
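The two options above can be combined into one helper. This is a minimal sketch under stated assumptions. `default_tzdata_path` and `configure_tzdata` are hypothetical names. `pyarrow.util.download_tzdata_on_windows()` and `pa.set_timezone_db_path()` are the functions named above, and the download is assumed to extract into the default location. On platforms other than Windows, the helper returns None without importing pyarrow.

```python
import ntpath
import os
import sys


def default_tzdata_path(environ=None):
    r"""Default location described above: %USERPROFILE%\Downloads\tzdata."""
    environ = os.environ if environ is None else environ
    # ntpath builds a Windows-style path even when this runs on another OS
    return ntpath.join(environ["USERPROFILE"], "Downloads", "tzdata")


def configure_tzdata(custom_path=None):
    """Make a timezone database available to Arrow on Windows; no-op elsewhere."""
    if sys.platform != "win32":
        return None  # Linux and macOS use the OS-provided database
    import pyarrow as pa
    import pyarrow.util

    if custom_path is not None:
        # Database already extracted somewhere other than the default location
        pa.set_timezone_db_path(custom_path)
        return custom_path
    path = default_tzdata_path()
    if not os.path.isdir(path):
        # Assumed to download and extract the IANA database into the default location
        pyarrow.util.download_tzdata_on_windows()
    return path
```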
Differences between conda-forge packages
On conda-forge, PyArrow is published as three separate packages, each providing varying levels of functionality. This is in contrast to PyPI, where only a single PyArrow package is provided.
The purpose of this split is to keep the installed package small for most users (pyarrow), to provide an even smaller, minimal package for specialized use cases (pyarrow-core), and to still offer a complete package for users who require it (pyarrow-all). What was historically pyarrow on conda-forge is now pyarrow-all, though most users can continue using pyarrow.
The pyarrow-core package includes the following functionality:
Compute Functions (i.e., pyarrow.compute)
Streaming, Serialization, and IPC (i.e., pyarrow.ipc)
Filesystem Interface (i.e., pyarrow.fs). Note: cloud filesystems (e.g., S3, GCS) are planned to move into pyarrow in a future release, though the local filesystem will remain in pyarrow-core.
File formats: Arrow/Feather, JSON, CSV, ORC (but not Parquet)
The pyarrow package adds the following:
Acero (i.e., pyarrow.acero)
Tabular Datasets (i.e., pyarrow.dataset)
Parquet (i.e., pyarrow.parquet)
Substrait (i.e., pyarrow.substrait)
Finally, pyarrow-all adds:
Arrow Flight RPC and Flight SQL (i.e., pyarrow.flight)
Gandiva (i.e., pyarrow.gandiva)
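The three lists above form cumulative tiers, so choosing a package amounts to finding the first tier that covers every module you need. The sketch below encodes the lists to do that and also probes which modules import in the current environment. `PACKAGE_MODULES`, `smallest_package` and `importable_modules` are hypothetical names. A failed import may also come from a missing library package rather than the wrong tier, so treat the probe as a hint only.

```python
import importlib

# Python modules listed above for each conda-forge package (each tier adds to the previous)
PACKAGE_MODULES = {
    "pyarrow-core": ["pyarrow.compute", "pyarrow.ipc", "pyarrow.fs"],
    "pyarrow": ["pyarrow.acero", "pyarrow.dataset", "pyarrow.parquet", "pyarrow.substrait"],
    "pyarrow-all": ["pyarrow.flight", "pyarrow.gandiva"],
}
TIERS = ["pyarrow-core", "pyarrow", "pyarrow-all"]


def smallest_package(needed):
    """Return the smallest conda-forge package whose tier covers every needed module."""
    provided = set()
    for package in TIERS:
        provided.update(PACKAGE_MODULES[package])  # accumulate modules tier by tier
        if set(needed) <= provided:
            return package
    raise ValueError(f"not covered by any package: {sorted(set(needed) - provided)}")


def importable_modules():
    """Try importing each listed module; True means it imported without ImportError."""
    status = {}
    for package in TIERS:
        for module in PACKAGE_MODULES[package]:
            try:
                importlib.import_module(module)
                status[module] = True
            except ImportError:
                status[module] = False
    return status
```

For example, smallest_package(["pyarrow.fs", "pyarrow.parquet"]) returns "pyarrow", because Parquet support is not part of pyarrow-core.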
The following table lists the functionality provided by each package and may be useful when deciding to use one package over another or when Creating A Custom Selection.
| Component | Package | pyarrow-core | pyarrow | pyarrow-all |
| Core | pyarrow-core | ✓ | ✓ | ✓ |
| Parquet | libparquet | | ✓ | ✓ |
| Dataset | libarrow-dataset | | ✓ | ✓ |
| Acero | libarrow-acero | | ✓ | ✓ |
| Substrait | libarrow-substrait | | ✓ | ✓ |
| Flight | libarrow-flight | | | ✓ |
| Flight SQL | libarrow-flight-sql | | | ✓ |
| Gandiva | libarrow-gandiva | | | ✓ |
Creating A Custom Selection
If you know which components you need and want to control what’s installed, you can create a custom selection of packages to include only the extra features you need. For example, to install pyarrow-core and add support for reading and writing Parquet, install libparquet alongside pyarrow-core:
conda install -c conda-forge pyarrow-core libparquet
Or, if you wish to use pyarrow but need support for Flight RPC:
conda install -c conda-forge pyarrow libarrow-flight
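The table above can be turned into a small command builder. Given a base package and the components you want, it adds only the library packages that the base does not already provide. This is a sketch: `COMPONENT_PACKAGES`, `PROVIDED_BY` and `custom_selection` are hypothetical names, and the mappings are transcribed from the table rather than queried from conda-forge.

```python
import shlex

# Component -> conda-forge library package, from the table above
COMPONENT_PACKAGES = {
    "Parquet": "libparquet",
    "Dataset": "libarrow-dataset",
    "Acero": "libarrow-acero",
    "Substrait": "libarrow-substrait",
    "Flight": "libarrow-flight",
    "Flight SQL": "libarrow-flight-sql",
    "Gandiva": "libarrow-gandiva",
}

# Optional components each base package already includes, per the table above
PROVIDED_BY = {
    "pyarrow-core": set(),
    "pyarrow": {"Parquet", "Dataset", "Acero", "Substrait"},
    "pyarrow-all": set(COMPONENT_PACKAGES),
}


def custom_selection(base, components):
    """Build a conda install command for base plus only the missing components."""
    extras = [COMPONENT_PACKAGES[c] for c in components if c not in PROVIDED_BY[base]]
    args = ["conda", "install", "-c", "conda-forge", base, *extras]
    return shlex.join(args)  # quotes arguments only if they contain shell-special characters
```

Checking against the two commands above: custom_selection("pyarrow-core", ["Parquet"]) yields the first, and custom_selection("pyarrow", ["Flight"]) yields the second.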