BUG: bool support in pandas' quantile broken by NumPy's percentile behaviour change #41792

Open
@HyukjinKwon

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
pd.DataFrame({"i": [0, 1, 2], "b": [False, False, True], "s": ["x", "y", "z"]}).quantile(q=0.5, numeric_only=True)

Problem description

numpy/numpy#16273 (comment) broke NumPy's percentile for bool input, which in turn breaks pandas' quantile. If this is not considered a bug in NumPy, I would expect pandas to handle bool columns itself when numeric_only is set. Cross-filed issue in NumPy: numpy/numpy#19154

Currently the code sample raises the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python3.8/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/.../python3.8/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/.../python3.8/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/.../python3.8/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
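
The last frame of the traceback points at the subtraction inside the interpolation step. The sketch below is a rough illustration of that step, assuming the interpolation has the usual form a + (b - a) * t. The helper name `lerp` is made up for this example and is not NumPy's private `_lerp`. It shows that subtracting bool arrays raises the TypeError, while the same computation on float input goes through:

```python
import numpy as np


def lerp(a, b, t):
    # Illustrative linear interpolation (hypothetical helper, not NumPy's
    # internal function): a + (b - a) * t
    return a + np.subtract(b, a) * t


below = np.array([False])
above = np.array([True])

bool_failed = False
message = ""
try:
    lerp(below, above, 0.5)
except TypeError as exc:
    # NumPy refuses to subtract bool arrays, as in the traceback above.
    bool_failed = True
    message = str(exc)

# Casting to float first avoids the unsupported bool subtraction.
as_float = lerp(below.astype("float64"), above.astype("float64"), 0.5)

print(bool_failed, message)
print(as_float)
```

Whether `np.percentile` itself hits this code path depends on the installed NumPy version, so the sketch reproduces only the subtraction.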

Expected Output

i    1.0
b    0.0
Name: 0.5, dtype: float64
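
Until this is handled in pandas or NumPy, one possible user-side workaround is to cast bool columns to a numeric dtype before calling quantile. This is only a sketch: the helper name `quantile_with_bools` is hypothetical, and it is not necessarily how pandas would fix the problem internally.

```python
import pandas as pd


def quantile_with_bools(df, q=0.5):
    """Hypothetical helper: cast bool columns to float64, then call quantile."""
    bool_cols = df.select_dtypes(include="bool").columns
    # Only the bool columns are converted; other columns keep their dtype.
    converted = df.astype({col: "float64" for col in bool_cols})
    return converted.quantile(q=q, numeric_only=True)


df = pd.DataFrame(
    {"i": [0, 1, 2], "b": [False, False, True], "s": ["x", "y", "z"]}
)
result = quantile_with_bools(df)
print(result)
```

For the frame in the code sample this should give 1.0 for column i and 0.0 for column b, matching the expected output above, with the string column s dropped by numeric_only=True.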

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 2cb96529396d93b46abab7bbc73a208e708c642e
python           : 3.8.8.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.4.0
Version          : Darwin Kernel Version 20.4.0: Thu Apr 22 21:46:47 PDT 2021; root:xnu-7195.101.2~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.4
numpy            : 1.20.3
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.0.1
setuptools       : 52.0.0.post20210125
Cython           : None
pytest           : 6.2.4
hypothesis       : None
sphinx           : 3.0.4
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.3
IPython          : 7.23.1
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : None
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : 4.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.6.3
sqlalchemy       : 1.4.14
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : None
