DOC: move relevant whatsnew changes from 2.3.0 to 2.3.1 file #61698

35 changes: 0 additions & 35 deletions doc/source/whatsnew/v2.3.0.rst
@@ -31,39 +31,6 @@ Other enhancements
- The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for :class:`StringDtype` columns (:issue:`60633`)
- The :meth:`~Series.sum` reduction is now implemented for :class:`StringDtype` columns (:issue:`59853`)

.. ---------------------------------------------------------------------------
.. _whatsnew_230.notable_bug_fixes:

Notable bug fixes
~~~~~~~~~~~~~~~~~

These are bug fixes that might have notable behavior changes.

.. _whatsnew_230.notable_bug_fixes.string_comparisons:

Comparisons between different string dtypes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy

object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)

in determining the result dtype when there are different string dtypes compared. Some examples:

- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.

.. _whatsnew_230.api_changes:

API changes
~~~~~~~~~~~

- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
Index (:issue:`60797`)

.. ---------------------------------------------------------------------------
.. _whatsnew_230.deprecations:

@@ -85,8 +52,6 @@ Numeric

Strings
^^^^^^^
- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
- Bug in :meth:`Series.__pos__` and :meth:`DataFrame.__pos__` where an ``Exception`` was not raised for :class:`StringDtype` with ``storage="pyarrow"`` (:issue:`60710`)
- Bug in :meth:`Series.rank` for :class:`StringDtype` with ``storage="pyarrow"`` that incorrectly returned integer results with ``method="average"`` and raised an error if it would truncate results (:issue:`59768`)
- Bug in :meth:`Series.replace` with :class:`StringDtype` when replacing with a non-string value was not upcasting to ``object`` dtype (:issue:`60282`)
56 changes: 51 additions & 5 deletions doc/source/whatsnew/v2.3.1.rst
@@ -9,11 +9,57 @@ including other versions of pandas.
{{ header }}

.. ---------------------------------------------------------------------------
.. _whatsnew_231.enhancements:
.. _whatsnew_231.string_fixes:

Improvements and fixes for the StringDtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_231.string_fixes.string_comparisons:

Comparisons between different string dtypes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would produce an inconsistent result dtype or incorrectly raise an error. pandas now uses the hierarchy

    object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)

to determine the result dtype when different string dtypes are compared. Some examples:

- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
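
The hierarchy can be modeled in a few lines of plain Python. This is an illustrative sketch only, not pandas' implementation: the names ``HIERARCHY``, ``RESULT_DTYPE`` and ``comparison_result_dtype`` are hypothetical, and the ``bool`` result for the NaN-based variants and for ``object`` is an assumption that is not stated in this section.

```python
# Illustrative model of the result-dtype hierarchy; not pandas internals.
# Each string dtype is described as a (storage, na_value) pair of plain strings.
HIERARCHY = [
    "object",
    ("python", "NaN"),
    ("pyarrow", "NaN"),
    ("python", "NA"),
    ("pyarrow", "NA"),
]

# Result dtype is decided by whichever operand ranks higher in HIERARCHY.
RESULT_DTYPE = {
    ("pyarrow", "NA"): "boolean[pyarrow]",  # stated in the examples
    ("python", "NA"): "boolean",            # stated in the examples
    ("pyarrow", "NaN"): "bool",             # assumption, not stated in the text
    ("python", "NaN"): "bool",              # assumption, not stated in the text
    "object": "bool",                       # assumption, not stated in the text
}


def comparison_result_dtype(left, right):
    """Return the modeled result dtype for comparing two operands."""
    winner = max(left, right, key=HIERARCHY.index)
    return RESULT_DTYPE[winner]


print(comparison_result_dtype(("pyarrow", "NA"), ("python", "NaN")))
print(comparison_result_dtype(("python", "NA"), ("pyarrow", "NaN")))
```

The two calls correspond to the first two bullets: the higher-ranked operand alone decides the result dtype, regardless of operand order.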

.. _whatsnew_231.string_fixes.ignore_empty:

Index set operations ignore empty RangeIndex and object dtype Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
Index (:issue:`60797`).

This ensures that combining such an empty Index with string labels infers the string
dtype correctly, rather than defaulting to ``object`` dtype. For example:

.. code-block:: python

    >>> pd.options.future.infer_string = True
    >>> df = pd.DataFrame()
    >>> df.columns.dtype
    dtype('int64')  # default RangeIndex for empty columns
    >>> df["a"] = [1, 2, 3]
    >>> df.columns.dtype
    <StringDtype(na_value=nan)>  # new columns use string dtype instead of object dtype

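The rule itself can be pictured with a small plain-Python model. This is a hedged sketch, not pandas' implementation: ``IndexInfo``, ``is_ignored`` and ``result_dtype`` are hypothetical names, and the fallbacks (``object`` when the remaining dtypes disagree, the first operand's dtype when every operand is ignored) are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    """Hypothetical stand-in for an Index: only the fields the rule needs."""

    kind: str   # "RangeIndex" or "Index"
    dtype: str  # e.g. "int64", "object", "str"
    size: int   # number of labels


def is_ignored(idx):
    """True for an empty RangeIndex or an empty object-dtype Index."""
    if idx.size != 0:
        return False
    return idx.kind == "RangeIndex" or idx.dtype == "object"


def result_dtype(operands):
    """Model the result dtype of a set operation over the operands."""
    considered = [idx.dtype for idx in operands if not is_ignored(idx)]
    if not considered:
        # Assumption: if every operand is ignored, keep the first dtype.
        return operands[0].dtype
    if len(set(considered)) == 1:
        return considered[0]
    # Assumption: disagreeing dtypes fall back to object in this model.
    return "object"


empty_columns = IndexInfo(kind="RangeIndex", dtype="int64", size=0)
new_label = IndexInfo(kind="Index", dtype="str", size=1)
print(result_dtype([empty_columns, new_label]))  # str
```

In this model the empty ``RangeIndex`` contributes nothing to the decision, so the string labels alone determine the dtype, which mirrors the ``df["a"] = ...`` case.
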
.. _whatsnew_231.string_fixes.bugs:

Bug fixes
^^^^^^^^^
- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all-NA values of string dtype would return float dtype instead of string dtype (:issue:`60810`)
- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` where all-NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
- Bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where the methods would fail with ``dtype="str"`` (:issue:`61623`)

Enhancements
~~~~~~~~~~~~
-

.. _whatsnew_231.regressions:

@@ -26,7 +72,7 @@ Fixed regressions

Bug fixes
~~~~~~~~~
- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_231.other: