BUG: Fix Index.equals between object and string #61541
Thanks for the PR!
        ),
    ],
)
def test_index_equals_different_string_dtype(dtype):
Can you instead use the fixture `string_dtype_no_object` throughout these tests?
@@ -5481,11 +5481,7 @@ def equals(self, other: Any) -> bool:
            # quickly return if the lengths are different
            return False

        if (
            isinstance(self.dtype, StringDtype)
            and self.dtype.na_value is np.nan
This condition was added in #56106; I think the `na_value` part was added just to be conservative.
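As a rough illustration of what dropping the `na_value` check amounts to, the sketch below compares two indexes by value after casting both sides to `object`. The helper name `equals_ignoring_dtype` is hypothetical and is not part of pandas or of this PR; it only mirrors the idea of casting unconditionally before comparing.

```python
import pandas as pd


def equals_ignoring_dtype(left: pd.Index, right: pd.Index) -> bool:
    """Hypothetical helper: compare two indexes by value only.

    Both sides are cast to object first, so the string dtype variant
    (and its na_value) plays no role in the comparison.
    """
    if len(left) != len(right):
        # quickly return if the lengths are different
        return False
    return bool(left.astype(object).equals(right.astype(object)))


idx_obj = pd.Index(["a", "b", "c"], dtype=object)
idx_str = pd.Index(["a", "b", "c"], dtype="string")

same = equals_ignoring_dtype(idx_str, idx_obj)
same_reversed = equals_ignoring_dtype(idx_obj, idx_str)
different = equals_ignoring_dtype(idx_str, pd.Index(["a", "b", "x"], dtype=object))
```

Because both operands are cast, the helper gives the same answer in either direction, which is the symmetry the tests in this PR are checking for.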
    s_str = Series([4, 5, 6], index=idx.astype(dtype))

    expected = Series([True, True, True], index=["a", "b", "c"])
    result = s_obj < s_str
Can you also check `s_str > s_obj`?
`doc/source/whatsnew/v3.0.0.rst` file if fixing a bug or adding a new feature.

Description of the code change on `Index.equals`
On the main branch, `Index.equals` casts `self` to `object` only when `self.dtype.na_value` is `np.nan`, so the comparison succeeds in that case. However, since the documentation states that `dtype` is not compared, `self` should be cast regardless of `self.dtype.na_value` so that it can be compared with other dtypes as intended.

Description of the code change on `test_mixed_col_index_dtype`
`using_infer_string` has been removed since I think that `result` should be `string` regardless of `using_infer_string`. This is because of the code change made on `Index.equals`: since `Index.equals` now considers `df1.columns` equal to `df2.columns`, `Index.intersection` returns `self` (which is `string`). With `result = df2 + df1`, the result instead becomes `object` (the dtype of `df2`). On the main branch, on the other hand, `Index.intersection` returns `object` because `Index.equals` returns `False`, and then both `self` and `other` are cast to `object` by `_find_common_type_compat` (see `L3287` at pandas/core/indexes/base.py).

pandas/pandas/core/indexes/base.py
Lines 3286 to 3290 in 25e6462
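To observe the operand-order effect described above on a given pandas build, a minimal sketch along these lines may help. It is not the snippet from `base.py`. The frames follow the description (`df1` with `string` columns, `df2` with `object` columns), while the data values are assumptions chosen for illustration. The printed column dtypes depend on the installed pandas version, so no output is shown.

```python
import pandas as pd

# df2 keeps object-dtype column labels; df1 gets the same labels as "string".
df2 = pd.DataFrame(0.0, index=[0], columns=pd.Index(list("abc"), dtype=object))
df1 = pd.DataFrame(1.0, index=[0], columns=list("abc"))
df1.columns = df2.columns.astype("string")

# Same operands, opposite order: the values match, but the dtype of the
# resulting columns may follow whichever operand comes first.
left_first = df1 + df2
right_first = df2 + df1

print("df1 + df2 columns dtype:", left_first.columns.dtype)
print("df2 + df1 columns dtype:", right_first.columns.dtype)
```

Only the column dtype is expected to vary between the two results; the summed values and the column labels are the same in both.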