-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-16911: [C++] Add Equals method to Partitioning #13567
Conversation
|
cpp/src/arrow/dataset/partition.cc
Outdated
if (this == &other) { | ||
return true; | ||
} | ||
return checked_cast<const DefaultPartitioning&>(other).type_name() == type_name() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do the checked cast here, it'll be wrong if the other partitioning isn't the same type. You don't use any methods from the derived type anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @lidavidm that a checked_cast should be avoided. If you received a pointer type Partitioning*
you could do a dynamic_cast
instead of the type_name
comparison. However, a failed dynamic_cast
is an exception when you have a reference type and it's probably better to just compare type_name
.
The comparison to type_name
is basically doing the same thing as a cast check anyways.
The reason a checked_cast
is a bad idea is that it should be ok for a user to do:
DefaultPartitioning one;
KeyValuePartitioning two(schema);
if (one.Equals(two)) {
std::cout << "equal" << std::endl;
}
This Equals
method should always return false so this toy example is somewhat meaningless. However, it could be useful if the actual partitions were based on file metadata or something dynamically calculated. If there is a checked_cast
then, instead of false, we will get an abort in debug mode and potentially a segmentation fault in release mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes your absolutely right, sorry I missed this. Thanks for pointing this out.
cpp/src/arrow/dataset/partition.cc
Outdated
if (this == &other) { | ||
return true; | ||
} | ||
const auto& direct_part = | ||
::arrow::internal::checked_cast<const DirectoryPartitioning&>(other); | ||
return type_name() == direct_part.type_name() && KeyValuePartitioning::Equals(other); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (this == &other) { | |
return true; | |
} | |
const auto& direct_part = | |
::arrow::internal::checked_cast<const DirectoryPartitioning&>(other); | |
return type_name() == direct_part.type_name() && KeyValuePartitioning::Equals(other); | |
return type_name() == direct_part.type_name() && KeyValuePartitioning::Equals(other); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and ditto below (since KeyValuePartitioning::Equals does the this == &other
check already
cpp/src/arrow/dataset/partition.cc
Outdated
if (this == &other) { | ||
return true; | ||
} | ||
return checked_cast<const DefaultPartitioning&>(other).type_name() == type_name() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @lidavidm that a checked_cast should be avoided. If you received a pointer type Partitioning*
you could do a dynamic_cast
instead of the type_name
comparison. However, a failed dynamic_cast
is an exception when you have a reference type and it's probably better to just compare type_name
.
The comparison to type_name
is basically doing the same thing as a cast check anyways.
The reason a checked_cast
is a bad idea is that it should be ok for a user to do:
DefaultPartitioning one;
KeyValuePartitioning two(schema);
if (one.Equals(two)) {
std::cout << "equal" << std::endl;
}
This Equals
method should always return false so this toy example is somewhat meaningless. However, it could be useful if the actual partitions were based on file metadata or something dynamically calculated. If there is a checked_cast
then, instead of false, we will get an abort in debug mode and potentially a segmentation fault in release mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @vibhatha, here are a couple more comments.
LGTM after Antoine's comments are addressed. |
@vibhatha This CI failure looks unrelated, can you take a look? https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2042 |
@pitrou seems like related /arrow/cpp/src/arrow/dataset/partition_test.cc
[2024](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2025)
In file included from /usr/include/x86_64-linux-gnu/c++/6/bits/c++allocator.h:33:0,
[2025](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2026)
from /usr/include/c++/6/bits/allocator.h:46,
[2026](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2027)
from /usr/include/c++/6/string:41,
[2027](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2028)
from /usr/include/c++/6/stdexcept:39,
[2028](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2029)
from /usr/include/c++/6/array:39,
[2029](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2030)
from /usr/include/c++/6/tuple:39,
[2030](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2031)
from /usr/include/c++/6/functional:55,
[2031](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2032)
from /arrow/cpp/src/arrow/dataset/partition.h:22,
[2032](https://github.com/apache/arrow/runs/7351561711?check_suite_focus=true#step:6:2033)
from /arrow/cpp/src/arrow/dataset/partition_test.cc:18: |
I will look into this. |
Yes, I meant related, sorry :-) |
Antoine is fixing the Protobuf issue in #13634 |
@pitrou Thanks for catching the wrong |
Benchmark runs are scheduled for baseline = 0fda96c and contender = d07dc75. d07dc75 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
@pitrou Is this something to worry about? I cannot trace the failure? |
Adding
Equals
method toPartitioning
class and extended classes. Also include a few test cases.