Commit
StatisticsV2: initial statistics framework redesign (#14699)
* StatisticsV2: initial definition and validation method implementation
* Implement mean, median and standard deviation extraction for StatsV2
* Move stats_v2 to `physical-expr` package
* Introduce `ExprStatisticGraph` and `ExprStatisticGraphNode`
* Split the StatisticsV2 and statistics graph locations, prepare the infrastructure for stats top-down propagation and final bottom-up calculation
* Calculate variance instead of std_dev
* Create a skeleton for statistics bottom-up evaluation
* Introduce high-level test for 'evaluate_statistics()'
* Refactor result distribution computation during the statistics evaluation phase; add compute_range function
* Always produce Unknown distribution in non-mentioned combination cases, todos for the future
* Introduce Bernoulli distribution to be used as result of comparisons and inequations distribution combinations
* Implement initial statistics propagation of Uniform and Unknown distributions with known ranges
* Implement evaluate_statistics for logical not and unary negation operator
* Fix and add tests; make fmt happy
* Add integration test, implement conversion into Bernoulli distribution for Eq and NotEq
* Finish test, small cleanup
* minor improvements
* Update stats.rs
* Addressing review comments
* Implement median computation for Gaussian-Gaussian pair
* Update stats_v2.rs
* minor improvements
* Addressing second review comments, part 1
* Return true in other cases
* Finish addressing review requests, part 2
* final clean-up
* bug fix
* final clean-up
* apply reverse logic in stats framework as well
* Update cp_solver.rs
* revert data.parquet
* Apply suggestions from code review
* Update datafusion/physical-expr-common/src/stats_v2.rs
* Update datafusion/physical-expr-common/src/stats_v2.rs
* Apply suggestions from code review Fix links
* Fix compilation issue
* Fix mean/median formula for exponential distribution
* casting + exp dir + remove opt's + is_valid refactor
* Update stats_v2_graph.rs
* remove inner mod
* last todo: bernoulli propagation
* Apply suggestions from code review
* Apply suggestions from code review
* prop_stats in binary
* Update binary.rs
* rename intervals
* block explicit construction
* test updates
* Update binary.rs
* revert renaming
* impl range methods as well
* Apply suggestions from code review
* Apply suggestions from code review
* Update datafusion/physical-expr-common/src/stats_v2.rs
* Update stats_v2.rs
* fmt
* fix bernoulli or eval
* fmt
* Review
* Review Part 2
* not propagate
* clean-up
* Review Part 3
* Review Part 4
* Review Part 5
* Review Part 6
* Review Part 7
* Review Part 8
* Review Part 9
* Review Part 10
* Review Part 11
* Review Part 12
* Review Part 13
* Review Part 14
* Review Part 15 | Fix equality comparisons between uniform distributions
* Review Part 16 | Remove unnecessary temporary file
* Review Part 17 | Leave TODOs for real-valued summary statistics
* Review Part 18
* Review Part 19 | Fix variance calculations
* Review Part 20 | Fix range calculations
* Review Part 21
* Review Part 22
* Review Part 23
* Review Part 24 | Add default implementations for evaluate_statistics and propagate_statistics
* Review Part 25 | Improve docs, refactor statistics graph code
* Review Part 26
* Review Part 27
* Review Part 28 | Remove get_zero/get_one, simplify propagation in statistics graph
* Review Part 29
* Review Part 30 | Move statistics-combining functions to core module, polish tests
* Review Part 31
* Review Part 32 | Module reorganization
* Review Part 33
* Add tests for bernoulli and gaussians combination
* Incorporate community feedback
* Fix merge issue

---------

Co-authored-by: Sasha Syrotenko <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
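
Several of the items above concern validating a distribution and extracting its mean, median and variance for the Uniform, Exponential, Gaussian and Bernoulli cases. The following is a minimal sketch of those textbook formulas only. The `Dist` enum, its field names (including the `offset` shift on the exponential case) and the method names are hypothetical stand-ins chosen for illustration; they are not the types or signatures introduced by this commit.

```rust
// Hypothetical stand-in type; NOT the definition added by this commit.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Dist {
    Uniform { low: f64, high: f64 },
    // `offset` shifts the support to [offset, +inf); an assumption of this sketch.
    Exponential { rate: f64, offset: f64 },
    Gaussian { mean: f64, variance: f64 },
    Bernoulli { p: f64 },
}

impl Dist {
    /// Checks that the parameters describe a well-formed distribution.
    pub fn is_valid(&self) -> bool {
        match *self {
            Dist::Uniform { low, high } => low <= high,
            Dist::Exponential { rate, offset } => rate > 0.0 && offset.is_finite(),
            Dist::Gaussian { mean, variance } => mean.is_finite() && variance >= 0.0,
            Dist::Bernoulli { p } => (0.0..=1.0).contains(&p),
        }
    }

    pub fn mean(&self) -> f64 {
        match *self {
            Dist::Uniform { low, high } => (low + high) / 2.0,
            Dist::Exponential { rate, offset } => offset + 1.0 / rate,
            Dist::Gaussian { mean, .. } => mean,
            Dist::Bernoulli { p } => p,
        }
    }

    /// Returns `None` when the median is not unique (Bernoulli with p = 0.5).
    pub fn median(&self) -> Option<f64> {
        match *self {
            Dist::Uniform { low, high } => Some((low + high) / 2.0),
            // The exponential median is ln(2) / rate, not 1 / rate.
            Dist::Exponential { rate, offset } => {
                Some(offset + std::f64::consts::LN_2 / rate)
            }
            Dist::Gaussian { mean, .. } => Some(mean),
            Dist::Bernoulli { p } => {
                if p < 0.5 {
                    Some(0.0)
                } else if p > 0.5 {
                    Some(1.0)
                } else {
                    None
                }
            }
        }
    }

    /// Variance is returned directly, so no square root is needed.
    pub fn variance(&self) -> f64 {
        match *self {
            Dist::Uniform { low, high } => (high - low) * (high - low) / 12.0,
            Dist::Exponential { rate, .. } => 1.0 / (rate * rate),
            Dist::Gaussian { variance, .. } => variance,
            Dist::Bernoulli { p } => p * (1.0 - p),
        }
    }
}
```

For example, under these assumptions `Dist::Uniform { low: 0.0, high: 12.0 }` has mean 6 and variance 12, and `Dist::Bernoulli { p: 1.5 }` fails `is_valid()`.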