feat: Repartition files by rows + allow selecting a dense subset of rows from a file #2441

Open: AdamGS wants to merge 15 commits into develop from adamg/repartition-files

AdamGS (Contributor) commented Feb 20, 2025

No description provided.

github-actions bot (Contributor) commented Feb 20, 2025

Benchmarks: TPC-H on NVME

Table of Results
name PR c643700 base ad5d9af ratio (PR/base) unit
tpch_q01/arrow 85418941 8.53671e+07 1.00061 ns
tpch_q02/arrow 44634864 4.41435e+07 1.01113 ns
tpch_q03/arrow 36336668 3.66923e+07 0.990309 ns
tpch_q04/arrow 33406468 3.23916e+07 1.03133 ns
tpch_q05/arrow 60667308 6.28312e+07 0.965561 ns
tpch_q06/arrow 8582547 8.89434e+06 0.964945 ns
tpch_q07/arrow 107797447 1.04619e+08 1.03038 ns
tpch_q08/arrow 62408621 5.87955e+07 1.06145 ns
tpch_q09/arrow 92160341 9.0231e+07 1.02138 ns
tpch_q10/arrow 58521050 5.70289e+07 1.02616 ns
tpch_q11/arrow 26777903 2.67216e+07 1.00211 ns
tpch_q12/arrow 37604372 3.30576e+07 1.13754 ns
tpch_q13/arrow 26882056 2.649e+07 1.0148 ns
tpch_q14/arrow 11638356 1.28794e+07 0.903643 ns
tpch_q15/arrow 26217666 2.75796e+07 0.950618 ns
tpch_q16/arrow 22596079 2.27632e+07 0.992658 ns
tpch_q17/arrow 81627907 8.67869e+07 0.940556 ns
tpch_q18/arrow 162333727 1.74306e+08 0.931312 ns
tpch_q19/arrow 27996459 2.85352e+07 0.98112 ns
tpch_q20/arrow 37769303 4.03426e+07 0.936215 ns
tpch_q21/arrow 159921628 1.68862e+08 0.947054 ns
tpch_q22/arrow 17300040 1.73577e+07 0.99668 ns
tpch_q01/parquet 146916277 1.42195e+08 1.0332 ns
tpch_q02/parquet 98967252 1.0007e+08 0.988976 ns
tpch_q03/parquet 113266480 1.06498e+08 1.06356 ns
tpch_q04/parquet 62466956 6.19587e+07 1.0082 ns
tpch_q05/parquet 121539455 1.16046e+08 1.04734 ns
tpch_q06/parquet 29404344 2.95597e+07 0.994745 ns
tpch_q07/parquet 153506707 1.5889e+08 0.966117 ns
tpch_q08/parquet 156520578 1.54143e+08 1.01542 ns
tpch_q09/parquet 196805543 2.11199e+08 0.931848 ns
tpch_q10/parquet 152910165 1.64675e+08 0.928558 ns
tpch_q11/parquet 47763121 4.90728e+07 0.973311 ns
tpch_q12/parquet 87835224 8.86813e+07 0.990459 ns
tpch_q13/parquet 186606072 2.1172e+08 0.881382 ns
tpch_q14/parquet 49863282 5.14535e+07 0.969095 ns
tpch_q15/parquet 87835285 8.43486e+07 1.04134 ns
tpch_q16/parquet 46699724 4.85431e+07 0.962025 ns
tpch_q17/parquet 146341926 1.62059e+08 0.903015 ns
tpch_q18/parquet 237439483 2.55447e+08 0.929506 ns
tpch_q19/parquet 84181940 8.525e+07 0.987472 ns
tpch_q20/parquet 95648683 1.06661e+08 0.896753 ns
tpch_q21/parquet 205257992 2.20685e+08 0.930096 ns
tpch_q22/parquet 49468919 5.15235e+07 0.960123 ns
tpch_q01/vortex-file-compressed 69052135 6.80009e+07 1.01546 ns
tpch_q02/vortex-file-compressed 53915798 4.67919e+07 1.15225 ns
tpch_q03/vortex-file-compressed 46667181 4.01443e+07 1.16249 ns
tpch_q04/vortex-file-compressed 36564788 3.16429e+07 1.15554 ns
tpch_q05/vortex-file-compressed 65179712 6.03278e+07 1.08043 ns
tpch_q06/vortex-file-compressed 17511867 1.76884e+07 0.990021 ns
tpch_q07/vortex-file-compressed 103798786 9.95216e+07 1.04298 ns
tpch_q08/vortex-file-compressed 76601826 6.86218e+07 1.11629 ns
tpch_q09/vortex-file-compressed 108765371 9.33148e+07 1.16558 ns
tpch_q10/vortex-file-compressed 63918655 5.90823e+07 1.08186 ns
tpch_q11/vortex-file-compressed 28343185 2.53829e+07 1.11662 ns
tpch_q12/vortex-file-compressed 51638387 4.96739e+07 1.03955 ns
tpch_q13/vortex-file-compressed 39235977 3.06382e+07 1.28062 ns
tpch_q14/vortex-file-compressed 23053619 2.18818e+07 1.05355 ns
tpch_q15/vortex-file-compressed 42664726 4.04557e+07 1.0546 ns
tpch_q16/vortex-file-compressed 33240746 2.85791e+07 1.16312 ns
tpch_q17/vortex-file-compressed 84941953 8.58779e+07 0.989102 ns
tpch_q18/vortex-file-compressed 157020890 1.52972e+08 1.02647 ns
tpch_q19/vortex-file-compressed 36797286 3.30542e+07 1.11324 ns
tpch_q20/vortex-file-compressed 50164670 4.39309e+07 1.1419 ns
tpch_q21/vortex-file-compressed 156594748 1.4063e+08 1.11352 ns
tpch_q22/vortex-file-compressed 22635618 3.00505e+07 0.753254 ns
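In these benchmark tables, the ratio column is the PR timing divided by the base timing (both in ns), so values below 1 mean the PR run was faster and values above 1 mean it was slower. The minimal sketch below recomputes and buckets a few ratios, using three rows copied from the TPC-H on NVME table above. The ±10% band in `classify` is a hypothetical threshold chosen for illustration, not something the benchmark bot applies. Because the base column is printed rounded to six significant figures, recomputed ratios can differ from the printed ones in the last digits.

```python
# Ratio column semantics: PR time / base time, both in nanoseconds.
# < 1.0 means the PR is faster, > 1.0 means the PR is slower.

def ratio(pr_ns: float, base_ns: float) -> float:
    """Return the PR/base timing ratio."""
    return pr_ns / base_ns

def classify(r: float, threshold: float = 0.1) -> str:
    """Label a ratio using a hypothetical +/-10% noise band."""
    if r < 1.0 - threshold:
        return "faster"
    if r > 1.0 + threshold:
        return "slower"
    return "unchanged"

# Rows copied from the TPC-H on NVME table (PR ns, base ns as printed).
rows = {
    "tpch_q13/vortex-file-compressed": (39235977, 3.06382e7),
    "tpch_q22/vortex-file-compressed": (22635618, 3.00505e7),
    "tpch_q06/vortex-file-compressed": (17511867, 1.76884e7),
}

results = {
    name: (ratio(pr, base), classify(ratio(pr, base)))
    for name, (pr, base) in rows.items()
}

for name, (r, label) in results.items():
    print(f"{name}: {r:.4f} ({label})")
```

A fixed band like this only separates large swings from small ones; single runs of these queries are noisy, so a real regression check would need repeated measurements.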

github-actions bot (Contributor) commented Feb 20, 2025

Benchmarks: TPC-H on S3

Table of Results
name PR c643700 base ad5d9af ratio (PR/base) unit
tpch_q01/parquet 311984464 3.12755e+08 0.997537 ns
tpch_q02/parquet 728958492 7.72587e+08 0.943529 ns
tpch_q03/parquet 465346785 4.72118e+08 0.985658 ns
tpch_q04/parquet 253649575 2.52649e+08 1.00396 ns
tpch_q05/parquet 627249080 6.29619e+08 0.996235 ns
tpch_q06/parquet 195746471 1.99938e+08 0.979034 ns
tpch_q07/parquet 696377157 6.80029e+08 1.02404 ns
tpch_q08/parquet 847988822 8.6262e+08 0.983039 ns
tpch_q09/parquet 749891295 7.5459e+08 0.993774 ns
tpch_q10/parquet 615166036 5.83334e+08 1.05457 ns
tpch_q11/parquet 320583464 3.00422e+08 1.06711 ns
tpch_q12/parquet 298099857 3.10371e+08 0.960462 ns
tpch_q13/parquet 450329064 4.26702e+08 1.05537 ns
tpch_q14/parquet 280285760 2.95068e+08 0.949903 ns
tpch_q15/parquet 503172952 5.21735e+08 0.964422 ns
tpch_q16/parquet 280791943 3.03776e+08 0.924338 ns
tpch_q17/parquet 463524787 4.36533e+08 1.06183 ns
tpch_q18/parquet 625191631 6.27283e+08 0.996666 ns
tpch_q19/parquet 332935633 3.15285e+08 1.05598 ns
tpch_q20/parquet 571674691 5.6453e+08 1.01266 ns
tpch_q21/parquet 737532064 7.05018e+08 1.04612 ns
tpch_q22/parquet 294609779 2.91848e+08 1.00946 ns
tpch_q01/vortex-file-compressed 334913825 4.35687e+08 0.768702 ns
tpch_q02/vortex-file-compressed 469414948 4.1592e+08 1.12862 ns
tpch_q03/vortex-file-compressed 558084461 5.80831e+08 0.960837 ns
tpch_q04/vortex-file-compressed 546862454 6.2891e+08 0.86954 ns
tpch_q05/vortex-file-compressed 508761249 4.92012e+08 1.03404 ns
tpch_q06/vortex-file-compressed 383583642 5.20397e+08 0.737098 ns
tpch_q07/vortex-file-compressed 620318243 6.95792e+08 0.891529 ns
tpch_q08/vortex-file-compressed 656856990 6.97143e+08 0.942213 ns
tpch_q09/vortex-file-compressed 674863885 6.03279e+08 1.11866 ns
tpch_q10/vortex-file-compressed 527341944 5.23837e+08 1.00669 ns
tpch_q11/vortex-file-compressed 218417228 1.62326e+08 1.34554 ns
tpch_q12/vortex-file-compressed 717057726 9.51261e+08 0.753797 ns
tpch_q13/vortex-file-compressed 218928286 1.46589e+08 1.49348 ns
tpch_q14/vortex-file-compressed 342567225 4.30723e+08 0.79533 ns
tpch_q15/vortex-file-compressed 699139749 1.011e+09 0.691532 ns
tpch_q16/vortex-file-compressed 237269674 1.94334e+08 1.22093 ns
tpch_q17/vortex-file-compressed 350318154 3.94484e+08 0.888041 ns
tpch_q18/vortex-file-compressed 477707173 4.28197e+08 1.11562 ns
tpch_q19/vortex-file-compressed 368831829 4.21179e+08 0.875713 ns
tpch_q20/vortex-file-compressed 562092049 6.03081e+08 0.932034 ns
tpch_q21/vortex-file-compressed 1160559425 1.33253e+09 0.870946 ns
tpch_q22/vortex-file-compressed 194212349 1.44532e+08 1.34374 ns

github-actions bot (Contributor) commented Feb 20, 2025

Benchmarks: Clickbench on NVME

Table of Results
name PR c643700 base ad5d9af ratio (PR/base) unit
clickbench_q00/parquet 3001459 2.20253e+06 1.36273 ns
clickbench_q01/parquet 62628838 6.21442e+07 1.0078 ns
clickbench_q02/parquet 119064079 1.1917e+08 0.999112 ns
clickbench_q03/parquet 86874064 8.67453e+07 1.00148 ns
clickbench_q04/parquet 675518724 6.6344e+08 1.01821 ns
clickbench_q05/parquet 744336944 7.17967e+08 1.03673 ns
clickbench_q06/parquet 2324884 2.18397e+06 1.06452 ns
clickbench_q07/parquet 64352476 6.61922e+07 0.972207 ns
clickbench_q08/parquet 780328494 7.3584e+08 1.06046 ns
clickbench_q09/parquet 1069124897 1.03897e+09 1.02902 ns
clickbench_q10/parquet 270551756 2.60929e+08 1.03688 ns
clickbench_q11/parquet 312180954 3.14127e+08 0.993804 ns
clickbench_q12/parquet 808630790 7.56003e+08 1.06961 ns
clickbench_q13/parquet 1025534332 1.03132e+09 0.994387 ns
clickbench_q14/parquet 744390008 7.41639e+08 1.00371 ns
clickbench_q15/parquet 746366230 7.65294e+08 0.975267 ns
clickbench_q16/parquet 1660864366 1.69935e+09 0.977355 ns
clickbench_q17/parquet 1466242627 1.4407e+09 1.01773 ns
clickbench_q18/parquet 3107927850 3.08037e+09 1.00895 ns
clickbench_q19/parquet 68317215 6.76373e+07 1.01005 ns
clickbench_q20/parquet 1138113355 1.13998e+09 0.998363 ns
clickbench_q21/parquet 1298281361 1.27474e+09 1.01846 ns
clickbench_q22/parquet 1932552381 1.90232e+09 1.01589 ns
clickbench_q23/parquet 7953365304 7.93907e+09 1.0018 ns
clickbench_q24/parquet 452132199 4.54945e+08 0.993817 ns
clickbench_q25/parquet 393646964 3.99748e+08 0.984739 ns
clickbench_q26/parquet 501283091 5.08983e+08 0.984871 ns
clickbench_q27/parquet 1592456617 1.59176e+09 1.00044 ns
clickbench_q28/parquet 11833477752 1.18017e+10 1.00269 ns
clickbench_q29/parquet 429561059 4.36611e+08 0.983853 ns
clickbench_q30/parquet 692246956 6.84596e+08 1.01118 ns
clickbench_q31/parquet 730506315 7.18788e+08 1.0163 ns
clickbench_q32/parquet 2783245121 2.78834e+09 0.998171 ns
clickbench_q33/parquet 3081181559 2.98081e+09 1.03367 ns
clickbench_q34/parquet 2964011839 2.91431e+09 1.01705 ns
clickbench_q35/parquet 886140801 8.95309e+08 0.989759 ns
clickbench_q36/parquet 179546861 1.82719e+08 0.982638 ns
clickbench_q37/parquet 86100814 8.83918e+07 0.974081 ns
clickbench_q38/parquet 112279071 1.14353e+08 0.981863 ns
clickbench_q39/parquet 316584250 3.38005e+08 0.936627 ns
clickbench_q40/parquet 52231109 5.29727e+07 0.986 ns
clickbench_q41/parquet 49818132 5.04662e+07 0.987159 ns
clickbench_q42/parquet 70175228 7.03751e+07 0.99716 ns
clickbench_q00/vortex-file-compressed 2238062 2.30749e+06 0.969911 ns
clickbench_q01/vortex-file-compressed 25676132 2.5386e+07 1.01143 ns
clickbench_q02/vortex-file-compressed 57873952 5.71857e+07 1.01204 ns
clickbench_q03/vortex-file-compressed 63785527 5.94789e+07 1.07241 ns
clickbench_q04/vortex-file-compressed 612258449 6.10923e+08 1.00219 ns
clickbench_q05/vortex-file-compressed 702481061 6.24753e+08 1.12441 ns
clickbench_q06/vortex-file-compressed 2323074 2.31586e+06 1.00311 ns
clickbench_q07/vortex-file-compressed 34954215 3.33189e+07 1.04908 ns
clickbench_q08/vortex-file-compressed 713231795 6.94155e+08 1.02748 ns
clickbench_q09/vortex-file-compressed 831159928 8.82082e+08 0.942271 ns
clickbench_q10/vortex-file-compressed 207256378 1.65799e+08 1.25004 ns
clickbench_q11/vortex-file-compressed 224970479 1.7872e+08 1.25879 ns
clickbench_q12/vortex-file-compressed 584651608 5.41983e+08 1.07873 ns
clickbench_q13/vortex-file-compressed 868793082 8.06678e+08 1.077 ns
clickbench_q14/vortex-file-compressed 550873907 5.07558e+08 1.08534 ns
clickbench_q15/vortex-file-compressed 715964298 7.18577e+08 0.996364 ns
clickbench_q16/vortex-file-compressed 1444752023 1.45224e+09 0.994842 ns
clickbench_q17/vortex-file-compressed 1376963747 1.38443e+09 0.994604 ns
clickbench_q18/vortex-file-compressed 2796693379 2.89351e+09 0.966539 ns
clickbench_q19/vortex-file-compressed 52502149 4.05729e+07 1.29402 ns
clickbench_q20/vortex-file-compressed 909316423 8.59688e+08 1.05773 ns
clickbench_q21/vortex-file-compressed 1002738514 9.4442e+08 1.06175 ns
clickbench_q22/vortex-file-compressed 1305830840 1.29215e+09 1.01059 ns
clickbench_q23/vortex-file-compressed 2277960765 2.02429e+09 1.12531 ns
clickbench_q24/vortex-file-compressed 296366156 2.51108e+08 1.18023 ns
clickbench_q25/vortex-file-compressed 296712105 2.47348e+08 1.19957 ns
clickbench_q26/vortex-file-compressed 372695580 3.04062e+08 1.22572 ns
clickbench_q27/vortex-file-compressed 1412321476 1.39156e+09 1.01492 ns
clickbench_q28/vortex-file-compressed 10486214662 1.08159e+10 0.969517 ns
clickbench_q29/vortex-file-compressed 711035135 6.81453e+08 1.04341 ns
clickbench_q30/vortex-file-compressed 493263899 4.5185e+08 1.09165 ns
clickbench_q31/vortex-file-compressed 512224781 4.72912e+08 1.08313 ns
clickbench_q32/vortex-file-compressed 2661047740 2.85839e+09 0.93096 ns
clickbench_q33/vortex-file-compressed 2491326535 2.55121e+09 0.976527 ns
clickbench_q34/vortex-file-compressed 2496801852 2.54218e+09 0.98215 ns
clickbench_q35/vortex-file-compressed 971960378 9.81061e+08 0.990724 ns
clickbench_q36/vortex-file-compressed 187358278 1.00597e+08 1.86246 ns
clickbench_q37/vortex-file-compressed 108633897 5.71389e+07 1.90123 ns
clickbench_q38/vortex-file-compressed 51032667 7.42365e+07 0.687433 ns
clickbench_q39/vortex-file-compressed 280515741 1.69523e+08 1.65474 ns
clickbench_q40/vortex-file-compressed 45761648 2.98298e+07 1.53409 ns
clickbench_q41/vortex-file-compressed 44388733 3.15297e+07 1.40784 ns
clickbench_q42/vortex-file-compressed 65992274 4.87451e+07 1.35382 ns

cloudflare-workers-and-pages bot commented Feb 20, 2025

Deploying vortex-bench with Cloudflare Pages

Latest commit: 2b4cdec
Status: ✅  Deploy successful!
Preview URL: https://59a9608f.vortex-bench.pages.dev
Branch Preview URL: https://adamg-repartition-files.vortex-bench.pages.dev

codspeed-hq bot commented Feb 20, 2025

CodSpeed Performance Report

Merging #2441 will not alter performance

Comparing adamg/repartition-files (2b4cdec) with develop (dbb734c)

Summary

✅ 765 untouched benchmarks

AdamGS changed the title from "[WIP] Repartition files by rows + allow selecting a dense subset of rows from a file" to "feat: Repartition files by rows + allow selecting a dense subset of rows from a file" on Feb 20, 2025
AdamGS marked this pull request as ready for review on February 20, 2025 at 17:40

AdamGS (Contributor, Author) commented Feb 21, 2025

With the help of the script from #2452 and @robert3005's Excel skills, I charted the per-query memory usage in our two big SQL benchmarks (TPC-H and ClickBench). I've attached the chart here, and I can also share the spreadsheet with the raw data if anyone is interested.
My main takeaways:

  1. Memory usage is slightly higher on TPC-H (which we really should start running at a higher scale factor).
  2. ClickBench shows a more mixed outcome, but in several queries (23, 27, 28, 33, 34) memory usage is lower by multiple GB.

[Chart: per-query memory usage, TPC-H and ClickBench (screenshot taken 2025-02-21 at 12:33:42)]
