Better handling of spark configs #378
Comments
Hey @bcajes, yes, this config is changed without warning, and it would be useful to either emit an explicit warning or document the behavior. Are there other configs you typically override when using Glow?
Another config we usually add for the regression step (which uses pandas UDFs and Arrow) is "spark.sql.execution.arrow.maxRecordsPerBatch": 100.
+1. I do think having some pre-specified conf settings can make sense, but I want to re-emphasize @williambrandler's point that we need to announce on stdout that such things are happening behind the scenes. Along these lines, @henrydavidge, #326 introduced changes that seem to make Glow registration default to a new Spark session. Can you please explain why you chose a new session as the default behavior rather than carrying through the current session's settings, and what issues you encountered? We ran into trouble with this recently: the Spark conf for shuffle partitions was not respected unless set explicitly on the new session, which felt unintuitive. Is there any reason to avoid defaulting to new_session = false?
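One possible workaround for the lost-settings problem described above is to copy the settings you care about onto the session used after registration. This is only a sketch: the helper `carry_over_confs` and the stand-in class `DictConf` are hypothetical names, not part of Glow or Spark. The helper relies only on `get(key, default)` and `set(key, value)`, which PySpark's `spark.conf` exposes.

```python
class DictConf:
    """Stand-in for spark.conf so the sketch runs without a cluster."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value


def carry_over_confs(src_conf, dst_conf, keys):
    """Copy values for `keys` from src_conf to dst_conf.

    Keys for which src_conf returns None are skipped.
    Returns a dict of the key/value pairs that were copied.
    """
    copied = {}
    for key in keys:
        value = src_conf.get(key, None)
        if value is not None:
            dst_conf.set(key, value)
            copied[key] = value
    return copied


# Demo with stand-ins: only the key set on the old session is copied.
old_conf = DictConf({"spark.sql.shuffle.partitions": "2000"})
new_conf = DictConf()
copied = carry_over_confs(
    old_conf,
    new_conf,
    ["spark.sql.shuffle.partitions", "spark.sql.files.maxPartitionBytes"],
)
print(copied)
```

With real sessions, the call would look something like `carry_over_confs(spark.conf, new_spark.conf, keys)`, assuming `new_spark` is the session that registration hands back.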
Another related issue: settings made with spark.conf.set("...","...") are not picked up when
spark.sql.files.maxPartitionBytes is another setting I've needed to tune, to around 32 MB or less.
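For reference, the settings mentioned in these comments could be collected in one place, roughly as follows. The dict name `THREAD_CONFS` is hypothetical, and the values are what commenters reported using for their workloads, not Glow defaults.

```python
# Hypothetical collection of the settings mentioned in this thread.
# The values are reported by commenters and are workload specific.
MB = 1024 * 1024

THREAD_CONFS = {
    # Smaller Arrow batches for the pandas UDF based regression step.
    "spark.sql.execution.arrow.maxRecordsPerBatch": "100",
    # Cap input partition size at 32 MB, expressed in bytes.
    "spark.sql.files.maxPartitionBytes": str(32 * MB),
}

for key, value in THREAD_CONFS.items():
    print(f"{key}={value}")
```

With an active session, each pair could then be applied with `spark.conf.set(key, value)`.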
Glow requires some Spark configuration tuning when applied to large datasets. It would be nice to have the Glow context automatically override these configs with recommended default values. It looks like there are already some configuration overrides:
glow/core/src/main/scala/io/projectglow/Glow.scala
Line 51 in 4a414c6
The user may also want to be warned through stdout whenever a config setting is changed during Glow initialization.
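As a rough illustration of the requested behavior (not how Glow currently implements its overrides), a helper could print each change to stdout before applying it. The names `override_with_warning` and `FakeConf` are hypothetical; the helper assumes only that the conf object offers `get(key, default)` and `set(key, value)`, as PySpark's `spark.conf` does.

```python
class FakeConf:
    """Stand-in for spark.conf so the sketch runs without a cluster."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value


def override_with_warning(conf, overrides):
    """Apply `overrides` to `conf`, printing a line for every changed key.

    Returns the list of keys whose value actually changed.
    """
    changed = []
    for key, new_value in overrides.items():
        old_value = conf.get(key, None)
        if old_value == new_value:
            continue  # already at the desired value, nothing to report
        print(f"WARNING: overriding {key}: {old_value!r} -> {new_value!r}")
        conf.set(key, new_value)
        changed.append(key)
    return changed


# Demo: one key changes and one is already at the desired value.
conf = FakeConf({"spark.sql.execution.arrow.maxRecordsPerBatch": "100"})
changed = override_with_warning(
    conf,
    {
        "spark.sql.execution.arrow.maxRecordsPerBatch": "100",
        "spark.sql.files.maxPartitionBytes": "33554432",
    },
)
```

Printing only the keys that actually change keeps the output short when a user has already set the recommended values.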