Solution (unvalidated)
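df.repartition(5).write.partitionBy("col5").format("parquet").save("s3_path_2") (Scenario 2). Note that partitionBy is a method of the writer returned by df.write, not of the DataFrame, so it must come after .write. Tension: repartition(5) redistributes the data across 5 in-memory partitions before the write. Outcome: each col5=<value> directory on S3 will contain between 1 and 5 part-files, one for every partition that holds at least one row with that value.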
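The "between 1 and 5" bound can be checked without a cluster. The sketch below is a plain-Python simulation, not Spark itself: simulate_part_files is a hypothetical helper, and it assumes rows are dealt round-robin across the partitions (a simplification of what repartition(5) does) and that each write task emits exactly one file per distinct col5 value it holds (that is, no extra splitting from settings such as maxRecordsPerFile).

```python
from collections import defaultdict


def simulate_part_files(col5_values, num_partitions=5):
    """Return {col5 value: number of part-files} for a simulated write.

    Hypothetical helper for illustration only; it does not call Spark.
    """
    partitions_per_value = defaultdict(set)
    for row_index, value in enumerate(col5_values):
        # Assumption: round-robin placement, approximating repartition(n).
        partition_id = row_index % num_partitions
        partitions_per_value[value].add(partition_id)
    # Assumption: one part-file per (partition, col5 value) pair.
    return {value: len(parts) for value, parts in partitions_per_value.items()}


# 12 rows with col5 = "a", 3 rows with "b", 1 row with "c".
rows = ["a"] * 12 + ["b"] * 3 + ["c"]
file_counts = simulate_part_files(rows)
print(file_counts)  # {'a': 5, 'b': 3, 'c': 1}
```

A value with many rows ends up in all 5 partitions and so produces 5 files, while a value with fewer rows than partitions produces fewer files, down to a minimum of 1.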