Solution (unvalidated)

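Scenario 2: df.repartition(5).write.partitionBy("col5").format("parquet").save("s3_path_2"). Tension: the data is redistributed across 5 in-memory partitions before the write, and each of those partitions can hold rows for any value of col5. Outcome: each col5 directory on S3 will contain between 1 and 5 part-files, one for every partition that holds at least one row with that value.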
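The counting argument behind that outcome can be sketched in plain Python. This is a rough model, not Spark itself: the function name simulate_part_files, the sample rows, and the random assignment of rows to partitions are all illustrative assumptions, and it ignores anything that can split output further (for example a per-file record limit).

```python
import random
from collections import defaultdict


def simulate_part_files(rows, num_partitions, partition_col, seed=0):
    """Model repartition(n) followed by write.partitionBy(col).

    Assumption: each in-memory partition writes at most one part-file
    per distinct value of partition_col that it holds.
    Returns {column value: number of part-files in its directory}.
    """
    rng = random.Random(seed)
    # Maps each column value to the set of partition ids holding it.
    holders = defaultdict(set)
    for row in rows:
        # Stand-in for Spark's redistribution: pick a partition at random.
        pid = rng.randrange(num_partitions)
        holders[row[partition_col]].add(pid)
    return {value: len(pids) for value, pids in holders.items()}


# Hypothetical data: a frequent value, a rare value, and a single-row value.
rows = [{"col5": v} for v in ["a"] * 100 + ["b"] * 3 + ["c"]]
counts = simulate_part_files(rows, num_partitions=5, partition_col="col5")

for value, n_files in sorted(counts.items()):
    print(f"col5={value}: {n_files} part-file(s)")
```

A value with a single row can only land in one partition, so its directory gets exactly one file, while a value with many rows tends to appear in all 5 partitions. No directory can exceed 5 files under this model, which matches the 1-to-5 range stated above.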
