Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad performance of fs2.io.file.Files[IO].walk() #3329

Closed
sven42 opened this issue Nov 2, 2023 · 4 comments · Fixed by #3383
Closed

Bad performance of fs2.io.file.Files[IO].walk() #3329

sven42 opened this issue Nov 2, 2023 · 4 comments · Fixed by #3383
Labels

Comments

@sven42
Copy link

sven42 commented Nov 2, 2023

I am migrating a tool to use fs2 (v3.9.2).
The first step is just to collect about 26000 files in a directory tree for further processing.

Here are the timings of this first step for several solutions:

  • java.nio.file.Files#walkFileTree: 0.5s
  • DeprecatedFilesApi.walk: 0.7s
  • fs2.io.file.Files#walk: 4s
@sven42 sven42 added the bug label Nov 2, 2023
@mpilquist
Copy link
Member

Thanks for the report. Are you using Files[IO].walk(path) or one of the overrides? Could you also describe the directory tree a bit - branching factor, depth, etc.

@sven42
Copy link
Author

sven42 commented Nov 25, 2023

Thanks for your reply.

I am using Files[IO].walk(path)
The directory contains 26.000 files in 3.300 directories.
The branching factor is 5 and max depth is 8.

I hope this helps.

@djspiewak
Copy link
Member

This doesn't entirely surprise me. walk has a lot of Stream.eval and essentially zero batching. A much faster approach would be to, at each locus, gather all children and partition them, batch-emitting anything which can be emitted and individually recursing anything which can't. The number of suspensions then should be O(n) in the number of directories within the hierarchy, whereas right now it is proportional to files + directories.

@djspiewak
Copy link
Member

Minor update: this is so slow that I had to trim down the depth used in benchmarking in order to make it tractable on my laptop. Here's what I'm using to test:

@State(Scope.Thread)
class FilesBenchmark {

  private[this] var target: Path = _

  @Setup
  def setup() = {
    val file = File.createTempFile("fs2-benchmarks-", "-walk")
    file.delete()
    file.mkdir()
    target = Path(file.toString)    // ewwwww

    val MaxDepth = 4
    val Names = 'A'.to('E').toList.map(_.toString)

    def loop(cwd: File, depth: Int): Unit = {
      if (depth < MaxDepth) {
        Names foreach { name =>
          val sub = new File(cwd, name)
          sub.mkdir()
          loop(sub, depth + 1)
        }
      } else if (depth == MaxDepth) {
        Names foreach { name =>
          val sub = new File(cwd, name)
          sub.createNewFile()
          loop(sub, depth + 1)
        }
      }
    }

    loop(file, 0)
  }

  @Benchmark
  def countFiles(): Int = {
    Files[IO]
      .walk(target)
      .chunks
      .map(_.size)
      .fold(0)(_ + _)
      .compile
      .lastOrError
      .unsafeRunSync()
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants