Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cat: make it concurrent #593

Merged
merged 38 commits into from
Jul 27, 2023
Merged

Conversation

denizsurmeli
Copy link
Contributor

@denizsurmeli denizsurmeli commented Jul 14, 2023

This PR adds a new io.WriterAt adapter for non-seekable writers. It uses an internal linked list to order the incoming chunks. The implementation is independent from the download manager of aws-sdk-go, and because of that currently it can not bound the memory usage. In order to limit the memory usage, we would have had to write a custom manager other than the aws-sdk-go's implementation, which seemed unfeasible.

The new implementation is about %25 percent faster than the older implementation for a 9.4 GB file with partSize=50MB and concurrency=20 parameters, with significantly higher memory usage, on average it uses 0.9 GB of memory and at most 2.1 GB is observed. Obviously, the memory usage and performance is dependent on the partSize-concurrency configuration and the link.

Resolves #245.

@denizsurmeli denizsurmeli marked this pull request as ready for review July 21, 2023 07:37
@denizsurmeli denizsurmeli requested a review from a team as a code owner July 21, 2023 07:37
@denizsurmeli denizsurmeli requested review from igungor and seruman and removed request for a team July 21, 2023 07:37
@denizsurmeli
Copy link
Contributor Author

denizsurmeli commented Jul 21, 2023

I have reduced the test size in here.

contents, expected := getSequentialFileContent(16 * mb)

The reason for this is that since we have more test cases, we should aim to show that the feature/fix works while maintaining a reasonable time for the tests to run.

s5cmd/e2e/cat_test.go

Lines 50 to 75 in 5b55085

{
name: "cat remote object with lower part size and higher concurrency",
cmd: []string{
"cat",
"-p",
"1",
"-c",
"10",
},
expected: expected,
},
{
name: "cat remote object with json flag lower part size and higher concurrency",
cmd: []string{
"--json",
"cat",
"-p",
"1",
"-c",
"10",
}, expected: expected,
assertOps: []assertOp{
jsonCheck(true),
},
},
}

Lowering the file size and adjusting the partSize and concurrency on the tests both show the feature works correctly while maintaining a feasible duration for the test.

@igungor
Copy link
Member

igungor commented Jul 27, 2023

🥇

@igungor igungor merged commit 4292ecc into peak:master Jul 27, 2023
ahmethakanbesel pushed a commit to ahmethakanbesel/s5cmd that referenced this pull request Jul 28, 2023
This PR adds a new io.WriterAt adapter for non-seekable writers. It uses an internal linked list to order the incoming chunks. The implementation is independent from the download manager of aws-sdk-go, and because of that currently it can not bound the memory usage. In order to limit the memory usage, we would have had to write a custom manager other than the aws-sdk-go's implementation, which seemed unfeasible.

The new implementation is about %25 percent faster than the older implementation for a 9.4 GB file with partSize=50MB and concurrency=20 parameters, with significantly higher memory usage, on average it uses 0.9 GB of memory and at most 2.1 GB is observed. Obviously, the memory usage and performance is dependent on the partSize-concurrency configuration and the link.

Resolves peak#245

Co-authored-by: İbrahim Güngör <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

's5cmd cat' sub command is not using concurrent connections
3 participants