-
Notifications
You must be signed in to change notification settings - Fork 683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add encoding and bits_per_sample option to save function #1226
Conversation
f0f83a0
to
5d6f45e
Compare
44ea317
to
ae897e6
Compare
throw std::runtime_error("Unsupported dtype."); | ||
namespace { | ||
|
||
std::tuple<sox_encoding_t, unsigned> get_save_encoding_for_wav( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function implements the mapping from user-provided encoding configuration to the actual encoding configuration for "wav" and "amb" format.
throw std::runtime_error(message.str()); | ||
} | ||
|
||
std::tuple<sox_encoding_t, unsigned> get_save_encoding( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function implements the mapping from user-provided encoding configuration to the actual encoding configuration.
d017b5b
to
c1c8ed3
Compare
Two general comments:
|
// Convert to sox_sample_t (int32_t) and write to buffer | ||
SOX_SAMPLE_LOCALS; | ||
const auto dtype = tensor_.dtype(); | ||
if (dtype == torch::kFloat32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you want you should be able to use a switch statement for this as well
case Encoding::NOT_PROVIDED: | ||
switch (bits_per_sample) { | ||
case BitDepth::NOT_PROVIDED: | ||
if (dtype == torch::kFloat32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you should be able to use the switch statement for these types too, if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which type is it? I am getting the following errors; I tried with c10::kFloat
and others but got the same result
../../torchaudio/csrc/sox/utils.cpp:219:23: error: switch quantity not an integer
switch(dtype) {
^
../../torchaudio/csrc/sox/utils.cpp:220:25: error: could not convert ‘torch::kFloat32’ from ‘const c10::ScalarType’ to ‘<type error>’
case torch::kFloat32:
^~~~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the conversion to ScalarType, which will also guarantee that the TypeMeta is indeed a ScalarType (which all your cases seem to assume).
This commit sends a new event to GA that tracks tutorial clicks with the tutorial URL Co-authored-by: Brandon Green <[email protected]>
The current release version's "sox_io" backend's
save
function changes the encoding of the audio file based on the dtype of the Tensor. For example, if the dtype is "float32", then it will be saved as 32bit floating point PCM. This behavior was taken from SciPy'sscipy.io.wavefile.write
function. However it was pointed out that this is inconvenient for torchaudio users. Because most torchaudio's functionality works on float32 Tensor yet, the common audio formats typically retains only 16 bit, such as 16 bit signed integer PCM.To resolve the inconvenience while keeping the functionality to support different encodings, this PR
encoding
andbits_per_sample
parameters tosave
function.For non-compressed format (such as "wav"), it defaults to 16-bit signed integer PCM. (This is BC-breaking behavior if users were dumping Tensor object without converting to the matching dtype)The default encoding/bit depth are now determined by dtype, which is not BC-breaking.Note in #1204, we attempted to solve the issue by adding
dtype
argument, but after giving more thoughts I concluded that this approach is limited and the design is not intuitive, so we will replace thedtype
argument withencoding
andbits_per_sample
. This will allow other encodings such as mu-law and a-law.For the detailed behavior of the
encoding
andbits_per_sample
, see the table at the bottom of this description.Along the way, the implementation of type conversion is updated to use the implementations that
libsox
provides. So the new implementation ofsave
function yields the same result assox
command.Table: the provided values and the actual values to be used
encoding
orbits_per_sample
option, the encoding configuration is determined bydtype
.RuntimeError
mp3
andvorbis
, do no support changingencoding
andbits_per_sample
, so if users provide some values, they will be ignored with warning.Provided arguments
Resulting value (if differ)
format
encoding
bits_per_sample
encoding
bits_per_sample
wav, amb
None
None
*
*
8
PCM_U
16
PCM_S
24
PCM_S
32
PCM_S
PCM_S
None
32
16
24
32
PCM_U
None
8
8
PCM_F
None
32
32
ULAW
None
8
8
ALAW
None
8
8
FLAC
N/A
None
24
8
16
24
Shpere
None
None
PCM_S
32
8
PCM_S
16
PCM_S
24
PCM_S
32
PCM_S
PCM_S
None
32
8
16
24
32
ULAW
None
8
8
ALAW
None
8
8
16
24
32