Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The returned value from ZSTD_getFrameContentSize is not checked. #46

Closed
mkitti opened this issue Jul 25, 2024 · 6 comments · Fixed by #47
Closed

The returned value from ZSTD_getFrameContentSize is not checked. #46

mkitti opened this issue Jul 25, 2024 · 6 comments · Fixed by #47

Comments

@mkitti
Copy link
Contributor

mkitti commented Jul 25, 2024

The return value of ZSTD_getFrameContentSize is not checked before being passed to malloc.

int dest_size = ZSTD_getFrameContentSize(source_ptr, source_size);
dest_ptr = (char *)malloc((size_t)dest_size);

Here is the relevant excerpt from the Zstd manual as of version 1.5.1

#define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1)
#define ZSTD_CONTENTSIZE_ERROR (0ULL - 2)
unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSize);
src should point to the start of a ZSTD encoded frame.
srcSize must be at least as large as the frame header.
hint : any size >= ZSTD_frameHeaderSize_max is large enough.
@return : - decompressed size of src frame content, if known
- ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined
- ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid magic number, srcSize too small)
note 1 : a 0 return value means the frame is valid but "empty".
note 2 : decompressed size is an optional field, it may not be present, typically in streaming mode.
When return==ZSTD_CONTENTSIZE_UNKNOWN, data to decompress could be any size.
In which case, it's necessary to use streaming mode to decompress data.
Optionally, application can rely on some implicit limit,
as ZSTD_decompress() only needs an upper bound of decompressed size.
(For example, data could be necessarily cut into blocks <= 16 KB).
note 3 : decompressed size is always present when compression is completed using single-pass functions,
such as ZSTD_compress(), ZSTD_compressCCtx() ZSTD_compress_usingDict() or ZSTD_compress_usingCDict().
note 4 : decompressed size can be very large (64-bits value),
potentially larger than what local system can handle as a single memory segment.
In which case, it's necessary to use streaming mode to decompress data.
note 5 : If source is untrusted, decompressed size could be wrong or intentionally modified.
Always ensure return value fits within application's authorized limits.
Each application can set its own limits.
note 6 : This function replaces ZSTD_getDecompressedSize()

The return value from ZSTD_getFrameContentSize could be either ZSTD_CONTENTSIZE_UNKNOWN or ZSTD_CONTENTSIZE_ERROR. Preferably the value should be bounded by the expected destination buffer size rather than letting this be unconstrained from the data.

xref: google/neuroglancer#625

@manzt
Copy link
Owner

manzt commented Jul 29, 2024

Thanks for pointing this out. Not sure when I'll be able to implement a fix. Happy to accept a PR.

@mkitti
Copy link
Contributor Author

mkitti commented Jul 31, 2024

I have a few ideas about how to fix this. How do you do exception handling?

@mkitti
Copy link
Contributor Author

mkitti commented Aug 5, 2024

Could we use -fexceptions as described here?

https://emscripten.org/docs/porting/exceptions.html

@mkitti
Copy link
Contributor Author

mkitti commented Aug 10, 2024

Minimal reproducer:

#include "emscripten/bind.h"
#include "emscripten/val.h"
#include <iostream>

using namespace emscripten;

val compress(std::string source, int level);
val decompress(std::string source);

int main() {
    unsigned char data[] = {40, 181, 47, 253, 0, 88, 97, 0, 0, 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33};

    std::cout << decompress(std::string(reinterpret_cast<char *>(data), sizeof(data))).as<std::string>() << std::endl;

    return 0;
}
$ emcc zstd_codec.cpp zstd_codec_bug.cpp -I node_modules/zstd/lib -lzstd -L node_modules/zstd/build/lib -lembind -o test_zstd_codec_bug.js 

$ node test_zstd_codec_bug.js 
/Users/kittisopikulm/Documents/src/numcodecs.js/codecs/zstd/test_zstd_codec_bug.js:121
      throw ex;
      ^

RangeError: Invalid typed array length: 4294967222
    at new Uint8Array (<anonymous>)
    at Object.decodeMemoryView (/Users/kittisopikulm/Documents/src/numcodecs.js/codecs/zstd/test_zstd_codec_bug.js:1765:16)
    at __emval_take_value (/Users/kittisopikulm/Documents/src/numcodecs.js/codecs/zstd/test_zstd_codec_bug.js:2197:43)
    at wasm://wasm/0029c96e:wasm-function[30]:0x15b6
    at wasm://wasm/0029c96e:wasm-function[40]:0x19bc
    at wasm://wasm/0029c96e:wasm-function[91]:0x271d
    at wasm://wasm/0029c96e:wasm-function[130]:0x3964
    at /Users/kittisopikulm/Documents/src/numcodecs.js/codecs/zstd/test_zstd_codec_bug.js:618:12
    at callMain (/Users/kittisopikulm/Documents/src/numcodecs.js/codecs/zstd/test_zstd_codec_bug.js:5547:15)
    at doRun (/Users/kittisopikulm/Documents/src/numcodecs.js/codecs/zstd/test_zstd_codec_bug.js:5597:23)

Node.js v20.16.0

@mkitti
Copy link
Contributor Author

mkitti commented Aug 10, 2024

Bonus byte sequence to test:

    unsigned char data3[] = {40, 181, 47, 253, 0, 88, 36, 2, 0, 164, 3, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 1, 0, 58, 252, 223, 115, 5, 5, 76, 0, 0, 8, 115, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 107, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 99, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 91, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 83, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 75, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 67, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 117, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 109, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 101, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 93, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 85, 1, 0, 252, 255, 57, 16, 2, 76, 0, 0, 8, 77, 1, 0, 252, 255, 57, 16, 2, 77, 0, 0, 8, 69, 1, 0, 252, 127, 29, 8, 1};

It should decompress to ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz` repeatedly for 32768 repeats.

@manzt
Copy link
Owner

manzt commented Aug 13, 2024

awesome, thanks for taking a stab at this!

Could we use -fexceptions as described here?
emscripten.org/docs/porting/exceptions.html

Looks reasonable to me, but this disclaimer is interesting:

Note that this option has relatively high overhead, but it will work on all JavaScript engines with WebAssembly support. You can reduce the overhead by specifying a list of allowed functions in which exceptions are enabled, see the EXCEPTION_CATCHING_ALLOWED setting.

I'm curious what the overhead refers to (i.e, size of bundle or runtime cost)? It seems there is some native wasm exceptions as an option as well, but I'm not sure how widely supported they are across runtimes (browsers probably our main target)

@manzt manzt closed this as completed in #47 Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants