First cut at native C++ MLT loader #441

Open: wants to merge 73 commits into base: main
Showing changes from 1 commit
73 commits
0c55afc
ugh cmake
TimSylvester Jan 8, 2025
3e67bcc
protozero
TimSylvester Jan 10, 2025
19e0181
more
TimSylvester Jan 10, 2025
49c61d9
tileset metadata
TimSylvester Jan 14, 2025
87f22f0
decoding
TimSylvester Jan 17, 2025
481c82d
format
TimSylvester Jan 17, 2025
41a17bf
refactor, more decoding
TimSylvester Jan 17, 2025
f62d031
refactor buffer management
TimSylvester Jan 17, 2025
3e1cef7
eliminate template specializations, add support for on-the-fly type c…
TimSylvester Jan 20, 2025
dd41336
add fastpfor
TimSylvester Jan 20, 2025
72394eb
bug
TimSylvester Jan 21, 2025
f4b7c4f
more decoding
TimSylvester Jan 23, 2025
f6a10a8
string and property decoding, refactor, add noexcepts
TimSylvester Jan 24, 2025
9e35424
tests
TimSylvester Jan 24, 2025
3536fa5
Property decoding, Morton decoding, reuse intermediate buffers, bette…
TimSylvester Jan 29, 2025
4788f6b
More geometry decoding, fix geometry error, more validity checks.
TimSylvester Jan 30, 2025
408f92f
more parsing
TimSylvester Jan 31, 2025
cdde3cc
Add json output, move headers
TimSylvester Feb 1, 2025
cd4d9d6
Write GeoJSON for validation
TimSylvester Feb 3, 2025
8cff4b2
Split properties to features.
TimSylvester Feb 4, 2025
e490d56
Fix vectorized delta decoding, add test cases
TimSylvester Feb 4, 2025
7bdbc31
Fix multi-polygon geometry
TimSylvester Feb 4, 2025
c5931be
Handle nullable property values better
TimSylvester Feb 5, 2025
68d7e87
Revise diff process to handle numbers represented as strings.
TimSylvester Feb 6, 2025
87b55ce
Enforce non-copyable and normalize declarations.
TimSylvester Feb 6, 2025
598c8a7
Fix bit count bug.
TimSylvester Feb 6, 2025
f86d371
Exclude BOMs from strings
TimSylvester Feb 6, 2025
05deab1
cleanup and refactor
TimSylvester Feb 6, 2025
505f7aa
cleanup
TimSylvester Feb 6, 2025
e3c64d8
configure pre-commit
louwers Feb 7, 2025
9c28198
Fix compilation macOS ARM
louwers Feb 7, 2025
7927d0e
Add basic C++ CI
louwers Feb 7, 2025
3eb39fe
Fix tag simde
louwers Feb 7, 2025
0be9c56
Use EXCLUDE_FROM_ALL, update deps, clean up protozero inclusion
louwers Feb 7, 2025
958baa3
Finish factory pattern for geometry
TimSylvester Feb 7, 2025
88ed9d9
use `static_assert`
TimSylvester Feb 7, 2025
de45c5b
remove extra config
TimSylvester Feb 7, 2025
c06f3d3
eliminate unused compile-time option
TimSylvester Feb 7, 2025
e56eef0
move import
TimSylvester Feb 7, 2025
68c83fd
add missing headers
TimSylvester Feb 7, 2025
407a364
satisfy gcc?
TimSylvester Feb 7, 2025
1d264d2
replace C string functions with STL stuff
TimSylvester Feb 7, 2025
2cf8f56
fix dependent type name
TimSylvester Feb 7, 2025
482d74c
build issues
TimSylvester Feb 7, 2025
7ec0f13
warning
TimSylvester Feb 7, 2025
495b3a7
remove positive exception specifications
TimSylvester Feb 7, 2025
818f3b7
only write new json when a diff is detected
TimSylvester Feb 7, 2025
89f6309
add libfsst
TimSylvester Feb 7, 2025
553ea8f
Merge branch 'main' into cpp
louwers Feb 8, 2025
15d5714
Port FSST test case, not even close to passing
TimSylvester Feb 10, 2025
3882b14
run pre-commit
louwers Feb 11, 2025
57863aa
remove obsolete file
TimSylvester Feb 12, 2025
5882858
Add JSON tool and README
TimSylvester Feb 12, 2025
73e7434
Refactor to eliminate sketchy vector casts
TimSylvester Feb 12, 2025
7ba42e6
remove logging
TimSylvester Feb 12, 2025
4e20475
add standard build properties
TimSylvester Feb 13, 2025
8158d31
fix relative path
TimSylvester Feb 13, 2025
c50c80d
Make protozero support optional
TimSylvester Feb 14, 2025
8df8392
Make protozero metadata decoding header-only
TimSylvester Feb 14, 2025
b5a1297
Convert `FetchContent` dependencies to submodules
TimSylvester Feb 14, 2025
c0531eb
fetch submodules in CI
TimSylvester Feb 14, 2025
a3868f1
export dependency target
TimSylvester Feb 14, 2025
0cb9b97
move submodules to root
TimSylvester Feb 17, 2025
ca9247b
Exclude dependency targets
TimSylvester Feb 17, 2025
fa942dd
simplify namespaces
TimSylvester Feb 17, 2025
081acaf
Pass protozero message by reference
TimSylvester Feb 19, 2025
f7a4c2f
allow for lvalue arguments
TimSylvester Feb 19, 2025
195e7e9
undo
TimSylvester Feb 19, 2025
6367feb
Fix a bug due to compiler or platform-dependent treatment of negative…
TimSylvester Feb 20, 2025
90c22df
fix off-by-one bug
TimSylvester Feb 21, 2025
232f66c
Implement decoding of FastPFor into 64-bit results
TimSylvester Mar 6, 2025
2440304
Fix string decoding error when the dictionary is empty
TimSylvester Mar 6, 2025
01236c1
Additional checks so that invalid input fails in a more useful way
TimSylvester Mar 7, 2025
18 changes: 12 additions & 6 deletions cpp/include/mlt/metadata/stream.hpp
@@ -10,26 +10,28 @@

 namespace mlt::metadata::stream {

-enum class DictionaryType {
+enum class DictionaryType : std::uint32_t {
Review comment (Collaborator): Nit: there is no need to specify the values explicitly; enumerators default to 0, 1, 2, 3, and so on.
     NONE = 0,
     SINGLE = 1,
     SHARED = 2,
     VERTEX = 3,
     MORTON = 4,
     FSST = 5,
+    VALUE_COUNT = 6,
 };
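To illustrate the reviewer's nit, here is a minimal sketch of implicit enumerator numbering combined with a fixed underlying type. `ExampleType` is a hypothetical enum invented for this illustration and is not part of the PR; the assertions only restate what the language guarantees.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical enum, not part of the PR. Enumerators without initializers
// start at 0 and increase by 1, so a trailing sentinel equals the number of
// real enumerators that precede it.
enum class ExampleType : std::uint32_t {
    NONE,
    SINGLE,
    SHARED,
    VALUE_COUNT,
};

// Compile-time checks of the implicit numbering.
static_assert(static_cast<std::uint32_t>(ExampleType::NONE) == 0, "first enumerator is 0");
static_assert(static_cast<std::uint32_t>(ExampleType::SHARED) == 2, "values increase by 1");
static_assert(static_cast<std::uint32_t>(ExampleType::VALUE_COUNT) == 3, "sentinel equals the count");

// The ": std::uint32_t" part fixes the underlying type.
static_assert(std::is_same_v<std::underlying_type_t<ExampleType>, std::uint32_t>, "fixed underlying type");
```

Explicit values may still be a reasonable choice when the numbers mirror an encoded format, since they keep the mapping visible if an enumerator is ever reordered; that is a style judgment rather than a correctness issue.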

-enum class LengthType {
+enum class LengthType : std::uint32_t {
     VAR_BINARY = 0,
     GEOMETRIES = 1,
     PARTS = 2,
     RINGS = 3,
     TRIANGLES = 4,
     SYMBOL = 5,
     DICTIONARY = 6,
+    VALUE_COUNT = 7,
 };

-enum class PhysicalLevelTechnique {
+enum class PhysicalLevelTechnique : std::uint32_t {
     NONE = 0,
     /// Preferred, tends to produce the best compression ratio and decoding performance.
     /// But currently limited to 32-bit integer.
@@ -39,29 +41,33 @@ enum class PhysicalLevelTechnique {
     VARINT = 2,
     /// Adaptive Lossless floating-Point Compression
     ALP = 3,
+    VALUE_COUNT = 4,
 };

-enum class LogicalLevelTechnique {
+enum class LogicalLevelTechnique : std::uint32_t {
     NONE = 0,
     DELTA = 1,
     COMPONENTWISE_DELTA = 2,
     RLE = 3,
     MORTON = 4,
     PSEUDODECIMAL = 5,
+    VALUE_COUNT = 6,
 };

-enum class OffsetType {
+enum class OffsetType : std::uint32_t {
     VERTEX = 0,
     INDEX = 1,
     STRING = 2,
     KEY = 3,
+    VALUE_COUNT = 4,
 };

-enum class PhysicalStreamType {
+enum class PhysicalStreamType : std::uint32_t {
     PRESENT = 0,
     DATA = 1,
     OFFSET = 2,
     LENGTH = 3,
+    VALUE_COUNT = 4,
 };

 class LogicalStreamType : public util::noncopyable {
14 changes: 9 additions & 5 deletions cpp/src/mlt/decode/geometry.hpp
@@ -100,7 +100,6 @@ class GeometryDecoder {
                         switch (geomStreamMetadata->getPhysicalLevelTechnique()) {
                             case PhysicalLevelTechnique::FAST_PFOR:
                                 throw std::runtime_error("FastPfor encoding for geometries is not yet supported.");
-                                break;
                             case PhysicalLevelTechnique::NONE:
                             case PhysicalLevelTechnique::ALP:
                                 // TODO: other implementations are not clear on whether these are valid
@@ -109,6 +108,8 @@ class GeometryDecoder {
                                     .decodeIntStream<std::uint32_t, std::uint32_t, std::int32_t, /*isSigned=*/true>(
                                         tileData, geomColumn.vertices, *geomStreamMetadata);
                                 break;
+                            default:
+                                throw std::runtime_error("Unsupported encoding for geometries: " + column.name);
                         };
                         break;
                     case DictionaryType::MORTON: {
@@ -119,6 +120,7 @@ class GeometryDecoder {
                                 tileData, geomColumn.vertices, mortonStreamMetadata);
                         break;
                     }
+                    default:
                     case DictionaryType::NONE:
                     case DictionaryType::SINGLE:
                     case DictionaryType::SHARED:
@@ -130,6 +132,8 @@ class GeometryDecoder {
                 }
             case PhysicalStreamType::PRESENT:
                 break;
+            default:
+                throw std::runtime_error("Unsupported logical stream type: " + column.name);
         }
     }

@@ -309,11 +313,11 @@ class GeometryDecoder {
                         rings.clear();
                     }
                 } else {
-                    if (partOffsetCounter + vertexOffsets.size() >= partOffsets.size() ||
-                        ringOffsetsCounter + vertexOffsets.size() >= ringOffsets.size()) {
+                    if (partOffsetCounter + numPolygons > partOffsets.size() ||
+                        ringOffsetsCounter + numPolygons > ringOffsets.size()) {
                         throw std::runtime_error("geometry error");
                     }
-                    for (std::size_t i = 0; i < vertexOffsets.size(); ++i) {
+                    for (std::size_t i = 0; i < numPolygons; ++i) {
                         const auto numRings = partOffsets[partOffsetCounter++];
                         const auto numVertices = ringOffsets[ringOffsetsCounter++];
                         rings.reserve(numRings - 1);
@@ -322,7 +326,7 @@ class GeometryDecoder {
                             vertices, vertexOffsets, vertexOffsetsOffset, numVertices, true);
                         vertexOffsetsOffset += numVertices;

-                        if (ringOffsetsCounter + numRings > ringOffsets.size()) {
+                        if (ringOffsetsCounter + numRings - 1 > ringOffsets.size()) {
                             throw std::runtime_error("geometry error");
                         }
                         for (count_t j = 1; j < numRings; ++j) {
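The geometry hunks above replace `>=` with `>` so that reading exactly up to the end of an offsets array is allowed. As a minimal sketch of the same kind of check, the hypothetical helper `canRead` below (not part of the PR) expresses "can `needed` more elements be read starting at `cursor`" using subtraction, which, assuming unsigned indices, also avoids wraparound when the two values are added.

```cpp
#include <cstddef>

// Hypothetical helper, not from the PR: returns true when `needed` more
// elements can be read starting at index `cursor` from a container that
// holds `size` elements. Reading exactly up to the end is allowed.
// The comparison uses subtraction so that `cursor + needed` is never
// computed and therefore cannot wrap around.
constexpr bool canRead(std::size_t cursor, std::size_t needed, std::size_t size) noexcept {
    return cursor <= size && needed <= size - cursor;
}
```

Under these assumptions, a guard of the form `if (!canRead(counter, count, offsets.size())) throw ...;` rejects the same inputs as `counter + count > offsets.size()` whenever the addition does not overflow.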
2 changes: 2 additions & 0 deletions cpp/src/mlt/decode/string.hpp
@@ -40,6 +40,8 @@ class StringDecoder {
         for (count_t i = 0; i < numStreams; ++i) {
             auto streamMetadata = StreamMetadata::decode(tileData);
             switch (streamMetadata->getPhysicalStreamType()) {
+                default:
+                    throw std::runtime_error("Unsupported stream type");
                 case PhysicalStreamType::PRESENT:
                     throw std::runtime_error("Present stream not supported for string columns");
                 case PhysicalStreamType::OFFSET:
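The hunk above places `default:` ahead of the `case` labels. A `default` label may appear anywhere inside a `switch`, and it matters here because an enum with a fixed underlying type can hold values that match no enumerator. The sketch below uses a hypothetical enum `Kind` and function `classify`, neither of which is in the PR, to show both points.

```cpp
#include <cstdint>

// Hypothetical enum and function, not part of the PR.
enum class Kind : std::uint32_t { A, B, VALUE_COUNT };

// The position of `default` does not affect which label is selected;
// it only catches values that match none of the `case` labels.
constexpr int classify(Kind k) noexcept {
    switch (k) {
        default:
            return -1;  // unknown or out-of-range value
        case Kind::A:
            return 1;
        case Kind::B:
            return 2;
    }
}
```

Because every branch here returns (the PR's version throws), there is no fall-through from `default` into the following `case`.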
40 changes: 32 additions & 8 deletions cpp/src/mlt/metadata/stream.cpp
@@ -5,18 +5,35 @@
 namespace mlt::metadata::stream {

 namespace {
-std::optional<LogicalStreamType> decodeLogicalStreamType(PhysicalStreamType physicalStreamType, int value) noexcept {
+std::optional<LogicalStreamType> decodeLogicalStreamType(PhysicalStreamType physicalStreamType, int value) {
     switch (physicalStreamType) {
-        case PhysicalStreamType::DATA:
-            return static_cast<DictionaryType>(value);
-        case PhysicalStreamType::OFFSET:
-            return static_cast<OffsetType>(value);
-        case PhysicalStreamType::LENGTH:
-            return static_cast<LengthType>(value);
-        default:
+        case PhysicalStreamType::DATA: {
+            const auto type = static_cast<DictionaryType>(value);
+            if (type < DictionaryType::VALUE_COUNT) {
+                return type;
+            }
+            break;
+        }
+        case PhysicalStreamType::OFFSET: {
+            const auto type = static_cast<OffsetType>(value);
+            if (type < OffsetType::VALUE_COUNT) {
+                return type;
+            }
+            break;
+        }
+        case PhysicalStreamType::LENGTH: {
+            const auto type = static_cast<LengthType>(value);
+            if (type < LengthType::VALUE_COUNT) {
+                return type;
+            }
+            break;
+        }
         case PhysicalStreamType::PRESENT:
             return {};
+        default:
+            break;
     }
+    throw std::runtime_error("Invalid logical stream type: " + std::to_string(std::to_underlying(physicalStreamType)));
 }
 } // namespace
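The three `case` blocks above repeat one pattern: cast the raw value, then accept it only if it is below the enum's `VALUE_COUNT` sentinel. A minimal sketch of that pattern as a reusable helper follows. `OffsetKind` and `checkedEnumCast` are hypothetical names for illustration only; the sketch returns an empty `std::optional` on failure instead of throwing, and it compares the integer before casting, which is a variation on the cast-then-compare order used in the diff.

```cpp
#include <cstdint>
#include <optional>
#include <type_traits>

// Hypothetical enum that follows the sentinel convention from the diff.
enum class OffsetKind : std::uint32_t { VERTEX, INDEX, STRING, KEY, VALUE_COUNT };

// Hypothetical helper: converts `raw` to E only when it is strictly below
// E::VALUE_COUNT; otherwise returns an empty optional.
template <typename E>
constexpr std::optional<E> checkedEnumCast(std::uint64_t raw) noexcept {
    using U = std::underlying_type_t<E>;
    if (raw >= static_cast<std::uint64_t>(static_cast<U>(E::VALUE_COUNT))) {
        return std::nullopt;
    }
    return static_cast<E>(static_cast<U>(raw));
}
```

A caller that prefers the PR's behavior could throw `std::runtime_error` when the returned optional is empty.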

@@ -65,6 +82,13 @@ StreamMetadata StreamMetadata::decodeInternal(BufferStream& buffer) {
     const auto logicalLevelTechnique2 = static_cast<LogicalLevelTechnique>((encodingsHeader >> 2) & 0x7);
     const auto physicalLevelTechnique = static_cast<PhysicalLevelTechnique>(encodingsHeader & 0x3);

+    if (physicalStreamType >= PhysicalStreamType::VALUE_COUNT ||
+        logicalLevelTechnique1 >= LogicalLevelTechnique::VALUE_COUNT ||
+        logicalLevelTechnique2 >= LogicalLevelTechnique::VALUE_COUNT ||
+        physicalLevelTechnique >= PhysicalLevelTechnique::VALUE_COUNT) {
+        throw std::runtime_error("Invalid stream encoding");
+    }
+
     using namespace util::decoding;
     const auto [numValues, byteLength] = decodeVarints<std::uint32_t, 2>(buffer);

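The range check added above follows from the field widths visible in this hunk: a 3-bit field (`& 0x7`) can hold 0 through 7 while `LogicalLevelTechnique` defines only 0 through 5, whereas a 2-bit field (`& 0x3`) can hold only 0 through 3, all of which are defined for `PhysicalLevelTechnique`. The sketch below covers only the two fields whose shifts and masks appear in the diff. The names `Fields`, `unpack`, `isValid`, and the two constants are hypothetical, and the rest of the header layout is deliberately left out because the hunk does not show it.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the sentinel values in the diff.
constexpr std::uint32_t kLogicalCount = 6;   // LogicalLevelTechnique::VALUE_COUNT
constexpr std::uint32_t kPhysicalCount = 4;  // PhysicalLevelTechnique::VALUE_COUNT

// Hypothetical holder for the two fields shown in the hunk.
struct Fields {
    std::uint32_t logical2;  // bits 2..4 of the header, range 0..7
    std::uint32_t physical;  // bits 0..1 of the header, range 0..3
};

// Extracts the fields with the same shift and masks as the diff.
constexpr Fields unpack(std::uint32_t header) noexcept {
    return {(header >> 2) & 0x7u, header & 0x3u};
}

// Accepts only values that correspond to a defined enumerator.
constexpr bool isValid(const Fields& f) noexcept {
    return f.logical2 < kLogicalCount && f.physical < kPhysicalCount;
}
```

For example, a header of `0b11100` yields a 3-bit field of 7, which has no matching enumerator and is rejected, while `0b10110` yields 5 and 2, which are both accepted.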