-
Notifications
You must be signed in to change notification settings - Fork 75
Comparing changes
Open a pull request
base repository: pytorch/tensorpipe
base: 42033c5437fc9c181dc9d0a32df600484e2b0685
head repository: pytorch/tensorpipe
compare: 9646e1a431997edb1579972cef196d8fb97a77a5
Commits on Aug 31, 2020
-
Fix error: ‘runtime_error’ is not a member of ‘std’ (#204)
Summary: Pull Request resolved: #204 Reviewed By: beauby Differential Revision: D23424221 Pulled By: lw fbshipit-source-id: 7a5913dc2b13c7e7dbff7eed4520536e35efa20a
Configuration menu - View commit details
-
Copy full SHA for 19c514d - Browse repository at this point
Copy the full SHA 19c514dView commit details
Commits on Sep 4, 2020
-
Fix benchmarks not picking up transports/channels
Summary: Fixes #163 Reviewed By: beauby Differential Revision: D23500308 fbshipit-source-id: 64626f9c9eed583f735bdf8df1c5ed64796ec235
Configuration menu - View commit details
-
Copy full SHA for fd3c05d - Browse repository at this point
Copy the full SHA fd3c05dView commit details -
Add options for multiple payloads/tensors to benchmarks
Summary: Needed to be able to get more realistic benchmarks, especially of the protobuf serialization part (more payloads/tensors, bigger protobufs). Reviewed By: beauby Differential Revision: D23500307 fbshipit-source-id: eeb965ba96db311130e235a0dfcd0da64498563d
Configuration menu - View commit details
-
Copy full SHA for 08c93ae - Browse repository at this point
Copy the full SHA 08c93aeView commit details -
Add libnop dependency and use for tensor descriptor
Summary: Include libnop as a dependency but, for now, keep protobuf too. Provide a set of helpers for libnop, primarily to perform type erasure. In order to build and validate these helpers, use them to (de)serialize the tensor descriptors produced/consumed by channels, as they are a simple and self-contained application. Wider migration from protobuf to libnop will come later. Reviewed By: heiner Differential Revision: D22763734 fbshipit-source-id: 75bf8211ba9f89a3ee62e518b9ac44017a4f707e
Configuration menu - View commit details
-
Copy full SHA for ab9fb4b - Browse repository at this point
Copy the full SHA ab9fb4bView commit details -
Replace protobuf with libnop in connections
Summary: This is the big change, where we swap out protobuf for libnop in the biggest usecases, namely the pipe and the channels. Luckily all these changes are identical so although this consists in many lines the complexity of this change is limited. What we do is: - change the interface of connections to provide a specialization of read/write for NopHolders, rather than for protobuf Messages. - convert all the proto definitions to C++ structs, defined in new nop_types.h files. - go through all usages of the protobufs and all callsites of read/write and update them. This change leaves behind some proto defs which are still being used by some unit tests, as these tests will be ported in later commits and those protos will be removed then. It also doesn't reimplement yet the specialized codepath that the SHM has for protobuf messages: it is left there, albeit "cut off", which means that SHM will fall back to the generic implementation and be slower. This will be temporary as in the next commit we'll port that specialized code and re-attach it. Reviewed By: heiner Differential Revision: D22763737 fbshipit-source-id: d44be13e827822662a0b0395aff05ece478dbc51
Configuration menu - View commit details
-
Copy full SHA for 4f6e241 - Browse repository at this point
Copy the full SHA 4f6e241View commit details -
Use specialized libnop reader/writer for SHM
Summary: We had specialized "zero-copy streams" for protobuf that allowed us to write/read directly to/from the shared-memory ringbuffer. Now we do the same for libnop with custom reader/writers. Reviewed By: beauby, heiner Differential Revision: D22763736 fbshipit-source-id: 734ceeb490fbcb4e3763ab04120393e460b56ea7
Configuration menu - View commit details
-
Copy full SHA for 19d82af - Browse repository at this point
Copy the full SHA 19d82afView commit details -
Summary: Now that no one uses protobuf anymore, remove all that is left. Reviewed By: heiner Differential Revision: D22763733 fbshipit-source-id: de6fd11af14a9f3a64f695a8c86fe7e9d179b86e
Configuration menu - View commit details
-
Copy full SHA for 60cb9e6 - Browse repository at this point
Copy the full SHA 60cb9e6View commit details -
Fix libnop headers being private in CMake (#206)
Summary: And a missing const_cast Pull Request resolved: #206 Reviewed By: beauby Differential Revision: D23540899 Pulled By: lw fbshipit-source-id: d97bbbfa541f8598aa351c9701b2507e3f7759a0
Configuration menu - View commit details
-
Copy full SHA for 49bcc3e - Browse repository at this point
Copy the full SHA 49bcc3eView commit details
Commits on Sep 5, 2020
-
Cuda same-machine channel (#175)
Summary: Introduce the `CudaIpc` channel, which handles exchanging CUDA tensors between processes on the same machine. It leverages NVLink when available. Pull Request resolved: #175 Reviewed By: lw Differential Revision: D22355812 Pulled By: beauby fbshipit-source-id: 0589cb75f58e1c5fcc165e7449074eeb5dc697cf
Configuration menu - View commit details
-
Copy full SHA for a6c5972 - Browse repository at this point
Copy the full SHA a6c5972View commit details
Commits on Sep 8, 2020
-
Fix CMake build of CudaIpc channel. (#208)
Summary: This wasn't caught by CI as we're currently not running GPU tests on CI. Pull Request resolved: #208 Reviewed By: lw Differential Revision: D23573767 Pulled By: beauby fbshipit-source-id: c948c85e79ecb7482ff2446b128845184ba09e21
Configuration menu - View commit details
-
Copy full SHA for 779225d - Browse repository at this point
Copy the full SHA 779225dView commit details
Commits on Sep 15, 2020
-
Summary: With the recent migration from protobuf to libnop we've simplified the set of methods we require the ringbuffer consumer/producer to provide. We also came up with a nice low-level general purpose interface to access the ringbuffer: accessing a contiguous region through either one ptr+len or two (if it wraps). This diff takes the chance we now have to simplify the ringbuffer's methods, starting by the consumer. It makes the "access a contiguous region" method the foundation of everything else: that's now the only place where the ringbuffer logic resides, and everything else builds on top of it like a stack. Note that "everything else" now boils down to two methods: one that, in addition, copies the data out of the ringbuffer into a user-provided buffer, and another one that, on top, deals with transactions for you. The advantage of this change is that we now have a very consistent experience of the ringbuffer, no matter what one tries to do with it (e.g., we still had some methods that didn't support zero-sized lengths). It also helps robustness: since the logic in the same one method is tested in all accesses, it gets much higher coverage. Reviewed By: beauby Differential Revision: D23566823 fbshipit-source-id: bd02e8217682fd5924194714b18de05c6492d9c1
Configuration menu - View commit details
-
Copy full SHA for 125e700 - Browse repository at this point
Copy the full SHA 125e700View commit details -
Summary: Same as the diff before, but for the producer. Reviewed By: beauby Differential Revision: D23566822 fbshipit-source-id: f0026c12d20c5630f57acd7b58d547328de497ce
Configuration menu - View commit details
-
Copy full SHA for feaf9f3 - Browse repository at this point
Copy the full SHA feaf9f3View commit details -
Clear up ownership for shm segments and ringbuffers
Summary: I found it odd that we were using shared_ptrs to store some parts of our ringbuffer/shm-segment code. I can see two reasons: - It allows us to have both the consumer and producer hold a shared_ptr to the ringbuffer, so that one could have multiple of them sharing the same ringbuffer, which would be kept alive as long as there were users. In practice I don't think this mattered: each ringbuffer had only one user, both "in the code" (one consumer or one producer, never both) and "logically" (it was owned by one side of the connection, or by reactor, ... and destroyed when they were gone). shared_ptrs weren't buying us anything and in fact were weakening the ownership model. - Its aliasing constructor allowed us to bind the lifetime of the shared memory segment to the one of the ringbuffer object that was stored in it, thus allowing us to keep a bunch of things alive by holding just one pointer. So in the end this resulted in a lot of implicit links, with the end user (the connection, the reactor, ...) holding just a shared_ptr to a consumer/producer and this in turn owning (in a pretend-shared way) a lot of resources, including mmapped memory and file descriptors. Here I try to make the ownership model more explicit, by separating the resource (the Segment class, which now becomes a movable RAII wrapper which owns the fd and the mmap) from the shallow and stateless helpers used to access it (the ringbuffer, which is a pair of pointers, and the consumer/producer, which are little more than a collection of methods). Consumers/producers are in fact now so simple that they don't have a common base class anymore and are created on-demand when needed. This removes the usage of shared_ptrs entirely, and I think clarifies the ownership and gives us stronger guarantees that resources will be cleaned up and when that'll occur. Also, it allows us to store the ringbuffer header on the stack or as a field of another object without having to do some shared_ptr contortions (an aliased shared_ptr with a custom empty destructor). This will come in handy later, for InfiniBand. Reviewed By: beauby Differential Revision: D23567125 fbshipit-source-id: c229c7f96655324787eda02c8545ab7bafadb885
Configuration menu - View commit details
-
Copy full SHA for a655116 - Browse repository at this point
Copy the full SHA a655116View commit details -
Promote SHM's fd & socket RAII wrappers to project-wide helpers
Summary: We had RAII wrappers for file descriptors and sockets in SHM which were quite helpful. In some places, where we couldn't make use of the Fd wrapper, we were "releasing" file descriptors and then having an other object re-own them, and in between these steps we were dealing with raw integers. Just like when dealing with raw pointers, that makes me queasy. As for the Socket wrapper, it had a very helpful method for sending fds over a unix domain socket, which we wanted to use in the tests, and in order to do so we had some tests depend on the SHM transport, which was weird. I think it's nicer to make these helpers available to the entire codebase, so that we can use the Fd wrapper throughout (note how we can now delete its `release` method! and note how the Segment now doesn't close its file descriptor on its own but simply holds a Fd which does it on its own) and we can use the Socket methods easily. Reusability of these components will become more interesting shortly, with the InfiniBand transport. Reviewed By: beauby Differential Revision: D23567124 fbshipit-source-id: ca33325a0da4aeb03cde24bb1912049911aa16f2
Configuration menu - View commit details
-
Copy full SHA for 0fbfc1a - Browse repository at this point
Copy the full SHA 0fbfc1aView commit details -
Summary: The recent simplification of the ringbuffer interface allows, in turn, some simplification in its users, i.e., the connection. The read and write operations were in fact following very similar logic, but doing it in very different ways (e.g., the read operation had an `if(error)` after each call to the ringbuffer, and it would clean up and return early in each of them; the write operation instead had an `if(!error)` after each call, and would have a single cleanup at the end). This diff removes some now obsolete code (handling impossible errors, spinlocking, ...), unifies the logic, making the read and write paths highly symmetric. It also tries to catch errors as early as possible (e.g., if the only "expected" error is ENODATA or ENOSPC, it filters that but throws on anything else), so that the logs we get are more informative. Reviewed By: beauby Differential Revision: D23575375 fbshipit-source-id: d8e6ef50b8410e8f812a04ffc4fb74960ac45bf8
Configuration menu - View commit details
-
Copy full SHA for f2d9432 - Browse repository at this point
Copy the full SHA f2d9432View commit details -
Avoid overly-strict atomics in ringbuffer (#217)
Summary: Pull Request resolved: #217 Disclaimer: I thought this would give us some (maybe small) perf gain, but it doesn't seem like that. Probably because x86 has a strong memory consistency model, and thus all atomics are very strict all the time. We might still see perf gains on other platform (think ARM), but we're not targeting them yet, so this is not really relevant. Regardless of the absence of perf gains, I still think this diff makes the code cleaner, as it specifies exactly the requirements we need. We used atomics for the head/tail of the ringbuffers, which in their default behavior use a sequentially consistent memory model, which is the strongest possible one: all sequentially consistent accesses on any memory in the system are ordered in a single global total order. I expected this would cause some overhead (in the code, or in the processor). We don't need such a strong property though: we only need some ordering guarantees around the order of accesses within the ringbuffer, where we control all users. So we can express all the properties we need using the "weaker" acquire/release memory order. See here for an explanation of memory orders: https://en.cppreference.com/w/cpp/atomic/memory_order Reviewed By: beauby Differential Revision: D23573343 fbshipit-source-id: 808e80d98d34eaa22fab14979bdcdc0a06419f6b
Configuration menu - View commit details
-
Copy full SHA for 583f74e - Browse repository at this point
Copy the full SHA 583f74eView commit details
Commits on Sep 16, 2020
-
Use RAII wrapper for mmapped memory
Summary: We already had the shm::Segment behave as a RAII wrapper for its mmapped pointer, but it was also doing other things on top. Extracting the mmapped ptr to its own class I think helps readability and reusability. (And we plan to reuse the mmapped pointer in the InfiniBand transport). Note how now the Segment can get rid of its copy&move assignment/constructor and its destructor, because the default ones will do the job. This diff also introduces a "pattern" for RAII objects, consisting of a class that wraps a unique_ptr with a custom deleter, implementing the acquisition of the resource in the constructor of the class and the release in the deleter. This then gives us a correct RAII class by delegating most of the logic to the unique_ptr, and highlighting the important logic. If we like such a pattern, I plan to use it more later on (e.g., for the ibverbs objects). Reviewed By: beauby Differential Revision: D23679003 fbshipit-source-id: 17276ceedb0ee327a0cdbe9d0d6f48ce475d730f
Configuration menu - View commit details
-
Copy full SHA for f5ea0ca - Browse repository at this point
Copy the full SHA f5ea0caView commit details -
Clean up shared memory Segment
Summary: This diff caps off the previous stack by doing a few final changes: - move some private stuff from the segment's header to its .cc file - fix the usage of hugepages flags: we were OR-ing them into the wrong argument (when fixing this the code started failing because hugepages are weird, so for now they are entirely disabled) - fix the SFINAE in the segment's factory methods: use, as guard in the signature, just the check of whether the type is an array or not, and do all other checks as static_asserts in the methods. also remove redundant checks. Reviewed By: beauby Differential Revision: D23679002 fbshipit-source-id: 1e805d91e17f1afcdb5436522fd4bfe0f4aea925
Configuration menu - View commit details
-
Copy full SHA for 51a10c5 - Browse repository at this point
Copy the full SHA 51a10c5View commit details
Commits on Sep 18, 2020
-
Move channel types from
channel::Channel::
tochannel::
. (#210)Summary: Pull Request resolved: #210 This is a first step towards introducing templated `Channel`/`Context` classes for CPU/CUDA tensors. Differential Revision: D23598034 Test Plan: Imported from OSS Reviewed By: lw Pulled By: beauby fbshipit-source-id: 3936c56e093119f01250927aafd9074488e50030
Configuration menu - View commit details
-
Copy full SHA for 560dfe6 - Browse repository at this point
Copy the full SHA 560dfe6View commit details
Commits on Sep 19, 2020
-
Add TENSORPIPE_SUPPORTS_CUDA flag. (#211)
Summary: Pull Request resolved: #211 + Add a generic `TENSORPIPE_SUPPORTS_CUDA` config macro (controlled by the `TP_USE_CUDA` CMake flag) intended to indicate support for CUDA tensors throughout TensorPipe. + Move definition of config macros (`TENSORPIPE_HAS_XXX`) to a separate config.h header to avoid circular include dependencies (having xxx.h including tensorpipe.h for accessing config macros and tensorpipe.h including xxx.h for convenience). Differential Revision: D23598039 Test Plan: Imported from OSS Reviewed By: lw Pulled By: beauby fbshipit-source-id: 4b3a0c0b329240a10dcf3b88a4d43507cf36c423
Configuration menu - View commit details
-
Copy full SHA for cecf865 - Browse repository at this point
Copy the full SHA cecf865View commit details
Commits on Sep 21, 2020
-
Fix GCC 7 freaking out on a field initializer (#220)
Summary: Not sure why, but GCC 7 complains if we initialize length to zero. We don't really need to do that (length will only be read when the pointer is non-null, and in that case it's explicitly set) so this fixes the issue. Pull Request resolved: #220 Reviewed By: beauby Differential Revision: D23803439 Pulled By: lw fbshipit-source-id: 39f84b58e4e7ab755740660249e8b74cf462f705
Configuration menu - View commit details
-
Copy full SHA for 122efb5 - Browse repository at this point
Copy the full SHA 122efb5View commit details -
Fix CMake build when SHM disabled. (#221)
Summary: Now that `Socket`/`Fd` have been promoted to `common/`, their implementation should be compiled regardless of whether TensorPipe is compiled with the SHM transport, otherwise some linkers will complain about `virtual ~Fd()` being an undefined symbol (cf. https://app.circleci.com/pipelines/github/pytorch/pytorch/216673/workflows/8221802b-16fd-42b2-b801-c105ab39ace1/jobs/7624520). In order for the implementation of `Socket` to build on OSX (which does not offer the `SOCK_NONBLOCK` flag for `socket()`), we have to manually set the socket to non-blocking via `fnctl`. Pull Request resolved: #221 Reviewed By: lw Differential Revision: D23809663 Pulled By: beauby fbshipit-source-id: 479e20682432d78ba8683d38ffd8a0bf275ced5d
Configuration menu - View commit details
-
Copy full SHA for 49c0a09 - Browse repository at this point
Copy the full SHA 49c0a09View commit details -
Make Channel API accept buffer structs rather than raw pointers. (#45…
…014) Summary: Pull Request resolved: pytorch/pytorch#45014 Pull Request resolved: #219 Pull Request resolved: #212 + Introduce buffer.h defining the buffer struct(s). The `CpuBuffer` struct is always defined, while the `CudaBuffer` struct is defined only when `TENSORPIPE_SUPPORTS_CUDA` is true. + Update all channels to take a `CpuBuffer` or `CudaBuffer` for `send`/`recv` rather than a raw pointer and a length. + Make the base `Channel`/`Context` classes templated on `TBuffer`, effectively creating two channel hierarchies (one for CPU channels, one for CUDA channels). + Update the Pipe and the generic channel tests to use the new API. So far, generic channel tests are CPU only, and tests for the CUDA IPC channel are (temporarily) disabled. A subsequent PR will take care of refactoring tests so that generic tests work for CUDA channels. An other PR will add support for CUDA tensors in the Pipe. Differential Revision: D23598033 Test Plan: Imported from OSS Reviewed By: lw Pulled By: beauby fbshipit-source-id: a18ca76e1453f2b5d091a2d0103362818eafa028
Configuration menu - View commit details
-
Copy full SHA for 9646e1a - Browse repository at this point
Copy the full SHA 9646e1aView commit details
There are no files selected for viewing
This file was deleted.
This file was deleted.
This file was deleted.
Large diffs are not rendered by default.