Calling a C function has considerable overhead (~15 ns per call). For `Py_IncRef`/`Py_DecRef` we could avoid most of it by modifying the reference count directly from Julia, and only calling `Py_DecRef` when the count is < 2, i.e. when the decrement would bring it to zero and the object has to be deallocated.