Skip to content

set_type_codec encoder to handle numpy/pandas null types not working with copy_records_to_table #693

Open
@wvolkov

Description

@wvolkov
  • asyncpg version:0.21.0
  • PostgreSQL version: PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    :
  • Python version: 3.7
  • Platform: Ubuntu 18.04
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: yes
  • If you built asyncpg locally, which version of Cython did you use?: -
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : -

I am trying to set up an asyncpg in order to use copy_records_to_table method passing pandas data frame, converted to list of lists.

My table is simple:

create table sg.test("no" int4, dt timestmap)

As pandas/numpy have their own Null types for different types (pd.NaT for dates and np.nan for numerics) I tried to set up an appropriate encoder:

import asyncio
import asyncpg
import pandas as pd
import numpy as np


data = [[1, np.datetime64('NaT')]]
df = pd.DataFrame(data=data, columns=["no", "dt"])


def encoder(val):
    if isinstance(val, (type(pd.NaT), type(np.nan))):
        return None
    return bytes(val)


def decoder(val): # we do not need that for this task
    return val


async def main(df: pd.DataFrame):

    conn: asyncpg.Connection=await asyncpg.connect('postgresql://connection_string')

    try:
        await conn.set_type_codec(
            'timestamp',
            encoder=encoder,
            decoder=decoder,
            schema='pg_catalog',
            format='binary'
        )
        res=await conn.copy_records_to_table('test', records=df.values.tolist(), columns=df.columns.values.tolist(), schema_name='sg')
        print(res)
    except ValueError as e:
        print(e.args)
    finally:
        await conn.close()

asyncio.get_event_loop().run_until_complete(main(df))

I am getting following error:

....
  File "asyncpg/protocol/protocol.pyx", line 482, in copy_in
  File "asyncpg/protocol/protocol.pyx", line 429, in asyncpg.protocol.protocol.BaseProtocol.copy_in
  File "asyncpg/protocol/codecs/base.pyx", line 192, in asyncpg.protocol.protocol.Codec.encode
  File "asyncpg/protocol/codecs/base.pyx", line 178, in asyncpg.protocol.protocol.Codec.encode_in_python
  File "asyncpg/pgproto/./codecs/bytea.pyx", line 19, in asyncpg.pgproto.pgproto.bytea_encode
TypeError: a bytes-like object is required, not 'NoneType'

For me it is unclear what kind of bytes should encoder return in case of Null value, I have tried to investigate it backwards: select a null value from original table with custom decoder, but it seems that asyncpg just amends that method in case of null values at all. So it is still unclear what kind of data should be passed.

I have already tried to return following values at encoder:

  • b''
  • b'\x01'
    another error occurs:
insufficient data left in message

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions