Description
- asyncpg version: 0.18.3
- PostgreSQL version: 9.6.8
- Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: AWS Aurora Postgres. Tried connecting to both the writer and a read replica. Did not try local Postgres.
- Python version: 3.7.3
- Platform: macOS 10.14.5
- Do you use pgbouncer?: No
- Did you install asyncpg with pip?: Yes
- If you built asyncpg locally, which version of Cython did you use?: N/A
- Can the issue be reproduced under both asyncio and uvloop?: Did not try.
I'm running into an issue using cursors on a very large Postgres table. The table has approximately 60 million rows, only 8 columns, and each row is very small. I'm trying to use a server-side cursor to stream the results in order, but it completely freezes while establishing the cursor via con.cursor(). This is my code:
import asyncio

import asyncpg

dsn = 'postgresql://...:...@.../...'


async def run():
    con = await asyncpg.connect(dsn)
    print(con)
    async with con.transaction():
        async for record in con.cursor('SELECT * FROM pageviews ORDER BY user_id', prefetch=10, timeout=2):
            print(record)
    await con.close()


loop = asyncio.get_event_loop()
loop.run_until_complete(run())
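For completeness, the non-iterable ("regular") cursor variant I also tried looked roughly like the sketch below; the batch size of 1000 is an arbitrary choice for illustration, and the DSN and table placeholders are the same as above:

import asyncio

import asyncpg

dsn = 'postgresql://...:...@.../...'  # same placeholder DSN as above


async def run():
    con = await asyncpg.connect(dsn)
    async with con.transaction():
        # Explicit server-side cursor; fetch rows in batches instead of iterating.
        cur = await con.cursor('SELECT * FROM pageviews ORDER BY user_id')
        while True:
            rows = await cur.fetch(1000)  # arbitrary batch size for illustration
            if not rows:
                break
            for row in rows:
                print(row)
    await con.close()


asyncio.get_event_loop().run_until_complete(run())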
Neither the iterable cursor (with an explicit prefetch) nor the regular cursor worked. The following psycopg2 code using a "named cursor" (i.e. a server-side cursor) works properly:
import datetime

import psycopg2
from psycopg2.extras import RealDictCursor

cursor_name = datetime.datetime.now().isoformat()
dsn = 'postgresql://...:...@.../...'

conn = psycopg2.connect(dsn, cursor_factory=RealDictCursor)
with conn.cursor(cursor_name) as cursor:
    sql = "SELECT * FROM pageviews ORDER BY user_id;"
    cursor.execute(sql)
    for i, row in enumerate(cursor):
        print(i, row)
I need to stream the entire ordered table, perform a streaming groupby, and then run some computation on each grouped object. I'm looking to asyncpg to decrease the I/O time.
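For context, the streaming groupby I have in mind is roughly the sketch below, assuming the cursor iteration above worked. process_group is a hypothetical placeholder for the per-group computation, the grouping key is the user_id column from the ORDER BY, and the prefetch value is arbitrary:

import asyncio

import asyncpg

dsn = 'postgresql://...:...@.../...'  # same placeholder DSN as above


def process_group(user_id, records):
    # Hypothetical placeholder for the per-group computation.
    print(user_id, len(records))


async def run():
    con = await asyncpg.connect(dsn)
    async with con.transaction():
        current_key = None
        group = []
        # Rows arrive ordered by user_id, so consecutive rows with the same
        # key form one group and can be processed as soon as the key changes.
        async for record in con.cursor(
                'SELECT * FROM pageviews ORDER BY user_id', prefetch=1000):
            key = record['user_id']
            if group and key != current_key:
                process_group(current_key, group)
                group = []
            current_key = key
            group.append(record)
        if group:
            process_group(current_key, group)
    await con.close()


asyncio.get_event_loop().run_until_complete(run())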