
cannot stream cursor on very large table #462

@AlJohri

Description

  • asyncpg version: 0.18.3

  • PostgreSQL version: 9.6.8

  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?: AWS Aurora Postgres. Tried
    connecting to both writer and read replica. Did not try local Postgres.

  • Python version: 3.7.3

  • Platform: macOS 10.14.5

  • Do you use pgbouncer?: No

  • Did you install asyncpg with pip?: Yes

  • If you built asyncpg locally, which version of Cython did you use?: N/A

  • Can the issue be reproduced under both asyncio and
    uvloop?: Did not try.

I'm running into an issue with using cursors on a very large Postgres table. The table has approximately 60 million rows, only 8 columns, and each row is very small. I'm trying to use a server-side cursor to stream the results in order, but it completely freezes while establishing the cursor (con.cursor). This is my code:

import asyncio
import asyncpg

dsn = 'postgresql://...:...@.../...'

async def run():
    con = await asyncpg.connect(dsn)
    print(con)
    async with con.transaction():
        # The hang happens here, while the server-side cursor is being established.
        async for record in con.cursor('SELECT * FROM pageviews ORDER BY user_id', prefetch=10, timeout=2):
            print(record)
    await con.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
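
I also tried the regular (non-iterating) cursor interface with explicit fetch() calls. That attempt looked roughly like the sketch below; the batch size of 1000 is illustrative, not necessarily what I used:

async def run_fetch():
    con = await asyncpg.connect(dsn)
    async with con.transaction():
        # Awaiting con.cursor(...) declares a server-side cursor without iterating it.
        cur = await con.cursor('SELECT * FROM pageviews ORDER BY user_id')
        while True:
            # Fetch the next batch; an empty list means the cursor is exhausted.
            rows = await cur.fetch(1000)
            if not rows:
                break
            for record in rows:
                print(record)
    await con.close()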

Neither the iterable cursor (with explicit prefetch) nor the regular cursor worked. The following psycopg2 code using a "named cursor" (server-side cursor) works properly:

import datetime
import psycopg2
from psycopg2.extras import RealDictCursor

cursor_name = datetime.datetime.now().isoformat()

dsn = 'postgresql://...:...@.../...'
conn = psycopg2.connect(dsn, cursor_factory=RealDictCursor)

# Passing a name to cursor() creates a server-side ("named") cursor.
with conn.cursor(cursor_name) as cursor:
    sql = "SELECT * FROM pageviews ORDER BY user_id;"
    cursor.execute(sql)
    for i, row in enumerate(cursor):
        print(i, row)

I need to stream the entire ordered table, perform a streaming groupby, and then run some computation on each grouped object. I'm looking to asyncpg to decrease the I/O time.
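
The downstream processing is conceptually a streaming groupby over the ordered rows, along the lines of the sketch below. Here process_group is a placeholder for the actual per-group computation, user_id is assumed to be the grouping column, and dsn is the connection string from above:

import asyncpg

async def group_by_user(dsn):
    con = await asyncpg.connect(dsn)
    try:
        async with con.transaction():
            current_key, group = None, []
            async for record in con.cursor(
                    'SELECT * FROM pageviews ORDER BY user_id', prefetch=1000):
                # Rows arrive ordered by user_id, so a group is complete
                # as soon as the key changes.
                if group and record['user_id'] != current_key:
                    process_group(current_key, group)  # placeholder
                    group = []
                current_key = record['user_id']
                group.append(record)
            if group:
                process_group(current_key, group)
    finally:
        await con.close()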
