
Degraded performance on large data set manipulation application #800

Open

Description

@titouanfreville

Hello,
I have been using Dependency Injector as a basis for my Python projects for some time now, and I recently ran into an unexpected issue.

I am currently building a data analysis application aimed at analysing large data sets (~3 GB of data, about 20 million rows), and the process takes an unexpectedly long time to run, with much higher resource consumption.

As a baseline, just fetching the data takes ~3 minutes without injecting dependencies, while it is still not finished after 20 minutes when using them.

I am mainly using Singleton providers in my containers, and the base project is set up with the wiring system.
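For reference, the setup follows the usual declarative-container pattern; here is a minimal sketch (the service and provider names are placeholders, not my actual code):

```python
# Minimal sketch of the container setup; DataService and the provider
# name are placeholders, not the actual project code.
from dependency_injector import containers, providers
from dependency_injector.wiring import Provide, inject


class DataService:
    def fetch(self) -> list:
        return []


class Container(containers.DeclarativeContainer):
    # Services are held as Singleton providers.
    data_service = providers.Singleton(DataService)


@inject
def main(data_service: DataService = Provide[Container.data_service]) -> None:
    rows = data_service.fetch()
    print(len(rows))


if __name__ == "__main__":
    container = Container()
    container.wire(modules=[__name__])  # wiring system
    main()
```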

The tests ran on Python 3.12 under the Microsoft dev container mcr.microsoft.com/vscode/devcontainers/python:1-3.12, and on a Windows server also running Python 3.12 (I don't have the exact version at hand, but I can provide it if needed).

I load the data using SQLAlchemy with the pyodbc driver plus the pandas read_sql method.
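The loading step itself looks roughly like this (the connection string and query are placeholders):

```python
# Rough sketch of the data loading; the connection string and query
# are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@my-dsn")  # placeholder DSN

# ~20 million rows, ~3 GB of data
df = pd.read_sql("SELECT * FROM some_large_table", engine)
```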

I cannot provide the dataset I'm using, as it is private to the company I work for.

The application is wrapped behind a Typer CLI using async methods (though the parallelization is not done correctly yet, as I'm new to it 😇).
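The CLI entry point is along these lines (the command name and pipeline body are simplified placeholders):

```python
# Simplified sketch of the Typer CLI wrapping the async entry point;
# the command name and pipeline body are placeholders.
import asyncio

import typer

app = typer.Typer()


async def _run_analysis() -> None:
    ...  # fetch and process the data set here


@app.command()
def analyse() -> None:
    """Run the analysis pipeline."""
    asyncio.run(_run_analysis())


if __name__ == "__main__":
    app()
```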

Any feedback or ideas are welcome, as I don't really see why using DI should impact the code so much in this case.

Thanks for your work and time. <3
