Merge pull request #84 from Photoroom/blefaudeux-patch-1
Update README.md - remove pip reference for now
blefaudeux committed Mar 8, 2025
2 parents 133c3ae + 7bde6b3 commit 8d045dd
Showing 2 changed files with 28 additions and 3 deletions.
@@ -1,4 +1,4 @@
-name: Rust-py
+name: Push to PyPI

on:
  push:
29 changes: 27 additions & 2 deletions README.md
@@ -1,7 +1,7 @@
# datago

[![Rust](https://github.com/Photoroom/datago/actions/workflows/rust.yml/badge.svg)](https://github.com/Photoroom/datago/actions/workflows/rust.yml)
-[![Rust-py](https://github.com/Photoroom/datago/actions/workflows/rust-py.yml/badge.svg)](https://github.com/Photoroom/datago/actions/workflows/rust-py.yml)
+[![Rust-py](https://github.com/Photoroom/datago/actions/workflows/rust-py.yml/badge.svg)](https://github.com/Photoroom/datago/actions/workflows/ci-cd.yml)

A Rust-written data loader which can be used from Python. It is compatible with a [soon-to-be open sourced](https://github.com/Photoroom/dataroom) VectorDB-enabled data stack, which exposes HTTP requests, and with a local filesystem; more front-ends are possible. It is focused on image data at the moment, but could easily be made more generic.

@@ -21,7 +21,8 @@ Depending on the front ends, datago can be rank and world-size aware, in which c

<details> <summary><strong>Use it</strong></summary>

-Using Python 3.11, you can simply install datago with `pip install datago`
+~Using Python 3.11, you can simply install datago with `pip install datago`~
+See https://github.com/Photoroom/datago/issues/83; this needs fixing.

## Use the package from Python

@@ -48,6 +49,30 @@ for _ in range(10):
```

Please note that the image buffers will be passed around as raw pointers; see below.
To test datago while serving local files (jpg, png, ...), the code would look like the following:

```python
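from datago import DatagoClient
import os
import json

# Read images (jpg, png, ...) from a local folder
config = {
    "source_type": "file",
    "source_config": {
        "root_path": "myPath",  # replace with the folder holding your images
    },
    "limit": 200,
    "rank": 0,  # single-process setup: rank 0 out of a world size of 1
    "world_size": 1,
    "samples_buffer_size": 32,
}

# The client takes its configuration as a JSON string
client = DatagoClient(json.dumps(config))

# Pull a few samples
for _ in range(10):
    sample = client.get_sample()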
```
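The README notes that, depending on the front end, datago can be rank and world-size aware, so a multi-process job would typically build one config per process. A minimal sketch, using a hypothetical helper `make_file_source_config` that only reuses the keys from the example above; how datago actually splits files across ranks, and what `limit` counts, are not stated here and are left to the library.

```python
import json


def make_file_source_config(
    root_path: str,
    rank: int = 0,
    world_size: int = 1,
    limit: int = 200,
    samples_buffer_size: int = 32,
) -> str:
    """Hypothetical helper: JSON config for the "file" source, one per process.

    Only the key names come from the README example; their semantics are
    whatever datago defines.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is out of range for world_size {world_size}")
    config = {
        "source_type": "file",
        "source_config": {"root_path": root_path},
        "limit": limit,
        "rank": rank,
        "world_size": world_size,
        "samples_buffer_size": samples_buffer_size,
    }
    return json.dumps(config)


# One config string per process in a hypothetical 2-process job
configs = [make_file_source_config("myPath", rank=r, world_size=2) for r in range(2)]
print(configs[1])
```

Each string could then be passed to `DatagoClient(...)` inside the matching process, as in the example above. Checking `rank` against `world_size` in the helper catches an obviously inconsistent launch before any client is created.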


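The note above says image buffers are passed around as raw pointers, and the section below covers how datago maps them to Python types. As a generic illustration only, this sketch shows how a raw address can be read from Python with the standard `ctypes` module. It assumes the pointer is available as an integer address to a contiguous, row-major, 8-bit-per-channel buffer whose width, height and channel count are known; the helper names (`view_raw_image`, `copy_raw_image`, `pixel`) are hypothetical, and datago's actual fields and conversion helpers may differ.

```python
import ctypes


def view_raw_image(address: int, width: int, height: int, channels: int):
    """Zero-copy ctypes view over width * height * channels uint8 values at `address`.

    Assumes a contiguous, row-major, 8-bit-per-channel buffer. The view is only
    valid while whoever owns that memory keeps it alive.
    """
    size = width * height * channels
    return (ctypes.c_uint8 * size).from_address(address)


def copy_raw_image(address: int, width: int, height: int, channels: int) -> bytes:
    """Copy the same buffer into an independent Python bytes object."""
    return ctypes.string_at(address, width * height * channels)


def pixel(view, width: int, channels: int, row: int, col: int) -> tuple:
    """Read one pixel from a row-major buffer: offset = (row * width + col) * channels."""
    start = (row * width + col) * channels
    return tuple(view[start:start + channels])


# Demo with a stand-in buffer: a 2x2 RGB image holding the values 0..11
owner = (ctypes.c_uint8 * 12)(*range(12))
address = ctypes.addressof(owner)  # plays the role of the exported raw pointer

view = view_raw_image(address, width=2, height=2, channels=3)
snapshot = copy_raw_image(address, width=2, height=2, channels=3)

print(pixel(view, width=2, channels=3, row=1, col=0))  # (6, 7, 8)

owner[0] = 255  # mutate the underlying memory
print(view[0], snapshot[0])  # 255 0: the view sees the change, the copy does not
```

The zero-copy view avoids a copy but must not outlive the memory it points at; `copy_raw_image` trades one copy for independence from the producer.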
## Match the raw exported buffers with typical Python types

