[Bug]: ids and docs added to db, but embeddings only appear after restart of Jupyter kernel #3769
Comments
@jzclever, if I understand your problem correctly, you are trying to read/update data in Chroma from two different processes: a notebook and a Python script (e.g. […]). Just as an experiment, try running Chroma as a server and use […]
Thanks @tazarov, launching a server and then connecting via […] Insofar as the behavior of the […] I only have a single directory that I connect to from both the script and the notebook (./chroma). So instantiating a […] This is corroborated by the fact that script-based writes to the persistent dir do result (most of the time) in the notebook being able to see the newly added […] As for why it works in Colab, I guess the suggestion is that it's because a Colab session is really all happening within a single event loop running in the background?
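The server-based workaround suggested above can be sketched roughly as follows. This is a minimal sketch, not code from this thread: it assumes a Chroma server has been started separately (for example with `chroma run --path ./chroma`), and the host, port, and helper names (`get_shared_client`, `upsert_docs`) are hypothetical choices for illustration. Both the notebook and the script would call the same helper instead of each opening the persistent directory directly.

```python
# Assumption: a Chroma server is already running, started separately with e.g.
#   chroma run --path ./chroma
# The host and port are the usual local defaults, not values taken from this thread.
CHROMA_HOST = "localhost"
CHROMA_PORT = 8000


def get_shared_client(host: str = CHROMA_HOST, port: int = CHROMA_PORT):
    """Return an HTTP client so that every process talks to the same server."""
    import chromadb  # imported lazily so this module loads even without chromadb

    return chromadb.HttpClient(host=host, port=port)


def upsert_docs(collection_name: str, ids: list, documents: list) -> int:
    """Hypothetical helper: upsert documents through the server, return the count."""
    client = get_shared_client()
    collection = client.get_or_create_collection(collection_name)
    collection.upsert(ids=ids, documents=documents)
    return collection.count()
```

With this arrangement only the server process touches `./chroma`, so the notebook and the script no longer hold separate local instances over the same directory.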
Sorry if I caused confusion. What I meant is that users should not expect consistent results if they attempt to access the same persistent dir from two Chroma instances. Here's a diagram to illustrate what each process (assuming both start at the same time) will see:
Colab runs in a single process. Accessing the same persistent dir even if you create a new |
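To make the "two instances, one directory" point concrete, here is a toy model in plain Python. It is not Chroma's implementation: the `ToyStore` class and its design (rows read from SQLite on every call, vectors cached in memory once at start-up) are assumptions chosen only to show how a long-lived instance can see new ids and documents while missing their embeddings until it is recreated.

```python
import json
import os
import sqlite3
import tempfile


class ToyStore:
    """Toy model: rows are read from SQLite, vectors from a per-instance cache."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, doc TEXT, vec TEXT)"
        )
        self.db.commit()
        # The vector cache is loaded once at start-up and never refreshed.
        rows = self.db.execute("SELECT id, vec FROM docs").fetchall()
        self.vectors = {row_id: json.loads(vec) for row_id, vec in rows}

    def upsert(self, row_id, doc, vec):
        self.db.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
            (row_id, doc, json.dumps(vec)),
        )
        self.db.commit()
        self.vectors[row_id] = vec  # only this instance's cache is updated

    def get(self):
        rows = self.db.execute("SELECT id, doc FROM docs ORDER BY id").fetchall()
        return [(row_id, doc, self.vectors.get(row_id)) for row_id, doc in rows]


path = os.path.join(tempfile.mkdtemp(), "toy.db")
notebook = ToyStore(path)  # long-lived instance, standing in for the notebook kernel
script = ToyStore(path)    # second instance, standing in for the script
script.upsert("a", "hello", [0.1, 0.2])

stale = notebook.get()        # id and document are visible, the vector is not
fresh = ToyStore(path).get()  # a newly created instance reloads the cache

print(stale)  # [('a', 'hello', None)]
print(fresh)  # [('a', 'hello', [0.1, 0.2])]
```

In this toy, recreating the instance plays the role of restarting the kernel: the data was on disk all along, but the older instance never reloaded its in-memory view.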
What happened?
Running with Python 3.13.0 on an M3 MacBook.
Execute cell 1 of notebook `_t1.ipynb` running in VS Code:
From a separate terminal, execute the script `_t2.py`
Confirm the expected output in the console:
Run cell 2 of `_t1.ipynb`:
Confirm that the `ids` and `documents` are there, but `embeddings` are not
Not only is this buggy, but it is also inconsistent.
On a separate trial, I managed to (somehow) get one embedding properly stored, even though `t.get()` was showing 5 total documents (so 4 were still missing).
The really baffling part came with `t.query()` warning that I only had 4 existing elements, despite the request for `n_results=5`.
This behavior occurs for image-based collections as well.
The behavior also occurs with `collection.add` (it is not specific to `collection.upsert`, as with my example).
The behavior also occurs if I create the collection for the first time in the notebook cell, as opposed to in the script (as with my current example).
If I restart the kernel and run the notebook cells again, I get the expected output:
I have also confirmed that this issue does not exist in Google Colab.
I can run the first cell (to instantiate the persistent client), then write my `_t2.py` to the local Colab file system and call it with `%run _t2.py`, and then verify the existence of the newly inserted content (with embeddings) by running the second cell of the still-active notebook.
Versions
MacBook M3
macOS 15.3.1
Python 3.13.0
VS Code 1.97.0
VS Code Jupyter Extension 2025.1.0
Chroma 0.6.3
Relevant log output