I run the following code:
    from quivr_core import Brain

    brain = Brain.from_files(
        name="my smart brain",
        file_paths=["/root/workplace/try_use_quivr/qa_file/txtQA/Bible.pdf"],
    )
It returns the error: ValueError: can't initialize brain without documents
Attached files: 西游记.pdf, Bible.pdf
The PDF files are small and their formatting is simple.
The issue label may not be appropriate; if so, feel free to change it.
CORE-261 [Bug]: Brain.from_files Return error "ValueError: can't initialize brain without documents"(For pdf)
I am able to reproduce this issue on a fresh install on Python 3.11.6.
Tried passing in relative / absolute file paths.
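To rule out a path problem before looking at the parser, a small check along these lines can help. This is only a minimal sketch: `check_pdf_paths` is a hypothetical helper written for illustration, not part of quivr_core, and it only verifies that each path points to an existing, non-empty file.

```python
from pathlib import Path
from typing import Iterable, List


def check_pdf_paths(file_paths: Iterable[str]) -> List[str]:
    """Return one human-readable problem per path that cannot be a usable input file.

    Hypothetical helper for illustration; it does not call quivr_core.
    """
    problems = []
    for raw in file_paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            # Covers both missing paths and paths that point to a directory.
            problems.append(f"{raw}: not an existing file")
        elif path.stat().st_size == 0:
            problems.append(f"{raw}: file is empty")
    return problems
```

If the returned list is empty for the paths passed to `Brain.from_files`, the files themselves are reachable and the cause is more likely in the parsing step.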
I've encountered the same issue while trying to load a bunch of PDFs. Long story short: some of MegaParse's dependencies are not installed.
A check script:
    from megaparse.core.megaparse import MegaParse
    from langchain_openai import ChatOpenAI
    from megaparse.core.parser.unstructured_parser import UnstructuredParser

    parser = UnstructuredParser()
    megaparse = MegaParse(parser)
    response = megaparse.load("./test.pdf")
    print(response)
And if you get errors like:
    Resource punkt_tab not found.
    Please use the NLTK Downloader to obtain the resource:

    >>> import nltk
    >>> nltk.download('punkt_tab')

    For more information see: https://www.nltk.org/data.html

    Attempted to load tokenizers/punkt_tab/english/

    Searched in:
      - '/root/nltk_data'
      - '/usr/local/nltk_data'
      - '/usr/local/share/nltk_data'
      - '/usr/local/lib/nltk_data'
      - '/usr/share/nltk_data'
      - '/usr/local/share/nltk_data'
      - '/usr/lib/nltk_data'
      - '/usr/local/lib/nltk_data'
      - '/usr/local/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache'
    **********************************************************************
Or the same error for averaged_perceptron_tagger_eng, then you have to install those resources manually:
    $ python
    >>> import nltk
    >>> nltk.download('punkt_tab')
    >>> nltk.download('averaged_perceptron_tagger_eng')
Automatic download is disabled because of security issues.
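The manual steps above can also be scripted so they run once before loading any PDFs. The following is a sketch under stated assumptions, not an official quivr_core or MegaParse fix: `missing_resources` and `ensure_nltk_resources` are hypothetical helper names, the `tokenizers/punkt_tab` path is taken from the error message above, and the `taggers/averaged_perceptron_tagger_eng` path is an assumption about where NLTK stores that tagger.

```python
from typing import Callable, Dict, List

# Resource name -> location inside nltk_data.
# The tagger path is an assumption; the tokenizer path comes from the error message.
REQUIRED: Dict[str, str] = {
    "punkt_tab": "tokenizers/punkt_tab",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
}


def missing_resources(required: Dict[str, str], find: Callable[[str], object]) -> List[str]:
    """Return the names of resources for which `find` raises LookupError."""
    missing = []
    for name, path in required.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    return missing


def ensure_nltk_resources() -> List[str]:
    """Download whatever is missing and return the names that are still missing afterwards."""
    import nltk  # imported lazily so the helper above works without NLTK installed

    for name in missing_resources(REQUIRED, nltk.data.find):
        nltk.download(name)
    return missing_resources(REQUIRED, nltk.data.find)
```

Calling `ensure_nltk_resources()` before `Brain.from_files(...)` should return an empty list once both resources are in place; a non-empty list means the download did not succeed (for example, no network access).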
Hope this helps.