A repository dedicated to the development of Python programs dealing with data, both big and small. This is also where different databases will be tested to see which is more effective/efficient.
A Python program, run from the command line, built to analyze data mined from Twitter.
You will need to install tweepy to be able to run both versions.
If you wish to use the RethinkDB version, you must install RethinkDB.
- Follow all of the instructions!
- That includes setting up your RethinkDB server.
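Before running either version, it can help to confirm that the required packages are importable. The snippet below is a minimal sketch and not part of the project: the helper name missing_modules is hypothetical, and it assumes the RethinkDB version relies on a Python driver importable as rethinkdb. The RethinkDB server itself still has to be installed and started separately.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # tweepy is required by both versions; "rethinkdb" is an assumed driver
    # name that only matters if you use the RethinkDB version.
    required = ["tweepy", "rethinkdb"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If a module is reported as missing, install it with pip before continuing.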
How to use: Applies to both versions
- Configure TweetCollector.py to gather the data you want.
  - Type in the name of your database and table.
  - Type in the hashtags you wish to track (SQLite version only).
  - If using the CSV method, insert the name of your .csv file (in the RethinkDB version, the CSV reader is located in TweetSifter.py).
- Run TweetMiner.py, but be sure to run only the methods you need.
  - The exception is the RethinkDB version, where TweetMiner.py is set up to let you choose what you want.
  - Also, if using the RethinkDB version, you will be prompted to type in the keywords you wish to track and the maximum number of tweets you would like to collect.
- Import TweetWrite.py into a new script and create an instance of DataFilter() to analyze and save the data gathered. (Never run TweetWrite.py and TweetMiner.py at the same time.)
  - Once an instance of DataFilter() is set, you can call the methods needed to analyze your data and store the results.
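The collect, store, and analyze steps above can be illustrated with a small self-contained sketch. This is not the project's code: it does not use TweetCollector.py, TweetMiner.py, or DataFilter(), whose methods are not described here. The table name, hashtag list, function names, and sample texts are all made-up placeholders, and only the standard-library sqlite3 module is used.

```python
import sqlite3
from collections import Counter

# Placeholder values standing in for what you would configure yourself;
# none of these names come from the project.
TABLE_NAME = "tweets"
TRACKED_HASHTAGS = ["#python", "#data"]


def store_tweets(conn, tweets):
    """Create the table if needed and insert the given tweet texts."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {TABLE_NAME} (text TEXT)")
    conn.executemany(
        f"INSERT INTO {TABLE_NAME} (text) VALUES (?)",
        [(t,) for t in tweets],
    )
    conn.commit()


def count_hashtags(conn, hashtags):
    """Count how many stored tweets contain each hashtag as a whole word."""
    counts = Counter()
    for (text,) in conn.execute(f"SELECT text FROM {TABLE_NAME}"):
        words = text.lower().split()
        for tag in hashtags:
            if tag.lower() in words:
                counts[tag] += 1
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    sample = ["Learning #python today", "Big #data, small #data", "No tags here"]
    store_tweets(conn, sample)
    print(dict(count_hashtags(conn, TRACKED_HASHTAGS)))
```

Keeping collection and analysis in separate functions mirrors the advice above to not run the collecting and analyzing scripts at the same time.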
If you would like to know more about the TweetMiner project, please refer to the wiki pages. If you wish to contribute, please feel free to contact me.
To run the tests for the first time, you must first install the module in editable mode. This only needs to be done once:
pip install -e .
Once this is complete, you can run the tests with pytest:
python -m pytest tests
Test coverage uses the pytest-cov module and can be measured with the following command:
python -m pytest --cov-report term-missing --cov=data_py