Determine public opinion on the net neutrality issue via sourcing and sentiment analysis of FCC comments.
Based on, plus more from Ragtag volunteers
Uses Elasticsearch to store and query comment data fetched from the FCC's public API
- from regex patterns
- analysis.sentiment_sig_terms_ordered from significant terms
- query by source
AWS Lambda
Make sure you have Python 3 installed
Set up a local Elasticsearch server:
Set up fcc-comment-analysis
$ cd server
$ pip install -e .
$ python test
Create the index with mappings:
$ fcc create
Fetch and index some data from the FCC API:
$ fcc index --endpoint=http://localhost:9200/ -g 2017-06-01
Analyze data and add the results to the analysis section of documents:
$ fcc analyze --endpoint=http://localhost:9200/ --limit 40000
Set up a local Kibana server:
Play in Kibana:
- go to http://localhost:5601
- go to Management / Configure an index pattern
- Index name or pattern:
- Index contains time-based events
- Time-field name:
- Create
- Index name or pattern:
Set up cloud-hosted Elasticsearch:
- read-only user for queries
- read-write user for ingest and analyze
- get ES_URL like https://user:password@hostname:port
Load current dataset into index:
- create index:
fcc create --endpoint=$ES_URL
- fetch comments from FCC and add to index:
fcc index --endpoint=$ES_URL -g 2017-05-01
(restart as needed if/when API times out)
- run static analyzers, 100k at a time:
fcc analyze --endpoint=$ES_URL --limit 100000
(repeat until all docs have been analyzed; count the docs that still lack an analysis section with:)
curl "$ES_URL/_count?pretty" -H 'Content-Type: application/json' -d'{"query":{"bool":{"must_not":{"exists":{"field":"analysis"}}}}}'
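The "repeat until all docs are analyzed" step can be scripted. The following is a minimal sketch, not part of this project: it sends the same `_count` query as the curl command and re-runs `fcc analyze` with the flags shown above until nothing is left. The function names are hypothetical, and it assumes `fcc analyze` exits with a non-zero status on failure.

```python
import base64
import json
import subprocess
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit

# Same query body as the curl check: documents without an "analysis" field.
UNANALYZED_QUERY = {"query": {"bool": {"must_not": {"exists": {"field": "analysis"}}}}}


def build_count_request(es_url):
    """Build a _count request from a URL like https://user:password@hostname:port."""
    parts = urlsplit(es_url)
    netloc = parts.hostname or ""
    if parts.port:
        netloc = f"{netloc}:{parts.port}"
    headers = {"Content-Type": "application/json"}
    # urllib does not send credentials embedded in the URL, so move them to a header.
    if parts.username:
        creds = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
        headers["Authorization"] = "Basic " + base64.b64encode(creds.encode()).decode()
    url = urlunsplit((parts.scheme, netloc, parts.path.rstrip("/") + "/_count", "", ""))
    body = json.dumps(UNANALYZED_QUERY).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def count_unanalyzed(es_url):
    """Return how many documents still lack an analysis section."""
    with urllib.request.urlopen(build_count_request(es_url)) as resp:
        return json.load(resp)["count"]


def analyze_until_done(es_url, batch=100000, count=count_unanalyzed, run=subprocess.run):
    """Re-run `fcc analyze` until no unanalyzed documents remain; return the number of runs."""
    runs = 0
    remaining = count(es_url)
    while remaining > 0:
        run(["fcc", "analyze", f"--endpoint={es_url}", "--limit", str(batch)], check=True)
        runs += 1
        previous, remaining = remaining, count(es_url)
        if remaining >= previous:
            # Stop instead of looping forever if a run analyzed nothing.
            raise RuntimeError(f"no progress: {remaining} documents still unanalyzed")
    return runs
```

Calling `analyze_until_done(os.environ["ES_URL"])` (after `import os`) would run the batches back to back. The `count` and `run` parameters exist only so the loop can be exercised without a live cluster.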
Create AWS Lambda to refresh data:
Create AWS Lambda to proxy Elasticsearch queries:
cd server/fcc_analysis
zip -r ../ . --exclude 'experiments/*' --exclude '*.csv' --exclude '*.txt'
cd $VIRTUAL_ENV/lib/python3.6/site-packages
zip -r path/to/server/ .
- upload to AWS; set handler to
Create AWS API Gateway to proxy Lambda function
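For orientation, a Lambda that proxies Elasticsearch queries behind API Gateway could look roughly like the sketch below. This is an illustration, not the handler shipped in this repository: the environment variable names `ES_URL` and `ES_INDEX` and the helper names are assumptions, and it assumes API Gateway's Lambda proxy integration, which passes the request body as a string in `event["body"]` and expects `statusCode`, `headers`, and `body` in the result.

```python
import base64
import json
import os
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def build_search_request(es_url, index, query):
    """Build a POST to <es_url>/<index>/_search, moving URL credentials into a header."""
    parts = urlsplit(es_url)
    netloc = parts.hostname or ""
    if parts.port:
        netloc = f"{netloc}:{parts.port}"
    headers = {"Content-Type": "application/json"}
    if parts.username:
        creds = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
        headers["Authorization"] = "Basic " + base64.b64encode(creds.encode()).decode()
    url = urlunsplit((parts.scheme, netloc, f"/{index}/_search", "", ""))
    body = json.dumps(query).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def api_response(status, payload):
    """Shape a result the way the Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    # Reject bodies that are not JSON before contacting Elasticsearch.
    try:
        query = json.loads(event.get("body") or "{}")
    except ValueError:
        return api_response(400, {"error": "request body must be JSON"})
    # ES_URL and ES_INDEX are hypothetical names; use the read-only user here.
    request = build_search_request(os.environ["ES_URL"], os.environ["ES_INDEX"], query)
    with urllib.request.urlopen(request) as resp:
        return api_response(200, json.load(resp))
```

Pointing the function at the read-only user keeps the public endpoint from being able to modify the index.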