This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit cb954e1

committed Jan 30, 2018
initial commit
1 parent c5ec002 commit cb954e1

File tree

2,446 files changed

+728897
-0
lines changed


‎README.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
# ElasticIntel

Serverless, low-cost threat intel aggregation for enterprise or personal use, backed by Elasticsearch.


## About
An alternative to expensive threat intel aggregation platforms, which ingest the same data feeds you could get for free.

ElasticIntel is designed to provide a central, scalable, and easily queryable repository for
threat intelligence of all types.

It uses Amazon services (AWS Lambda, Elasticsearch, S3, SNS) to keep support needs minimal while maintaining
scalability, resilience, and performance.

## Disclaimer

**Documentation for this project is currently lacking due to time constraints. This is actively
being fixed and should be much more verbose in a few days. Please check back
soon if you're not ready to jump in blind :)**

#### Features
* Serverless - No maintenance required
* Scalable - All services scale via AWS
* High-performance API - The API can be used to run extremely high-volume queries
* Flexible - Feeds can be added via simple JSON feed configuration
* Extensible - Written in Python and extended by new modules
* Cost-effective - Pay only for the backend services; don't worry about API limits
* Automated deployment - The platform can be deployed with a single command
* Works "out of the box" - Comes pre-configured with 30+ open-source intel feeds

### Why ElasticIntel

ElasticIntel is the answer to a frustration that arose while evaluating various paid threat intel products and feeds.
After reviewing the data from several of these services, I found that 90% of the data they returned came
from publicly (and freely) available sources, simply aggregated into one place.

Even more frustrating was the fact that nearly all of them wanted to charge insane amounts for API access to this same data.
That access was limited by volume, which made it nearly impossible to query the data in any significant volume without
paying even more.

### Enrichments

* Whois enrichment
* Shodan

## Architecture

1. Feed scheduler Lambda - The feed scheduler Lambda runs once an hour, like a cron job. It downloads
   the configurations for all feeds, checks their scheduled download times, and puts a download job
   into an SNS topic whenever a feed needs to be downloaded.

2. Ingest feed Lambda - The ingest Lambda is triggered by messages arriving on an SNS topic. When a message arrives,
   the ingest Lambda reads the message, parses out the information about the intel feed, and downloads the feed itself. Once
   downloaded, the ingest Lambda stores a copy of the feed in S3 and then parses the data in the feed. Once
   the data is parsed, the ingest Lambda puts the data into the intel index in Elasticsearch for easy querying.

* Intel objects are defined as a set of values (JSON).
* Intel feed objects define the feed itself (URL, type (XML, CSV, or JSON), schedule).
* Intel feeds can be added simply by defining a new feed configuration in the feeds
  directory.
* For API-based intel feeds, modules can be added as Python scripts and
  imported into the main feed manager.
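
To make the feed configuration idea concrete, here is a minimal sketch of what a feed definition and a loader for it might look like. The key names (`name`, `url`, `type`, `schedule`), the example URL, and the meaning of `schedule` (hour of day, UTC) are assumptions based on the fields listed above, not the project's actual schema.

```python
import json

# Hypothetical feed configuration. The keys mirror the fields described above
# (url, type, schedule); the exact names and values are assumptions.
EXAMPLE_FEED_JSON = """
{
  "name": "example-blocklist",
  "url": "https://example.com/blocklist.csv",
  "type": "csv",
  "schedule": 6
}
"""

SUPPORTED_TYPES = {"xml", "csv", "json"}
REQUIRED_KEYS = {"url", "type", "schedule"}


def load_feed_config(raw):
    """Parse a feed configuration and check the fields described above."""
    feed = json.loads(raw)
    missing = REQUIRED_KEYS - set(feed)
    if missing:
        raise ValueError("feed config is missing: " + ", ".join(sorted(missing)))
    if feed["type"] not in SUPPORTED_TYPES:
        raise ValueError("unsupported feed type: " + feed["type"])
    return feed


feed = load_feed_config(EXAMPLE_FEED_JSON)
```

Validating the configuration up front means a malformed feed file fails when it is loaded rather than partway through a download.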

### Feed ingestion is done via a series of Lambdas
* Feed scheduler:
    * The scheduler Lambda runs once an hour, reads the various config files, and determines whether
      any feeds need to be pulled in.
    * If a feed needs refreshing, the scheduler Lambda launches a new Lambda
      to pull down that feed.
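
The scheduling step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: each feed is assumed to carry a `schedule` key holding the UTC hour at which it should be downloaded, and `publish` is a hypothetical callable standing in for whatever hands the job to the ingest Lambda (for example, a thin wrapper around an SNS publish call).

```python
import json
from datetime import datetime, timezone


def feeds_due(feeds, now):
    """Return the feeds whose scheduled hour matches the current hour."""
    return [feed for feed in feeds if feed["schedule"] == now.hour]


def build_job_message(feed, index):
    """Serialize a download job for one feed as a JSON message body."""
    return json.dumps({"feed": feed, "index": index})


def queue_due_feeds(feeds, now, publish, index="intel"):
    """Queue one download job per due feed and return how many were queued."""
    due = feeds_due(feeds, now)
    for feed in due:
        publish(build_job_message(feed, index))
    return len(due)


# Example run with an in-memory list standing in for the message topic.
sent = []
example_feeds = [
    {"name": "feed-a", "schedule": 6},
    {"name": "feed-b", "schedule": 7},
]
queued = queue_due_feeds(
    example_feeds,
    datetime(2018, 1, 30, 6, 15, tzinfo=timezone.utc),
    sent.append,
)
```

Passing `now` and `publish` in as arguments keeps the decision logic free of AWS calls, so it can be exercised locally before being wired into a Lambda handler.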

### Feed ingestion

Feeds are ingested through the ingestfeed Lambda function.
This function is passed an event containing a feed dictionary, as well as the ES index where the indicators
from the feed will be stored.

The function then reads the feed dictionary, downloads the appropriate data from the feed URL, saves that data to
an S3 bucket as a timestamped file, parses that
data into intel objects, and finally indexes the feed data in the specified ES index.
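
As a rough illustration of the middle steps (naming the timestamped file and parsing the download into intel objects), here is a small sketch for a CSV feed. The key layout, the `.raw` suffix, and the intel object fields (`indicator`, `source`, `seen_at`) are assumptions made for the example; the download, S3 upload, and Elasticsearch indexing are left out.

```python
import csv
import io
from datetime import datetime, timezone


def timestamped_key(feed_name, now):
    """Build an object key that records when the feed was downloaded."""
    return "{}/{}.raw".format(feed_name, now.strftime("%Y-%m-%dT%H-%M-%SZ"))


def parse_csv_feed(raw_text, feed_name, now):
    """Turn each non-empty, non-comment CSV row into an intel object (a dict)."""
    intel = []
    for row in csv.reader(io.StringIO(raw_text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        intel.append({
            "indicator": row[0].strip(),
            "source": feed_name,
            "seen_at": now.isoformat(),
        })
    return intel


downloaded_at = datetime(2018, 1, 30, 12, 0, 0, tzinfo=timezone.utc)
raw = "# example feed\n192.0.2.1,malware\n\n192.0.2.2,botnet\n"
key = timestamped_key("example-feed", downloaded_at)
objects = parse_csv_feed(raw, "example-feed", downloaded_at)
```

Each resulting dictionary could then be indexed as one document in the ES index named in the event.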


### Elasticsearch

It is important to note that intel is not unique. Each feed is queried daily, and some intel
may appear in a feed across multiple days. This is by design, to allow a historical view of indicators.

However, this may not be the behavior you expect when querying the data, so it is
important to realize that the number of times an indicator shows up is not necessarily indicative
of a high threat score.
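
One way to work with repeated sightings on the client side is to collapse query results into a single record per indicator before interpreting them. The sketch below assumes each hit is a dictionary with `indicator`, `source`, and `seen_at` fields; those field names are illustrative and may not match the actual index mapping.

```python
from collections import defaultdict


def summarize_sightings(hits):
    """Collapse repeated sightings into one summary record per indicator."""
    by_indicator = defaultdict(list)
    for hit in hits:
        by_indicator[hit["indicator"]].append(hit)

    summary = {}
    for indicator, sightings in by_indicator.items():
        dates = sorted(s["seen_at"] for s in sightings)
        summary[indicator] = {
            "first_seen": dates[0],
            "last_seen": dates[-1],
            "sightings": len(sightings),
            "sources": sorted({s["source"] for s in sightings}),
        }
    return summary


example_hits = [
    {"indicator": "192.0.2.1", "source": "feed-a", "seen_at": "2018-01-28"},
    {"indicator": "192.0.2.1", "source": "feed-a", "seen_at": "2018-01-29"},
    {"indicator": "192.0.2.2", "source": "feed-b", "seen_at": "2018-01-29"},
]
history = summarize_sightings(example_hits)
```

Here `192.0.2.1` has two sightings only because the same feed listed it on two consecutive days, which is the situation described above: a higher count reflects history, not severity.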

##### Setup


## Requirements note

If pip3 fails on the crypto install, make sure libssl-dev is installed (sudo apt install libssl-dev).


## Issues

* Elasticsearch, while extremely powerful in its query language, has a very high barrier to entry. For actively slicing and dicing
  data, piping or copying data to Splunk may yield more malleable data.

* Queries are best written in the developer tools section of Kibana.

## Recommended Reading
AWS Elasticsearch Service: http://docs.aws.amazon.com/elasticsearch-service/latest

#### Understanding Elasticsearch upgrades
AWS Elasticsearch Service takes a large amount of the hassle out of running your own Elasticsearch cluster.
However, it is important to note that, despite this abstraction, the variables that
need to be managed by the end user are still important decisions:
* Dedicated masters
    * Dedicated masters are used to control all the operational chores of running
      an Elasticsearch cluster. They do not hold data, but manage indices, shards, etc. The project ships with
      some relatively sane defaults, which should be plenty to get you off the ground
      and collecting intel. However, as usage and data size grow, it is important to make sure the dedicated master size and the count of
      dedicated masters are also increased. This is a manual process and must be managed by changing variables in
      the Terraform scripts. Further reading: http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html

* Upgrading or modifying the Elasticsearch Service domain
    * When modifying an Elasticsearch domain, a new cluster is spun up, data is copied over, and
      then the old cluster is shut down. In doing this, you will incur charges for running both clusters
      for an hour. After the data is copied over, the old cluster is shut down
      and you are charged only for the newly running cluster.

* Multi-zone awareness
    * The default configuration ships with this disabled. For production, it is recommended that this be turned on.
    * Note: enabling multi-zone awareness requires an even number of instances and master nodes.


* Migrating/upgrading to a new version: see http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-version-migration.html

* Sizing Elasticsearch domains: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html

‎docs/images/Lambdabot 1.jpg

576 KB