Commit 8104eed

eventsapi: Events APIs for Gluster

Change-Id: I93a4362e99ce590df07feb2f092d6dd9222bfc31
Signed-off-by: Aravinda VK <[email protected]>
Reviewed-on: https://review.gluster.org/13115
Tested-by: Atin Mukherjee <[email protected]>
Reviewed-by: Atin Mukherjee <[email protected]>

aravindavk authored, Atin Mukherjee committed
1 parent f7276c1 commit 8104eed

1 file changed: under_review/eventsapi.md (+349, -0 lines)
Feature
-------
Events APIs for Gluster

Summary
-------
The eventing framework emits a notification whenever the Gluster
Cluster state changes.

Owners
------
Aravinda VK <[email protected]>

Detailed Description
--------------------
Imagine a Gluster monitoring system that displays the list of volumes
and their state. To show real-time status, the monitoring app needs to
query Gluster at regular intervals to check volume status, new volumes,
etc. If the polling interval is 5 seconds, the monitoring app has to
run the `gluster volume info` command ~17000 times a day!

How about asking Gluster to send a notification whenever something
changes?

How To Test
-----------
Start eventsdash.py using `python $SRC/events/tools/eventsdash.py`.
Register this dashboard URL as a webhook using the `webhook-add`
sub-command, after first verifying it with `webhook-test`:

    gluster-eventsapi webhook-test http://<IP>:9000/listen

where IP is the hostname or IP of the node where eventsdash.py is
running. This IP should be accessible from all Gluster nodes.

If the Webhook Test is OK from all nodes, then

    gluster-eventsapi webhook-add http://<IP>:9000/listen

eventsdash.py will show the Gluster Events.

User Experience
---------------
Run the following command to start/stop the Events API server on all
peers, which collects the notifications from any Gluster daemon and
emits them to the configured clients:

    gluster-eventsapi start|stop|restart|reload

The status of running services can be checked using

    gluster-eventsapi status

Gluster Events can be consumed using the Websocket API or by using
Webhooks.

#### Consuming events using Webhooks:

An events listener is an HTTP(S) server which listens to events emitted
by Gluster. Create an HTTP server that accepts POST requests and
register its URL using

    gluster-eventsapi webhook-add <URL> [--bearer-token <TOKEN>]

For example, if the HTTP server is running at
`http://192.168.122.188:9000`, then add that URL using

    gluster-eventsapi webhook-add http://192.168.122.188:9000

If it expects a token, specify it using `--bearer-token` or `-t`.
We can also test whether all peer nodes can send messages to the
Webhook using

    gluster-eventsapi webhook-test <URL> [--bearer-token <TOKEN>]

Configurations can be viewed/updated using

    gluster-eventsapi config-get [--name]
    gluster-eventsapi config-set <NAME> <VALUE>
    gluster-eventsapi config-reset <NAME|all>

If any peer node was down during config-set/reset or webhook
modifications, run the sync command from a good node when the peer
node comes back. Automatic update is not yet implemented.

    gluster-eventsapi sync

The eventing client can be outside of the Cluster; it can even run on
Windows. The only requirement is that the client URL be accessible by
all peer nodes (or tools like [ngrok](https://ngrok.com) can be used).
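Since the webhook is just an HTTP endpoint that accepts POST requests, a minimal listener can be sketched in Python. This is a hedged illustration: the port 9000 follows the eventsdash.py example above, and the JSON-body handling is an assumption, not part of the spec.

```python
# Minimal webhook listener sketch: accept POST requests and print the
# event payload. Port and payload format are illustrative assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        try:
            event = json.loads(body)  # assume events arrive as JSON
        except ValueError:
            event = body              # fall back to the raw payload
        print("Received Event:", event)
        self.send_response(200)       # acknowledge receipt
        self.end_headers()

def run(port=9000):
    # Blocking; listen on all interfaces so all peer nodes can reach us
    HTTPServer(("0.0.0.0", port), EventHandler).serve_forever()
```

After starting such a listener, `gluster-eventsapi webhook-test http://<IP>:9000/listen` should report OK from every node that can reach it.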

#### Consuming events using Websocket API:

The Websocket API will be part of the Gluster REST Server; all the
REST-related configuration is applicable here.

    ws://hostname/v1/events

> WebSocket is a protocol providing full-duplex communication channels
> over a single TCP connection. The WebSocket protocol was
> standardized by the IETF as RFC 6455 in 2011, and the WebSocket API
> in Web IDL is being standardized by the W3C. (Ref:
> [Wikipedia](https://en.wikipedia.org/wiki/WebSocket))

`hostname` can be any one of the nodes in the Cluster. If that node
goes down, the application can connect to any other node in the
Cluster and start listening to the events.

    from websocket import create_connection
    import time

    # Connect to the events endpoint of any node in the Cluster
    ws = create_connection("ws://hostname/v1/events")
    try:
        while True:
            ev = ws.recv()
            print("Received Event '{0}'".format(ev))
            time.sleep(1)
    finally:
        ws.close()

Register the application using the `gluster-rest app-add` command.

The design of the Gluster REST Server is discussed
[here](http://review.gluster.org/13214).

Applications can persist the peers list (`GET /v1/peers`) of the
Cluster. If the connected node goes down, the application can choose
another node to connect to and continue listening to the events.
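The failover idea above can be sketched as follows. This is a hedged illustration: it assumes the persisted peers list is a plain list of hostnames, and the connect function is injected so the logic is shown independently of any particular websocket library.

```python
# Failover sketch: try each persisted peer in turn until one accepts a
# connection. `connect` is an injected callable (e.g. wrapping
# websocket.create_connection); this keeps the logic library-agnostic.
def connect_with_failover(peers, connect):
    """Return the first successful connection from the peers list."""
    last_err = None
    for host in peers:
        try:
            return connect("ws://{0}/v1/events".format(host))
        except OSError as err:
            last_err = err  # node is down; try the next peer
    raise ConnectionError("no peer reachable") from last_err
```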

Design
------

#### Event Types
Gluster events can be categorized into different types, for example:

1. **User driven events** - When the user executes any CLI command
   which changes the state of the Cluster, Volume or Bricks. For
   example, Volume Create, Peer Attach, Volume Start, Snapshot Create,
   Geo-rep Create, Geo-rep Start etc.
2. **Local events** - Events which can be detected locally and are
   related to local resources. For example, a Geo-rep worker going
   Faulty, a Brick process going down etc.
3. **Cluster events**, or events related to multiple nodes - Events
   which are not related to a single node, but are Cluster-level
   information. Notifications may be sent out from multiple nodes
   without filtering.

Most of the user driven events are also Cluster events, but they are
handled differently compared to other Cluster events. Notifications
for user driven events will be sent only from the node where the
command is run.

**Note:** All planned Gluster Events are listed at the end.

#### Recording the Events
A new API, `gf_event`, will be introduced to send messages to the
socket /var/run/gluster/events.sock. This new API will be available
for C, Python and Go.

It can be called from any component of Gluster. gsyncd will use the
Python lib for sending messages.

Example format of a message (the format may change during
implementation):

    Volume.Create=gv1
    Volume.Start=gv1
    Volume.Set=gv1,changelog.changelog,on
    Georep.State.Faulty=...

Pseudo code:

    gf_event(key, format, values..) ->
        if EVENTING_ENABLED {
            connect(EVENT_SOCKET)  # AF_UNIX
            format_and_send(key, value)
        }

Example usage in Volume Create (`cli/src/cli-cmd-volume.c` file):

    #if (USE_EVENTS)
            /* On successful Volume creation */
            if (ret == 0) {
                    gf_event (EVENT_VOLUME_CREATE, "name=%s", volname);
            }
    #endif

#### Agent - glustereventsd
The Agent listens to `events.sock` for any new events from Gluster
processes, gathers the additional information required, and broadcasts
to all peer nodes by sending an HTTP POST to the Gluster REST
servers (/v1/listen). The REST server will send that message to all
connected applications.
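The broadcast step can be illustrated by how such a POST might be constructed. This is a sketch: the JSON payload shape and the Bearer-token header are assumptions based on the `--bearer-token` webhook option, not a confirmed wire format.

```python
# Sketch of building the HTTP POST an Agent might send to a peer's
# /v1/listen endpoint or to a webhook. Payload shape is an assumption.
import json
import urllib.request

def build_event_request(url, event, token=None):
    """Build an HTTP POST request carrying one event as JSON."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Matches the --bearer-token option of webhook-add (assumption)
        headers["Authorization"] = "Bearer " + token
    data = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers,
                                  method="POST")
```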

#### Architecture

In each node of the Gluster Cluster,

    glusterd   glusterfsd   gsyncd      cli       ...
        |          |           |          |         |
        |          |           |          |         |
    (gf_event) (gf_event) (gf_event) (gf_event) (gf_event)
        |          |           |          |         |
        v          v           v          v         v
    +------------------------------------------------------------+
    |                        events.sock                         |
    |                    (format + broadcast)                    |
    |                                                            |
    +------------------------------------------------------------+

`glustereventsd` will run on all the nodes of the Cluster and sends
events independently (except in the Websocket use case).

Cluster view (Websockets API):

    +-------------------+            +-------------------+
    |                   |            |                   |
    |  Node 1           |            |  Node 2           |
    |  REST server<----------+------------>REST server   |
    |      |            |    |       |      |            |
    |    Agent          |    |       |    Agent          |
    |                   |    |       |                   |
    +-------------------+    |       +-------------------+
                             |
    +-------------------+    |       +-------------------+
    |                   |    |       |                   |
    |  Node 3           |    |       |  Node 4           |
    |  REST server<-+   |    |  +--------->REST server   |
    |      |        |   |    |  |    |      |            |
    |    Agent -----+--------+--+    |    Agent          |
    |                   |            |                   |
    +-------------------+            +-------------------+

The above figure shows an event received on Node 3, which is broadcast
to all the other nodes using a REST call (/v1/listen).

Cluster view (Webhooks):

    +-------------------+        +-------------------+
    |                   |        |                   |
    |  Node 1           |        |  Node 2           |
    |                   |        |                   |
    |                   |        |                   |
    |    Agent          |        |    Agent          |
    |                   |        |                   |    +-----------+
    +-------------------+        +-------------------+    |  Webhook  |
                            +---------------------------->|           |
    +-------------------+   |    +-------------------+    +-----------+
    |                   |   |    |                   |
    |  Node 3           |   |    |  Node 4           |
    |                   |   |    |                   |
    |                   |   |    |                   |
    |    Agent --------------+   |    Agent          |
    |                   |        |                   |
    +-------------------+        +-------------------+

The above figure shows an event received on Node 3, which is sent
directly to the configured Webhook. Each node can directly send events
to the configured webhooks.
#### List of Gluster Events
##### User driven Events

1. Volume Create/Start/Stop/Set/Reset/Delete
2. Peer Attach/Detach
3. Bricks Add/Remove/Replace
4. Volume Tier Attach/Detach
5. Rebalance Start/Stop
6. Quota Enable/Disable
7. Self-heal Enable/Disable
8. Geo-rep Create/Start/Config/Stop/Delete/Pause/Resume
9. Bitrot Enable/Disable/Config
10. Sharding Enable/Disable
11. Snapshot Create/Clone/Restore/Config/Delete/Activate/Deactivate

##### Local Events

1. Change in Geo-rep Worker Status Active/Passive/Faulty
2. Brick process Up/Down
3. Socket disconnects
4. Bitrot files
5. Faulty Geo-replication process

##### Cluster Events

1. Quota crosses the limit
2. Gluster Cluster quorum is lost
3. Split brain state
4. Self-heal Started/Ended
5. Async task completion (Rebalance, Remove-brick)
6. Snapshot hard limit and soft limit crossed

Cluster events are not planned for `Glusterd 1.0` since we need a
distributed store to achieve that.

#### Future

1. Selectively enable/disable eventing based on the Event keys, for
   example `gluster-eventsapi disable Volume.*`
2. Filters for Events (Channels/tags)
3. Integration with other projects like Skyring, Storaged etc.

Status
------
In Development

Current status
--------------
Design is in progress.

Related Feature Requests and Bugs
---------------------------------
None

Benefit to GlusterFS
--------------------
With the Events API, we can monitor Gluster effectively.

Scope
-----

#### Nature of proposed change
- New service (`glustereventsd`) to listen to the messages sent from Gluster

#### Implications on manageability
New service `glustereventsd`

#### Implications on presentation layer
None

#### Implications on persistence layer
None

#### Implications on 'GlusterFS' backend
None

#### Modification to GlusterFS metadata
None

#### Implications on 'glusterd'
None

Dependencies
------------
None

Comments and Discussion
-----------------------