|
| 1 | +Feature |
| 2 | +------- |
| 3 | +Events APIs for Gluster |
| 4 | + |
| 5 | +Summary |
| 6 | +------- |
| 7 | +The eventing framework will emit a notification whenever the state |
| 8 | +of the Gluster Cluster changes. |
| 9 | + |
| 10 | +Owners |
| 11 | +------ |
| 12 | + |
| 13 | + |
| 14 | + |
| 15 | +Detailed Description |
| 16 | +-------------------- |
| 17 | +Imagine a Gluster monitoring system that displays the list of |
| 18 | +volumes and their state. To show real-time status, the monitoring |
| 19 | +app has to query Gluster at regular intervals to check volume |
| 20 | +status, new volumes etc. With a polling interval of 5 seconds, the |
| 21 | +monitoring app has to run the `gluster volume info` command 17,280 |
| 22 | +times a day (86400 / 5)! |
| 23 | + |
| 24 | +How about asking Gluster to send a notification whenever something |
| 25 | +changes? |
| 26 | + |
| 27 | + |
| 28 | +How To Test |
| 29 | +----------- |
| 30 | +Start `eventsdash.py` using `python |
| 31 | +$SRC/events/tools/eventsdash.py`. Before registering the dashboard |
| 32 | +URL as a webhook, test it using the `webhook-test` sub command. |
| 33 | + |
| 34 | + gluster-eventsapi webhook-test http://<IP>:9000/listen |
| 35 | + |
| 36 | +Where IP is the hostname or IP address of the node where |
| 37 | +eventsdash.py is running. It should be reachable from all Gluster nodes. |
| 38 | + |
| 39 | +If the webhook test succeeds from all nodes, register the URL using |
| 40 | + |
| 41 | + gluster-eventsapi webhook-add http://<IP>:9000/listen |
| 42 | + |
| 43 | +eventsdash.py will then show the Gluster events. |
| 44 | + |
| 45 | +User Experience |
| 46 | +--------------- |
| 47 | +Run the following command to start/stop the Events API server on all |
| 48 | +peers. The server collects notifications from any Gluster daemon and |
| 49 | +emits them to the configured clients. |
| 50 | + |
| 51 | + gluster-eventsapi start|stop|restart|reload |
| 52 | + |
| 53 | +The status of the running services can be checked using, |
| 54 | + |
| 55 | + gluster-eventsapi status |
| 56 | + |
| 57 | +Gluster Events can be consumed using Websocket API or by using |
| 58 | +Webhooks. |
| 59 | + |
| 60 | +#### Consuming events using Webhooks: |
| 61 | + |
| 62 | +An events listener is an HTTP(S) server that listens for events |
| 63 | +emitted by Gluster. Create an HTTP server that accepts POST requests |
| 64 | +and register its URL using, |
| 65 | + |
| 66 | + gluster-eventsapi webhook-add <URL> [--bearer-token <TOKEN>] |
| 67 | + |
| 68 | +For example, if the HTTP server is running at |
| 69 | +`http://192.168.122.188:9000`, add that URL using, |
| 70 | + |
| 71 | + gluster-eventsapi webhook-add http://192.168.122.188:9000 |
| 72 | + |
| 73 | +If the server expects a token, specify it using `--bearer-token` or |
| 74 | +`-t`. To check whether all peer nodes can send messages to the |
| 75 | +webhook, use, |
| 76 | + |
| 77 | + gluster-eventsapi webhook-test <URL> [--bearer-token <TOKEN>] |
| 78 | + |
| 79 | +Configurations can be viewed/updated using, |
| 80 | + |
| 81 | + gluster-eventsapi config-get [--name] |
| 82 | + gluster-eventsapi config-set <NAME> <VALUE> |
| 83 | + gluster-eventsapi config-reset <NAME|all> |
| 84 | + |
| 85 | +If any peer node was down during config-set/reset or webhook |
| 86 | +modifications, run the sync command from a good node once that peer |
| 87 | +comes back. Automatic update is not yet implemented. |
| 88 | + |
| 89 | + gluster-eventsapi sync |
| 90 | + |
| 91 | +The eventing client can be outside the Cluster; it can even run on |
| 92 | +Windows. The only requirement is that the client URL is reachable |
| 93 | +from all peer nodes (tools like ngrok, https://ngrok.com, can help). |
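A minimal sketch of such a listener, using only the Python standard library, could look like the following. It assumes that each event arrives as a JSON body in an HTTP POST and that the token passed with `--bearer-token` is sent in an `Authorization: Bearer <TOKEN>` header; this document does not specify either detail. The names `EXPECTED_TOKEN`, `parse_event`, `token_ok` and `serve` are illustrative only.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical setting: set to a string to require a bearer token.
EXPECTED_TOKEN = None


def parse_event(body):
    """Decode a POST body; JSON is an assumption, so fall back to raw text."""
    try:
        return json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return {"raw": body.decode("utf-8", errors="replace")}


def token_ok(auth_header, expected):
    """Check an 'Authorization: Bearer <token>' header (assumed format)."""
    if expected is None:
        return True
    if not auth_header:
        return False
    scheme, _, token = auth_header.partition(" ")
    return scheme.lower() == "bearer" and token == expected


class Listener(BaseHTTPRequestHandler):
    def do_POST(self):
        if not token_ok(self.headers.get("Authorization"), EXPECTED_TOKEN):
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        event = parse_event(self.rfile.read(length))
        print("Received event:", event)
        self.send_response(200)
        self.end_headers()


def serve(host="0.0.0.0", port=9000):
    """Block forever, handling one POST per emitted event."""
    HTTPServer((host, port), Listener).serve_forever()
```

Calling `serve()` starts the listener on port 9000; its URL can then be checked with `webhook-test` and registered with `webhook-add`.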
| 94 | + |
| 95 | +#### Consuming events using Websocket API: |
| 96 | + |
| 97 | +The Websocket API will be part of the Gluster REST Server, so all |
| 98 | +REST related configuration applies here. |
| 99 | + |
| 100 | + ws://hostname/v1/events |
| 101 | + |
| 102 | +> WebSocket is a protocol providing full-duplex communication channels |
| 103 | +> over a single TCP connection. The WebSocket protocol was |
| 104 | +> standardized by the IETF as RFC 6455 in 2011, and the WebSocket API |
| 105 | +> in Web IDL is being standardized by the W3C.(Ref: |
| 106 | +> [Wikipedia](https://en.wikipedia.org/wiki/WebSocket)) |
| 107 | + |
| 108 | +`hostname` can be any one of the nodes in the Cluster. If that node |
| 109 | +goes down, the application can connect to any other node in the |
| 110 | +Cluster and resume listening to the events. |
| 111 | + |
| 112 | +    from websocket import create_connection |
| 113 | + |
| 114 | +    ws = create_connection("ws://hostname/v1/events") |
| 115 | + |
| 116 | +    try: |
| 117 | +        while True: |
| 118 | +            # recv() blocks until the next event arrives |
| 119 | +            ev = ws.recv() |
| 120 | +            print("Received Event '{0}'".format(ev)) |
| 121 | +    finally: |
| 122 | +        ws.close() |
| 123 | + |
| 124 | +Register the application using the `gluster-rest app-add` command. |
| 125 | + |
| 126 | +The design of the Gluster REST Server is discussed |
| 127 | +[here](http://review.gluster.org/13214). |
| 128 | + |
| 129 | +Applications can persist the peers list (`GET /v1/peers`) of the |
| 130 | +Cluster. If the connected node goes down, the application can choose |
| 131 | +another node to connect to and continue listening to the events. |
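The failover described above can be sketched as a loop over the persisted peers list. This is only an illustration: `listen_with_failover` is a hypothetical helper, the connection function is passed in so that any WebSocket client can be used, and the broad `except Exception` stands in for whatever errors that client raises on a failed or dropped connection.

```python
def listen_with_failover(peers, connect, handle):
    """Listen on the first reachable peer; on failure move to the next one.

    peers   -- hostnames, e.g. persisted from GET /v1/peers
    connect -- callable taking a ws:// URL and returning an object
               with recv() and close()
    handle  -- callable invoked with every received event
    Returns once every peer has been tried.
    """
    for host in peers:
        url = "ws://{0}/v1/events".format(host)
        try:
            ws = connect(url)
        except Exception:
            # Peer is unreachable: try the next one in the list.
            continue
        try:
            while True:
                handle(ws.recv())
        except Exception:
            # Connection dropped: fall through to the next peer.
            pass
        finally:
            ws.close()
```

With the `websocket` package used in the example above, `create_connection` can be passed as `connect`, for instance `listen_with_failover(peers, create_connection, print)`.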
| 132 | + |
| 133 | +Design |
| 134 | +------ |
| 135 | + |
| 136 | +#### Event Types |
| 137 | +Gluster events can be categorized into different types, for example |
| 138 | + |
| 139 | +1. **User driven events** - Raised when a user executes a CLI command |
| 140 | +    that changes the state of the Cluster, a Volume or Bricks. For |
| 141 | +    example, Volume Create, Peer Attach, Volume Start, Snapshot Create, |
| 142 | +    Geo-rep Create, Geo-rep Start etc. |
| 143 | +2. **Local events** - Events that can be detected locally and relate |
| 144 | +    to local resources. For example, a Geo-rep worker going to Faulty, |
| 145 | +    a Brick process going down etc. |
| 146 | +3. **Cluster events**, or events related to multiple nodes - Events |
| 147 | +    that are not tied to a single node but carry Cluster level |
| 148 | +    information. Notifications may be sent out from multiple nodes |
| 149 | +    without filtering. |
| 150 | + |
| 151 | +Most User driven events are also Cluster events, but they are |
| 152 | +handled differently from other Cluster events. Notifications for |
| 153 | +user driven events are sent only from the node where the command is run. |
| 154 | + |
| 155 | +**Note:** All planned Gluster Events are listed at the end of this section. |
| 156 | + |
| 157 | +#### Recording the Events |
| 158 | +A new API, `gf_event`, will be introduced to send messages to the |
| 159 | +socket `/var/run/gluster/events.sock`. This new API will be available |
| 160 | +for C, Python and Go. |
| 161 | + |
| 162 | +It can be called from any component of Gluster. gsyncd will use the |
| 163 | +Python library to send messages. |
| 164 | + |
| 165 | +Example message format (the format may change during implementation): |
| 166 | + |
| 167 | + Volume.Create=gv1 |
| 168 | + Volume.Start=gv1 |
| 169 | + Volume.Set=gv1,changelog.changelog,on |
| 170 | + Georep.State.Faulty=... |
| 171 | + |
| 172 | +Pseudo code: |
| 173 | + |
| 174 | +    gf_event(key, format, values..) -> |
| 175 | +        if EVENTING_ENABLED { |
| 176 | +            connect(EVENT_SOCKET)  # AF_UNIX |
| 177 | +            format_and_send(key, format, values..) |
| 178 | +        } |
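Since the API is also planned for Python, the pseudo code can be sketched in Python as follows. The socket type (`SOCK_STREAM`), the `enabled` flag and the helper `format_event` are assumptions made for illustration, and the wire format simply mirrors the example messages above, which may change during implementation.

```python
import socket

EVENT_SOCKET = "/var/run/gluster/events.sock"


def format_event(key, *values):
    """Build a 'Key=v1,v2,...' message like the examples above."""
    return "{0}={1}".format(key, ",".join(str(v) for v in values))


def gf_event(key, *values, sock_path=EVENT_SOCKET, enabled=True):
    """Send one event to the agent socket; return True if it was sent.

    Failures are swallowed so that eventing never breaks the caller.
    """
    if not enabled:
        return False
    message = format_event(key, *values).encode("utf-8")
    # Stream socket is an assumption; the document only says AF_UNIX.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(sock_path)
        client.sendall(message)
        return True
    except OSError:
        return False
    finally:
        client.close()
```

For example, `gf_event("Volume.Set", "gv1", "changelog.changelog", "on")` would send `Volume.Set=gv1,changelog.changelog,on` when the agent is listening, and return `False` otherwise.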
| 179 | + |
| 180 | +Example usage in Volume create(`cli/src/cli-cmd-volume.c` file) |
| 181 | + |
| 182 | +    #if (USE_EVENTS) |
| 183 | +        // On successful Volume creation |
| 184 | +        if (ret == 0) { |
| 185 | +            gf_event (EVENT_VOLUME_CREATE, "name=%s", volname); |
| 186 | +        } |
| 187 | +    #endif |
| 188 | + |
| 189 | +#### Agent - glustereventsd |
| 190 | +The agent listens on `events.sock` for new events from Gluster |
| 191 | +processes, gathers any additional information required and broadcasts |
| 192 | +each event to all peer nodes by sending an HTTP POST to their Gluster |
| 193 | +REST servers (`/v1/listen`). Each REST server then forwards the |
| 194 | +message to all connected applications. |
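The broadcast step of the agent can be sketched as below. This is not the planned implementation: the JSON payload keys (`event`, `message`), the plain `http://<peer>/v1/listen` URL form and the helpers `parse_message`, `post_json` and `broadcast` are all assumptions, and the gathering of additional information is left out.

```python
import json
import urllib.request


def parse_message(line):
    """Split a 'Key=value' message, as received on events.sock."""
    key, _, value = line.strip().partition("=")
    return {"event": key, "message": value}


def post_json(url, payload, timeout=5):
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def broadcast(line, peers, post=post_json):
    """Send one event to the REST server of every peer.

    Returns the peers that could not be reached, so the caller can
    decide whether to retry or log them.
    """
    event = parse_message(line)
    failed = []
    for peer in peers:
        try:
            post("http://{0}/v1/listen".format(peer), event)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(peer)
    return failed
```

Passing the sender as the `post` argument keeps the loop independent of the HTTP client, which also makes it easy to exercise without a running REST server.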
| 195 | + |
| 196 | +#### Architecture |
| 197 | + |
| 198 | +On each node of the Gluster Cluster, |
| 199 | + |
| 200 | +      glusterd   glusterfsd    gsyncd       cli         ... |
| 201 | +         |           |           |           |           | |
| 202 | +         |           |           |           |           | |
| 203 | +     (gf_event)  (gf_event)  (gf_event)  (gf_event)  (gf_event) |
| 204 | +         |           |           |           |           | |
| 205 | +         v           v           v           v           v |
| 206 | +    +------------------------------------------------------------+ |
| 207 | +    |                        events.sock                         | |
| 208 | +    |                    (format + broadcast)                    | |
| 209 | +    |                                                            | |
| 210 | +    +------------------------------------------------------------+ |
| 211 | + |
| 212 | +`glustereventsd` will run on all the nodes of the Cluster and send |
| 213 | +events independently (except in the Websocket use case). |
| 214 | + |
| 215 | +Cluster view (Websockets API), |
| 216 | + |
| 217 | +    +-------------------+                       +-------------------+ |
| 218 | +    |                   |                       |                   | |
| 219 | +    |        Node 1     |                       |Node 2             | |
| 220 | +    |        REST server<----------+------------>REST server        | |
| 221 | +    |                   |          |            |                   | |
| 222 | +    |        Agent      |          |            |Agent              | |
| 223 | +    |                   |          |            |                   | |
| 224 | +    +-------------------+          |            +-------------------+ |
| 225 | +                                   | |
| 226 | +    +-------------------+          |            +-------------------+ |
| 227 | +    |                   |          |            |                   | |
| 228 | +    |        Node 3     |          |            |Node 4             | |
| 229 | +    |        REST server<-+        +------------>REST server        | |
| 230 | +    |                   | |        |            |                   | |
| 231 | +    |        Agent -------+--------+            |Agent              | |
| 232 | +    |                   |                       |                   | |
| 233 | +    +-------------------+                       +-------------------+ |
| 234 | + |
| 235 | +The above figure shows an event received on Node 3, which is broadcast |
| 236 | +to all the other nodes using a REST call (`/v1/listen`). |
| 237 | + |
| 238 | +Cluster view (Webhooks), |
| 239 | + |
| 240 | +    +-------------------+                       +-------------------+ |
| 241 | +    |                   |                       |                   | |
| 242 | +    |        Node 1     |                       |        Node 2     | |
| 243 | +    |                   |                       |                   | |
| 244 | +    |                   |                       |                   | |
| 245 | +    |        Agent      |                       |        Agent      | |
| 246 | +    |                   |                       |                   | +-----------+ |
| 247 | +    +-------------------+                       +-------------------+ |  Webhook  | |
| 248 | +                                   +--------------------------------->|           | |
| 249 | +    +-------------------+          |            +-------------------+ +-----------+ |
| 250 | +    |                   |          |            |                   | |
| 251 | +    |        Node 3     |          |            |        Node 4     | |
| 252 | +    |                   |          |            |                   | |
| 253 | +    |                   |          |            |                   | |
| 254 | +    |        Agent ----------------+            |        Agent      | |
| 255 | +    |                   |                       |                   | |
| 256 | +    +-------------------+                       +-------------------+ |
| 257 | + |
| 258 | +The above figure shows an event received on Node 3, which is sent |
| 259 | +directly to the configured Webhook. Each node sends events directly |
| 260 | +to the configured webhooks. |
| 261 | + |
| 262 | +#### List of Gluster Events |
| 263 | +##### User driven Events |
| 264 | + |
| 265 | +1. Volume Create/Start/Stop/Set/Reset/Delete |
| 266 | +2. Peer Attach/Detach |
| 267 | +3. Bricks Add/Remove/Replace |
| 268 | +4. Volume Tier Attach/Detach |
| 269 | +5. Rebalance Start/Stop |
| 270 | +6. Quota Enable/Disable |
| 271 | +7. Self-heal Enable/Disable |
| 272 | +8. Geo-rep Create/Start/Config/Stop/Delete/Pause/Resume |
| 273 | +9. Bitrot Enable/Disable/Config |
| 274 | +10. Sharding Enable/Disable |
| 275 | +11. Snapshot Create/Clone/Restore/Config/Delete/Activate/Deactivate |
| 276 | + |
| 277 | +##### Local Events |
| 278 | + |
| 279 | +1. Change in Geo-rep Worker Status Active/Passive/Faulty |
| 280 | +2. Brick process Up/Down |
| 281 | +3. Socket disconnects |
| 282 | +4. Bitrot files |
| 283 | +5. Faulty Geo-replication process |
| 284 | + |
| 285 | +##### Cluster Events |
| 286 | + |
| 287 | +1. Quota crosses the limit |
| 288 | +2. Gluster Cluster quorum is lost |
| 289 | +3. Split brain state |
| 290 | +4. Self-heal Started/Ended |
| 291 | +5. Async Task completion (Rebalance, Remove-brick) |
| 292 | +6. Snapshot hard limit or soft limit is crossed |
| 293 | + |
| 294 | +Cluster events are not planned with `Glusterd 1.0`, since supporting |
| 295 | +them requires a distributed store. |
| 296 | + |
| 297 | +#### Future |
| 298 | + |
| 299 | +1. Selectively Enable/disable eventing based on the Event keys, for |
| 300 | + example `gluster-eventsapi disable Volume.*` |
| 301 | +2. Filters for Events(Channels/tag) |
| 302 | +3. Integration with other projects like Skyring, Storaged etc. |
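As an illustration of the first item, key-based filtering could be sketched with shell-style patterns. Treating `Volume.*` as a glob and the helper name `is_enabled` are assumptions; the pattern syntax for this future feature is not defined in this document.

```python
from fnmatch import fnmatchcase


def is_enabled(key, disabled_patterns):
    """Return False when the event key matches any disabled pattern."""
    return not any(fnmatchcase(key, pattern) for pattern in disabled_patterns)


disabled = ["Volume.*"]
print(is_enabled("Volume.Create", disabled))        # False
print(is_enabled("Georep.State.Faulty", disabled))  # True
```

A check of this kind could run before an event is emitted, so that disabled events never reach Webhooks or Websocket clients.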
| 303 | + |
| 304 | +Status |
| 305 | +------ |
| 306 | +In Development |
| 307 | + |
| 308 | +Current status |
| 309 | +-------------- |
| 310 | +Design is in progress. |
| 311 | + |
| 312 | +Related Feature Requests and Bugs |
| 313 | +--------------------------------- |
| 314 | +None |
| 315 | + |
| 316 | +Benefit to GlusterFS |
| 317 | +-------------------- |
| 318 | +With the Events API, Gluster can be monitored effectively without polling. |
| 319 | + |
| 320 | +Scope |
| 321 | +----- |
| 322 | + |
| 323 | +#### Nature of proposed change |
| 324 | +- New service (`glustereventsd`) to listen for the messages sent from Gluster |
| 325 | + |
| 326 | +#### Implications on manageability |
| 327 | +New service `glustereventsd` |
| 328 | + |
| 329 | +#### Implications on presentation layer |
| 330 | +None |
| 331 | + |
| 332 | +#### Implications on persistence layer |
| 333 | +None |
| 334 | + |
| 335 | +#### Implications on 'GlusterFS' backend |
| 336 | +None |
| 337 | + |
| 338 | +#### Modification to GlusterFS metadata |
| 339 | +None |
| 340 | + |
| 341 | +#### Implications on 'glusterd' |
| 342 | +None |
| 343 | + |
| 344 | +Dependencies |
| 345 | +------------ |
| 346 | +None |
| 347 | + |
| 348 | +Comments and Discussion |
| 349 | +----------------------- |