| 1 | +# Automagic unsplit-brain by [ctime|mtime|size|majority] |
| 2 | + |
| 3 | +## Summary |
| 4 | +A new volume option 'cluster.favorite-child-policy' is introduced which will automatically resolve split-brains by |
| 5 | +choosing a particular brick as the good copy based on the value (policy) set. |
| 6 | + |
| 7 | +## Owners |
| 8 | + |
| 9 | +The patch is a rework of the one submitted by Richard Wareing from Facebook. |
| 10 | + |
| 11 | +## Current status |
| 12 | +Patch merged in master: http://review.gluster.org/#/c/14026/ |
| 13 | +Patch merged in 3.8: http://review.gluster.org/#/c/14535/ |
| 14 | + |
| 15 | +## Related Feature Requests and Bugs |
| 16 | +3.8 BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1339639 |
| 17 | +Original BZ to which Facebook's patch was attached: https://bugzilla.redhat.com/show_bug.cgi?id=1262161 |
| 18 | + |
| 19 | +## Detailed Description |
| 20 | +In a replicate volume, when a file ends up in split-brain, accessing it from the client results in an input/output error (EIO). |
| 21 | +To resolve split-brains, the user/admin needs to use the gluster CLI commands or the virtual setfattr commands to choose a particular |
| 22 | +copy of the file as the source and trigger a heal. Until such manual intervention happens, accessing the file fails with EIO. |
| 23 | + |
| 24 | +With 'cluster.favorite-child-policy', users can set a policy that lets AFR automatically pick a source when a file ends up in split-brain and heal it. |
| 25 | +This means they no longer get EIO when trying to access files, and the split-brains get resolved automatically. The available policies are: |
| 26 | +* none: This is the default value. When set, there is no automatic resolution of split-brains. |
| 27 | +* ctime: Selects the file with the highest ctime as the source. |
| 28 | +* mtime: Selects the file with the highest mtime as the source. |
| 29 | +* size: Selects the file with the biggest file size as the source. |
| 30 | +* majority: Selects as the source a file whose mtime and size are identical on more than half of the bricks in the replica. |
| 31 | + |
| 32 | +This is a volume-wide option, i.e. the same policy is applied to all split-brained files of the volume. |
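
The policies above can be sketched as a small selection function. This is an illustrative Python model only, not the AFR implementation (which lives in the GlusterFS C code); the `FileCopy` type, the `pick_source` name, the numeric timestamps, and the tie-breaking behaviour (first copy wins) are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FileCopy:
    """Hypothetical view of one brick's copy of a split-brained file."""
    brick: str
    ctime: float
    mtime: float
    size: int


def pick_source(copies: Sequence[FileCopy], policy: str) -> Optional[FileCopy]:
    """Return the copy a policy would favour, or None if nothing is chosen."""
    if policy == "none" or not copies:
        return None  # no automatic resolution
    if policy in ("ctime", "mtime", "size"):
        # Highest value wins; on a tie this sketch keeps the first copy (assumption).
        return max(copies, key=lambda c: getattr(c, policy))
    if policy == "majority":
        # Group copies by (mtime, size) and require a strict majority of bricks.
        counts = Counter((c.mtime, c.size) for c in copies)
        winner, votes = counts.most_common(1)[0]
        if votes * 2 > len(copies):
            return next(c for c in copies if (c.mtime, c.size) == winner)
        return None
    raise ValueError(f"unknown policy: {policy!r}")


# Two copies loosely modelled on the replica 2 example later in this page;
# the timestamps are made-up numbers.
copies = [
    FileCopy("brick1", ctime=1200.0, mtime=1200.0, size=1048576),
    FileCopy("brick2", ctime=1100.0, mtime=1100.0, size=1024),
]
print(pick_source(copies, "size").brick)
```

In this model, 'majority' cannot pick a source in a replica 2 volume when the two copies differ, because neither copy is present on more than half of the bricks.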
| 33 | + |
| 34 | +## Benefit to GlusterFS |
| 35 | +No manual intervention required to fix split-brains. |
| 36 | + |
| 37 | +## Scope |
| 38 | + |
| 39 | +### Nature of proposed change |
| 40 | +Code changes for handling the various policies for the option are done in AFR. |
| 41 | + |
| 42 | +### Implications on manageability |
| 43 | +New volume option 'cluster.favorite-child-policy' is introduced. |
| 44 | + |
| 45 | +### Implications on presentation layer |
| 46 | +None. |
| 47 | + |
| 48 | +### Implications on persistence layer |
| 49 | +None. |
| 50 | + |
| 51 | +### Implications on 'GlusterFS' backend |
| 52 | +None. |
| 53 | + |
| 54 | +### Modification to GlusterFS metadata |
| 55 | +None. |
| 56 | + |
| 57 | +### Implications on 'glusterd' |
| 58 | +Just the introduction of the volume option. |
| 59 | + |
| 60 | +## How To Test |
| 61 | +Create files in data/metadata split-brain, use the volume set command to set the various policies, and check that the split-brain heal happens according to the policy. The [.t file](https://github.com/gluster/glusterfs/blob/2f29065/tests/basic/afr/split-brain-favorite-child-policy.t) in the patch contains test cases. |
| 62 | +Here is an example of how the volume option can be used: |
| 63 | + |
| 64 | +### 1. A replica 2 volume that has '/file' in split-brain: |
| 65 | +``` |
| | +[root@dhcp42-116 ~]# gluster v heal testvol info |
| 66 | +Brick 127.0.0.2:/brick/brick1 |
| 67 | +/file - Is in split-brain |
| 68 | + |
| 69 | +Status: Connected |
| 70 | +Number of entries: 1 |
| 71 | + |
| 72 | +Brick 127.0.0.2:/brick/brick2 |
| 73 | +<gfid:fa6f2ab2-722e-4cf3-9f75-662c70be3f58> - Is in split-brain |
| 74 | + |
| 75 | +Status: Connected |
| 76 | +Number of entries: 1 |
| 77 | +``` |
| 78 | + |
| 79 | +### 2. The file size in the backend is different: |
| 80 | +``` |
| | +[root@dhcp42-116 ~]# ll /brick/brick*/file |
| 81 | +-rw-r--r-- 2 root root 1048576 May 30 12:59 /brick/brick1/file |
| 82 | +-rw-r--r-- 2 root root 1024 May 30 12:58 /brick/brick2/file |
| 83 | +``` |
| 84 | + |
| 85 | +### 3. Set the policy to heal based on bigger size: |
| 86 | +``` |
| 87 | +[root@dhcp42-116 ~]# gluster volume set testvol cluster.favorite-child-policy size |
| 88 | +volume set: success |
| 89 | +``` |
| 90 | + |
| 91 | +### 4. Launch heal: |
| 92 | +``` |
| 93 | +[root@dhcp42-116 ~]# gluster volume heal testvol |
| 94 | +Launching heal operation to perform index self heal on volume testvol has been successful |
| 95 | +Use heal info commands to check status |
| 96 | + |
| 97 | +``` |
| 98 | + |
| 99 | +### 5. Check heal info output again to verify file has been healed: |
| 100 | +``` |
| 101 | +[root@dhcp42-116 ~]# gluster v heal testvol info |
| 102 | +Brick 127.0.0.2:/brick/brick1 |
| 103 | +Status: Connected |
| 104 | +Number of entries: 0 |
| 105 | + |
| 106 | +Brick 127.0.0.2:/brick/brick2 |
| 107 | +Status: Connected |
| 108 | +Number of entries: 0 |
| 109 | +``` |
| 110 | + |
| 111 | +### 6. Check in the backend that the bigger file has been used as source: |
| 112 | +``` |
| | +[root@dhcp42-116 ~]# ll /brick/brick*/file |
| 113 | +-rw-r--r-- 2 root root 1048576 May 30 12:59 /brick/brick1/file |
| 114 | +-rw-r--r-- 2 root root 1048576 May 30 12:59 /brick/brick2/file |
| 115 | +``` |
| 116 | + |
| 117 | + |
| 118 | +## User Experience |
| 119 | +New CLI volume option 'cluster.favorite-child-policy' |
| 120 | + |
| 121 | +## Dependencies |
| 122 | +None. |
| 123 | + |
| 124 | +## Documentation |
| 125 | +ToDo. |
| 126 | + |
| 127 | +## Status |
| 128 | +Patch merged. |
| 129 | + |
| 130 | +## Comments and Discussion |
| 131 | + |
| 132 | + |