Commit 8305581 (1 parent: 8e25051)

kotreshhr authored and nixpanic committed

Add design spec for geo-rep support for sharded volumes

Change-Id: I100823a6d96f225c5b2c96defaa79866d619cee7
Signed-off-by: Kotresh HR <[email protected]>
Reviewed-on: http://review.gluster.org/13788
Reviewed-by: Aravinda VK <[email protected]>
Reviewed-by: Niels de Vos <[email protected]>
Tested-by: Niels de Vos <[email protected]>

1 file changed: +157, -0 lines

Feature
-------
Support geo-replication for sharded volumes.

Summary
-------
This feature enables geo-replication of large files stored on a sharded volume.
The requirement is that the slave volume is also sharded.

Owners
------
[Kotresh HR]([email protected])
[Aravinda VK]([email protected])

Current status
--------------
Traditionally, the changelog xlator, sitting just above the posix xlator,
records changes at the brick level, and geo-replication picks up the files that
are modified/created and syncs them over a gluster mount to the slave. This
works well as long as a file in a gluster volume is represented by a single
file at the brick level. But with the introduction of sharding in gluster, a
file in a gluster volume could be represented by multiple files at the brick
level, spanning different bricks. Hence the traditional way of syncing files
using the changelog results in related files being synced as different files
altogether. So there has to be some understanding between geo-replication and
sharding to convey that all those sharded files are related. Hence this
feature.

Related Feature Requests and Bugs
---------------------------------
1. [Mask sharding translator for geo-replication client](https://bugzilla.redhat.com/show_bug.cgi?id=1275972)
2. [All other related changes for geo-replication](https://bugzilla.redhat.com/show_bug.cgi?id=1284453)

Detailed Description
--------------------
Sharding breaks a file into multiple small files based on an agreed-upon
shard-size (usually 4MB, 64MB, ...) and helps distribute one big file well
across sub-volumes. Say 4MB is the shard size: the first 4MB of the file is
saved with the actual filename, say file1. The next 4MB becomes its first
shard with the filename <GFID>.1, and so on. So shards are saved as
<GFID>.1, <GFID>.2, <GFID>.3, ..., <GFID>.n, where GFID is the gfid of file1.
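
As an illustration of the naming scheme, the following minimal Python sketch
enumerates the brick-level names for a file of a given size. It only mirrors
the <GFID>.n convention described above; `shard_names()` is a hypothetical
helper, not part of GlusterFS.

```python
import math
import uuid

SHARD_SIZE = 4 * 1024 * 1024  # 4MB, matching the example in the text


def shard_names(gfid, file_size, shard_size=SHARD_SIZE):
    """Return the extra brick-level entries backing one large gluster file.

    The first shard_size bytes stay in the file with its real name; every
    further chunk is stored as a separate <GFID>.n entry.
    """
    if file_size <= shard_size:
        return []  # no extra shards for small files
    total_chunks = math.ceil(file_size / shard_size)
    # Chunk 0 is the main file itself; the rest are <GFID>.1 .. <GFID>.n
    return ["%s.%d" % (gfid, n) for n in range(1, total_chunks)]


# Example: a 10MB file1 with 4MB shards -> file1 plus <GFID>.1 and <GFID>.2
print(shard_names(str(uuid.uuid4()), 10 * 1024 * 1024))
```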

The shard xlator is placed just above DHT on the client stack. Shard determines
which shard a write/read belongs to based on the offset and hands the specific
<GFID>.n file to DHT. Each of the sharded files is stored under a special
directory called ".shard" in the respective sub-volume as hashed by DHT.

For more information on Gluster sharding, please go through the following links:
1. <https://gluster.readthedocs.org/en/release-3.7.0/Features/shard>
2. <http://blog.gluster.org/2015/12/introducing-shard-translator>
3. <http://blog.gluster.org/2015/12/sharding-what-next-2>

To make geo-rep work with sharded files, there are two options.

1. Somehow record only the main gfid and bname on changes to any shard:
   This would simplify the design but lacks performance, as geo-rep has to
   sync all the shards from a single brick, and rsync might take more time
   calculating checksums to find the delta if shards of the file are
   placed on different nodes by DHT.

2. Let geo-rep sync the main file and each sharded file separately:
   This approach overcomes the performance issue, but the solution needs
   to be implemented carefully, considering all the cases. For this,
   the geo-rep client is given access by the sharding xlator to sync each
   shard as a different file, hence rsync need not calculate checksums
   over the wire and syncs each shard as if it were a single file.
   The xattrs maintained by the main file to track the shard-size and
   file-size are also synced. Here multiple bricks participate in syncing
   the shards, based on where each shard is hashed.

Keeping performance in mind, the second approach is chosen.

So the key here is that the sharding xlator is masked for geo-replication
(the gsyncd client). Geo-rep syncs all the sharded files as separate files, as
if no sharding xlator were loaded. Since the xattrs of the main file are also
synced from the master, the data is intact when read from regular (non geo-rep)
clients on the slave. It is possible that geo-rep has not yet synced all the
shards of a file from the master, during which time inconsistent data is to be
expected, as geo-rep in any case follows an eventual-consistency model.
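
One way to observe this on a deployment is to compare the main file's shard
xattrs on a master brick and a slave brick. The sketch below is hypothetical
verification tooling, not part of the feature; the xattr names are those used
by the shard xlator in recent releases (verify against your version), and the
brick paths are placeholders.

```python
import os

# Shard metadata xattrs kept on the main file; names as used by the shard
# xlator in 3.7-era releases -- verify against your GlusterFS version.
SHARD_XATTRS = (
    "trusted.glusterfs.shard.block-size",
    "trusted.glusterfs.shard.file-size",
)


def read_shard_xattrs(backend_path):
    """Read the shard xattrs of a file directly from a brick backend.

    Needs root on the brick node, since trusted.* xattrs are hidden from
    ordinary users.
    """
    values = {}
    for name in SHARD_XATTRS:
        try:
            values[name] = os.getxattr(backend_path, name)
        except OSError:
            values[name] = None  # not sharded, or not synced yet
    return values


# Placeholder brick paths for the same file on master and slave.
master = read_shard_xattrs("/bricks/master-b1/vmimage.qcow2")
slave = read_shard_xattrs("/bricks/slave-b1/vmimage.qcow2")
print("shard xattrs match:", master == slave)
```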

So this brings in certain prerequisite configurations:

1. If the master is a sharded volume, the slave also needs to be a sharded volume.
2. The geo-rep sync engine must be 'rsync'. tarssh is not supported for a sharding
   configuration.

Benefit to GlusterFS
--------------------
Sharded volumes can be geo-replicated. The main use case is the
hyperconvergence scenario, where large VM images are stored in sharded
gluster volumes and need to be geo-replicated for disaster recovery.

Scope
-----
#### Nature of proposed change
No new translators are written as part of this feature.
The modification spans the sharding and gfid-access translators
and geo-replication.

1. <http://review.gluster.org/#/c/12438>
2. <http://review.gluster.org/#/c/12732>
3. <http://review.gluster.org/#/c/12729>
4. <http://review.gluster.org/#/c/12721>
5. <http://review.gluster.org/#/c/12731>
6. <http://review.gluster.org/#/c/13643>

#### Implications on manageability
No implications on manageability. There is no change in the way geo-replication
is set up.

#### Implications on presentation layer
No implications to NFS/SAMBA/UFO/FUSE/libglusterfsclient.

#### Implications on persistence layer
No implications to LVM/XFS/RHEL.

#### Implications on 'GlusterFS' backend
No implications to the brick's data format or layout.

#### Modification to GlusterFS metadata
No modifications to metadata. No new extended attributes or internal hidden
files are used to keep the metadata.

#### Implications on 'glusterd'
None

How To Test
-----------
1. Set up the master gluster volume and enable sharding.
2. Set up the slave gluster volume and enable sharding.
3. Create a geo-replication session between the master and slave volumes.
4. Make sure the geo-rep config 'use_tarssh' is set to false.
5. Make sure the geo-rep config 'sync_xattrs' is set to true.
6. Start geo-replication.
7. Write a large file greater than the shard size and check for the same
   on the slave volume (see the sketch after this list).
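
Scripted from a master node, the steps could look roughly like the sketch
below. The volume and host names are placeholders, the gluster CLI syntax
reflects 3.7-era releases and may differ in yours, and prerequisites such as
passwordless SSH to the slave and sharding enabled on the slave volume are
assumed to already be in place.

```python
import subprocess

MASTER_VOL = "mastervol"            # placeholder master volume name
SLAVE = "slavehost::slavevol"       # placeholder slave host and volume


def run(cmd):
    """Run a gluster CLI command, echoing it and failing on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# Steps 1-2: enable sharding (the slave-side command must be run on a slave node).
run(["gluster", "volume", "set", MASTER_VOL, "features.shard", "on"])

# Step 3: create the geo-replication session.
run(["gluster", "volume", "geo-replication", MASTER_VOL, SLAVE, "create", "push-pem"])

# Steps 4-5: configuration required for sharded volumes.
run(["gluster", "volume", "geo-replication", MASTER_VOL, SLAVE, "config", "use_tarssh", "false"])
run(["gluster", "volume", "geo-replication", MASTER_VOL, SLAVE, "config", "sync_xattrs", "true"])

# Step 6: start syncing. Step 7 (writing a file larger than the shard size on a
# master mount and checking it on a slave mount) is left as a manual check.
run(["gluster", "volume", "geo-replication", MASTER_VOL, SLAVE, "start"])
```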

User Experience
---------------
The following configuration should be done:

1. If the master is a sharded volume, the slave also needs to be a sharded volume.
2. The geo-rep sync engine must be 'rsync'. tarssh is not supported for a sharding
   configuration.
3. The geo-replication config option 'sync_xattrs' should be set to true.

Dependencies
------------
No dependencies apart from the sharding feature. :)

Documentation
-------------

Status
------
Completed

Comments and Discussion
-----------------------
