Skip to content

Commit 7ffc748

Browse files
committed
netfilter: nft_set_hash: skip duplicated elements pending gc run
rhashtable does not provide stable walk, duplicated elements are possible in case of resizing. I considered that checking for errors when calling rhashtable_walk_next() was sufficient to detect the resizing. However, rhashtable_walk_next() returns -EAGAIN only at the end of the iteration, which is too late, because a gc work containing duplicated elements could have been already scheduled for removal to the worker. Add a u32 gc worker sequence number per set, bump it on every workqueue run. Annotate gc worker sequence number on the expired element. Use it to skip those already seen in this gc workqueue run. Note that this new field is never reset in case gc transaction fails, so next gc worker run on the expired element overrides it. Wraparound of gc worker sequence number should not be an issue with stale gc worker sequence number in the element, that would just postpone the element removal in one gc run. Note that it is not possible to use flags to annotate that element is pending gc run to detect duplicates, given that gc transaction can be invalidated in case of update from the control plane, therefore, not allowing to clear such flag. On x86_64, pahole reports no changes in the size of nft_rhash_elem. Fixes: f6c383b ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Laurent Fasnacht <[email protected]> Tested-by: Laurent Fasnacht <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
1 parent 456f010 commit 7ffc748

File tree

1 file changed

+16
-0
lines changed

1 file changed

+16
-0
lines changed

net/netfilter/nft_set_hash.c

+16
Original file line numberDiff line numberDiff line change
@@ -24,11 +24,13 @@
2424
struct nft_rhash {
2525
struct rhashtable ht;
2626
struct delayed_work gc_work;
27+
u32 wq_gc_seq;
2728
};
2829

2930
struct nft_rhash_elem {
3031
struct nft_elem_priv priv;
3132
struct rhash_head node;
33+
u32 wq_gc_seq;
3234
struct nft_set_ext ext;
3335
};
3436

@@ -338,6 +340,10 @@ static void nft_rhash_gc(struct work_struct *work)
338340
if (!gc)
339341
goto done;
340342

343+
/* Elements never collected use a zero gc worker sequence number. */
344+
if (unlikely(++priv->wq_gc_seq == 0))
345+
priv->wq_gc_seq++;
346+
341347
rhashtable_walk_enter(&priv->ht, &hti);
342348
rhashtable_walk_start(&hti);
343349

@@ -355,6 +361,14 @@ static void nft_rhash_gc(struct work_struct *work)
355361
goto try_later;
356362
}
357363

364+
/* rhashtable walk is unstable, already seen in this gc run?
365+
* Then, skip this element. In case of (unlikely) sequence
366+
* wraparound and stale element wq_gc_seq, next gc run will
367+
* just find this expired element.
368+
*/
369+
if (he->wq_gc_seq == priv->wq_gc_seq)
370+
continue;
371+
358372
if (nft_set_elem_is_dead(&he->ext))
359373
goto dead_elem;
360374

@@ -371,6 +385,8 @@ static void nft_rhash_gc(struct work_struct *work)
371385
if (!gc)
372386
goto try_later;
373387

388+
/* annotate gc sequence for this attempt. */
389+
he->wq_gc_seq = priv->wq_gc_seq;
374390
nft_trans_gc_elem_add(gc, he);
375391
}
376392

0 commit comments

Comments
 (0)