Skip to content

Commit 6ae08ae

Browse files
borkmannAlexei Starovoitov
authored and
Alexei Starovoitov
committed
bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
The current bpf_probe_read() and bpf_probe_read_str() helpers are broken in that they assume they can be used for probing memory access for kernel space addresses /as well as/ user space addresses. However, plain use of probe_kernel_read() for both cases will attempt to always access kernel space address space given access is performed under KERNEL_DS and some archs in-fact have overlapping address spaces where a kernel pointer and user pointer would have the /same/ address value and therefore accessing application memory via bpf_probe_read{,_str}() would read garbage values. Lets fix BPF side by making use of recently added 3d70818 ("uaccess: Add non-pagefault user-space read functions"). Unfortunately, the only way to fix this status quo is to add dedicated bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str() helpers. The bpf_probe_read{,_str}() helpers are kept as-is to retain their current behavior. The two *_user() variants attempt the access always under USER_DS set, the two *_kernel() variants will -EFAULT when accessing user memory if the underlying architecture has non-overlapping address ranges, also avoiding throwing the kernel warning via 00c4237 ("x86-64: add warning for non-canonical user access address dereferences"). Fixes: a5e8c07 ("bpf: add bpf_probe_read_str helper") Fixes: 2541517 ("tracing, perf: Implement BPF programs attached to kprobes") Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net
1 parent eb1b668 commit 6ae08ae

File tree

3 files changed

+299
-126
lines changed

3 files changed

+299
-126
lines changed

include/uapi/linux/bpf.h

+82-40
Original file line numberDiff line numberDiff line change
@@ -563,10 +563,13 @@ union bpf_attr {
563563
* Return
564564
* 0 on success, or a negative error in case of failure.
565565
*
566-
* int bpf_probe_read(void *dst, u32 size, const void *src)
566+
* int bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr)
567567
* Description
568568
* For tracing programs, safely attempt to read *size* bytes from
569-
* address *src* and store the data in *dst*.
569+
* kernel space address *unsafe_ptr* and store the data in *dst*.
570+
*
571+
* Generally, use bpf_probe_read_user() or bpf_probe_read_kernel()
572+
* instead.
570573
* Return
571574
* 0 on success, or a negative error in case of failure.
572575
*
@@ -1428,45 +1431,14 @@ union bpf_attr {
14281431
* Return
14291432
* 0 on success, or a negative error in case of failure.
14301433
*
1431-
* int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
1434+
* int bpf_probe_read_str(void *dst, u32 size, const void *unsafe_ptr)
14321435
* Description
1433-
* Copy a NUL terminated string from an unsafe address
1434-
* *unsafe_ptr* to *dst*. The *size* should include the
1435-
* terminating NUL byte. In case the string length is smaller than
1436-
* *size*, the target is not padded with further NUL bytes. If the
1437-
* string length is larger than *size*, just *size*-1 bytes are
1438-
* copied and the last byte is set to NUL.
1439-
*
1440-
* On success, the length of the copied string is returned. This
1441-
* makes this helper useful in tracing programs for reading
1442-
* strings, and more importantly to get its length at runtime. See
1443-
* the following snippet:
1444-
*
1445-
* ::
1446-
*
1447-
* SEC("kprobe/sys_open")
1448-
* void bpf_sys_open(struct pt_regs *ctx)
1449-
* {
1450-
* char buf[PATHLEN]; // PATHLEN is defined to 256
1451-
* int res = bpf_probe_read_str(buf, sizeof(buf),
1452-
* ctx->di);
1453-
*
1454-
* // Consume buf, for example push it to
1455-
* // userspace via bpf_perf_event_output(); we
1456-
* // can use res (the string length) as event
1457-
* // size, after checking its boundaries.
1458-
* }
1459-
*
1460-
* In comparison, using **bpf_probe_read()** helper here instead
1461-
* to read the string would require to estimate the length at
1462-
* compile time, and would often result in copying more memory
1463-
* than necessary.
1436+
* Copy a NUL terminated string from an unsafe kernel address
1437+
* *unsafe_ptr* to *dst*. See bpf_probe_read_kernel_str() for
1438+
* more details.
14641439
*
1465-
* Another useful use case is when parsing individual process
1466-
* arguments or individual environment variables navigating
1467-
* *current*\ **->mm->arg_start** and *current*\
1468-
* **->mm->env_start**: using this helper and the return value,
1469-
* one can quickly iterate at the right offset of the memory area.
1440+
* Generally, use bpf_probe_read_user_str() or bpf_probe_read_kernel_str()
1441+
* instead.
14701442
* Return
14711443
* On success, the strictly positive length of the string,
14721444
* including the trailing NUL character. On error, a negative
@@ -2777,6 +2749,72 @@ union bpf_attr {
27772749
* restricted to raw_tracepoint bpf programs.
27782750
* Return
27792751
* 0 on success, or a negative error in case of failure.
2752+
*
2753+
* int bpf_probe_read_user(void *dst, u32 size, const void *unsafe_ptr)
2754+
* Description
2755+
* Safely attempt to read *size* bytes from user space address
2756+
* *unsafe_ptr* and store the data in *dst*.
2757+
* Return
2758+
* 0 on success, or a negative error in case of failure.
2759+
*
2760+
* int bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
2761+
* Description
2762+
* Safely attempt to read *size* bytes from kernel space address
2763+
* *unsafe_ptr* and store the data in *dst*.
2764+
* Return
2765+
* 0 on success, or a negative error in case of failure.
2766+
*
2767+
* int bpf_probe_read_user_str(void *dst, u32 size, const void *unsafe_ptr)
2768+
* Description
2769+
* Copy a NUL terminated string from an unsafe user address
2770+
* *unsafe_ptr* to *dst*. The *size* should include the
2771+
* terminating NUL byte. In case the string length is smaller than
2772+
* *size*, the target is not padded with further NUL bytes. If the
2773+
* string length is larger than *size*, just *size*-1 bytes are
2774+
* copied and the last byte is set to NUL.
2775+
*
2776+
* On success, the length of the copied string is returned. This
2777+
* makes this helper useful in tracing programs for reading
2778+
* strings, and more importantly to get its length at runtime. See
2779+
* the following snippet:
2780+
*
2781+
* ::
2782+
*
2783+
* SEC("kprobe/sys_open")
2784+
* void bpf_sys_open(struct pt_regs *ctx)
2785+
* {
2786+
* char buf[PATHLEN]; // PATHLEN is defined to 256
2787+
* int res = bpf_probe_read_user_str(buf, sizeof(buf),
2788+
* ctx->di);
2789+
*
2790+
* // Consume buf, for example push it to
2791+
* // userspace via bpf_perf_event_output(); we
2792+
* // can use res (the string length) as event
2793+
* // size, after checking its boundaries.
2794+
* }
2795+
*
2796+
* In comparison, using **bpf_probe_read_user()** helper here
2797+
* instead to read the string would require to estimate the length
2798+
* at compile time, and would often result in copying more memory
2799+
* than necessary.
2800+
*
2801+
* Another useful use case is when parsing individual process
2802+
* arguments or individual environment variables navigating
2803+
* *current*\ **->mm->arg_start** and *current*\
2804+
* **->mm->env_start**: using this helper and the return value,
2805+
* one can quickly iterate at the right offset of the memory area.
2806+
* Return
2807+
* On success, the strictly positive length of the string,
2808+
* including the trailing NUL character. On error, a negative
2809+
* value.
2810+
*
2811+
* int bpf_probe_read_kernel_str(void *dst, u32 size, const void *unsafe_ptr)
2812+
* Description
2813+
* Copy a NUL terminated string from an unsafe kernel address *unsafe_ptr*
2814+
* to *dst*. Same semantics as with bpf_probe_read_user_str() apply.
2815+
* Return
2816+
* On success, the strictly positive length of the string, including
2817+
* the trailing NUL character. On error, a negative value.
27802818
*/
27812819
#define __BPF_FUNC_MAPPER(FN) \
27822820
FN(unspec), \
@@ -2890,7 +2928,11 @@ union bpf_attr {
28902928
FN(sk_storage_delete), \
28912929
FN(send_signal), \
28922930
FN(tcp_gen_syncookie), \
2893-
FN(skb_output),
2931+
FN(skb_output), \
2932+
FN(probe_read_user), \
2933+
FN(probe_read_kernel), \
2934+
FN(probe_read_user_str), \
2935+
FN(probe_read_kernel_str),
28942936

28952937
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
28962938
* function eBPF program intends to call

kernel/trace/bpf_trace.c

+135-46
Original file line numberDiff line numberDiff line change
@@ -138,24 +138,140 @@ static const struct bpf_func_proto bpf_override_return_proto = {
138138
};
139139
#endif
140140

141-
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
141+
BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
142+
const void __user *, unsafe_ptr)
142143
{
143-
int ret;
144+
int ret = probe_user_read(dst, unsafe_ptr, size);
144145

145-
ret = security_locked_down(LOCKDOWN_BPF_READ);
146-
if (ret < 0)
147-
goto out;
146+
if (unlikely(ret < 0))
147+
memset(dst, 0, size);
148+
149+
return ret;
150+
}
151+
152+
static const struct bpf_func_proto bpf_probe_read_user_proto = {
153+
.func = bpf_probe_read_user,
154+
.gpl_only = true,
155+
.ret_type = RET_INTEGER,
156+
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
157+
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
158+
.arg3_type = ARG_ANYTHING,
159+
};
160+
161+
BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
162+
const void __user *, unsafe_ptr)
163+
{
164+
int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
165+
166+
if (unlikely(ret < 0))
167+
memset(dst, 0, size);
168+
169+
return ret;
170+
}
171+
172+
static const struct bpf_func_proto bpf_probe_read_user_str_proto = {
173+
.func = bpf_probe_read_user_str,
174+
.gpl_only = true,
175+
.ret_type = RET_INTEGER,
176+
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
177+
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
178+
.arg3_type = ARG_ANYTHING,
179+
};
148180

149-
ret = probe_kernel_read(dst, unsafe_ptr, size);
181+
static __always_inline int
182+
bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
183+
const bool compat)
184+
{
185+
int ret = security_locked_down(LOCKDOWN_BPF_READ);
186+
187+
if (unlikely(ret < 0))
188+
goto out;
189+
ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
190+
probe_kernel_read_strict(dst, unsafe_ptr, size);
150191
if (unlikely(ret < 0))
151192
out:
152193
memset(dst, 0, size);
194+
return ret;
195+
}
196+
197+
BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
198+
const void *, unsafe_ptr)
199+
{
200+
return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, false);
201+
}
202+
203+
static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
204+
.func = bpf_probe_read_kernel,
205+
.gpl_only = true,
206+
.ret_type = RET_INTEGER,
207+
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
208+
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
209+
.arg3_type = ARG_ANYTHING,
210+
};
211+
212+
BPF_CALL_3(bpf_probe_read_compat, void *, dst, u32, size,
213+
const void *, unsafe_ptr)
214+
{
215+
return bpf_probe_read_kernel_common(dst, size, unsafe_ptr, true);
216+
}
153217

218+
static const struct bpf_func_proto bpf_probe_read_compat_proto = {
219+
.func = bpf_probe_read_compat,
220+
.gpl_only = true,
221+
.ret_type = RET_INTEGER,
222+
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
223+
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
224+
.arg3_type = ARG_ANYTHING,
225+
};
226+
227+
static __always_inline int
228+
bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
229+
const bool compat)
230+
{
231+
int ret = security_locked_down(LOCKDOWN_BPF_READ);
232+
233+
if (unlikely(ret < 0))
234+
goto out;
235+
/*
236+
* The strncpy_from_unsafe_*() call will likely not fill the entire
237+
* buffer, but that's okay in this circumstance as we're probing
238+
* arbitrary memory anyway similar to bpf_probe_read_*() and might
239+
* as well probe the stack. Thus, memory is explicitly cleared
240+
* only in error case, so that improper users ignoring return
241+
* code altogether don't copy garbage; otherwise length of string
242+
* is returned that can be used for bpf_perf_event_output() et al.
243+
*/
244+
ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
245+
strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
246+
if (unlikely(ret < 0))
247+
out:
248+
memset(dst, 0, size);
154249
return ret;
155250
}
156251

157-
static const struct bpf_func_proto bpf_probe_read_proto = {
158-
.func = bpf_probe_read,
252+
BPF_CALL_3(bpf_probe_read_kernel_str, void *, dst, u32, size,
253+
const void *, unsafe_ptr)
254+
{
255+
return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, false);
256+
}
257+
258+
static const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
259+
.func = bpf_probe_read_kernel_str,
260+
.gpl_only = true,
261+
.ret_type = RET_INTEGER,
262+
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
263+
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
264+
.arg3_type = ARG_ANYTHING,
265+
};
266+
267+
BPF_CALL_3(bpf_probe_read_compat_str, void *, dst, u32, size,
268+
const void *, unsafe_ptr)
269+
{
270+
return bpf_probe_read_kernel_str_common(dst, size, unsafe_ptr, true);
271+
}
272+
273+
static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
274+
.func = bpf_probe_read_compat_str,
159275
.gpl_only = true,
160276
.ret_type = RET_INTEGER,
161277
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
@@ -583,41 +699,6 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
583699
.arg2_type = ARG_ANYTHING,
584700
};
585701

586-
BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
587-
const void *, unsafe_ptr)
588-
{
589-
int ret;
590-
591-
ret = security_locked_down(LOCKDOWN_BPF_READ);
592-
if (ret < 0)
593-
goto out;
594-
595-
/*
596-
* The strncpy_from_unsafe() call will likely not fill the entire
597-
* buffer, but that's okay in this circumstance as we're probing
598-
* arbitrary memory anyway similar to bpf_probe_read() and might
599-
* as well probe the stack. Thus, memory is explicitly cleared
600-
* only in error case, so that improper users ignoring return
601-
* code altogether don't copy garbage; otherwise length of string
602-
* is returned that can be used for bpf_perf_event_output() et al.
603-
*/
604-
ret = strncpy_from_unsafe(dst, unsafe_ptr, size);
605-
if (unlikely(ret < 0))
606-
out:
607-
memset(dst, 0, size);
608-
609-
return ret;
610-
}
611-
612-
static const struct bpf_func_proto bpf_probe_read_str_proto = {
613-
.func = bpf_probe_read_str,
614-
.gpl_only = true,
615-
.ret_type = RET_INTEGER,
616-
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
617-
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
618-
.arg3_type = ARG_ANYTHING,
619-
};
620-
621702
struct send_signal_irq_work {
622703
struct irq_work irq_work;
623704
struct task_struct *task;
@@ -697,8 +778,6 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
697778
return &bpf_map_pop_elem_proto;
698779
case BPF_FUNC_map_peek_elem:
699780
return &bpf_map_peek_elem_proto;
700-
case BPF_FUNC_probe_read:
701-
return &bpf_probe_read_proto;
702781
case BPF_FUNC_ktime_get_ns:
703782
return &bpf_ktime_get_ns_proto;
704783
case BPF_FUNC_tail_call:
@@ -725,8 +804,18 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
725804
return &bpf_current_task_under_cgroup_proto;
726805
case BPF_FUNC_get_prandom_u32:
727806
return &bpf_get_prandom_u32_proto;
807+
case BPF_FUNC_probe_read_user:
808+
return &bpf_probe_read_user_proto;
809+
case BPF_FUNC_probe_read_kernel:
810+
return &bpf_probe_read_kernel_proto;
811+
case BPF_FUNC_probe_read:
812+
return &bpf_probe_read_compat_proto;
813+
case BPF_FUNC_probe_read_user_str:
814+
return &bpf_probe_read_user_str_proto;
815+
case BPF_FUNC_probe_read_kernel_str:
816+
return &bpf_probe_read_kernel_str_proto;
728817
case BPF_FUNC_probe_read_str:
729-
return &bpf_probe_read_str_proto;
818+
return &bpf_probe_read_compat_str_proto;
730819
#ifdef CONFIG_CGROUPS
731820
case BPF_FUNC_get_current_cgroup_id:
732821
return &bpf_get_current_cgroup_id_proto;

0 commit comments

Comments
 (0)