Race in PyUnicode_InternFromString under free-threading #128137

Closed
@hawkinsp

Bug report

Bug description:

Here's a race reported by ThreadSanitizer. I haven't been able to find a small reproducer for it, but the code does look racy on reading.

WARNING: ThreadSanitizer: data race (pid=1575489)
  Read of size 4 at 0x7fb14614ee00 by thread T130:
    #0 unicode_eq /usr/local/google/home/phawkins/p/cpython/Objects/stringlib/eq.h:13:9 (python3.13+0x25da7a) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #1 compare_unicode_unicode /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:1139:50 (python3.13+0x25da7a)
    #2 do_lookup /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:1063:23 (python3.13+0x25da7a)
    #3 unicodekeys_lookup_unicode /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:1148:12 (python3.13+0x25da7a)
    #4 _Py_dict_lookup /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:1259:22 (python3.13+0x25ea02) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #5 dict_setdefault_ref_lock_held /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:4282:21 (python3.13+0x26a5da) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #6 PyDict_SetDefaultRef /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:4332:11 (python3.13+0x26a221) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #7 intern_common /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15225:19 (python3.13+0x34a02c) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #8 _PyUnicode_InternMortal /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15286:10 (python3.13+0x34a40c) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #9 PyUnicode_InternFromString /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15322:5 (python3.13+0x34a40c)
    #10 nanobind::detail::nb_type_new(nanobind::detail::type_init_data const*) nb_type.cpp (xla_extension.so+0xdcd28aa) (BuildId: e484e79ecc5a6e10)
...

  Previous write of size 4 at 0x7fb14614ee00 by thread T137:
    #0 immortalize_interned /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15141:34 (python3.13+0x34a20e) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #1 intern_common /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15270:9 (python3.13+0x34a20e)
    #2 _PyUnicode_InternMortal /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15286:10 (python3.13+0x34a40c) (BuildId: 9c1c16fb1bb8a435fa6fa4c6944da5d41f654e96)
    #3 PyUnicode_InternFromString /usr/local/google/home/phawkins/p/cpython/Objects/unicodeobject.c:15322:5 (python3.13+0x34a40c)
    #4 nanobind::detail::nb_type_new(nanobind::detail::type_init_data const*) nb_type.cpp (xla_extension.so+0xdcd28aa) (BuildId: e484e79ecc5a6e10)
...

I think the scenario here is:

  • Threads A and B are interning strings at the same time.
  • Thread A has succeeded at inserting its string into the intern dictionary and is at the end of intern_common, immortalizing the string, which sets the .interned field on the string.
  • Thread B is now trying to intern a string and, during the intern dictionary lookup, performs an equality test against the string thread A inserted, which reads that string's .kind field.

The .kind and .interned fields are bitfields in the same word, so this is a race, and I can't see any synchronization or atomicity that would prevent it.
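
To make the "same word" point concrete, here is a minimal sketch in C. The struct `str_state` and the helpers `raw_word`, `mark_interned`, and `read_kind` are hypothetical stand-ins invented for illustration; they are loosely modeled on the state bits described above and are not CPython's actual definitions. The sketch assumes a common ABI where adjacent `unsigned int` bitfields are packed into one 32-bit storage unit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the string state bits (NOT CPython's real layout). */
struct str_state {
    unsigned int interned : 2;
    unsigned int kind : 3;
    unsigned int compact : 1;
    unsigned int ascii : 1;
};

/* Assumption: the ABI packs these bitfields into a single 32-bit word. */
_Static_assert(sizeof(struct str_state) == sizeof(uint32_t),
               "all bitfields are expected to share one 32-bit word");

/* Copy out the raw storage word that backs every bitfield. */
static uint32_t raw_word(const struct str_state *s) {
    uint32_t w;
    memcpy(&w, s, sizeof w);
    return w;
}

/* Thread A's side: a plain store to .interned typically compiles to a
 * non-atomic read-modify-write of the shared word. */
static void mark_interned(struct str_state *s, unsigned int value) {
    s->interned = value;
}

/* Thread B's side: a plain load of .kind reads that same word. */
static unsigned int read_kind(const struct str_state *s) {
    return s->kind;
}
```

Single-threaded, the value of `kind` is unaffected by `mark_interned`, but the store still changes the word that `read_kind` loads. When the two calls run on different threads with no synchronization, that is a data race in the C memory model, which would be consistent with the 4-byte read and write on the same address in the report above.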

Perhaps we need to hold the critical section on the intern dictionary longer, until immortalization is complete?
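
As a rough sketch of that idea, and not of CPython's actual code or any eventual fix, the following C fragment keeps one lock held across both the lookup and the flag update. Everything here is hypothetical: `struct entry`, `table`, `table_lock`, and `intern` are invented names, and a pthread mutex stands in for the critical section on the intern dictionary.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical miniature string object; the flags share one word. */
struct entry {
    const char *text;
    unsigned int interned : 2;
    unsigned int kind : 3;
};

#define TABLE_CAP 16
static struct entry *table[TABLE_CAP];   /* toy "intern dictionary" */
static size_t table_len;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the canonical entry for e, or NULL if the toy table is full. */
static struct entry *intern(struct entry *e) {
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < table_len; i++) {
        /* The equality test reads .kind while the lock is held. */
        if (table[i]->kind == e->kind &&
            strcmp(table[i]->text, e->text) == 0) {
            struct entry *found = table[i];
            pthread_mutex_unlock(&table_lock);
            return found;
        }
    }
    if (table_len == TABLE_CAP) {
        pthread_mutex_unlock(&table_lock);
        return NULL;
    }
    table[table_len++] = e;
    /* The flag is set before the lock is released, so no lookup can
     * observe the entry while its flag word is being rewritten. */
    e->interned = 1;
    pthread_mutex_unlock(&table_lock);
    return e;
}
```

This only removes the race for accesses that go through the same lock. Any code that reads the flag word without taking it would still race with the write, so an approach like this depends on where else those fields are read.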

CPython versions tested on:

3.13

Operating systems tested on:

Linux
