gh-144356: Avoid races when computing set_iterator.__length_hint__ under no-gil
#144357
base: main
setiter_len() was reading so->used without atomic access while concurrent mutations update it atomically under Py_GIL_DISABLED. Use an atomic load for so->used to avoid a data race. This preserves the existing semantics of __length_hint__ while making the access thread-safe. Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
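For context, the existing semantics that the description says are preserved can be observed from Python. This is a small single-threaded sketch based on the condition visible in the diff (`si->si_used == si->si_set->used`); the variable names are illustrative only.

```python
# Single-threaded illustration of set_iterator.__length_hint__ semantics.
s = {1, 2, 3}
it = iter(s)

# Fresh iterator: the hint equals the number of remaining items.
hint_fresh = it.__length_hint__()

# After consuming one item, the hint drops by one.
next(it)
hint_after_next = it.__length_hint__()

# Once the set's size differs from the size recorded when the iterator
# was created, the hint falls back to 0.
s.add(4)
hint_after_mutation = it.__length_hint__()
```

The race discussed in this PR concerns the size comparison in the last step, which reads the set's size while another thread may be updating it.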
    for t in threads:
        t.start()

    stop.set()
This means the threads will stop right after they have started. I would prefer the pattern used in some other tests in this file: set a constant NUM_LOOPS, chosen so that the test runs in under 0.1 seconds but still performs a decent number of mutations.
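A minimal sketch of the bounded-loop pattern suggested above, assuming a fixed iteration count instead of a stop event. The values of `NUM_LOOPS` and `NUM_MUTATORS`, and the helper name `run_bounded_race`, are hypothetical and not taken from the actual test file.

```python
import threading

NUM_LOOPS = 1_000   # hypothetical value; tune so the run stays well under 0.1 s
NUM_MUTATORS = 2    # hypothetical number of mutating threads


def run_bounded_race():
    s = set(range(100))
    it = iter(s)  # never advanced, so a matching size gives a hint of 100
    barrier = threading.Barrier(NUM_MUTATORS + 1)
    hints = set()

    def mutate(key):
        barrier.wait()
        # A fixed number of mutations guarantees real overlap with the reader.
        for _ in range(NUM_LOOPS):
            s.add(key)
            s.discard(key)

    threads = [
        threading.Thread(target=mutate, args=(1000 + n,))
        for n in range(NUM_MUTATORS)
    ]
    for t in threads:
        t.start()

    barrier.wait()
    for _ in range(NUM_LOOPS):
        hints.add(it.__length_hint__())

    for t in threads:
        t.join()
    return hints


observed = run_bounded_race()
```

Because every thread runs a fixed number of iterations, the mutators cannot be stopped before they have done any work, which is the problem with calling `stop.set()` immediately after `start()`.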
Objects/setobject.c
Outdated
    setiterobject *si = (setiterobject*)op;
    Py_ssize_t len = 0;
    if (si->si_set != NULL && si->si_used == si->si_set->used)
    PySetObject *so = si->si_set;
Here so is a borrowed reference to si->si_set. But si->si_set can be cleared in setiter_iternext (if the iterator is exhausted) outside the critical section.
This is a different mechanism than the corresponding issue, so maybe something to address in another PR. But solving both together is something to consider.
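The clearing of `si->si_set` on exhaustion can be observed indirectly from Python. The sketch below relies on CPython's reference-counting behaviour (an implementation detail) and uses a hypothetical `TrackedSet` subclass purely so that a weak reference can be taken.

```python
import weakref


class TrackedSet(set):
    """Hypothetical subclass, used only to attach a weak reference."""


s = TrackedSet({1, 2})
ref = weakref.ref(s)
it = iter(s)
del s  # the iterator now holds the only strong reference to the set

alive_before = ref() is not None  # set is kept alive by the iterator

remaining = list(it)  # exhausts the iterator

# On CPython the exhausted iterator has released the set, so it is freed.
alive_after = ref() is not None
```

If another thread is inside `setiter_len()` with a borrowed pointer at the moment this release happens, that pointer may refer to a freed set, which is the hazard described above.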
Good catch. Thanks a lot.
Title changed: "setiter_len() under no-gil" to "set_iterator.__length_hint__ under no-gil"
Thanks for the review. I’ve decided to address both issues in this PR. I also added a corresponding test case for the issue you pointed out.
    setiterobject *si = (setiterobject*)op;
    Py_ssize_t len = 0;
    if (si->si_set != NULL && si->si_used == si->si_set->used)
    #ifdef Py_GIL_DISABLED
This might work for setiter_len, but setiter_iternext itself is not yet thread-safe (also because it sets si->si_set to NULL).
For several other iterators, the approach is to keep the reference si->si_set but use another attribute to signal exhaustion of the iterator, for example in itertools.cycle or reversed.
Note: I tried creating a minimal example where concurrent iteration fails, but I have not succeeded yet (the example does not crash, although I have not run thread sanitizer on it yet).
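The "keep the reference, flag exhaustion separately" idea can be modelled in pure Python. This is a toy sketch, not CPython's C implementation: the class `FlaggedIterator`, its attributes, and the use of a `threading.Lock` as a stand-in for a critical section are all assumptions made for illustration.

```python
import threading


class FlaggedIterator:
    """Toy iterator that never drops its container reference."""

    def __init__(self, items):
        self._items = tuple(items)     # kept for the iterator's whole lifetime
        self._pos = 0
        self._exhausted = False        # separate exhaustion signal
        self._lock = threading.Lock()  # stand-in for a critical section

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            if self._exhausted or self._pos >= len(self._items):
                # Mark exhaustion instead of clearing self._items.
                self._exhausted = True
                raise StopIteration
            item = self._items[self._pos]
            self._pos += 1
            return item

    def __length_hint__(self):
        with self._lock:
            if self._exhausted:
                return 0
            return len(self._items) - self._pos


it = FlaggedIterator([1, 2])
hint_start = it.__length_hint__()
consumed = list(it)
hint_end = it.__length_hint__()
```

Since the container reference is never cleared, a concurrent `__length_hint__` call cannot observe a dangling or missing container; it only needs to check the flag.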
Test for concurrent iteration on set iterator
import unittest
from threading import Thread, Barrier


class TestSetIter(unittest.TestCase):

    def test_set_iter(self):
        """Test concurrent iteration over a set"""
        NUM_LOOPS = 10_000
        NUM_THREADS = 4
        for ii in range(NUM_LOOPS):
            if ii % 1000 == 0:
                print(f'test_set_iter {ii}')
            barrier = Barrier(NUM_THREADS)
            # make sure the underlying set is uniquely referenced by the iterator
            iterator = iter(set((1, 2,)))

            def worker():
                # wait until all threads are ready, then race on the shared iterator
                barrier.wait()
                while True:
                    iterator.__length_hint__()
                    try:
                        next(iterator)
                    except StopIteration:
                        break

            threads = [Thread(target=worker) for _ in range(NUM_THREADS)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            # an exhausted iterator must report a length hint of 0
            assert iterator.__length_hint__() == 0


if __name__ == "__main__":
    unittest.main()
Long log:
setiter_len() was reading so->used without atomic access while concurrent mutations update it atomically under Py_GIL_DISABLED.

In free-threaded builds, setiter_len() could race with concurrent set mutation and iterator exhaustion.

Use an atomic load for so->used to avoid a data race. This preserves the existing semantics of __length_hint__ while making the access thread-safe.

Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>