[MRG] Debug linux tests (segmentation fault with torch 2.12 on test and/or doc build)#815

Merged
rflamary merged 7 commits into master from debug_mem_linux
May 22, 2026

@rflamary
Collaborator

@rflamary rflamary commented May 21, 2026

Types of changes

This is an awful bug: all Linux tests and documentation builds fail with a cryptic segmentation fault. Looking at the error, it seems to come from a triton knob. Going back to pytorch 2.11 seems to fix the error, so we will pin that version until we can fix or report the problem with pytorch/triton (very hard to reproduce: all scripts and tests pass individually but fail when running the full test folder and the doc build).
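As a rough illustration of the version gate described above, here is a minimal sketch that decides whether an installed torch version falls in the range reported to crash. The names `FIRST_AFFECTED`, `parse_major_minor` and `torch_may_segfault` are hypothetical, and treating 2.12 as the first affected release is an assumption based only on the observation that 2.12 fails and 2.11 works. The actual change made in this PR is not shown here.

```python
import re

# Assumption: 2.12 is the first affected release. The PR only reports that
# 2.12 crashes and that going back to 2.11 avoids the crash.
FIRST_AFFECTED = (2, 12)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.12.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_may_segfault(version: str) -> bool:
    """Return True when the version is at or above the assumed affected release."""
    return parse_major_minor(version) >= FIRST_AFFECTED
```

A test session could call `torch_may_segfault(torch.__version__)` and emit a warning, while the pin itself would typically live in the CI requirements as a standard pip specifier such as `torch<2.12`.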

Typical error below, where tons of torch tests pass and then at one point something breaks:

test/test_ot.py::test_emd_emd2_types_devices[numpy] PASSED               [ 40%]
test/test_ot.py::test_emd_emd2_types_devices[jax] PASSED                 [ 40%]
test/test_ot.py::test_emd_emd2_types_devices[torch] PASSED               [ 40%]
test/test_ot.py::test_emd_emd2_types_devices[tf] PASSED                  [ 40%]
test/test_ot.py::test_emd_emd2_devices_tf PASSED                         [ 40%]
test/test_ot.py::test_emd2_gradients PASSED                              [ 40%]
test/test_ot.py::test_emd_emd2 PASSED                                    [ 40%]
test/test_ot.py::test_omp_emd2 PASSED                                    [ 40%]
test/test_ot.py::test_emd_empty PASSED                                   [ 40%]
test/test_ot.py::test_emd2_multi PASSED                                  [ 40%]
test/test_ot.py::test_lp_barycenter PASSED                               [ 40%]
test/test_ot.py::test_free_support_barycenter PASSED                     [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[numpy] PASSED     [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[jax] PASSED       [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[torch] PASSED     [ 40%]
test/test_ot.py::test_free_support_barycenter_backends[tf] PASSED        [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter PASSED         [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[numpy] PASSED [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[jax] PASSED [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[torch] PASSED [ 40%]
test/test_ot.py::test_generalised_free_support_barycenter_backends[tf] PASSED [ 40%]
test/test_ot.py::test_free_support_barycenter_generic_costs PASSED       [ 40%]
Fatal Python error: Segmentation fault

Current thread 0x00007fcccc40fb80 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1233 in create_module
  File "<frozen importlib._bootstrap>", line 573 in module_from_spec
  File "<frozen importlib._bootstrap>", line 676 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/knobs.py", line 15 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1232 in _handle_fromlist
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 11 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/runtime/__init__.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/__init__.py", line 8 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/utils/_triton.py", line 10 in has_triton_package
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 2716 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 44 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 54 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 62 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/aot_compile.py", line 17 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1232 in _handle_fromlist
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_dynamo/__init__.py", line 13 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/_compile.py", line 47 in inner
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/optim/optimizer.py", line 405 in __init__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/torch/optim/sgd.py", line 65 in __init__
  File "/home/runner/work/POT/POT/ot/lp/_barycenter_solvers.py", line 687 in ground_bary
  File "/home/runner/work/POT/POT/ot/lp/_barycenter_solvers.py", line 737 in free_support_barycenter_generic_costs
  File "/home/runner/work/POT/POT/test/test_ot.py", line 568 in test_free_support_barycenter_generic_costs_auto_ground_bary
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/python.py", line 1720 in runtest
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 353 in from_call
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/main.py", line 372 in _main
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/_pytest/main.py", line 318 in wrap_session
  ...

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cyutility, scipy._cyutility, scipy._lib._ccallback_c, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, charset_normalizer.md, charset_normalizer.cd, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._pcg64, numpy.random._generator, numpy.random._mt19937, numpy.random._philox, numpy.random._sfc64, numpy.random.mtrand, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._batched_linalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_schur_sqrtm, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._special_ufuncs, scipy.special._gufuncs, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, cuda.bindings._bindings.cyruntime_ptds, cuda.bindings._bindings.cyruntime, cuda.bindings.cyruntime, cuda.bindings.runtime, jaxlib.cpu_feature_guard, google._upb._message, requests.packages.charset_normalizer.md, requests.packages.chardet.md, requests.packages.charset_normalizer.cd, requests.packages.chardet.cd, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._npystrings, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5o, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5l, h5py._selector, PIL._imaging, kiwisolver._cext, sklearn.__check_build._check_build, 
scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpacklib, scipy.sparse.linalg._propack, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._hausdorff, scipy.spatial._distance_wrap, scipy.spatial.transform._rotation_cy, scipy.spatial.transform._rigid_transform_cy, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._slsqplib, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, scipy.stats._qmvnt_cy, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, sklearn._cyutility, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, ot.lp.emd_wrap, cvxopt.base, cvxopt.blas, cvxopt.lapack, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, 
sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._fast_dict, sklearn.cluster._hierarchical_fast, sklearn.cluster._k_means_common, sklearn.cluster._k_means_elkan, sklearn.cluster._k_means_lloyd, sklearn.cluster._k_means_minibatch, sklearn.cluster._dbscan_inner, sklearn.neighbors._partition_nodes, sklearn.neighbors._ball_tree, sklearn.neighbors._kd_tree, sklearn.utils.arrayfuncs, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.linear_model._cd_fast, _loss, sklearn._loss._loss, sklearn.linear_model._sag_fast, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.decomposition._online_lda_fast, sklearn.decomposition._cdnmf_fast, sklearn.cluster._hdbscan._tree, sklearn.cluster._hdbscan._linkage, sklearn.cluster._hdbscan._reachability, sklearn._isotonic, sklearn.tree._utils, sklearn.tree._tree, sklearn.tree._partitioner, sklearn.tree._splitter, sklearn.tree._criterion, sklearn.neighbors._quad_tree, sklearn.manifold._barnes_hut_tsne, sklearn.manifold._utils, ot.partial.partial_cython, _cvxcore, scs._scs_direct, cvxopt.glpk, markupsafe._speedups (total: 213)
/home/runner/work/_temp/56add7cf-2359-408c-a37d-a54f563bc838.sh: line 1: 15989 Segmentation fault      (core dumped) python -m pytest --durations=20 -v test/ ot/ --doctest-modules --color=yes --cov=./ --cov-report=xml
test/test_ot.py::test_free_support_barycenter_generic_costs_auto_ground_bary 
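Most frames in the dump above are import machinery (`<frozen importlib...>`), which hides the actual call chain. The following is a minimal sketch, not part of this PR, of a helper that keeps only the frames outside frozen importlib. The names `FRAME_RE`, `user_frames` and `SAMPLE` are hypothetical; the sample lines are copied from the log above.

```python
import re

# Matches frame lines of the form:   File "<path>", line <n> in <function>
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>.+)$')

# A few frame lines copied from the dump in this PR.
SAMPLE = '''Current thread 0x00007fcccc40fb80 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1233 in create_module
  File "/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/triton/knobs.py", line 15 in <module>
  File "/home/runner/work/POT/POT/ot/lp/_barycenter_solvers.py", line 687 in ground_bary
'''


def user_frames(traceback_text: str) -> list[tuple[str, int, str]]:
    """Return (path, line, function) for every frame outside frozen importlib."""
    frames = []
    for raw in traceback_text.splitlines():
        match = FRAME_RE.match(raw)
        if match is None:
            continue  # not a frame line (thread header, "...", module list)
        path = match.group("path")
        if path.startswith("<frozen importlib"):
            continue  # import machinery, not informative here
        frames.append((path, int(match.group("line")), match.group("func").strip()))
    return frames


if __name__ == "__main__":
    for path, line, func in user_frames(SAMPLE):
        print(f"{path}:{line} in {func}")
```

Applied to the full dump, the first remaining frame (the most recent call) is `triton/knobs.py`, line 15, and the chain leads back through `torch/_dynamo` and `torch/optim/sgd.py` to `ground_bary` in `ot/lp/_barycenter_solvers.py`.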

Motivation and context / Related issue

How has this been tested (if it applies)

PR checklist

  • I have read the CONTRIBUTING document.
  • The documentation is up-to-date with the changes I made (check build artifacts).
  • All tests passed, and additional code has been covered with new tests.
  • I have added the PR and Issue fix to the RELEASES.md file.

@github-actions github-actions Bot added the CI label May 21, 2026
@rflamary rflamary changed the title [WIP] Debug linux test core dumped [WIP] Debug linux tests (torch_geometric bug) May 22, 2026
@rflamary rflamary changed the title [WIP] Debug linux tests (torch_geometric bug) [MRG] Debug linux tests (torch_geometric bug) May 22, 2026
@rflamary rflamary changed the title [MRG] Debug linux tests (torch_geometric bug) [MRG] Debug linux tests (segmentation fault with torch 2.12 on test and/or doc build) May 22, 2026
@rflamary rflamary merged commit fbdfe2e into master May 22, 2026
16 of 18 checks passed
@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.86%. Comparing base (41a4d57) to head (7f7b864).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #815      +/-   ##
==========================================
- Coverage   96.87%   96.86%   -0.02%     
==========================================
  Files         113      113              
  Lines       23062    23067       +5     
==========================================
+ Hits        22342    22344       +2     
- Misses        720      723       +3     
