
WIP: Improve break mesh by blocks #14456

Closed

Conversation

arovinelli
Contributor

This fixes the debug mode tests for #12033.

@moosebuild
Contributor

moosebuild commented Nov 27, 2019

Job Documentation on fc9c23e wanted to post the following:

View the site here

This comment will be updated on new commits.

@arovinelli arovinelli changed the title improve break mesh by blocks [WIP] improve break mesh by blocks Nov 27, 2019
@arovinelli arovinelli changed the title [WIP] improve break mesh by blocks Improve break mesh by blocks Dec 2, 2019
@arovinelli
Contributor Author

arovinelli commented Dec 5, 2019

@roystgnr I don't want to annoy you too much with this, but there is something I can't really figure out. Most likely I'm missing something in the logic somewhere.
The new node id subdomain map looks correct. However, there is something odd: I always get the following error in debug distributed mode, coming from void Elem::make_links_to_me_local(unsigned int n)

It seems that when it's trying to make a remote element local, it can't find the neighbor, even though the connectivity and neighbor maps are right and the same on each process.

nn = 4
neigh->n_sides() = 4

/home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/geom/elem.C, line 1038

You can try running generate_distributed_exodus.i and then break_mesh_2D_distributed.i.
I'm really confused at this point 😣

@roystgnr
Contributor

roystgnr commented Dec 5, 2019

Sorry, but what's the command line sequence I want to run here? I tried

mpirun -np 4 ../../../moose_test-dbg --mesh-only -i generate_distributed_exodus.i
mpirun -np 4 ../../../moose_test-dbg --mesh-only -i break_mesh_2D_distributed.i

and I see a bunch of debugging output but no ...

Oh, wait, I have to add --distributed-mesh. Yeah, I can replicate this. I'll see if I can figure it out.

@arovinelli
Contributor Author

> Sorry, but what's the command line sequence I want to run here? I tried
>
> mpirun -np 4 ../../../moose_test-dbg --mesh-only -i generate_distributed_exodus.i
> mpirun -np 4 ../../../moose_test-dbg --mesh-only -i break_mesh_2D_distributed.i
>
> and I see a bunch of debugging output but no ...
>
> Oh, wait, I have to add --distributed-mesh. Yeah, I can replicate this. I'll see if I can figure it out.

@roystgnr Actually, you just want to run break_mesh_2D_distributed.i in distributed mode; the other input just generates an exodus file with a mesh that is set up to have at least one remote subdomain.
The actual command sequence I used is
mpirun -np 4 ../../../moose_test-dbg -i break_mesh_2D_distributed.i --mesh-only --distributed-mesh --keep-cout --redirect-stdout

--keep-cout allows stdout from all ranks, and --redirect-stdout generates a text file for each MPI process (nothing is printed to screen), so that you don't go crazy while checking that the output of each process is correct.

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

> the connectivity and neighbor map is right and the same on each process.

Doesn't look right to me. On the failing process, elem 6 ("this") has neighbor link 0 pointing to elem 5 ("neigh"), when that neighbor link should be null, because the two points they share are now both duplicated nodes along the slit between them. Looks like the other processors all have the same problem, but on the failing processor it's actually triggering during a mesh serialization because neigh->neighbor_ptr(2) is a remote_elem link rather than a link back to elem 6, so the neighbor links are actually inconsistent, not just wrong.
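
For illustration, a minimal sketch of the kind of symmetry check that exposes this, assuming libMesh's MeshBase/Elem API (the function name is illustrative, not code from this PR):

```cpp
#include "libmesh/mesh_base.h"
#include "libmesh/elem.h"
#include "libmesh/remote_elem.h"

using namespace libMesh;

// Debugging aid: for every local element, each non-null, non-remote
// neighbor link should be answered by a link back to us. The failing
// case above has no such link: elem 6 points at elem 5, but elem 5's
// corresponding side points at remote_elem on the failing processor.
void check_neighbor_symmetry(const MeshBase & mesh)
{
  for (const Elem * elem : mesh.active_local_element_ptr_range())
    for (unsigned int s = 0; s != elem->n_sides(); ++s)
      {
        const Elem * neigh = elem->neighbor_ptr(s);
        if (!neigh || neigh == remote_elem)
          continue;

        bool found_link_back = false;
        for (unsigned int ns = 0; ns != neigh->n_sides(); ++ns)
          if (neigh->neighbor_ptr(ns) == elem)
            found_link_back = true;

        if (!found_link_back)
          libmesh_error_msg("Inconsistent neighbor links between elements "
                            << elem->id() << " and " << neigh->id());
      }
}
```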

I'm guessing BreakMeshByBlockGenerator hasn't actually ever been breaking neighbor links when it splits apart two neighbors, but nothing's been really looking at the neighbor links before (at least not before ReplicatedMesh prepare_for_use could fix them) and so it hasn't caused problems until now?

@arovinelli
Contributor Author

> > the connectivity and neighbor map is right and the same on each process.
>
> Doesn't look right to me. On the failing process, elem 6 ("this") has neighbor link 0 pointing to elem 5 ("neigh"), when that neighbor link should be null, because the two points they share are now both duplicated nodes along the slit between them. Looks like the other processors all have the same problem, but on the failing processor it's actually triggering during a mesh serialization because neigh->neighbor_ptr(2) is a remote_elem link rather than a link back to elem 6, so the neighbor links are actually inconsistent, not just wrong.

Just to clarify: the aim of this modifier is to split the mesh by adding nodes while leaving the neighbor map intact, which means that two adjacent elements with different block ids will not share the same nodes anymore, but they will still be neighbors, so that I can use a DG approach between them.
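
A minimal sketch of that splitting idea, assuming libMesh's MeshBase/Elem API (the function, and how the element/node pair is found, are illustrative):

```cpp
#include "libmesh/mesh_base.h"
#include "libmesh/elem.h"
#include "libmesh/node.h"

using namespace libMesh;

// Detach `elem` from the node at local index `ln` that it shares with a
// neighbor across the new interface, without touching neighbor pointers.
void duplicate_node_for_elem(MeshBase & mesh, Elem * elem, unsigned int ln)
{
  const Node * old_node = elem->node_ptr(ln);

  // Add a geometrically coincident copy of the node ...
  Node * new_node = mesh.add_point(*old_node,
                                   DofObject::invalid_id,
                                   old_node->processor_id());

  // ... and re-point the element at it. The neighbor links are left
  // intact on purpose, so the two elements remain "neighbors" for DG
  // purposes even though they no longer share these nodes.
  elem->set_node(ln) = new_node;
}
```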

> I'm guessing BreakMeshByBlockGenerator hasn't actually ever been breaking neighbor links when it splits apart two neighbors, but nothing's been really looking at the neighbor links before (at least not before ReplicatedMesh prepare_for_use could fix them) and so it hasn't caused problems until now?

No, because we want the links to remain intact.
Any idea on how to fix this?

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

Oh, yikes. We want to maintain neighbor links between elements that aren't technically neighbors? We surely can support that in libMesh, but currently we don't - so to whatever extent it's been working in the past, that's been dumb luck.

The most likely culprit here might be a find_neighbors() call, then? That typically strips down old neighbor links and then rebuilds them based on identifying patterns of shared nodes, and off the top of my head it's the only thing I can think of that isn't trying to respect existing neighbor links.

The failure is happening when a MeshSerializer gathers the mesh onto proc 0 for Exodus output. Let me try looking at the state of things pre-gather and see if anything looks fishy. It's also possible that the weird non-neighbor-neighbors situation here is a red herring and you've just managed to trip some unrelated corner case.

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

Oh, hell, I get it now. It's not an unrelated bug, but it's also not a failing in find_neighbors.

Inside make_links_to_me_local we're relying on a comparison between side_ptr results (shims to Side elements) to determine which of the neighbor's sides matches our side. But if they're not true neighbors then there is no match. If I receive a copy of previously-remote element A, and the information about that copy says its 3rd neighbor is element B, and element B currently has two remote_elem neighbors, then I have no way of knowing, aside from looking for matching nodes, which of those two remote_elem neighbors is supposed to be A!

I'm not sure what the right fix for this is. Maybe communicate the side indices of the return neighbor pointers when packing and unpacking an Elem? That'd bump up communication size and code size, and I wouldn't dare commit it without doing a whole DistributedMesh sweep in CI, but I can't seem to think of any other option.

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

Ooh, look. We're comparing centroids of sides in BoundaryInfo::get_side_and_node_maps. That might work here too.
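
A sketch of that approach for the side-matching problem above, assuming libMesh's Elem::side_ptr and the Elem::centroid of this era (the function name is illustrative):

```cpp
#include "libmesh/elem.h"
#include "libmesh/libmesh.h"

using namespace libMesh;

// Find which side of `neigh` geometrically matches side `s` of `elem` by
// comparing side centroids with a tolerance, in the spirit of
// BoundaryInfo::get_side_and_node_maps. Unlike node-id comparison, this
// still matches the coincident sides across a BreakMeshByBlock slit.
unsigned int matching_side_by_centroid(const Elem * elem, unsigned int s,
                                       const Elem * neigh,
                                       Real tol = TOLERANCE)
{
  const Point our_centroid = elem->side_ptr(s)->centroid();

  for (unsigned int ns = 0; ns != neigh->n_sides(); ++ns)
    if ((neigh->side_ptr(ns)->centroid() - our_centroid).norm() < tol)
      return ns;

  return invalid_uint;  // no geometric match on any side
}
```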

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

There's other places where we're doing direct side node comparison, though. AbaqusIO, GmshIO, UNVIO, InfElemBuilder, MeshModification::all_tri, ReplicatedMesh::stitching_helper, find_neighbors of course... there may be other anticipated breakage in your future.

I think in just the Elem unpacking case it'll be safe enough and quick enough to compare centroids, though. Let me try that. Unless you guys ever combine fake neighbor pointers with sliver elements too? In that case I think the safest thing for a nuclear design code to do would be libmesh_error("Just freeze in the dark, Bryukhanov").

@arovinelli
Contributor Author

> We want to maintain neighbor links between elements that aren't technically neighbors?

Yes, one of the aims of BreakMeshByBlocks is to allow cohesive zone modeling with a DG approach (breaking the mesh allows for a jump in the displacement field, but we need the neighbor information to compute it).

> If I receive a copy of previously-remote element A, and the information about that copy says its 3rd neighbor is element B, and element B currently has two remote_elem neighbors, then I have no way of knowing, aside from looking for matching nodes, which of those two remote_elem neighbors is supposed to be A!
>
> I'm not sure what the right fix for this is. Maybe communicate the side indices of the return neighbor pointers when packing and unpacking an Elem? That'd bump up communication size and code size, and I wouldn't dare commit it without doing a whole DistributedMesh sweep in CI, but I can't seem to think of any other option.

I see your point here; I don't really know how to help with this. If this is done only in the mesh generation phase (which doesn't seem to be the case), comparing node locations or face centroids is OK. But if it's done later (e.g. every time we dump files) then it will not work, as the duplicated nodes might have different coordinates and the faces might be displaced. (Can you keep a copy of the undeformed mesh?)

> I think in just the Elem unpacking case it'll be safe enough and quick enough to compare centroids, though. Let me try that. Unless you guys ever combine fake neighbor pointers with sliver elements too? In that case I think the safest thing for a nuclear design code to do would be libmesh_error("Just freeze in the dark, Bryukhanov").

I"m not sure what you mean by a silver element (an element with a very bad aspect ratio??). Why this should be a problem?
For the type of application we envisioned to use this method elements should all have nice geometries. In general we use

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

Give libMesh/libmesh#2362 a try?

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

> then it will not work, as the duplicated nodes might have different coordinates and the faces might be displaced

Wait, are you planning to move these element pairs apart after displacement? I threw in a tolerance for the floating point comparison to handle sides computing centroids in different orders, but that's only going to work if the elements are O(epsilon) apart, not O(1).
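
Concretely, even a tolerance scaled by element size only covers round-off, as this illustrative helper (assumed names, same API assumptions as the sketch above) makes explicit:

```cpp
#include "libmesh/elem.h"

using namespace libMesh;

// Two sides displaced a finite distance apart will fail this test, and
// that's the point: the tolerance absorbs O(epsilon) round-off (e.g.
// vertices summed in different orders), not elements that have moved.
bool sides_match(const Elem * a, unsigned int sa,
                 const Elem * b, unsigned int sb,
                 Real rel_tol = TOLERANCE)
{
  const Point ca = a->side_ptr(sa)->centroid();
  const Point cb = b->side_ptr(sb)->centroid();
  return (ca - cb).norm() < rel_tol * a->hmin();
}
```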

I"m not sure what you mean by a silver element (an element with a very bad aspect ratio??). Why this should be a problem?

If two faces of the same element collapse down onto each other, then they can have the same centroid.

@arovinelli
Contributor Author

> Oh, yikes. We want to maintain neighbor links between elements that aren't technically neighbors? We surely can support that in libMesh, but currently we don't - so to whatever extent it's been working in the past, that's been dumb luck.

By the way, all of this works great for a replicated mesh. I've done pretty big simulations with this. So it should have something to do with the serialization of the mesh. Would using a Nemesis output help?

@roystgnr
Contributor

roystgnr commented Dec 6, 2019

Not just serialization, but any redistribution. So Nemesis would probably work (it doesn't need to gather any more ghost elements than we already have, right?) but any kind of repartitioning would fail in the same way.

Can you do refinement (even uniform) on the ReplicatedMesh case? I'd expect find_neighbors to fail there.

@arovinelli
Contributor Author

> Not just serialization, but any redistribution. So Nemesis would probably work (it doesn't need to gather any more ghost elements than we already have, right?) but any kind of repartitioning would fail in the same way.
>
> Can you do refinement (even uniform) on the ReplicatedMesh case? I'd expect find_neighbors to fail there.

I was willing to try, but the new MeshGenerator system in MOOSE doesn't seem to allow for uniform refinement in any MeshGenerator block (@permcody ??). I could only use it with the Mesh block, but that conflicts with adding generators later. So I don't know 😅

Also, @permcody, can I use Nemesis output for --mesh-only executions?

@permcody
Member

permcody commented Dec 9, 2019 via email

@arovinelli
Contributor Author

@roystgnr thanks for finalising libmesh/2363.
I think we still have some issues. When I run mpirun -np 8 ../../../moose_test-dbg -i break_mesh_3D_polycrystal.i --distributed-mesh --mesh-only

with the updated libmesh, I receive the following errors:

Warning, ntags=3, but we currently only support reading 2 flags.

Assertion `mesh.comm().semiverify (elem ? &all_bcids : nullptr)' failed.
(the same assertion fails on all 8 ranks)

[0] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[1] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[2] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[3] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[4] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[5] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[6] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27
[7] /home/arovinelli/projects/NEW_stuff/moose/scripts/../libmesh/src/mesh/mesh_tools.C, line 1561, compiled Dec 12 2019 at 19:25:27

can you replicate this?

@roystgnr
Contributor

This is incredibly frustrating: NO, I can't replicate it this time! I even tried a sweep over processor count, from 1 to 16, and everything seemed to run cleanly.

I'm using a --disable-strict-lgpl --enable-everything --enable-parmesh build, but I don't think the first two options should matter, and the third shouldn't matter in this case where there isn't a Mesh somewhere in the code path.

I'll try building on a different system; let me know what your configuration options are so I can match them as closely as possible. That certainly looks like a real error: boundary conditions that aren't correctly matching between local and ghosted copies of the same element.

@permcody
Member

permcody commented Dec 13, 2019 via email

@arovinelli arovinelli force-pushed the ROY_improve_break_mesh_by_blocks branch from dc13fc4 to c1f7719 Compare December 13, 2019 17:46
libmesh Outdated
@@ -1 +1 @@
-Subproject commit 83ce83b325b7e1e0e4e0ea1b426b4566a3d5a4f5
+Subproject commit 22ef4128edc89f2e61de767f0a54afec25360aef
Contributor

Caution! This contains a submodule update

Contributor Author

This is required to check everything works; hopefully I updated libmesh correctly.

Andrea Rovinelli and others added 8 commits March 24, 2020 12:34
There's no guarantee that we're going to prepare later. During
construction, MooseMesh._needs_prepare_for_use is initialized to
`false`; I think this is a fair approach...if at any point during the
MeshGeneration phase someone does something that needs mesh preparation,
they should call `prepare_for_use` on the spot. In that vein, we call
`prepare_for_use` here because we've done an operation that requires
updating the parallel counts.
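
A schematic of that policy, assuming libMesh's MeshBase API (the function is illustrative, not the commit's actual code):

```cpp
#include "libmesh/mesh_base.h"

using namespace libMesh;

// Any mesh-generation step that invalidates parallel counts should
// prepare on the spot rather than relying on a later prepare_for_use().
void finish_generation_step(MeshBase & mesh)
{
  // ... node splitting / element surgery happens before this point ...

  // Update caches and parallel counts now; per the commit message,
  // there is no guarantee that anyone prepares the mesh later.
  mesh.prepare_for_use();
}
```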
@lindsayad lindsayad force-pushed the ROY_improve_break_mesh_by_blocks branch from 61ecae3 to fc9c23e Compare March 24, 2020 19:35
@lindsayad lindsayad changed the title Improve break mesh by blocks WIP: Improve break mesh by blocks Mar 24, 2020
lindsayad added a commit to lindsayad/libmesh that referenced this pull request Mar 25, 2020
This fixes currently failing tests on idaholab/moose#14456. When
writing to exodus we serialize the mesh, and this traditionally
has always called to `UnstructuredMesh::find_neighbors`. However,
we may not want to run `find_neighbors` per idaholab/moose#14456.
WilkAndy pushed a commit to WilkAndy/moose that referenced this pull request Apr 19, 2020
This makes my test work in replicated mode in dbg mode in parallel
but not in distributed mode. I think I'm going to have to revive
PR idaholab#14456 for that
@stale

stale bot commented Apr 25, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale PRs that have reached or exceeded 90 days with no activity label Apr 25, 2020
@lindsayad
Member

lindsayad commented Apr 25, 2020 via email

@stale stale bot removed the stale PRs that have reached or exceeded 90 days with no activity label Apr 25, 2020
@stale

stale bot commented May 25, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale PRs that have reached or exceeded 90 days with no activity label May 25, 2020
@lindsayad
Member

lindsayad commented May 25, 2020 via email

@stale stale bot removed the stale PRs that have reached or exceeded 90 days with no activity label May 25, 2020
@stale

stale bot commented Jun 26, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale PRs that have reached or exceeded 90 days with no activity label Jun 26, 2020
@lindsayad
Member

lindsayad commented Jun 26, 2020 via email

@stale stale bot removed the stale PRs that have reached or exceeded 90 days with no activity label Jun 26, 2020
@stale

stale bot commented Jul 26, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale PRs that have reached or exceeded 90 days with no activity label Jul 26, 2020
@stale

stale bot commented Jul 27, 2020

Closing due to 30 days of inactivity. See http://mooseframework.org/moose/framework_development/patch_to_code.html

@stale stale bot closed this Jul 27, 2020