Blockers
- Slow RPC query response times due to channel state bloat
- Neither `hermes` nor `rly` seems able to deal with the channel state bloat and successfully close channels.
Action Items
- discuss results with the `hermes` team
- feature request for `hermes` to batch channel handshakes
- possible feature request for `hermes`: only query a batch of channels (e.g. 100), try to complete handshakes, then query the next batch
- 2nd test run: new testnet with even more nodes and `rly` instances
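The batching idea from the action items can be sketched as follows. This is a hypothetical illustration in Python, not Hermes code: `fetch_page` and `try_complete_handshake` are assumed stand-ins for a paginated channel query and a single handshake attempt, and the batch size of 100 is taken from the example above.

```python
from typing import Callable, Iterator, List


def batched(fetch_page: Callable[[int, int], List[str]],
            batch_size: int = 100) -> Iterator[List[str]]:
    """Yield channel IDs one page at a time instead of loading all of them."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        # Assumes the query has a stable ordering, so offsets stay valid
        # even while handshakes on earlier channels complete.
        offset += len(page)


def process_in_batches(fetch_page: Callable[[int, int], List[str]],
                       try_complete_handshake: Callable[[str], bool],
                       batch_size: int = 100) -> int:
    """Work through one batch before querying the next; return completed count."""
    completed = 0
    for page in batched(fetch_page, batch_size):
        for channel_id in page:
            if try_complete_handshake(channel_id):
                completed += 1
    return completed


# In-memory stand-in for the chain query (hypothetical data).
pending = [f"channel-{i}" for i in range(250)]
done = process_in_batches(lambda off, lim: pending[off:off + lim],
                          lambda channel_id: True)
print(done)
```

The point of the design is that memory use and per-query response size are bounded by the batch size rather than by the total number of channels on the chain.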
2024/04/25 14:00:00 UTC
: Public testing begins on xion.nomos.ms
2024/04/25 15:30:00 UTC
: CC resolves gas limit config issue on relayers
2024/04/25 16:00:00 UTC
: Increased number of relayer instances to 4 (total)
Status | Value |
---|---|
ICA-INIT | 1500 |
ICA-OPEN | ? |
ICA-CLOSED | ? |
2024/04/25 20:00:00 UTC
: Increased number of relayer instances to 10 (total) and RPC instances to 2 (each side)
Status | Value |
---|---|
ICA-INIT | 3800 |
ICA-OPEN | 700 |
ICA-CLOSED | ? |
2024/04/26 11:30:00 UTC
: Note: At this point 10 instances of hermes seemed to be able to keep up with the channel handshakes
Status | Value |
---|---|
ICA-INIT | 208 |
ICA-OPEN | 5487 |
ICA-CLOSED | 221 |
2024/04/26 14:30:00 UTC
: Traffic surge
Status | Value |
---|---|
ICA-INIT | 3241 |
ICA-OPEN | 6141 |
ICA-CLOSED | 224 |
FE is reporting 52,000 unique visitors in the past 24h
2024/04/26 17:30:00 UTC
: Relayers get slower and slower due to growing channel state and can't keep up with traffic. Starting tests with `go-rly`, running `hermes` profiling tests
Status | Value |
---|---|
ICA-INIT | 7809 |
ICA-OPEN | 7329 |
ICA-CLOSED | 256 |
2024/04/27 15:00:00 UTC
: At this point the relayers really start to struggle with the state bloat. Pending channel handshakes are no longer completed, although some new channels in INIT state still are. Scaled `go-rly` to 2. Due to the massive amount of channel state, `hermes` crashes on startup with `channel_workers` enabled.
Status | Value |
---|---|
ICA-INIT | 30971 |
ICA-OPEN | 13951 |
ICA-CLOSED | 546 |
2024/05/02 18:00:00 UTC
: Tests continue throughout the weekend and the beginning of May
Status | Value |
---|---|
ICA-INIT | 205662 |
ICA-OPEN | 17484 |
ICA-CLOSED | 972 |
At this point the FE is closed and public tests are stopped to give relayers the chance to deal with the channel handshake backlog.
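To make the backlog growth concrete, the share of channels that reached OPEN can be computed from the snapshots above. The following Python sketch uses only the counts from the tables where all three values are known. The ratio itself (OPEN divided by the sum of INIT, OPEN and CLOSED) is an assumption chosen for illustration, not a metric used during the test.

```python
# (timestamp UTC, ICA-INIT, ICA-OPEN, ICA-CLOSED), copied from the status tables
snapshots = [
    ("2024/04/26 11:30", 208, 5487, 221),
    ("2024/04/26 14:30", 3241, 6141, 224),
    ("2024/04/26 17:30", 7809, 7329, 256),
    ("2024/04/27 15:00", 30971, 13951, 546),
    ("2024/05/02 18:00", 205662, 17484, 972),
]


def open_share(init: int, opened: int, closed: int) -> float:
    """Fraction of all recorded channels that are in the OPEN state."""
    total = init + opened + closed
    return opened / total if total else 0.0


for ts, init, opened, closed in snapshots:
    share = open_share(init, opened, closed)
    print(f"{ts} UTC  INIT={init:>6}  OPEN={opened:>5}  open share={share:.1%}")
```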
- Number of instances `hermes_channel_worker`: 1 / 2 / 8 / 8
- Number of instances `hermes_packet_worker`: 1 / 2 / 2 / 2
- Number of instances `rly`: 0 / 0 / 1 / 2
- Number of instances `xiontestnet_node`: 1 / 1 / 2 / 2
- Number of instances `injectivetestnet_node`: 1 / 1 / 2 / 2
- `hermes` isn't optimized for channel handshakes (no batching)
- `rly` queries excessively and isn't performance optimized (config tuning can probably get a bit more RPC stability, but RPC is generally less stable)
- injectivetestnet is comparably fragile, with nodes often falling behind with 1-2 relayer instances
- hitting global file limits, OOM, RPC/gRPC response length limit, RPC/gRPC response timeout, block gas limit
- DOSed injectivetestnet
- `rly` successfully batches channel handshakes: https://testnet.explorer.injective.network/transaction/5F3837871E5B18F58F61BDACCD3228BBEB3CF9806072785C7A0C6073916BB440/
- `hermes` doesn't scan channels on startup when channel workers aren't active
- profiled samples of `hermes` instances with `packet_workers` and `channel_workers` enabled
- profiles table: hermes_profile.txt
`hermes` memory overflow:
2024-05-01T13:15:50.268991Z INFO ThreadId(01) spawn:chain{chain=injective-888}:client{client=07-tendermint-239}:connection{connection=connection-220}: done spawning channel workers chain=injective-888 channel=channel-46988
2024-05-01T13:15:50.269002Z INFO ThreadId(01) spawn:chain{chain=injective-888}:client{client=07-tendermint-239}:connection{connection=connection-220}:channel{channel=channel-46989}: channel is TRYOPEN, state on destination chain is INIT chain=injective-888 counterparty_chain=xion-testnet-1 channel=channel-46989
thread '<unnamed>' panicked at 'failed to set up alternative stack guard page: Cannot allocate memory (os error 12)', library/std/src/sys/unix/stack_overflow.rs:147:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
thread '<unnamed>' panicked at 'panic in a function that cannot unwind', library/core/src/panicking.rs:126:5
stack backtrace:
0: 0x55aeda790d41 - <unknown>
1: 0x55aeda7be5ff - <unknown>
2: 0x55aeda78c6b7 - <unknown>
3: 0x55aeda790b55 - <unknown>
4: 0x55aeda7924a3 - <unknown>
5: 0x55aeda792234 - <unknown>
6: 0x55aeda792a29 - <unknown>
7: 0x55aeda7928e1 - <unknown>
8: 0x55aeda7911a6 - <unknown>
9: 0x55aeda792672 - <unknown>
10: 0x55aed8f27ad3 - <unknown>
11: 0x55aed8f27b77 - <unknown>
12: 0x55aed8f27c13 - <unknown>
13: 0x55aeda795952 - <unknown>
14: 0x7f8d02b6cac3 - <unknown>
15: 0x7f8d02bfe850 - <unknown>
16: 0x0 - <unknown>
thread caused non-unwinding panic. aborting.
hermes-burnt-wildcard.service: Main process exited, code=killed, status=6/ABRT
hermes-burnt-wildcard.service: Failed with result 'signal'.
hermes-burnt-wildcard.service: Consumed 3min 8.339s CPU time.
hermes-burnt-wildcard.service: Scheduled restart job, restart counter is at 7.
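The `failed to spawn thread` panic reports OS error 11 (`EAGAIN`), which on Linux typically indicates that a thread, process, or memory limit was reached. The following is a minimal sketch, assuming a Linux host and Python 3, for inspecting the per-process limits that commonly matter in this situation. The log does not establish which limit triggered the crash in this test, and systemd settings such as `TasksMax` are not covered by this check.

```python
import resource

# Limits that commonly matter when a process spawns very many threads.
LIMITS = {
    "max processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "max open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "max address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "max stack size (RLIMIT_STACK)": resource.RLIMIT_STACK,
}


def fmt(value: int) -> str:
    """Render a raw limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {description: (soft, hard)} for the calling process."""
    return {
        name: tuple(fmt(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Running this under the same user and service environment as the relayer gives the values that apply to it. Limits seen in an interactive shell can differ from those of a systemd unit.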
`hermes` dies with `101/n/a`:
hermes-channel.service: Main process exited, code=exited, status=101/n/a
hermes-channel.service: Failed with result 'exit-code'.
hermes-channel.service: Consumed 8.870s CPU time.
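The two failure modes differ in how systemd reports them: `status=101/n/a` is the exit code a Rust program returns when its main thread panics, while `status=6/ABRT` means the process was terminated by SIGABRT (here after a non-unwinding panic). The Python sketch below tallies these from journal lines. The function name and sample input are illustrative, and the regular expression assumes the `Main process exited, code=..., status=...` wording shown in the logs.

```python
import re
from collections import Counter

# Matches systemd's unit exit report, e.g. "code=killed, status=6/ABRT".
EXIT_RE = re.compile(r"Main process exited, code=(\w+), status=(\S+)")


def tally_exits(lines):
    """Count (code, status) pairs across journal lines."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "hermes-burnt-wildcard.service: Main process exited, code=killed, status=6/ABRT",
    "hermes-channel.service: Main process exited, code=exited, status=101/n/a",
    "hermes-channel.service: Failed with result 'exit-code'.",
]

for (code, status), count in tally_exits(sample).items():
    print(f"code={code} status={status}: {count}")
```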
`hermes` crashes on startup during channel scans (log level: error)
May 07 20:11:20 TESTNET-RELAYER systemd[1]: Started hermes burnt.
░░
░░ The job identifier is 10678982.
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.793783Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-100}:connection{connection=connection-41}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-41
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.793965Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-101}:connection{connection=connection-42}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-42
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.799902Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-105}:connection{connection=connection-43}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-43
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.803176Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-106}:connection{connection=connection-44}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-44
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.803405Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-107}:connection{connection=connection-45}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-45
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.806158Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-113}:connection{connection=connection-49}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-49
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.807582Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-114}:connection{connection=connection-50}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-50
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.807717Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-116}:connection{connection=connection-52}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-52
May 07 22:08:05 TESTNET-RELAYER hermes[1646718]: 2024-05-07T22:08:05.807913Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-119}:connection{connection=connection-55}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-55
May 07 22:08:06 TESTNET-RELAYER hermes[1646718]: thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
May 07 22:08:06 TESTNET-RELAYER hermes[1646718]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 07 22:08:08 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=exited, status=101/n/a
░░ Subject: Unit process exited
░░
░░ The job identifier is 11254207.
May 07 22:08:11 TESTNET-RELAYER hermes[2126613]: 2024-05-07T22:08:11.671947Z ERROR ThreadId(01) health_check{chain=injective-888}: skipping health check, reason: failed to spawn chain runtime with error: relayer error: RPC error to endpoint http://127.0.0.1:2221/: HTTP error: error sending request for url (http://127.0.0.1:2221/): error trying to connect: tcp connect error: Connection refused (os error 111)
May 07 22:09:02 TESTNET-RELAYER hermes[2126613]: 2024-05-07T22:09:02.226473Z ERROR ThreadId(01) scan.chain{chain=xion-testnet-1}:scan.client{client=07-tendermint-72}:scan.connection{connection=connection-27}: failed to fetch connection channels: query: gRPC call `query_connection_channels` failed with status: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
May 07 22:10:51 TESTNET-RELAYER hermes[2126613]: 2024-05-07T22:10:51.547247Z ERROR ThreadId(01) spawn: failed to spawn worker for a chain, reason: query: error in underlying transport when making gRPC call: transport error
May 07 22:10:52 TESTNET-RELAYER hermes[2126613]: thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
May 07 22:10:52 TESTNET-RELAYER hermes[2126613]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 07 22:10:53 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=exited, status=101/n/a
░░ Subject: Unit process exited
░░
░░ The job identifier is 11268172.
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.927036Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-100}:connection{connection=connection-41}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-41
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.927201Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-101}:connection{connection=connection-42}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-42
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.943092Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-105}:connection{connection=connection-43}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-43
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.946426Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-106}:connection{connection=connection-44}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-44
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.946730Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-107}:connection{connection=connection-45}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-45
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.949462Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-113}:connection{connection=connection-49}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-49
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.950996Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-114}:connection{connection=connection-50}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-50
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.951153Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-116}:connection{connection=connection-52}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-52
May 08 00:08:04 TESTNET-RELAYER hermes[2153804]: 2024-05-08T00:08:04.951331Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-119}:connection{connection=connection-55}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-55
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: thread '<unnamed>' panicked at 'failed to set up alternative stack guard page: Cannot allocate memory (os error 12)', library/std/src/sys/unix/stack_overflow.rs:147:13
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: thread '<unnamed>' panicked at 'failed to allocate an alternative stack: Cannot allocate memory (os error 12)', library/std/src/sys/unix/stack_overflow.rsthread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: :143:13
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: thread '<unnamed>' panicked at 'panic in a function that cannot unwind', library/core/src/panicking.rs:126:5
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: stack backtrace:
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: thread '<unnamed>' panicked at 'panic in a function that cannot unwind', library/core/src/panicking.rs:126:5
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 0: 0x556ef26e6131 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 1: 0x556ef27139ef - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 2: 0x556ef26e1aa7 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 3: 0x556ef26e5f45 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 4: 0x556ef26e7893 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 5: 0x556ef26e7624 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 6: 0x556ef26e7e19 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 7: 0x556ef26e7cd1 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 8: 0x556ef26e6596 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 9: 0x556ef26e7a62 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 10: 0x556ef0e2e213 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 11: 0x556ef0e2e2b7 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 12: 0x556ef0e2e353 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 13: 0x556ef26ead42 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 14: 0x7ff112c71ac3 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 15: 0x7ff112d03850 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: 16: 0x0 - <unknown>
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: thread caused non-unwinding panic. aborting.
May 08 00:08:05 TESTNET-RELAYER hermes[2153804]: stack backtrace:
May 08 00:08:06 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=killed, status=6/ABRT
░░
░░ The job identifier is 11845662.
May 08 00:08:09 TESTNET-RELAYER hermes[2635854]: 2024-05-08T00:08:09.175059Z ERROR ThreadId(01) health_check{chain=injective-888}: skipping health check, reason: failed to spawn chain runtime with error: relayer error: RPC error to endpoint http://127.0.0.1:2221/: HTTP error: error sending request for url (http://127.0.0.1:2221/): error trying to connect: tcp connect error: Connection refused (os error 111)
May 08 00:09:01 TESTNET-RELAYER hermes[2635854]: 2024-05-08T00:09:01.586450Z ERROR ThreadId(01) scan.chain{chain=xion-testnet-1}:scan.client{client=07-tendermint-72}:scan.connection{connection=connection-27}: failed to fetch connection channels: query: gRPC call `query_connection_channels` failed with status: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
May 08 00:13:21 TESTNET-RELAYER hermes[2635854]: 2024-05-08T00:13:21.422483Z ERROR ThreadId(01) spawn: failed to spawn worker for a chain, reason: query: error in underlying transport when making gRPC call: transport error
May 08 00:13:22 TESTNET-RELAYER hermes[2635854]: thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
May 08 00:13:22 TESTNET-RELAYER hermes[2635854]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 08 00:13:23 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=exited, status=101/n/a
░░ Subject: Unit process exited
░░
░░ The job identifier is 11872396.
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.199591Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-100}:connection{connection=connection-41}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-41
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.199839Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-101}:connection{connection=connection-42}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-42
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.209681Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-105}:connection{connection=connection-43}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-43
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.213224Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-106}:connection{connection=connection-44}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-44
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.213453Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-107}:connection{connection=connection-45}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-45
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.216718Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-113}:connection{connection=connection-49}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-49
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.218441Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-114}:connection{connection=connection-50}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-50
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.218563Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-116}:connection{connection=connection-52}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-52
May 08 02:08:04 TESTNET-RELAYER hermes[2673295]: 2024-05-08T02:08:04.218714Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-119}:connection{connection=connection-55}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-55
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: thread '<unnamed>' panicked at 'failed to set up alternative stack guard page: Cannot allocate memory (os error 12)', library/std/src/sys/unix/stack_overflow.rs:147:13
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: thread '<unnamed>thread '' panicked at 'failed to allocate an alternative stack: Cannot allocate memory (os error 12)main', ' panicked at 'library/std/src/sys/unix/stack_overflow.rsfailed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }:', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs143::686:1329
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: thread '<unnamed>' panicked at 'thread 'panic in a function that cannot unwind<unnamed>', ' panicked at 'library/core/src/panicking.rspanic in a function that cannot unwind:', 126library/core/src/panicking.rs::5126
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: :5
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: stack backtrace:
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 0: 0x55fa90e37131 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 1: 0x55fa90e649ef - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 2: 0x55fa90e32aa7 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 3: 0x55fa90e36f45 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 4: 0x55fa90e38893 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 5: 0x55fa90e38624 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 6: 0x55fa90e38e19 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 7: 0x55fa90e38cd1 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 8: 0x55fa90e37596 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 9: 0x55fa90e38a62 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 10: 0x55fa8f57f213 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 11: 0x55fa8f57f2b7 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 12: 0x55fa8f57f353 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 13: 0x55fa90e3bd42 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 14: 0x7f9b5fc2eac3 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 15: 0x7f9b5fcc0850 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: 16: 0x0 - <unknown>
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: thread caused non-unwinding panic. aborting.
May 08 02:08:05 TESTNET-RELAYER hermes[2673295]: stack backtrace:
May 08 02:08:05 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=killed, status=6/ABRT
░░ Subject: Unit process exited
░░
░░ The job identifier is 12437114.
May 08 02:08:08 TESTNET-RELAYER hermes[3145738]: 2024-05-08T02:08:08.427221Z ERROR ThreadId(01) health_check{chain=injective-888}: skipping health check, reason: failed to spawn chain runtime with error: relayer error: RPC error to endpoint http://127.0.0.1:2221/: HTTP error: error sending request for url (http://127.0.0.1:2221/): error trying to connect: tcp connect error: Connection refused (os error 111)
May 08 02:09:02 TESTNET-RELAYER hermes[3145738]: 2024-05-08T02:09:02.069381Z ERROR ThreadId(01) scan.chain{chain=xion-testnet-1}:scan.client{client=07-tendermint-72}:scan.connection{connection=connection-27}: failed to fetch connection channels: query: gRPC call `query_connection_channels` failed with status: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
May 08 02:11:06 TESTNET-RELAYER hermes[3145738]: 2024-05-08T02:11:06.276719Z ERROR ThreadId(01) spawn: failed to spawn worker for a chain, reason: query: error in underlying transport when making gRPC call: transport error
May 08 02:11:07 TESTNET-RELAYER hermes[3145738]: thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
May 08 02:11:07 TESTNET-RELAYER hermes[3145738]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 08 02:11:08 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=exited, status=101/n/a
░░ Subject: Unit process exited
░░
░░ The job identifier is 12452542.
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.183755Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-100}:connection{connection=connection-41}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-41
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.183909Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-101}:connection{connection=connection-42}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-42
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.198700Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-105}:connection{connection=connection-43}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-43
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.202262Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-106}:connection{connection=connection-44}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-44
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.202532Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-107}:connection{connection=connection-45}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-45
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.205330Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-113}:connection{connection=connection-49}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-49
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.206819Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-114}:connection{connection=connection-50}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-50
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.206981Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-116}:connection{connection=connection-52}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-52
May 08 04:08:06 TESTNET-RELAYER hermes[3174315]: 2024-05-08T04:08:06.207166Z ERROR ThreadId(01) spawn:chain{chain=xion-testnet-1}:client{client=07-tendermint-119}:connection{connection=connection-55}: skipped connection workers, reason: relayer error: error in underlying transport when making gRPC call: transport error chain=xion-testnet-1 connection=connection-55
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: thread '<unnamed>' panicked at 'failed to set up alternative stack guard page: Cannot allocate memory (os error 12)', library/std/src/sys/unix/stack_overflow.rs:147:13
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: thread '<unnamed>' panicked at 'failed to allocate an alternative stack: Cannot allocate memory (os error 12)thread '', mainlibrary/std/src/sys/unix/stack_overflow.rs' panicked at ':failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }143', :/rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs13:
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 686:29
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: thread '<unnamed>' panicked at 'panic in a function that cannot unwind', library/core/src/panicking.rs:126:5
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: thread 'stack backtrace:
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: <unnamed>' panicked at 'panic in a function that cannot unwind', library/core/src/panicking.rs:126:5
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 0: 0x55c442450131 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 1: 0x55c44247d9ef - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 2: 0x55c44244baa7 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 3: 0x55c44244ff45 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 4: 0x55c442451893 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 5: 0x55c442451624 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 6: 0x55c442451e19 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 7: 0x55c442451cd1 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 8: 0x55c442450596 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 9: 0x55c442451a62 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 10: 0x55c440b98213 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 11: 0x55c440b982b7 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 12: 0x55c440b98353 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 13: 0x55c442454d42 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 14: 0x7fb119621ac3 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 15: 0x7fb1196b3850 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: 16: 0x0 - <unknown>
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: thread caused non-unwinding panic. aborting.
May 08 04:08:07 TESTNET-RELAYER hermes[3174315]: stack backtrace:
May 08 04:08:07 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=killed, status=6/ABRT
░░ Subject: Unit process exited
░░
░░ The job identifier is 13028565.
May 08 04:08:10 TESTNET-RELAYER hermes[3655765]: 2024-05-08T04:08:10.421674Z ERROR ThreadId(01) health_check{chain=injective-888}: skipping health check, reason: failed to spawn chain runtime with error: relayer error: RPC error to endpoint http://127.0.0.1:2221/: HTTP error: error sending request for url (http://127.0.0.1:2221/): error trying to connect: tcp connect error: Connection refused (os error 111)
May 08 04:09:01 TESTNET-RELAYER hermes[3655765]: 2024-05-08T04:09:01.655552Z ERROR ThreadId(01) scan.chain{chain=xion-testnet-1}:scan.client{client=07-tendermint-72}:scan.connection{connection=connection-27}: failed to fetch connection channels: query: gRPC call `query_connection_channels` failed with status: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
May 08 04:11:10 TESTNET-RELAYER hermes[3655765]: 2024-05-08T04:11:10.630034Z ERROR ThreadId(01) spawn: failed to spawn worker for a chain, reason: query: error in underlying transport when making gRPC call: transport error
May 08 04:11:11 TESTNET-RELAYER hermes[3655765]: thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/mod.rs:686:29
May 08 04:11:11 TESTNET-RELAYER hermes[3655765]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 08 04:11:12 TESTNET-RELAYER systemd[1]: hermes-burnt-wildcard.service: Main process exited, code=exited, status=101/n/a
░░ Subject: Unit process exited
░░
`rly`'s query pressure is generally so high that the injective node immediately starts to fall behind:
rpc error: code = Unknown desc = failed to execute message; message index: 0: failed to verify header: invalid header: new header has a time from the future 2024-05-01 12:31:05.687617525 +0000 UTC (now: 2024-05-01 12:29:52.508240025 +0000 UTC; max clock drift: 40s)
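The two timestamps in the error above can be compared directly to see how far outside the allowed drift the header was. This is a minimal sketch that only redoes the arithmetic from the log line. The helper name `parse_go_utc` is hypothetical, and reading `now` as the lagging node's view of time is an interpretation, not something the log states.

```python
from datetime import datetime, timedelta


def parse_go_utc(ts: str) -> datetime:
    """Parse a Go-style 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn +0000 UTC' timestamp.

    Hypothetical helper: Go prints nanoseconds, datetime only keeps
    microseconds, so the fraction is truncated to 6 digits.
    """
    date_part, time_part = ts.split(" ")[:2]
    whole, _, frac = time_part.partition(".")
    frac = (frac + "000000")[:6]
    return datetime.strptime(f"{date_part} {whole}.{frac}", "%Y-%m-%d %H:%M:%S.%f")


# Values copied from the rpc error above.
header_time = parse_go_utc("2024-05-01 12:31:05.687617525 +0000 UTC")
node_now = parse_go_utc("2024-05-01 12:29:52.508240025 +0000 UTC")
max_clock_drift = timedelta(seconds=40)

lag = header_time - node_now
print(f"header is {lag.total_seconds():.3f}s ahead of 'now'")
print("rejected" if lag > max_clock_drift else "accepted")
```

The header is a little over 73 seconds ahead of `now`, well beyond the 40s max clock drift, which fits a node that has fallen behind under query load.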
`rly` file limit overflow:
Flush not complete {"error": "failed to query packet commitments: post failed: Post \"http://127.0.0.1:2131\": dial tcp 127.0.0.1:2131: socket: too many open files"}
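"Too many open files" means the process hit its `RLIMIT_NOFILE` soft limit. The sketch below shows how that limit can be inspected and raised from inside a process, using Python's standard `resource` module on a Unix host. It inspects the Python process itself, not `rly`. For a running relayer the same numbers would have to be read from `/proc/<pid>/limits` or set in the service configuration, which is not shown in the source. The function names are hypothetical.

```python
import os
import resource
from typing import Tuple


def fd_usage() -> Tuple[int, int, int]:
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd is Linux-specific; /dev/fd is the fallback on other Unixes.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir)), soft, hard


def raise_soft_limit_to_hard() -> int:
    """Try to lift the soft limit to the hard limit; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse this (e.g. an unlimited hard limit); keep the old value.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


open_fds, soft_before, hard = fd_usage()
soft_after = raise_soft_limit_to_hard()
print(f"open fds: {open_fds}, soft limit: {soft_before} -> {soft_after}, hard limit: {hard}")
```

A relayer that opens one connection per pending query can exhaust a default soft limit quickly, so comparing the open descriptor count against the soft limit is a cheap first check.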
`xiond` gRPC response length overflow in `hermes`:
2024-05-01T14:28:22.725182Z ERROR ThreadId(01) scan.chain{chain=xion-testnet-1}:scan.client{client=07-tendermint-119}:scan.connection{connection=connection-55}: failed to fetch connection channels: query: gRPC call `query_connection_channels` failed with status: status: OutOfRange, message: "Error, message length too large: found 59542869 bytes, the limit is: 33554432 bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "7662623"} }
^ bumped gRPC limits in `app.toml`:
# MaxRecvMsgSize defines the max message size in bytes the server can receive.
# The default value is 10MB.
# default value
# max-recv-msg-size = "10485760"
# increased x10 for xiontestnet
max-recv-msg-size = "104857600"
# MaxSendMsgSize defines the max message size in bytes the server can send.
# The default value is math.MaxInt32.
# default value
# max-send-msg-size = "2147483647"
# increased x10 for xiontestnet
max-send-msg-size = "21474836470"
^ set `max_grpc_decoding_size` in hermes `config.toml`:
max_grpc_decoding_size = 648251000
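To put the sizes in perspective, the numbers from the `OutOfRange` error can be compared with the new decoding limit. This is a small sketch of that arithmetic only. The error text and the configured value are copied from above, while the helper names are hypothetical.

```python
import re
from typing import Tuple

# Message text copied from the hermes error above.
ERROR = "Error, message length too large: found 59542869 bytes, the limit is: 33554432 bytes"
# Value set for max_grpc_decoding_size in the hermes config above.
CONFIGURED_LIMIT = 648251000


def parse_sizes(message: str) -> Tuple[int, int]:
    """Extract (found_bytes, limit_bytes) from a gRPC message-length error."""
    match = re.search(r"found (\d+) bytes, the limit is: (\d+) bytes", message)
    if match is None:
        raise ValueError("not a message-length error")
    return int(match.group(1)), int(match.group(2))


def mib(num_bytes: int) -> float:
    """Convert bytes to MiB."""
    return num_bytes / (1024 * 1024)


found, limit = parse_sizes(ERROR)
print(f"response size: {mib(found):.1f} MiB, old limit: {mib(limit):.1f} MiB")
print(f"new limit: {mib(CONFIGURED_LIMIT):.1f} MiB, headroom: {CONFIGURED_LIMIT / found:.1f}x")
```

The single `query_connection_channels` response was roughly 57 MiB against a 32 MiB decoding limit. The new limit leaves about an order of magnitude of headroom, but the response grows with the number of channels, so continued state growth would eventually exceed it as well.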